“Combine these files”: design and behaviors.

For the foreseeable future:

The ability to combine files into a single file will be implemented only for files that are the same type.

We will allow the user to perform this activity only on the “Sort by Data Format” pages.

The buttons to combine files into a single file will appear only on pages where data is presented as “Sort by Data Format”.

Today:

We will first implement this feature for fasta sequences, protein or nucleic acid.

When a user selects two or more fasta files, the button that allows concatenation becomes active.

The user will be prompted to give a label for the file produced, and will be shown a “GO” button.

If the label is not provided, the user will be reminded to supply one.
 
If the data items are all nucleic acid/sequence or all protein/sequence, with the same entity/data types, the action will proceed.
 
If a mixture of nucleic acid/sequence and protein/sequence items, or any item of unknown entity type, is submitted, the user will be shown a warning:

“The sequences you have selected are of unknown or mixed types; proceed anyway?”

A Yes click will allow the action to proceed. A Cancel click will return the user to the fasta format page.
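
The type check that decides whether the warning is shown could be sketched as follows. This is only an illustration under stated assumptions, not the NGBW implementation: the function name `needs_type_warning` and the string labels used for entity types are hypothetical.

```python
# Hypothetical labels; the real NGBW entity-type identifiers may differ.
KNOWN_TYPES = {"nucleic acid", "protein"}


def needs_type_warning(entity_types):
    """Return True when the selection is mixed or contains an unknown type."""
    distinct = set(entity_types)
    # Proceed silently only if every file shares one known entity type.
    return not (len(distinct) == 1 and distinct <= KNOWN_TYPES)
```

With these assumed labels, a selection of two protein files would proceed without a warning, while a protein file combined with a nucleic acid file, or any file labeled unknown, would trigger it.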

If the user selects Yes, the action is performed:

The behavior will be simply to concatenate the files, with a line break between the last character of one file and the > character that begins the next one. This algorithm works for single-sequence and multiple-sequence fasta files alike, so there is no need to restrict ourselves to one or the other.
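
A minimal sketch of this concatenation, assuming the file contents are already available as strings. The function name `concatenate_fasta` is hypothetical, and ending the result with a single trailing line break is a choice made for the example, not something the design specifies.

```python
def concatenate_fasta(contents):
    """Join fasta file contents, one line break between consecutive files."""
    # Strip trailing line breaks so exactly one line break separates the
    # last character of one file from the '>' of the next; skip empty files.
    parts = [text.rstrip("\r\n") for text in contents if text.strip()]
    return "\n".join(parts) + "\n" if parts else ""
```

Because the function never looks inside a file, it treats single-sequence and multiple-sequence inputs identically.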

The product of this activity will then be displayed on the same tab: fasta format, with an NGBW ID and the user label.

Next steps:

Using this as an archetype, we can permit concatenation of any file format, if it can then be used as input for some tool. 

The behavior will be basically the same. As soon as I can test what other tools accept natively, I will specify which other formats we will concatenate.

As a more far-reaching goal, we may want to let users concatenate individual files into a new format.

I believe this is relevant primarily for converting any set of sequences into a fasta file. 

I also believe we have the ability to do that currently.


“Split these files”: design and behaviors.

Today:

We will first implement this feature only for fasta files that contain multiple fasta sequences, protein or nucleic acid. 

Based on the display in the NGBW, it seems we know which files these are.

The controls to split fasta files into single files will appear only on the fasta format data page where data is presented as “Sort by Data Format”.

When a user selects exactly one fasta file with two or more fasta sequences in it, the controls that allow file splitting become active.
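
The activation rule above could be sketched as follows, assuming the number of sequences is determined by counting header lines (lines that begin with >). The function names `count_fasta_sequences` and `split_enabled` are hypothetical, and the NGBW may already track this count by other means.

```python
def count_fasta_sequences(text):
    """Count records by counting header lines, which begin with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def split_enabled(selected_files):
    """True when exactly one file is selected and it holds 2+ sequences."""
    return (
        len(selected_files) == 1
        and count_fasta_sequences(selected_files[0]) >= 2
    )
```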

The first control is a drop-down that allows the user to specify the entity type. The choices are nucleic acid, protein, and unknown. The drop-down has no default value; it is blank until the user makes a choice.

The second control is a GO button. If no value is set for data type, the user is prompted to choose one.

No submission is allowed without a conscious choice of the data type value.
 
The product of this activity will then be displayed on the same tab: fasta format.
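
The split itself could be sketched as follows, assuming each sequence begins at a header line starting with > and runs until the next header. The function name `split_fasta` is hypothetical. Dropping blank lines and any text before the first header is a choice made for the example, not part of the design.

```python
def split_fasta(text):
    """Return a list of single-record fasta strings from a multi-record file."""
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if current:
                records.append("\n".join(current) + "\n")
            current = [line]
        elif current and line.strip():
            # Non-blank sequence line belonging to the most recent header.
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records
```

Each returned string could then be stored as its own fasta data item, carrying the entity type the user chose in the drop-down.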

Next step: None