Cell values as rows in a spreadsheet

I have some spreadsheet data where the first column lists the position of a certain amino acid within a protein. There are  many amino acid sites that are skipped, therefore the rows do not follow the actual, uninterrupted, amino acid sequence. I use Gnumeric, and the resulting list looks like that:

data1

 

I want the program to insert empty rows between the non-consecutive amino acids — similarly to the gaps in a multiple sequence alignment. This can be done in a relatively straightforward way:

  • Go to the end of your data, select an empty cell and start enumerating the column downwards consecutively until reaching a value that equals or surpasses the last amino acid number of the protein:

data23

 

  • Then, select the columns that need to be sorted:

data331

 

  • Sort everything. This can be done from Data > Sort, or directly from the toolbar which will sort the data automatically by the first column:

data41

 

  • Now data will be sorted and the missing numbers of column A will be inserted in between the original data. The previously existing rows will be paired with empty ones that share the same value of column A:

data7

 

  • To remove the repeating rows, go to Data > Filter > Advanced Filter, set Criteria range to be listed by column A and check Unique records only. This will eliminate empty duplicates:

data71

 

  • The result is as follows:

data63

This is it! Now just cut and paste! :)

 



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s