Friday, 4 March 2016

Merging Datasets into a matrix for analysis

I have n samples/datasets, each of which gives me a set of variables and their associated values.
In the case of mass spec data, each variable is a formula (string) and its value is the relative abundance of that formula (float).

For each sample, there may be a thousand or more formulae.

Across all the samples, some formulae will be common to several or all samples, and others will be unique to a single sample.

Examples are shown at the end of the post.

Is there an easy way to merge all of these into a single table that can then be imported into R or Python for analysis?

(Sidenote: I wrote a script to do this with masses and intensities, but it's not particularly efficient, nor is it easily modified to work on strings.)


Thanks :)

From four CSVs with a structure like this:

To one data table with a structure like this:

Sample ID   C10H20O2   C11H22O2   C10H20O3   C11H22O3
Sample1     0          15         26         3
Sample2     19         88         29         0
Sample3     54         0          66         0
Sample4     30         32         0          0
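
One way this merge could be sketched in Python is with pandas. This is only a rough sketch under assumptions: I'm assuming each sample's CSV has two columns, which I've called "formula" and "abundance" (hypothetical names), and that each formula appears at most once per sample. The in-memory strings below stand in for the four CSV files, using the values from the table above; merge_samples is just an illustrative helper name.

```python
import io

import pandas as pd

# Stand-ins for the four CSV files (assumed layout: formula,abundance).
# With real files, pass file paths instead of StringIO objects.
samples = {
    "Sample1": "formula,abundance\nC11H22O2,15\nC10H20O3,26\nC11H22O3,3\n",
    "Sample2": "formula,abundance\nC10H20O2,19\nC11H22O2,88\nC10H20O3,29\n",
    "Sample3": "formula,abundance\nC10H20O2,54\nC10H20O3,66\n",
    "Sample4": "formula,abundance\nC10H20O2,30\nC11H22O2,32\n",
}


def merge_samples(sources):
    """Merge {sample_id: csv_source} into a samples-by-formulae table."""
    columns = {}
    for sample_id, source in sources.items():
        df = pd.read_csv(source)
        # One Series per sample, indexed by the formula string.
        columns[sample_id] = df.set_index("formula")["abundance"]
    # concat aligns the Series on the union of all formulae;
    # formulae missing from a sample come out as NaN, filled here with 0.
    merged = pd.concat(columns, axis=1).fillna(0)
    # Transpose so that rows are samples and columns are formulae.
    merged = merged.T
    merged.index.name = "Sample ID"
    return merged


merged = merge_samples({name: io.StringIO(text) for name, text in samples.items()})
print(merged)
```

The merged table can then be written out with merged.to_csv("merged.csv") and read into R or back into Python. Filling missing formulae with 0 matches the table above, but keeping them as NaN may be more appropriate if "not detected" should be treated differently from zero abundance.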
