In the case of mass spec data, this is a (string) formula and the (float) relative abundance of that formula.
For each sample, there may be a thousand or more formulas.
Across all the samples, some formulas will be common to some or all samples, and others will be unique to individual samples.
Examples are shown at the end of the post.
Is there an easy way to merge all of these into a single table that can then be easily imported into R or Python for analysis?
(Sidenote: I wrote a script to do this with masses and intensities, but it's not particularly efficient or easily modified to work on strings.)
Help!
Thanks :)
Example:
From four CSVs with a structure like this:
To one data table with a structure like this:
Sample ID | C10H20O2 | C11H22O2 | C10H20O3 | C11H22O3 |
---|---|---|---|---|
Sample1 | 0 | 15 | 26 | 3 |
Sample2 | 19 | 88 | 29 | 0 |
Sample3 | 54 | 0 | 66 | 0 |
Sample4 | 30 | 32 | 0 | 0 |
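
For reference, here is a rough sketch of the kind of merge I mean, using pandas. It assumes each CSV has two columns named `formula` and `abundance` (those column names, the `merge_samples` helper, and the in-memory stand-ins for the four files are all made up for illustration), and that a formula missing from a sample should show up as 0. The demo values are taken from the target table above.

```python
import io

import pandas as pd


def merge_samples(sources):
    """Merge per-sample CSVs into one wide table.

    sources: dict mapping a sample ID to a CSV path or file-like object.
    Assumption: each CSV has 'formula' and 'abundance' columns.
    """
    per_sample = {}
    for sample_id, src in sources.items():
        df = pd.read_csv(src)
        # Sum abundances in case a formula is listed more than once in a file.
        per_sample[sample_id] = df.groupby("formula")["abundance"].sum()

    # concat aligns the Series on the union of all formulas;
    # formulas absent from a sample become NaN, which we replace with 0.
    wide = pd.concat(per_sample, axis=1).fillna(0).T
    wide.index.name = "Sample ID"
    return wide


# In-memory stand-ins for the four CSV files (hypothetical layout).
sources = {
    "Sample1": io.StringIO("formula,abundance\nC11H22O2,15\nC10H20O3,26\nC11H22O3,3\n"),
    "Sample2": io.StringIO("formula,abundance\nC10H20O2,19\nC11H22O2,88\nC10H20O3,29\n"),
    "Sample3": io.StringIO("formula,abundance\nC10H20O2,54\nC10H20O3,66\n"),
    "Sample4": io.StringIO("formula,abundance\nC10H20O2,30\nC11H22O2,32\n"),
}

wide = merge_samples(sources)
print(wide)
```

With real files, the dict values would be file paths instead of `StringIO` objects, and `wide.to_csv("merged.csv")` would write a single table that R or Python can read back in. Since the alignment is done on the formula strings themselves, nothing here depends on the formulas being numeric.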