Tidy data is a standardized way of organizing data that makes it easier to analyze. Many applications expect tidy data so that they can create plots and analyses in a standardized way. Sphinx helps you create tidy data during the data import process and by applying transformations to your data in an analysis.

The key principles of tidy data are:

  1. Each variable forms a column: each measured variable is placed in its own column.
  2. Each observation forms a row: each observation (one set of measurements taken together) is placed in its own row.
  3. Each type of observational unit forms a table: each kind of observational unit is stored in its own table.

Messy vs Tidy Data

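This table contains messy data: the absorbance measurements are split across multiple columns, and each column name encodes two variables, the wavelength and the condition (Control or Sample). In other experiments, the variables hidden in column names might be, for example, the gene and the treatment.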

ID | ABS_280_Control | ABS_320_Control | ABS_280_Sample | ABS_320_Sample
1  | 0.77            |                 |                |
2  |                 | 0.62            |                |
3  |                 |                 | 0.55           |
4  |                 |                 |                | 0.53

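Here are the same data in a tidy format. Notice that the variables encoded in column names such as ABS_280_Control are now represented in their own columns (Condition and Wavelength), obtained by splitting each column name on the _ character, while the measured values move into a single Absorbance column.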

ID | Condition | Wavelength | Absorbance
1  | Control   | 280        | 0.77
2  | Control   | 320        | 0.62
3  | Sample    | 280        | 0.55
4  | Sample    | 320        | 0.53
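The reshaping described above (splitting each column name on _ and moving the values into one column) can be sketched in plain Python. This is only an illustration of the transformation, not how Sphinx implements its import or analysis steps. It assumes that column names always follow the pattern MEASURE_WAVELENGTH_CONDITION and that the ABS prefix means absorbance; the function name wide_to_tidy is hypothetical.

```python
# Illustrative sketch only: not Sphinx's implementation.
# Assumes column names look like "ABS_280_Control" (MEASURE_WAVELENGTH_CONDITION)
# and that the "ABS" measure is absorbance.

COLUMNS = ["ID", "ABS_280_Control", "ABS_320_Control", "ABS_280_Sample", "ABS_320_Sample"]

# The messy table from above; None marks an empty cell.
MESSY_ROWS = [
    [1, 0.77, None, None, None],
    [2, None, 0.62, None, None],
    [3, None, None, 0.55, None],
    [4, None, None, None, 0.53],
]


def wide_to_tidy(columns, rows, id_col="ID"):
    """Hypothetical helper: turn wide rows into tidy rows, one per measured value."""
    tidy = []
    for values in rows:
        record = dict(zip(columns, values))
        for column, value in record.items():
            # Skip the identifier column and empty cells (no observation there).
            if column == id_col or value is None:
                continue
            # "ABS_280_Control" -> ("ABS", "280", "Control")
            _measure, wavelength, condition = column.split("_")
            tidy.append({
                id_col: record[id_col],
                "Condition": condition,
                "Wavelength": int(wavelength),
                "Absorbance": float(value),  # assumes ABS means absorbance
            })
    return tidy


if __name__ == "__main__":
    for row in wide_to_tidy(COLUMNS, MESSY_ROWS):
        print(row["ID"], row["Condition"], row["Wavelength"], row["Absorbance"])
```

Running the script prints one line per tidy row, matching the tidy table above. Keeping the split inside a single function means that a column naming scheme with a different order of parts only requires changing the unpacking line.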

Further Reading

You can read more about tidy data and its impact.