Dataset
Datasets are the clean, contextualized data that form the backbone of the work you can do in Sphinx. The columns of a Dataset are tracked in a schema that ensures all data are managed the same way.
You can see the list of Datasets you have access to in the Dataset Library.
How to Create a Dataset
There are multiple ways to import data to create a Dataset; these steps focus on the most basic case. See Advanced Topics for an explanation of the other options.
Creating a Dataset
You can create a new Dataset from the Dataset Library using the “New Dataset” button, or from anywhere else that button appears. This will open the Create Dataset modal.
Selecting an Upload Mode
You can upload multiple types of data or manipulate existing Datasets. This flow focuses on uploading “Tabular” data.
For “Tabular” data, the first row of data is assumed to be the header.
Defining Upload Details
Creating a Dataset will allow you to define the name, description, and other attributes.
You can upload a file from your computer, or manually enter your data using the “Manual Entry” section.
Remember: tabular data has one row for each observation and one column for each variable.
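If you want to check a file before uploading it, the tabular layout described above (a header in the first row, then one row per observation with one value per column) can be verified locally. The following is only a minimal sketch using Python's standard library; it is not part of Sphinx, and the function name `check_tabular` and the sample column names are hypothetical.

```python
import csv
import io


def check_tabular(text):
    """Return a list of problems found in CSV text; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    # The first row is treated as the header, matching the "Tabular" upload assumption.
    header = rows[0]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("header row contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header row contains duplicate column names")

    # Every observation (row) should supply exactly one value per variable (column).
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number} has {len(row)} values, expected {len(header)}"
            )
    return problems


# Hypothetical example data: one row per observation, one column per variable.
sample = "sample_id,concentration_uM,response\nS1,0.1,12.5\nS2,1.0,48.2\n"
print(check_tabular(sample))
```

An empty list means the text has a usable header and consistently shaped rows. This check covers only the shape of the file; Sphinx may apply its own validation when you create the Dataset.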
After you select the “Create” button you will get a preview of the data. If the table looks correct you can select “Confirm Dataset” to add it to your library. After confirming, you will be taken to a view of the Dataset.
Defining Dataset Details
On the page for a Dataset, you can access the preview, lineage, schema, related analyses, annotations, and settings. A brief overview of these options:
- “Preview Dataset” shows the first 50 rows, Bio Entity relationships, values, and column data types.
- “Lineage” shows how this Dataset is related to other Datasets created from it. More on this in Dataset Details.
- “Edit Schema” lets you define the data type, unit, and related Bio Entity.
- “Analyses” shows any related Analyses where the Dataset is used.
- “Annotations” lists any marked data points from analyses.
- “Dataset Settings” lets you update the name, description, ELN entry, and tags for the Dataset.
Once you have created a Dataset you can then create an Analysis.
Advanced Topics
Data Upload Modes
Multiple data upload modes are available, depending on the shape and content of your data. Selecting each option will give you an overview of what Sphinx expects when you add new data.
Dataset Details
On the page for each Dataset, multiple options are available to provide additional information and customization for a Dataset.