Datasets are the clean, contextualized data that form the backbone of the work you do in Sphinx. The columns of each Dataset are tracked in a schema, which ensures that all data are managed the same way.


You can see the list of Datasets you have access to in the Dataset Library.

How to Create a Dataset

There are multiple ways to import data to create a Dataset; these steps cover the most basic case. See Advanced Topics for an explanation of the other options.

1. Creating a Dataset

You can create a new Dataset by selecting the “New Dataset” button in the Dataset Library, or anywhere else the button appears. This opens the Create Dataset modal.

2. Selecting an Upload Mode

You can upload several types of data or build a Dataset from existing Datasets. This flow focuses on uploading “Tabular” data.

For “Tabular” data, the first row is treated as the header, which holds the column names.

[Figure: Create Dataset modal]

3. Defining Upload Details

In this step you define the Dataset's name, description, and other attributes. You can upload a file from your computer or enter your data manually in the “Manual Entry” section. Remember: tabular data has one row for each observation and one column for each variable.
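To make the expected shape concrete, here is a minimal Python sketch of the two rules above: the first row is read as the header, and every following row must supply exactly one value per column. This is not how Sphinx validates uploads; the `check_tabular` helper, the column names, and the values are hypothetical, and the sketch uses only the standard library `csv` module.

```python
import csv
import io

# Hypothetical example data: one row per observation (sample),
# one column per variable. The first row is the header.
CSV_TEXT = """sample_id,treatment,concentration_uM,viability
S1,control,0,0.98
S2,drug_a,10,0.75
S3,drug_a,50,0.41
"""

def check_tabular(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Parse CSV text, treating the first row as the header.

    Raises ValueError if the header repeats a column name or if any
    data row has a different number of fields than the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("empty file")
    header, data = rows[0], rows[1:]
    if len(set(header)) != len(header):
        raise ValueError("duplicate column names in header")
    records = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no} has {len(row)} fields, expected {len(header)}"
            )
        records.append(dict(zip(header, row)))
    return header, records

header, records = check_tabular(CSV_TEXT)
print(header)        # ['sample_id', 'treatment', 'concentration_uM', 'viability']
print(len(records))  # 3
```

A file that puts several observations in one row, or leaves a row short, fails the field-count check, which is a quick way to spot data that is not yet in tabular form.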

[Figure: Add Dataset Details modal]

After you select the “Create” button, you will see a preview of the data. If the table looks correct, select “Confirm Dataset” to add it to your library. After confirming, you are taken to the page for the new Dataset.

4. Defining Dataset Details

On the page for a Dataset, you can access the preview, lineage, schema, related analyses, annotations, and settings. A brief overview of these options:

  1. “Preview Dataset” shows the first 50 rows, Bio Entity relationships, values, and column data types.
  2. “Lineage” shows how this Dataset is related to other Datasets, such as those created from it. See Dataset Details under Advanced Topics for more.
  3. “Edit Schema” lets you define the data type, unit, and related Bio Entity for each column.
  4. “Analyses” shows any Analyses in which the Dataset is used.
  5. “Annotations” lists data points that have been marked in analyses.
  6. “Dataset Settings” lets you update the name, description, ELN entry, and tags for the Dataset.
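The schema described above can be pictured as one record per column. The Python sketch below is a hypothetical model, not Sphinx's internal representation: the `ColumnSchema` class, the `ALLOWED_TYPES` set, the `coerce` helper, and the "Gene" Bio Entity label are illustrative assumptions, and the real set of data types may differ. It shows why a schema keeps data consistent: every cell in a column is converted with the same declared type and carries the same unit and Bio Entity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of data types; Sphinx's actual types may differ.
ALLOWED_TYPES = {"string", "integer", "float", "boolean", "date"}

@dataclass
class ColumnSchema:
    name: str
    data_type: str
    unit: Optional[str] = None        # e.g. "uM"; None for unitless columns
    bio_entity: Optional[str] = None  # e.g. "Gene"; None if not linked

    def __post_init__(self) -> None:
        # Reject types outside the (hypothetical) allowed set.
        if self.data_type not in ALLOWED_TYPES:
            raise ValueError(
                f"unknown data type {self.data_type!r} for column {self.name!r}"
            )

def coerce(value: str, column: ColumnSchema):
    """Convert a raw text cell to the column's declared type."""
    if column.data_type == "integer":
        return int(value)
    if column.data_type == "float":
        return float(value)
    if column.data_type == "boolean":
        return value.strip().lower() in {"true", "1", "yes"}
    return value  # strings and dates are left as text in this sketch

schema = [
    ColumnSchema("sample_id", "string"),
    ColumnSchema("gene", "string", bio_entity="Gene"),
    ColumnSchema("concentration", "float", unit="uM"),
]
row = {"sample_id": "S1", "gene": "TP53", "concentration": "10"}
typed = {c.name: coerce(row[c.name], c) for c in schema}
print(typed)  # {'sample_id': 'S1', 'gene': 'TP53', 'concentration': 10.0}
```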

Once you have created a Dataset, you can create an Analysis.

Advanced Topics

Data Upload Modes

Several data upload modes are available, depending on the shape and content of your data. Selecting each option gives you an overview of what Sphinx expects when you add new data.

Upload Mode | Description
Tabular | Lets you upload a file with one row for each observation and one column for each variable.
Plate | Lets you upload a file in which the first row and the first column are positional identifiers for the data. You can also upload data in the same format that identifies the samples in the plate (a “plate map”).
Import from Database | Lets you query data directly from a Postgres data warehouse. Please contact Support for more details.
LIMS Import | Lets you query data directly from your Benchling data warehouse. Please contact Support for more details.
Combine data from two Datasets | Lets you combine two existing Datasets in Sphinx. The result is a longer table containing the rows of both Datasets.
Connect data across two Datasets | Lets you connect two existing Datasets in Sphinx. The result is a wider table containing the columns of both Datasets.
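Three of these modes are easiest to picture as data shapes: a plate grid, a “longer” combined table, and a “wider” connected table. The Python sketch below illustrates each with plain lists and dictionaries. It is an illustration under assumptions, not Sphinx's import logic: the well naming (`A1`), the `well` key used to connect a plate map, and the helper names `plate_to_rows`, `combine`, and `connect` are all hypothetical, and `connect` is shown as an inner join, which may not match how Sphinx handles unmatched rows.

```python
# Hypothetical 2x3 plate of readings: the first row holds column numbers,
# the first column holds row letters (the positional identifiers).
PLATE = [
    ["",  "1",    "2",    "3"],
    ["A", "0.12", "0.15", "0.11"],
    ["B", "0.98", "1.02", "0.95"],
]

def plate_to_rows(plate):
    """Reshape a plate grid into one record per well (a tabular layout)."""
    col_ids = plate[0][1:]
    records = []
    for row in plate[1:]:
        row_id, values = row[0], row[1:]
        for col_id, value in zip(col_ids, values):
            records.append({"well": f"{row_id}{col_id}", "value": value})
    return records

def combine(a, b):
    """'Longer' result: stack the rows of two tables that share columns."""
    return a + b

def connect(a, b, key):
    """'Wider' result: inner join on a shared key column.

    Assumes keys in `b` are unique; rows of `a` without a match are dropped.
    """
    b_by_key = {r[key]: r for r in b}
    return [{**r, **b_by_key[r[key]]} for r in a if r[key] in b_by_key]

readings = plate_to_rows(PLATE)
second_run = [{"well": "A1", "value": "0.14"}, {"well": "A2", "value": "0.16"}]
plate_map = [{"well": "A1", "sample": "S1"}, {"well": "B1", "sample": "S2"}]

print(len(readings))                       # 6
print(len(combine(readings, second_run)))  # 8
print(connect(plate_map, readings, "well"))
# [{'well': 'A1', 'sample': 'S1', 'value': '0.12'}, {'well': 'B1', 'sample': 'S2', 'value': '0.98'}]
```

Stacking keeps the columns and adds rows, while joining keeps the rows that share a key and adds columns; that is the difference between the “longer” and “wider” results in the table.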

Dataset Details

On the page for each Dataset, several options are available for adding information to a Dataset and customizing it.

Dataset Option | Description
Preview Dataset | Shows the first 50 rows, Bio Entity relationships, values, and column data types.
Lineage | Shows the relationships between Datasets in Sphinx (nodes) and how they are connected (connections). You can add additional connections as your knowledge about your Datasets grows.
Edit Schema | Lets you edit the data type and unit for each column in the Dataset.
Analyses | Shows the Analyses that use this Dataset.
Dataset Settings | Lets you update the name, description, ELN entry, and tags for the Dataset.
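Lineage can be thought of as a directed graph: Datasets are the nodes, and each connection points from a source Dataset to a Dataset derived from it. The Python sketch below models this with a dictionary and uses a breadth-first search to list everything a Dataset was derived from. The Dataset names and the `add_connection` and `upstream` helpers are hypothetical and do not reflect Sphinx's API; the sketch only illustrates how adding a connection extends what the lineage can tell you.

```python
from collections import deque

# Hypothetical lineage: each edge points from a source Dataset
# to a Dataset derived from it.
LINEAGE = {
    "plate_reads_run1": ["all_plate_reads"],
    "plate_reads_run2": ["all_plate_reads"],
    "all_plate_reads": ["reads_with_plate_map"],
    "plate_map": ["reads_with_plate_map"],
}

def add_connection(graph, source, target):
    """Record that `target` is derived from `source` (ignores duplicates)."""
    graph.setdefault(source, [])
    if target not in graph[source]:
        graph[source].append(target)

def upstream(graph, dataset):
    """Return every Dataset that `dataset` derives from, directly or indirectly."""
    # Invert the edges so we can walk from a Dataset to its sources.
    parents = {}
    for src, targets in graph.items():
        for t in targets:
            parents.setdefault(t, []).append(src)
    seen, queue = set(), deque([dataset])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

print(sorted(upstream(LINEAGE, "reads_with_plate_map")))
# ['all_plate_reads', 'plate_map', 'plate_reads_run1', 'plate_reads_run2']

# A newly known relationship extends the lineage.
add_connection(LINEAGE, "reads_with_plate_map", "hit_list")
print(len(upstream(LINEAGE, "hit_list")))  # 5
```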