Updating the Data Model
Schema Validation and Submission
Data administrators can configure custom data submission validations for Song by creating and submitting Song schemas. These schemas act as blueprints for validating submissions, ensuring that every piece of data adheres to the requirements specified by the administrators. This validation process guarantees that all essential fields are included and that the data within these fields conforms to the designated data types or permitted value sets.
Integration with Song's Base Schema
Song merges all admin-defined schemas with its pre-existing base schema. Therefore, when creating your schemas, it is important to reference the base schema to avoid specifying conflicting properties.
The Song base schema can be restrictive for data models outside of cancer research contexts, as it requires tumor and normal samples. We are aware of this limitation and are currently working on a new data-agnostic submission system. For more information, check out our software currently under development section
Building Schemas
Schema Basics
The following schema defines any data submission using the analysisType of exampleSchema must contain two fields, field1 (a string) and field2 (a number):
{
"name": "exampleSchema",
"schema": {
"type": "object",
"required": ["field1", "field2"],
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "number"
}
}
}
}
There are many different type values available in JSON schema, here is a list of commonly used JSON type values definitions:
stringTextual data e.g., "a word"numberNumeric data (integer or float), e.g.-5,10,-5.8,10.2integerInteger values (16,0,-20)booleanBoolean values (trueorfalse)objectKey-value pairs where keys are strings and values can be any typearrayOrdered lists of items, which can contain any data typeenumA fixed set of values.nullRepresents a null value
JSON Schema can also include various additional constraints:
-
Regex Patterns: Fields can use regex patterns to enforce specific formatting rules
"field1": {
"type": "string",
"pattern": "^[A-Za-z]+$"
} -
Required Fields: Defines which fields must be present in the data object
"required": ["field1", "field2"] -
Array Constraints: Allows setting minimum (minItems) and maximum (maxItems) array lengths
"field3": {
"type": "array",
"minItems": 1,
"maxItems": 5
} -
Conditional Logic (if-then): Logic to enforce required fields based on conditions.
"if": {
"properties": { "field4": { "const": "value1" } }
},
"then": {
"required": ["field5"]
}
Basic Example
In the context of Song here is a basic schema, it requires at a minimum, an analysisType defined by the name field and a single object within it. In the example below, this object is a field termed experiment:
{
"name": "basicSchemaExample",
"schema":{
"type": "object",
"required":[
"experiment"
],
"properties":{
"experiment":{}
}
}
}
Click here for a detailed breakdown
-
nameis the name of the schema, which identifies the schema for validation purposes -
schemacontains the schema definition -
typeis the data type of the schema, in this case, an object -
requiredis a list of fields that must be included in any data submission validated against this schema -
propertiesare the fields that the schema expects. In this example, the schema expects anexperimentfield.
The analysisType is defined in data submissions to Song. This field informs Song which data model the submission should be validated against. Below is an example of a mock data submission:
{
"studyId": "MICR-CA",
"analysisType": {
"name": "basicSchemaExample"
},
"experiment": "myNewExperiment"
}
-
In this example, the schema named
basicSchemaExampleis used to validate the data submission -
The
analysisTypefield specifies that the submission should adhere to thebasicSchemaExamplemodel, ensuring that all required fields, such asexperiment, are present and correctly formatted -
The field
studyIdcomes from and is required by Song's base schema and is used to identify what group this collection of data belongs to
Detailed Examples
Let's break down some more complex schema examples. We will pull from a reference schema that can be found here. In the following sections, we will provide snippets of this schema along with explanations of the structure, function, and any embedded logic.
-
Required Fields: Here, our Schema dictates that
"donor","specimen","workflow", and"experiment"are required fields:{
"name": "quickStartSchema",
"schema": {
"type": "object",
"required": ["donor", "specimen", "workflow", "experiment"],
"properties": { ... -
Enum, Types, and Patterns: Within
workflow, we can see the use ofpropertyNames,enum,required fields,typesandregex patterns"workflow": {
"propertyNames": {
"enum": ["workflowName", "workflowShortName", "workflowVersion", "genomeBuild", "inputs","sessionId","runId"]
},
"required": ["workflowName", "genomeBuild", "inputs"],
"type": "object",
"properties": {
"workflowName": {
"type": "string",
"pattern": "^[a-zA-Z][a-zA-Z0-9 _\\-]+[a-zA-Z0-9]+$"
},
"workflowShortName": {
"type": "string",
"pattern": "^[a-zA-Z][a-zA-Z0-9_\\-]+[a-zA-Z0-9]+$"
},
"workflowVersion": {
"type": "string"
},
"genomeBuild": {
"type": "string",
"enum": ["GRCh37", "GRCh38_hla_decoy_ebv", "GRCh38_Verily_v1"]
},Click here for a detailed breakdown
-
workflowDefines an object containing properties related to workflow. -
propertyNamesLimits the allowed property names within workflow to those listed -
requiredspecifies thatworkflowName,genomeBuild, andinputsmust be present within the workflow object -
typeindicates that workflow is an object type and therefore contains nested key value pairs -
workflowNameRequires a string ("type": "string") that matches the specified regex pattern ("pattern": ^[a-zA-Z][a-zA-Z0-9 _\\-]+[a-zA-Z0-9]+$). This ensures it starts with a letter, allows alphanumeric characters, spaces, underscores, and hyphens, and ends with alphanumeric characters. -
genomeBuildrequires a string ("type": "string") that can only be one of the specified values ("enum": ["GRCh37", "GRCh38_hla_decoy_ebv", "GRCh38_Verily_v1"]).
-
-
minItems & maxItems: The JSON Schema below is a simplified workflow property made to show the usage of
minItemsandmaxItems"workflow": {
"propertyNames": {
"enum": ["inputs"]
},
"required": ["inputs"],
"type": "object",
"properties": {
"inputs": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tumourAnalysisId": {
"type": "string",
"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{13}$"
},
"normalAnalysisId": {
"type": "string",
"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{13}$"
},
"analysisType": {
"type": "string"
}
}
},
"minItems": 1,
"maxItems": 2
},
}Click here for a detailed breakdown
-
"minItems": 1,If you submit data according to this schema then you must include at least one complete set of inputs (A complete set consists ofanalysisType,normalAnalysisId, andtumourAnalysisId). -
"maxItems": 2You can include up to two complete sets of inputs in a single submission.
The
minItemsandmaxItemsconstraints apply to the number of these sets (or arrays) within the inputs array, not to the individual fields within each set. -
-
Conditional Logic: The Schema segment below demonstrates the usage of conditional if, and then logic used to determine if the fields
causeOfDeathandsurvivalTimeare required"vitalStatus": {
"type": "string",
"enum": ["Alive", "Deceased"]
},
"if": {
"properties": {
"vitalStatus": {
"const": "Deceased"
}
}
},
"then": {
"required": ["causeOfDeath", "survivalTime"]
}
},Click here for a detailed breakdown
This conditional schema structure allows for dynamic validation based on the value of
vitalStatus, ensuring thatcauseOfDeathandsurvivalTimeare only required whenvitalStatusisDeceased.-
If
vitalStatusis"Deceased", then the submission must includecauseOfDeathandsurvivalTime. -
If
vitalStatusis"Alive", then there are no additional requirements needed. -
constis a validation keyword that specifies that a property's value must exactly match for the submission to be considered valid
-
-
Null Values: Null values can provide flexibility by allowing a property to be explicitly null when no valid string value is applicable or known. The schema segment below shows the use of a
nullenum value for arelapseTypeproperty."relapseType": {
"type": ["string", "null"],
"enum": [
"Distant recurrence/metastasis",
"Local recurrence",
"Local recurrence and distant metastasis",
"Progression (liquid tumours)",
null
]
Click here for a detailed breakdown
If relapseType is a string, it must match exactly one of the values listed in the enum array ("Distant recurrence/metastasis", "Local recurrence", "Local recurrence and distant metastasis", "Progression (liquid tumours)")
If relapseType is null
{
"relapseType": null
}
The above key value pair is considered valid according to the schema. This allows for scenarios where relapseType might not have a defined value or where its value is intentionally absent or unknown.
-
Minimum & Maximum: Minimum and maximum keywords in JSON Schema provide straightforward ways to enforce numerical constraints, ensuring that data adheres to specified ranges or limits
"treatmentDuration": {
"type": "integer",
"minimum": 0
}Here,
"minimum": 0ensures thattreatmentDurationcan only accept non-negative integer valuesWant to learn more?If you want to learn more about JSON schema take a look at the following JSON Schema guide.
Updating the Schema
You can update Song schemas using the Song server's Swagger UI or using curl commands.
Using the Swagger UI
The Song Swagger UI provides a user-friendly interface to interact with Song's API endpoints. You can access it at the Song server URL appended with /swagger-ui.html. For the quickstart, this will be http://localhost:8080/swagger-ui.html.
To update the schema using the Swagger UI:
-
Locate the Schema API endpoints From the schema dropdown, find the
POSTRegisterAnalysisType endpoint.
-
Select Try it out and input your API key and Schema: enter your authorization token in the authorization field (
Bearer {API-Key}), and place your new schema inside the request field.API Keys are brokered by Keycloak and accessible when logged in to the Stage UI. For the Overture QuickStart, Stage can access from
localhost:3000-
Login through the Stage UI by selecting login from the top right. Default credentials when using the Overture QuickStart will be username
adminand passwordadmin123. -
Generate a new API token by selecting Profile and Token from your user drop down found on the top right of the Stage UI, select Generate New Token.

-
-
Select Execute: expected responses as well as response codes and descriptions, are conveniently documented within Swagger-UI.
Verifying SchemasTo verify your schema has successfully been added, you can use the
GETListAnalysisTypes endpoint found under the Schema dropdown. If updating a pre-existing schema, use theGETGetAnalysisTypeVersion endpoint.
Using the Curl command
The following curl command makes a POST request with the required authorization tokens, headers and data:
curl -X POST "https://localhost:8080/schemas" -H "accept: */*" -H "Authorization: Bearer {Insert-API-Key}" -H "Content-Type: application/json" -d "{ \"name
\":\"example_demo\", \"schema\": { \"type\":\"object\", \"required\":[ \"experiment\" ], \"properties\":{ \"experiment\": { \"type\": \"object\", \"required\": [\"experiment_type\"], \"propertyNames\": { \"experiment_type\":{ \"type\":\"string\", \"enum\": [\"WGS\",\"RNA-Seq\"] }, } } } }}"
-
-X POST "https://localhost:8080/schemas"specifies the request method to be used, in this case, POST, this points to Song'sschemasendpoint -
-H "accept: */*"adds an HTTP header specifying that the client accepts any type of response -
-H "Authorization: Bearer {Insert-API-Key}"adds an HTTP header for authorization, with a Bearer token (replace{Insert-API-Key}with the actual API key). -
-H "Content-Type: application/json"Adds an HTTP header specifying the content type of the request body as JSON. -
-d {...}is the data to be sent with the POST request. This is the JSON payload defining the schema.
Useful Links
Here are some resources to help with the creation of new schemas for your projects:
-
Understanding JSON Schema guide: This comprehensive guide provides detailed information on JSON Schema formatting.
-
Example schema: For a practical reference, you can examine this sample schema used in CanCOGeN's VirusSeq Portal.
-
Base schema reference: Song incorporates a base schema that combines with all user schemas. When developing your schemas, it's crucial to reference this base schema to avoid conflicting properties and ensure compatibility with Song's base schema structure.
There's no need to write your JSON Schema manually. Several existing tools can help you format your data efficiently:
- For basic schemas, JSONschema.net or Liquid Technologies' Online JSON to Schema Converter are excellent resources. These tools allow you to easily convert JSON to JSON Schema.
With your data model updated, we next need to ensure we configure an accurate index mapping to help enable our downstream search and portal UI components