Introduction to the classification
The Genotype Ontology (GENO, https://github.com/monarch-initiative/GENO-ontology/), implemented in OWL2, provides a graph-based model for representing the genetic variations described in genotypes. The ontology allows a thorough description of genotype components, relationships and characteristics (e.g., genetic variant features) found in humans and model organisms, enabling the aggregation and analysis of genotype-to-phenotype (G2P) data from different sources.
The GENO ontology is developed within the scope of the Monarch initiative (https://monarchinitiative.org), whose goal is to develop tools that benefit precision medicine, disease modeling and the exploration of genotype-environment-phenotype interactions.
The ontologies developed under the umbrella of the Monarch initiative, such as the Sequence Ontology (SO) or the Human Phenotype Ontology (HPO), maintain a high degree of interoperability with GENO. In particular, GENO reuses several terms and relationships from SO and extends its section on variants; used together, the two ontologies allow richer descriptions of genetic variation data.
Information for use in data science
At its core, the GENO ontology relies on a representation of the genotype reduced to its essential components. A complex genotype can therefore be summarized as a combination of its genetic variations, from the full genotype down to the specific alleles and sequence alterations, as shown in Figure 1.
Figure 1. Genotype decomposition in GENO. The genotype can be represented at the highest level (total genomic changes across the entire genome) down to its subcomponents such as multi-loci variations, single-locus variations and single alleles (source: https://github.com/monarch-initiative/GENO-ontology).
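The decomposition in Figure 1 can be pictured as a simple tree of parts. The following is a minimal Python sketch, not GENO's actual OWL model: the class GenotypeComponent, its fields and the example labels are hypothetical names introduced only to illustrate how a genotype breaks down into multi-locus variations, single-locus variations and alleles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenotypeComponent:
    """A node in a simplified genotype 'parts' hierarchy (illustrative only)."""
    label: str
    level: str  # e.g. "genotype", "multi-locus variation", "allele"
    parts: List["GenotypeComponent"] = field(default_factory=list)

    def leaves(self) -> List["GenotypeComponent"]:
        """Return the lowest-level components reachable from this node."""
        if not self.parts:
            return [self]
        result: List["GenotypeComponent"] = []
        for part in self.parts:
            result.extend(part.leaves())
        return result


# Hypothetical example: two loci, one heterozygous and one homozygous.
genotype = GenotypeComponent(
    "example genotype", "genotype",
    [GenotypeComponent("variation across loci A and B", "multi-locus variation", [
        GenotypeComponent("locus A", "single-locus variation", [
            GenotypeComponent("A variant allele", "allele"),
            GenotypeComponent("A reference allele", "allele"),
        ]),
        GenotypeComponent("locus B", "single-locus variation", [
            GenotypeComponent("B variant allele", "allele"),
            GenotypeComponent("B variant allele", "allele"),
        ]),
    ])],
)

# Walking down the tree summarizes the genotype by its individual alleles.
allele_labels = [leaf.label for leaf in genotype.leaves()]
print(allele_labels)
```

In GENO itself these levels are OWL classes connected by object properties; the nested list here only mirrors the top-down reading of the figure.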
The concept of genotype in GENO is divided into three categories:
Intrinsic genotypes: variation within the sequence.
Extrinsic genotypes: variation in gene expression.
Effective genotypes: the sum of intrinsic and extrinsic genotypes, i.e. the total variation in gene sequence or expression.
This broad definition of genotype makes it possible to annotate both the classic sequence-level variations that lead to a certain phenotype and the expression-level variations caused by external agents such as RNA interference constructs.
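The relationship between the three categories can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than GENO's formal definition: the class EffectiveGenotype, its field names and the example variation labels are hypothetical, and variations are reduced to plain strings.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EffectiveGenotype:
    """Illustrative pairing of intrinsic and extrinsic genotype components."""
    intrinsic: FrozenSet[str]  # sequence-level variations
    extrinsic: FrozenSet[str]  # expression-level variations (e.g. RNAi knockdown)

    def all_variations(self) -> FrozenSet[str]:
        """The effective genotype is the union of both kinds of variation."""
        return self.intrinsic | self.extrinsic


# Hypothetical labels, used only for illustration.
effective = EffectiveGenotype(
    intrinsic=frozenset({"gene X sequence variant"}),
    extrinsic=frozenset({"gene Y knockdown by RNAi construct"}),
)

print(sorted(effective.all_variations()))
```

Keeping the two sets separate preserves the origin of each variation, while the union gives the total variation that a phenotype annotation would refer to.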
In addition to genotypes, GENO borrows from other Monarch ontologies to allow the description of the experimental tools and reagents used to generate genotype data. By using the GENO ontology, research projects are able to annotate genotype information in a structured way, enabling the integration of G2P data across diverse systems and laying the logical foundation for analysis and inference between phenotypes and variants.
Implementation in RDF for SPHN
GENO is made available as-is by the SPHN DCC.
The namespace used is:
A version IRI is provided for each version of GENO in RDF. It indicates the version (or release) of GENO, for example that the ontology is from the 2022-08-10 release of GENO.
In GENO, a concept is defined with the following structure:
GENO:0000054 a owl:Class ;
    rdfs:label "homo sapiens gene" ;
    IAO:0000115 "A gene that originates from the genome of a homo sapiens." ;
    rdfs:subClassOf [ a owl:Restriction ;
            owl:onProperty RO:0002162 ;
            owl:someValuesFrom NCBITaxon:9606 ],
        SO:0000704 ;
    owl:equivalentClass [ a owl:Class ;
            owl:intersectionOf ( SO:0000704
                    [ a owl:Restriction ;
                        owl:onProperty RO:0002162 ;
                        owl:someValuesFrom NCBITaxon:9606 ] ) ] .
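The prefixed names in the concept above (GENO:0000054, SO:0000704, RO:0002162, NCBITaxon:9606) are shorthand for full IRIs. The Python sketch below expands them using the OBO Foundry PURL convention (base IRI, prefix, underscore, local identifier). This is an assumption made for illustration: the prefixes actually declared in the RDF file provided for SPHN may resolve to a different namespace, and the helper expand_curie is a hypothetical name, not part of any library. It does not apply to the owl: and rdfs: prefixes, which follow their own W3C namespaces.

```python
# Assumption: OBO Foundry PURL convention; the SPHN file may declare other namespaces.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def expand_curie(curie: str, base: str = OBO_BASE) -> str:
    """Expand an OBO-style prefixed name such as 'GENO:0000054' to a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed name: {curie!r}")
    return f"{base}{prefix}_{local_id}"


# Identifiers taken from the concept definition above.
for curie in ("GENO:0000054", "SO:0000704", "RO:0002162", "NCBITaxon:9606"):
    print(curie, "->", expand_curie(curie))
```

Read this way, the definition states that GENO:0000054 is a subclass of SO:0000704 restricted through the property RO:0002162 to the taxon NCBITaxon:9606.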
Availability and usage rights
The GENO RDF file is available via the Terminology Service.
GENO is published by the Monarch initiative as an open-source ontology, implemented in OWL2 and released under a Creative Commons Attribution 4.0 (CC BY 4.0) license (https://github.com/monarch-initiative/GENO-ontology/).