HGNC

Introduction to the nomenclature

The Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) is the global authority that assigns standardized nomenclature to human genes. All approved symbols are complemented with additional information, such as gene groups and genomic, proteomic and phenotypic data, and are stored in a curated online repository (http://genenames.org). The HGNC database is updated on a monthly basis.

The nomenclature provides a symbol and a name for protein-coding genes, pseudogenes that retain significant homology to a functional ancestor, non-coding RNA genes and functional read-through transcripts that have been previously annotated. Importantly, the following cases are not covered by the HGNC nomenclature:

  1. Sequence-variant nomenclature: This is the responsibility of the Human Genome Variation Society (HGVS).

  2. Products of gene translocations or fusions: No official naming guidelines currently exist. HGNC has issued a recommendation to use the SYMBOL1/SYMBOL2 format.

  3. Protein nomenclature: HGNC does not name proteins, although the International Protein Nomenclature Guidelines were written with the involvement of the HGNC.

  4. Nomenclature for regulatory elements, such as promoters, enhancers and transcription-factor binding sites.

  5. Nomenclature for human loci associated with clinical phenotypes and complex traits: The naming of these loci is covered by Online Mendelian Inheritance in Man (OMIM).

Gene naming guideline

HGNC follows a strict set of rules to define gene symbols and gene names. The key factors are summarized in the table below:

HGNC Guidelines

Table 1. Key factors in assigning gene names provided by HGNC (source: Bruford, E.A. et al. Nat Genet 52, 754–758 (2020)).

Information for use in data science

Each gene in the HGNC database is assigned a unique identifier, the HGNC ID, which is linked to the gene sequence. As a consequence, the HGNC ID remains constant even if the nomenclature changes. Although HGNC gene symbols are intended to be stable, they can be modified in exceptional circumstances. For this reason, the HGNC ID is used to unambiguously identify the gene of interest.

Alternate gene symbols (or aliases), locus types (genetic class according to HGNC classification) and chromosomal locations can be accessed by referring to a gene using its identifier.
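The practical consequence for data pipelines can be sketched as follows: store records under the HGNC ID and treat symbols and aliases only as lookup keys. This is a minimal illustration, not part of the SPHN tooling. The `GeneRecord` class, the `RECORDS` table and both helper functions are hypothetical names, and the single record uses the PCSK9 values (HGNC:20001, aliases FH3 and NARC-1, location 1p32.3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical container for a few HGNC fields."""
    hgnc_id: str          # stable key, e.g. "HGNC:20001"
    symbol: str           # approved symbol, may change in rare cases
    aliases: tuple = ()   # alternate symbols from the literature
    location: str = ""    # cytogenetic location


# Toy table keyed by HGNC ID (illustrative data for PCSK9 only).
RECORDS = {
    "HGNC:20001": GeneRecord("HGNC:20001", "PCSK9", ("FH3", "NARC-1"), "1p32.3"),
}


def build_symbol_index(records):
    """Map every approved symbol and alias (upper-cased) to a set of HGNC IDs."""
    index = {}
    for rec in records.values():
        for name in (rec.symbol, *rec.aliases):
            index.setdefault(name.upper(), set()).add(rec.hgnc_id)
    return index


def resolve(name, index):
    """Return the sorted HGNC IDs matching a symbol or alias (empty if unknown)."""
    return sorted(index.get(name.upper(), ()))


index = build_symbol_index(RECORDS)
for hgnc_id in resolve("narc-1", index):
    rec = RECORDS[hgnc_id]
    print(hgnc_id, rec.symbol, rec.location)
```

The index maps to a set of IDs rather than a single ID because an alias is not guaranteed to be unique across genes; downstream code should handle the case where more than one HGNC ID is returned.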

Implementation in RDF for SPHN

HGNC in RDF is prepared by the SPHN DCC as follows:

  • download HGNC data from HGNC

  • translate the data to RDF

The namespace used is: https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/

Each version of HGNC in RDF carries a version IRI that indicates the corresponding HGNC version (or release). For example, https://biomedit.ch/rdf/sphn-resource/hgnc/20221107 indicates that the original data was downloaded on 2022-11-07.
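As a small sketch of how the download date could be recovered from a version IRI, the snippet below assumes that the last path segment is always an eight-digit YYYYMMDD stamp, as in the example above. The function names `version_date` and `is_newer` are hypothetical helpers, not part of any SPHN library.

```python
from datetime import date, datetime


def version_date(version_iri: str) -> date:
    """Extract the download date from the last path segment of a version IRI.

    Assumes the segment is an eight-digit YYYYMMDD stamp.
    """
    stamp = version_iri.rstrip("/").rsplit("/", 1)[-1]
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError(f"no YYYYMMDD stamp at the end of {version_iri!r}")
    return datetime.strptime(stamp, "%Y%m%d").date()


def is_newer(iri_a: str, iri_b: str) -> bool:
    """Return True if the release behind iri_a was downloaded after that of iri_b."""
    return version_date(iri_a) > version_date(iri_b)


print(version_date("https://biomedit.ch/rdf/sphn-resource/hgnc/20221107"))
```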

For the RDF implementation of the HGNC nomenclature, the following fields have been made available (Table 2):

HGNC ID

Unique HGNC internal identifier.

Approved symbol

Unique symbol approved by HGNC and based on the HGNC nomenclature guidelines.

Approved name

Official gene name approved by HGNC in accordance with the HGNC nomenclature guidelines.

Status

Status of an HGNC record.

Locus type

Description of the type of locus associated with the record.

Locus group

Group labels that gather similar locus types.

Alias symbols

When available, list of additional symbols, based on literature, used to refer to the gene.

Alias names

When available, list of additional names, based on literature, used to refer to the gene.

Accession numbers

When available, curated list of accession numbers used to link to external databases (e.g., GenBank).

Chromosome

Chromosomal location where the gene lies using cytogenetic coordinates.

Enzyme ID

When available, list of identifiers given by the Enzyme Commission to gene products having enzymatic properties.

NCBI Gene ID

When available, curated identifier enabling the link with NCBI gene entries.

Ensembl gene ID

When available, curated identifier enabling the link with Ensembl gene entries.

Mouse genome database ID

When available, curated list of identifiers enabling the link with the Mouse Genome Informatics (MGI) database gene entries.

PubMed ID

When available, list of identifiers pointing to published articles relevant to the gene entry and indexed in PubMed.

RefSeq ID

When available, curated identifier enabling the link with NCBI’s Reference Sequence (RefSeq) collection. Only one selected RefSeq is displayed per gene report.

Gene group ID

List of internal HGNC identifiers used to refer to a specific gene group.

Gene group name

Gene group names associated with a gene group identifier.

CCDS ID

When available, list of identifiers enabling the link with the Consensus CDS (CCDS) project database. This links the gene entry to high-quality annotated coding regions.

Vega ID

When available, curated identifier enabling the link with VEGA entries.

Table 2. List of HGNC fields available in the current SPHN RDF implementation.

In the RDF version of HGNC, a gene is represented with the following structure:

hgnc:20001 a owl:Class ;
  rdfs:label "PCSK9" ;
  RO:0002162 NCBITaxon:9606 ;
  RO:0002350 hgnc.genegroup:973 ;
  RO:0002525 "1p32.3" ;
  dc:description "proprotein convertase subtilisin/kexin type 9" ;
  oboInOwl:Synonym "FH3",
      "NARC-1" ;
  oboInOwl:hasDbXref ensembl:ENSG00000169174,
      MGI:2140260,
      NCBIGene:255738,
      genbank:AX207686,
      refseq:NM_174936,
      vega:OTTHUMG00000008136,
      ccds:CCDS603 ;
  rdfs:comment "Approved" .
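The cross-references in the record above are compact identifiers of the form prefix:local-id. As an illustrative sketch (plain Python, no RDF library), they can be grouped by source database once they have been read out of the record. The `XREFS` list simply copies the values from the example, and `group_xrefs` is a hypothetical helper name.

```python
from collections import defaultdict

# Cross-references copied from the PCSK9 example record.
XREFS = [
    "ensembl:ENSG00000169174",
    "MGI:2140260",
    "NCBIGene:255738",
    "genbank:AX207686",
    "refseq:NM_174936",
    "vega:OTTHUMG00000008136",
    "ccds:CCDS603",
]


def group_xrefs(curies):
    """Group 'prefix:local-id' strings by lower-cased prefix."""
    grouped = defaultdict(list)
    for curie in curies:
        prefix, sep, local_id = curie.partition(":")
        if not sep or not prefix or not local_id:
            raise ValueError(f"not a prefix:local-id string: {curie!r}")
        grouped[prefix.lower()].append(local_id)
    return dict(grouped)


for database, ids in sorted(group_xrefs(XREFS).items()):
    print(database, ids)
```

Prefixes are lower-cased here only to make lookups case-insensitive; the original casing (for example MGI or NCBIGene) should be kept if the identifiers are written back to RDF.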

Availability and usage rights

The HGNC RDF file is available via the Terminology Service.

HGNC is jointly published by the US National Human Genome Research Institute (NHGRI) and Wellcome (UK).

Copyright follows the terms set by EMBL-EBI, Wellcome Genome Campus, Hinxton, CB10 1SD. For more details on usage rights, please consult the EMBL-EBI terms of use (https://www.ebi.ac.uk/about/terms-of-use).