Syntenic Blocks in Ancestral Species

The Genomicus browser displays, when possible, the predicted order of genes in ancestral species. The method used to predict this order is described briefly here and in more detail in a poster presented at the Cold Spring Harbor meeting on Genome Informatics in October 2009. A full description is given in a manuscript in preparation.

  1. A pairwise comparison between all available species is performed to identify pairwise syntenic blocks. Two consecutive genes A1 and B1 in species 1 belong to a syntenic block with their respective orthologs A2 and B2 in species 2 if A2 and B2 are also consecutive and in the same respective orientation as A1 and B1. This definition is applied strictly for any number of consecutive genes.
  2. All pairwise syntenic blocks are compared, and when two such blocks overlap without any inconsistencies, they are merged into a larger block.
  3. Merged blocks represent the ancestral gene order in the common ancestor of those extant species that contributed pairwise syntenic blocks.
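
Steps 1 and 2 above can be illustrated with a minimal Python sketch. This is not the Genomicus implementation. It assumes one chromosome per species, one-to-one orthologs that share a name, and genes written as (name, strand) tuples with strand +1 or -1. The merge is deliberately simplified: two blocks are joined only when the end of one matches the start of the other, and the only inconsistency checked is a repeated gene. All function names are hypothetical.

```python
def adjacency(left, right):
    """Canonical form of two consecutive signed genes.

    Reading the same pair from the opposite strand reverses the order and
    flips both strands, so both readings map to the same key.
    """
    forward = (left, right)
    reverse = ((right[0], -right[1]), (left[0], -left[1]))
    return min(forward, reverse)


def adjacencies(chromosome):
    """Set of canonical adjacencies along one chromosome."""
    return {adjacency(a, b) for a, b in zip(chromosome, chromosome[1:])}


def pairwise_blocks(chrom1, chrom2):
    """Maximal runs of genes in chrom1 whose adjacencies also occur in chrom2."""
    if not chrom1:
        return []
    shared = adjacencies(chrom2)
    blocks, current = [], [chrom1[0]]
    for prev, gene in zip(chrom1, chrom1[1:]):
        if adjacency(prev, gene) in shared:
            current.append(gene)
        else:
            if len(current) > 1:
                blocks.append(current)
            current = [gene]
    if len(current) > 1:
        blocks.append(current)
    return blocks


def flip(block):
    """The same block read from the opposite strand."""
    return [(name, -strand) for name, strand in reversed(block)]


def merge_pair(x, y):
    """Join two blocks if the end of one equals the start of the other.

    Returns the merged block, or None when no consistent overlap is found.
    """
    for candidate in (y, flip(y)):
        for left, right in ((x, candidate), (candidate, x)):
            for k in range(min(len(left), len(right)), 0, -1):
                if left[-k:] == right[:k]:
                    merged = left + right[k:]
                    names = [name for name, _ in merged]
                    if len(set(names)) == len(names):  # no gene appears twice
                        return merged
    return None


species1 = [("A", 1), ("B", 1), ("C", -1), ("D", 1)]
species2 = [("D", 1), ("C", 1), ("B", -1), ("A", -1)]  # A-B-C inverted as a whole
blocks = pairwise_blocks(species1, species2)
```

In this toy example, A, B and C form one block because their order and relative orientation are preserved in species 2 (read from the opposite strand), while D is excluded because its orientation relative to C differs.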

Because the definition of pairwise syntenic blocks is very strict, it is assumed that this order accurately reflects the order and orientation of genes in the last common ancestor. Merging pairwise syntenic blocks compensates for gene losses or duplications in terminal branches of the tree, which disrupt the strict definition above.

Conserved Non-coding Elements (CNEs)

CNEs were computed from multiple alignments between 46 vertebrate genomes projected on the human genome, generated using multiz and other tools by the UCSC and Penn State Bioinformatics groups, and made available on the UCSC web site.

The current algorithm scans the alignment for conserved regions of a minimal length (10 bp) and identity (90%), then extends them by accepting up to 3 non-conserved columns (less than 88% identity) on each side. The algorithm does not require a fixed set of key species in the alignment, but instead a minimum of eight species. The displayed CNEs are filtered on a minimal size (20 bp) and according to four levels of conservation: Boreoeutheria (using human, mouse, dog and cow), Mammals (using the previous set and opossum), Amniotes (using the previous Mammals set and chicken) and Vertebrates (using the previous Amniotes set and 4 fish). The consensus sequence and the conservation displayed are computed on the 46 species.
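
The seed-and-extend scan can be sketched in Python as follows. This is one possible reading of the description, not the actual Genomicus code. It assumes that identity is measured per alignment column against the first (reference) row, that a seed is a run of at least 10 consecutive columns at or above 90% identity, and that extension on each side stops once more than 3 columns below 88% have been crossed, ending on the last column at or above 88%. The minimum-species filter and the final 20 bp size filter are omitted, and all function names are hypothetical.

```python
def column_identities(alignment):
    """Per-column fraction of non-reference rows matching the reference row.

    `alignment` is a list of equal-length strings; row 0 is the reference.
    Gaps ("-") never count as matches.
    """
    reference, others = alignment[0], alignment[1:]
    scores = []
    for i, base in enumerate(reference):
        matches = sum(1 for row in others if row[i] == base and base != "-")
        scores.append(matches / len(others) if others else 0.0)
    return scores


def find_seeds(scores, min_len=10, min_identity=0.90):
    """Half-open intervals [start, end) of long runs of conserved columns."""
    seeds, start = [], None
    for i, score in enumerate(scores + [0.0]):  # sentinel closes a final run
        if score >= min_identity:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                seeds.append((start, i))
            start = None
    return seeds


def extend(scores, start, end, max_bad=3, bad_below=0.88):
    """Grow the seed [start, end) outward on both sides.

    On each side, up to `max_bad` columns scoring below `bad_below` are
    tolerated; the boundary is set at the outermost accepted good column.
    """
    def walk(position, step):
        bad, best = 0, position
        nxt = position + step
        while 0 <= nxt < len(scores):
            if scores[nxt] < bad_below:
                bad += 1
                if bad > max_bad:
                    break
            else:
                best = nxt
            nxt += step
        return best

    return walk(start, -1), walk(end - 1, +1) + 1


# Toy identity profile: a 10-column seed flanked by mixed columns.
profile = [0.5, 1.0, 0.5] + [1.0] * 10 + [0.5, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0]
seeds = find_seeds(profile)
elements = [extend(profile, s, e) for s, e in seeds]
```

With this profile, the seed covers columns 3 to 12. The extension absorbs one isolated conserved column on each side and stops on the right before a fourth weak column would be crossed.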

The current CNE set was generated with four levels of conservation:

Set     Species                                                              Color
Set 1   human + mouse + dog + cow                                            green
Set 2   human + mouse + dog + cow + opossum                                  orange
Set 3   human + mouse + dog + cow + opossum + chicken                        red
Set 4   human + mouse + dog + cow + opossum + chicken + at least one fish    blue
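
The four sets are nested, so a CNE can be assigned to the deepest set whose species requirements it meets. The Python sketch below illustrates this under stated assumptions: species are plain lowercase strings, the fish species are passed in by the caller because they are not named here, and the function name is hypothetical.

```python
BOREOEUTHERIA = {"human", "mouse", "dog", "cow"}


def conservation_set(species_present, fish_species):
    """Return (set name, color) for the deepest level satisfied, or None.

    `species_present` lists the species in which the element is conserved;
    `fish_species` lists the fish genomes in the alignment (caller-supplied).
    """
    present = set(species_present)
    if not BOREOEUTHERIA <= present:
        return None  # does not even reach Set 1
    if "opossum" not in present:
        return ("Set 1", "green")
    if "chicken" not in present:
        return ("Set 2", "orange")
    if not present & set(fish_species):
        return ("Set 3", "red")
    return ("Set 4", "blue")


# Placeholder fish names, for illustration only.
fish = {"fish_a", "fish_b"}
example = conservation_set(
    ["human", "mouse", "dog", "cow", "opossum", "chicken", "fish_b"], fish
)
```

Because the levels are checked from shallowest to deepest, an element conserved in chicken but not in opossum stays in Set 1.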

CNEs are excluded from regions overlapping protein coding sequences in all of the species considered. By convention, intronic CNEs are displayed on the right-hand side of the gene in which they are included (regardless of the transcription orientation of the gene).