FB2006_01 Release Notes
THE FLYBASE 2006_01 UPDATE
This release is a full update of FlyBase data. It incorporates additions and changes to several classes of data, including the bibliography, genetic and molecular information curated from the literature, fly stocks and images, Drosophila melanogaster annotated gene models, and reagent sets that have been mapped to the D. melanogaster genome sequence. In addition, this update provides the ability to view the genome sequences and gene prediction sets of 10 newly sequenced Drosophila species via the GBrowse genome browser. As consensus annotation sets become available, they will be fully incorporated into FlyBase.

Drosophila melanogaster Release 5.1 Annotation Update

Release 5.1 annotations are available in FlyBase and from NCBI. The Release 5 sequence from BDGP (see the BDGP release notes) contains major improvements to the assembly of the major chromosome arm scaffolds, as well as improvements to the assembly of those portions of the centric heterochromatin that cannot currently be attached to the major arms. The improvements to the arms include the major differences listed in Table 1 plus an additional 4.7 Mbp of heterochromatic sequence attached to the proximal ends of the arms.

Further, Release_5 is the first non-redundant assembly of the D. melanogaster genome, unifying the previously separate assemblies of the largely euchromatic arm scaffolds and the heterochromatic scaffolds. The entire Release_5 assembly can be downloaded from the BDGP web site. Some features of the assembly are enumerated in Table 1.

TABLE 1: Release_5 Assembly (from BDGP) and Release_5.1 Accessions

| Scaffold | Length (bp) | Gaps | Release 5.1 GenBank Accession | Major Difference Compared to Release 4 |
|---|---|---|---|---|
| ArmX | 22,422,827 | 3 | AE014298.4 | 8 kb added to the distal end, gaps filled in regions 1-11 |
| Arm2L | 23,011,544 | 2 | AE014134.5 | 591 kb added to the proximal end of the arm |
| Arm2R | 21,146,708 | 1 | AE013599.4 | 380 kb added to the proximal end |
| Arm3L | 24,543,557 | 1 | AE014296.4 | 16 kb added on distal end, 718 kb added to proximal end, other gaps filled |
| Arm3R | 27,905,053 | 0 | AE014297.2 | None |
| Arm4 | 1,351,857 | 1 | AE014135.3 | 70 kb added to the distal end |
| XHet | 204,112 | n.a. | n.a. | n.a. |
| YHet | 347,038 | n.a. | n.a. | n.a. |
| 2LHet | 368,872 | n.a. | n.a. | n.a. |
| 2RHet | 3,288,761 | n.a. | n.a. | n.a. |
| 3LHet | 2,555,491 | n.a. | n.a. | n.a. |
| 3RHet | 2,517,507 | n.a. | n.a. | n.a. |
| ArmU | 10,049,037 | n.a. | n.a. | n.a. |

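As a quick arithmetic check on Table 1, the scaffold lengths can be summed to give the combined size of the six major arms and of the whole assembly. The following is a minimal sketch; the dictionary and variable names are illustrative and not part of any FlyBase or BDGP tool, and the lengths are copied from Table 1.

```python
# Scaffold lengths in bp, copied from Table 1.
MAJOR_ARMS = {
    "ArmX": 22_422_827,
    "Arm2L": 23_011_544,
    "Arm2R": 21_146_708,
    "Arm3L": 24_543_557,
    "Arm3R": 27_905_053,
    "Arm4": 1_351_857,
}
HET_SCAFFOLDS = {
    "XHet": 204_112,
    "YHet": 347_038,
    "2LHet": 368_872,
    "2RHet": 3_288_761,
    "3LHet": 2_555_491,
    "3RHet": 2_517_507,
}
ARM_U = 10_049_037  # ArmU scaffold

major_total = sum(MAJOR_ARMS.values())
het_total = sum(HET_SCAFFOLDS.values())
assembly_total = major_total + het_total + ARM_U

print(f"Major arms:          {major_total:,} bp")
print(f"Het scaffolds:       {het_total:,} bp")
print(f"All listed scaffolds: {assembly_total:,} bp")
```

With the Table 1 values, the six major arms total 120,381,546 bp.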
Annotation Release_5.1 includes the full annotation of gene models for the six major chromosome arm scaffolds (X, 2L, 2R, 3L, 3R, 4). It merges the existing and revised FlyBase annotations of the euchromatic regions of the arms with the revised annotations of the heterochromatic segments produced by the Drosophila Heterochromatin Genome Project (DHGP). The remainder of the heterochromatin is currently being annotated by the DHGP and should be integrated into FlyBase as part of Annotation Release_5.2, which is scheduled for the first quarter of 2007. In the interim, we recommend that users interested in the heterochromatin annotations continue to refer to the DHGP Release_3.2b heterochromatin assembly/annotation set.

TABLE 2: Known Gaps in the Release 5 Assembly
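
| Scaffold | GenBank Accession | Gaps | Notes |
|---|---|---|---|
| ArmX | AE014298.4 | 111523..129522 | sized |
| ArmX | AE014298.4 | 21684450..21684549 | unsized |
| ArmX | AE014298.4 | 21687344..21759343 | sized |
| Arm2L | AE014134.5 | 21485539..21485638 | unsized |
| Arm2L | AE014134.5 | 22420242..22420341 | unsized |
| Arm2R | AE013599.4 | 16668213..16668312 | unsized |
| Arm3L | AE014296.4 | 5107767..5107866 | |
| Arm3R | AE014297.2 | None | |
| Arm4 | AE014135.3 | 1221289..1221388 | |
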
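The gap spans in Table 2 can be turned into lengths with a small helper. This is a sketch only: it assumes the coordinates are 1-based and inclusive at both ends (an assumption, though one consistent with the 100 bp spans listed as unsized), and `gap_length` and `GAPS` are hypothetical names introduced for illustration.

```python
def gap_length(span: str) -> int:
    """Return the length of a 'start..end' span, assuming inclusive coordinates."""
    start, end = (int(part) for part in span.split(".."))
    return end - start + 1

# Gap spans copied from Table 2; Arm3R has no known gaps.
GAPS = {
    "ArmX": ["111523..129522", "21684450..21684549", "21687344..21759343"],
    "Arm2L": ["21485539..21485638", "22420242..22420341"],
    "Arm2R": ["16668213..16668312"],
    "Arm3L": ["5107767..5107866"],
    "Arm3R": [],
    "Arm4": ["1221289..1221388"],
}

for scaffold, spans in GAPS.items():
    for span in spans:
        print(f"{scaffold}\t{span}\t{gap_length(span)} bp")

total_gaps = sum(len(spans) for spans in GAPS.values())
print(f"Total known gaps: {total_gaps}")
```

Under the inclusive-coordinate assumption, the two sized gaps on the X come out at 18,000 bp and 72,000 bp, and every other span is 100 bp long.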
Release 5 of the euchromatic sequence contains eight known gaps. Two gaps on the X have size estimates; the other six gaps in the genome are unsized. (Gaps of unknown size are denoted by 100 N's in the FASTA files.) The gap at 21485539..21485638 on scaffold Arm2L is the histone gene cluster, which reputedly contains ca. 100 copies of a ca. 5 kb repeat unit containing the His1, His2A, His2B, His3 and His4 genes.

Additional data missing from this annotation release include the approximately 6000 repeat regions with homology to transposable elements that were located primarily near the ends of the Release 4.3 arm sequences. For details on new, split and merged gene models, see the DROSOPHILA MELANOGASTER ANNOTATION STATISTICS (Release 5.1) section below.

There have been updates to the evidence tracks that inform the gene model annotation. These include alignments of cDNAs and proteins submitted to GenBank since the last update, as well as a number of new homology-based gene prediction tracks that were submitted to the AAA site and generated in conjunction with the collaborative effort to annotate 11 new Drosophila species genomes.
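Because gaps of unknown size are written as runs of 100 N's, they can be located by scanning a sequence for N runs. The sketch below assumes the sequence has already been read into a single string; `find_gaps` and the toy sequence are hypothetical and for illustration only. Note that a sized gap that happened to be exactly 100 bp long would be indistinguishable from an unsized one by this test.

```python
import re
from typing import Iterator, Tuple

UNSIZED_GAP_LENGTH = 100  # placeholder length stated in the release notes

def find_gaps(sequence: str) -> Iterator[Tuple[int, int, bool]]:
    """Yield (start, end, looks_unsized) for each run of N's.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    for match in re.finditer(r"N+", sequence.upper()):
        start = match.start() + 1   # convert 0-based index to 1-based
        end = match.end()           # exclusive 0-based end == inclusive 1-based end
        length = end - start + 1
        yield start, end, length == UNSIZED_GAP_LENGTH

# Toy sequence: one 100-N placeholder gap and one 250-N gap.
toy = "ACGT" + "N" * 100 + "GGCC" + "N" * 250 + "TT"
gaps = list(find_gaps(toy))
for start, end, looks_unsized in gaps:
    label = "unsized (100 N placeholder)" if looks_unsized else "sized"
    print(f"{start}..{end}\t{label}")
```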
GENERAL FLYBASE STATISTICS

| Statistic | Count |
|---|---|
| Number of references in FlyBase | 177010 |
| Number of research papers | 75864 |
| Number of abstracts | 34004 |
| Number of personal communications to FlyBase | 3511 |
| Drosophila workers registered with FlyBase | 7223 |
| Number of fly stocks | 60779 |
| Number of fly images | 1267 |

DROSOPHILA MELANOGASTER (R5.1)

| Statistics | Count |
|---|---|
| Gene records | 30557 |
| Genes located to the genome | 14601 |
| Genes not located to the genome | 15951 |
| Alleles | 83537 |
| Alleles of located genes | 64377 |
| Alleles of unlocated genes | 19160 |
| Aberrations | 29902 |
| Deficiencies | 19455 |
| Deficiencies with mapped endpoints | 5601 |
| Transposable element insertions | 63855 |
| Insertions mapped to the sequence | 37943 |


Annotation release 5.1

| Summary of changes from previous release | Count |
|---|---|
| New Gene Models | 76 |
| Restored Gene Models | 1 |
| Deleted Gene Models | 12 |
| Merged Gene Models | 63 old -> 28 new |
| Split Gene Models | 6 old -> 12 new |
| Unchanged peptides | 18978 |

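The summary of changes can be combined into a rough net change in the number of gene models. This is a back-of-the-envelope sketch, not an official FlyBase figure: it assumes the categories are disjoint, that each merge replaces the old models with the new ones, and likewise for splits. The variable names are illustrative.

```python
# Counts copied from the summary of changes for annotation release 5.1.
new_models = 76
restored_models = 1
deleted_models = 12
merged_old, merged_new = 63, 28   # 63 old models merged into 28 new ones
split_old, split_new = 6, 12      # 6 old models split into 12 new ones

# Assumption: categories do not overlap, so the contributions simply add up.
net_change = (
    new_models
    + restored_models
    - deleted_models
    + (merged_new - merged_old)
    + (split_new - split_old)
)

print(f"Estimated net change in gene models: {net_change:+d}")
```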
ANNOTATED GENE MODELS

| Annotated Gene Models | Count | Avg. size | Longest | Shortest |
|---|---|---|---|---|
| Genes | 14601 | 5092 | 1178806 | 16 |
| Protein coding genes | 13854 | 5344 | 1178806 | 141 |
| Protein coding transcripts | 19783 | 2301 | 69571 | 132 |
| Exons | 65166 | 482 | 27725 | 1 |
| Introns | 48541 | 1226 | 1174875 | 11 |
| 5' Untranslated regions | 17728 | 185 | 3391 | 1 |
| 3' Untranslated regions | 12236 | 377 | 5684 | 1 |
| Unique peptides | 17093 | 567 | 23015 | 25 |
| rRNA | 98 | 152 | 1325 | 133 |
| tRNA | 314 | 75 | 186 | 61 |
| snRNA | 47 | 115 | 275 | 36 |
| snoRNA | 65 | 88 | 316 | 16 |
| miRNA | 92 | 22 | 29 | 20 |
| miscellaneous non-coding RNA | 79 | 2197 | 31065 | 31 |
| pseudogenes | 51 | 1122 | 13064 | 53 |
| Transposable Elements Present in the Sequenced Strain | 6002 | 1256 | 66001 | 21 |

OTHER ANNOTATED GENE FEATURES

MAPPED NUCLEOTIDE CHANGES

| Annotated Gene Features | Count |
|---|---|
| total mapped nucleotide changes | 2618 |
| aberration junction | 160 |
| complex substitution | 18 |
| deletion | 73 |
| insertion site | 21 |
| point mutation | 1984 |
| sequence variant | 344 |
| TE target site duplication | 4 |
| uncharacterized change in nucleotide sequence | 14 |

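The per-category counts for mapped nucleotide changes can be checked against the stated total with a few lines of code. This is a minimal sketch using the figures listed in this section; the names are illustrative, and it assumes each mapped change is counted in exactly one category.

```python
# Per-category counts copied from the MAPPED NUCLEOTIDE CHANGES figures.
NUCLEOTIDE_CHANGES = {
    "aberration junction": 160,
    "complex substitution": 18,
    "deletion": 73,
    "insertion site": 21,
    "point mutation": 1984,
    "sequence variant": 344,
    "TE target site duplication": 4,
    "uncharacterized change in nucleotide sequence": 14,
}
STATED_TOTAL = 2618  # "total mapped nucleotide changes"

computed_total = sum(NUCLEOTIDE_CHANGES.values())
difference = computed_total - STATED_TOTAL

print(f"Computed total: {computed_total}")
print(f"Stated total:   {STATED_TOTAL}")
print(f"Difference:     {difference}")
```

For these figures the categories sum exactly to the stated total.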
MAPPED REGULATORY ELEMENTS

| Annotated Gene Features | Count |
|---|---|
| total mapped regulatory elements | 1969 |
| enhancer | 32 |
| poly A site | 118 |
| protein binding site | 1380 |
| regulatory region | 204 |
| rescue fragment | 240 |
| signal peptide | 1 |

MAPPED REAGENT FEATURES

| Annotated Gene Features | Count |
|---|---|
| transposable element insertion site | 37943 |
| microarray oligonucleotide | 448532 |
| microarray amplicons | 14095 |
| dsRNA amplicons | 32522 |
| BAC | 958 |

ALIGNED EVIDENCE FEATURES

NUCLEOTIDE ALIGNMENTS

| Annotated Gene Features | Algorithm | Count |
|---|---|---|
| D. melanogaster cDNA inserts | sim4tandem,splign | 13941 |
| D. melanogaster EST (total) | sim4 | 297834 |
| EST from sequenced strain | sim4 | 142744 |
| EST from different strains | sim4 | 155090 |
| Other melanogaster DNA sequences | sim4tandem | 12700 |

GENE PREDICTIONS

| Annotated Gene Features | Algorithm | Count |
|---|---|---|
| Genie prediction | Genie v2.2/flyGenie | 11145 |
| Genscan prediction | Genscan 1.0 | 18046 |