Publications

Plant physiology. 2001-03-01; 125.3: 1166-74.

Rice bioinformatics. Analysis of rice sequence data and leveraging the data to other plant species

Yuan Q, Quackenbush J, Sultana R, Pertea M, Salzberg SL, Buell CR

PMID: 11244096

Abstract

Rice (Oryza sativa) is a model species for monocotyledonous plants, especially for members of the grass family. Several attributes, such as small genome size, diploid nature, transformability, and established genetic and molecular resources, make it a tractable organism for plant biologists. With an estimated genome size of 430 Mb (Arumuganathan and Earle, 1991), it is feasible to obtain the complete genome sequence of rice using current technologies. An international effort has been established and is in the process of sequencing O. sativa ssp. japonica var "Nipponbare" using a bacterial artificial chromosome/P1 artificial chromosome shotgun sequencing strategy. Annotation of the rice genome is performed using prediction-based and homology-based searches to identify genes. Annotation tools such as optimized gene prediction programs are being developed for rice to improve the quality of annotation. Resources are also being developed to leverage the rice genome sequence for partial genome projects such as expressed sequence tag projects, thereby maximizing the output from the rice genome project. To provide a low level of annotation for rice genomic sequences, we have aligned all rice bacterial artificial chromosome/P1 artificial chromosome sequences with The Institute for Genomic Research Gene Indices, a set of nonredundant transcripts generated from nine public plant expressed sequence tag projects (rice, wheat, sorghum, maize, barley, Arabidopsis, tomato, potato, and barrel medic). In addition, we have used data from The Institute for Genomic Research Gene Indices and the Arabidopsis and Rice Genome Projects to identify putative orthologues and paralogues among these nine genomes.
