- Design a pipeline for the functional annotation of maize genes.
- Use manually curated test data to evaluate the annotations and assemble a best subset of annotations for release.
- Design a user-friendly review system that lets the community provide feedback on, and endorsements of, the annotations.
GO annotations are generated using three different approaches in the pipeline.
- Sequence similarity to Arabidopsis (TAIR) and existing plant genes with curated GO annotations.
- InterproScan to detect protein domains which have GO terms annotated to them.
- CAFA (Critical Assessment of Functional Annotation) tools (Argot2, FANNGO, PANNZER) that use a combination of machine learning and statistics to predict GO terms for input genes.
These annotations will be compared with the GO annotations for maize available from Gramene, which generates its annotations using the Ensembl Compara pipeline (RBH: Reciprocal Best Hit).
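The per-gene GO term sets produced by the three approaches can be merged and then compared against the Gramene set. The sketch below is a hypothetical illustration of that bookkeeping; the tool names, gene IDs, and GO terms are placeholders, not actual pipeline output.

```python
# Hypothetical sketch: union the GO terms predicted for each gene across
# all tools, then intersect with Gramene's annotations per gene.
# All identifiers below are illustrative.

def combine_annotations(per_tool: dict[str, dict[str, set[str]]]) -> dict[str, set[str]]:
    """Union the GO terms predicted for each gene across all tools."""
    combined: dict[str, set[str]] = {}
    for tool_annotations in per_tool.values():
        for gene, terms in tool_annotations.items():
            combined.setdefault(gene, set()).update(terms)
    return combined

per_tool = {
    "blast_tair":   {"geneA": {"GO:0003700"}},
    "interproscan": {"geneA": {"GO:0003700", "GO:0006355"}},
    "pannzer":      {"geneB": {"GO:0016020"}},
}
combined = combine_annotations(per_tool)

gramene = {"geneA": {"GO:0003700"}, "geneB": set()}
# Per-gene overlap with Gramene:
overlap = {gene: terms & gramene.get(gene, set()) for gene, terms in combined.items()}
```

Keeping the per-tool sets separate until this merge step also preserves the provenance needed later for the Evidence View.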
Represents the part of the pipeline that evaluates the annotations by calculating and comparing performance measures.
- The test dataset comprises the Gold Standard: manually curated annotations from MaizeGDB. About 4% of the maize protein-coding genes are represented in this test dataset.
- Protein-centric evaluation metrics from the CAFA project are currently being used to evaluate different tools.
- Precision (PR) is the mean, over proteins, of the proportion of correctly predicted annotations for a given protein relative to the total number of predictions for that protein.
- Recall (RC) is the mean, over proteins, of the proportion of correctly predicted annotations for a given protein relative to the total number of annotations for that protein in the test dataset.
- F-score is a single value reflecting a tool's accuracy, calculated as the harmonic mean of PR and RC.
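The protein-centric metrics above can be sketched as follows. This is an assumed minimal implementation: following CAFA convention, precision is averaged over proteins with at least one prediction and recall over all proteins in the test set, and thresholding over prediction scores is omitted.

```python
# Protein-centric precision, recall, and F-score (assumed sketch).

def protein_centric_metrics(predicted: dict[str, set[str]],
                            gold: dict[str, set[str]]) -> tuple[float, float, float]:
    """Mean precision over proteins with predictions, mean recall over
    all gold-standard proteins, and the F-score (harmonic mean) of the two."""
    precisions = []
    for gene, pred in predicted.items():
        if pred:  # only proteins that received at least one prediction
            precisions.append(len(pred & gold.get(gene, set())) / len(pred))
    recalls = []
    for gene, true in gold.items():
        if true:
            recalls.append(len(predicted.get(gene, set()) & true) / len(true))
    pr = sum(precisions) / len(precisions) if precisions else 0.0
    rc = sum(recalls) / len(recalls) if recalls else 0.0
    f = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    return pr, rc, f

# Toy example: one gold term, two predicted terms for the same protein.
pr, rc, f = protein_centric_metrics({"g1": {"GO:0000001", "GO:0000002"}},
                                    {"g1": {"GO:0000001"}})
```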
Represents the outline of the Review system which will be implemented at the end of evaluation step.
- Basic View will show the minimal information subject experts need to quickly review the gene(s) of their choice.
- Evidence View will let users inspect the tools that support a particular GO annotation. Each tool supporting the annotation will have a simple graphic showing the details of that annotation; e.g., sequence-similarity-based methods will show a simple diagram with the representative target, coverage, identity, and E-value of a given BLAST hit.
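One possible shape for a single Evidence View record backing that graphic is sketched below; the class and field names are illustrative assumptions, not the actual schema.

```python
from dataclasses import dataclass

# Hypothetical record for one sequence-similarity evidence item
# supporting a GO annotation; field names are illustrative only.

@dataclass
class BlastEvidence:
    go_term: str     # GO term supported by this hit
    tool: str        # which pipeline component produced it
    target: str      # representative hit, e.g. a TAIR gene
    coverage: float  # fraction of the query aligned
    identity: float  # percent identity of the alignment
    e_value: float   # BLAST E-value

ev = BlastEvidence("GO:0003700", "blast_tair", "AT1G01010", 0.92, 78.5, 1e-45)
```

A flat record like this is easy to render as a small per-tool graphic and to serialize for the downloadable datasets mentioned below.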
All of this data will be made available for download for downstream analysis in your own experiments. Non-reviewed annotations will be released as soon as the evaluation of the pipeline results is complete. Reviewed annotations will be released after the tool has been made available to the maize community and sufficient time has passed for the community to revise the non-reviewed annotations.