T3/Wheat

Triticeae Toolbox / WheatCAP Prediction Competition

Mailing List

We have set up a mailing list for those participating in the Prediction Challenge. We will use this list to send occasional updates about new data added to the database or tools that may be relevant for the challenge.

To sign up, go to the T3 Mailing List Subscription Form, enter your name and email address, and make sure the T3 Prediction Challenge list is selected.

The objective is to use data available on The Triticeae Toolbox (T3/Wheat) to predict the yield (Grain yield - kg/ha|CO_321:0001218) of wheat accessions in 9 separate test trials across the USA. Important metadata for these trials is on T3/Wheat, but the phenotypes measured in them are not. Genotypic data is available for all of these trials, though it is somewhat messy:

1. Not all accessions in each trial are genotyped.
2. Different genotyping protocols have been used in different trials.

The names of the 9 trials are:

AWY1_DVPWA_2024
TCAP_2025_MANKS
25_Big6_SVREC_SVREC
OHRWW_2025_SPO
CornellMaster_2025_McGowan
24Crk_AY2-3
2025_AYT_Aurora
YT_Urb_25
STP1_2025_MCG

These trials are also available as the public list named T3 Prediction Challenge Trials on T3/Wheat.

Contestants will submit a folder containing predictions for the genotyped entries in each trial under the "CV0" and "CV00" cases (Jarquín et al. 2017; see below). The winner will be the algorithm with the highest average prediction accuracy across trials and cross-validation cases.

Training Data

We have identified T3 data that could be reasonably applied to predicting the Predictathon trials using simple heuristics:

We identified phenotyping trials that evaluated some of the same accessions as each Predictathon trial. The assumption is that the overall germplasm in those trials will be related to the Predictathon accessions.
We identified archived VCFs containing genotype data for at least a minimum number of the accessions in each putative training trial.
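The first heuristic above can be sketched in a few lines. This is only an illustration on toy data, not the code from the repository linked below; the function name, the `min_shared` threshold, and the accession and trial names are all hypothetical.

```python
def candidate_training_trials(focal_accessions, trial_accessions, min_shared=1):
    """Return (trial, n_shared) pairs for trials sharing accessions with the focal trial."""
    focal = set(focal_accessions)
    hits = []
    for trial, accessions in trial_accessions.items():
        n_shared = len(focal & set(accessions))
        if n_shared >= min_shared:
            hits.append((trial, n_shared))
    # Most overlapping trials first; ties are broken alphabetically for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Toy data: accession and trial names are made up for illustration.
focal = ["ACC_A", "ACC_B", "ACC_C"]
history = {
    "Trial_2022": ["ACC_A", "ACC_B", "ACC_X"],
    "Trial_2023": ["ACC_C", "ACC_Y"],
    "Trial_2021": ["ACC_Z"],
}
candidates = candidate_training_trials(focal, history, min_shared=1)
print(candidates)
```

Raising `min_shared` trades training-set size for closer relatedness to the focal trial; the threshold that works best is an open choice for each contestant.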

Below are links to the data and metadata that could be used as training sets for each of the prediction trials:

Study Name	Study ID	Training Data
AWY1_DVPWA_2024	10678	Download
TCAP_2025_MANKS	10680	Download
25_Big6_SVREC_SVREC	10675	Download
OHRWW_2025_SPO	10679	Download
CornellMaster_2025_McGowan	10676	Download
24Crk_AY2-3	10674	Download
2025_AYT_Aurora	10673	Download
YT_Urb_25	10677	Download
STP1_2025_MCG	10681	Download

This data was generated from scripts found in this GitHub repository:
https://github.com/jeanlucj/T3_predictathon_find_training_trials

Genotype Data for Prediction Trials

Below is a table listing, for each prediction trial, the best genotyping project and protocol on T3. Here "best" means the one with the highest number of genotyped accessions from the trial.

study_name	project_name	project_id	protocol_name	protocol_id	download
2025_AYT_Aurora	SDSU_2025_GBS	11050	GBS SDSU 2025	301	Download Archived VCF File
24Crk_AY2-3	UMN_2022_GBS	10671	GBS UMN 2022	294	Download Archived VCF File
25_Big6_SVREC_SVREC	BIGSIX_2024_AgriSeq_4K	10684	ThermoFisher AgriSeq 4K	296	Download Archived VCF File
CornellMaster_2025_McGowan	Cornell_WWMasterScreening_2024	11049	GBS Cornell 2024	300	Download Archived VCF File
YT_Urb_25	UIUC_2024_GBS_V2	11054	GBS UIUC 2024	302	Download Archived VCF File
AWY1_DVPWA_2024	WSU_2023_GBS	14512	GBS WSU 2023	311	Download Archived VCF File
OHRWW_2025_SPO	UWM_2023_3K	11009	Wheat 3K	299	Download Archived VCF File
TCAP_2025_MANKS	KSU_2024_Allegro_V2	10830	Allegro V2	297	Download Archived VCF File
STP1_2025_MCG	TAMU_2025_MCG25	11052	GBS TAMU MCG25	303	Download Archived VCF File
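The selection rule behind this table (most genotyped accessions from the trial) could be reproduced along the following lines when new genotype data is posted. This is a sketch under assumptions: the function name and the toy protocol and accession names are hypothetical, and in practice the accession sets would come from the sample names in each VCF.

```python
def best_protocol(trial_accessions, protocol_accessions):
    """Pick the protocol covering the most accessions of a trial.

    Returns (protocol name, number genotyped, sorted list of un-genotyped accessions).
    """
    trial = set(trial_accessions)
    covered = {name: trial & set(accs) for name, accs in protocol_accessions.items()}
    # sorted() makes ties resolve to the alphabetically first protocol name.
    best = max(sorted(covered), key=lambda name: len(covered[name]))
    return best, len(covered[best]), sorted(trial - covered[best])


# Toy data: names are made up for illustration.
trial_entries = ["ACC_A", "ACC_B", "ACC_C", "ACC_D"]
protocols = {
    "Protocol_1": {"ACC_A", "ACC_B"},
    "Protocol_2": {"ACC_A", "ACC_B", "ACC_C", "ACC_Z"},
}
name, n_genotyped, missing = best_protocol(trial_entries, protocols)
print(name, n_genotyped, missing)
```

The list of un-genotyped accessions is worth keeping, since only genotyped accessions need a prediction.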

NOTE: New data (in particular genotypic data) will be posted to T3/Wheat during the contest period. Recheck T3/Wheat before the competition closes when assembling your final training data.

A presentation describing useful methods of programmatic access to T3 to facilitate algorithm development will be given at PAG in the Breedbase Workshop Sunday, January 11, 2026: https://pag.confex.com/pag/33/meetingapp.cgi/Session/14070

The competition was originally scheduled to close on Friday, March 13th, 2026. The deadline to submit predictions has been extended to Friday, March 27th, 2026.

An upload link to submit a zipped folder containing the predictions will be posted here prior to that date.

Details

Descriptions of the CV0 and CV00 cross-validation scenarios (Jarquín et al. 2017).

Jarquín, D., C. Lemes da Silva, R.C. Gaynor, J. Poland, A. Fritz, et al. 2017. Increasing genomic-enabled prediction accuracy by modeling genotype × environment interactions in Kansas wheat. Plant Genome 10(2): plantgenome2016.12.0130. doi: 10.3835/plantgenome2016.12.0130.

For CV0, the prediction algorithm should exclude the focal trial itself from the training data but may use all data from any other trial in the database, including data on the entries in the focal trial. For CV00, the algorithm should exclude from the training data both the focal trial and any observations, made in other trials, on entries in the focal trial.

For each prediction task, accuracy will be calculated as the correlation between predicted and observed phenotypes over all genotyped accessions in the trial. Average accuracy will be calculated across 18 prediction tasks (9 separate test trials, with CV0 and CV00 accuracy for each trial).
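The two exclusion rules and the accuracy calculation can be illustrated with a minimal sketch. Several things here are assumptions rather than part of the rules: the record layout (dicts keyed by "studyName", "germplasmName", and "value"), the function names, the toy data, and the use of Pearson correlation, since the text above says only "correlation".

```python
import math


def training_records(records, focal_trial, focal_entries, scheme):
    """Filter phenotype records down to what the scheme allows for training."""
    if scheme not in ("CV0", "CV00"):
        raise ValueError(f"unknown scheme: {scheme}")
    focal_entries = set(focal_entries)
    kept = []
    for rec in records:
        if rec["studyName"] == focal_trial:
            continue  # both schemes drop the focal trial itself
        if scheme == "CV00" and rec["germplasmName"] in focal_entries:
            continue  # CV00 also drops focal entries observed in other trials
        kept.append(rec)
    return kept


def pearson(predicted, observed):
    """Pearson correlation (an assumed choice of correlation measure)."""
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    spp = sum((p - mp) ** 2 for p in predicted)
    soo = sum((o - mo) ** 2 for o in observed)
    return spo / math.sqrt(spp * soo)


# Toy records: trial names, accession names, and yields are made up.
records = [
    {"studyName": "Other_2023", "germplasmName": "ACC_A", "value": 4100.0},
    {"studyName": "Other_2023", "germplasmName": "ACC_X", "value": 3900.0},
    {"studyName": "Other_2024", "germplasmName": "ACC_Y", "value": 4300.0},
    {"studyName": "Focal_2025", "germplasmName": "ACC_A", "value": 4000.0},
]
focal_entries = {"ACC_A", "ACC_B"}
cv0_train = training_records(records, "Focal_2025", focal_entries, "CV0")
cv00_train = training_records(records, "Focal_2025", focal_entries, "CV00")

# One accuracy per (trial, scheme) task, then a plain mean across tasks.
observed = [3800.0, 4000.0, 4200.0]
accuracy = {
    ("Focal_2025", "CV0"): pearson([3900.0, 4000.0, 4100.0], observed),
    ("Focal_2025", "CV00"): pearson([3900.0, 4100.0, 4000.0], observed),
}
mean_accuracy = sum(accuracy.values()) / len(accuracy)
print(len(cv0_train), len(cv00_train), mean_accuracy)
```

In the competition itself the mean would run over all 18 tasks rather than the two shown here.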

Submission Requirements

Folder Structure:

The submitted zipped folder should contain one methods description file and 9 sub-directories, each named after the trial for which it contains predictions. For example, if the sub-directory contains predictions for the trial "AYT_Timbuktu24", then the sub-directory should be named "AYT_Timbuktu24". Each sub-directory should contain six csv files. Continuing the example, the sub-directory should contain:

CV0_Predictions.csv
CV0_Trials.csv
CV0_Accessions.csv
CV00_Predictions.csv
CV00_Trials.csv
CV00_Accessions.csv
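Before zipping, the layout can be checked with a short script. The sketch below is a hypothetical self-check, not an official validator: `check_submission` and the temporary demo folder are illustrative, and the check for the methods file only confirms that some top-level file exists.

```python
import tempfile
from pathlib import Path

TRIALS = [
    "AWY1_DVPWA_2024", "TCAP_2025_MANKS", "25_Big6_SVREC_SVREC",
    "OHRWW_2025_SPO", "CornellMaster_2025_McGowan", "24Crk_AY2-3",
    "2025_AYT_Aurora", "YT_Urb_25", "STP1_2025_MCG",
]
REQUIRED_FILES = {
    f"{scheme}_{kind}.csv"
    for scheme in ("CV0", "CV00")
    for kind in ("Predictions", "Trials", "Accessions")
}


def check_submission(root, trial_names=TRIALS):
    """Return a list of human-readable problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    for trial in trial_names:
        trial_dir = root / trial
        if not trial_dir.is_dir():
            problems.append(f"missing sub-directory: {trial}")
            continue
        present = {p.name for p in trial_dir.iterdir() if p.is_file()}
        for name in sorted(REQUIRED_FILES - present):
            problems.append(f"{trial}: missing {name}")
    if not any(p.is_file() for p in root.iterdir()):
        problems.append("no methods description file at the top level")
    return problems


# Demo: build a complete dummy folder, then delete one file to trigger a report.
root = Path(tempfile.mkdtemp()) / "submission"
for trial in TRIALS:
    (root / trial).mkdir(parents=True)
    for name in REQUIRED_FILES:
        (root / trial / name).touch()
(root / "methods.txt").touch()
(root / "YT_Urb_25" / "CV00_Trials.csv").unlink()
problems = check_submission(root)
print(problems)
```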

These files should be formatted as follows.

CSV file formats:

CV0_Predictions.csv and CV00_Predictions.csv: a csv file with two columns, "germplasmName" and "prediction". Every genotyped accession in the focal trial should have a prediction.
CV0_Trials.csv and CV00_Trials.csv: a csv file with a single column, "studyName", containing the trial names of the trials used for training the prediction model.
CV0_Accessions.csv and CV00_Accessions.csv: a csv file with a single column, "germplasmName", containing the accession names used for training the prediction model.
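As an illustration of these formats, the sketch below writes the six files for one trial with Python's standard `csv` module and then zips the folder. The helper names, the toy predictions, and the training trial and accession names are hypothetical; only the file names and column headers come from the description above.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def write_single_column(path, header, values):
    """Write a one-column csv file with the given header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([header])
        writer.writerows([value] for value in values)


def write_trial_folder(root, trial_name, scheme, predictions,
                       training_trials, training_accessions):
    """Write the three csv files for one trial and one scheme (CV0 or CV00)."""
    trial_dir = Path(root) / trial_name
    trial_dir.mkdir(parents=True, exist_ok=True)
    with open(trial_dir / f"{scheme}_Predictions.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["germplasmName", "prediction"])
        writer.writerows(sorted(predictions.items()))
    write_single_column(trial_dir / f"{scheme}_Trials.csv",
                        "studyName", training_trials)
    write_single_column(trial_dir / f"{scheme}_Accessions.csv",
                        "germplasmName", training_accessions)
    return trial_dir


# Demo with made-up values, written to a temporary location.
root = Path(tempfile.mkdtemp()) / "submission"
for scheme in ("CV0", "CV00"):
    trial_dir = write_trial_folder(
        root, "AYT_Timbuktu24", scheme,
        predictions={"ACC_A": 4012.5, "ACC_B": 3875.0},
        training_trials=["Other_2023"],
        training_accessions=["ACC_X", "ACC_Y"],
    )

# Zip the whole folder so that "submission/" is the top-level entry in the archive.
archive = shutil.make_archive(str(root), "zip",
                              root_dir=str(root.parent), base_dir=root.name)
print(sorted(p.name for p in trial_dir.iterdir()), archive)
```

A real submission would repeat the loop for all 9 trials and add the methods description file before zipping.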

Methods description: Common sections (these sections can be brief)

The methods description should be a text, rtf, gdoc, docx, or pdf file and should cover the following:

URL to a publicly accessible code repository (e.g., github.com or bitbucket.org) that contains the prediction algorithm.
Data retrieval method: programmatic access or GUI download
- If the former, a list of the BrAPI calls used
- If the latter, the date when data were downloaded
Genomic relationship matrix construction (if applicable)
One-step or two-step model
- If the latter, preliminary individual trial analysis methods
Prediction model training
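For contestants using programmatic access, one way to produce the list of BrAPI calls is to log each request URL as it is built. The sketch below makes no network requests and rests on several assumptions that should be verified against the T3/Wheat site: the base URL, that the BrAPI v2 `studies` and `observations` endpoints are the relevant ones, and that the Study ID in the training data table can be used as `studyDbId`. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL for the T3/Wheat BrAPI v2 service; confirm before use.
BASE_URL = "https://wheat.triticeaetoolbox.org/brapi/v2"

calls = []  # every URL built is logged here for the methods description


def record_call(endpoint, **params):
    """Build a BrAPI request URL, log it, and return it (no request is sent)."""
    query = urlencode(sorted(params.items()))
    url = f"{BASE_URL}/{endpoint}?{query}" if query else f"{BASE_URL}/{endpoint}"
    calls.append(url)
    return url


# Look up a study by name, then request its observations by ID (ID assumed).
record_call("studies", studyName="AWY1_DVPWA_2024")
record_call("observations", studyDbId="10678", pageSize=1000)

for url in calls:
    print(url)
```

The logged URLs can then be pasted into the methods description alongside the date of retrieval.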

Developing prediction algorithms

There will be no attempt to determine whether AI has been used for coding. In fact, we will probably (no promise) set up a chatbot to answer questions about functions for programmatic access to T3.
Prediction algorithms need not limit themselves to data on T3/Wheat, though only publicly available data should be used. Weather data available online could, for example, be useful.
These trials come from public-sector wheat breeding programs in the United States. If you (as a contestant) happen to work at a program that contributed a test trial, you probably have access to the phenotypes for that trial. On the honor system, you should not use your access to that data while developing your prediction algorithm.
We will use the CV0_Trials.csv, CV00_Trials.csv, CV0_Accessions.csv, and CV00_Accessions.csv files to verify that appropriate training data is being used. On the honor system, we will assume you are correctly populating those files.
We welcome submissions from teams. For such submissions, the role of each team member should be briefly described in the methods document.

Follow up

Following the competition, a manuscript describing the competition process and the algorithms used will be submitted to G3. All contestants are welcome to be co-authors on that manuscript. Writing assignments and author order will be decided after the submission deadline.

Questions?

Use the T3/Wheat Contact Us Form. Include the word "Predictathon" in the Subject line.
