2023 WheatCAP Workshop
Dates / Times
Mon, Feb 6 - Thu, Feb 9th
Fri, Feb 10th is available for extra help / questions
12 PM EST / 9 AM PST --> 2 PM EST / 11 AM PST
- 12 PM EST / 9 AM PST: First 30 minutes used to answer questions and help with that day’s assignment. If you don’t have any questions about the assignment, you can join at 12:30 / 9:30.
- 12:30 PM EST / 9:30 AM PST: start covering new material
Zoom Link
If you have any questions throughout the Workshop or afterwards, feel free to reach out to us:
- Slack: There is a t3 channel in the WheatCAP-student group Slack workspace. This is a great place to ask general questions, since there will likely be others facing the same issue that will benefit from the answer.
- Contact Us Form: There is a Contact Us button in the toolbar of all of the T3/Breedbase instances. You can use this to reach out to us directly.
- Email: You can email David at djw64@cornell.edu with specific questions or if you need to send us a file that is not uploading for an unknown reason.
Important Links
Blank Upload Templates
Video Recordings
Overview
Below is an overview of each day’s assignment and new material we’ll be covering. See the details for each day below for more information.
Each Assignment should be completed BEFORE the day it’s assigned.
Day One
Assignment
- Gather your own trial data
(More instructions in the details of Day One below)
New Topics
- Databases
- Locations
- Accessions
Day Two
Assignment
- Upload Locations and Accessions to Sandbox
New Topics
- Field Layouts
- Trait Observations
Day Three
Assignment
- Upload Trials and Trait Observations
New Topics
- Lists
- Data Aggregation, Subsetting & Download
- PHG / Imputed Data
Day Four
Assignment
- Download Data related to your Breeding Program
New Topics
- Android Field Book
- Seedlots
- Barcodes
- Submitting Genotype Data
Day One
Assignment
Throughout the workshop, we’ll be helping you load your own trials into T3/Breedbase. In order to do this, you’ll need to get together the required information for loading your trials into the database. This includes:
- Accession Lists
- You’ll need the names of the Accessions you’ll be growing in your trials. We’ll show you how to check your Accession names against those that already exist in the database to find matches (in order to reduce the number of duplicate Accession entries). Then, you’ll be able to add any new Accessions to the database. Ideally, you’ll also have pedigree information (such as Purdy pedigree strings) for your Accessions.
- Locations
- You’ll need to create Location entries in the database to associate with each of your field trials. The specificity of the Locations is up to you - they can be specific field locations within a research farm or the general location of the farm. For each location, you’ll need a unique name, the GPS coordinates (in decimal degrees), and the elevation (in meters).
- Trial Designs
- You’ll be uploading trial layouts for each of your field locations. For each trial, you’ll need to generate the plot layout based on your trial design and know the Accession used in each plot and the plot’s row/col position within the field.
- Traits
- Even if you don’t have trait observation data for any trials yet, you should still know which traits you plan on observing in the trials. You’ll need a list of your traits and the units each will be measured in. We’ll then show you how to match your traits to the existing traits in the database (and how to tell whether the values need to be converted between different units).
You’ll be uploading your trials to the T3/Wheat Sandbox. If you don’t already have an account on the Sandbox, make sure to create one before the workshop begins. To create an account:
- Go to https://wheat-sandbox.triticeaetoolbox.org
- Click the “Register” button in the top right of the Toolbar
- Fill out the new account form
- You will get an email from noreply@triticeaetoolbox.org with a link to verify your email address.
- Make sure you can login with your new account
Each version of T3 (production, sandbox, WheatCAP, etc) has its own database with its own accounts. You’ll need to create an account on each database you want to use.
New Topics
1) Databases
- There are three separate T3/Wheat databases:
- T3/Wheat Production - read-only database where data is added only by a database curator. This is a publicly accessible database, so anyone can create an account to view and/or download the data.
- T3/Wheat Sandbox - anyone can use this database to upload data. It can be used as a practice database to make sure your upload templates are formatted correctly and to see how the data looks after it’s been uploaded.
- T3/WheatCAP - only WheatCAP participants have access. WheatCAP breeding programs can add their data directly to this database.
- We’ll be using the sandbox for the workshop
- Going forward, you’ll be uploading your trials directly to the WheatCAP instance
- You’ll need to have an account on each database you’ll be using
2) Locations
Check if any of your Locations already exist
- Go to the Manage > Locations page
- Search the table or the map to see if your Location exists
Location Metadata
Make sure you have all of the required information for any new Locations
- Name (required): a unique name of the Location
- Generally, this is the town name and state abbreviation of the closest town to the trial location (such as Aberdeen, ID)
- If you’re creating a location of a specific field, use the naming convention of “{Town}, {ST} - {Field Name}” (such as Ithaca, NY - Caldwell)
- Abbreviation (required): a unique abbreviation for the location
- For town locations, we’ve been using the convention of the first 3 letters of the town, followed by the state abbreviation (such as ABEID)
- For field locations, we’ve been using the convention of the first 3 letters of the town, followed by the state abbreviation, followed by a ‘-’ and a 3 letter field abbreviation (such as ITHNY-CAL)
- Country Code (required): The 3-letter country code (USA)
- Country Name (required): The full country name (United States of America)
- Program (required): The name of the Breeding Program (as it exists in the Database) that uses the Location
- Type (required): can be
Farm, Field, Greenhouse, Screenhouse, Lab, Storage, Other
- We’ve been using Farm for town locations and Field for specific field locations
- Latitude (required): the latitude of the Location (in decimal degrees)
- Longitude (required): the longitude of the Location (in decimal degrees). Use a negative number for Locations in the Western hemisphere.
- Altitude (required): the elevation of the Location (in meters). You can use Elevation Finder to lookup the elevation for a location.
- NOAA Station ID (required): enter ‘none’ for the field
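The abbreviation conventions above can be sketched in a few lines of Python. This is only an illustration of the naming convention, assuming the Location name already follows the “{Town}, {ST}” or “{Town}, {ST} - {Field Name}” pattern; the function name `location_abbreviation` is made up for this example and is not part of T3/Breedbase. You still need to check that the result is unique in the database.

```python
def location_abbreviation(name: str) -> str:
    """Build an abbreviation from 'Town, ST' or 'Town, ST - Field Name'."""
    # Split off the optional field name, then split the town from the state
    place, _, field = name.partition(" - ")
    town, _, state = place.partition(",")
    # Keep letters only so punctuation in a town name is not counted
    town_letters = "".join(ch for ch in town if ch.isalpha())
    abbreviation = (town_letters[:3] + state.strip()).upper()
    if field.strip():
        field_letters = "".join(ch for ch in field if ch.isalpha())
        abbreviation += "-" + field_letters[:3].upper()
    return abbreviation


print(location_abbreviation("Aberdeen, ID"))           # ABEID
print(location_abbreviation("Ithaca, NY - Caldwell"))  # ITHNY-CAL
```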
Adding a Single Location on the Map
If you only have a small number of Locations to add, you can add them individually using the map on the website
- Go to the Manage > Locations page
- Zoom into the area of your Location
- You can click the Layers button in the bottom right corner to turn on a Satellite view
- Click the center point of your Location
- Click the Add Location link in the pop up
- Fill out the Location form
- Click the Store Location Details button to save the Location
Adding Multiple Locations with Upload Template
If you have multiple Locations to add, you can submit them all using the Location upload template
- Download a blank Location template
- Add a row to the template for each Location, filling in all of the required fields and any additional optional fields
- To upload the Locations, go to the Manage > Locations page
- Click the Upload New Locations link near the top right of the page
- Select your Excel file
- Click the Upload button to upload and save the Locations
3) Accessions
Check if your Accessions already exist in the Database
Use the Synonym Search Tool to find potential matches
- Go to https://synonyms.triticeaetoolbox.org
- Select a Wheat Database
- Include all of the Database terms in the Search
- Enable Search Routines
- I’ve found the Exact Match, Remove Punctuation, and Substring Match search routines to be the most useful
- Enable a Case-Sensitive search
- Breedbase is case-sensitive, so it will treat the names “JERRY” and “Jerry” as different entries
- With this option enabled, the search tool will warn you if it finds a match that differs only in letter case
This tool is now integrated into the Accession and Trial uploads, so it’s not required to use the Synonym Search Tool separately. However, doing a separate search yourself gives you more control over the options it uses and you can find issues with Accession names before you try to upload a file.
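Before running a search, it can help to check your own list for names that differ only in letter case or punctuation. The sketch below is a local pre-check written for this guide, not the Synonym Search Tool itself; the helper names and the second pair of example names are made up for illustration.

```python
import string
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a name and drop punctuation and whitespace."""
    drop = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in name.lower() if ch not in drop)


def near_duplicates(names):
    """Group names that become identical after normalization."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


names = ["JERRY", "Jerry", "Line A-12", "line a12", "Caledonia"]
for group in near_duplicates(names):
    print("Possible duplicates:", group)
```

Any group it reports is a candidate for a single Accession entry with the other spellings listed in the synonym column.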
Make sure you have all of the required information for any new Accessions that need to be added to the Database
- accession_name (required): The unique name of the Accession
- species_name (required): The name of the species
- Triticum aestivum
- Triticum durum
- organization_name: A comma-separated list of Breeding Programs that use the Accession
- synonym: A comma-separated list of alternate names (including differences in punctuation and/or letter case)
- notes: Additional comments about the Accession
- accession_number: Registered Accession numbers (such as GRIN PI numbers)
- purdy_pedigree: The Purdy pedigree string (female parent/male parent)
Adding new Accessions with the Upload Template
Fill out the Accession Upload Template for any new and/or updated Accession entries
- Download a blank Accession template
- Add a new row for each new or updated Accession
- Fill in the accession_name and species_name columns. Add any additional information (such as synonyms or Purdy pedigree strings)
Upload your completed Accession Upload Template
- Go to the Manage > Accessions page
- Click the Add Accessions Or Upload Accession Info link near the top right corner of the page
- Select the Uploading a File tab
- Select your Excel file
- Click the Continue button
- The upload will automatically send your Accession names to the Synonym Search Tool to perform a search
- Look through any potential matches and select those that do match
- Review any selected replacements
- Continue to save the Accessions in the database
Adding Pedigrees
Purdy Pedigree strings can be added in the Accession Upload Template
- these can be included when you first upload a new Accession
- they can also be added to existing Accessions by uploading the same template (just include the existing accession_name and species_name and the purdy pedigree string to add)
Day Two
Assignment
Follow the instructions from Day One to upload your own Locations and Accessions to the T3/Wheat Sandbox.
You should create/upload a Location for each field trial location.
You should upload any missing Accessions that are not already in the database that are used in your field trials.
New Topics
A Trial on T3/Breedbase is a planting at a single location in a single year. This means you’ll need to create a new Trial for each new location and/or year.
An Experiment is a grouping of related Trials from the same Breeding Program. (Experiments are created by putting related Trials in the same folder on the Manage > Field Trials page).
1) Field Layouts
Multi-Trial Upload Template
The Multi-Trial Upload Template can be used to upload one or more Trials to the database. Each row in the template corresponds to one plot in a single trial. Add a new row to the template for each plot in the trial. If you’re uploading more than one Trial, you can add additional rows to the template below the first Trial.
Keep the plots/rows from the same Trial grouped together - the uploader will process one Trial at a time and expects the plots to be next to each other in the file.
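One way to confirm the grouping before uploading is to scan the trial name column from top to bottom and flag any Trial that reappears after a different Trial has started. This is a minimal sketch that assumes you have already read the trial_name column into a Python list; the function name is made up for this example.

```python
def ungrouped_trials(trial_names):
    """Return trial names that reappear after a different trial has started."""
    finished = set()   # trials whose block of rows has already ended
    problems = []
    previous = None
    for name in trial_names:
        if name != previous:
            if name in finished and name not in problems:
                problems.append(name)
            if previous is not None:
                finished.add(previous)
            previous = name
    return problems


column = [
    "CornellMaster_2007_McGowan",
    "CornellMaster_2007_McGowan",
    "5STADV_2001_Urbana",
    "CornellMaster_2007_McGowan",  # out of place: this trial's rows are split
]
print(ungrouped_trials(column))
```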
Trial Naming Conventions
The Database includes Trials from many different Breeding Programs, and the Trial name must be unique across all Trials for all Breeding Programs in the Database. Therefore, we recommend including these three components in the trial name (separated by a _):
- Breeding Program and/or experiment name (or abbreviation)
- year
- location
For example: CornellMaster_2007_McGowan, 5STADV_2001_Urbana
Plot Naming Conventions
Each Plot within a Trial also needs to have a globally unique name. This is what you use when you upload trait observations (you’ll reference the Plot by its unique name). The easiest way to create a unique Plot name is to combine the Trial name with -PLOT_{plot number}.
For example: CornellMaster_2007_McGowan-PLOT_101, CornellMaster_2007_McGowan-PLOT_102.
You can use the CONCAT function in Excel to combine the Trial Names and the Plot Numbers to generate the Plot Names.
For example: =CONCAT(A2, "-PLOT_", O2)
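If you build your templates with a script instead of Excel, the same naming convention can be written as a small Python function. This is a sketch of the convention only; the function name is made up for this example.

```python
def plot_name(trial_name: str, plot_number: int) -> str:
    """Combine a Trial name and plot number as {trial}-PLOT_{number}."""
    name = f"{trial_name}-PLOT_{plot_number}"
    # Plot names cannot contain spaces
    if " " in name:
        raise ValueError(f"plot name contains a space: {name!r}")
    return name


trial = "CornellMaster_2007_McGowan"
names = [plot_name(trial, number) for number in (101, 102, 103)]
print(names)
# Every plot in the file should end up with a distinct name
assert len(names) == len(set(names))
```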
Plot Position Coordinates
Each plot can have a row and column position within the field. This information is critical in knowing the spatial layout of the plots within the field and is required for processing drone imagery (via UASHub). Each plot should have its own unique row/col pair (there should only be one plot at each position).
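A quick way to catch two plots sharing a position is to group the plots by Trial, row, and column and report any group with more than one plot. The sketch below assumes each template row has been read into a dictionary keyed by the template column names; the function name and the sample rows are made up for this example.

```python
from collections import defaultdict


def duplicate_positions(plots):
    """Return {(trial, row, col): [plot names]} for positions used more than once."""
    seen = defaultdict(list)
    for plot in plots:
        key = (plot["trial_name"], plot["row_number"], plot["col_number"])
        seen[key].append(plot["plot_name"])
    return {pos: names for pos, names in seen.items() if len(names) > 1}


plots = [
    {"trial_name": "AYT_2019_Ithaca", "plot_name": "AYT_2019_Ithaca-PLOT_101", "row_number": 1, "col_number": 1},
    {"trial_name": "AYT_2019_Ithaca", "plot_name": "AYT_2019_Ithaca-PLOT_102", "row_number": 1, "col_number": 2},
    {"trial_name": "AYT_2019_Ithaca", "plot_name": "AYT_2019_Ithaca-PLOT_103", "row_number": 1, "col_number": 2},
]
print(duplicate_positions(plots))
```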
Template Structure
The Multi-Trial Upload Template contains trial-level metadata (such as the Trial name, planting date, etc). This information is the same for all plots in the Trial, so each row/plot from the same Trial should have the same values. The trial-level columns are on the left.
The Upload Template also contains plot-level metadata (such as the Plot name, Accession used in the plot, etc). This information will be different for each plot in the Trial, so each row/plot will have a different value. The plot-level columns are on the right.
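Because the trial-level values are repeated on every row, a copy-and-paste slip can leave one row disagreeing with the rest of its Trial. The following sketch looks for that, assuming each template row has been read into a dictionary keyed by column name. The list of columns checked here is a subset chosen for illustration, and the function name is made up.

```python
# Trial-level columns to compare (an illustrative subset of the template columns)
TRIAL_LEVEL_COLUMNS = ["breeding_program", "location", "year", "design_type", "description"]


def inconsistent_trial_values(rows):
    """Return {(trial_name, column): distinct values} where a trial-level column varies."""
    found = {}
    for row in rows:
        for column in TRIAL_LEVEL_COLUMNS:
            key = (row["trial_name"], column)
            # Compare as text so 2019 and "2019" count as the same value
            found.setdefault(key, set()).add(str(row.get(column, "")))
    return {key: sorted(values) for key, values in found.items() if len(values) > 1}


rows = [
    {"trial_name": "AYT_2019_Ithaca", "location": "Ithaca, NY", "year": 2019},
    {"trial_name": "AYT_2019_Ithaca", "location": "Ithaca, NY", "year": 2018},
]
print(inconsistent_trial_values(rows))
```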
Make sure you have all of the required metadata for each new Trial. All of the associated Breeding Programs, Locations, and Accessions must already exist in the Database before you can upload the Trial.
Trial Properties
The following properties relate to the trial:
- trial_name (required) A unique name for the trial (cannot contain spaces). By convention, this is typically a concatenation of an experiment code, year, and location separated by underscores (ex: “USSRWWN_2008_Blacksburg”).
- breeding_program (required) The name of the Breeding Program that performed the trial. The Breeding Program must already exist in the database (go to the Manage > Breeding Programs page to view and/or add a Breeding Program). For cooperative nurseries, we have created Breeding Programs to contain cooperative nursery trials.
- location (required) The name of the location where the trial was held. The location must already exist in the database (go to the Manage > Locations page to view and/or add a location).
- year (required) The year the trial was held.
- design_type (required) The code for the trial design type. It can be one of:
- CRD Completely Randomized
- RCBD Complete Block
- RRC Resolvable Row-Column
- Doubly-Resolvable Row-Column Doubly-Resolvable Row-Column
- Augmented Row-Column Augmented Row-Column
- Alpha Alpha Lattice
- Lattice Lattice
- Augmented Augmented
- MAD Modified Augmented Design
- greenhouse Nursery/Greenhouse
- splitplot Split Plot
- p-rep Partially Replicated
- Westcott Westcott
- description (required) A description of the trial and any additional notes with relevant information about the trial
- trial_type The name of the trial type. It can be one of:
- Seedling Nursery
- phenotyping_trial
- Advanced Yield Trial
- Preliminary Yield Trial
- Uniform Yield Trial
- Variety Release Trial
- Clonal Evaluation
- genetic_gain_trial
- storage_trial
- heterosis_trial
- health_status_trial
- grafting_trial
- Screen House
- Seed Multiplication
- cross_block_trial
- Specialty Trial
- plot_width plot width (meters)
- plot_length plot length (meters)
- field_size field size (hectares)
- planting_date planting date (YYYY-MM-DD)
- harvest_date harvest date (YYYY-MM-DD)
Plot Properties
The following properties relate to a plot:
- plot_name (required) A unique name (cannot contain spaces and must be unique across the entire database) for the plot. By convention, this is typically a concatenation of the trial name and plot number separated by a dash (ex: “USSRWWN_2008_Blacksburg-PLOT1”)
- accession_name (required) The name of the Accession observed in the plot (must already exist in the database)
- plot_number (required) the plot number
- block_number (required) the block number
- is_a_control Use 1 for a control/check and 0 (or blank) for an experimental line
- rep_number The replicate number
- range_number The range number
- row_number The row number of the plot in the field
- col_number The column number of the plot in the field
- seedlot_name The name of the seedlot where the planted seed originated (must already exist in the database)
- num_seed_per_plot The number of seed transferred from the seedlot to the plot
- weight_gram_seed_per_plot The weight of seed (g) transferred from the seedlot to the plot
Upload your completed Multi-Trial Upload Template
Once you’ve filled out your Multi-Trial Upload Template, you can upload it to the Database:
- Go to the Manage > Field Trials page
- Click the Upload Existing Trials button near the top right corner of the page
- In Step 2, make sure the Multiple Trial Designs tab is selected
- Select your Excel file
- Click the Upload Trial Designs button
- The upload will automatically send your Accession names to the Synonym Search Tool to perform a search
- Look through any potential matches and select those that do match
- Review any selected replacements
- Continue to save the Trial(s) to the Database
2) Trait Observations
All of the Traits in T3/Breedbase are stored in a collaborative trait ontology (available via cropontology.org). This means that each Trait in the Database has an official name and identifier which is used to reference the Trait and is linked with a specified trait definition, method description, and unit/scale of measurement.
To upload your own trait observations, you’ll need to match your traits with those stored in the database. You’ll need to make sure the trait definition, method description, and unit/scale are all the same. If the only difference is the unit/scale, then we ask that you convert your values to match the unit/scale of the database trait - this reduces the number of similar traits and keeps the data more comparable across Breeding Programs.
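As an example of a unit conversion, the sketch below converts grain yield from bushels per acre to kg/ha. It assumes the standard US wheat bushel weight of 60 lb, which is not stated in this guide; check the weight your program actually uses. The function name is made up for this example.

```python
LB_TO_KG = 0.45359237        # kilograms per pound (exact by definition)
ACRE_TO_HA = 0.40468564224   # hectares per international acre (exact by definition)
WHEAT_LB_PER_BUSHEL = 60     # assumption: standard US wheat bushel weight


def bu_per_acre_to_kg_per_ha(value, lb_per_bushel=WHEAT_LB_PER_BUSHEL):
    """Convert a yield in bu/ac to kg/ha for a given bushel weight."""
    return value * lb_per_bushel * LB_TO_KG / ACRE_TO_HA


# One bushel per acre of wheat is roughly 67.25 kg/ha under these assumptions
print(round(bu_per_acre_to_kg_per_ha(1), 2))
print(round(bu_per_acre_to_kg_per_ha(50), 2))
```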
If you have a trait that does not have a corresponding match in the Database, let us know and we can work on adding it to the trait ontology.
Trait Search
To search the existing Traits
- Go to the Search > Traits page
- Enter a search term in the Trait Name or Definition fields
- Click the Search button
- Review the Trait definitions to see if any of the traits match
Make sure to check the SCALE term. If the scale is different, you will need to convert your values before uploading them to the Database.
When you find a matching trait, you will need to know its name and ID when you are ready to upload observations.
If you want to save your Traits in the Database, you can create a List of traits to use later.
- From the Trait Search page, select one or more Traits from the results table
- Below the results table, you can add the selected Traits to a new or existing List
- Later, you can use this List to generate the Observations Upload Template with the correct column headers filled in
Composed Traits
T3/Breedbase currently only allows one value to be stored for each combination of plot and trait. In order to save multiple values for a trait (such as for a timeseries), you’ll need to associate each value with a “Composed Trait” - which is a combination of a base trait and a time term.
For example, if you’re measuring Canopy Cover with multiple drone flights, Canopy Cover - UASHub - % will be the base trait and the time term will be the Julian Day (1-365: 1=Jan 1st, …, 365=Dec 31st) of the observation. This will give you a different trait for each day:
- May 1st = JD 121 = Canopy Cover - UASHub - %|day 121|COMP:0000064
- July 19th = JD 200 = Canopy Cover - UASHub - %|day 200|COMP:0000152
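The Julian Day can also be computed with Python’s standard library, as sketched below. Two caveats: the COMP identifier cannot be derived from the day number and still has to be looked up in the database, and in a leap year every date after February falls one day later (May 1st becomes day 122), so check how your dates map to the database’s day terms. The function name is made up for this example.

```python
from datetime import date


def julian_day(d: date) -> int:
    """Return the day of the year (1 = Jan 1st)."""
    return d.timetuple().tm_yday


base_trait = "Canopy Cover - UASHub - %"
for flight_date in (date(2023, 5, 1), date(2023, 7, 19)):
    day = julian_day(flight_date)
    # The trailing |COMP:... identifier must be looked up, it is not computed here
    print(f"{flight_date} -> {base_trait}|day {day}")
```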
Drone Trait Lookup
To make it easier to calculate the Julian Day for a Date and to lookup the matching Composed Trait Name and ID, you can use the Drone Trait Lookup tool:
- Go to the Search > Drone Traits page
- Select a base Trait
- Select a calendar Date
- The tool will calculate the Julian Day and find the matching Composed Trait
- Use the Trait Header value in your Observations Upload Template
Observation Upload Template
Once you have found the Database Traits that correspond to your Traits, you can create the Observation Upload Template
In order to associate the observations with specific Plots, you’ll use the unique Plot Name for each plot (these were created in the Multi-Trial Upload Template)
We’ll be using the simple phenotyping spreadsheet format. This format has one required column for the plot name followed by a column for each observed trait.
The column headers are:
- observationunit_name for the plot name. These are the names we gave each of the plots earlier in the trial upload template.
- trait_name|trait_id is the format used for the column header for each observed trait. For example, for wheat grain yield in kg/ha, the column header would be Grain yield - kg/ha|CO_321:0001218. This format can be repeated across multiple columns for multiple observed traits.
Each row in the Observation Upload Template corresponds to a single Plot and each column corresponds to a single Trait.
The Plots from more than one Trial can be included in the same Observation Upload Template.
The cells for missing / unobserved values should be left blank. If you’re exporting from another program, make sure it doesn’t fill in NA or . for missing values.
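If your export does contain such placeholders, they can be blanked out before the file is saved. The sketch below assumes the table has been read into a list of rows and treats only NA and . as missing markers; extend the set if your software uses others. The function name is made up for this example.

```python
# Placeholder values that should become empty cells
MISSING_MARKERS = {"NA", "."}


def blank_missing(rows):
    """Replace missing-value placeholders with empty strings."""
    return [
        ["" if str(cell).strip() in MISSING_MARKERS else cell for cell in row]
        for row in rows
    ]


rows = [
    ["AYT_2019_Ithaca-PLOT_101", "1984.58", "NA"],
    ["AYT_2019_Ithaca-PLOT_102", ".", 756.15],
]
print(blank_missing(rows))
```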
Example Observation Upload Template:
observationunit_name | Grain yield - kg/ha|CO_321:0001218 | Grain test weight - g/l|CO_321:0001210 | FHB incidence - %|CO_321:0001149 | FHB severity - %|CO_321:0001440 | FHB DON content - ppm|CO_321:0001154 |
AYT_2019_Ithaca-PLOT_101 | 1984.58 | 739.01 | 53.15 | 8.91 | 1.13 |
AYT_2019_Ithaca-PLOT_102 | 2044.19 | 756.15 | 35.59 | 8.08 | 1.75 |
AYT_2019_Ithaca-PLOT_103 | 2356.95 | 787.56 | 36.02 | 20.78 | 1.52 |
AYT_2019_Ithaca-PLOT_104 | 2061.00 | 752.54 | 69.92 | 20.64 | 2.25 |
AYT_2019_Ithaca-PLOT_105 | 1535.94 | 769.60 | 78.59 | 9.40 | 1.38 |
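The header row of a template like the one above can be checked before upload. This sketch only tests the shape of the headers; it does not confirm that a trait exists in the database (the Verify step does that). The identifier pattern is an assumption inferred from the examples in this guide (a prefix, a colon, then digits), and the function name is made up.

```python
import re

# Assumed shape: "<trait name>|<PREFIX>:<digits>", based on the examples in this guide
TRAIT_HEADER = re.compile(r"^(?P<name>.+)\|(?P<id>[A-Za-z0-9_]+:\d+)$")


def check_headers(headers):
    """Return a list of problems found in an observation template header row."""
    problems = []
    if not headers or headers[0] != "observationunit_name":
        problems.append("first column must be observationunit_name")
    for header in headers[1:]:
        if not TRAIT_HEADER.match(header):
            problems.append(f"not in trait_name|trait_id form: {header!r}")
    return problems


headers = [
    "observationunit_name",
    "Grain yield - kg/ha|CO_321:0001218",
    "Grain test weight",  # missing the trait ID
]
print(check_headers(headers))
```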
Once you have a filled-in Observation Upload Template, you can upload it to the Database:
- Go to the Manage > Phenotyping Results page
- Click the Upload Spreadsheet link near the top right corner of the page
- Make sure the Simple format is selected
- Select your Excel file
- Click the Verify button
- This will check your file to make sure all of the Plot and Trait IDs exist
- If the file verification succeeds, click the Store button to save the values to the Database
Day Three
Assignment
Follow the instructions from Day Two to upload your own Trials to the T3/Wheat Sandbox.
If you have any observation data at this point, you can add the observations to your Trials. If not, you can match up all of your traits to the Trait names and IDs in the database.
New Topics
1) Lists
A List is a collection of items of a single data type (such as Accessions, Trials, Traits, etc). They can be saved so you have easy access to them and they can be made public so others can use them.
Creating a List
There are a number of different ways to create a List and add items to a List
- List Manager:
- Click the Lists button in the Toolbar to open the List Manager
- Enter a Name and click the New List button at the top to create a new list
- The table will display all of the Lists associated with your account
- Click the name of the List to edit its contents
- Editing a List:
- set its type (Accessions, Traits, Trials, etc)
- validate the List contents (make sure all of the items exist in the database)
- add or remove items
- Make a list public by clicking the Share icon in the List table
- Search Wizard:
- Items can be added to a new or existing List directly from the Search Wizard
- Select a data type and items of that type in a column
- Below the column, choose to add the selected items to an existing List or enter a name for a new List
Be careful not to mix items of different data types (don’t add Traits to a List containing Accessions). The drop-down menu will include all Lists and not just those of the selected data type.
- Search Pages:
- Items can also be added to new or existing Lists from most of the search pages
- Search for or select displayed items in the results table and then look for the Copy Results to a List section below
Uses
Lists are used throughout the website whenever a collection of Items is needed, such as selecting data for download or using data in an analysis tool
2) Data Aggregation, Subsetting, & Download
- Search Wizard
The Search Wizard is the best way to aggregate, subset, and download data. Here you can select one or more data types, items from the data type, and then download the matching data.
The Related Genotype Data and Related Phenotype Data sections below the Search Wizard can be used to download the selected data.
- Download using Lists
The Manage > Download using Lists page can be used to download information about items of a single data type (i.e., Accession metadata and pedigree) or Trial Phenotype data using Lists of Accessions, Trials, and Traits.
3) PHG / Imputed Data
- Finding Data FTP site
About => FTP Data
https://files.triticeaetoolbox.org/
- Finding Data Wizard
Search => Wizard
https://wheat.triticeaetoolbox.org/breeders/search
Select Genotype protocol for column 1
Select Genotype project for column 2
Only GMS, 9K v2.1 and 90K v2.1 have been imputed
- JBrowse
Maps => JBrowse PHG
shows genes, number of haplotypes for each region
exome markers used to build PHG
- Limitations
Exome and Array data will give best results
GBS imputations will have lower accuracy
US Cultivars will give best results
Land Races will have lower accuracy
- PHG v2 created by Katie Jordan with 470 accessions
- Imputation Requirements
VCF file of genotype data
Check and correct Accession names if present in T3
Aligned to RefSeq v2.1 assembly
Chrom - 1A, … 7D
Pos, Ref, Alt - must match RefSeq v2.1
ID - unique name
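Some of these requirements can be checked locally before submitting a file. The sketch below reads the data lines of an uncompressed VCF and checks that the chromosome is one of 1A through 7D, that the position is a whole number, and that each ID is present and unique. It cannot confirm that Pos, Ref, and Alt match RefSeq v2.1, since that needs the reference itself. The function name and sample lines are made up for this example.

```python
# The 21 wheat chromosome names: 1A, 1B, 1D, ..., 7A, 7B, 7D
VALID_CHROMS = {f"{number}{genome}" for number in range(1, 8) for genome in "ABD"}


def check_vcf_lines(lines):
    """Return (line number, message) pairs for problems in VCF data lines."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            problems.append((number, "fewer than 5 tab-separated columns"))
            continue
        chrom, pos, marker_id = fields[0], fields[1], fields[2]
        if chrom not in VALID_CHROMS:
            problems.append((number, f"unexpected chromosome name {chrom!r}"))
        if not pos.isdigit():
            problems.append((number, f"position is not a whole number: {pos!r}"))
        if marker_id == "." or marker_id in seen_ids:
            problems.append((number, f"missing or duplicate ID {marker_id!r}"))
        seen_ids.add(marker_id)
    return problems


sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1A\t1000\tm1\tA\tG",
    "chr1A\t2000\tm2\tC\tT",  # chromosome name should be 1A
    "7D\t3000\tm1\tG\tA",     # ID m1 is already used
]
for line_number, message in check_vcf_lines(sample):
    print(line_number, message)
```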
Day Four
Assignment
For this assignment, you’ll be trying to find data in T3 that is related to Accessions in your Breeding Program.
First, find some Accessions from your Breeding Program that might have had data submitted to T3 (such as past phenotyping trials or data submitted to a genotyping lab).
Next, put your Accession names through the Synonym Search Tool to try to find matching Accessions in the database.
Then, create a List of the matching Accessions and use the List to download some related phenotype and/or genotype data.
New Topics
1) Android Field Book
- Features
Integrated with Breedbase
phenotyping
reads QR codes
captures photos
- Connecting with Breedbase
About => Field Book App
- Loading fields, trials, and traits
- Exporting to Breedbase
2) Seedlots
A Seedlot represents a single packet of collected seed
A Seedlot has the following properties:
- Unique Name
- Location
- Box Name
- Accession
- Contents
- Count or Weight
Transactions:
- Created each time seed is added or removed from the Seedlot
- Linked to Plots that use seed from the Seedlot
- Linked to other Seedlots when splitting / combining
Creating Seedlots
Using Seedlots
- A Seedlot can be referenced (by name) for each Plot in the Trial Upload template along with the amount/count of seed used in the Plot
- This will create a Transaction for the Seedlot - deducting the amount of seed used from the Seedlot’s total amount
- The transaction will track the Plots that use seed from the Seedlot and where the seed comes from for each Plot
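The bookkeeping described above can be pictured with a small Python model. This is only an illustration of the idea of Transactions, not how Breedbase stores Seedlots; the class, its fields, and the seedlot name are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Seedlot:
    """Toy model of a Seedlot with a running seed count and a transaction log."""
    name: str
    accession: str
    count: int
    transactions: list = field(default_factory=list)

    def use_in_plot(self, plot_name: str, num_seed: int) -> None:
        """Deduct seed used in a Plot and record which Plot received it."""
        self.count -= num_seed
        self.transactions.append((plot_name, -num_seed))


lot = Seedlot(name="SL-001", accession="Jerry", count=500)
lot.use_in_plot("AYT_2019_Ithaca-PLOT_101", 120)
lot.use_in_plot("AYT_2019_Ithaca-PLOT_102", 120)
print(lot.count)         # seed remaining in the Seedlot
print(lot.transactions)  # which Plots drew seed, and how much
```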
3) Barcodes
Barcodes can be generated for:
- Accessions
- Trials
- Plots
- Genotyping Plates
Features:
- 1D or 2D (QR Codes) supported
- Pre-defined templates
- Custom label designer
- Custom templates can be saved & shared
Use Cases:
- Barcoded Plots:
- Generate a unique barcode for each Plot in a Trial
- Use Android Field Book app to scan the barcode before recording observations
- Ensures the user is recording traits for the correct plot
- Seedlots:
- Generate a unique barcode for each Seedlot
- Use the Android Coordinate app to scan the barcode when collecting samples for Genotyping
- Ensures the correct Seedlot is recorded for each sample
- Any Other Text:
- You can create a list of any free-form text
- Use the list to generate barcodes for the items
- The items don’t need to already exist in the Database
- See video from Jessica Rutkoski
Custom Label Designer:
To open the Custom Label Designer, go to the Manage > Barcodes page and click the Design Custom Barcode button at the top right of the page.
- Choose Data Source
- Field Trial
- Genotyping Plate
- List (can be a List of any data type)
- Select Page Size & Label Dimensions
- Add Components to Label
- Download PDF of Labels
4) Submitting Genotype Data
- Requirements
- accessions
- reference genome
- file format
- protocol, project
- Loading through website
- check for accessions not in T3
- check chromosome name, ref/alt, and ID
- Define genotype project
- Define genotype protocol
- Select file