AgYields – a national database for collation of past, present and future pasture and crop yield data

The New Zealand agricultural sector has a rich heritage of measuring yield and growth rates for pastures and crops. These data are expensive to collect, spatially and temporally patchy, and stored in a range of electronic and physical platforms. A challenge for data collection and storage is the different priorities and skill sets of those undertaking the task. Thus, there is a need to provide guidelines for the collection, collation and publication of such data to standardize best practice and maximize the value gained from the increasingly scarce resources available for pasture and crop research to support the primary industries. In addition, declining funding for field research means there is an urgent need to draw together existing and future data into a publicly accessible industry-good resource. This paper outlines the development of the AgYields web-based repository for pasture and crop growth rate and yield data. It describes the rationale for the database and the need for standardization of data collection to maximize the value of stored data in common formats. The intent is to provide a resource to enhance livestock and crop production systems throughout New Zealand and provide guidelines for future data collection.


Introduction
There is a paradigm shift in research to make data easily accessible and FAIR, which requires information to be findable, accessible, interoperable and reusable (Wilkinson et al., 2016). A database is a collection of organized data which must be easily accessed, managed and updated. Computer databases usually contain aggregations of data records or files, containing information about different activities, publications, measurements or interactions with specific consumers, purchasers, clients, authors and locations. A database is typically used for academic and research purposes in all study fields. However, the end-users of a database may be farmers, students and businesses. Consequently, any database needs to be designed to fit multiple purposes.
Knowing what those purposes might be requires a degree of imagination, design and analysis. Regardless of future uses, it is likely that researchers will continue to investigate pasture and crop growth and development (phenology). These measurements have a long history of investigation by botanists, agronomists and ecologists, with particular interest in the interactions between plant biology and the environment (de Reaumer, 1735; cited in Chuine et al., 2003). The advent of computing power through the 1980s enabled the integration of crop and pasture physiology knowledge into models for individual plant species, particularly those with economic relevance such as wheat (Jamieson et al., 1995) and maize (Ritchie and Alagarswamy 2003). Expansion of this approach has led to interchangeable modelling platforms for different crops, e.g., APSIMX (Holzworth et al., 2014; Brown et al., 2018) and animal production systems (e.g., DairyMod; Johnson et al., 2003).
More recently, FAOSTAT (www.fao.org/faostat/en/#data) aimed to build digital databases to summarize information on crop yields. This is the most comprehensive statistical database on food, agriculture, fisheries, forestry, natural resources management and nutrition (FAO 2021). It provides data about area harvested, yield and quantity produced of main crops by country or region from 1961 to 2019. FAOSTAT provides a useful reference for 173 primary products with an emphasis on crops defined as fibre crops, cereals, coarse grain, citrus fruit, fruits, jute-like fibres, oilcakes equivalent, oil crops, pulses, roots and tubers, tree nuts and vegetables and melons. It contains less information about pastures and forages which are direct grazed, or harvested for hay/silage, or green feed crops which are particularly important in New Zealand's pastoral-based agriculture.

New Zealand pasture and crop data
In New Zealand, species yields, growth rates and other traits, such as flowering time, have long been recorded. However, the data remain scattered over many sources, including peer reviewed journals, monographs, technical reports and on-farm field notes. These data may reside on many different platforms, and have been collected and stored in many different ways. Definitions and measurements are highly variable despite well-documented guidelines for standardization (Hodgson 1979; Hodgson and British Grassland Society 1981; Allen et al., 2011). Data can be retrieved from different databases. However, these are often restricted to single institutions and cover a limited number of species. The AgYields database aims to provide a repository for all pasture and crop yield data collected in New Zealand.
For cropping, the New Zealand Foundation for Arable Research (FAR), established in 1995, has a research focus on crop species such as wheat, barley, oats, maize, pulses, herbage seeds, brassicas, forage, vegetable seeds and cereal silage. The common integration of cropping and pastoral farming in New Zealand means a central repository for both plant types is a desirable outcome for the benefit of the primary industries.

AgYields; an overview
The AgYields National Database (https://www.agyields.co.nz/home) pools historic data, and information from current and future studies, to provide as much data as possible for all agricultural regions of New Zealand. AgYields is a relational database that links plant species and mixed species sward data to reference information about data source, location (region and sites), soil type, basic management practices (e.g., irrigated vs. rainfed) and the dominant species at the site as base information. Once formatted for a given study, specific treatments (e.g., fertilizer levels, species/cultivars, sowing date) from that study can be included. The compulsory data required are from successive measurements (to enable a time interval to be calculated) of either (i) yield (biomass and grain, i.e., kg DM/ha) or (ii) growth rate (kg DM/ha/day), and there is an option to include (iii) flowering date.
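The link between successive yield measurements and growth rate can be illustrated with a minimal sketch. It assumes each record is a cumulative yield (kg DM/ha) on a given date, so the mean growth rate is the yield difference divided by the number of days in the interval; for regrowth harvested to a residual, the harvested yield itself would be divided by the interval instead. The function name and the data values are hypothetical and are not taken from AgYields.

```python
from datetime import date


def growth_rates(measurements):
    """Mean growth rate (kg DM/ha/day) between successive cumulative yields.

    `measurements` is a list of (date, cumulative_yield_kg_dm_ha) tuples.
    Returns a list of (start_date, end_date, rate) tuples.
    """
    ordered = sorted(measurements)  # sort by measurement date
    rates = []
    for (d0, y0), (d1, y1) in zip(ordered, ordered[1:]):
        days = (d1 - d0).days
        if days <= 0:
            raise ValueError("successive measurements need distinct dates")
        rates.append((d0, d1, (y1 - y0) / days))
    return rates


# Hypothetical measurements, for illustration only.
records = [
    (date(2020, 9, 1), 0.0),
    (date(2020, 10, 1), 1200.0),
    (date(2020, 10, 31), 3000.0),
]

for start, end, rate in growth_rates(records):
    print(f"{start} to {end}: {rate:.1f} kg DM/ha/day")
```

With these illustrative values both intervals are 30 days long, giving mean growth rates of 40 and 60 kg DM/ha/day.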
The general objective of the AgYields National Database is to provide a tool for farmers, rural professionals and researchers to identify the most suitable pastures and crops for different districts and thereby help develop more resilient pasture and crop systems. Several reasons have been identified for the formation of the database: 1) The emergence of models which require these fundamental data, for wider regional locations, which are being applied to land use evaluation and planning such as AgInform (Rendel et al., 2020) and APSIM (Vogeler et al., 2016). 2) Emerging challenges to crop and pasture productivity and persistence (e.g., climate change; environmental regulations) which will require data on a range of species. 3) Need for local data for different species to inform feed budgeting programmes for individual farms within environmental boundaries. 4) Ongoing loss of legacy knowledge with regard to where and when such data have been collected historically. 5) Cost and risk of archiving paper-based material such as old reports and associated data.

Structure of the AgYields National Database
Journal of New Zealand Grasslands 83: 15-24 (2021)
AgYields is based on a prototype 'National Forage Database (NFD)' developed by DairyNZ for compiling dry matter yields of pasture and forage species up to 2019. The data entry framework from the NFD has been retained and expanded in the AgYields database. AgYields has been programmed to be accessible online (Figure 1) and does not require additional software installation. The geographical range covered by AgYields encompasses all New Zealand territory between latitudes -32.000 and -47.9999 and longitudes 165.0000 and 179.9999. One key factor in recording spatial (region, site, latitude, longitude, altitude) and temporal (sowing, measurement dates) data is the ability to link the yield and flowering data with meteorological information to generate mathematical relationships, such as thermal time requirements for growth and development (Moot et al., 2003) or water use (Brown 2004). The regions designated for data collection have been aligned with those used in the Pasture Growth Forecaster that integrates with the FARMAX® farm modelling platform (Beef + Lamb New Zealand and Farmax 2012).
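As a hedged illustration of why the spatial and temporal fields matter, the sketch below checks that a site falls inside the geographical extent quoted above and accumulates thermal time from daily mean temperatures. The thermal time function is the simplest linear form (degree-days above a base temperature); published models often use more elaborate functions. The function names, the base temperature, the coordinates and the temperatures are all assumptions made for this example.

```python
# Extent quoted in the text (decimal degrees).
LAT_MIN, LAT_MAX = -47.9999, -32.0
LON_MIN, LON_MAX = 165.0, 179.9999


def within_extent(lat, lon):
    """Return True if the coordinates fall inside the quoted extent."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def thermal_time(daily_mean_temps, t_base=0.0):
    """Accumulated thermal time (degree-days) above a base temperature.

    Simple linear model: each day contributes max(0, T_mean - t_base).
    The default t_base is a placeholder, not a recommended value.
    """
    return sum(max(0.0, t - t_base) for t in daily_mean_temps)


# Approximate, illustrative coordinates.
print(within_extent(-43.64, 172.47))   # a Canterbury site
print(within_extent(-33.87, 151.21))   # a site outside New Zealand

# Hypothetical daily mean temperatures (degrees C) with a base of 5 degrees C.
print(thermal_time([10.0, 12.0, 3.0, 8.0], t_base=5.0))
```

Days with a mean temperature below the base contribute nothing, so the four hypothetical days accumulate 5 + 7 + 0 + 3 = 15 degree-days.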

Data collation methods
AgYields datasets can be drawn from published (i.e., peer reviewed journals) or unpublished (theses, raw datasets, technical reports, farmers' records) sources. It is expected that some sources of unpublished data will not have the same rigour as fully replicated experiments. However, it is believed there is value in gathering all possible data to build a complete picture of pasture and crop growth throughout New Zealand. On-farm data may become more common as farmers are required to collect and report more information for regulatory compliance purposes. To date, the oldest dataset extracted and collated dates back to the 1960s (Vartha 1973), but it is expected that older datasets will become available.
The process of data input involves several distinct steps:
Step 1: Collation of historic published papers and datasets into a common repository. An initial screening of published papers (up to 2019) was performed by DairyNZ with an emphasis on papers reporting seasonal and total annual DM yield of grazed ryegrass and white clover pastures in dairy regions. These data were entered into their NFD which provided the template for AgYields (Figure 1). Restriction to New Zealand publications was a pragmatic decision necessitated by limited time availability of staff. Data from classic publications that describe seasonal pasture distributions across the country (e.g., Radcliffe and Cossens 1974) were included. Inevitably, some papers published in international journals, such as Grass and Forage Science, will have been missed during this exercise, but modern on-line search programmes will help rectify this limitation.
Step 2: Input of existing unpublished (not peer reviewed) datasets. Unpublished datasets include technical reports, monographs, dissertations and data sheets which have not been peer reviewed. Commercial trials by fertilizer or seed companies would be included in this category. The number and quality of these datasets are variable. The risk of losing historical datasets over time is substantial, particularly for records before 2000 when most notes were handwritten and kept in personal hard copy files. Libraries and some research institutions have been scanning and saving a digital copy of these archives. However, there are still field notes and records which have been lost or are at risk of vanishing (Hawke, personal communication). Another challenge is the cost associated with archiving hard copies of reports, booklets, notes and maps. Anecdotally, a change to open plan offices and retirements has led to the loss of these data, as down-sizing reduces storage space. An objective of AgYields is to provide a repository to compile, digitise, collate, and save as many field notes and historical pasture records as possible.
Step 3: Collection of current and future datasets. Further development requires current and future datasets to be included, particularly those from publicly funded research programmes, such as from students at Lincoln and Massey University, and industry research, such as the Hill Country Futures research programme. A systematic approach is required to input these datasets because field notes and raw data sheets are not standardised and differ according to the user. Dry matter yield entries from published and unpublished sources have been deposited during 2020/21 (Figure 2).
Step 4: The development of a software package for data capture, storage and retrieval.
In 2021 the development of the relational database (AgYields) enabled experimental details to be stored separately from the datapoint value. This minimized the number of fields (or columns) that need to be inserted in data entry mode. The software development was funded by the T.R. Ellett Agricultural Research Trust and Lincoln University is hosting the database to provide the public access platform. The AgYields database is available at https://www.agyields.co.nz/home. Users must register to add data. This ensures security of data and data quality control. Once the user is registered and logged in, the dashboard page will be displayed showing a list of all available datasets (Figure 3a). From this page, users can insert a dataset (input) or query (output) all datasets. Users can only delete their own datasets but have access to all submitted datasets as part of their download. It is important to ensure that each entry in the database has a full reference to its original source, whether a published book, article, database or on-farm measurement. Data are referenced according to the site, including georeferenced (if known) information and site characteristics (Table 1). When specific site characteristics are missing these can be derived from an indicator (i.e., geographical references) or other sources (i.e., S-map; Lilburne et al., 2004; Manaaki Whenua - Landcare Research 2019).

Figure 2
Number of dry matter (DM) yield entries into the AgYields database from January 2020 to June 2021 (Draft datasets).
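The idea of storing experimental details separately from each datapoint value can be sketched with a small relational schema. The example below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names and inserted values are hypothetical and are not the actual AgYields schema; the sketch only shows how reference and site details are entered once and then joined back to each datapoint on retrieval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: reference and site details are stored once and
# linked to datapoints through an experiment record.
conn.executescript("""
CREATE TABLE reference (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT NOT NULL,
    pub_year INTEGER
);
CREATE TABLE site (
    id INTEGER PRIMARY KEY,
    region TEXT,
    site_name TEXT,
    latitude REAL,
    longitude REAL
);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    reference_id INTEGER NOT NULL REFERENCES reference(id),
    site_id INTEGER NOT NULL REFERENCES site(id),
    name TEXT
);
CREATE TABLE datapoint (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    species TEXT,
    measured_on TEXT,
    yield_kg_dm_ha REAL
);
""")

# Illustrative records only.
ref_id = conn.execute(
    "INSERT INTO reference (title, authors, pub_year) VALUES (?, ?, ?)",
    ("Hypothetical lucerne trial", "A. Author", 2003),
).lastrowid
site_id = conn.execute(
    "INSERT INTO site (region, site_name, latitude, longitude) VALUES (?, ?, ?, ?)",
    ("Canterbury", "Example paddock", -43.64, 172.47),
).lastrowid
exp_id = conn.execute(
    "INSERT INTO experiment (reference_id, site_id, name) VALUES (?, ?, ?)",
    (ref_id, site_id, "Rainfed lucerne"),
).lastrowid
conn.executemany(
    "INSERT INTO datapoint (experiment_id, species, measured_on, yield_kg_dm_ha) "
    "VALUES (?, ?, ?, ?)",
    [(exp_id, "Lucerne", "2002-10-01", 1200.0),
     (exp_id, "Lucerne", "2002-10-31", 3000.0)],
)

# Each datapoint row is joined back to its shared reference and site details.
rows = conn.execute("""
SELECT r.title, s.region, d.species, d.measured_on, d.yield_kg_dm_ha
FROM datapoint AS d
JOIN experiment AS e ON e.id = d.experiment_id
JOIN site AS s ON s.id = e.site_id
JOIN reference AS r ON r.id = e.reference_id
ORDER BY d.measured_on
""").fetchall()

for row in rows:
    print(row)
```

Because the reference and site rows are entered once, adding further datapoints to the same experiment only requires the values that change between measurements.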

Data retrieval
Users need a web browser and username to query the database (Figure 1). There are many options to retrieve data, for example according to the region and/or species of interest using the buttons and commands shown in the menu bars (Figure 3a, b) or filter options in each of the columns displayed (Figure 3c). These tools enable users to select their datasets of interest. Datasets with 'Submitted' status indicate the data entry procedure is complete and checked by the submitter. Datasets marked 'Draft' have not been verified or completed by the submitter. Registered users may query the database and download results by clicking the download icon at the right side of each dataset. For example, to obtain all information from datasets published in 2003, filter by publication year (Pub Year; Figure 3). The output consists of tabular arrays stored in spreadsheets as 'comma-separated value' (CSV) files (Allan et al., 2012) which are saved directly to the user's PC download folder. Currently, the CSV file generates a data table which contains 47 columns (Table 1). In addition to the 'real' data (yield, growth rate and flowering time) the CSV holds the metadata. Metadata are 'data about the data', and refer to extra information that, while not strictly part of the recorded data, defines how the data were documented. Metadata are necessary to work with the associated data. For example, these include the yield and flowering date units, soil type, irrigation or fertilizer treatment, species and the source of the data.

Figure 3
View of (a) the dashboard and all datasets; (b) clicking on the Species field enables the user to search for or select a particular species; (c) hovering and clicking over the filter icon enables a search for a particular publication year (Pub Year).
The downloaded CSV files are named according to the Title (Row 2, Table 1). To open and visualise the CSV files requires standard programmes, like Microsoft Excel, Notepad (as a txt file), using the Command Prompt app in Windows (de Bruin 2020) or another programming package, such as Python™ or R©. Users can then analyse, export and graph using Excel or other software such as Sigmaplot©, Veusz (https://veusz.github.io/) or R©. As the AgYields work progresses an option to bulk export will be available.
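As a minimal sketch of working with a downloaded file in Python, the example below reads CSV text with the standard csv module and selects the rows for one publication year. The column headers used here ('Title', 'Pub Year', 'Species', 'Yield (kg DM/ha)') and the data rows are assumptions for illustration; the real headers are those listed in Table 1 and should be substituted.

```python
import csv
import io


def read_agyields_csv(handle):
    """Read an open CSV file handle into a list of dicts keyed by header."""
    return list(csv.DictReader(handle))


def filter_by_year(rows, year, column="Pub Year"):
    """Keep rows whose publication year column matches `year`."""
    return [row for row in rows if (row.get(column) or "").strip() == str(year)]


# Hypothetical stand-in for a downloaded file; headers are assumed.
SAMPLE = """Title,Pub Year,Species,Yield (kg DM/ha)
Hypothetical trial A,2003,Lucerne,12000
Hypothetical trial B,2010,Perennial ryegrass,9500
Hypothetical trial C,2003,White clover,4000
"""

rows = read_agyields_csv(io.StringIO(SAMPLE))
selected = filter_by_year(rows, 2003)
total = sum(float(row["Yield (kg DM/ha)"]) for row in selected)

for row in selected:
    print(row["Title"], row["Species"], row["Yield (kg DM/ha)"])
print("Total:", total)
```

For a real download, the in-memory sample would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV in the download folder.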

Data entry mode
Users can insert a dataset into AgYields using the yellow 'Create New Dataset' button located at the top right of the dashboard screen. The next screen is designed to create a dataset, which starts with the entry of the Study Reference Details. A message instructs the user to: '…complete the study reference details below with as much detail as possible. Title, Author(s) and Publication Year are mandatory fields for published datasets.' These refer to output Columns 1-10 (Rows 1-10 in Table 1). In this screen users can indicate to other people whether they hold additional data such as meteorological (Met) files, photos, soil water data and/or raw data (Columns 44-47 in the CSV file, Table 1). After the reference details are entered, clicking on the 'Create now' button at the bottom left of the screen moves to the second page where classification information is entered.
The site and experiment screen refers to site description (Region, Location Name, Site Name, Latitude, Longitude, Altitude). In this step, site basal attributes can be entered. These apply to the whole experiment and not the treatment information, e.g., species, if it is a fertilizer experiment across a single species. However, if different species are compared, then that is entered in the experimental details. The same logic applies for fertilizer and irrigation. This section refers to any fertiliser or irrigation applied across the experimental area (i.e., not treatments). Within the 'Site' page, users can create one or multiple experiments according to the original dataset. In most cases, a single experiment would be reported, but at times more than one experiment may have been conducted at this site. Users can customise the data entry grid at the site-treatment level. For example, an experiment that compares different nitrogen and irrigation levels can be tailored to contain only the key columns to simplify the data entry. The common site and reference details are related and will appear at the top part of the screen (Figure 4) without the need to be typed for each data value.

Figure 4
Example of (a) the AgYields National Database data entry mode - Site(s) and Experiment(s) page with a created grid for data entry.
Once the experimental grid is created, it is possible to type the values into cells, to edit or to work in conjunction with an Excel spreadsheet to modify, copy and paste information into cells directly. The user can save the data entry during the process and come back to edit and finish data entry later.
There is the option to add another set of data within the same Site by clicking the 'Add new experiment' button in the left menu (Figure 5). A screen will then pop up where the user can specify the experiment name and measurement details. Once the fields are filled in, the 'Save' button will generate a new grid for data entry. To cancel the procedure simply press the 'Cancel' button.
On the top right-hand side there are buttons for several commands. For example, the datasets are initially saved with a 'Draft' status (Figure 3 and Figure 6) which allows the user or the AgYields team to check the information entered prior to final submission, when the dataset status is converted to 'Submitted'.
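The status and ownership rules described in the text (datasets start as 'Draft', become 'Submitted' once checked, and can only be deleted by their owner) can be sketched as a small class. This is an illustrative model only, not the AgYields implementation; the class, method and user names are hypothetical, and the rule that only the owner may submit is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset record with a status and an owner."""
    title: str
    owner: str
    status: str = "Draft"  # every new dataset starts as a draft

    def submit(self, user):
        """Mark the dataset as checked and complete (assumed owner-only)."""
        if user != self.owner:
            raise PermissionError("only the submitter can submit this dataset")
        self.status = "Submitted"

    def can_delete(self, user):
        """Users can only delete their own datasets."""
        return user == self.owner


dataset = Dataset(title="Hypothetical lucerne trial", owner="alice")
print(dataset.status)

dataset.submit("alice")
print(dataset.status)
print(dataset.can_delete("bob"))
```

Keeping the status as an explicit field mirrors the way the dashboard distinguishes 'Draft' from 'Submitted' datasets when users filter or download data.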

Discussion
The creation of an industry-appropriate agricultural database requires the co-operation of researchers, funders, and users to maximize its value. The AgYields database has been created to provide a repository for the basic yield data that underpin many functions on farm and for research purposes. These data have relevance for agricultural students for project work, many of whom will become farm consultants or researchers in the future. The opportunity exists to expand the format of data downloaded to make it easier to create figures of pasture and crop production in different regions of New Zealand. As researchers develop more powerful computing tools, the opportunity exists to use the data to compare past, present and future pasture production across a range of sites in New Zealand.
During development discussions, the decision was made that both published and unpublished data should be supported. This was because there are many datasets that are not formally published due to time and funding pressures. These are valuable to the agricultural community to examine what has been done previously. The deposition of a dataset into the AgYields database could be a mandatory step in the final milestone for research that has been publicly funded. The inclusion of a Digital Object Identifier for individual datasets will encourage those who do not have resources to publish an academic article (but have time to add data to a database) to share information and be recognised. This sharing practice is becoming increasingly popular (White and van Evert 2008; da Cruz and do Nascimento 2019). This step is less onerous than the commitment to publish and suits data that may not be fully replicated or as scientifically robust as full experiments. This has value in generating a picture of the location and type of research that is occurring throughout New Zealand. Source information enables the user to assess the merit of the data for their intended purpose. It is believed that, with adequate resources to collect data in a standardized way, the rural community is likely to contribute datasets throughout New Zealand, hence providing an opportunity for the agribusiness and rural community to determine gaps in the current knowledge base at a localised level.

Figure 5
Example of the AgYields National Database data entry mode - Adding a New Experiment page.

Figure 6
Detail of the upper right menu of the AgYields National Database data entry mode - Help Guide, Go to Dashboard, Save and Submit.
To enable a wider community to submit data requires support documentation, such as videos of how to collect robust and standardised data. There is a need to revisit the implications of the different methods used to generate crop and pasture growth rate data. Technically, the collection of pasture data has been described many times (Hodgson 1979; Hodgson and British Grassland Society 1981; Cayley and Bird 1996; Allen et al., 2011), but these practices may not always be followed. Currently, AgYields can capture different methods of harvest (Table 1) such as quadrat or cage cuts, RPM, machine harvest, pasture probe, or C-Dax. Pasture yields may be cut to ground level or to a residual height. This will affect data comparison, and a way to distinguish different methods of data collection may be necessary in future. It is believed that the creation of the AgYields database provides a first step in drawing together the data required to drive the next century of crop and pasture research and innovation. Collectively, data will continue to provide the evidence base that underpins the success of the New Zealand primary industries.
Farmers, students, consultants, regional planners, researchers, and Government agencies will benefit from the availability of pasture and crop data in a centralized location and format. This database is publicly accessible for the benefit of the NZ pastoral and arable industries. As with all databases, users will determine required improvements and the value of the data to the community. Potential future developments in AgYields could be the inclusion of plant traits (e.g., nutrient content, metabolizable energy, digestibility).

Conclusions
A single database now exists as a repository of pasture and crop data to enhance current and future research and agribusiness requirements. The application and utility of the database will be determined by the willingness of private and public industry organisations to share data for the benefit of the primary sector. The AgYields National Database provides an opportunity for centralization of resources and ultimately increased efficiency through reduced duplication of effort in collecting valuable pasture and crop datasets. Its value will be increased when data collection methods and reporting are standardized.