Quick Guide: Errata, New Versions and Reprocessing
Overview
Published data sometimes need a follow-up: the processing or calibration may have changed, or an error may have been spotted that needs correcting. Such adjustments to already published datasets are prepared and submitted as new versions, identified by incrementing the version number.
Errata
Errors are sometimes discovered in data that has already been published. These could include erroneous data (e.g. incorrect calibration, errors in the data analysis) or missing variables. In this case the data should be reprocessed and resubmitted to CEDA. Depending on the extent of the change, the version number should be incremented according to the guidelines below. Any change in version number should be recorded in the resulting NetCDF file.
New and improved data analysis
A new and/or improved data analysis technique may also have been implemented that was not available when the original dataset was published. Again, the data should be reprocessed and resubmitted to CEDA, and the version number should be incremented as indicated below and recorded in the NetCDF file attributes.
Versioning
NCAS data should follow an established versioning pattern. The NCAS Data Standards adopt the commonly used pattern:
m.n.p
Also known as:
Major.Minor.Patch
Where:
- Major (m) versions signify a breaking change from previous major versions. For example, a change in the underpinning data products or data standard being followed.
- Minor (n) versions denote non-breaking but important changes, e.g. a change in the processing chain/calibration that results in a substantive change to the data.
- Patch (p) versions capture small changes, such as correcting mistakes in global metadata fields, that do not substantively alter the data themselves or how they are understood.
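The incrementing rule can be sketched in a few lines of Python. This is only an illustration, not part of the NCAS tooling: the function name `bump_version` is hypothetical, the optional leading "v" is an assumption about how the version string may be written, and resetting the lower components to zero follows common Major.Minor.Patch practice rather than anything stated in this guide.

```python
import re

# Assumption: the version is written "m.n.p", optionally with a leading "v".
_VERSION_RE = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, level: str) -> str:
    """Return `version` incremented at the given level: major, minor or patch."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"not an m.n.p version string: {version!r}")

    prefix = match.group(1)
    major, minor, patch = (int(part) for part in match.group(2, 3, 4))

    if level == "major":
        # Breaking change: lower components restart at zero (assumed convention).
        major, minor, patch = major + 1, 0, 0
    elif level == "minor":
        # Substantive but non-breaking change to the data.
        minor, patch = minor + 1, 0
    elif level == "patch":
        # Small fix that leaves the data themselves unchanged.
        patch += 1
    else:
        raise ValueError(f"unknown level: {level!r}")

    return f"{prefix}{major}.{minor}.{patch}"
```

For example, `bump_version("1.2.3", "minor")` returns `"1.3.0"`, and `bump_version("1.2.3", "patch")` returns `"1.2.4"`.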
Is there a need to re-process data?
In some cases (e.g. updating the global attributes) it may be possible to use the previously processed data files and amend them accordingly.
Other cases (e.g. a new analysis technique) may require the raw data to be reprocessed using the updated data analysis software.
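As a rough aid, the distinction can be written down as a lookup. The sketch below is illustrative only: the category names are hypothetical labels for the examples given in this guide, and whether existing files can really be amended in place still has to be judged case by case.

```python
# Hypothetical labels for the kinds of change mentioned in this guide.
AMEND_EXISTING_FILES = {
    "global_attribute_fix",  # e.g. correcting a global metadata field
}
REPROCESS_RAW_DATA = {
    "incorrect_calibration",
    "analysis_error",
    "missing_variable",
    "new_analysis_technique",
}


def needs_reprocessing(change: str) -> bool:
    """Return True if the raw data should be reprocessed for this kind of change."""
    if change in REPROCESS_RAW_DATA:
        return True
    if change in AMEND_EXISTING_FILES:
        # The previously processed files may be usable after amendment.
        return False
    raise ValueError(f"unrecognised change type: {change!r}")
```

Raising an error for unrecognised labels, rather than guessing, keeps the decision with the person who knows the dataset.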
Once the data has been reprocessed, it can be resubmitted to the archive using the usual delivery tools. The version numbering should ensure that these files are archived into a new version folder for the deployment, ready for a new catalogue entry and publication. It is nonetheless always helpful to flag the resubmission to CEDA, so that they can look out for the new data, ensure the files are shepherded into the right part of the archive, and mark older datasets as superseded with links to the new dataset.