Step 8: Check your data

Step 08: Check your NCAS data

Who’s involved

Instrument scientist, NCAS data manager

Introduction

Interoperability only works if we all follow agreed ways of working. The Checksit tool has been built around the ‘NCAS Standards’. It gives you feedback while you develop your processing code, so that your output data meet these standards.

It is a community tool, so it can be amended and used in a wide range of ways. The code is open source: you can set it up locally, or use it on JASMIN, where it is already installed.

Because its checks draw on our defined lists of instruments, data products and locations/platforms, Checksit automatically checks that these elements are consistent across the data pipelines and the archive.
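The bash sketch below illustrates the kind of check this involves. It splits a filename into its parts and compares one part against a defined list. It is a hypothetical illustration, not Checksit's own code. It assumes an NCAS-style filename layout of <instrument>_<platform>_<date>_<data-product>[_...]_v<version>.nc. KNOWN_PLATFORMS is a made-up stand-in for the platform list that Checksit looks up in the CEDA catalogue, and check_platform is an invented name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a defined-list check on a filename.
# NOT Checksit's implementation. Assumed filename layout:
#   <instrument>_<platform>_<date>_<data-product>[_...]_v<version>.nc

# Made-up placeholder identifiers; the real platform list is held
# in the CEDA catalogue.
KNOWN_PLATFORMS="site-a site-b"

check_platform() {
  local name platform p
  # Drop any directory part and the .nc suffix.
  name=$(basename "$1" .nc)
  # The platform is assumed to be the second underscore-separated field.
  platform=$(printf '%s\n' "$name" | cut -d '_' -f 2)
  # Unquoted on purpose: split the list on spaces.
  for p in $KNOWN_PLATFORMS; do
    if [ "$platform" = "$p" ]; then
      echo "$1: platform '$platform' OK"
      return 0
    fi
  done
  echo "$1: '$platform' is not in the list of known platforms"
  return 1
}
```

For example, check_platform my-instrument_ashfarm_20230101_my-product_v1.0.nc (a hypothetical filename) reports that 'ashfarm' is not a known platform. This mirrors the Checksit error discussed in the workflow below.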

More details on the Checksit tool are linked under Further Details below.

Workflow

This is the suggested workflow to follow, using the NCAS General Data Standard:

  1. In Step 07 you should have identified how your NetCDF files should be formatted, or how to prepare your image or data plot files.
  2. Find the Checksit tool. You can either:
    1. use the checksit command already installed on JASMIN, or
    2. download it and run it yourself. Instructions are available here: https://checksit.readthedocs.io/en/latest/
  3. Run checksit. If you are using JASMIN, you can try this command on a sample file:
/apps/jasmin/community/checksit/checksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
    1. If checksit ran correctly, it may show some errors or warnings. These messages help diagnose where the NetCDF file does not comply with the NCAS Standard.
    2. Now run checksit on your own files by replacing the filename in the command with the path to your NetCDF file.
    3. The output may show errors or warnings, e.g.
[file name]: Invalid file name format - 'ashfarm' is not a valid platform in the CEDA catalogue
      1. Here Checksit has looked at the filename and found that 'ashfarm' is not a platform that exists in the CEDA catalogue. Possible reasons are:
        1. 'ashfarm' needs to be added to the list of valid platforms registered in the CEDA catalogue. In this case, contact the NCAS data manager.
        2. An appropriate platform record already exists in the CEDA catalogue, but its identifier differs from the one you used, or a new identifier needs adding. In this case, contact the NCAS data manager. (The site may be known by an alternative name, and the identifier for that name should be used instead.)
      2. Use the error and warning messages to adjust your NetCDF files until the errors are resolved.
      3. There may be mistakes in the data standard itself or in the checks against it. If you suspect this, please raise the issue with the NCAS data manager.
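Once your processing code is producing files, you may want to run checksit over a whole directory at once. The bash sketch below loops over every .nc file in one directory and runs checksit check on each file. It uses only the command form from step 3 of the workflow. The CHECKSIT variable and the check_all function are assumptions of this sketch. The default path is the JASMIN community install used in the sample command. The sketch makes no assumption about checksit's exit codes: any non-zero status is simply reported, and the loop carries on.

```bash
#!/usr/bin/env bash
# Sketch: run checksit on every NetCDF file in one directory.
# CHECKSIT can be overridden (e.g. for a local install); the default
# is the JASMIN community install path used in step 3.
CHECKSIT="${CHECKSIT:-/apps/jasmin/community/checksit/checksit}"

check_all() {
  local dir="$1" f
  for f in "$dir"/*.nc; do
    # If no .nc files match, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # Header so each report can be matched to its file.
    echo "=== $f"
    # Report a non-zero exit status but keep checking the other files.
    "$CHECKSIT" check "$f" || echo "checksit exited with status $? for $f"
  done
}
```

After sourcing the file (or pasting the function into your shell), run check_all /path/to/your/netcdf/files | less to page through the reports. The directory path is a placeholder.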

Future Plan

The Checksit tool will be refined further to make its output easier to understand and use. It will also continue to grow, covering a wider range of standards and being adopted more widely for other data types.

Further Details

JASMIN documentation on the community software instance of Checksit: https://help.jasmin.ac.uk/docs/software-on-jasmin/community-software-checksit/

Checksit Read the Docs pages, covering a more in-depth user guide and instructions on how to install Checksit and develop new checks for the Checksit suite: https://checksit.readthedocs.io/en/latest/

