Quick Guide F: Directory and File Naming
What are the Directory and File naming conventions for NCAS Data?
To make data pipelines as automated and consistent as possible, a series of conventions has been developed for use at various points along the pipeline. Some aspects use identifiers already established during the instrument registration process; others are determined by the particular operation or processing needed.
When to Use them
These should be used for all data going through the NCAS Observations Group Work Space, including the final step of sending the data to the CEDA Archive. The CEDA Archive will then use the information provided to place the data automatically in an appropriate location in the archive, supporting data publication and connections to other relevant data (for example from the same instrument or project).
How to use them
Within the NCAS Obs GWS directories.
The overall directory structure for a registered instrument will have been set up as part of the registration process. However, for processed data being prepared for delivery to the archive, a key directory name to establish is that of the ‘deployment’ directory. This, along with versioning, allows each distinct dataset for an instrument to be identified. An instrument’s deployment signifies the installation of the instrument for a given purpose, be that long-term measurements or a specific campaign. Deployment directory names should take the form:
<startdate>_<deployment-type>
Where:
Startdate is the first date on which the instrument is on the deployment (this may be before the first data timestamp in the final data)
Deployment type captures the nature of the instrument’s deployment. This will depend on the instrument’s mode of operation:
- Long-term instruments (i.e. 24/7 deployed instruments providing (near) continual measurements) should use:
longterm
(no spaces etc)
- Campaign deployed instruments should use a suitable shortname for the campaign (e.g. namblex, csip-pilot). These should coincide with ones used on a CEDA Project record. View all Platform records to see which already exist and use the lower-case abbreviation for the project. If one does not exist, contact CEDA to discuss this further as other project details may need to be captured.
- Alternating mode instruments (i.e. those that switch between campaign and long-term measurements) should use the campaign name as above or, when returning to long-term measurement operations, the location (e.g. ‘cardington’, or an NCAS observatory’s shortname, e.g. cao, wao, cvao etc.).
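The deployment directory naming rule above can be sketched in Python as follows. This is a minimal illustration, not an official NCAS or CEDA tool: the function name `deployment_dir_name` is hypothetical, the start date is assumed to be written as YYYYMMDD (as in the archive path examples later in this guide), and the allowed character pattern is an assumption based on the examples longterm, namblex and csip-pilot.

```python
import re
from datetime import date

# Assumption: deployment types are lowercase letters and digits, with
# hyphens between words, as in "longterm", "namblex" and "csip-pilot".
_DEPLOYMENT_TYPE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def deployment_dir_name(start: date, deployment_type: str) -> str:
    """Return '<startdate>_<deployment-type>' for a deployment directory."""
    if not _DEPLOYMENT_TYPE.fullmatch(deployment_type):
        raise ValueError(f"unexpected deployment type: {deployment_type!r}")
    # Assumption: the start date is written as YYYYMMDD.
    return f"{start:%Y%m%d}_{deployment_type}"


print(deployment_dir_name(date(2020, 9, 26), "longterm"))  # 20200926_longterm
print(deployment_dir_name(date(2002, 7, 4), "namblex"))    # 20020704_namblex
```

Rejecting names with spaces or upper-case letters early keeps mistakes out of the directory tree, but the check does not confirm that a campaign shortname actually exists in the CEDA catalogue.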
Version Directory
<version>[_<release-type>]
Where:
- version - follows the major.minor.patch structure as needed. These parts have the following meanings:
  - major breaks with the previous version (e.g. a different NCAS standard version is used which has a different data structure)
  - minor adds additional items to an existing version (e.g. additional parameters, but overall the same NCAS standard version structure is being followed)
  - patch is for errata (same file contents, but errata need to be issued compared to the previous release)
- release-type - an optional extra field, used where needed to convey additional release meaning. This should be one of:
  - initial - an initial release of the data that should be treated with caution
  - verified - a fully QC-ed version of the data following on from an initial release that has been made for earlier access (e.g. yearly release cycle)
Examples:
V2.0.0 - breaking change from v1
V2.1.0 - minor change, new parameters added in
V2.1.1 - correction for errors in previous release
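As a rough sketch of the version directory rule, the Python below builds a `<version>[_<release-type>]` name and reports which part of the version changed between two releases. The function names are hypothetical. Two assumptions are made because the examples in this guide vary: an optional leading `v` or `V` is accepted, and missing minor or patch parts are treated as 0.

```python
import re
from typing import Optional, Tuple

RELEASE_TYPES = ("initial", "verified")

# Assumption: optional v/V prefix, then one to three dot-separated integers.
_VERSION = re.compile(r"[vV]?(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a version string into (major, minor, patch) integers."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(p) if p is not None else 0 for p in match.groups())
    return major, minor, patch


def version_dir_name(version: str, release_type: Optional[str] = None) -> str:
    """Return '<version>[_<release-type>]', keeping the version text as given."""
    parse_version(version)  # validation only
    if release_type is None:
        return version
    if release_type not in RELEASE_TYPES:
        raise ValueError(f"release type must be one of {RELEASE_TYPES}")
    return f"{version}_{release_type}"


def change_kind(previous: str, current: str) -> str:
    """Name the most significant version part that differs."""
    old, new = parse_version(previous), parse_version(current)
    for name, a, b in zip(("major", "minor", "patch"), old, new):
        if a != b:
            return name
    return "none"


print(version_dir_name("v2.1.0", "initial"))  # v2.1.0_initial
print(change_kind("V2.0.0", "V2.1.0"))        # minor
```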
File Naming Convention:
Filenames will have the structure:
<instrument_name>_<platform_name>_<YYYY>[<MM><DD>-<HH><mm><SS>]_<data_product>_<option1>_<option2>_<option3>_v<version>.nc
Components:
The components are split by underscores (‘_’) whilst dashes (‘-’) replace spaces within components. These should be lowercase where possible and use characters from: a-zA-Z0-9.
instrument_name: name of the instrument as registered in the NCAS instruments list or, for a non-NCAS instrument, as registered with CEDA (see Step 01: Registering your instrument for how to get an instrument identifier). These typically consist of hyphen-separated parts, e.g. ‘york-o3’, ‘ncas-aws-1’
platform_name: where the instrument was deployed or, for an instrument mounted on a platform (e.g. ship, aircraft, tower), what it was deployed on. NOTE: platform names form a controlled list held in the CEDA catalogue. To find existing entries, view all Platform records in the CEDA data catalogue and use the ‘abbreviation’ for the record. Should a suitable record or abbreviation not be present, contact CEDA to agree one to use.
Date-Time: a suitable date-time string (in UTC), following the pattern in the filename structure above, that marks the beginning of the data in the file, given to a resolution suitable for the data contained
Example:
for a file containing 1 year (YYYY) of data: 2016,
for a file containing 1 month (MM) of data: 201604,
for a file containing 1 day (DD) of data: 20160401,
for a file containing 1 hour (HH) of data: 2016040109,
for a file containing 1 minute (mm) of data: 201604010950,
for a file starting at a specific time - for example launch time of soundings: 20160401095059.
data-product: name of the defined data product as per the NCAS Data Standard being followed
option1, option2 & option3: these are optional extras providing more information to the user.
version: version of the data set, n.m.p, where n is the major revision integer, m the minor revision integer and p the patch version integer
Examples:
ncas-aws-1_ral_29001225_surface-met_30m_v0.1.nc
ncas-lidar-dop-1_ral_29001225_aerosol-backscatter-radial-winds_fixed_co_advanced_v0.1.nc
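The filename structure can be sketched in Python as below. This is illustrative only; `start_time_string` and `build_filename` are hypothetical names, not part of any NCAS tool. The filename structure shows a dash between the date and time parts while the listed date-time examples show none, so the separator is exposed as a `time_sep` parameter rather than assuming either form. The caller is assumed to pass a time already in UTC.

```python
from datetime import datetime
from typing import Sequence

# strftime patterns for the date part and the optional time part.
_DATE_PART = {"year": "%Y", "month": "%Y%m", "day": "%Y%m%d",
              "hour": "%Y%m%d", "minute": "%Y%m%d", "second": "%Y%m%d"}
_TIME_PART = {"hour": "%H", "minute": "%H%M", "second": "%H%M%S"}


def start_time_string(start: datetime, resolution: str, time_sep: str = "-") -> str:
    """Format the start of the data (assumed UTC) to the given resolution."""
    if resolution not in _DATE_PART:
        raise ValueError(f"unknown resolution: {resolution!r}")
    text = start.strftime(_DATE_PART[resolution])
    if resolution in _TIME_PART:
        text += time_sep + start.strftime(_TIME_PART[resolution])
    return text


def build_filename(instrument_name: str, platform_name: str, start: str,
                   data_product: str, version: str,
                   options: Sequence[str] = ()) -> str:
    """Join the components with underscores and append '.nc'."""
    if len(options) > 3:
        raise ValueError("at most three options are allowed")
    parts = [instrument_name, platform_name, start, data_product,
             *options, f"v{version}"]
    for part in parts:
        # Underscores separate components; spaces become dashes within them.
        if not part or "_" in part or " " in part:
            raise ValueError(f"invalid component: {part!r}")
    return "_".join(parts) + ".nc"


start = start_time_string(datetime(2016, 4, 1, 9, 50, 59), "day")
print(start)  # 20160401
print(build_filename("ncas-aws-1", "ral", start, "surface-met", "0.1", ["30m"]))
# ncas-aws-1_ral_20160401_surface-met_30m_v0.1.nc
```

Passing `time_sep=""` reproduces the undashed form used in the date-time examples, e.g. 20160401095059 for a file starting at a specific second.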
Tools
Further Details
CEDA Archive Structure
The deployment directory, and the structure below it for the processed data, will then be lifted into the archive as follows:
/badc/ncas-<observatory name>/data/<instrument name>
Below which:
/<deployment name>/
<version>[_<release-type>]/
yyyy/mm/dd (as needed)
- observatory-name - code list
  - One of: cdao; wao; iao; bttao; cvao; cao or mobile
- instrument-name - code list
  - e.g. ncas-mobile-wind-profiler1
- Deployment-name - form is <startdate>_<deployment type> (see earlier in this document)
  - Startdate of data is consistently known (and will list in order)
  - Deployment types: campaign-name; longterm (location for alternating mode instruments as a ‘campaign’ equivalent)
- Version - follow major.minor.patch
  - (major breaks with previous; minor adds in additional items; patch for errata)
- Release-type - where needed; initial; verified
- yyyy/mm/dd - standard splitting as needed
Long-term instruments
/badc/ncas-cao/data/ncas-anemometer-2/20200926_longterm/v1.1/
/badc/ncas-cdao/data/ncas-ceilometer-7/20040823_longterm/2.0/
Campaign deployment instruments (and pre-amof data)
/badc/ncas-mobile/data/ncas-fage-1/20020704_namblex/v1.1/
/badc/ncas-mobile/data/ncas-fage-1/20020704_namblex/previous_v1
symlinked to -->/badc/namblex/data/leeds-fage/
‘Alternating mode’ instruments (treat ‘long-term’ deployment as a ‘campaign’ where campaign name is the location of deployment)
/badc/ncas-mobile/data/ncas-radar-wind-profiler-1/20020801_namblex/v1.1/yyyy/mm/dd/
/badc/ncas-mobile/data/ncas-radar-wind-profiler-1/20020823_cardington/v1.1/yyyy/mm/dd/
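The archive layout described in this section can be assembled as in the following Python sketch. The function name `archive_dir` is hypothetical and the code only builds a path string; it does not check that the instrument or deployment exists in the archive. The observatory codes are those listed above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Observatory codes as listed in this guide.
OBSERVATORIES = ("cdao", "wao", "iao", "bttao", "cvao", "cao", "mobile")


def archive_dir(observatory: str, instrument: str, deployment: str,
                version_dir: str, date_subdir: Optional[str] = None) -> PurePosixPath:
    """Build /badc/ncas-<observatory>/data/<instrument>/<deployment>/<version>/..."""
    if observatory not in OBSERVATORIES:
        raise ValueError(f"unknown observatory code: {observatory!r}")
    path = (PurePosixPath("/badc") / f"ncas-{observatory}" / "data"
            / instrument / deployment / version_dir)
    if date_subdir:
        # Optional yyyy/mm/dd splitting below the version directory.
        path = path / date_subdir
    return path


print(archive_dir("cao", "ncas-anemometer-2", "20200926_longterm", "v1.1"))
# /badc/ncas-cao/data/ncas-anemometer-2/20200926_longterm/v1.1
print(archive_dir("mobile", "ncas-radar-wind-profiler-1",
                  "20020823_cardington", "v1.1", "2002/08/23"))
# /badc/ncas-mobile/data/ncas-radar-wind-profiler-1/20020823_cardington/v1.1/2002/08/23
```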