Step 10: Getting your data published and tracking its use
Who’s involved
CEDA Archive, CEDA Data Scientists, Instrument scientists
Introduction
Making the data available in long-term repositories aids wider discoverability and re-use of the data. It also ensures that NCAS meets its legal requirement to make data produced from the public purse discoverable in standards-driven ways for the community in general.
Workflow
Once at CEDA, the data will be ingested into the archive automatically, based on the details in the folder structure, filenames and internal metadata, together with information harvested from central lists such as the instrument details list. The same sources are used to construct the data catalogue pages needed for publication and DOI minting.
The catalogue records will be created using a standard template, filled in with the information harvested from file contents, folder information and instrument list details. These catalogue records ensure that the data are properly published and meet the requirements on their visibility, so that they are picked up by other services such as the NERC central data catalogue and Google dataset search.
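As a rough illustration of the template-filling idea described above, the sketch below merges metadata from the three sources (file contents, folder information and the instrument list) and substitutes it into a template. It is not CEDA's actual pipeline: the template text, the field names (`instrument`, `platform`, `project`, `start`, `end`), the function `build_record` and all example values are hypothetical, chosen only to show the pattern.

```python
from string import Template

# Hypothetical template: the real CEDA template and its fields are not
# described in this guide.
RECORD_TEMPLATE = Template(
    "Title: $instrument data from $platform\n"
    "Project: $project\n"
    "Time range: $start to $end\n"
)


def build_record(file_meta, folder_meta, instrument_list):
    """Merge the three metadata sources and fill the template.

    Later updates override earlier ones, so in this sketch values read
    from the file contents take precedence (an assumption, not CEDA policy).
    """
    instrument = file_meta["instrument"]
    fields = {}
    fields.update(instrument_list.get(instrument, {}))  # central list details
    fields.update(folder_meta)                          # folder information
    fields.update(file_meta)                            # file contents
    # substitute() raises KeyError if a template field was not harvested,
    # which flags an incomplete record instead of publishing a blank.
    return RECORD_TEMPLATE.substitute(fields)


# Made-up example values.
file_meta = {"instrument": "example-lidar-1", "start": "2020-01-01", "end": "2020-01-31"}
folder_meta = {"platform": "example-site"}
instrument_list = {"example-lidar-1": {"project": "Example Project"}}

record = build_record(file_meta, folder_meta, instrument_list)
print(record)
```

Raising an error on a missing field, rather than leaving it empty, mirrors the point that harvested details still need a final human check before publication.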
If the deployment is related to a specific project, the CEDA Data Scientist will ensure that the data are correctly linked up with other project data in the CEDA Archive and that the catalogue record is also linked to other project records as needed.
Once the catalogue record has been created, the CEDA Data Scientist looking after the data will get in touch with the instrument scientist for a final check on the details, in case anything has slipped through (e.g. free text fields contain erroneous information or geographic bounding boxes aren't quite right) or there is a need to capture specific nuances of a deployment that can't be conveyed via the standard workflows.
These catalogue records link to other related catalogue records that provide background information, such as:
- The Project for which the data were collected (i.e. the project the instrument was funded to operate for)
- Instrument details - including any historical information about the instrument
- Platform details covering the location or vehicle on which the instrument was deployed
There may also be links to other versions of the data and to Dataset Collections in which the dataset forms part of a larger corpus of associated datasets (e.g. all datasets from a Project).
When the instrument scientist has agreed to the catalogue record and all authors are happy for a DOI to be minted, the CEDA Data Scientist submits the record for internal review (against NERC metadata guidelines) with a request to publish and issue a DOI.
The DOI and associated dataset citation can then be used by users of the data. In turn, when such use of the data is cited in the literature using its DOI, CEDA can automatically harvest those citations, which complement the download statistics as evidence of data usage.
Viewing download statistics
CEDA collects logs of data that are accessed via the web download and anonymous FTP services. These can be viewed per dataset by selecting the Download Stats link on the catalogue record.
This defaults to the last 12 months, but the date range can be altered for shorter or longer periods as required.
It's also possible to change the 'dataset' archive path to see wider aggregations of data download statistics.
Note, though, that these statistics do not include any direct access of the data from JASMIN.
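To make the effect of changing the date range and the archive path concrete, here is a minimal sketch of that kind of aggregation. The log format, the paths and the function `count_downloads` are all hypothetical; the guide does not describe how the Download Stats service stores or queries its logs.

```python
from collections import Counter
from datetime import date

# Hypothetical log entries: (access date, archive path, access method).
# Only web and FTP accesses appear, matching the note above that direct
# access from JASMIN is not counted.
LOG = [
    (date(2024, 1, 5), "/example/projectA/dataset1/file1.nc", "web"),
    (date(2024, 2, 10), "/example/projectA/dataset1/file2.nc", "ftp"),
    (date(2024, 2, 11), "/example/projectA/dataset2/file1.nc", "web"),
    (date(2023, 1, 1), "/example/projectA/dataset1/file1.nc", "web"),
]


def count_downloads(log, path_prefix, start, end):
    """Count accesses under path_prefix from start to end (inclusive), per method."""
    # Normalise the prefix so "/a/dataset1" does not also match "/a/dataset10".
    prefix = path_prefix.rstrip("/") + "/"
    counts = Counter()
    for day, path, method in log:
        if start <= day <= end and path.startswith(prefix):
            counts[method] += 1
    return counts


# Statistics for a single dataset over one year.
per_dataset = count_downloads(
    LOG, "/example/projectA/dataset1", date(2024, 1, 1), date(2024, 12, 31)
)

# Moving up to a shorter path aggregates across every dataset beneath it.
per_project = count_downloads(
    LOG, "/example/projectA", date(2024, 1, 1), date(2024, 12, 31)
)
```

In this sketch, shortening the path from the dataset directory to the project directory widens the aggregation in the same way as editing the 'dataset' archive path on the statistics page.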
Future Plan
CEDA will continue to refine the publication pipeline to add in support for new data standards as they are created.
Further Details
None