Data Structure


Data are available for download at native resolution, high resolution (1 Hz or below), or in temporal bins.

Native Unmodified Data: These unmodified data messages are provided at the native resolution as collected by the sensor. The time of collection is prepended to each data message. No quality control has been performed.
High Resolution Data: These data are provided at high resolution (1 Hz or below). Timestamps and data frequencies vary between sensors. Preliminary quality flags are included.
Binned Data: These data are provided in temporal (e.g. one-minute) bins. The mean and accompanying statistics are provided. Preliminary quality flags are included.
 
Native Unmodified Data

All sensor data received by CORIOLIX are archived as ASCII flat files, one file per sensor per day. The unmodified raw messages (strings) are stored at the native resolution collected by the sensor, and their format varies by sensor. Each message string is prepended with the UTC timestamp of receipt. No quality control has been performed on these data. These datasets are delivered to R2R after every cruise for long-term archive, and the native-resolution files are also available to end users for download.
 
High Resolution (1 Hz or below) Data
Data Download Page: High Resolution Data Download

For real-time applications, a subset of sensor data are extracted and stored within CORIOLIX. These searchable datasets are available at 1 Hz resolution or below. The collection timestamps and data values are unmodified. Preliminary quality control flags are provided as an additional field.
 
Binned Data
Data Download Page: Binned Data Download

Binning Method

Data are binned using an unweighted mean over a set time interval. The following statistics are also captured and made available to the end user: standard deviation (std), minimum (min), maximum (max), number of data points (num), spot value (spotval), and median.
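A minimal sketch of how one bin's statistics could be computed for ordinary (non-circular) scalar data, returned in the order used by the binned JSON arrays described under Binned Data Structure (mean, std, min, max, num, spot, median). Assumptions: the function name `bin_stats` is hypothetical, the population standard deviation is used (the text does not say whether a sample or population estimator is applied), and an empty bin yields NaN statistics with num = 0.

```python
import math
import statistics
from datetime import datetime, timedelta


def bin_stats(times, values, center):
    """Return [mean, std, min, max, num, spot, median] for one bin.

    times:  datetimes of the samples that fall in the bin
    values: float sample values, same length as times
    center: datetime of the bin center
    """
    n = len(values)
    if n == 0:
        # Empty bin: nothing to summarize (assumed NaN statistics, num = 0).
        return [math.nan, math.nan, math.nan, math.nan, 0, math.nan, math.nan]
    mean = statistics.fmean(values)   # unweighted mean
    std = statistics.pstdev(values)   # population std (assumption)
    # Spot value: the unmodified sample collected closest in time to the center.
    spot_i = min(range(n), key=lambda i: abs((times[i] - center).total_seconds()))
    return [mean, std, min(values), max(values), n, values[spot_i],
            statistics.median(values)]


# Example: four samples in the bin centered on 22:13:00.
center = datetime(2021, 5, 6, 22, 13, 0)
times = [center + timedelta(seconds=s) for s in (-20, -5, 10, 25)]
print(bin_stats(times, [10.0, 12.0, 11.0, 13.0], center))
```

Here the spot value is 12.0, because the sample taken 5 seconds before the center is the closest in time.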

Circular data (such as heading or wind direction) are averaged using circular mean and circular standard deviation calculations, which account for the wrap-around at 0/360 degrees. For example, the circular average of 355° and 5° is 0°, not 180° (the result of a simple arithmetic mean).
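The circular mean can be illustrated by averaging the unit vectors (sin, cos) of each angle and taking the angle of the result. This is a sketch, not the CORIOLIX code: the function name is hypothetical, and because the text does not name a circular standard deviation formula, the common definition sqrt(-2 ln R) (R = mean resultant length) is assumed here.

```python
import math


def circular_mean_std(degrees):
    """Circular mean and circular standard deviation of angles in degrees."""
    if not degrees:
        return math.nan, math.nan
    n = len(degrees)
    # Average the unit vectors (sin, cos) of each angle.
    mean_sin = sum(math.sin(math.radians(d)) for d in degrees) / n
    mean_cos = sum(math.cos(math.radians(d)) for d in degrees) / n
    # Mean resultant length R (1 = all angles identical); clamp rounding above 1.
    r = min(math.hypot(mean_sin, mean_cos), 1.0)
    if r == 0.0:
        return math.nan, math.inf  # directions cancel: mean is undefined
    # Mean direction, wrapped into [0, 360).
    mean_deg = math.degrees(math.atan2(mean_sin, mean_cos)) % 360.0
    if mean_deg == 360.0:  # a tiny negative angle can round up to 360.0
        mean_deg = 0.0
    # Assumed circular std: sqrt(-2 ln R), converted from radians to degrees.
    std_deg = math.degrees(math.sqrt(max(0.0, -2.0 * math.log(r))))
    return mean_deg, std_deg


print(circular_mean_std([355, 5]))  # mean near 0, std near 5 degrees
```

A plain arithmetic mean of the same two values would give 180 with a standard deviation of 175, which is why direction data need the circular treatment.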

Vector data (such as true winds) are averaged using vector averaging. First, the components (u,v,w) are each averaged using a simple mean. These mean components are then used to derive the mean direction and mean magnitude:
mean_direction = rad2deg(atan2(mean_u, mean_v)) + 180
mean_magnitude = sqrt(mean_u^2 + mean_v^2)
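The two formulas above translate directly into code. This sketch (hypothetical function name) covers only the horizontal components u and v, since w does not enter either formula. It applies the +180 exactly as written, so the result lies between 0 and 360 degrees. Reading u and v as the eastward and northward components of the motion, the +180 yields the direction the wind blows from; that interpretation is an assumption, not stated in the text.

```python
import math


def vector_average(u, v):
    """Vector-average horizontal components using the documented formulas.

    u, v: equal-length, non-empty sequences of vector components.
    Returns (mean_direction in degrees, mean_magnitude).
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    # rad2deg(atan2(mean_u, mean_v)) + 180, exactly as documented.
    mean_direction = math.degrees(math.atan2(mean_u, mean_v)) + 180.0
    # sqrt(mean_u^2 + mean_v^2)
    mean_magnitude = math.hypot(mean_u, mean_v)
    return mean_direction, mean_magnitude


# Two 5 m/s samples whose east-west parts cancel: the scalar mean speed is 5,
# but the vector-mean magnitude is only 4.
print(vector_average([3.0, -3.0], [4.0, 4.0]))  # (180.0, 4.0)
```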


Binning Time Interval

All parameters are binned by default to 1-minute bins, centered on "nice" minute timestamps (12:03:00, 12:04:00, etc.). On request prior to a cruise, binned data at a second (custom) time interval may also be generated for that cruise. For data collected less frequently than once per minute (e.g. ceilometer data, collected every six minutes), the bins between data points are populated with NaNs. The timestamp provided with each bin is the center time of the bin.
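A sketch of centered binning with NaN filling, under stated assumptions: bins are centered on whole multiples of the interval since midnight, each bin covers the half-open range [center - interval/2, center + interval/2), so a sample at exactly 12:02:30 falls in the 12:03:00 bin, and only the mean is computed. The function names are hypothetical; the edge convention is not specified in the text.

```python
import math
from datetime import datetime, timedelta


def bin_center(ts, interval=timedelta(minutes=1)):
    """Center of the bin containing ts (half-open edges assumed)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    k = (ts - midnight + interval / 2) // interval  # index of nearest center
    return midnight + k * interval


def daily_bin_means(day, samples, interval=timedelta(minutes=1)):
    """Return (center, mean) for every bin of `day`; empty bins get NaN.

    samples: iterable of (datetime, float) pairs
    """
    groups = {}
    for ts, value in samples:
        groups.setdefault(bin_center(ts, interval), []).append(value)
    rows = []
    for i in range(timedelta(days=1) // interval):  # 1440 bins for 1 minute
        center = day + i * interval
        values = groups.get(center)
        rows.append((center, sum(values) / len(values) if values else math.nan))
    return rows


# Ceilometer-like data: one sample every six minutes.
day = datetime(2021, 5, 6)
samples = [(day + timedelta(minutes=6 * i, seconds=2), float(i)) for i in range(240)]
rows = daily_bin_means(day, samples)
print(rows[0], rows[1], rows[6])
```

With one-minute bins the output has 1440 rows per day, matching the file size described under Binned Data Structure, and five of every six rows are NaN for six-minute data.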


Date Range

Binned data are stored and available only for a limited number of past days. The retention period can be configured from 10 to 365 days (90 days is recommended) on the Data/Access/Archive page.


Binned Data Quality

Three binned products are provided: "All", "Best", and "SAMOS". The distinction is the quality of data used to calculate the bin.

All Bins (a)
The "all" binned product bins all data points within the time interval regardless of quality. Quality flags are ignored.

Best Bins (b)
Best bins are derived only from data that have flag values of 0, 1, or 2 (i.e. no suspect or failed data). For example, if there are 10 data points within the binning time interval, and one of them has a flag set to "fail", only the remaining 9 data points are used to calculate the bin value.
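The "best" selection rule can be sketched as a filter over per-sample flag strings like those in the Binned Quality Flags example. Assumptions: each sample carries a string of single-digit test results, a sample qualifies only if no position exceeds 2, and the function names are hypothetical. Shorter 10-character flags are used here for brevity.

```python
def is_best(flag):
    """True if every position of a per-sample flag string is 0, 1 or 2."""
    return all(ch in "012" for ch in flag)


def best_values(values, flags):
    """Keep only the values whose flag string qualifies for a 'best' bin."""
    return [v for v, f in zip(values, flags) if is_best(f)]


# Ten samples; the last one carries a 4 (treated here as a failed test).
values = [float(i) for i in range(10)]
flags = ["1111222200"] * 9 + ["1111422200"]
print(len(best_values(values, flags)))  # 9
```

The surviving nine values would then be passed to the same statistics calculation used for the "all" bin.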

Special note: QA/QC happens at various stages throughout the data lifecycle. "Best" bins are regenerated (a) at various stages of the automated QA/QC process, (b) on demand by the sensor technician during the manual QA/QC process, and (c) at the end of the cruise. As a consequence, the "best" binned values may change throughout the cruise. Data earlier in the lifecycle will, by definition, have fewer opportunities for flags to be set to "suspect" or "fail".

SAMOS Bins (c)
The third binned product, "SAMOS", uses only a subset of flags to determine whether data should be included in the bin or not. Data must pass only the following four quality tests to be included in the bin:
(1) Gap Test
(2) Syntax Test
(3) Gross Range Test
(4) Global Range Test
SAMOS performs further quality assessment on these binned data.
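The SAMOS rule differs from "best" only in which flag positions are consulted. In the sketch below, the positions of the four tests within the flag string are HYPOTHETICAL (the real layout is defined by CORIOLIX and is not given here), and a value of 0, 1 or 2 is assumed to count as passing.

```python
# HYPOTHETICAL flag-string positions of the four SAMOS tests; the real
# layout is defined by CORIOLIX and is not given in this document.
SAMOS_TEST_POSITIONS = {"gap": 0, "syntax": 1, "gross_range": 2, "global_range": 3}


def passes_samos(flag, positions=SAMOS_TEST_POSITIONS):
    """True if each SAMOS test position holds 0, 1 or 2 (assumed 'passed').

    Results of all other tests in the flag string are ignored.
    """
    return all(flag[i] in "012" for i in positions.values())


def samos_values(values, flags):
    """Keep only the values whose flags pass the four SAMOS tests."""
    return [v for v, f in zip(values, flags) if passes_samos(f)]


print(passes_samos("1111442200"))  # True: the 4s sit outside the SAMOS positions
```

Under these assumptions, a sample flagged 4 by some other test still contributes to the SAMOS bin, although it would be excluded from the "best" bin.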


Binned Quality Flags

Quality flags representing the binned data are a composite of all of the flags from that bin. For each flag position, the largest value is used in the composite binned flag.

For example:
Individual Data Flags
111144000022222222111100000000
121122000033222222114400000000
Composite Flag
121144000033222222114400000000
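The per-position maximum can be sketched in a few lines (hypothetical function name). Because each position holds a single digit, comparing characters gives the same ordering as comparing the numbers.

```python
def composite_flag(flags):
    """Combine equal-length flag strings by keeping the largest digit per position."""
    # zip(*flags) walks the strings position by position.
    return "".join(max(chars) for chars in zip(*flags))


print(composite_flag([
    "111144000022222222111100000000",
    "121122000033222222114400000000",
]))  # 121144000033222222114400000000
```

This reproduces the composite flag from the example above.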

Special note: QA/QC happens at various stages throughout the data lifecycle. As a consequence, the composite quality flags may change throughout the cruise.


Binned Data Structure

A file with a single day of one-minute binned data will contain 1440 rows (60 minutes * 24 hours).

Each downloaded binned data file contains the following columns:
Datetime: the center timestamp of the bin
Latitude: the latitude closest in time to the bin center timestamp (a spot value)
Longitude: the longitude closest in time to the bin center timestamp (a spot value)
Parameter Data: a JSON dictionary containing the binned data and statistics
Multiple parameters may be included in a single file (one column per parameter).

Binned data and statistics for a given parameter and time bin are stored as a JSON dictionary. An example follows below:

{ "a": [ 359.017, 5.259, 0.1, 359.9, 120, 359.5, -999 ], "b": [ 358.982, 5.321, 1, 359.9, 117, 359.5, -999 ], "c": [ 358.982, 5.321, 1, 359.9, 117, 359.5, -999 ], "fa": "224122222222222222222222222222", "fb": "221122222222222222222222222222", "fc": "221122222222222222222222222222", "sa": "2021-05-06 22:12:59Z", "sb": "2021-05-06 22:12:59Z", "sc": "2021-05-06 22:12:59Z" }

where:
"a": all array (mean, std, min, max, num, spot, median)
"b": best array (mean, std, min, max, num, spot, median)
"c": samos array (mean, std, min, max, num, spot, median)
"fa": all flag (composite of flags from the data used to derive the "all" bin)
"fb": best flag (composite of flags from the data used to derive the "best" bin)
"fc": samos flag (composite of flags from the data used to derive the "samos" bin)
"sa": spot datetime (all)
"sb": spot datetime (best)
"sc": spot datetime (samos)
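A sketch of reading one parameter cell from a downloaded file into labeled values, using the keys listed above. The function name is hypothetical, and treating -999 as a missing-value sentinel is an assumption based on the example (the median fields there are -999).

```python
import json

STAT_NAMES = ("mean", "std", "min", "max", "num", "spot", "median")
PRODUCTS = {"a": "all", "b": "best", "c": "samos"}


def parse_binned_cell(cell, missing=-999):
    """Turn one parameter cell (a JSON string) into labeled dictionaries.

    Values equal to `missing` are replaced by None (assumed sentinel).
    """
    raw = json.loads(cell)
    parsed = {}
    for key, product in PRODUCTS.items():
        stats = {name: (None if value == missing else value)
                 for name, value in zip(STAT_NAMES, raw[key])}
        parsed[product] = {
            "stats": stats,
            "flag": raw["f" + key],        # composite flag string
            "spot_time": raw["s" + key],   # e.g. "2021-05-06 22:12:59Z"
        }
    return parsed
```

Applied to the example cell above, parse_binned_cell(cell)["best"]["stats"]["num"] is 117 and the median entries come back as None. The example is also internally consistent: the "all" flag has a 4 in its third position while the "best" flag does not, and num drops from 120 to 117, consistent with three samples being excluded from the best bin.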

Spot values are unmodified data values that were collected closest in time to the center time of the bin.

RCRV Datapresence and Engineering Support Center
Oregon State University
Corvallis, OR 97331, USA