obsplus.datasets.dataset.DataSet

class obsplus.datasets.dataset.DataSet(base_path=None)

Abstract Base Class for downloading and serving datasets.

This is not intended to be used directly, but rather through subclassing.

Parameters:

base_path – The directory to which the dataset will be saved; it is the parent of data_path. If None, it is resolved as described in the Notes.

Attributes:

data_path

The path containing the data. By default it is base_path / name.

source_path

The directory containing the original files shipped with the dataset, before any download. By default, this is a folder with the same name as the dataset, located in the same directory as the dataset’s code (.py) file.
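The two defaults above can be sketched with pathlib. This is a minimal illustration, not the obsplus source: default_paths and its name and code_file arguments are hypothetical, with code_file standing in for the dataset's .py file.

```python
from pathlib import Path


def default_paths(base_path, name, code_file):
    """Return (data_path, source_path) using the documented defaults.

    Hypothetical helper for illustration only; not part of obsplus.
    """
    # data_path defaults to base_path / name
    data_path = Path(base_path) / name
    # source_path defaults to a folder named after the dataset that sits
    # in the same directory as the dataset's .py file
    source_path = Path(code_file).parent / name
    return data_path, source_path


data_path, source_path = default_paths(
    "/home/user/opsdata", "my_dataset", "/src/datasets/my_dataset.py"
)
print(data_path)    # on POSIX: /home/user/opsdata/my_dataset
print(source_path)  # on POSIX: /src/datasets/my_dataset
```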

Notes

Importantly, each dataset references two directories: source_path and data_path. The source_path contains all data shipped with the dataset and should not be altered. The data_path holds a copy of everything in the source_path, plus any files created during the download process.
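The relationship between the two directories can be sketched with the standard library. This is an illustration under stated assumptions, not how obsplus performs the copy: copy_source_to_data is a hypothetical helper, and the file names are invented. The source directory is only read, while files already in the data directory (such as downloaded data) are kept.

```python
import shutil
import tempfile
from pathlib import Path


def copy_source_to_data(source_path, data_path):
    """Copy everything in source_path into data_path (hypothetical helper).

    Files already in data_path are kept; files with the same relative path
    are overwritten by the copy from source_path.
    """
    # dirs_exist_ok=True (Python 3.8+) allows copying into an existing directory
    shutil.copytree(source_path, data_path, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    data = Path(tmp) / "data"
    # a file included with the dataset
    (source / "events").mkdir(parents=True)
    (source / "events" / "catalog.xml").write_text("<catalog/>")
    # simulate a file created during the download process
    data.mkdir()
    (data / "downloaded.mseed").write_text("waveform bytes")

    copy_source_to_data(source, data)
    files = sorted(
        p.relative_to(data).as_posix() for p in data.rglob("*") if p.is_file()
    )
    print(files)  # ['downloaded.mseed', 'events/catalog.xml']
```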

The base_path (the parent of data_path) is resolved for each dataset in the following order of priority:

  1. The base_path provided to DataSet’s __init__ method.

  2. A .data_path.txt file stored in the data source.

  3. The OPSDATA_PATH environment variable.

  4. The opsdata_path variable from obsplus.constants.

By default, the data are downloaded to a folder called “opsdata” in the user’s home directory; this is easily changed by setting the OPSDATA_PATH environment variable.
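The priority order above can be expressed as a chain of fallbacks. This is a minimal sketch, not the obsplus implementation: resolve_base_path and DEFAULT_OPSDATA_PATH are hypothetical. The sketch assumes that the .data_path.txt file lives in the dataset's source directory and contains a single path. It also assumes that the obsplus.constants default is ~/opsdata, as the paragraph above describes.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for obsplus.constants.opsdata_path, assumed here to
# point at ~/opsdata as described above.
DEFAULT_OPSDATA_PATH = Path.home() / "opsdata"


def resolve_base_path(base_path=None, source_path=None):
    """Resolve a dataset's base_path using the four priorities listed above.

    Hypothetical sketch; source_path is assumed to be where the
    .data_path.txt file would live.
    """
    # 1. An explicit base_path argument always wins.
    if base_path is not None:
        return Path(base_path)
    # 2. A .data_path.txt file whose contents name the base path.
    if source_path is not None:
        path_file = Path(source_path) / ".data_path.txt"
        if path_file.exists():
            return Path(path_file.read_text().strip())
    # 3. The OPSDATA_PATH environment variable (empty values are ignored).
    env_value = os.environ.get("OPSDATA_PATH")
    if env_value:
        return Path(env_value)
    # 4. Fall back to the package-level default.
    return DEFAULT_OPSDATA_PATH


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".data_path.txt").write_text("/mnt/datasets\n")
    os.environ["OPSDATA_PATH"] = "/tmp/env_opsdata"
    from_arg = resolve_base_path("/explicit", source_path=tmp)  # priority 1
    from_file = resolve_base_path(source_path=tmp)              # priority 2
    from_env = resolve_base_path()                              # priority 3
    os.environ.pop("OPSDATA_PATH")
    from_default = resolve_base_path()                          # priority 4
```

Each check returns as soon as a source is found, so a lower-priority setting such as OPSDATA_PATH is never consulted when an explicit base_path is given.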

__init__(base_path=None)

Download and load data into memory.

Methods

__init__([base_path])

Download and load data into memory.

check_hashes([check_hash])

Check that all files are present and have the correct hashes.

check_version()

Check the version of the dataset.

copy([deep])

Return a copy of the dataset.

copy_to([destination])

Copy the dataset to a destination.

create_sha256_hash([path, hidden])

Create a sha256 hash of the dataset's data files.

delete_data_directory()

Delete the data files of a dataset.

download_events()

Method to ensure the events have been downloaded.

download_stations()

Method to ensure inventories have been downloaded.

download_waveforms()

Method to ensure waveforms have been downloaded.

get_fetcher(**kwargs)

Return a Fetcher from the data.

load_dataset(name[, silent])

Get a loaded dataset.

post_download_hook()

Code to run after any downloads.

pre_download_hook()

Code to run before any downloads.

read_data_version([path])

Read the data version from disk.

write_version([path])

Write the version string to disk.
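create_sha256_hash and check_hashes suggest a hash-then-verify workflow. The sketch below shows that general idea with hashlib. It is not the obsplus implementation: hash_directory and find_mismatches are hypothetical helpers, and the hash file format obsplus writes is not reproduced. The meaning given to hidden (whether dot-prefixed files are included) is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_directory(path, hidden=False):
    """Map each file's relative path to its sha256 hex digest.

    Hypothetical sketch; with hidden=False, files whose relative path has
    any component starting with "." are skipped (an assumed meaning).
    """
    path = Path(path)
    out = {}
    for file in sorted(p for p in path.rglob("*") if p.is_file()):
        rel = file.relative_to(path)
        if not hidden and any(part.startswith(".") for part in rel.parts):
            continue
        out[rel.as_posix()] = hashlib.sha256(file.read_bytes()).hexdigest()
    return out


def find_mismatches(path, expected):
    """Return relative paths that are missing or whose hash has changed."""
    current = hash_directory(path)
    return sorted(k for k, v in expected.items() if current.get(k) != v)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "waveforms").mkdir()
    (root / "waveforms" / "a.mseed").write_bytes(b"abc")
    (root / ".hidden").write_text("skipped when hidden=False")
    expected = hash_directory(root)
    # alter a file after hashing; verification should flag it
    (root / "waveforms" / "a.mseed").write_bytes(b"tampered")
    bad = find_mismatches(root, expected)

print(bad)  # ['waveforms/a.mseed']
```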

Attributes

data_files

Return a list of top-level files associated with the dataset.

data_loaded

data_path

Return a path to where the dataset's data was/will be downloaded.

event_client

A cached property for an event client.

event_path

Return the path to the events.

events_need_downloading

Returns True if event data need to be downloaded.

name

Name of the dataset.

source_path

Return a path to the directory where the data files included with the dataset live.

station_client

A cached property for a station client.

station_path

Return the path to the stations.

stations_need_downloading

Returns True if station data need to be downloaded.

version

Dataset version.

version_tuple

Return a tuple of the version string.

waveform_client

A cached property for a waveform client.

waveform_path

Return the path to the waveforms.

waveforms_need_downloading

Returns True if waveform data need to be downloaded.
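The download-related members fit together roughly as shown in the toy class below. These are the *_need_downloading attributes, the download_* methods, and the pre- and post-download hooks. The class is a hypothetical stand-in, not the obsplus implementation: ToyDataSet, run_downloads, and the present set are invented for illustration. The assumption that each hook runs once, around all downloads, comes only from the descriptions "Code to run before any downloads" and "Code to run after any downloads".

```python
class ToyDataSet:
    """Toy illustration of a download workflow; not the obsplus code."""

    kinds = ("event", "station", "waveform")

    def __init__(self, already_present=()):
        # kinds of data assumed to be on disk already
        self.present = set(already_present)
        self.calls = []  # record of what ran, in order

    def _download(self, kind):
        self.calls.append(f"download_{kind}s")
        self.present.add(kind)

    # Members named after those listed above (toy versions).
    @property
    def events_need_downloading(self):
        return "event" not in self.present

    @property
    def stations_need_downloading(self):
        return "station" not in self.present

    @property
    def waveforms_need_downloading(self):
        return "waveform" not in self.present

    def download_events(self):
        self._download("event")

    def download_stations(self):
        self._download("station")

    def download_waveforms(self):
        self._download("waveform")

    def pre_download_hook(self):
        self.calls.append("pre_download_hook")

    def post_download_hook(self):
        self.calls.append("post_download_hook")

    def run_downloads(self):
        """Download only missing kinds, running each hook at most once."""
        downloaded = False
        for kind in self.kinds:
            if getattr(self, f"{kind}s_need_downloading"):
                if not downloaded:
                    self.pre_download_hook()  # before the first download
                    downloaded = True
                getattr(self, f"download_{kind}s")()
        if downloaded:
            self.post_download_hook()  # after all downloads


ds = ToyDataSet(already_present={"station"})
ds.run_downloads()
first_calls = list(ds.calls)
print(first_calls)
# ['pre_download_hook', 'download_events', 'download_waveforms', 'post_download_hook']

ds.calls.clear()
ds.run_downloads()  # everything is present now, so nothing runs
print(ds.calls)  # []
```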