obsplus.datasets.dataset.DataSet
- class obsplus.datasets.dataset.DataSet(base_path=None)
Abstract base class for downloading and serving datasets.
It is not intended to be used directly; concrete datasets are created by subclassing it.
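The subclassing pattern can be illustrated with Python's abc module. The sketch below is hypothetical and is not obsplus's implementation: ToyDataSet, TinyDataSet, download and the fallback directory are invented for illustration. It only shows how an abstract base class forces each concrete dataset to supply its own name and download logic, and how data_path defaults to base_path / name.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ToyDataSet(ABC):
    """Hypothetical base class illustrating the subclass-only pattern."""

    def __init__(self, base_path=None):
        # Hypothetical fallback when no base_path is given.
        if base_path is None:
            base_path = Path.home() / "opsdata"
        self.base_path = Path(base_path)

    @property
    @abstractmethod
    def name(self) -> str:
        """Each concrete dataset must define a unique name."""

    @abstractmethod
    def download(self) -> None:
        """Each concrete dataset decides how its data are fetched."""

    @property
    def data_path(self) -> Path:
        # Mirrors the documented default: base_path / name.
        return self.base_path / self.name


class TinyDataSet(ToyDataSet):
    # A plain class attribute satisfies the abstract "name" property.
    name = "tiny"

    def download(self) -> None:
        # Placeholder download step: just make sure the directory exists.
        self.data_path.mkdir(parents=True, exist_ok=True)
```

Instantiating ToyDataSet directly raises TypeError because its abstract members are unimplemented, while TinyDataSet("/tmp/x").data_path evaluates to /tmp/x/tiny.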
- Parameters:
base_path – The path to which the dataset will be saved.
- data_path
The path containing the data. By default it is base_path / name.
- source_path
The path containing the original files shipped with the dataset, before any download. By default this is a folder named after the dataset, located in the same directory as the dataset’s code (.py) file.
Notes
Importantly, each dataset references two directories, the source_path and data_path. The source_path contains all data included within the dataset and should not be altered. The data_path has a copy of everything in the source_path, plus the files created during the downloading process.
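The relationship between the two directories can be sketched as a one-way copy: everything in source_path is copied into data_path, and source_path itself is never modified. The helper below, copy_source_files, is hypothetical and is not part of obsplus. It assumes Python 3.8+ for the dirs_exist_ok argument of shutil.copytree.

```python
import shutil
from pathlib import Path


def copy_source_files(source_path, data_path):
    """Hypothetical helper: copy source_path's contents into data_path.

    source_path is only read, never written. Files already in data_path
    (for example, downloaded ones) are kept; files sharing a relative path
    with a source file are overwritten by the source copy.
    """
    source_path, data_path = Path(source_path), Path(data_path)
    data_path.mkdir(parents=True, exist_ok=True)
    if source_path.exists():
        # dirs_exist_ok=True (Python 3.8+) lets us copy into an existing tree.
        shutil.copytree(source_path, data_path, dirs_exist_ok=True)
    return data_path
```

Copying rather than downloading into source_path keeps the shipped files pristine, so a corrupted data_path can always be rebuilt from them.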
The base_path (the parent of data_path) is resolved for each dataset using the following priorities, highest first:
1. The base_path provided to DataSet’s __init__ method.
2. A .data_path.txt file stored in the dataset’s source_path.
3. An environment variable named OPSDATA_PATH.
4. The opsdata_path variable from obsplus.constants.
By default, data are downloaded to a folder called “opsdata” in the user’s home directory; this can be changed by setting the OPSDATA_PATH environment variable.
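The priority order above amounts to a chain of fallbacks. The function below is a hypothetical sketch, not obsplus's code. It assumes that .data_path.txt holds a single path on one line, and it uses DEFAULT_OPSDATA_PATH as an invented stand-in for the opsdata_path variable in obsplus.constants.

```python
import os
from pathlib import Path

# Hypothetical stand-in for obsplus.constants.opsdata_path (assumption).
DEFAULT_OPSDATA_PATH = Path.home() / "opsdata"


def resolve_base_path(base_path=None, source_path=None, env=None):
    """Return the first base_path found, following the documented priority."""
    env = os.environ if env is None else env
    # 1. An explicit argument wins.
    if base_path is not None:
        return Path(base_path)
    # 2. A .data_path.txt file in the dataset's source directory
    #    (assumed to contain a single path).
    if source_path is not None:
        path_file = Path(source_path) / ".data_path.txt"
        if path_file.exists():
            return Path(path_file.read_text().strip())
    # 3. The OPSDATA_PATH environment variable (an empty value is ignored).
    if env.get("OPSDATA_PATH"):
        return Path(env["OPSDATA_PATH"])
    # 4. The library-wide default.
    return DEFAULT_OPSDATA_PATH
```

Passing env explicitly makes the function easy to test without touching the real environment; when it is omitted, os.environ is consulted.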
Methods
__init__([base_path]): Download and load data into memory.
check_hashes([check_hash]): Check that all files are present and have the correct hashes.
Check the version of the dataset.
copy([deep]): Return a copy of the dataset.
copy_to([destination]): Copy the dataset to a destination.
create_sha256_hash([path, hidden]): Create a sha256 hash of the dataset's data files.
Delete the data files of a dataset.
Method to ensure the events have been downloaded.
Method to ensure inventories have been downloaded.
Method to ensure waveforms have been downloaded.
get_fetcher(**kwargs): Return a Fetcher from the data.
load_dataset(name[, silent]): Get a loaded dataset.
Code to run after any downloads.
Code to run before any downloads.
read_data_version([path]): Read the data version from disk.
write_version([path]): Write the version string to disk.
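The idea behind create_sha256_hash and check_hashes (record a sha256 digest per data file, then compare later) can be sketched with hashlib. This is a hypothetical illustration, not obsplus's implementation. hash_files and find_mismatches are invented names, and include_hidden is an assumed flag that skips dot-files; it should not be read as the meaning of the real hidden parameter.

```python
import hashlib
from pathlib import Path


def hash_files(path, include_hidden=False):
    """Map each file's relative path to its sha256 hex digest (hypothetical)."""
    path = Path(path)
    hashes = {}
    for file in sorted(p for p in path.rglob("*") if p.is_file()):
        rel = file.relative_to(path)
        # Skip dot-files and dot-directories unless include_hidden is True.
        if not include_hidden and any(part.startswith(".") for part in rel.parts):
            continue
        hashes[str(rel)] = hashlib.sha256(file.read_bytes()).hexdigest()
    return hashes


def find_mismatches(path, expected):
    """Return sorted relative paths that are missing or whose digests differ."""
    current = hash_files(path, include_hidden=True)
    return sorted(k for k, v in expected.items() if current.get(k) != v)
```

An empty list from find_mismatches means every recorded file is present and unchanged. For very large files, reading in chunks and calling update() on the hash object would avoid loading each file fully into memory.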
Attributes
Return a list of top-level files associated with the dataset.
Return a path to where the dataset's data was/will be downloaded.
A cached property for an event client.
Return the path to the events.
Returns True if event data need to be downloaded.
Name of the dataset.
Return a path to the directory where the data files included with the dataset live.
A cached property for a station client.
Return the path to the stations.
Returns True if station data need to be downloaded.
Dataset version.
Return a tuple of the version string.
A cached property for a waveform client.
Return the path to the waveforms.
Returns True if waveform data need to be downloaded.
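Two attribute patterns listed above can be illustrated together: a client exposed as a cached property, built on first access and reused afterwards, and a boolean property reporting whether data still need downloading. The class below is hypothetical. LazyClientDemo, client and needs_download are invented names, the dictionary is a placeholder for a real client object, and "needs downloading" is assumed to mean the data directory is missing or empty, which may not match obsplus's actual check. functools.cached_property requires Python 3.8+.

```python
from functools import cached_property
from pathlib import Path


class LazyClientDemo:
    """Hypothetical class showing a lazily built, cached client."""

    def __init__(self, waveform_path):
        self.waveform_path = Path(waveform_path)
        self.builds = 0  # counts how many times the client is constructed

    @property
    def needs_download(self) -> bool:
        # Assumption: data are missing if the directory is absent or empty.
        path = self.waveform_path
        return not path.exists() or not any(path.iterdir())

    @cached_property
    def client(self):
        # Runs only on first access; the result is stored on the instance.
        self.builds += 1
        return {"path": self.waveform_path}  # placeholder for a real client
```

Caching the client avoids re-indexing the data directory on every access; the trade-off is that a client built before new files arrive will not see them unless the cached value is deleted (del obj.client) and rebuilt.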