obsplus.datasets.dataset module

Module for loading (and downloading) datasets.

class obsplus.datasets.dataset.DataSet(base_path=None)

Bases: ABC

Abstract Base Class for downloading and serving datasets.

This is not intended to be used directly, but rather through subclassing.


base_path – The path to which the dataset will be saved.


data_path – The path containing the data. By default, it is base_path / name.


source_path – The path containing the original files included with the dataset before download. By default, this is a folder with the same name as the dataset, located in the same directory as the dataset’s code (.py) file.


Importantly, each dataset references two directories: source_path and data_path. The source_path contains all data included within the dataset and should not be altered. The data_path holds a copy of everything in the source_path, plus the files created during the downloading process.

The base_path (the parent of data_path) is resolved for each dataset using the following priorities:

  1. The base_path provided to DataSet’s __init__ method.

  2. A .data_path.txt file stored in the data source.

  3. The OPSDATA_PATH environment variable.

  4. The opsdata_path variable from obsplus.constants.

By default, the data are downloaded to a folder called “opsdata” in the user’s home directory; this is easily changed by setting the OPSDATA_PATH environment variable.
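The resolution order above can be sketched as a small function. This is a minimal sketch, not the obsplus implementation: resolve_base_path is a hypothetical name, it assumes .data_path.txt lives in the dataset’s source directory and holds a single path, and the final fallback stands in for obsplus.constants.opsdata_path as ~/opsdata.

```python
import os
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_base_path(
    base_path: Optional[PathLike] = None,
    source_path: Optional[PathLike] = None,
    default: Optional[PathLike] = None,
) -> Path:
    """Hypothetical helper mirroring the documented priority order."""
    # 1. An explicit base_path passed to __init__ always wins.
    if base_path is not None:
        return Path(base_path)
    # 2. A .data_path.txt file in the source directory (assumed location/format).
    if source_path is not None:
        txt = Path(source_path) / ".data_path.txt"
        if txt.is_file():
            content = txt.read_text().strip()
            if content:
                return Path(content)
    # 3. The OPSDATA_PATH environment variable.
    env = os.environ.get("OPSDATA_PATH")
    if env:
        return Path(env)
    # 4. Fallback; stands in for obsplus.constants.opsdata_path (assumed ~/opsdata).
    return Path(default) if default is not None else Path.home() / "opsdata"


# data_path is then base_path / name, e.g. for a dataset named "default_test".
data_path = resolve_base_path("/tmp/my_data") / "default_test"
```

Here data_path evaluates to /tmp/my_data/default_test, because the explicit argument short-circuits every later rule.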


Check that all the files are present and have the correct hashes.


check_hash – If True, also check the hash of each file.
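A file and hash check along these lines could compare the files on disk against the JSON hash file described under create_sha256_hash (keys are relative paths, values are SHA-256 hex digests). The function name check_files, its arguments, and the exceptions it raises are hypothetical, not the obsplus API.

```python
import hashlib
import json
from pathlib import Path
from typing import Union


def check_files(
    data_dir: Union[str, Path],
    hash_file: Union[str, Path],
    check_hash: bool = False,
) -> bool:
    """Hypothetical check: every listed file exists and (optionally) matches its hash."""
    data_dir = Path(data_dir)
    # The hash file maps relative paths to SHA-256 hex digests.
    expected = json.loads(Path(hash_file).read_text())
    missing = [rel for rel in expected if not (data_dir / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"missing dataset files: {missing}")
    if check_hash:
        # Only hash when asked, since reading every file can be slow.
        mismatched = [
            rel
            for rel, digest in expected.items()
            if hashlib.sha256((data_dir / rel).read_bytes()).hexdigest() != digest
        ]
        if mismatched:
            raise ValueError(f"hash mismatch for: {mismatched}")
    return True
```

Keeping the existence check separate from the hash check lets a quick call confirm that files are present without reading their contents.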


Check the version of the dataset.

Verify that the version string in the dataset class definition matches the one saved on disk. Return True if all is well; otherwise raise a DataVersionError.


path – Expected path of the version file.


DataVersionError – If any version problems are discovered.
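A simplified version check could look like the following. It assumes the version file holds a single x.y.z string, uses a local DataVersionError stand-in (obsplus defines its own exception), and treats any mismatch as an error; the real method may handle older or newer on-disk versions differently.

```python
from pathlib import Path
from typing import Union


class DataVersionError(ValueError):
    """Local stand-in for the obsplus DataVersionError."""


def check_version(expected: str, path: Union[str, Path]) -> bool:
    """Hypothetical check: the version file must exist and match exactly."""
    path = Path(path)
    if not path.is_file():
        raise DataVersionError(f"no version file found at {path}")
    on_disk = path.read_text().strip()
    if on_disk != expected:
        raise DataVersionError(
            f"dataset version on disk ({on_disk}) does not match {expected}"
        )
    return True
```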




Return a copy of the dataset.


deep – If True, deep copy the objects attached to the dataset.

Return type:

TypeVar(DataSetType, bound=DataSet)


This only copies data in memory, not on disk. If you plan to make any changes to the dataset’s on-disk resources, use copy_to instead.
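The difference between a shallow and a deep copy of in-memory state can be illustrated with the standard copy module. ToyDataSet and its cache attribute are hypothetical stand-ins for a dataset and the objects (such as clients) attached to it; whether DataSet.copy uses copy.copy and copy.deepcopy internally is an assumption.

```python
import copy


class ToyDataSet:
    """Hypothetical stand-in for a dataset holding in-memory objects."""

    def __init__(self):
        self.cache = {"events": []}  # e.g. an attached client or catalog

    def copy(self, deep: bool = True) -> "ToyDataSet":
        # deep=True duplicates attached objects; deep=False shares them.
        return copy.deepcopy(self) if deep else copy.copy(self)


ds = ToyDataSet()
shallow_copy = ds.copy(deep=False)
deep_copy = ds.copy(deep=True)
ds.cache["events"].append("event_1")

print(shallow_copy.cache["events"])  # ['event_1'] (shared with the original)
print(deep_copy.cache["events"])     # [] (independent copy)
```

Neither kind of copy touches files on disk, which is why on-disk changes call for copy_to.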


Copy the dataset to a destination.

If the destination already exists, do nothing.


destination (Union[str, Path, None]) – The destination to which the dataset is copied. It will be created if it doesn’t exist. If None is provided, a temporary directory is created using tempfile.

Returns:

A new dataset object which refers to the copied files.
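A minimal sketch of the copy-to-destination behavior using shutil and tempfile. The helper name copy_dataset_files is hypothetical, copying into destination / name is an assumption about the on-disk layout, and "already exists" is interpreted here as the copied dataset directory already being present.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union


def copy_dataset_files(
    data_path: Union[str, Path],
    name: str,
    destination: Optional[Union[str, Path]] = None,
) -> Path:
    """Hypothetical: copy data_path to destination/name and return that path."""
    # With no destination, fall back to a fresh temporary directory.
    if destination is None:
        dest_root = Path(tempfile.mkdtemp())
    else:
        dest_root = Path(destination)
    dest_root.mkdir(parents=True, exist_ok=True)
    target = dest_root / name
    # If the copy already exists, leave it untouched.
    if not target.exists():
        shutil.copytree(data_path, target)
    return target
```

A new dataset object would then be pointed at the returned directory, so edits to it never reach the original files.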

create_sha256_hash(path=None, hidden=False)

Create a sha256 hash of the dataset’s data files.

The output is stored in a simple JSON file. Keys are paths (relative to the dataset base path) and values are file hashes.

To update or create the hash file in the dataset’s source, pass the dataset’s source_path as the path argument.

  • path – The path to which the hash data are saved. If None, use data_path.

  • hidden – If True, also include hidden files.
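The hashing step might look roughly like this: walk the directory, hash every file with hashlib.sha256, skip hidden files unless requested, and write a JSON mapping of relative POSIX paths to hex digests. The output file name sha256_hash.json and the function signature are hypothetical; obsplus may use a different name and layout.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional, Union

HASH_FILE_NAME = "sha256_hash.json"  # hypothetical output file name


def _sha256(path: Path) -> str:
    """Hash a file in chunks so large waveform files are not read at once."""
    h = hashlib.sha256()
    with path.open("rb") as fi:
        for chunk in iter(lambda: fi.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def create_sha256_hash(
    data_dir: Union[str, Path],
    out_dir: Optional[Union[str, Path]] = None,
    hidden: bool = False,
) -> Dict[str, str]:
    """Hypothetical sketch: hash all files under data_dir and save as JSON."""
    data_dir = Path(data_dir)
    out_dir = Path(out_dir) if out_dir is not None else data_dir
    hashes = {}
    for path in sorted(data_dir.rglob("*")):
        # Skip directories and the hash file itself.
        if not path.is_file() or path.name == HASH_FILE_NAME:
            continue
        rel = path.relative_to(data_dir)
        # A file counts as hidden if any path component starts with a dot.
        if not hidden and any(part.startswith(".") for part in rel.parts):
            continue
        hashes[rel.as_posix()] = _sha256(path)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / HASH_FILE_NAME).write_text(json.dumps(hashes, indent=2))
    return hashes
```

Using POSIX-style keys keeps the hash file portable between operating systems.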



property data_files: tuple[Path, ...]

Return a tuple of the top-level files associated with the dataset.

Hidden files are ignored.

data_loaded = False
property data_path: Path

Return a path to where the dataset’s data was/will be downloaded.


Delete the data files of a dataset.

This forces the data to be re-copied from the source files and the download logic to be run again.
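Deleting and re-copying can be sketched with shutil. This assumes, as described above, that data_path holds a copy of source_path plus downloaded files; reset_data_directory is a hypothetical name, and the download step that would follow is omitted.

```python
import shutil
from pathlib import Path
from typing import Union


def reset_data_directory(
    source_path: Union[str, Path], data_path: Union[str, Path]
) -> Path:
    """Hypothetical: remove data_path, then re-seed it from source_path."""
    data_path = Path(data_path)
    # Remove downloaded files and stale copies; a missing directory is fine.
    shutil.rmtree(data_path, ignore_errors=True)
    # Re-copy the files shipped with the dataset; downloads would run after this.
    shutil.copytree(source_path, data_path)
    return data_path
```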


Method to ensure the events have been downloaded.

Events should be written in an obspy-readable format to self.event_path. If not implemented, this method creates an empty directory.




Method to ensure inventories have been downloaded.

Station data should be written in an obspy-readable format to self.station_path. Since there is not yet a functional StationBank, this method must be implemented by subclasses.




Method to ensure waveforms have been downloaded.

Waveforms should be written in an obspy-readable format to self.waveform_path.
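A subclass fills in the abstract name and version properties and writes obspy-readable files to the three paths. So that the sketch runs without obsplus or obspy, it uses a hypothetical MiniDataSet base class that mimics the documented interface (events default to an empty directory; stations and waveforms must be implemented), and it writes placeholder files instead of real QuakeML, StationXML, or miniSEED. The subdirectory names are assumptions.

```python
import abc
from pathlib import Path
from typing import Union


class MiniDataSet(abc.ABC):
    """Hypothetical stand-in for the DataSet download interface."""

    def __init__(self, base_path: Union[str, Path]):
        self.data_path = Path(base_path) / self.name
        # Subdirectory names are assumptions for this sketch.
        self.event_path = self.data_path / "events"
        self.station_path = self.data_path / "stations"
        self.waveform_path = self.data_path / "waveforms"

    @property
    @abc.abstractmethod
    def name(self) -> str: ...

    @property
    @abc.abstractmethod
    def version(self) -> str: ...

    def download_events(self) -> None:
        # Documented default: create an empty directory.
        self.event_path.mkdir(parents=True, exist_ok=True)

    @abc.abstractmethod
    def download_stations(self) -> None: ...

    @abc.abstractmethod
    def download_waveforms(self) -> None: ...


class MyDataSet(MiniDataSet):
    # Plain class attributes are enough to satisfy the abstract properties.
    name = "my_dataset"
    version = "0.1.0"

    def download_stations(self) -> None:
        self.station_path.mkdir(parents=True, exist_ok=True)
        # A real dataset would write StationXML here, e.g. with obspy.
        (self.station_path / "stations.xml").write_text("<placeholder/>")

    def download_waveforms(self) -> None:
        self.waveform_path.mkdir(parents=True, exist_ok=True)
        # A real dataset would write miniSEED (or another obspy format) here.
        (self.waveform_path / "waveforms.mseed").write_bytes(b"")
```

Instantiating MiniDataSet directly raises TypeError, which matches the intent that the base class is only used through subclassing.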



property event_client: EventClient | None

A cached property for an event client.

property event_path: Path

Return the path to the events.

property events_need_downloading: bool

Returns True if event data need to be downloaded.


Return a Fetcher from the data.

kwargs are passed to Fetcher’s constructor. See its documentation for acceptable kwargs.



classmethod load_dataset(name, silent=False)

Get a loaded dataset.

Will ensure all files are downloaded and the appropriate data are loaded into memory.


name (str | DataSet) – The name of the dataset to load, or a DataSet object. If a DataSet object is passed, a copy of it is returned.

Return type:

TypeVar(DataSetType, bound=DataSet)


>>> # --- Load an example dataset for testing
>>> import obsplus
>>> ds = obsplus.load_dataset('default_test')
>>> # If you plan to make changes to the dataset be sure to copy it first
>>> # The following will copy all files in the dataset to a tmpdir
>>> ds2 = obsplus.copy_dataset('default_test')
>>> # --- Use dataset clients to load waveforms, stations, and events
>>> cat = ds.event_client.get_events()
>>> st = ds.waveform_client.get_waveforms()
>>> inv = ds.station_client.get_stations()
>>> # --- get a fetcher for more "dataset aware" querying
>>> fetcher = ds.get_fetcher()

abstract property name: str

Name of the dataset.


Code to run after any downloads.


Code to run before any downloads.
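The documented ordering (pre-download hook, then downloads, then post-download hook) can be sketched as a small driver. run_downloads and Recorder are hypothetical; calling both hooks unconditionally and skipping data types whose *_need_downloading flag is False are assumptions about how these pieces fit together.

```python
from typing import List


def run_downloads(ds) -> None:
    """Hypothetical driver: hooks bracket whichever downloads are needed."""
    ds.pre_download_hook()
    if ds.events_need_downloading:
        ds.download_events()
    if ds.stations_need_downloading:
        ds.download_stations()
    if ds.waveforms_need_downloading:
        ds.download_waveforms()
    ds.post_download_hook()


class Recorder:
    """Toy object that records which methods were called, in order."""

    events_need_downloading = True
    stations_need_downloading = False
    waveforms_need_downloading = True

    def __init__(self):
        self.calls: List[str] = []

    def pre_download_hook(self):
        self.calls.append("pre")

    def download_events(self):
        self.calls.append("events")

    def download_stations(self):
        self.calls.append("stations")

    def download_waveforms(self):
        self.calls.append("waveforms")

    def post_download_hook(self):
        self.calls.append("post")


rec = Recorder()
run_downloads(rec)
print(rec.calls)  # ['pre', 'events', 'waveforms', 'post']
```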


Read the data version from disk.

Return a length-3 tuple parsed from the semantic version string (of the form xx.yy.zz). Raise a DataVersionError if it is not found.
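Parsing an x.y.z string into the three-int tuple that read_data_version and version_tuple describe might look like this. The regular expression, the local DataVersionError stand-in, and the single-line version file format are assumptions.

```python
import re
from pathlib import Path
from typing import Tuple, Union


class DataVersionError(ValueError):
    """Local stand-in for the obsplus DataVersionError."""


_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)\.(\d+)\s*$")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn 'x.y.z' into (x, y, z); raise DataVersionError otherwise."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise DataVersionError(f"not a semantic version string: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def read_data_version(path: Union[str, Path]) -> Tuple[int, int, int]:
    """Read a version file (assumed to hold one x.y.z line) and parse it."""
    path = Path(path)
    if not path.is_file():
        raise DataVersionError(f"version file not found: {path}")
    return parse_version(path.read_text())


print(parse_version("1.10.3"))  # (1, 10, 3)
```

Integer tuples compare element by element, so (1, 10, 3) > (1, 9, 0), whereas a plain string comparison of "1.10.3" and "1.9.0" would give the opposite answer.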



property source_path: Path

Return a path to the directory where the data files included with the dataset live.

property station_client: StationClient | None

A cached property for a station client.

property station_path: Path

Return the path to the stations.

property stations_need_downloading: bool

Returns True if station data need to be downloaded.

abstract property version: str

Dataset version. Should be a str of the form x.y.z.

property version_tuple: tuple[int, int, int]

Return the version string as a tuple of three ints.

property waveform_client: WaveformClient | None

A cached property for a waveform client.

property waveform_path: Path

Return the path to the waveforms.

property waveforms_need_downloading: bool

Returns True if waveform data need to be downloaded.


Write the version string to disk.

obsplus.datasets.dataset.load_dataset(name, silent=False)

Get a loaded dataset.

Will ensure all files are downloaded and the appropriate data are loaded into memory.


name (str | DataSet) – The name of the dataset to load, or a DataSet object. If a DataSet object is passed, a copy of it is returned.

Return type:

TypeVar(DataSetType, bound=DataSet)


>>> # --- Load an example dataset for testing
>>> import obsplus
>>> ds = obsplus.load_dataset('default_test')
>>> # If you plan to make changes to the dataset be sure to copy it first
>>> # The following will copy all files in the dataset to a tmpdir
>>> ds2 = obsplus.copy_dataset('default_test')
>>> # --- Use dataset clients to load waveforms, stations, and events
>>> cat = ds.event_client.get_events()
>>> st = ds.waveform_client.get_waveforms()
>>> inv = ds.station_client.get_stations()
>>> # --- get a fetcher for more "dataset aware" querying
>>> fetcher = ds.get_fetcher()