Datasets
ObsPlus includes a few interesting datasets, which are used for testing purposes. A template with instructions is also provided if you would like to create and distribute your own.
The datasets are “lazy”: all but the most essential information is downloaded only when some code requests the dataset. This keeps the size of ObsPlus small, but a network connection is needed the first time each dataset is used. Here are a few examples of things that can be done with datasets:
Dataset basics
Loading a dataset only requires knowing its name (and having installed it, more on that later).
[1]:
import obspy
import obsplus
ds = obsplus.load_dataset("crandall_test")
The best way to access the data in a dataset is by using the desired client:
[2]:
wave_client = ds.waveform_client
station_client = ds.station_client
event_client = ds.event_client
These clients behave the same as any Client in ObsPy:
[3]:
st = wave_client.get_waveforms()
assert isinstance(st, obspy.Stream)
inv = station_client.get_stations()
assert isinstance(inv, obspy.Inventory)
cat = event_client.get_events()
assert isinstance(cat, obspy.Catalog)
A ‘Fetcher’ can be used for “dataset aware” querying.
[4]:
fetcher = ds.get_fetcher()
Each dataset is just a directory of files whose path is stored as the data_path attribute:
[5]:
ds.data_path
[5]:
PosixPath('/home/runner/opsdata/crandall_test')
The included data files are found in the source_path:
[6]:
ds.source_path
[6]:
PosixPath('/home/runner/work/obsplus/obsplus/src/obsplus/datasets/crandall_test')
Datasets can be copied with the copy_dataset function:
[7]:
from pathlib import Path
obsplus.copy_dataset("crandall_test", ".")
path = Path(".") / "crandall_test"
assert path.exists() and path.is_dir()
Data path
By default, all datasets are stored in a directory called ‘opsdata’ in the user’s home directory. Each dataset is contained in a subdirectory with the same name as the dataset. The environment variable OPSDATA_PATH can be used to change the default dataset location.
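The path rule described above can be sketched in a few lines of plain Python. The helper resolve_dataset_path is hypothetical and is not part of the ObsPlus API; it only mirrors the documented behavior (a base directory of ~/opsdata unless OPSDATA_PATH is set, plus a subdirectory named after the dataset). The custom location /tmp/my_opsdata is likewise just an illustrative value.

```python
import os
from pathlib import Path


def resolve_dataset_path(name, environ=None):
    """Hypothetical helper mirroring the documented rule (not ObsPlus API)."""
    environ = os.environ if environ is None else environ
    # Fall back to ~/opsdata when OPSDATA_PATH is not set.
    base = Path(environ.get("OPSDATA_PATH", Path.home() / "opsdata"))
    # Each dataset lives in a subdirectory named after the dataset.
    return base / name


# With no override, the dataset resolves under the home directory.
default_path = resolve_dataset_path("crandall_test", environ={})

# With OPSDATA_PATH set (illustrative value), the base directory changes.
custom_path = resolve_dataset_path(
    "crandall_test", environ={"OPSDATA_PATH": "/tmp/my_opsdata"}
)
```

In practice the variable would typically be set in the shell or in os.environ before the dataset is loaded; whether ObsPlus reads it at import time or at load time is not stated here, so setting it before importing obsplus is the cautious choice.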
Included Test Datasets
TA_test: A small dataset with two stations from the TA with channels that have very low sampling rates.
Crandall_test: Event waveforms for the Crandall Canyon Mine collapse and associated aftershocks. The dataset also includes a catalog of the events and a station inventory.
Bingham_test: Event waveforms associated with the Bingham Canyon Landslide, one of the largest anthropogenic landslides ever recorded. The dataset also includes a catalog of the events and a station inventory.
Each of these datasets is accessed via the obsplus.load_dataset function, which takes the name of the dataset as its only argument and returns a DataSet instance. Loading will take a few minutes if the dataset has not yet been downloaded; otherwise it should be very quick.
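The difference between a slow first load and fast later loads follows from the download-once pattern. The sketch below illustrates that pattern with the standard library only; the LazyDataset class, its attributes, and the placeholder file are all hypothetical and do not reflect how ObsPlus is actually implemented.

```python
import tempfile
from pathlib import Path


class LazyDataset:
    """Hypothetical stand-in for a lazily downloaded dataset (not ObsPlus code)."""

    def __init__(self, name, base_path):
        self.name = name
        self.data_path = Path(base_path) / name
        self.download_count = 0  # counts how often the expensive step runs

    def _download(self):
        # Placeholder for a real network fetch; here we only write a small file.
        self.data_path.mkdir(parents=True, exist_ok=True)
        (self.data_path / "waveforms.txt").write_text("placeholder data")
        self.download_count += 1

    def get_files(self):
        # Download only if the data directory has not been populated yet.
        if not self.data_path.exists():
            self._download()
        return sorted(p.name for p in self.data_path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    lazy = LazyDataset("example_dataset", tmp)
    first = lazy.get_files()   # triggers the one-time "download"
    second = lazy.get_files()  # served from disk, no second download
    download_count = lazy.download_count
```

The first call pays the cost of populating the directory, and every later call only reads what is already on disk, which is why only the first use of a dataset needs a network connection.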
[8]:
# cleanup temporary directory
import shutil
from pathlib import Path
path = Path("crandall_test")
if path.exists():
    shutil.rmtree(path)