obsplus.bank.wavebank module

A local database for waveform formats.

class obsplus.bank.wavebank.WaveBank(base_path='.', path_structure=None, name_structure=None, cache_size=5, format='mseed', ext=None, executor=None)[source]

Bases: _Bank

A class to interact with a directory of waveform files.

WaveBank recursively reads each file in a directory and creates an index to allow the files to be efficiently queried.

Implements a superset of the WaveformClient interface.

Parameters:
  • base_path (str) – The path to the directory containing waveform files. If it does not exist an empty directory will be created.

  • path_structure (str) –

    Define the directory structure of the wavebank that will be used to put waveforms into the directory. Characters are separated by /, regardless of operating system. The following words can be used in curly braces as data specific variables:

    year, month, day, julday, hour, minute, second, network, station, location, channel, time

    Example: streams/{year}/{month}/{day}/{network}/{station}

    If no structure is provided, it will be read from the index. If no index exists, the default is {net}/{sta}/{chan}/{year}/{month}/{day}

  • name_structure (str) – The same as path_structure but for the file name. Supports the same variables but requires a period as the separation character. The default extension (.mseed) will be added. The default is {time}. Example: {seedid}.{time}

  • cache_size (int) – The number of queries to store. Avoids having to read the index of the bank multiple times for queries involving the same start and end times.

  • format (str) – The expected format for the waveform files. Any format supported by obspy.read is permitted. The default is mseed. Other formats will be tried after the default parser fails.

  • ext (str or None) – The extension of the waveform files. If provided, only files with this extension will be read.

  • executor (Optional[Executor]) – An executor with the same interface as concurrent.futures.Executor. The executor's map method will be used for reading files and updating indices.
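The path_structure and name_structure templates read like ordinary Python format strings. The sketch below is not WaveBank's own code. It uses a hypothetical helper, expand_structure, and assumes zero-padded numeric fields and a particular rendering of {time}, to show how one trace could map to a file path.

```python
from datetime import datetime, timezone


def expand_structure(path_structure, name_structure, seed_id, start, ext=".mseed"):
    """Hypothetical helper: fill both templates for a single trace."""
    network, station, location, channel = seed_id.split(".")
    fields = dict(
        year=f"{start.year:04d}",
        month=f"{start.month:02d}",
        day=f"{start.day:02d}",
        julday=f"{start.timetuple().tm_yday:03d}",
        hour=f"{start.hour:02d}",
        minute=f"{start.minute:02d}",
        second=f"{start.second:02d}",
        network=network,
        station=station,
        location=location,
        channel=channel,
        # Assumed rendering of {time}; the real bank may format it differently.
        time=start.strftime("%Y-%m-%dT%H-%M-%S"),
    )
    # Directories are always joined with "/", regardless of operating system.
    directory = path_structure.format(**fields)
    file_name = name_structure.format(**fields) + ext
    return f"{directory}/{file_name}"


start = datetime(2017, 9, 18, 0, 0, 0, tzinfo=timezone.utc)
path = expand_structure(
    "streams/{year}/{month}/{day}/{network}/{station}",
    "{time}",
    "UU.NOQ.01.HHZ",
    start,
)
print(path)
```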

Examples

>>> # --- Create a `WaveBank` from a path to a directory with waveform files.
>>> import obsplus
>>> import obspy
>>> waveform_path = obsplus.copy_dataset('default_test').waveform_path
>>> # init a WaveBank and index the files.
>>> wbank = obsplus.WaveBank(waveform_path).update_index()
>>> # --- Retrieve a stream object from the bank.
>>> # Load all Z component data (don't do this for large datasets!)
>>> st = wbank.get_waveforms(channel='*Z')
>>> assert isinstance(st, obspy.Stream) and len(st) == 1
>>> # --- Read the index used by WaveBank as a DataFrame.
>>> df = wbank.read_index()
>>> assert len(df) == 3, 'there should be 3 traces in the bank.'
>>> # --- Get availability of archive as dataframe
>>> avail = wbank.get_availability_df()
>>> # --- Get table of gaps in the archive
>>> gaps_df = wbank.get_gaps_df()
>>> # --- yield 5 sec contiguous streams with 1 sec overlap (6 sec total)
>>> # get input parameters
>>> t1, t2 = avail.iloc[0]['starttime'], avail.iloc[0]['endtime']
>>> kwargs = dict(starttime=t1, endtime=t2, duration=5, overlap=1)
>>> # init list for storing output
>>> out = []
>>> for st in wbank.yield_waveforms(**kwargs):
...     out.append(st)
>>> assert len(out) == 6
>>> # --- Put a new stream into the bank
>>> # get a stream from another dataset
>>> ds = obsplus.load_dataset('bingham_test')
>>> query_kwargs = dict(station='NOQ', channel='*Z')
>>> new_st = ds.waveform_client.get_waveforms(**query_kwargs)
>>> assert len(new_st)
>>> wbank.put_waveforms(new_st)
>>> st2 = wbank.get_waveforms(channel='*Z')
>>> # the bank now holds the original Z trace plus the new NOQ data
>>> assert len(st2) >= 2
availability(network=None, station=None, location=None, channel=None)[source]

Get availability for a given group of instruments.

Parameters:
  • network (Optional[str]) – The network code.

  • station (Optional[str]) – The station code.

  • location (Optional[str]) – The location code.

  • channel (Optional[str]) – The channel code.

Return type:

List[Tuple[str, str, str, str, UTCDateTime, UTCDateTime]]

buffer = numpy.timedelta64(1000000000,'ns')
columns_no_path = ('network', 'station', 'location', 'channel', 'starttime', 'endtime', 'sampling_period')
get_availability_df(*args, **kwargs)[source]

Return a dataframe specifying the availability of the archive.

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

Return type:

DataFrame
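As a rough illustration of what an availability summary contains, the sketch below reduces index-like rows to one row per channel. It assumes that availability means the earliest starttime and the latest endtime per channel, with gaps ignored. The helper summarize_availability is hypothetical and is not part of obsplus.

```python
def summarize_availability(index_rows):
    """Hypothetical helper: earliest start and latest end per channel."""
    spans = {}
    for network, station, location, channel, start, end in index_rows:
        key = (network, station, location, channel)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    # One tuple per channel: (network, station, location, channel, start, end).
    return [key + span for key, span in sorted(spans.items())]


# Times are plain floats (seconds) to keep the sketch dependency-free.
rows = [
    ("UU", "NOQ", "01", "HHZ", 0.0, 60.0),
    ("UU", "NOQ", "01", "HHZ", 90.0, 120.0),
    ("UU", "NOQ", "01", "HHN", 10.0, 50.0),
]
availability = summarize_availability(rows)
```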

get_gaps_df(*args, min_gap=None, **kwargs)[source]

Return a dataframe containing an entry for every gap in the archive.

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

  • min_gap (Union[float, timedelta64, None]) –

    The minimum gap to report in seconds or as a timedelta64.

    If None, use 1.5 x the sampling period of each channel.

Return type:

DataFrame
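The role of min_gap can be sketched in plain Python. The hypothetical helper below is not the obsplus implementation. It assumes the default threshold is 1.5 times the sample spacing (the sampling_period stored in the index) and that a gap is reported only when it is strictly larger than the threshold.

```python
def find_gaps(segments, sampling_period, min_gap=None):
    """Hypothetical helper: return (gap_start, gap_end) pairs for one channel.

    segments is an iterable of (starttime, endtime) pairs in seconds.
    """
    if min_gap is None:
        # Assumed default: 1.5 times the spacing between samples.
        min_gap = 1.5 * sampling_period
    ordered = sorted(segments)
    if not ordered:
        return []
    gaps = []
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start - latest_end > min_gap:
            gaps.append((latest_end, start))
        latest_end = max(latest_end, end)
    return gaps


segments = [(0.0, 10.0), (10.01, 20.0), (25.0, 30.0)]
gaps = find_gaps(segments, sampling_period=0.01)
```

With a 100 Hz channel, the one-sample step between the first two segments is not reported, while the 5 s hole is.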

get_segments_df(*args, **kwargs)[source]

Return a dataframe of contiguous segments for the selected channels

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

  • min_gap – The minimum gap to report in seconds or as a timedelta64. If None, use 1.5 x the sampling period of each channel.

Return type:

DataFrame
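Contiguous segments are the complement of gaps: index entries separated by no more than min_gap are joined into one segment. The sketch below illustrates that idea with a hypothetical helper, merge_segments. The default threshold of 1.5 times the sample spacing is an assumption, and this is not the library's code.

```python
def merge_segments(segments, sampling_period, min_gap=None):
    """Hypothetical helper: join (start, end) pairs separated by <= min_gap."""
    if min_gap is None:
        # Assumed default: 1.5 times the spacing between samples.
        min_gap = 1.5 * sampling_period
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= min_gap:
            # Close enough to the previous segment: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


contiguous = merge_segments(
    [(0.0, 10.0), (10.01, 20.0), (25.0, 30.0)], sampling_period=0.01
)
```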

get_uptime_df(*args, **kwargs)[source]

Return a dataframe with uptime stats for selected channels.

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

  • min_gap – The minimum gap to report in seconds or as a timedelta64. If None, use 1.5 x the sampling period of each channel.

Return type:

DataFrame
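Uptime statistics follow directly from a channel's time span and its gaps. The sketch below shows the arithmetic with a hypothetical helper, uptime_stats. The key names are illustrative only and may not match the columns of the returned dataframe.

```python
def uptime_stats(starttime, endtime, gaps):
    """Hypothetical helper: summarize uptime for one channel (times in seconds)."""
    duration = endtime - starttime
    gap_duration = sum(gap_end - gap_start for gap_start, gap_end in gaps)
    uptime = duration - gap_duration
    return {
        "duration": duration,
        "gap_duration": gap_duration,
        "uptime": uptime,
        # Fraction of the span that is covered by data.
        "availability": uptime / duration if duration > 0 else 0.0,
        "gap_count": len(gaps),
    }


stats = uptime_stats(0.0, 100.0, [(20.0, 25.0), (50.0, 70.0)])
```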

get_waveforms(network=None, station=None, location=None, channel=None, starttime=None, endtime=None)[source]

Get waveforms from the bank.

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

Return type:

Stream

Notes

All string parameters support Unix shell-style wildcard matching with the * and ? characters. All data points between the selected starttime and endtime are returned, so the returned stream may contain gaps.
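The wildcard behavior can be previewed with Python's standard fnmatch module, where * matches any run of characters and ? matches exactly one. This is only an illustration of the matching rules; whether WaveBank uses fnmatch internally is not stated here, and the helper matches is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, code):
    """Hypothetical helper: None means no filter is applied."""
    return pattern is None or fnmatchcase(code, pattern)


channels = ["HHZ", "HHN", "HHE", "BHZ", "EHZ"]
vertical = [c for c in channels if matches("*Z", c)]
high_broadband = [c for c in channels if matches("HH?", c)]
everything = [c for c in channels if matches(None, c)]
```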

get_waveforms_bulk(bulk, index=None, **kwargs)[source]

Get a large number of waveforms with a bulk request.

Parameters:
  • bulk (Union[Sequence[Tuple[str, str, str, str, Union[str, UTCDateTime, float, datetime64, Timestamp], Union[str, UTCDateTime, float, datetime64, Timestamp]]], DataFrame]) – A list of any number of lists containing the following: (network, station, location, channel, starttime, endtime).

  • index (Optional[DataFrame]) – A dataframe returned by read_index. Enables calling code to only read the index from disk once for repetitive calls.

Return type:

Stream
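A bulk request is a sequence of six-element entries. The sketch below builds one from a list of SEED ids and time windows; build_bulk is a hypothetical helper, and the station codes and times are made up for illustration. Float timestamps are used because the signature above lists float as an accepted time type.

```python
from itertools import product


def build_bulk(seed_ids, windows):
    """Hypothetical helper: one bulk entry per (seed id, time window) pair."""
    bulk = []
    for seed_id, (starttime, endtime) in product(seed_ids, windows):
        network, station, location, channel = seed_id.split(".")
        bulk.append((network, station, location, channel, starttime, endtime))
    return bulk


bulk = build_bulk(
    ["UU.NOQ.01.HHZ", "UU.TMU.01.HHZ"],
    [(0.0, 60.0), (3600.0, 3660.0)],
)
# The list could then be passed to an existing bank, for example:
# st = wbank.get_waveforms_bulk(bulk)
```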

property hdf_kwargs: dict

A dict of hdf_kwargs to pass to PyTables

index_columns = ('network', 'station', 'location', 'channel', 'starttime', 'endtime', 'sampling_period', 'path')
index_ints = ('starttime', 'endtime', 'sampling_period')
index_str = ('network', 'station', 'location', 'channel')
property last_updated_timestamp: float | None

Return the last modified time stored in the index, else None.

metadata_columns = ['last_updated', 'path_structure', 'name_structure']
min_itemsize = {'channel': 8, 'location': 8, 'network': 8, 'path': 79, 'station': 8}
namespace = '/waveforms'
put_waveforms(stream, name=None, update_index=True)[source]

Add the waveforms in a stream to the bank.

Parameters:
  • stream (Union[Stream, Trace]) – An obspy Stream or Trace object to add to the bank

  • name – Name of file, if None it will be determined based on contents

  • update_index – Flag to indicate whether or not to update the waveform index after writing the new waveforms. Default is True.

read_index(network=None, station=None, location=None, channel=None, starttime=None, endtime=None, **kwargs)[source]

Return a dataframe of the index, optionally applying filters.

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

  • kwargs – kwargs are passed to pandas.read_hdf function

Return type:

DataFrame

update_index(bar=None, paths=None)[source]

Iterate files in bank and add any modified since last update to index.

Parameters:
  • bar –

    Controls whether a progress bar is used for this function call. The accepted values are:

    False - Don’t use a progress bar.
    None - Use the default progress bar.
    ProgressBar - Use a custom progress bar implementation.

    A custom progress bar must have an update and a finish method.

  • paths – A str, or iterable of str, specifying subdirectories (relative to the bank path) so that only files in specific directories of the bank are updated. This is useful for large banks which have files added to them in predictable locations. However, files added outside of these locations may not get indexed, because the bank's timestamp recording the last time of indexing is still updated.
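A custom progress bar only needs the two methods named above. The class below is a minimal sketch; CountingBar is a hypothetical name, and the argument passed to update (assumed here to be the number of files processed so far) is an assumption, as is whether the bank expects a class or an instance.

```python
class CountingBar:
    """Hypothetical progress bar exposing update and finish methods."""

    def __init__(self):
        self.count = 0
        self.finished = False

    def update(self, value):
        # Assumed signature: value is the number of files processed so far.
        self.count = value

    def finish(self):
        self.finished = True


# Simulate the calls a bank might make while indexing three files.
bar = CountingBar()
for processed in range(1, 4):
    bar.update(processed)
bar.finish()
```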

yield_waveforms(network=None, station=None, location=None, channel=None, starttime=None, endtime=None, duration=3600.0, overlap=None)[source]

Yield time-series segments.

Parameters:
  • network (str) – The network code

  • station (str) – The station code

  • location (str) – The location code

  • channel (str) – The channel code

  • starttime (float or obspy.UTCDateTime) – The desired starttime of the waveforms

  • endtime (float or obspy.UTCDateTime) – The desired endtime of the waveforms

  • duration (float) – The duration of the streams to yield. All selected channels will be included in each stream.

  • overlap (float) – The amount of overlap between consecutive yielded streams, added to the end of each stream.

Return type:

Stream

Notes

All string parameters can use posix style matching with * and ? chars.

Total duration of yielded streams = duration + overlap.
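The relationship between duration and overlap can be sketched as a window generator. The helper time_windows below is hypothetical and is not the obsplus implementation. It assumes windows start every duration seconds, extend by overlap at the end, and are clipped to endtime.

```python
def time_windows(starttime, endtime, duration, overlap=None):
    """Hypothetical helper: list (start, stop) windows in seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    overlap = overlap or 0.0
    windows = []
    start = starttime
    while start < endtime:
        # Each window spans duration + overlap, clipped to the requested end.
        stop = min(start + duration + overlap, endtime)
        windows.append((start, stop))
        start += duration
    return windows


windows = time_windows(0.0, 30.0, duration=5.0, overlap=1.0)
```

Under these assumptions, 30 s of data with duration=5 and overlap=1 gives six windows, each 6 s long except the last, which is clipped.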