obsplus.WaveBank

class obsplus.WaveBank(base_path='.', path_structure=None, name_structure=None, cache_size=5, format='mseed', ext=None, executor=None)

A class to interact with a directory of waveform files.

WaveBank recursively reads each file in a directory and creates an index to allow the files to be efficiently queried.
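The indexing idea can be illustrated with a minimal, hypothetical sketch using only the standard library. It walks a directory recursively, optionally filters by extension (like the `ext` parameter), and reports only files that are new or modified since the previous pass. The helper name `update_file_index` and its path-to-mtime index are assumptions made for illustration. WaveBank itself records richer waveform metadata (for example the network, station, and time ranges that `read_index` can filter on), and judging by the `hdf_kwargs` attribute it stores that index in an HDF5 file through PyTables.

```python
import os
import tempfile
from pathlib import Path


def update_file_index(base_path, ext=None, index=None):
    """Walk base_path recursively and return (index, updated_paths).

    Hypothetical helper, not obsplus code: `index` maps each file path to the
    modification time seen on the previous pass, so only new or modified
    files are reported as updated.
    """
    index = dict(index or {})
    updated = []
    for root, _dirs, files in os.walk(base_path):
        for fname in files:
            # Optional extension filter, analogous to WaveBank's `ext` argument.
            if ext is not None and not fname.endswith(ext):
                continue
            path = str(Path(root) / fname)
            mtime = os.path.getmtime(path)
            if index.get(path) != mtime:
                index[path] = mtime
                updated.append(path)
    return index, sorted(updated)


# Small demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    station_dir = Path(tmp) / "net" / "sta"
    station_dir.mkdir(parents=True)
    for name in ("day1.mseed", "day2.mseed", "notes.txt"):
        (station_dir / name).touch()
    first_index, first_updated = update_file_index(tmp, ext=".mseed")
    # A second pass that reuses the previous index finds nothing new.
    _, second_updated = update_file_index(tmp, ext=".mseed", index=first_index)
```

Keeping the previous index around is what makes repeated updates cheap: unchanged files are skipped instead of being re-read.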

Implements a superset of the WaveformClient interface.
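In practice, "a superset of the interface" means that code written only against `get_waveforms` also accepts a WaveBank, which simply offers more methods on top. The sketch below expresses that with `typing.Protocol`. `WaveformClientLike` and `ToyBank` are hypothetical names, not obsplus classes, and the argument list is assumed to mirror the core arguments of ObsPy's FDSN `Client.get_waveforms`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class WaveformClientLike(Protocol):
    """Hypothetical stand-in for the WaveformClient interface."""

    def get_waveforms(self, network, station, location, channel, starttime, endtime):
        ...


class ToyBank:
    """Provides get_waveforms plus extra methods, i.e. a 'superset'."""

    def get_waveforms(self, network, station, location, channel, starttime, endtime):
        return []  # a real bank would return an obspy.Stream

    def read_index(self):
        return []  # extra functionality beyond the base interface


class NotAClient:
    """Lacks get_waveforms, so it does not satisfy the protocol."""
```

Note that `runtime_checkable` only checks that the method exists, not its signature.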

Parameters:
  • base_path (str) – The path to the directory containing waveform files. If it does not exist, an empty directory will be created.

  • path_structure (str) –

    Defines the directory structure the wavebank uses when putting waveforms into the directory. Directory levels are separated by /, regardless of operating system. The following words can be used in curly braces as data-specific variables:

    year, month, day, julday, hour, minute, second, network, station, location, channel, time

    Example: streams/{year}/{month}/{day}/{network}/{station}. If no structure is provided, it will be read from the index; if no index exists, the default is {net}/{sta}/{chan}/{year}/{month}/{day}.

  • name_structure (str) – The same as path_structure, but for the file name. Supports the same variables but requires a period as the separation character. The default extension (.mseed) will be added. The default is {time}. Example: {seedid}.{time}

  • cache_size (int) – The number of queries to store. Avoids having to read the index of the bank multiple times for queries involving the same start and end times.

  • format (str) – The expected format for the waveform files. Any format supported by obspy.read is permitted. The default is mseed. Other formats will be tried after the default parser fails.

  • ext (str or None) – The extension of the waveform files. If provided, only files with this extension will be read.

  • executor (Optional[Executor]) – An executor with the same interface as concurrent.futures.Executor; its map method will be used for reading files and updating indices.
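To see how path_structure and name_structure combine into a file location, here is a minimal sketch that fills a template from a trace's codes and start time. The helpers `expand_structure` and `waveform_path` are hypothetical; the zero padding and the format used for `{time}` are assumptions and may not match what obsplus actually writes. Only the long variable names listed above are supported.

```python
from datetime import datetime


def expand_structure(template, network, station, location, channel, start):
    """Fill a template such as '{year}/{month}/{day}/{network}/{station}'.

    Hypothetical helper: padding and the `time` format are assumptions.
    """
    values = {
        "year": f"{start.year:04d}",
        "month": f"{start.month:02d}",
        "day": f"{start.day:02d}",
        "julday": f"{start.timetuple().tm_yday:03d}",
        "hour": f"{start.hour:02d}",
        "minute": f"{start.minute:02d}",
        "second": f"{start.second:02d}",
        "network": network,
        "station": station,
        "location": location,
        "channel": channel,
        "time": start.strftime("%Y-%m-%dT%H-%M-%S"),
    }
    return template.format(**values)


def waveform_path(path_structure, name_structure, ext=".mseed", **kwargs):
    """Join the expanded directory and file name; levels always use '/'."""
    directory = expand_structure(path_structure, **kwargs)
    name = expand_structure(name_structure, **kwargs)
    return f"{directory}/{name}{ext}"


start = datetime(2017, 9, 18, 1, 2, 3)
example = waveform_path(
    "streams/{year}/{month}/{day}/{network}/{station}",
    "{time}",
    network="UU", station="NOQ", location="", channel="ENZ", start=start,
)
```

Here `example` is `"streams/2017/09/18/UU/NOQ/2017-09-18T01-02-03.mseed"`.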

Examples

>>> # --- Create a `WaveBank` from a path to a directory with waveform files.
>>> import obsplus
>>> import obspy
>>> waveform_path = obsplus.copy_dataset('default_test').waveform_path
>>> # init a WaveBank and index the files.
>>> wbank = obsplus.WaveBank(waveform_path).update_index()
>>> # --- Retrieve a stream object from the bank.
>>> # Load all Z component data (don't do this for large datasets!)
>>> st = wbank.get_waveforms(channel='*Z')
>>> assert isinstance(st, obspy.Stream) and len(st) == 1
>>> # --- Read the index used by WaveBank as a DataFrame.
>>> df = wbank.read_index()
>>> assert len(df) == 3, 'there should be 3 traces in the bank.'
>>> # --- Get availability of archive as dataframe
>>> avail = wbank.get_availability_df()
>>> # --- Get table of gaps in the archive
>>> gaps_df = wbank.get_gaps_df()
>>> # --- Yield contiguous 5 s streams with 1 s of overlap (6 s per stream)
>>> # get input parameters
>>> t1, t2 = avail.iloc[0]['starttime'], avail.iloc[0]['endtime']
>>> kwargs = dict(starttime=t1, endtime=t2, duration=5, overlap=1)
>>> # init list for storing output
>>> out = []
>>> for st in wbank.yield_waveforms(**kwargs):
...     out.append(st)
>>> assert len(out) == 6
>>> # --- Put a new stream into the bank
>>> # get waveforms from another dataset
>>> ds = obsplus.load_dataset('bingham_test')
>>> query_kwargs = dict(station='NOQ', channel='*Z')
>>> new_st = ds.waveform_client.get_waveforms(**query_kwargs)
>>> assert len(new_st)
>>> wbank.put_waveforms(new_st)
>>> st2 = wbank.get_waveforms(channel='*Z')
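>>> # the new NOQ data should now be returned along with the original traces
>>> assert 'NOQ' in {tr.stats.station for tr in st2}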
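
The window arithmetic behind the yield_waveforms example (duration=5, overlap=1) can be sketched as follows. This is a hypothetical helper, not obsplus code. It assumes each window starts `duration` seconds after the previous one, is extended by `overlap` seconds at its end, and that the last window is clipped to the requested end time. Plain float seconds are used so the sketch runs without ObsPy; `obspy.UTCDateTime` values should also work, since they support adding seconds and comparisons.

```python
def window_bounds(starttime, endtime, duration, overlap=0.0):
    """Return (start, end) pairs for consecutive windows of `duration` seconds.

    Each window is extended by `overlap` seconds; the final window is clipped
    to `endtime` (the clipping behaviour is an assumption).
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    bounds = []
    start = starttime
    while start < endtime:
        end = min(start + duration + overlap, endtime)
        bounds.append((start, end))
        start += duration
    return bounds


windows = window_bounds(0.0, 30.0, duration=5, overlap=1)
```

For a 30 s span this gives six windows, the first being `(0.0, 6.0)`, so neighbouring windows share 1 s of data.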

__init__(base_path='.', path_structure=None, name_structure=None, cache_size=5, format='mseed', ext=None, executor=None)

Methods

__init__([base_path, path_structure, ...])

availability([network, station, location, ...])

Get availability for a given group of instruments.

clear_cache()

Clear the index cache if the bank is using one.

ensure_bank_path_exists([create])

Ensure the bank_path exists, else raise a BankDoesNotExistError.

get_availability_df(*args, **kwargs)

Return a dataframe specifying the availability of the archive.

get_gaps_df(*args[, min_gap])

Return a dataframe containing an entry for every gap in the archive.

get_progress_bar([bar])

Return a progress bar instance based on bar parameter.

get_segments_df(*args, **kwargs)

Return a dataframe of contiguous segments for the selected channels.

get_service_version()

Return the version of obsplus used to create index.

get_uptime_df(*args, **kwargs)

Return a dataframe with uptime stats for selected channels.

get_waveforms([network, station, location, ...])

Get waveforms from the bank.

get_waveforms_bulk(bulk[, index])

Get a large number of waveforms with a bulk request.

load_example_bank([dataset, path])

Create an example bank which is safe to modify.

put_waveforms(stream[, name, update_index])

Add the waveforms in a stream to the bank.

read_index([network, station, location, ...])

Return a dataframe of the index, optionally applying filters.

update_index([bar, paths])

Iterate files in bank and add any modified since last update to index.

yield_waveforms([network, station, ...])

Yield time-series segments.
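To make the idea behind get_gaps_df and its min_gap argument concrete, here is a minimal sketch of gap detection for a single channel. `find_gaps` is a hypothetical helper that works on (start, end) pairs in seconds; touching or overlapping segments are not treated as gaps, and gaps no longer than `min_gap` are dropped. The exact threshold semantics and output columns of the real method may differ.

```python
def find_gaps(segments, min_gap=0.0):
    """Return (gap_start, gap_end) pairs between time segments of one channel.

    Hypothetical helper: segments are sorted by start time, and the running
    maximum end time handles overlapping segments.
    """
    gaps = []
    latest_end = None
    for start, end in sorted(segments):
        if latest_end is not None and start - latest_end > min_gap:
            gaps.append((latest_end, start))
        latest_end = end if latest_end is None else max(latest_end, end)
    return gaps


segments = [(0, 10), (10, 20), (25, 40), (35, 50), (50.5, 60)]
all_gaps = find_gaps(segments)
big_gaps = find_gaps(segments, min_gap=1.0)
```

Here `all_gaps` contains the 5 s hole `(20, 25)` and the 0.5 s hole `(50, 50.5)`, while `big_gaps` keeps only `(20, 25)`.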

Attributes

bank_path

buffer

columns_no_path

executor

ext

hdf_kwargs

A dict of hdf_kwargs to pass to PyTables.

index_columns

index_ints

index_name

index_path

Return the expected path to the index file.

index_str

last_updated

Get the last time (UTC) that the bank was updated.

last_updated_timestamp

Return the last modified time stored in the index, else None.

metadata_columns

min_itemsize

name_structure

namespace

path_structure