obsplus.WaveBank
- class obsplus.WaveBank(base_path='.', path_structure=None, name_structure=None, cache_size=5, format='mseed', ext=None, executor=None)
A class to interact with a directory of waveform files.
WaveBank recursively reads each file in a directory and creates an index to allow the files to be efficiently queried.
Implements a superset of the WaveformClient interface.
- Parameters:
base_path (str) – The path to the directory containing waveform files. If it does not exist, an empty directory will be created.
path_structure (str) – Defines the directory structure used when putting waveforms into the bank. Path components are separated by /, regardless of operating system. The following words can be used in curly braces as data-specific variables:
year, month, day, julday, hour, minute, second, network, station, location, channel, time
Example: streams/{year}/{month}/{day}/{network}/{station}
If no structure is provided, it will be read from the index; if no index exists, the default is {net}/{sta}/{chan}/{year}/{month}/{day}
name_structure (str) – The same as path_structure, but for the file name. Supports the same variables but requires a period as the separator. The default extension (.mseed) will be added. The default is {time}. Example: {seedid}.{time}
cache_size (int) – The number of queries to store. Avoids having to read the index of the bank multiple times for queries involving the same start and end times.
format (str) – The expected format for the waveform files. Any format supported by obspy.read is permitted. The default is mseed. Other formats will be tried if the default parser fails.
ext (str or None) – The extension of the waveform files. If provided, only files with this extension will be read.
executor (Optional[Executor]) – An executor with the same interface as concurrent.futures.Executor; the map method of the executor will be used for reading files and updating indices.
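To make the path_structure and name_structure templates more concrete, the following is a minimal sketch of how such a template could be filled in with Python's str.format. The helper expand_structure, the SEED id XX.STA1.00.HHZ, the zero-padding of the date fields, and the rendering of {time} are all assumptions made for illustration; they are not taken from obsplus and may differ from what WaveBank actually writes.

```python
from datetime import datetime, timezone


def expand_structure(template: str, seed_id: str, when: datetime) -> str:
    """Fill a path or name template with values for one trace.

    Hypothetical helper for illustration only; not part of obsplus.
    """
    network, station, location, channel = seed_id.split(".")
    values = {
        # Zero-padded date fields are an assumption of this sketch.
        "year": f"{when.year:04d}",
        "month": f"{when.month:02d}",
        "day": f"{when.day:02d}",
        "julday": f"{when.timetuple().tm_yday:03d}",
        "hour": f"{when.hour:02d}",
        "minute": f"{when.minute:02d}",
        "second": f"{when.second:02d}",
        "network": network,
        "station": station,
        "location": location,
        "channel": channel,
        # Assumed rendering of {time}; obsplus may format it differently.
        "time": when.strftime("%Y-%m-%dT%H-%M-%S"),
    }
    return template.format(**values)


# A made-up trace start time and SEED id.
when = datetime(2017, 9, 18, 12, 30, 5, tzinfo=timezone.utc)
path = expand_structure(
    "streams/{year}/{month}/{day}/{network}/{station}", "XX.STA1.00.HHZ", when
)
name = expand_structure("{network}.{station}.{time}", "XX.STA1.00.HHZ", when)
print(path)
print(name)
```

The directory template uses / as the separator and the file-name template uses periods, mirroring the two parameter descriptions above.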
Examples
>>> # --- Create a `WaveBank` from a path to a directory with waveform files.
>>> import obsplus
>>> import obspy
>>> waveform_path = obsplus.copy_dataset('default_test').waveform_path
>>> # init a WaveBank and index the files.
>>> wbank = obsplus.WaveBank(waveform_path).update_index()

>>> # --- Retrieve a stream object from the bank.
>>> # Load all Z component data (don't do this for large datasets!)
>>> st = wbank.get_waveforms(channel='*Z')
>>> assert isinstance(st, obspy.Stream) and len(st) == 1

>>> # --- Read the index used by WaveBank as a DataFrame.
>>> df = wbank.read_index()
>>> assert len(df) == 3, 'there should be 3 traces in the bank.'

>>> # --- Get availability of archive as dataframe
>>> avail = wbank.get_availability_df()

>>> # --- Get table of gaps in the archive
>>> gaps_df = wbank.get_gaps_df()

>>> # --- yield 5 sec contiguous streams with 1 sec overlap (6 sec total)
>>> # get input parameters
>>> t1, t2 = avail.iloc[0]['starttime'], avail.iloc[0]['endtime']
>>> kwargs = dict(starttime=t1, endtime=t2, duration=5, overlap=1)
>>> # init list for storing output
>>> out = []
>>> for st in wbank.yield_waveforms(**kwargs):
...     out.append(st)
>>> assert len(out) == 6

>>> # --- Put a new stream into the bank
>>> # get an event from another dataset, keep track of its id
>>> ds = obsplus.load_dataset('bingham_test')
>>> query_kwargs = dict(station='NOQ', channel='*Z')
>>> new_st = ds.waveform_client.get_waveforms(**query_kwargs)
>>> assert len(new_st)
>>> wbank.put_waveforms(new_st)
>>> st2 = wbank.get_waveforms(channel='*Z')
>>> assert len(new_st) + 2
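The duration and overlap arguments passed to yield_waveforms in the examples above can be read as window arithmetic: each window starts duration seconds after the previous one and is extended by overlap seconds. The sketch below illustrates that reading with plain floats. The helper window_bounds and the 30-second span are hypothetical, and clipping the last window to the end time is an assumption; the exact boundaries obsplus produces may differ.

```python
def window_bounds(start: float, end: float, duration: float, overlap: float = 0.0):
    """Yield (window_start, window_end) pairs covering [start, end].

    Windows advance by `duration` seconds and each is extended by `overlap`
    seconds, clipped to `end`. Hypothetical helper, not obsplus code.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    t = start
    while t < end:
        yield t, min(t + duration + overlap, end)
        t += duration


# 5-second windows with 1 second of overlap over a made-up 30-second span.
windows = list(window_bounds(0.0, 30.0, duration=5.0, overlap=1.0))
for w_start, w_end in windows:
    print(w_start, w_end)
```

Under these assumptions every window except the last spans 6 seconds (5 seconds of duration plus 1 second of overlap).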
- __init__(base_path='.', path_structure=None, name_structure=None, cache_size=5, format='mseed', ext=None, executor=None)
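The executor argument of the constructor is described as following the concurrent.futures.Executor interface, with its map method used for reading files. The sketch below shows what that map call pattern looks like with a standard-library ThreadPoolExecutor. The function file_size is a hypothetical stand-in for a per-file read, and the temporary files are placeholders rather than real waveform data.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_size(path: Path) -> int:
    """Stand-in for a per-file read; returns the file size in bytes."""
    return path.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    # Create a few placeholder files of increasing size.
    paths = []
    for i in range(4):
        p = Path(tmp) / f"file_{i}.mseed"
        p.write_bytes(b"x" * (i + 1))
        paths.append(p)

    # Executor.map applies the function to each path and returns the
    # results in input order, even though the calls run concurrently.
    with ThreadPoolExecutor(max_workers=2) as executor:
        sizes = list(executor.map(file_size, paths))

print(sizes)
```

With obsplus installed, an executor like this one would be passed as WaveBank(base_path, executor=executor) and kept open while update_index() runs.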
Methods
__init__([base_path, path_structure, ...])
availability([network, station, location, ...]) – Get availability for a given group of instruments.
clear_cache() – Clear the index cache if the bank is using one.
ensure_bank_path_exists([create]) – Ensure the bank_path exists, else raise a BankDoesNotExistError.
get_availability_df(*args, **kwargs) – Return a dataframe specifying the availability of the archive.
get_gaps_df(*args[, min_gap]) – Return a dataframe containing an entry for every gap in the archive.
get_progress_bar([bar]) – Return a progress bar instance based on bar parameter.
get_segments_df(*args, **kwargs) – Return a dataframe of contiguous segments for the selected channels.
get_service_version() – Return the version of obsplus used to create index.
get_uptime_df(*args, **kwargs) – Return a dataframe with uptime stats for selected channels.
get_waveforms([network, station, location, ...]) – Get waveforms from the bank.
get_waveforms_bulk(bulk[, index]) – Get a large number of waveforms with a bulk request.
load_example_bank([dataset, path]) – Create an example bank which is safe to modify.
put_waveforms(stream[, name, update_index]) – Add the waveforms in a stream to the bank.
read_index([network, station, location, ...]) – Return a dataframe of the index, optionally applying filters.
update_index([bar, paths]) – Iterate files in bank and add any modified since last update to index.
yield_waveforms([network, station, ...]) – Yield time-series segments.
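As an illustration of the kind of result get_gaps_df describes (one entry per gap, with an optional min_gap), the sketch below finds gaps between time segments of a single channel using plain Python. The helper find_gaps and the sample segments are hypothetical, and treating min_gap as a threshold in seconds below which gaps are ignored is an assumption; this is not how obsplus computes its gap table.

```python
def find_gaps(segments, min_gap=0.0):
    """Return (gap_start, gap_end) pairs between segments of one channel.

    segments: iterable of (start, end) times in seconds.
    min_gap: gaps no longer than this are ignored (assumed semantics).
    Hypothetical helper, not obsplus code.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(segments):
        if covered_until is not None and start - covered_until > min_gap:
            gaps.append((covered_until, start))
        # Track the latest end time seen so overlapping segments are handled.
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


# Made-up segments: a 5-second gap after 20 s and a 0.5-second gap after 40 s.
segments = [(0.0, 10.0), (10.0, 20.0), (25.0, 40.0), (40.5, 50.0)]
all_gaps = find_gaps(segments)
long_gaps = find_gaps(segments, min_gap=1.0)
print(all_gaps)
print(long_gaps)
```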
Attributes
bank_path
buffer
columns_no_path
executor
ext
hdf_kwargs – A dict of hdf_kwargs to pass to PyTables.
index_columns
index_ints
index_name
index_path – Return the expected path to the index file.
index_str
last_updated – Get the last time (UTC) that the bank was updated.
last_updated_timestamp – Return the last modified time stored in the index, else None.
metadata_columns
min_itemsize
name_structure
namespace
path_structure