WaveBank¶

WaveBank is an in-process database for accessing seismic time-series data. Any directory structure containing ObsPy-readable waveforms can be used as the data source. WaveBank uses a simple indexing scheme and the Hierarchical Data Format to keep track of each Trace in the directory. Without WaveBank (or a similar program), applications have to implement their own data organization and access logic, which is tedious and clutters application code. WaveBank provides a better way.

Load Example Data¶

This tutorial will demonstrate the use of WaveBank on two different obsplus datasets.

The first dataset, Crandall Canyon, only has event waveform files. The second only has continuous data from two TA stations. We start by loading these datasets, making temporary copies, and getting the paths to their waveform directories.

[1]:

%%capture
import obsplus

# make sure datasets are downloaded and copy them to temporary
# directories to make sure no accidental changes are made
crandall_dataset = obsplus.load_dataset('crandall_test').copy()
ta_dataset = obsplus.load_dataset('ta_test').copy()

# get path to waveform directories
crandall_path = crandall_dataset.waveform_path
ta_path = ta_dataset.waveform_path

[2]:

crandall_path

[2]:

PosixPath('/home/runner/opsdata/crandall_test/waveforms')

Create a WaveBank object¶

To create a WaveBank instance, simply pass the class a path to the waveform directory.

[3]:

bank = obsplus.WaveBank(crandall_path)

Calling the update_index method on the bank ensures the index is up-to-date. It iterates through all files with timestamps later than the last time update_index was run.

Note: If the index has not yet been created or new files have been added, update_index needs to be called.

[4]:

bank.update_index()

[4]:

WaveBank(base_path=/home/runner/opsdata/crandall_test/waveforms)
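The incremental behaviour described above (only visiting files timestamped after the previous update) can be illustrated with a small standard-library sketch. This is not WaveBank's actual implementation; the helper name find_files_newer_than and the use of file modification times are assumptions made for illustration.

```python
import os
import tempfile
import time
from pathlib import Path


def find_files_newer_than(base_path, last_update):
    """Return files under base_path modified after last_update (epoch seconds).

    Hypothetical helper: it only illustrates the idea of an incremental scan.
    """
    new_files = []
    for path in sorted(Path(base_path).rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_update:
            new_files.append(path)
    return new_files


# Demonstrate with a throwaway directory holding one old and one new file.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.mseed"
    new_file = Path(tmp) / "new.mseed"
    old_file.write_bytes(b"")
    new_file.write_bytes(b"")

    last_update = time.time() - 100  # pretend the index was updated 100 s ago
    os.utime(old_file, (last_update - 50, last_update - 50))  # older than index
    os.utime(new_file, (last_update + 50, last_update + 50))  # newer than index

    found_names = [p.name for p in find_files_newer_than(tmp, last_update)]
    print(found_names)
```

Only the file modified after the recorded update time is returned, which is why files added to the directory later require another call to update_index before they become visible.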

Get waveforms¶

Waveforms can be retrieved from the directory with the get_waveforms method. This method has the same signature as the get_waveforms methods of ObsPy clients, so they can be used interchangeably:

[5]:

import obspy

t1 = obspy.UTCDateTime('2007-08-06T01-44-48')
t2 = t1 + 60
st = bank.get_waveforms(starttime=t1, endtime=t2)
print(st[:5])  # print first 5 traces

5 Trace(s) in Stream:
TA.O15A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples
TA.O15A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples
TA.O15A..BHZ | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples
TA.O16A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples
TA.O16A..BHN | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples

WaveBank can filter on channels, locations, stations, networks, etc. using Unix-style wildcard strings (such as ? and *) or regex.

[6]:

st2 = bank.get_waveforms(network='UU', starttime=t1, endtime=t2)

# ensure only UU traces were returned
for tr in st2:
    assert tr.stats.network == 'UU'

print(st2[:5])  # print first 5 traces

5 Trace(s) in Stream:
UU.CTU..HHE | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples
UU.CTU..HHN | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples
UU.CTU..HHZ | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples
UU.MPU..HHE | 2007-08-06T01:44:47.992000Z - 2007-08-06T01:45:47.992000Z | 100.0 Hz, 6001 samples
UU.MPU..HHN | 2007-08-06T01:44:47.992000Z - 2007-08-06T01:45:47.992000Z | 100.0 Hz, 6001 samples

[7]:

st = bank.get_waveforms(starttime=t1, endtime=t2, station='O1??', channel='BH[NE]')

# test returned traces
for tr in st:
    assert tr.stats.starttime >= t1 - .00001
    assert tr.stats.endtime <= t2 + .00001
    assert tr.stats.station.startswith('O1')
    assert tr.stats.channel.startswith('BH')
    assert tr.stats.channel[-1] in {'N', 'E'}

print(st)

6 Trace(s) in Stream:
TA.O15A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples
TA.O15A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples
TA.O16A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples
TA.O16A..BHN | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples
TA.O18A..BHE | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples
TA.O18A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples
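To see how wildcard patterns such as 'O1??' and 'BH[NE]' select codes, the sketch below applies the same patterns with Python's standard fnmatch module. It only mimics the matching rules on a hand-written list of station and channel codes; how WaveBank evaluates the patterns internally is not shown here and may differ.

```python
from fnmatch import fnmatchcase

# Station and channel codes taken from the examples in this tutorial.
stations = ["O15A", "O16A", "O18A", "R16A", "R17A", "SRU"]
channels = ["BHE", "BHN", "BHZ", "HHZ"]

# '?' matches exactly one character; '[NE]' matches either 'N' or 'E'.
matched_stations = [s for s in stations if fnmatchcase(s, "O1??")]
matched_channels = [c for c in channels if fnmatchcase(c, "BH[NE]")]

print(matched_stations)  # ['O15A', 'O16A', 'O18A']
print(matched_channels)  # ['BHE', 'BHN']
```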

WaveBank also has a get_waveforms_bulk method for efficiently retrieving a large number of streams.

[8]:

args = [  # in practice this list may contain hundreds or thousands of requests
    ('TA', 'O15A', '', 'BHZ', t1 - 5, t2 - 5,),
    ('UU', 'SRU', '', 'HHZ', t1, t2,),
]
st = bank.get_waveforms_bulk(args)
print(st)

2 Trace(s) in Stream:
TA.O15A..BHZ | 2007-08-06T01:44:42.999998Z - 2007-08-06T01:45:42.999998Z | 40.0 Hz, 2401 samples
UU.SRU..HHZ  | 2007-08-06T01:44:47.995000Z - 2007-08-06T01:45:47.995000Z | 100.0 Hz, 6001 samples
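When the request list is long, it is usually generated rather than typed by hand. The sketch below builds request tuples in the (network, station, location, channel, starttime, endtime) layout used above from SEED-style ids. The helper make_bulk_requests is hypothetical, and plain floats stand in for the time objects; in practice the times would be obspy.UTCDateTime instances such as t1 and t2.

```python
def make_bulk_requests(seed_ids, starttime, endtime):
    """Build one request tuple per SEED id for a shared time window.

    Hypothetical helper; each id is expected to look like 'NET.STA.LOC.CHA'.
    """
    requests = []
    for seed_id in seed_ids:
        network, station, location, channel = seed_id.split(".")
        requests.append((network, station, location, channel, starttime, endtime))
    return requests


# Placeholder times in seconds; an empty location code is preserved as ''.
bulk = make_bulk_requests(["TA.O15A..BHZ", "UU.SRU..HHZ"], 0.0, 60.0)
for request in bulk:
    print(request)
```

The resulting list can be passed as the single argument to get_waveforms_bulk, as in the cell above.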

Yield waveforms¶

The WaveBank class also provides a generator for iterating over large amounts of continuous waveform data. The following example shows how to get streams of one hour duration with a minute of overlap between the slices.

The first step is to create a bank on a dataset which has continuous data. The example below will use the TA dataset.

[9]:

ta_bank = obsplus.WaveBank(ta_path)

[10]:

# get one day of continuous data from the TA dataset
ta_t1 = obspy.UTCDateTime('2007-02-15')
ta_t2 = obspy.UTCDateTime('2007-02-16')

for st in ta_bank.yield_waveforms(starttime=ta_t1, endtime=ta_t2, duration=3600, overlap=60):
    print(f'got {len(st)} streams from {st[0].stats.starttime} to {st[0].stats.endtime}')

got 6 streams from 2007-02-15T00:00:09.999998Z to 2007-02-15T01:00:59.999998Z
got 6 streams from 2007-02-15T00:59:59.999998Z to 2007-02-15T02:00:59.999998Z
got 6 streams from 2007-02-15T01:59:59.999998Z to 2007-02-15T03:00:59.999998Z
got 6 streams from 2007-02-15T02:59:59.999998Z to 2007-02-15T04:00:59.999998Z
got 6 streams from 2007-02-15T03:59:59.999998Z to 2007-02-15T05:00:59.999998Z
got 6 streams from 2007-02-15T04:59:59.999998Z to 2007-02-15T06:00:59.999998Z
got 6 streams from 2007-02-15T05:59:59.999998Z to 2007-02-15T07:00:59.999998Z
got 6 streams from 2007-02-15T06:59:59.999998Z to 2007-02-15T08:00:59.999998Z
got 6 streams from 2007-02-15T07:59:59.999998Z to 2007-02-15T09:00:59.999998Z
got 6 streams from 2007-02-15T08:59:59.999998Z to 2007-02-15T10:00:59.999998Z
got 6 streams from 2007-02-15T09:59:59.999998Z to 2007-02-15T11:00:59.999998Z
got 6 streams from 2007-02-15T10:59:59.999998Z to 2007-02-15T12:00:59.999998Z
got 6 streams from 2007-02-15T11:59:59.999998Z to 2007-02-15T13:00:59.999998Z
got 6 streams from 2007-02-15T12:59:59.999998Z to 2007-02-15T14:00:59.999998Z
got 6 streams from 2007-02-15T13:59:59.999998Z to 2007-02-15T15:00:59.999998Z
got 6 streams from 2007-02-15T14:59:59.999998Z to 2007-02-15T16:00:59.999998Z
got 6 streams from 2007-02-15T15:59:59.999998Z to 2007-02-15T17:00:59.999998Z
got 6 streams from 2007-02-15T16:59:59.999998Z to 2007-02-15T18:00:59.999998Z
got 6 streams from 2007-02-15T17:59:59.999998Z to 2007-02-15T19:00:59.999998Z
got 6 streams from 2007-02-15T18:59:59.999998Z to 2007-02-15T20:00:59.999998Z
got 6 streams from 2007-02-15T19:59:59.999998Z to 2007-02-15T21:00:59.999998Z
got 6 streams from 2007-02-15T20:59:59.999998Z to 2007-02-15T22:00:59.999998Z
got 6 streams from 2007-02-15T21:59:59.999998Z to 2007-02-15T23:00:59.999998Z
got 6 streams from 2007-02-15T22:59:59.999998Z to 2007-02-16T00:00:59.999998Z
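The printed time spans suggest how duration and overlap interact: each slice starts one duration after the previous one and extends overlap seconds past its nominal end. The sketch below reproduces that pattern with standard-library datetimes. The function make_windows is hypothetical and the rule is inferred from the output above, so the real generator may treat edge cases (such as the final window) differently.

```python
from datetime import datetime, timedelta


def make_windows(starttime, endtime, duration, overlap=0.0):
    """Yield (start, end) pairs; each window is duration + overlap seconds long.

    Hypothetical helper that mirrors the pattern seen in the printed output.
    """
    if overlap >= duration:
        raise ValueError("overlap must be smaller than duration")
    step = timedelta(seconds=duration)
    extra = timedelta(seconds=overlap)
    start = starttime
    while start < endtime:
        yield start, start + step + extra
        start += step


windows = list(
    make_windows(datetime(2007, 2, 15), datetime(2007, 2, 16), 3600, 60)
)
print(len(windows))  # 24 one-hour windows in one day
print(windows[0])
print(windows[1])
```

Consecutive windows share 60 seconds of data, which is useful when a processing step (for example filtering) needs padding at the slice edges.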

Put waveforms¶

Files can be added to the bank by passing a stream or trace to the bank.put_waveforms method. WaveBank does not merge files so overlap in data may occur if care is not taken.

[11]:

# show that no data for RJOB is in the bank
st = bank.get_waveforms(station='RJOB')

assert len(st) == 0

print(st)

0 Trace(s) in Stream:

[12]:

# add the default stream to the archive (which contains data for RJOB)
bank.put_waveforms(obspy.read())
st_out = bank.get_waveforms(station='RJOB')

# test output
assert len(st_out)
for tr in st_out:
    assert tr.stats.station == 'RJOB'


print(st_out)

3 Trace(s) in Stream:
BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
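Because WaveBank does not merge files, it can be worth checking whether new data overlaps time spans already in the bank before adding it. The sketch below shows one way to test for overlap between a candidate span and existing spans. The helper names are hypothetical and plain floats stand in for timestamps; in practice the existing spans could be taken from the start and end times in the bank's index.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the two half-open time spans share any time."""
    return start_a < end_b and start_b < end_a


def find_overlapping(existing, new_start, new_end):
    """Return the existing (start, end) spans that overlap the new span."""
    return [(s, e) for s, e in existing if overlaps(s, e, new_start, new_end)]


# Two spans already stored for one channel (seconds, placeholder values).
existing_spans = [(0.0, 30.0), (60.0, 90.0)]

conflicts = find_overlapping(existing_spans, 25.0, 40.0)     # overlaps first span
no_conflicts = find_overlapping(existing_spans, 30.0, 60.0)  # fits in the hole

print(conflicts)
print(no_conflicts)
```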

Availability¶

WaveBank can be used to get the availability of data. The output can be either a dataframe or a list of tuples in the form of [(network, station, location, channel, min_starttime, max_endtime)].

[13]:

# get a dataframe of availability by seed ids and timestamps
bank.get_availability_df(channel='BHE', station='[OR]*')

[13]:

	network	station	channel	starttime	endtime
0	TA	O15A	BHE	2007-08-06 01:44:38.825000	2007-08-07 21:43:51.124998
1	TA	O16A	BHE	2007-08-06 01:44:38.825000	2007-08-07 21:43:51.125000
2	TA	O18A	BHE	2007-08-06 01:44:38.824998	2007-08-07 21:43:51.125000
3	TA	R16A	BHE	2007-08-07 02:04:54.500000	2007-08-07 21:43:51.125000
4	TA	R17A	BHE	2007-08-06 01:44:38.825000	2007-08-07 21:43:51.125000

[14]:

# get list of tuples of availability
bank.availability(channel='BHE', station='[OR]*')

[14]:

[('TA',
  'O15A',
  '',
  'BHE',
  2007-08-06T01:44:38.825000Z,
  2007-08-07T21:43:51.124998Z),
 ('TA',
  'O16A',
  '',
  'BHE',
  2007-08-06T01:44:38.825000Z,
  2007-08-07T21:43:51.125000Z),
 ('TA',
  'O18A',
  '',
  'BHE',
  2007-08-06T01:44:38.824998Z,
  2007-08-07T21:43:51.125000Z),
 ('TA',
  'R16A',
  '',
  'BHE',
  2007-08-07T02:04:54.500000Z,
  2007-08-07T21:43:51.125000Z),
 ('TA',
  'R17A',
  '',
  'BHE',
  2007-08-06T01:44:38.825000Z,
  2007-08-07T21:43:51.125000Z)]
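The tuple layout above amounts to a per-channel reduction: the earliest start time and the latest end time over all files for that channel. The sketch below performs that reduction on a few made-up index rows. The function summarize_availability and the row values are hypothetical, and floats stand in for timestamps.

```python
# Made-up index rows: (network, station, location, channel, starttime, endtime)
index_rows = [
    ("TA", "O15A", "", "BHE", 10.0, 80.0),
    ("TA", "O15A", "", "BHE", 500.0, 570.0),
    ("TA", "O16A", "", "BHE", 12.0, 82.0),
]


def summarize_availability(rows):
    """Reduce rows to one tuple per channel with min start and max end."""
    spans = {}
    for net, sta, loc, cha, start, end in rows:
        key = (net, sta, loc, cha)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    # Concatenate the channel key and its span into a single flat tuple.
    return [key + span for key, span in sorted(spans.items())]


availability = summarize_availability(index_rows)
for row in availability:
    print(row)
```

Note that this summary says nothing about holes between the first and last sample; those are covered by the gap and uptime methods.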

Get Gaps and uptime¶

WaveBank can return a dataframe of missing data with the get_gaps_df method, and a dataframe of reliability statistics with the get_uptime_df method. These are useful for assessing the completeness of an archive of continuous data.

[15]:

bank.get_gaps_df(channel='BHE', station='O*').head()

[15]:

	network	station	channel	starttime	endtime	sampling_period	path	gap_duration
0	TA	O15A	BHE	2007-08-06 01:45:48.799998	2007-08-06 08:48:30.024998	0 days 00:00:00.025000	TA.O15A..BHE__20070806T014438Z__20070806T01454...	0 days 07:02:41.225000
1	TA	O15A	BHE	2007-08-06 08:49:39.999998	2007-08-06 10:47:15.624998	0 days 00:00:00.025000	TA.O15A..BHE__20070806T084830Z__20070806T08494...	0 days 01:57:35.625000
2	TA	O15A	BHE	2007-08-06 10:48:25.599998	2007-08-07 02:04:54.499998	0 days 00:00:00.025000	TA.O15A..BHE__20070806T104715Z__20070806T10482...	0 days 15:16:28.900000
3	TA	O15A	BHE	2007-08-07 02:06:04.474998	2007-08-07 02:14:14.100000	0 days 00:00:00.025000	TA.O15A..BHE__20070807T020454Z__20070807T02060...	0 days 00:08:09.625002
4	TA	O15A	BHE	2007-08-07 02:15:24.074998	2007-08-07 03:44:08.474998	0 days 00:00:00.025000	TA.O15A..BHE__20070807T021414Z__20070807T02152...	0 days 01:28:44.400000

[16]:

ta_bank.get_uptime_df()

[16]:

	network	station	channel	starttime	endtime	duration	uptime	availability
0	TA	M11A	VHE	2007-02-15 00:00:09.999998	2007-02-24 23:59:59.999998	9 days 23:59:50	9 days 23:59:50	1.0
1	TA	M11A	VHN	2007-02-15 00:00:09.999998	2007-02-24 23:59:59.999998	9 days 23:59:50	9 days 23:59:50	1.0
2	TA	M11A	VHZ	2007-02-15 00:00:09.999998	2007-02-24 23:59:59.999998	9 days 23:59:50	9 days 23:59:50	1.0
3	TA	M14A	VHE	2007-02-15 00:00:00.000003	2007-02-25 00:00:00.000003	10 days 00:00:00	10 days 00:00:00	1.0
4	TA	M14A	VHN	2007-02-15 00:00:00.000003	2007-02-25 00:00:00.000003	10 days 00:00:00	10 days 00:00:00	1.0
5	TA	M14A	VHZ	2007-02-15 00:00:00.000004	2007-02-25 00:00:00.000004	10 days 00:00:00	10 days 00:00:00	1.0
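The relationship between the columns in these tables can be sketched for a single channel: a gap runs from the end of one file to the start of the next, uptime is the total duration minus the gaps, and availability is uptime divided by duration. The helpers below are hypothetical illustrations of that arithmetic using floats as timestamps, not the code WaveBank runs; the tolerance parameter is an assumption that stands in for allowing small differences such as one sampling period.

```python
def find_gaps(spans, tolerance=0.0):
    """Return (gap_start, gap_end) pairs between sorted (start, end) spans."""
    ordered = sorted(spans)
    if not ordered:
        return []
    gaps = []
    current_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start - current_end > tolerance:
            gaps.append((current_end, start))
        current_end = max(current_end, end)
    return gaps


def uptime_fraction(spans, tolerance=0.0):
    """Return the fraction of the total time range that is covered by data."""
    ordered = sorted(spans)
    total = max(end for _, end in ordered) - ordered[0][0]
    missing = sum(end - start for start, end in find_gaps(spans, tolerance))
    return (total - missing) / total


# Three files for one channel; data is missing between 70 s and 100 s.
file_spans = [(0.0, 70.0), (100.0, 170.0), (170.0, 200.0)]
gaps = find_gaps(file_spans)
fraction = uptime_fraction(file_spans)

print(gaps)      # [(70.0, 100.0)]
print(fraction)  # 0.85
```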

Read index¶

WaveBank can return a dataframe of the index with the read_index method, although in most cases this shouldn’t be needed.

[17]:

ta_bank.read_index().head()

[17]:

	network	station	channel	starttime	endtime	sampling_period	path
0	TA	M11A	VHN	2007-02-19 14:59:59.999998	2007-02-19 15:59:59.999998	0 days 00:00:10	TA/M11A/VHN/2007-02-19T15-00-00.mseed
1	TA	M14A	VHN	2007-02-19 15:00:00.000003	2007-02-19 16:00:00.000003	0 days 00:00:10	TA/M11A/VHN/2007-02-19T15-00-00.mseed
2	TA	M11A	VHN	2007-02-15 23:59:59.999998	2007-02-16 00:59:59.999998	0 days 00:00:10	TA/M11A/VHN/2007-02-16T00-00-00.mseed
3	TA	M14A	VHN	2007-02-16 00:00:00.000003	2007-02-16 01:00:00.000003	0 days 00:00:10	TA/M11A/VHN/2007-02-16T00-00-00.mseed
4	TA	M11A	VHN	2007-02-20 15:59:59.999998	2007-02-20 16:59:59.999998	0 days 00:00:10	TA/M11A/VHN/2007-02-20T16-00-00.mseed

Similar Projects¶

WaveBank is a useful tool, but it may not be a good fit for every application. Check out the following projects as well:

ObsPy can visualize the availability of waveform data in a directory using obspy-scan. If you prefer a graphical option to working with DataFrames, this might be for you.

ObsPy also has a filesystem client for working with SeisComP structured archives.

IRIS released a miniSEED indexing program called mseedindex, which has an ObsPy API.