{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# WaveBank\n", "`WaveBank` is an in-process database for accessing seismic time-series data. Any directory structure containing ObsPy-readable waveforms can be used as the data source. `WaveBank` uses a simple indexing scheme and the [Hierarchical Data Format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) to keep track of each `Trace` in the directory. Without `WaveBank` (or another similar program) applications have implement their own data organization/access logic which is tedious and clutters up application code. `WaveBank` provides a better way. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Example Data\n", "This tutorial will demonstrate the use of `WaveBank` on two different [obsplus datasets](../datasets/datasets.ipynb). \n", "\n", "The first dataset, [crandall canyon](https://en.wikipedia.org/wiki/Crandall_Canyon_Mine), only has event waveform files. The second only has continuous data from two TA stations. We start by loading these datasets, making a temporary copy, and getting a path to their waveform directories." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:19:58.688668Z", "iopub.status.busy": "2024-02-28T22:19:58.688492Z", "iopub.status.idle": "2024-02-28T22:20:00.877905Z", "shell.execute_reply": "2024-02-28T22:20:00.877328Z" } }, "outputs": [], "source": [ "%%capture\n", "import obsplus\n", "\n", "# make sure datasets are downloaded and copy them to temporary\n", "# directories to make sure no accidental changes are made\n", "crandall_dataset = obsplus.load_dataset('crandall_test').copy()\n", "ta_dataset = obsplus.load_dataset('ta_test').copy()\n", "\n", "# get path to waveform directories\n", "crandall_path = crandall_dataset.waveform_path\n", "ta_path = ta_dataset.waveform_path" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.880847Z", "iopub.status.busy": "2024-02-28T22:20:00.880262Z", "iopub.status.idle": "2024-02-28T22:20:00.885826Z", "shell.execute_reply": "2024-02-28T22:20:00.885263Z" } }, "outputs": [ { "data": { "text/plain": [ "PosixPath('/home/runner/opsdata/crandall_test/waveforms')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "crandall_path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a WaveBank object\n", "To create a `WaveBank` instance simply pass the class a path to the waveform directory." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.912860Z", "iopub.status.busy": "2024-02-28T22:20:00.912423Z", "iopub.status.idle": "2024-02-28T22:20:00.927601Z", "shell.execute_reply": "2024-02-28T22:20:00.927162Z" } }, "outputs": [], "source": [ "bank = obsplus.WaveBank(crandall_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Utilizing the `udpate_index` method on the bank ensures the index is up-to-date. This will iterate through all files that are timestamped later than the last time `update_index` was run.\n", "\n", "Note: If the index has not yet been created or new files have been added, `update_index` needs to be called." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.929891Z", "iopub.status.busy": "2024-02-28T22:20:00.929709Z", "iopub.status.idle": "2024-02-28T22:20:00.944738Z", "shell.execute_reply": "2024-02-28T22:20:00.944248Z" } }, "outputs": [ { "data": { "text/plain": [ "WaveBank(base_path=/home/runner/opsdata/crandall_test/waveforms)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bank.update_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get waveforms\n", "\n", "Waveforms can be retrieved from the directory with the `get_waveforms` method. This method has the same signature as the ObsPy client `get_waveforms` methods, so they can be used interchangeably:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.946907Z", "iopub.status.busy": "2024-02-28T22:20:00.946581Z", "iopub.status.idle": "2024-02-28T22:20:01.014708Z", "shell.execute_reply": "2024-02-28T22:20:01.014133Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 Trace(s) in Stream:\n", "TA.O15A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O15A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O15A..BHZ | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHN | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n" ] } ], "source": [ "import obspy\n", "\n", "t1 = obspy.UTCDateTime('2007-08-06T01-44-48')\n", "t2 = t1 + 60\n", "st = bank.get_waveforms(starttime=t1, endtime=t2)\n", "print(st[:5]) # print first 5 traces" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`WaveBank` can filter on channels, locations, stations, networks, etc. using Unix-style wildcard strings or regular expressions. 
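\n", "\n", "The next two cells demonstrate such filtering. Because the signature matches ObsPy's clients, a `WaveBank` can also be dropped into code that was written against an FDSN-style client; the helper below is a hypothetical sketch of that pattern (the function name and the statistic it computes are illustrative):\n", "\n", "```python\n", "def peak_amplitude(client, **kwargs):\n", "    # works with any object exposing an ObsPy-style get_waveforms method,\n", "    # e.g. a WaveBank or an obspy FDSN client\n", "    st = client.get_waveforms(**kwargs)\n", "    return max(abs(tr.data).max() for tr in st)\n", "\n", "peak_amplitude(bank, network='UU', starttime=t1, endtime=t2)\n", "```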
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.017190Z", "iopub.status.busy": "2024-02-28T22:20:01.016827Z", "iopub.status.idle": "2024-02-28T22:20:01.031510Z", "shell.execute_reply": "2024-02-28T22:20:01.030986Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 Trace(s) in Stream:\n", "UU.CTU..HHE | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples\n", "UU.CTU..HHN | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples\n", "UU.CTU..HHZ | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples\n", "UU.MPU..HHE | 2007-08-06T01:44:47.992000Z - 2007-08-06T01:45:47.992000Z | 100.0 Hz, 6001 samples\n", "UU.MPU..HHN | 2007-08-06T01:44:47.992000Z - 2007-08-06T01:45:47.992000Z | 100.0 Hz, 6001 samples\n" ] } ], "source": [ "st2 = bank.get_waveforms(network='UU', starttime=t1, endtime=t2)\n", "\n", "# ensure only UU traces were returned\n", "for tr in st2:\n", " assert tr.stats.network == 'UU'\n", "\n", "print(st2[:5]) # print first 5 traces" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.033886Z", "iopub.status.busy": "2024-02-28T22:20:01.033469Z", "iopub.status.idle": "2024-02-28T22:20:01.045000Z", "shell.execute_reply": "2024-02-28T22:20:01.044472Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6 Trace(s) in Stream:\n", "TA.O15A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O15A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHN | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O18A..BHE | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O18A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n" ] } ], "source": [ "st = bank.get_waveforms(starttime=t1, endtime=t2, station='O1??', channel='BH[NE]')\n", "\n", "# test returned traces\n", "for tr in st:\n", " assert tr.stats.starttime >= t1 - .00001\n", " assert tr.stats.endtime <= t2 + .00001\n", " assert tr.stats.station.startswith('O1')\n", " assert tr.stats.channel.startswith('BH')\n", " assert tr.stats.channel[-1] in {'N', 'E'}\n", "\n", "print(st)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WaveBank also has a `get_waveforms_bulk` method for efficiently retrieving a large number of streams. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.047286Z", "iopub.status.busy": "2024-02-28T22:20:01.046951Z", "iopub.status.idle": "2024-02-28T22:20:01.103820Z", "shell.execute_reply": "2024-02-28T22:20:01.103241Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 Trace(s) in Stream:\n", "TA.O15A..BHZ | 2007-08-06T01:44:42.999998Z - 2007-08-06T01:45:42.999998Z | 40.0 Hz, 2401 samples\n", "UU.SRU..HHZ | 2007-08-06T01:44:47.995000Z - 2007-08-06T01:45:47.995000Z | 100.0 Hz, 6001 samples\n" ] } ], "source": [ "args = [ # in practice this list may contain hundreds or thousands of requests\n", " ('TA', 'O15A', '', 'BHZ', t1 - 5, t2 - 5,),\n", " ('UU', 'SRU', '', 'HHZ', t1, t2,),\n", "]\n", "st = bank.get_waveforms_bulk(args)\n", "print(st )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Yield waveforms\n", "The Bank class also provides a generator for iterating large amounts of continuous waveforms. The following example shows how to get streams of one hour duration with a minute of overlap between the slices. \n", "\n", "The first step is to create a bank on a dataset which has continuous data. The example below will use the TA dataset." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.106374Z", "iopub.status.busy": "2024-02-28T22:20:01.106006Z", "iopub.status.idle": "2024-02-28T22:20:01.119967Z", "shell.execute_reply": "2024-02-28T22:20:01.119503Z" } }, "outputs": [], "source": [ "ta_bank = obsplus.WaveBank(ta_path)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.122291Z", "iopub.status.busy": "2024-02-28T22:20:01.121859Z", "iopub.status.idle": "2024-02-28T22:20:01.542088Z", "shell.execute_reply": "2024-02-28T22:20:01.541461Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T00:00:09.999998Z to 2007-02-15T01:00:59.999998Z\n", "got 6 streams from 2007-02-15T00:59:59.999998Z to 2007-02-15T02:00:59.999998Z\n", "got 6 streams from 2007-02-15T01:59:59.999998Z to 2007-02-15T03:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T02:59:59.999998Z to 2007-02-15T04:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T03:59:59.999998Z to 2007-02-15T05:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T04:59:59.999998Z to 2007-02-15T06:00:59.999998Z\n", "got 6 streams from 2007-02-15T05:59:59.999998Z to 2007-02-15T07:00:59.999998Z\n", "got 6 streams from 2007-02-15T06:59:59.999998Z to 2007-02-15T08:00:59.999998Z\n", "got 6 streams from 2007-02-15T07:59:59.999998Z to 2007-02-15T09:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T08:59:59.999998Z to 2007-02-15T10:00:59.999998Z\n", "got 6 streams from 2007-02-15T09:59:59.999998Z to 2007-02-15T11:00:59.999998Z\n", "got 6 streams from 2007-02-15T10:59:59.999998Z to 2007-02-15T12:00:59.999998Z\n", "got 6 streams from 2007-02-15T11:59:59.999998Z to 2007-02-15T13:00:59.999998Z\n", "got 6 streams from 2007-02-15T12:59:59.999998Z to 2007-02-15T14:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T13:59:59.999998Z to 2007-02-15T15:00:59.999998Z\n", "got 6 streams 
from 2007-02-15T14:59:59.999998Z to 2007-02-15T16:00:59.999998Z\n", "got 6 streams from 2007-02-15T15:59:59.999998Z to 2007-02-15T17:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T16:59:59.999998Z to 2007-02-15T18:00:59.999998Z" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T17:59:59.999998Z to 2007-02-15T19:00:59.999998Z\n", "got 6 streams from 2007-02-15T18:59:59.999998Z to 2007-02-15T20:00:59.999998Z\n", "got 6 streams from 2007-02-15T19:59:59.999998Z to 2007-02-15T21:00:59.999998Z\n", "got 6 streams from 2007-02-15T20:59:59.999998Z to 2007-02-15T22:00:59.999998Z\n", "got 6 streams from 2007-02-15T21:59:59.999998Z to 2007-02-15T23:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T22:59:59.999998Z to 2007-02-16T00:00:59.999998Z\n" ] } ], "source": [ "# get a few hours of Kemmerer data\n", "ta_t1 = obspy.UTCDateTime('2007-02-15')\n", "ta_t2 = obspy.UTCDateTime('2007-02-16')\n", "\n", "for st in ta_bank.yield_waveforms(starttime=ta_t1, endtime=ta_t2, duration=3600, overlap=60):\n", " print(f'got {len(st)} streams from {st[0].stats.starttime} to {st[0].stats.endtime}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Put waveforms\n", "Files can be added to the bank by passing a stream or trace to the `bank.put_waveforms` method. `WaveBank` does not merge files, so overlapping data may occur if care is not taken." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.544595Z", "iopub.status.busy": "2024-02-28T22:20:01.544222Z", "iopub.status.idle": "2024-02-28T22:20:01.587175Z", "shell.execute_reply": "2024-02-28T22:20:01.586596Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Trace(s) in Stream:\n", "\n" ] } ], "source": [ "# show that no data for RJOB is in the bank\n", "st = bank.get_waveforms(station='RJOB')\n", "\n", "assert len(st) == 0\n", "\n", "print(st)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.589428Z", "iopub.status.busy": "2024-02-28T22:20:01.589242Z", "iopub.status.idle": "2024-02-28T22:20:01.748728Z", "shell.execute_reply": "2024-02-28T22:20:01.748167Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 Trace(s) in Stream:\n", "BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples\n", "BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples\n", "BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples\n" ] } ], "source": [ "# add the default stream to the archive (which contains data for RJOB)\n", "bank.put_waveforms(obspy.read())\n", "st_out = bank.get_waveforms(station='RJOB')\n", "\n", "# test output\n", "assert len(st_out)\n", "for tr in st_out:\n", " assert tr.stats.station == 'RJOB'\n", "\n", "\n", "print(st_out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Availability\n", "`WaveBank` can be used to get the availability of data. The output can be either a dataframe or a list of tuples of the form [(network, station, location, channel, min_starttime, max_endtime)]. 
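\n", "\n", "Once the availability dataframe is in hand it can be manipulated like any other pandas dataframe. A small sketch, assuming the columns shown in the output below (the added `duration` column is purely illustrative):\n", "\n", "```python\n", "avail = bank.get_availability_df(channel='BHE', station='[OR]*')\n", "# starttime/endtime are pandas timestamps, so subtraction gives a Timedelta\n", "avail['duration'] = avail['endtime'] - avail['starttime']\n", "```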
" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.751180Z", "iopub.status.busy": "2024-02-28T22:20:01.750809Z", "iopub.status.idle": "2024-02-28T22:20:01.799647Z", "shell.execute_reply": "2024-02-28T22:20:01.799162Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtime
0TAO15ABHE2007-08-06 01:44:38.8250002007-08-07 21:43:51.124998
1TAO16ABHE2007-08-06 01:44:38.8250002007-08-07 21:43:51.125000
2TAO18ABHE2007-08-06 01:44:38.8249982007-08-07 21:43:51.125000
3TAR16ABHE2007-08-07 02:04:54.5000002007-08-07 21:43:51.125000
4TAR17ABHE2007-08-06 01:44:38.8250002007-08-07 21:43:51.125000
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA O15A BHE 2007-08-06 01:44:38.825000 \n", "1 TA O16A BHE 2007-08-06 01:44:38.825000 \n", "2 TA O18A BHE 2007-08-06 01:44:38.824998 \n", "3 TA R16A BHE 2007-08-07 02:04:54.500000 \n", "4 TA R17A BHE 2007-08-06 01:44:38.825000 \n", "\n", " endtime \n", "0 2007-08-07 21:43:51.124998 \n", "1 2007-08-07 21:43:51.125000 \n", "2 2007-08-07 21:43:51.125000 \n", "3 2007-08-07 21:43:51.125000 \n", "4 2007-08-07 21:43:51.125000 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get a dataframe of availability by seed ids and timestamps\n", "bank.get_availability_df(channel='BHE', station='[OR]*')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.801946Z", "iopub.status.busy": "2024-02-28T22:20:01.801586Z", "iopub.status.idle": "2024-02-28T22:20:01.814908Z", "shell.execute_reply": "2024-02-28T22:20:01.814310Z" } }, "outputs": [ { "data": { "text/plain": [ "[('TA',\n", " 'O15A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.825000Z,\n", " 2007-08-07T21:43:51.124998Z),\n", " ('TA',\n", " 'O16A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.825000Z,\n", " 2007-08-07T21:43:51.125000Z),\n", " ('TA',\n", " 'O18A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.824998Z,\n", " 2007-08-07T21:43:51.125000Z),\n", " ('TA',\n", " 'R16A',\n", " '',\n", " 'BHE',\n", " 2007-08-07T02:04:54.500000Z,\n", " 2007-08-07T21:43:51.125000Z),\n", " ('TA',\n", " 'R17A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.825000Z,\n", " 2007-08-07T21:43:51.125000Z)]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get list of tuples of availability\n", "bank.availability(channel='BHE', station='[OR]*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Gaps and uptime\n", "`WaveBank` can return a dataframe of missing data with the `get_gaps_df` method, and a dataframe of reliability statistics with the `get_uptime_df` method. These are useful for assessing the completeness of an archive of contiguous data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.817188Z", "iopub.status.busy": "2024-02-28T22:20:01.816836Z", "iopub.status.idle": "2024-02-28T22:20:01.837758Z", "shell.execute_reply": "2024-02-28T22:20:01.837275Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtimesampling_periodpathgap_duration
0TAO15ABHE2007-08-06 01:45:48.7999982007-08-06 08:48:30.0249980 days 00:00:00.025000TA.O15A..BHE__20070806T014438Z__20070806T01454...0 days 07:02:41.225000
1TAO15ABHE2007-08-06 08:49:39.9999982007-08-06 10:47:15.6249980 days 00:00:00.025000TA.O15A..BHE__20070806T084830Z__20070806T08494...0 days 01:57:35.625000
2TAO15ABHE2007-08-06 10:48:25.5999982007-08-07 02:04:54.4999980 days 00:00:00.025000TA.O15A..BHE__20070806T104715Z__20070806T10482...0 days 15:16:28.900000
3TAO15ABHE2007-08-07 02:06:04.4749982007-08-07 02:14:14.1000000 days 00:00:00.025000TA.O15A..BHE__20070807T020454Z__20070807T02060...0 days 00:08:09.625002
4TAO15ABHE2007-08-07 02:15:24.0749982007-08-07 03:44:08.4749980 days 00:00:00.025000TA.O15A..BHE__20070807T021414Z__20070807T02152...0 days 01:28:44.400000
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA O15A BHE 2007-08-06 01:45:48.799998 \n", "1 TA O15A BHE 2007-08-06 08:49:39.999998 \n", "2 TA O15A BHE 2007-08-06 10:48:25.599998 \n", "3 TA O15A BHE 2007-08-07 02:06:04.474998 \n", "4 TA O15A BHE 2007-08-07 02:15:24.074998 \n", "\n", " endtime sampling_period \\\n", "0 2007-08-06 08:48:30.024998 0 days 00:00:00.025000 \n", "1 2007-08-06 10:47:15.624998 0 days 00:00:00.025000 \n", "2 2007-08-07 02:04:54.499998 0 days 00:00:00.025000 \n", "3 2007-08-07 02:14:14.100000 0 days 00:00:00.025000 \n", "4 2007-08-07 03:44:08.474998 0 days 00:00:00.025000 \n", "\n", " path gap_duration \n", "0 TA.O15A..BHE__20070806T014438Z__20070806T01454... 0 days 07:02:41.225000 \n", "1 TA.O15A..BHE__20070806T084830Z__20070806T08494... 0 days 01:57:35.625000 \n", "2 TA.O15A..BHE__20070806T104715Z__20070806T10482... 0 days 15:16:28.900000 \n", "3 TA.O15A..BHE__20070807T020454Z__20070807T02060... 0 days 00:08:09.625002 \n", "4 TA.O15A..BHE__20070807T021414Z__20070807T02152... 0 days 01:28:44.400000 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bank.get_gaps_df(channel='BHE', station='O*').head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.839963Z", "iopub.status.busy": "2024-02-28T22:20:01.839603Z", "iopub.status.idle": "2024-02-28T22:20:01.959898Z", "shell.execute_reply": "2024-02-28T22:20:01.959250Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtimedurationgap_durationuptimeavailability
0TAM11AVHE2007-02-15 00:00:09.9999982007-02-24 23:59:59.9999989 days 23:59:500 days9 days 23:59:501.0
1TAM11AVHN2007-02-15 00:00:09.9999982007-02-24 23:59:59.9999989 days 23:59:500 days9 days 23:59:501.0
2TAM11AVHZ2007-02-15 00:00:09.9999982007-02-24 23:59:59.9999989 days 23:59:500 days9 days 23:59:501.0
3TAM14AVHE2007-02-15 00:00:00.0000032007-02-25 00:00:00.00000310 days 00:00:000 days10 days 00:00:001.0
4TAM14AVHN2007-02-15 00:00:00.0000032007-02-25 00:00:00.00000310 days 00:00:000 days10 days 00:00:001.0
5TAM14AVHZ2007-02-15 00:00:00.0000042007-02-25 00:00:00.00000410 days 00:00:000 days10 days 00:00:001.0
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA M11A VHE 2007-02-15 00:00:09.999998 \n", "1 TA M11A VHN 2007-02-15 00:00:09.999998 \n", "2 TA M11A VHZ 2007-02-15 00:00:09.999998 \n", "3 TA M14A VHE 2007-02-15 00:00:00.000003 \n", "4 TA M14A VHN 2007-02-15 00:00:00.000003 \n", "5 TA M14A VHZ 2007-02-15 00:00:00.000004 \n", "\n", " endtime duration gap_duration uptime \\\n", "0 2007-02-24 23:59:59.999998 9 days 23:59:50 0 days 9 days 23:59:50 \n", "1 2007-02-24 23:59:59.999998 9 days 23:59:50 0 days 9 days 23:59:50 \n", "2 2007-02-24 23:59:59.999998 9 days 23:59:50 0 days 9 days 23:59:50 \n", "3 2007-02-25 00:00:00.000003 10 days 00:00:00 0 days 10 days 00:00:00 \n", "4 2007-02-25 00:00:00.000003 10 days 00:00:00 0 days 10 days 00:00:00 \n", "5 2007-02-25 00:00:00.000004 10 days 00:00:00 0 days 10 days 00:00:00 \n", "\n", " availability \n", "0 1.0 \n", "1 1.0 \n", "2 1.0 \n", "3 1.0 \n", "4 1.0 \n", "5 1.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ta_bank.get_uptime_df()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read index\n", "`WaveBank` can return a dataframe of the the index with the `read_index` method, although in most cases this shouldn't be needed." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.962302Z", "iopub.status.busy": "2024-02-28T22:20:01.961950Z", "iopub.status.idle": "2024-02-28T22:20:01.971451Z", "shell.execute_reply": "2024-02-28T22:20:01.970969Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtimesampling_periodpath
0TAM11AVHN2007-02-19 14:59:59.9999982007-02-19 15:59:59.9999980 days 00:00:10TA/M11A/VHN/2007-02-19T15-00-00.mseed
1TAM14AVHN2007-02-19 15:00:00.0000032007-02-19 16:00:00.0000030 days 00:00:10TA/M11A/VHN/2007-02-19T15-00-00.mseed
2TAM11AVHN2007-02-15 23:59:59.9999982007-02-16 00:59:59.9999980 days 00:00:10TA/M11A/VHN/2007-02-16T00-00-00.mseed
3TAM14AVHN2007-02-16 00:00:00.0000032007-02-16 01:00:00.0000030 days 00:00:10TA/M11A/VHN/2007-02-16T00-00-00.mseed
4TAM11AVHN2007-02-20 15:59:59.9999982007-02-20 16:59:59.9999980 days 00:00:10TA/M11A/VHN/2007-02-20T16-00-00.mseed
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA M11A VHN 2007-02-19 14:59:59.999998 \n", "1 TA M14A VHN 2007-02-19 15:00:00.000003 \n", "2 TA M11A VHN 2007-02-15 23:59:59.999998 \n", "3 TA M14A VHN 2007-02-16 00:00:00.000003 \n", "4 TA M11A VHN 2007-02-20 15:59:59.999998 \n", "\n", " endtime sampling_period \\\n", "0 2007-02-19 15:59:59.999998 0 days 00:00:10 \n", "1 2007-02-19 16:00:00.000003 0 days 00:00:10 \n", "2 2007-02-16 00:59:59.999998 0 days 00:00:10 \n", "3 2007-02-16 01:00:00.000003 0 days 00:00:10 \n", "4 2007-02-20 16:59:59.999998 0 days 00:00:10 \n", "\n", " path \n", "0 TA/M11A/VHN/2007-02-19T15-00-00.mseed \n", "1 TA/M11A/VHN/2007-02-19T15-00-00.mseed \n", "2 TA/M11A/VHN/2007-02-16T00-00-00.mseed \n", "3 TA/M11A/VHN/2007-02-16T00-00-00.mseed \n", "4 TA/M11A/VHN/2007-02-20T16-00-00.mseed " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ta_bank.read_index().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Similar Projects\n", "`WaveBank` is a useful tool, but it may not be a good fit for every application. Check out the following items as well:\n", "\n", "Obspy has a way to visualize availability of waveform data in a directory using [obspy-scan](https://docs.obspy.org/tutorial/code_snippets/visualize_data_availability_of_local_waveform_archive.html). If you prefer a graphical option to working with `DataFrame`s this might be for you.\n", "\n", "Obspy also has [filesystem client](https://docs.obspy.org/master/packages/autogen/obspy.clients.filesystem.sds.Client.html#obspy.clients.filesystem.sds.Client) for working with SeisComP structured archives.\n", "\n", "[IRIS](https://www.iris.edu/hq/) released a mini-seed indexing program called [mseedindex](https://github.com/iris-edu/mseedindex) which has an [ObsPy API](https://github.com/obspy/obspy/pull/2206)." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }