{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# WaveBank\n", "`WaveBank` is an in-process database for accessing seismic time-series data. Any directory structure containing ObsPy-readable waveforms can be used as the data source. `WaveBank` uses a simple indexing scheme and the [Hierarchical Data Format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) to keep track of each `Trace` in the directory. Without `WaveBank` (or another similar program) applications have implement their own data organization/access logic which is tedious and clutters up application code. `WaveBank` provides a better way. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Example Data\n", "This tutorial will demonstrate the use of `WaveBank` on two different [obsplus datasets](../datasets/datasets.ipynb). \n", "\n", "The first dataset, [crandall canyon](https://en.wikipedia.org/wiki/Crandall_Canyon_Mine), only has event waveform files. The second only has continuous data from two TA stations. We start by loading these datasets, making a temporary copy, and getting a path to their waveform directories." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:19:58.688668Z", "iopub.status.busy": "2024-02-28T22:19:58.688492Z", "iopub.status.idle": "2024-02-28T22:20:00.877905Z", "shell.execute_reply": "2024-02-28T22:20:00.877328Z" } }, "outputs": [], "source": [ "%%capture\n", "import obsplus\n", "\n", "# make sure datasets are downloaded and copy them to temporary\n", "# directories to make sure no accidental changes are made\n", "crandall_dataset = obsplus.load_dataset('crandall_test').copy()\n", "ta_dataset = obsplus.load_dataset('ta_test').copy()\n", "\n", "# get path to waveform directories\n", "crandall_path = crandall_dataset.waveform_path\n", "ta_path = ta_dataset.waveform_path" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.880847Z", "iopub.status.busy": "2024-02-28T22:20:00.880262Z", "iopub.status.idle": "2024-02-28T22:20:00.885826Z", "shell.execute_reply": "2024-02-28T22:20:00.885263Z" } }, "outputs": [ { "data": { "text/plain": [ "PosixPath('/home/runner/opsdata/crandall_test/waveforms')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "crandall_path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a WaveBank object\n", "To create a `WaveBank` instance simply pass the class a path to the waveform directory." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.912860Z", "iopub.status.busy": "2024-02-28T22:20:00.912423Z", "iopub.status.idle": "2024-02-28T22:20:00.927601Z", "shell.execute_reply": "2024-02-28T22:20:00.927162Z" } }, "outputs": [], "source": [ "bank = obsplus.WaveBank(crandall_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Utilizing the `udpate_index` method on the bank ensures the index is up-to-date. This will iterate through all files that are timestamped later than the last time `update_index` was run.\n", "\n", "Note: If the index has not yet been created or new files have been added, `update_index` needs to be called." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.929891Z", "iopub.status.busy": "2024-02-28T22:20:00.929709Z", "iopub.status.idle": "2024-02-28T22:20:00.944738Z", "shell.execute_reply": "2024-02-28T22:20:00.944248Z" } }, "outputs": [ { "data": { "text/plain": [ "WaveBank(base_path=/home/runner/opsdata/crandall_test/waveforms)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bank.update_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get waveforms\n", "\n", "Waveforms can be retrieved from the directory with the `get_waveforms` method. This method has the same signature as the ObsPy client `get_waveforms` methods, so they can be used interchangeably:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:00.946907Z", "iopub.status.busy": "2024-02-28T22:20:00.946581Z", "iopub.status.idle": "2024-02-28T22:20:01.014708Z", "shell.execute_reply": "2024-02-28T22:20:01.014133Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 Trace(s) in Stream:\n", "TA.O15A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O15A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O15A..BHZ | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHN | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n" ] } ], "source": [ "import obspy\n", "\n", "t1 = obspy.UTCDateTime('2007-08-06T01-44-48')\n", "t2 = t1 + 60\n", "st = bank.get_waveforms(starttime=t1, endtime=t2)\n", "print(st[:5]) # print first 5 traces" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`WaveBank` can filter on channels, locations, stations, networks, etc. using Unix-style wildcard strings or regular expressions. 
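\n", "\n", "The next two cells demonstrate such filtering. Because the signature matches ObsPy's clients, a `WaveBank` can also be dropped into code that was written against an FDSN-style client; the helper below is a hypothetical sketch of that pattern (the function name and the statistic it computes are illustrative):\n", "\n", "```python\n", "def peak_amplitude(client, **kwargs):\n", "    # works with any object exposing an ObsPy-style get_waveforms method,\n", "    # e.g. a WaveBank or an obspy FDSN client\n", "    st = client.get_waveforms(**kwargs)\n", "    return max(abs(tr.data).max() for tr in st)\n", "\n", "peak_amplitude(bank, network='UU', starttime=t1, endtime=t2)\n", "```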
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.017190Z", "iopub.status.busy": "2024-02-28T22:20:01.016827Z", "iopub.status.idle": "2024-02-28T22:20:01.031510Z", "shell.execute_reply": "2024-02-28T22:20:01.030986Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 Trace(s) in Stream:\n", "UU.CTU..HHE | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples\n", "UU.CTU..HHN | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples\n", "UU.CTU..HHZ | 2007-08-06T01:44:47.994000Z - 2007-08-06T01:45:47.994000Z | 100.0 Hz, 6001 samples\n", "UU.MPU..HHE | 2007-08-06T01:44:47.992000Z - 2007-08-06T01:45:47.992000Z | 100.0 Hz, 6001 samples\n", "UU.MPU..HHN | 2007-08-06T01:44:47.992000Z - 2007-08-06T01:45:47.992000Z | 100.0 Hz, 6001 samples\n" ] } ], "source": [ "st2 = bank.get_waveforms(network='UU', starttime=t1, endtime=t2)\n", "\n", "# ensure only UU traces were returned\n", "for tr in st2:\n", " assert tr.stats.network == 'UU'\n", "\n", "print(st2[:5]) # print first 5 traces" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.033886Z", "iopub.status.busy": "2024-02-28T22:20:01.033469Z", "iopub.status.idle": "2024-02-28T22:20:01.045000Z", "shell.execute_reply": "2024-02-28T22:20:01.044472Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6 Trace(s) in Stream:\n", "TA.O15A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O15A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHE | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O16A..BHN | 2007-08-06T01:44:48.000000Z - 2007-08-06T01:45:48.000000Z | 40.0 Hz, 2401 samples\n", "TA.O18A..BHE | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n", "TA.O18A..BHN | 2007-08-06T01:44:47.999998Z - 2007-08-06T01:45:47.999998Z | 40.0 Hz, 2401 samples\n" ] } ], "source": [ "st = bank.get_waveforms(starttime=t1, endtime=t2, station='O1??', channel='BH[NE]')\n", "\n", "# test returned traces\n", "for tr in st:\n", " assert tr.stats.starttime >= t1 - .00001\n", " assert tr.stats.endtime <= t2 + .00001\n", " assert tr.stats.station.startswith('O1')\n", " assert tr.stats.channel.startswith('BH')\n", " assert tr.stats.channel[-1] in {'N', 'E'}\n", "\n", "print(st)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WaveBank also has a `get_waveforms_bulk` method for efficiently retrieving a large number of streams. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.047286Z", "iopub.status.busy": "2024-02-28T22:20:01.046951Z", "iopub.status.idle": "2024-02-28T22:20:01.103820Z", "shell.execute_reply": "2024-02-28T22:20:01.103241Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 Trace(s) in Stream:\n", "TA.O15A..BHZ | 2007-08-06T01:44:42.999998Z - 2007-08-06T01:45:42.999998Z | 40.0 Hz, 2401 samples\n", "UU.SRU..HHZ | 2007-08-06T01:44:47.995000Z - 2007-08-06T01:45:47.995000Z | 100.0 Hz, 6001 samples\n" ] } ], "source": [ "args = [ # in practice this list may contain hundreds or thousands of requests\n", " ('TA', 'O15A', '', 'BHZ', t1 - 5, t2 - 5,),\n", " ('UU', 'SRU', '', 'HHZ', t1, t2,),\n", "]\n", "st = bank.get_waveforms_bulk(args)\n", "print(st )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Yield waveforms\n", "The Bank class also provides a generator for iterating large amounts of continuous waveforms. The following example shows how to get streams of one hour duration with a minute of overlap between the slices. \n", "\n", "The first step is to create a bank on a dataset which has continuous data. The example below will use the TA dataset." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.106374Z", "iopub.status.busy": "2024-02-28T22:20:01.106006Z", "iopub.status.idle": "2024-02-28T22:20:01.119967Z", "shell.execute_reply": "2024-02-28T22:20:01.119503Z" } }, "outputs": [], "source": [ "ta_bank = obsplus.WaveBank(ta_path)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.122291Z", "iopub.status.busy": "2024-02-28T22:20:01.121859Z", "iopub.status.idle": "2024-02-28T22:20:01.542088Z", "shell.execute_reply": "2024-02-28T22:20:01.541461Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T00:00:09.999998Z to 2007-02-15T01:00:59.999998Z\n", "got 6 streams from 2007-02-15T00:59:59.999998Z to 2007-02-15T02:00:59.999998Z\n", "got 6 streams from 2007-02-15T01:59:59.999998Z to 2007-02-15T03:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T02:59:59.999998Z to 2007-02-15T04:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T03:59:59.999998Z to 2007-02-15T05:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T04:59:59.999998Z to 2007-02-15T06:00:59.999998Z\n", "got 6 streams from 2007-02-15T05:59:59.999998Z to 2007-02-15T07:00:59.999998Z\n", "got 6 streams from 2007-02-15T06:59:59.999998Z to 2007-02-15T08:00:59.999998Z\n", "got 6 streams from 2007-02-15T07:59:59.999998Z to 2007-02-15T09:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T08:59:59.999998Z to 2007-02-15T10:00:59.999998Z\n", "got 6 streams from 2007-02-15T09:59:59.999998Z to 2007-02-15T11:00:59.999998Z\n", "got 6 streams from 2007-02-15T10:59:59.999998Z to 2007-02-15T12:00:59.999998Z\n", "got 6 streams from 2007-02-15T11:59:59.999998Z to 2007-02-15T13:00:59.999998Z\n", "got 6 streams from 2007-02-15T12:59:59.999998Z to 2007-02-15T14:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T13:59:59.999998Z to 2007-02-15T15:00:59.999998Z\n", "got 6 streams 
from 2007-02-15T14:59:59.999998Z to 2007-02-15T16:00:59.999998Z\n", "got 6 streams from 2007-02-15T15:59:59.999998Z to 2007-02-15T17:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T16:59:59.999998Z to 2007-02-15T18:00:59.999998Z" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T17:59:59.999998Z to 2007-02-15T19:00:59.999998Z\n", "got 6 streams from 2007-02-15T18:59:59.999998Z to 2007-02-15T20:00:59.999998Z\n", "got 6 streams from 2007-02-15T19:59:59.999998Z to 2007-02-15T21:00:59.999998Z\n", "got 6 streams from 2007-02-15T20:59:59.999998Z to 2007-02-15T22:00:59.999998Z\n", "got 6 streams from 2007-02-15T21:59:59.999998Z to 2007-02-15T23:00:59.999998Z\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "got 6 streams from 2007-02-15T22:59:59.999998Z to 2007-02-16T00:00:59.999998Z\n" ] } ], "source": [ "# get a few hours of Kemmerer data\n", "ta_t1 = obspy.UTCDateTime('2007-02-15')\n", "ta_t2 = obspy.UTCDateTime('2007-02-16')\n", "\n", "for st in ta_bank.yield_waveforms(starttime=ta_t1, endtime=ta_t2, duration=3600, overlap=60):\n", " print(f'got {len(st)} streams from {st[0].stats.starttime} to {st[0].stats.endtime}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Put waveforms\n", "Files can be added to the bank by passing a stream or trace to the `bank.put_waveforms` method. `WaveBank` does not merge files, so overlapping data may occur if care is not taken." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.544595Z", "iopub.status.busy": "2024-02-28T22:20:01.544222Z", "iopub.status.idle": "2024-02-28T22:20:01.587175Z", "shell.execute_reply": "2024-02-28T22:20:01.586596Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Trace(s) in Stream:\n", "\n" ] } ], "source": [ "# show that no data for RJOB is in the bank\n", "st = bank.get_waveforms(station='RJOB')\n", "\n", "assert len(st) == 0\n", "\n", "print(st)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.589428Z", "iopub.status.busy": "2024-02-28T22:20:01.589242Z", "iopub.status.idle": "2024-02-28T22:20:01.748728Z", "shell.execute_reply": "2024-02-28T22:20:01.748167Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 Trace(s) in Stream:\n", "BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples\n", "BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples\n", "BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples\n" ] } ], "source": [ "# add the default stream to the archive (which contains data for RJOB)\n", "bank.put_waveforms(obspy.read())\n", "st_out = bank.get_waveforms(station='RJOB')\n", "\n", "# test output\n", "assert len(st_out)\n", "for tr in st_out:\n", " assert tr.stats.station == 'RJOB'\n", "\n", "\n", "print(st_out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Availability\n", "`WaveBank` can be used to get the availability of data. The output can be either a dataframe or a list of tuples of the form [(network, station, location, channel, min_starttime, max_endtime)]. 
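\n", "\n", "Once the availability dataframe is in hand it can be manipulated like any other pandas dataframe. A small sketch, assuming the columns shown in the output below (the added `duration` column is purely illustrative):\n", "\n", "```python\n", "avail = bank.get_availability_df(channel='BHE', station='[OR]*')\n", "# starttime/endtime are pandas timestamps, so subtraction gives a Timedelta\n", "avail['duration'] = avail['endtime'] - avail['starttime']\n", "```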
" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.751180Z", "iopub.status.busy": "2024-02-28T22:20:01.750809Z", "iopub.status.idle": "2024-02-28T22:20:01.799647Z", "shell.execute_reply": "2024-02-28T22:20:01.799162Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtime
0TAO15ABHE2007-08-06 01:44:38.8250002007-08-07 21:43:51.124998
1TAO16ABHE2007-08-06 01:44:38.8250002007-08-07 21:43:51.125000
2TAO18ABHE2007-08-06 01:44:38.8249982007-08-07 21:43:51.125000
3TAR16ABHE2007-08-07 02:04:54.5000002007-08-07 21:43:51.125000
4TAR17ABHE2007-08-06 01:44:38.8250002007-08-07 21:43:51.125000
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA O15A BHE 2007-08-06 01:44:38.825000 \n", "1 TA O16A BHE 2007-08-06 01:44:38.825000 \n", "2 TA O18A BHE 2007-08-06 01:44:38.824998 \n", "3 TA R16A BHE 2007-08-07 02:04:54.500000 \n", "4 TA R17A BHE 2007-08-06 01:44:38.825000 \n", "\n", " endtime \n", "0 2007-08-07 21:43:51.124998 \n", "1 2007-08-07 21:43:51.125000 \n", "2 2007-08-07 21:43:51.125000 \n", "3 2007-08-07 21:43:51.125000 \n", "4 2007-08-07 21:43:51.125000 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get a dataframe of availability by seed ids and timestamps\n", "bank.get_availability_df(channel='BHE', station='[OR]*')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.801946Z", "iopub.status.busy": "2024-02-28T22:20:01.801586Z", "iopub.status.idle": "2024-02-28T22:20:01.814908Z", "shell.execute_reply": "2024-02-28T22:20:01.814310Z" } }, "outputs": [ { "data": { "text/plain": [ "[('TA',\n", " 'O15A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.825000Z,\n", " 2007-08-07T21:43:51.124998Z),\n", " ('TA',\n", " 'O16A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.825000Z,\n", " 2007-08-07T21:43:51.125000Z),\n", " ('TA',\n", " 'O18A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.824998Z,\n", " 2007-08-07T21:43:51.125000Z),\n", " ('TA',\n", " 'R16A',\n", " '',\n", " 'BHE',\n", " 2007-08-07T02:04:54.500000Z,\n", " 2007-08-07T21:43:51.125000Z),\n", " ('TA',\n", " 'R17A',\n", " '',\n", " 'BHE',\n", " 2007-08-06T01:44:38.825000Z,\n", " 2007-08-07T21:43:51.125000Z)]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get list of tuples of availability\n", "bank.availability(channel='BHE', station='[OR]*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Gaps and uptime\n", "`WaveBank` can return a dataframe of missing data with the `get_gaps_df` method, and a dataframe of reliability statistics with the `get_uptime_df` method. These are useful for assessing the completeness of an archive of contiguous data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.817188Z", "iopub.status.busy": "2024-02-28T22:20:01.816836Z", "iopub.status.idle": "2024-02-28T22:20:01.837758Z", "shell.execute_reply": "2024-02-28T22:20:01.837275Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtimesampling_periodpathgap_duration
0TAO15ABHE2007-08-06 01:45:48.7999982007-08-06 08:48:30.0249980 days 00:00:00.025000TA.O15A..BHE__20070806T014438Z__20070806T01454...0 days 07:02:41.225000
1TAO15ABHE2007-08-06 08:49:39.9999982007-08-06 10:47:15.6249980 days 00:00:00.025000TA.O15A..BHE__20070806T084830Z__20070806T08494...0 days 01:57:35.625000
2TAO15ABHE2007-08-06 10:48:25.5999982007-08-07 02:04:54.4999980 days 00:00:00.025000TA.O15A..BHE__20070806T104715Z__20070806T10482...0 days 15:16:28.900000
3TAO15ABHE2007-08-07 02:06:04.4749982007-08-07 02:14:14.1000000 days 00:00:00.025000TA.O15A..BHE__20070807T020454Z__20070807T02060...0 days 00:08:09.625002
4TAO15ABHE2007-08-07 02:15:24.0749982007-08-07 03:44:08.4749980 days 00:00:00.025000TA.O15A..BHE__20070807T021414Z__20070807T02152...0 days 01:28:44.400000
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA O15A BHE 2007-08-06 01:45:48.799998 \n", "1 TA O15A BHE 2007-08-06 08:49:39.999998 \n", "2 TA O15A BHE 2007-08-06 10:48:25.599998 \n", "3 TA O15A BHE 2007-08-07 02:06:04.474998 \n", "4 TA O15A BHE 2007-08-07 02:15:24.074998 \n", "\n", " endtime sampling_period \\\n", "0 2007-08-06 08:48:30.024998 0 days 00:00:00.025000 \n", "1 2007-08-06 10:47:15.624998 0 days 00:00:00.025000 \n", "2 2007-08-07 02:04:54.499998 0 days 00:00:00.025000 \n", "3 2007-08-07 02:14:14.100000 0 days 00:00:00.025000 \n", "4 2007-08-07 03:44:08.474998 0 days 00:00:00.025000 \n", "\n", " path gap_duration \n", "0 TA.O15A..BHE__20070806T014438Z__20070806T01454... 0 days 07:02:41.225000 \n", "1 TA.O15A..BHE__20070806T084830Z__20070806T08494... 0 days 01:57:35.625000 \n", "2 TA.O15A..BHE__20070806T104715Z__20070806T10482... 0 days 15:16:28.900000 \n", "3 TA.O15A..BHE__20070807T020454Z__20070807T02060... 0 days 00:08:09.625002 \n", "4 TA.O15A..BHE__20070807T021414Z__20070807T02152... 0 days 01:28:44.400000 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bank.get_gaps_df(channel='BHE', station='O*').head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.839963Z", "iopub.status.busy": "2024-02-28T22:20:01.839603Z", "iopub.status.idle": "2024-02-28T22:20:01.959898Z", "shell.execute_reply": "2024-02-28T22:20:01.959250Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtimedurationgap_durationuptimeavailability
0TAM11AVHE2007-02-15 00:00:09.9999982007-02-24 23:59:59.9999989 days 23:59:500 days9 days 23:59:501.0
1TAM11AVHN2007-02-15 00:00:09.9999982007-02-24 23:59:59.9999989 days 23:59:500 days9 days 23:59:501.0
2TAM11AVHZ2007-02-15 00:00:09.9999982007-02-24 23:59:59.9999989 days 23:59:500 days9 days 23:59:501.0
3TAM14AVHE2007-02-15 00:00:00.0000032007-02-25 00:00:00.00000310 days 00:00:000 days10 days 00:00:001.0
4TAM14AVHN2007-02-15 00:00:00.0000032007-02-25 00:00:00.00000310 days 00:00:000 days10 days 00:00:001.0
5TAM14AVHZ2007-02-15 00:00:00.0000042007-02-25 00:00:00.00000410 days 00:00:000 days10 days 00:00:001.0
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA M11A VHE 2007-02-15 00:00:09.999998 \n", "1 TA M11A VHN 2007-02-15 00:00:09.999998 \n", "2 TA M11A VHZ 2007-02-15 00:00:09.999998 \n", "3 TA M14A VHE 2007-02-15 00:00:00.000003 \n", "4 TA M14A VHN 2007-02-15 00:00:00.000003 \n", "5 TA M14A VHZ 2007-02-15 00:00:00.000004 \n", "\n", " endtime duration gap_duration uptime \\\n", "0 2007-02-24 23:59:59.999998 9 days 23:59:50 0 days 9 days 23:59:50 \n", "1 2007-02-24 23:59:59.999998 9 days 23:59:50 0 days 9 days 23:59:50 \n", "2 2007-02-24 23:59:59.999998 9 days 23:59:50 0 days 9 days 23:59:50 \n", "3 2007-02-25 00:00:00.000003 10 days 00:00:00 0 days 10 days 00:00:00 \n", "4 2007-02-25 00:00:00.000003 10 days 00:00:00 0 days 10 days 00:00:00 \n", "5 2007-02-25 00:00:00.000004 10 days 00:00:00 0 days 10 days 00:00:00 \n", "\n", " availability \n", "0 1.0 \n", "1 1.0 \n", "2 1.0 \n", "3 1.0 \n", "4 1.0 \n", "5 1.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ta_bank.get_uptime_df()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read index\n", "`WaveBank` can return a dataframe of the the index with the `read_index` method, although in most cases this shouldn't be needed." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:01.962302Z", "iopub.status.busy": "2024-02-28T22:20:01.961950Z", "iopub.status.idle": "2024-02-28T22:20:01.971451Z", "shell.execute_reply": "2024-02-28T22:20:01.970969Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
networkstationlocationchannelstarttimeendtimesampling_periodpath
0TAM11AVHN2007-02-19 14:59:59.9999982007-02-19 15:59:59.9999980 days 00:00:10TA/M11A/VHN/2007-02-19T15-00-00.mseed
1TAM14AVHN2007-02-19 15:00:00.0000032007-02-19 16:00:00.0000030 days 00:00:10TA/M11A/VHN/2007-02-19T15-00-00.mseed
2TAM11AVHN2007-02-15 23:59:59.9999982007-02-16 00:59:59.9999980 days 00:00:10TA/M11A/VHN/2007-02-16T00-00-00.mseed
3TAM14AVHN2007-02-16 00:00:00.0000032007-02-16 01:00:00.0000030 days 00:00:10TA/M11A/VHN/2007-02-16T00-00-00.mseed
4TAM11AVHN2007-02-20 15:59:59.9999982007-02-20 16:59:59.9999980 days 00:00:10TA/M11A/VHN/2007-02-20T16-00-00.mseed
\n", "
" ], "text/plain": [ " network station location channel starttime \\\n", "0 TA M11A VHN 2007-02-19 14:59:59.999998 \n", "1 TA M14A VHN 2007-02-19 15:00:00.000003 \n", "2 TA M11A VHN 2007-02-15 23:59:59.999998 \n", "3 TA M14A VHN 2007-02-16 00:00:00.000003 \n", "4 TA M11A VHN 2007-02-20 15:59:59.999998 \n", "\n", " endtime sampling_period \\\n", "0 2007-02-19 15:59:59.999998 0 days 00:00:10 \n", "1 2007-02-19 16:00:00.000003 0 days 00:00:10 \n", "2 2007-02-16 00:59:59.999998 0 days 00:00:10 \n", "3 2007-02-16 01:00:00.000003 0 days 00:00:10 \n", "4 2007-02-20 16:59:59.999998 0 days 00:00:10 \n", "\n", " path \n", "0 TA/M11A/VHN/2007-02-19T15-00-00.mseed \n", "1 TA/M11A/VHN/2007-02-19T15-00-00.mseed \n", "2 TA/M11A/VHN/2007-02-16T00-00-00.mseed \n", "3 TA/M11A/VHN/2007-02-16T00-00-00.mseed \n", "4 TA/M11A/VHN/2007-02-20T16-00-00.mseed " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ta_bank.read_index().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Similar Projects\n", "`WaveBank` is a useful tool, but it may not be a good fit for every application. Check out the following items as well:\n", "\n", "Obspy has a way to visualize availability of waveform data in a directory using [obspy-scan](https://docs.obspy.org/tutorial/code_snippets/visualize_data_availability_of_local_waveform_archive.html). If you prefer a graphical option to working with `DataFrame`s this might be for you.\n", "\n", "Obspy also has [filesystem client](https://docs.obspy.org/master/packages/autogen/obspy.clients.filesystem.sds.Client.html#obspy.clients.filesystem.sds.Client) for working with SeisComP structured archives.\n", "\n", "[IRIS](https://www.iris.edu/hq/) released a mini-seed indexing program called [mseedindex](https://github.com/iris-edu/mseedindex) which has an [ObsPy API](https://github.com/obspy/obspy/pull/2206)." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }