{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fetcher\n",
    " \n",
    "The Fetcher class builds on the unified data request interfaces to provide convenient methods for working with whole datasets.  \n",
    "\n",
    "Listed below are the two methods which are currently available:\n",
    " \n",
    "    #1 - Get continuous data using channels contained in an inventory (or returned from a station_client).\n",
    "\n",
    "\n",
    "    #2 - Iterate over events and their corresponding waveforms that have channels defined by the station_client. \n",
    "\n",
    "\n",
    "    Note: Additional methods may be implemented in the future.\n",
    "\n",
    "\n",
    "## Setup\n",
    "In the example below, a Fetcher from the TA and Crandall dataset will be used to demonstrate different aspects of the `Fetcher` functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:05.371735Z",
     "iopub.status.busy": "2024-02-28T22:19:05.371565Z",
     "iopub.status.idle": "2024-02-28T22:19:12.283009Z",
     "shell.execute_reply": "2024-02-28T22:19:12.282328Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "downloading waveform data for ta_test dataset ...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "finished downloading waveform data for ta_test\n",
      "downloading station data for ta_test dataset ...\n",
      "finished downloading station data for ta_test\n",
      "downloading event data for ta_test dataset ...\n",
      "finished downloading event data for ta_test\n"
     ]
    }
   ],
   "source": [
    "import obsplus\n",
    "\n",
    "# ta_test dataset contains continuous data\n",
    "ta_dataset = obsplus.load_dataset('ta_test')\n",
    "crandall = obsplus.load_dataset('crandall_test')\n",
    "\n",
    "# crandall contains only event data\n",
    "ta_fetcher = ta_dataset.get_fetcher()\n",
    "crandall_fetcher = crandall.get_fetcher()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Fetcher can be initialized with any objects that the appropriate client can be obtained from. Commonly it is used with a `WaveBank`, a `Catalog` (or `EventBank`) and an `Inventory`. \n",
    "\n",
    "The following example would also be valid:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:12.285784Z",
     "iopub.status.busy": "2024-02-28T22:19:12.285585Z",
     "iopub.status.idle": "2024-02-28T22:19:12.389031Z",
     "shell.execute_reply": "2024-02-28T22:19:12.388361Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "cat = crandall.event_client.get_events()\n",
    "inv = crandall.station_client.get_stations()\n",
    "wavebank = crandall.waveform_client\n",
    "\n",
    "crandall_fetcher = obsplus.Fetcher(waveforms=wavebank, stations=inv, events=cat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The WaveFetcher constructor can also take obsplus created csv/dataframes for the [stations](../datastructures/stations_to_pandas.ipynb) and [events](../datastructures/events_to_pandas.ipynb) arguments. However, some of the `Fetcher`'s functionality, like getting a dataframe of picks, will raise an exception unless full station/event clients are used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick fetch\n",
    "The easiest way to get data out of a data fetcher is to call it. The fetcher takes an argument that will provide it with information about when the stream should start. It can be a variety of types (float, UTCDateTime, Catalog, Event). The time before the reference time, and the time after the reference time must also be provided in the method call or in the `Fetcher` construction. \n",
    "\n",
    "The fetcher uses the inventory or `station_client` to know which channels to request from the waveform_client."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:12.391889Z",
     "iopub.status.busy": "2024-02-28T22:19:12.391678Z",
     "iopub.status.idle": "2024-02-28T22:19:12.461953Z",
     "shell.execute_reply": "2024-02-28T22:19:12.461412Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6 Trace(s) in Stream:\n",
      "TA.M11A..VHE | 2007-02-15T05:59:59.999998Z - 2007-02-15T06:00:29.999998Z | 0.1 Hz, 4 samples\n",
      "TA.M11A..VHN | 2007-02-15T05:59:59.999998Z - 2007-02-15T06:00:29.999998Z | 0.1 Hz, 4 samples\n",
      "TA.M11A..VHZ | 2007-02-15T05:59:59.999998Z - 2007-02-15T06:00:29.999998Z | 0.1 Hz, 4 samples\n",
      "TA.M14A..VHE | 2007-02-15T06:00:00.000003Z - 2007-02-15T06:00:30.000003Z | 0.1 Hz, 4 samples\n",
      "TA.M14A..VHN | 2007-02-15T06:00:00.000003Z - 2007-02-15T06:00:30.000003Z | 0.1 Hz, 4 samples\n",
      "TA.M14A..VHZ | 2007-02-15T06:00:00.000004Z - 2007-02-15T06:00:30.000004Z | 0.1 Hz, 4 samples\n"
     ]
    }
   ],
   "source": [
    "import obspy\n",
    "reference_time = obspy.UTCDateTime('2007-02-15T06')\n",
    "time_before = 1\n",
    "time_after = 30\n",
    "stream = ta_fetcher(reference_time, time_before, time_after)\n",
    "print(stream)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Continuous data\n",
    "\n",
    "Continuous data can be requested from the wavefetcher, which uses the `station_client` to know which channels to pull from the waveform_client. This enables users to skip a lot of the boiler-plate associated with the normal `get_waveforms` interface.  \n",
    "\n",
    "The example below shows the continuous data being interated over while running a simple STA/LTA detector.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:12.464352Z",
     "iopub.status.busy": "2024-02-28T22:19:12.463981Z",
     "iopub.status.idle": "2024-02-28T22:19:12.628725Z",
     "shell.execute_reply": "2024-02-28T22:19:12.628034Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "from obspy.signal.trigger import classic_sta_lta\n",
    "\n",
    "# first define a function for doing the sta/lta\n",
    "def print_sta_lta(tr: obspy.Trace):\n",
    "    \"\"\" prints the sta/lta \"\"\"\n",
    "    sr = tr.stats.sampling_rate\n",
    "    cft = classic_sta_lta(tr.data, int(20 * sr), int(60 * sr))\n",
    "    print(f'{tr.id} starting at {st[0].stats.starttime}, has a max sta/lta of {max(cft):0.2f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:12.631544Z",
     "iopub.status.busy": "2024-02-28T22:19:12.631046Z",
     "iopub.status.idle": "2024-02-28T22:19:13.268374Z",
     "shell.execute_reply": "2024-02-28T22:19:13.267786Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TA.M11A..VHZ starting at 2007-02-15T23:59:59.999998Z, has a max sta/lta of 2.88\n",
      "TA.M14A..VHZ starting at 2007-02-15T23:59:59.999998Z, has a max sta/lta of 2.98\n",
      "TA.M11A..VHZ starting at 2007-02-16T19:59:59.999998Z, has a max sta/lta of 2.98\n",
      "TA.M14A..VHZ starting at 2007-02-16T19:59:59.999998Z, has a max sta/lta of 2.97\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TA.M11A..VHZ starting at 2007-02-17T15:59:59.999998Z, has a max sta/lta of 2.95\n",
      "TA.M14A..VHZ starting at 2007-02-17T15:59:59.999998Z, has a max sta/lta of 2.84\n",
      "TA.M11A..VHZ starting at 2007-02-18T11:59:59.999998Z, has a max sta/lta of 2.98\n",
      "TA.M14A..VHZ starting at 2007-02-18T11:59:59.999998Z, has a max sta/lta of 3.00\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TA.M11A..VHZ starting at 2007-02-19T07:59:59.999998Z, has a max sta/lta of 2.93\n",
      "TA.M14A..VHZ starting at 2007-02-19T07:59:59.999998Z, has a max sta/lta of 2.93\n"
     ]
    }
   ],
   "source": [
    "# starttime for the continuous data\n",
    "t1 = obspy.UTCDateTime('2007-02-16')\n",
    "\n",
    "# endtime for the continuous data\n",
    "t2 = t1 + 36000 * 10  # use 10 hours\n",
    "\n",
    "# duration of each chunk returned (in seconds)\n",
    "duration = 72000\n",
    "\n",
    "# overlap (added to the end of the duration)\n",
    "overlap = 60\n",
    "\n",
    "# iterate over each chunk\n",
    "kwargs = dict(starttime=t1, endtime=t2, duration=duration, overlap=overlap)\n",
    "for st in ta_fetcher.yield_waveforms(**kwargs):\n",
    "    # select only z component and perform preprocessing\n",
    "    st = st.select(component='Z')\n",
    "    st.detrend('linear')\n",
    "    # do the sta/lta\n",
    "    for tr in st:\n",
    "        print_sta_lta(tr)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stream processors\n",
    "\n",
    "It can be useful to define a stream_processing function that will be called on each stream before yielding it. This allows the user to define flexible, custom processing functions without cluttering up the function calls with a lot of processing parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:13.271128Z",
     "iopub.status.busy": "2024-02-28T22:19:13.270730Z",
     "iopub.status.idle": "2024-02-28T22:19:13.737785Z",
     "shell.execute_reply": "2024-02-28T22:19:13.737195Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TA.M11A..VHZ starting at 2007-02-15T23:59:59.999998Z, has a max sta/lta of 2.96\n",
      "TA.M14A..VHZ starting at 2007-02-15T23:59:59.999998Z, has a max sta/lta of 2.95\n",
      "TA.M11A..VHZ starting at 2007-02-16T19:59:59.999998Z, has a max sta/lta of 2.95\n",
      "TA.M14A..VHZ starting at 2007-02-16T19:59:59.999998Z, has a max sta/lta of 3.00\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TA.M11A..VHZ starting at 2007-02-17T15:59:59.999998Z, has a max sta/lta of 2.98\n",
      "TA.M14A..VHZ starting at 2007-02-17T15:59:59.999998Z, has a max sta/lta of 2.96\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TA.M11A..VHZ starting at 2007-02-18T11:59:59.999998Z, has a max sta/lta of 2.99\n",
      "TA.M14A..VHZ starting at 2007-02-18T11:59:59.999998Z, has a max sta/lta of 2.97\n",
      "TA.M11A..VHZ starting at 2007-02-19T07:59:59.999998Z, has a max sta/lta of 3.00\n",
      "TA.M14A..VHZ starting at 2007-02-19T07:59:59.999998Z, has a max sta/lta of 2.99\n"
     ]
    }
   ],
   "source": [
    "# define a function that will be called on the stream before returning it. \n",
    "def stream_processor(st: obspy.Stream) -> obspy.Stream:\n",
    "    \"\"\" select the z component, detrend, and filter a stream \"\"\"\n",
    "    st = st.select(component='Z')\n",
    "    st.detrend('linear')\n",
    "    st.filter('bandpass', freqmin=.005, freqmax=.04)\n",
    "    return st\n",
    "\n",
    "# attach stream processor to the wave fetcher\n",
    "ta_fetcher.stream_processor = stream_processor\n",
    "\n",
    "kwargs = dict(starttime=t1, endtime=t2, duration=duration, overlap=overlap)\n",
    "for st in ta_fetcher.yield_waveforms(**kwargs):\n",
    "    for tr in st:\n",
    "        print_sta_lta(tr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Event data\n",
    "\n",
    "When the wavefetcher object is provided with an event client, the event client can be used to iterate through the event waveforms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:13.740655Z",
     "iopub.status.busy": "2024-02-28T22:19:13.740280Z",
     "iopub.status.idle": "2024-02-28T22:19:14.135848Z",
     "shell.execute_reply": "2024-02-28T22:19:14.135280Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fetched waveform data for smi:local/248828 which has 48 traces\n",
      "fetched waveform data for smi:local/248839 which has 48 traces\n",
      "fetched waveform data for smi:local/248843 which has 48 traces\n",
      "fetched waveform data for smi:local/248882 which has 51 traces\n",
      "fetched waveform data for smi:local/248883 which has 51 traces\n",
      "fetched waveform data for smi:local/248887 which has 51 traces\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fetched waveform data for smi:local/248891 which has 51 traces\n",
      "fetched waveform data for smi:local/248925 which has 51 traces\n"
     ]
    }
   ],
   "source": [
    "time_before = 1\n",
    "time_after = 3\n",
    "iterrator = crandall_fetcher.yield_event_waveforms(time_before, time_after)\n",
    "for event_id, st in iterrator:\n",
    "    print(f'fetched waveform data for {event_id} which has {len(st)} traces')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A dict of {event_id: stream} can be created using the following code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:14.138377Z",
     "iopub.status.busy": "2024-02-28T22:19:14.137929Z",
     "iopub.status.idle": "2024-02-28T22:19:14.466676Z",
     "shell.execute_reply": "2024-02-28T22:19:14.466143Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "smi:local/248828 48\n",
      "smi:local/248839 48\n",
      "smi:local/248843 48\n",
      "smi:local/248882 51\n",
      "smi:local/248883 51\n",
      "smi:local/248887 51\n",
      "smi:local/248891 51\n",
      "smi:local/248925 51\n"
     ]
    }
   ],
   "source": [
    "st_dict = dict(crandall_fetcher.yield_event_waveforms(time_before, time_after))\n",
    "\n",
    "for event_id, st in st_dict.items():\n",
    "    print(event_id, len(st))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Different events/inventories\n",
    "The clients can be swapped out on each method call. This may be be useful to get a subset of the events or channels by providing a filtered catalog/inventory.  \n",
    "\n",
    "If a single call was needed for a station, the example below will accomplish this task.  In this example station M11A will be used.\n",
    "\n",
    "Note: This will not modify the original wavefetcher."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:14.468922Z",
     "iopub.status.busy": "2024-02-28T22:19:14.468722Z",
     "iopub.status.idle": "2024-02-28T22:19:51.119182Z",
     "shell.execute_reply": "2024-02-28T22:19:51.118614Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "# get a subset of the original inventory ()\n",
    "inv = ta_dataset.station_client.get_stations()\n",
    "inv2 = inv.select(station='M11A')\n",
    "\n",
    "# iterate and print\n",
    "for st in ta_fetcher.yield_waveforms(t1, t2, duration, overlap, stations=inv2):\n",
    "    for tr in st:\n",
    "        print_sta_lta(tr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same call applies for swapping out events:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-02-28T22:19:51.121776Z",
     "iopub.status.busy": "2024-02-28T22:19:51.121567Z",
     "iopub.status.idle": "2024-02-28T22:19:51.557069Z",
     "shell.execute_reply": "2024-02-28T22:19:51.556405Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fetching smi:local/248839, got 48 traces\n",
      "fetching smi:local/248883, got 51 traces\n"
     ]
    }
   ],
   "source": [
    "# read in catalog as and get a subset as a dataframe\n",
    "cat = crandall.event_client.get_events()\n",
    "cat_df = obsplus.events_to_df(cat)[:2]\n",
    "\n",
    "# iterate the events and print \n",
    "iterator = crandall_fetcher.yield_event_waveforms(time_before, time_after, events=cat_df)\n",
    "for event_id, st in iterator:\n",
    "    print(f'fetching {event_id}, got {len(st)} traces')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}