{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with Catalogs\n", "\n", "[Obspy's event representation](https://docs.obspy.org/packages/obspy.core.html#event-metadata) is based on the [FDSN](http://www.fdsn.org/) [QuakeML standard](https://quake.ethz.ch/quakeml/), which is very comprehensive, and arguably the best standard available. However, It can be a bit difficult to work with the `Catalog` object (and friends) for a few reasons:\n", "\n", " 1. Often the desired data is deeply nested and hard to aggregate\n", " \n", " 2. Identifying data relations depends on the complex behavior of Obspy's `ResourceIdentifier`\n", " \n", " 3. Preferred objects (eg origin, magnitude, etc.) are often not set\n", " \n", "ObsPlus tries to solve all of these problems. The first is addressed by the [DataFrame Extractor](../utils/dataframeextractor.ipynb) and other tree transversal tools. The second and third are addressed by a collection of catalog validators " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Catalog Navigation\n", "\n", "If you only need to extract information contained in a catalog there are various examples of creating dataframes from different parts of the catalog in the [events_to_pandas section](../datastructures/events_to_pandas.ipynb). \n", "\n", "If the tree structure needs to be maintained, the `yield_object_parent_attr` function can be very useful. For example, let's assume we want to sabotage a seismic analyst by adding noise to his/her pick times. We could do that like so: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:38.154811Z", "iopub.status.busy": "2024-02-28T22:20:38.154390Z", "iopub.status.idle": "2024-02-28T22:20:40.967629Z", "shell.execute_reply": "2024-02-28T22:20:40.967036Z" } }, "outputs": [], "source": [ "# \n", "import numpy as np\n", "import obspy\n", "import obspy.core.event as ev\n", "import obsplus\n", "from obsplus.utils import yield_obj_parent_attr\n", "\n", "cat = obsplus.load_dataset('crandall_test').event_client.get_events()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:40.970374Z", "iopub.status.busy": "2024-02-28T22:20:40.970006Z", "iopub.status.idle": "2024-02-28T22:20:41.088185Z", "shell.execute_reply": "2024-02-28T22:20:41.087580Z" } }, "outputs": [], "source": [ "# iterate picks, add noise to pick time (-1 to 1 seconds, uniform dist.)\n", "for pick, parent, attr in yield_obj_parent_attr(cat, cls=ev.Pick):\n", " pick.time = pick.time + (np.random.random() - 0.5) * 2.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, a less malicious example, perhaps we want to count all the `ResourceIdentifier` instances and ensure they have some minimum length. If they don't we want to regenerate them." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.091155Z", "iopub.status.busy": "2024-02-28T22:20:41.090708Z", "iopub.status.idle": "2024-02-28T22:20:41.183934Z", "shell.execute_reply": "2024-02-28T22:20:41.183277Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 2807 resource ids in the catalog. 1407 were replaced.\n" ] } ], "source": [ "count = 0\n", "replaced = 0\n", "\n", "for rid, parent, attr in yield_obj_parent_attr(cat, cls=ev.ResourceIdentifier):\n", " # increment counter\n", " count += 1\n", " # if the resource id is longer than twenty keep going\n", " if len(str(rid)) > 20:\n", " continue\n", " # else create a new resource_id and bind it to the parent\n", " new_rid = ev.ResourceIdentifier(referred_object=parent)\n", " setattr(parent, attr, new_rid)\n", " replaced += 1\n", " \n", "\n", "print(f\"There are {count} resource ids in the catalog. {replaced} were replaced.\")\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Catalog Validation\n", "\n", "In addition to being difficult to navigate (point 1), validating and ensuring catalogs work as expected (points 2-3) is non-trivial. ObsPlus's `validate_catalog` helps by ensuring all resource_ids point to the correct objects, preferred objects are set, and preforming other sanity checks. The default event validation function in ObsPlus is a bit opinionated and was built specifically for the NIOSH style of QuakeML, but you may still find it useful. Additionally, you can create your own validation namespace and define validators for your own data/schema as described by the [validators documentation](validators.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Catalog setup\n", "Let's create a catalog that has the following problems:\n", "\n", "- resource_id on arrivals no longer point to the correct picks (only possible to break on Obspy versions <= 1.1.0)\n", "\n", "- no preferred origin/magnitudes are set\n", "\n", "ObsPlus will go through and set the resource_ids to point to the correct objects, and set all the preferred_{whatever} to the last element in the {whatever}s list (for whatever in ['magnitude', 'origin', 'focal_mechanism'])." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.211891Z", "iopub.status.busy": "2024-02-28T22:20:41.211485Z", "iopub.status.idle": "2024-02-28T22:20:41.217789Z", "shell.execute_reply": "2024-02-28T22:20:41.217227Z" } }, "outputs": [], "source": [ "\n", "# create catalog 1\n", "def create_cat1():\n", " \"\"\" a catalog with an arrival that doesn't refer to any pick \"\"\"\n", " time = obspy.UTCDateTime('2017-09-22T08:35:00')\n", " wid = ev.WaveformStreamID(network_code='UU', station_code='TMU', \n", " location_code='', channel_code='HHZ')\n", " pick = ev.Pick(time=time, phase_hint='P', waveform_id=wid)\n", " arrival = ev.Arrival(pick_id=pick.resource_id, waveform_id=wid)\n", " origin = ev.Origin(time=time, arrivals=[arrival], latitude=45.5,\n", " longitude=-111.1)\n", " description = ev.EventDescription(create_cat1.__doc__)\n", " event = ev.Event(origins=[origin], picks=[pick], \n", " event_descriptions=[description])\n", " cat = ev.Catalog(events=[event])\n", " # create a copy of the catalog. In older versions this would screw up\n", " # the resource ids, but the issue seems to be fixed now.\n", " cat.copy()\n", " return cat\n", "\n", "\n", "cat = create_cat1() \n", "event = cat[0]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.219945Z", "iopub.status.busy": "2024-02-28T22:20:41.219594Z", "iopub.status.idle": "2024-02-28T22:20:41.222366Z", "shell.execute_reply": "2024-02-28T22:20:41.221870Z" } }, "outputs": [], "source": [ "arrival = event.origins[-1].arrivals[-1]\n", "pick = event.picks[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validate\n", "These two problems can be fixed in place with the validate_catalog function" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.224595Z", "iopub.status.busy": "2024-02-28T22:20:41.224233Z", "iopub.status.idle": "2024-02-28T22:20:41.278750Z", "shell.execute_reply": "2024-02-28T22:20:41.278118Z" } }, "outputs": [ { "data": { "text/plain": [ "1 Event(s) in Catalog:\n", "2017-09-22T08:35:00.000000Z | +45.500, -111.100" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obsplus.validate_catalog(cat)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.281259Z", "iopub.status.busy": "2024-02-28T22:20:41.280896Z", "iopub.status.idle": "2024-02-28T22:20:41.284455Z", "shell.execute_reply": "2024-02-28T22:20:41.283833Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Origin\n", "\t resource_id: ResourceIdentifier(id=\"smi:local/0a5ed84d-731b-4c8c-8fc0-ac94935647d3\")\n", "\t time: UTCDateTime(2017, 9, 22, 8, 35)\n", "\t longitude: -111.1\n", "\t latitude: 45.5\n", "\t ---------\n", "\t arrivals: 1 Elements\n", "Pick\n", "\t resource_id: ResourceIdentifier(id=\"smi:local/def478c2-847d-4636-80ba-5aa8c1a890e1\")\n", "\t time: UTCDateTime(2017, 9, 22, 8, 35)\n", "\t waveform_id: WaveformStreamID(network_code='UU', station_code='TMU', channel_code='HHZ', location_code='')\n", "\t phase_hint: 'P'\n" ] } ], "source": [ "print(event.preferred_origin())\n", "arrival = event.origins[0].arrivals[0]\n", "# now we will get the correct pick through the arrival object, even on older versions of obspy\n", "print(arrival.pick_id.get_referred_object())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fail fast\n", "For issues that obsplus doesn't know how to fix, an `AssertionError` will be raised. If you are generating or downloading catalogs it may be useful to run them through the validation function right away so that you know there is an issue before trying to perform any meaningful analysis.\n", "\n", "For example, if there was an arrival that didn't refer to any known pick this could be a quality issue that you might like to know about." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.286909Z", "iopub.status.busy": "2024-02-28T22:20:41.286483Z", "iopub.status.idle": "2024-02-28T22:20:41.291169Z", "shell.execute_reply": "2024-02-28T22:20:41.290585Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "something is wrong with this catalog\n" ] } ], "source": [ "# create a problem with the catalog\n", "old_pick_id = cat[0].origins[0].arrivals[0].pick_id\n", "cat[0].origins[0].arrivals[0].pick_id = None\n", "\n", "try:\n", " obsplus.validate_catalog(cat)\n", "except AssertionError as e:\n", " print('something is wrong with this catalog')\n", "\n", "# undo the problem\n", "cat[0].origins[0].arrivals[0].pick_id = old_pick_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding custom validators\n", "See the [validators section](validators.ipynb) to learn how to create your own validators. The following example shows how to use a subset of ObsPlus validators. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-02-28T22:20:41.293453Z", "iopub.status.busy": "2024-02-28T22:20:41.293114Z", "iopub.status.idle": "2024-02-28T22:20:41.325983Z", "shell.execute_reply": "2024-02-28T22:20:41.325388Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: []" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# import the validators that are desired\n", "import obspy.core.event as ev\n", "from obsplus.utils.validate import validator, validate\n", "from obsplus.events.validate import (\n", " attach_all_resource_ids, \n", " check_arrivals_pick_id,\n", " check_duplicate_picks,\n", ")\n", "\n", "# create new validator namespace\n", "namespace = '_new_test'\n", "validator(namespace, ev.Event)(attach_all_resource_ids)\n", "validator(namespace, ev.Event)(check_arrivals_pick_id)\n", "validator(namespace, ev.Event)(check_duplicate_picks)\n", "\n", "# run the new validator\n", "validate(cat, namespace)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }