{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Validators\n", "\n", "ObsPlus provides a simple method for declaring and enforcing assumptions about data. You can think of it as [pytest](https://docs.pytest.org/en/latest/) for data validation, except that the checks are enforced at runtime rather than in a test suite. The implementation is geared towards nested tree structures (like ObsPy's `Catalog` object), but it works for any type of object.\n", "\n", "
\n", "\n", "**Warning**: This is a fairly advanced feature of ObsPlus intended primarily for library authors and users with stringent data requirements. The built-in validators will meet most people's needs. \n", "
\n", "\n", "
\n", "\n", "**Warning**: In the future we may move much of this functionality to ObsPy as described in [this proposal](https://github.com/obspy/obspy/issues/2154), but an appropriate deprecation cycle will be implemented.\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Built-in Validators\n", "ObsPlus comes with a few built-in validators. See the [catalog validation](catalog_validation.ipynb) page for more details.\n", "\n", "## Custom Validators\n", "The example below creates two custom validators: one ensures that each event has at least four picks, and the other ensures that each origin has a latitude and longitude defined. The namespace `\"_silly_test\"` tells ObsPlus that these validators should be grouped together.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:41.080108Z", "iopub.status.busy": "2025-01-09T18:05:41.079929Z", "iopub.status.idle": "2025-01-09T18:05:42.104769Z", "shell.execute_reply": "2025-01-09T18:05:42.104231Z" } }, "outputs": [], "source": [ "import obspy\n", "import obspy.core.event as ev\n", "\n", "from obsplus.utils.validate import validate, validator\n", "\n", "namespace = \"_silly_test\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:42.107248Z", "iopub.status.busy": "2025-01-09T18:05:42.106712Z", "iopub.status.idle": "2025-01-09T18:05:42.110555Z", "shell.execute_reply": "2025-01-09T18:05:42.110051Z" } }, "outputs": [], "source": [ "# Decorate a callable with the `validator` decorator, specifying the\n", "# namespace and the class the validator acts on.\n", "@validator(namespace, ev.Event)\n", "def ensure_events_have_four_picks(event):\n", "    picks = event.picks\n", "    assert len(picks) >= 4\n", "\n", "\n", "@validator(namespace, ev.Origin)\n", "def ensure_origin_have_lat_lon(origin):\n", "    assert origin.latitude is not None\n", "    assert origin.longitude is not None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, an event is modified so that it violates both conditions, and the `validate` function is run on the catalog. It should raise an `AssertionError`."
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:42.112391Z", "iopub.status.busy": "2025-01-09T18:05:42.112044Z", "iopub.status.idle": "2025-01-09T18:05:42.199984Z", "shell.execute_reply": "2025-01-09T18:05:42.199392Z" } }, "outputs": [], "source": [ "# Load ObsPy's default example catalog\n", "cat = obspy.read_events()\n", "\n", "# Remove all picks from the first event\n", "cat[0].picks = []\n", "\n", "# Clear the coordinates of the first event's origins\n", "for origin in cat[0].origins:\n", "    origin.latitude = None\n", "    origin.longitude = None" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:42.201934Z", "iopub.status.busy": "2025-01-09T18:05:42.201586Z", "iopub.status.idle": "2025-01-09T18:05:42.204899Z", "shell.execute_reply": "2025-01-09T18:05:42.204455Z" } }, "outputs": [], "source": [ "# The modified catalog fails validation, so an AssertionError is expected\n", "try:\n", "    validate(cat, namespace)\n", "except AssertionError:\n", "    pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, `validate` can return a report of the results as a dataframe when `report=True` is passed. This makes it possible to identify problems with the data without halting the execution of the code." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:42.206697Z", "iopub.status.busy": "2025-01-09T18:05:42.206347Z", "iopub.status.idle": "2025-01-09T18:05:42.218985Z", "shell.execute_reply": "2025-01-09T18:05:42.218336Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n",
"<thead>\n",
"<tr><th></th><th>validator</th><th>object</th><th>message</th><th>passed</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>ensure_events_have_four_picks</td><td>[resource_id, event_type, event_type_certainty...</td><td>validator ensure_events_have_four_picks failed...</td><td>False</td></tr>\n",
"<tr><th>1</th><td>ensure_events_have_four_picks</td><td>[resource_id, event_type, event_type_certainty...</td><td>validator ensure_events_have_four_picks failed...</td><td>False</td></tr>\n",
"<tr><th>2</th><td>ensure_events_have_four_picks</td><td>[resource_id, event_type, event_type_certainty...</td><td>validator ensure_events_have_four_picks failed...</td><td>False</td></tr>\n",
"<tr><th>3</th><td>ensure_origin_have_lat_lon</td><td>[resource_id, time, time_errors, longitude, lo...</td><td>validator ensure_origin_have_lat_lon failed ob...</td><td>False</td></tr>\n",
"<tr><th>4</th><td>ensure_origin_have_lat_lon</td><td>[resource_id, time, time_errors, longitude, lo...</td><td></td><td>True</td></tr>\n",
"<tr><th>5</th><td>ensure_origin_have_lat_lon</td><td>[resource_id, time, time_errors, longitude, lo...</td><td></td><td>True</td></tr>\n",
"</tbody>\n",
"</table>\n", "
" ], "text/plain": [ " validator \\\n", "0 ensure_events_have_four_picks \n", "1 ensure_events_have_four_picks \n", "2 ensure_events_have_four_picks \n", "3 ensure_origin_have_lat_lon \n", "4 ensure_origin_have_lat_lon \n", "5 ensure_origin_have_lat_lon \n", "\n", " object \\\n", "0 [resource_id, event_type, event_type_certainty... \n", "1 [resource_id, event_type, event_type_certainty... \n", "2 [resource_id, event_type, event_type_certainty... \n", "3 [resource_id, time, time_errors, longitude, lo... \n", "4 [resource_id, time, time_errors, longitude, lo... \n", "5 [resource_id, time, time_errors, longitude, lo... \n", "\n", " message passed \n", "0 validator ensure_events_have_four_picks failed... False \n", "1 validator ensure_events_have_four_picks failed... False \n", "2 validator ensure_events_have_four_picks failed... False \n", "3 validator ensure_origin_have_lat_lon failed ob... False \n", "4 True \n", "5 True " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "report = validate(cat, namespace, report=True)\n", "report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the `object` column holds a reference to the Python object on which each validator ran. This makes it quick to find (and fix) problematic data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Validators with optional arguments\n", "Validators can also take optional arguments in the form of keyword arguments. The `validate` function then distributes these values to the appropriate validators."
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:42.246599Z", "iopub.status.busy": "2025-01-09T18:05:42.246180Z", "iopub.status.idle": "2025-01-09T18:05:42.249186Z", "shell.execute_reply": "2025-01-09T18:05:42.248706Z" } }, "outputs": [], "source": [ "# `min_lat` is optional; the check is skipped when it is not provided\n", "@validator(namespace, ev.Origin)\n", "def ensure_lat_greater_than(origin, min_lat=None):\n", "    if min_lat is not None:\n", "        assert origin.latitude is None or origin.latitude > min_lat" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2025-01-09T18:05:42.251044Z", "iopub.status.busy": "2025-01-09T18:05:42.250704Z", "iopub.status.idle": "2025-01-09T18:05:42.255445Z", "shell.execute_reply": "2025-01-09T18:05:42.254950Z" } }, "outputs": [], "source": [ "# `min_lat` is passed on to the appropriate validators\n", "_ = validate(cat, namespace, min_lat=39, report=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 4 }