Validators

ObsPlus provides a simple way to declare and enforce assumptions about data. You can think of it as pytest for data validation, except that the checks are enforced at runtime rather than in a test suite. The implementation is geared towards nested tree structures (such as ObsPy’s Catalog object), but it works for any type of object.
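As a rough illustration of what "geared towards nested tree structures" means, here is a minimal, self-contained sketch of how a check registered for one class could reach every instance of that class inside a nested object. It is not ObsPlus's implementation. The `Event` and `Origin` dataclasses are hypothetical stand-ins for ObsPy's classes, and `iter_instances` is a hypothetical helper.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Iterator, List, Optional


@dataclass
class Origin:  # hypothetical stand-in for obspy.core.event.Origin
    latitude: Optional[float] = None
    longitude: Optional[float] = None


@dataclass
class Event:  # hypothetical stand-in for obspy.core.event.Event
    origins: List[Origin] = field(default_factory=list)
    picks: List[Any] = field(default_factory=list)


def iter_instances(obj: Any, cls: type) -> Iterator[Any]:
    """Recursively yield every instance of ``cls`` found in ``obj``."""
    if isinstance(obj, cls):
        yield obj
    # collect the children of this node, depending on its container type
    if is_dataclass(obj):
        children = [getattr(obj, f.name) for f in fields(obj)]
    elif isinstance(obj, (list, tuple)):
        children = list(obj)
    elif isinstance(obj, dict):
        children = list(obj.values())
    else:
        children = []  # leaf value (float, str, None, ...)
    for child in children:
        yield from iter_instances(child, cls)


# a hypothetical "catalog": two events holding three origins in total
catalog = [Event(origins=[Origin(1.0, 2.0), Origin()]), Event(origins=[Origin()])]
print(len(list(iter_instances(catalog, Origin))))  # 3
```

Here `iter_instances(catalog, Origin)` yields the three `Origin` objects, so a validator targeting `Origin` would run three times on this structure.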

Warning: This is a fairly advanced feature of ObsPlus intended primarily for library authors and users with stringent data requirements. The built-in validators will meet most people’s needs.

Warning: In the future we may move much of this functionality to ObsPy as described in this proposal, but an appropriate deprecation cycle will be implemented.

Built-in Validators

ObsPlus comes with a few built-in validators. See the catalog validation page for more details.

Custom Validators

The example below creates custom validators to ensure that each event has at least four picks and that each origin has its latitude and longitude defined. The namespace "_silly_test" tells ObsPlus that these validators belong together.

[1]:
import obsplus
import obspy
import obspy.core.event as ev
from obsplus.utils.validate import validator, validate

namespace = '_silly_test'
[2]:
# Decorate a callable with the `validator` decorator, specifying
# the namespace and the class the validator acts on
@validator(namespace, ev.Event)
def ensure_events_have_four_picks(event):
    picks = event.picks
    assert len(picks) >= 4


@validator(namespace, ev.Origin)
def ensure_origin_have_lat_lon(origin):
    assert origin.latitude is not None
    assert origin.longitude is not None
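Conceptually, the decorator only needs to record which function applies to which class within a namespace. The sketch below is a toy registry written under that assumption. `toy_validator`, `toy_validate`, and `REGISTRY` are hypothetical names, not part of ObsPlus, and the tree walk is omitted by validating a flat list of objects.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

# hypothetical registry: namespace -> list of (target class, validator function)
REGISTRY: Dict[str, List[Tuple[type, Callable]]] = defaultdict(list)


def toy_validator(namespace: str, cls: type) -> Callable:
    """Register the decorated function for ``cls`` under ``namespace``."""
    def decorator(func: Callable) -> Callable:
        REGISTRY[namespace].append((cls, func))
        return func  # unchanged, so it can still be called directly
    return decorator


def toy_validate(objects: Iterable[Any], namespace: str) -> None:
    """Run every matching validator; the first failed assert propagates."""
    for obj in objects:
        for cls, func in REGISTRY[namespace]:
            if isinstance(obj, cls):
                func(obj)


@toy_validator("_toy", int)
def ensure_positive(value):
    assert value > 0, f"{value} is not positive"


toy_validate([1, 2, 3], "_toy")  # all pass, nothing happens

message = ""
try:
    toy_validate([1, -2], "_toy")
except AssertionError as exc:
    message = str(exc)
print(message)  # -2 is not positive
```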

Next, modify the first event so that it violates both conditions, then run the validate function. It should raise an AssertionError.

[3]:
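# load ObsPy's example catalog
cat = obspy.read_events()

# remove all picks from the first event
cat[0].picks = []

# remove the latitude and longitude from each of the first event's origins
for origin in cat[0].origins:
    origin.latitude = None
    origin.longitude = None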
[4]:
try:
    validate(cat, namespace)
except AssertionError:
    print('catalog failed validations')
catalog failed validations

Alternatively, passing report=True makes validate return a dataframe reporting the result of each validator, including the failures. This lets you identify problems with the data without halting the execution of the code.
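Here is a toy version of report mode, assuming it amounts to catching each AssertionError and recording a row instead of stopping. The row keys mirror the validator, object, message, and passed columns of the report below. `toy_report` and the pick lists standing in for events are hypothetical, and plain dicts stand in for the dataframe.

```python
from typing import Any, Callable, Dict, Iterable, List


def toy_report(objects: Iterable[Any], validators: Dict[str, Callable]) -> List[dict]:
    """Run each validator on each object, recording results instead of raising."""
    rows = []
    for obj in objects:
        for name, func in validators.items():
            try:
                func(obj)
            except AssertionError as exc:
                rows.append({"validator": name, "object": obj,
                             "message": f"validator {name} failed: {exc}",
                             "passed": False})
            else:
                rows.append({"validator": name, "object": obj,
                             "message": "", "passed": True})
    return rows


def ensure_has_four_picks(picks):
    assert len(picks) >= 4, f"only {len(picks)} picks"


# hypothetical events, each reduced to its list of picks
events = [[], ["p1", "p2", "p3", "p4"]]
rows = toy_report(events, {"ensure_has_four_picks": ensure_has_four_picks})
for row in rows:
    print(row["validator"], row["passed"], row["message"])
```

Because each row holds the object itself rather than a copy, `rows[0]["object"] is events[0]` is true, so a failing object can be inspected or fixed directly from its row.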

[5]:
report = validate(cat, namespace, report=True)
report
[5]:
   validator                      object                                             message                                            passed
0  ensure_events_have_four_picks  [resource_id, event_type, event_type_certainty...  validator ensure_events_have_four_picks failed...  False
1  ensure_events_have_four_picks  [resource_id, event_type, event_type_certainty...  validator ensure_events_have_four_picks failed...  False
2  ensure_events_have_four_picks  [resource_id, event_type, event_type_certainty...  validator ensure_events_have_four_picks failed...  False
3  ensure_origin_have_lat_lon     [resource_id, time, time_errors, longitude, lo...  validator ensure_origin_have_lat_lon failed ob...  False
4  ensure_origin_have_lat_lon     [resource_id, time, time_errors, longitude, lo...                                                     True
5  ensure_origin_have_lat_lon     [resource_id, time, time_errors, longitude, lo...                                                     True

Notice that the object column holds a reference to the Python object the validator ran on. This makes it quick to find (and fix) problematic data.

Validators with optional arguments

You can also create validators that take optional keyword arguments. The validate function then distributes these values to the appropriate validators.
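One plausible way to distribute keyword arguments is to inspect each validator's signature and pass only the names it declares. The sketch below assumes that approach (ObsPlus may do it differently). `call_with_matching_kwargs` is a hypothetical helper, and validators that themselves accept `**kwargs` are not handled.

```python
import inspect
from typing import Any, Callable, List, Optional


def call_with_matching_kwargs(func: Callable, obj: Any, **kwargs: Any) -> Any:
    """Call ``func(obj, ...)`` passing only the keyword arguments it declares."""
    params = inspect.signature(func).parameters
    accepted = {key: value for key, value in kwargs.items() if key in params}
    return func(obj, **accepted)


seen: List[Optional[float]] = []  # records the min_lat each call received


def ensure_lat_greater_than(lat, min_lat=None):
    seen.append(min_lat)
    if min_lat is not None:
        assert lat is None or lat > min_lat


def ensure_not_none(lat):
    assert lat is not None


# both validators get the same keyword arguments; only the first accepts min_lat
for func in (ensure_lat_greater_than, ensure_not_none):
    call_with_matching_kwargs(func, 45.0, min_lat=39)
```

Without the filtering step, calling `ensure_not_none(45.0, min_lat=39)` would raise a TypeError, which is why the arguments have to be routed per validator.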

[6]:
@validator(namespace, ev.Origin)
def ensure_lat_greater_than(origin, min_lat=None):
    if min_lat is not None:
        print(f"min latitude is {min_lat}")
        assert origin.latitude is None or origin.latitude > min_lat
[7]:
_ = validate(cat, namespace, min_lat=39, report=True)
min latitude is 39
min latitude is 39
min latitude is 39