Validators
ObsPlus provides a simple way to declare and enforce assumptions about data. You can think of it as pytest for data validation: the checks are enforced at runtime rather than in a test suite. The implementation is specifically geared towards nested tree structures (like obspy’s Catalog object), but it works for any type of object.
Warning: This is a fairly advanced feature of ObsPlus intended primarily for library authors and users with stringent data requirements. The built-in validators will meet most people’s needs.
Warning: In the future we may move much of this functionality to ObsPy as described in this proposal, but an appropriate deprecation cycle will be implemented.
Built-in Validators
ObsPlus comes with a few built-in validators. See the catalog validation page for more details.
Custom Validators
The example below creates custom validators to ensure that each event in a group has at least four picks and that each origin has latitude and longitude defined. The namespace "_silly_test" lets ObsPlus know these validators should be grouped together.
[1]:
import obspy
import obspy.core.event as ev
from obsplus.utils.validate import validate, validator
namespace = "_silly_test"
[2]:
# We simply have to decorate a callable with the `validator` decorator
# and specify the class it is to act on and its namespace
@validator(namespace, ev.Event)
def ensure_events_have_four_picks(event):
    picks = event.picks
    assert len(picks) >= 4


@validator(namespace, ev.Origin)
def ensure_origin_have_lat_lon(origin):
    assert origin.latitude is not None
    assert origin.longitude is not None
An event is modified so that it violates both conditions, and then the validate function is run. It should raise an AssertionError.
[3]:
cat = obspy.read_events()
cat[0].picks = []
for origin in cat[0].origins:
    origin.latitude = None
    origin.longitude = None
[4]:
try:
    validate(cat, namespace)
except AssertionError:
    pass
A report of the results, in the form of a dataframe, can also be created. This provides a way to identify problems with the data without halting the execution of the code.
[5]:
report = validate(cat, namespace, report=True)
report
[5]:
| | validator | object | message | passed |
|---|---|---|---|---|
| 0 | ensure_events_have_four_picks | [resource_id, event_type, event_type_certainty... | validator ensure_events_have_four_picks failed... | False |
| 1 | ensure_events_have_four_picks | [resource_id, event_type, event_type_certainty... | validator ensure_events_have_four_picks failed... | False |
| 2 | ensure_events_have_four_picks | [resource_id, event_type, event_type_certainty... | validator ensure_events_have_four_picks failed... | False |
| 3 | ensure_origin_have_lat_lon | [resource_id, time, time_errors, longitude, lo... | validator ensure_origin_have_lat_lon failed ob... | False |
| 4 | ensure_origin_have_lat_lon | [resource_id, time, time_errors, longitude, lo... | | True |
| 5 | ensure_origin_have_lat_lon | [resource_id, time, time_errors, longitude, lo... | | True |
Notice how the object column is a reference to the Python object that the validator ran on. This makes it very quick to find (and fix) problematic data.
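As a rough illustration of using such a report, the sketch below filters a dataframe down to the failing rows and pulls out the offending objects. The frame is a hand-built stand-in that only mimics the columns shown above (`validator`, `object`, `message`, `passed`); in practice it would come from `validate(cat, namespace, report=True)`, and the placeholder `object()` instances would be real events and origins.

```python
import pandas as pd

# Hand-built stand-in for a validation report (assumed column layout).
report = pd.DataFrame(
    {
        "validator": [
            "ensure_events_have_four_picks",
            "ensure_origin_have_lat_lon",
            "ensure_origin_have_lat_lon",
        ],
        "object": [object(), object(), object()],
        "message": [
            "validator ensure_events_have_four_picks failed",
            "validator ensure_origin_have_lat_lon failed",
            "",
        ],
        "passed": [False, False, True],
    }
)

# Keep only the rows where a validator failed.
failures = report[~report["passed"]]

# The object column holds references, so these can be inspected or fixed directly.
failed_objects = list(failures["object"])

# Count failures per validator to see which assumption breaks most often.
failure_counts = failures.groupby("validator").size().to_dict()
```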
Validators with optional arguments
Validators that take optional arguments (in the form of keyword arguments) can also be created. The validate function then knows how to distribute these values to the appropriate validators.
[6]:
@validator(namespace, ev.Origin)
def ensure_lat_greater_than(origin, min_lat=None):
    if min_lat is not None:
        assert origin.latitude is None or origin.latitude > min_lat
[7]:
_ = validate(cat, namespace, min_lat=39, report=True)
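One plausible way to distribute keyword arguments to only the validators that accept them is to inspect each callable's signature. The sketch below shows that idea with the standard library; it is an assumption about how such dispatch could be done, not a description of ObsPlus internals. `call_with_supported_kwargs` and `ensure_has_time` are hypothetical, and origins are represented as plain dicts to keep the example self-contained.

```python
import inspect


def call_with_supported_kwargs(func, obj, **kwargs):
    """Call func(obj, ...) passing only the keyword arguments it declares."""
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_any:
        supported = kwargs
    else:
        supported = {k: v for k, v in kwargs.items() if k in params}
    return func(obj, **supported)


def ensure_lat_greater_than(origin, min_lat=None):
    # Only enforced when the caller supplies min_lat.
    if min_lat is not None:
        assert origin["latitude"] is None or origin["latitude"] > min_lat


def ensure_has_time(origin):
    # Takes no optional arguments, so min_lat must not be passed to it.
    assert origin["time"] is not None


origin = {"latitude": 41.0, "time": "2012-04-04T14:21:42"}
for check in (ensure_lat_greater_than, ensure_has_time):
    call_with_supported_kwargs(check, origin, min_lat=39)
```

With this approach a validator that does not declare `min_lat` is simply called without it, so adding a new optional argument does not break existing validators.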