Working with Catalogs

ObsPy’s event representation is based on the FDSN QuakeML standard, which is very comprehensive and arguably the best standard available. However, the Catalog object (and friends) can be difficult to work with for a few reasons:

1. Often the desired data is deeply nested and hard to aggregate

2. Identifying data relations depends on the complex behavior of ObsPy's `ResourceIdentifier`

3. Preferred objects (e.g. origin, magnitude, etc.) are often not set

ObsPlus tries to solve all of these problems. The first is addressed by the DataFrame Extractor and other tree traversal tools. The second and third are addressed by a collection of catalog validators.

Catalog Navigation

If you only need to extract information contained in a catalog, the events_to_pandas section has various examples of creating dataframes from different parts of the catalog.

If the tree structure needs to be maintained, the yield_obj_parent_attr function can be very useful. For example, let’s assume we want to sabotage a seismic analyst by adding noise to their pick times. We could do that like so:

[1]:
import numpy as np
import obspy
import obspy.core.event as ev
import obsplus
from obsplus.utils import yield_obj_parent_attr

cat = obsplus.load_dataset('crandall_test').event_client.get_events()
[2]:
# iterate picks, add noise to pick time (-1 to 1 seconds, uniform dist.)
for pick, parent, attr in yield_obj_parent_attr(cat, cls=ev.Pick):
    pick.time = pick.time + (np.random.random() - 0.5) * 2.0

Or, for a less malicious example, perhaps we want to count all the ResourceIdentifier instances and ensure they have some minimum length. If they don’t, we want to regenerate them.

[3]:
count = 0
replaced = 0

for rid, parent, attr in yield_obj_parent_attr(cat, cls=ev.ResourceIdentifier):
    # increment counter
    count += 1
    # resource ids longer than 20 characters are fine, so skip them
    if len(str(rid)) > 20:
        continue
    # else create a new resource_id and bind it to the parent
    new_rid = ev.ResourceIdentifier(referred_object=parent)
    setattr(parent, attr, new_rid)
    replaced += 1


print(f"There are {count} resource ids in the catalog. {replaced} were replaced.")
There are 2807 resource ids in the catalog. 1407 were replaced.
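
To make the (object, parent, attribute) triples used above more concrete, the following is a minimal sketch of how such a traversal could work. It is not ObsPlus's implementation: the `ToyCatalog`, `ToyEvent`, and `ToyPick` classes and the `walk` function are hypothetical stand-ins built only on the standard library, and the real yield_obj_parent_attr may behave differently in its details.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Iterator, List, Optional, Tuple


@dataclass
class ToyPick:  # hypothetical stand-in for ev.Pick
    time: float


@dataclass
class ToyEvent:  # hypothetical stand-in for ev.Event
    picks: List[ToyPick] = field(default_factory=list)


@dataclass
class ToyCatalog:  # hypothetical stand-in for ev.Catalog
    events: List[ToyEvent] = field(default_factory=list)


def walk(
    obj: Any, cls: type, parent: Any = None, attr: Optional[str] = None
) -> Iterator[Tuple[Any, Any, Optional[str]]]:
    """Depth-first walk yielding (match, parent, attribute name)."""
    if isinstance(obj, cls):
        yield obj, parent, attr
    if isinstance(obj, list):
        # list items keep the parent and attribute name of the list itself
        for item in obj:
            yield from walk(item, cls, parent, attr)
    elif is_dataclass(obj):
        # recurse into every attribute, recording where it lives
        for f in fields(obj):
            yield from walk(getattr(obj, f.name), cls, obj, f.name)


toy_cat = ToyCatalog(events=[ToyEvent(picks=[ToyPick(1.0), ToyPick(2.0)])])
found = list(walk(toy_cat, ToyPick))
times = [pick.time for pick, _, _ in found]
parent_attrs = {attr for _, _, attr in found}
```

Yielding the parent and attribute name alongside each match is what makes in-place replacement possible, as in the `setattr(parent, attr, new_rid)` call in the resource id example.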

Catalog Validation

In addition to being difficult to navigate (point 1), catalogs are non-trivial to validate (points 2-3). ObsPlus’s validate_catalog helps by ensuring all resource_ids point to the correct objects, setting preferred objects, and performing other sanity checks. The default event validation function in ObsPlus is a bit opinionated and was built specifically for the NIOSH style of QuakeML, but you may still find it useful. Additionally, you can create your own validation namespace and define validators for your own data/schema as described by the validators documentation.

Catalog setup

Let’s create a catalog that has the following problems:

  • resource_ids on arrivals no longer point to the correct picks (only possible to break on ObsPy versions <= 1.1.0)

  • no preferred origin/magnitudes are set

ObsPlus will go through and set the resource_ids to point to the correct objects, and set all the preferred_{whatever} to the last element in the {whatever}s list (for whatever in ['magnitude', 'origin', 'focal_mechanism']).
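
The "last element wins" rule for preferred objects can be sketched as follows. This is an illustration only, not the code ObsPlus runs: `set_preferred_to_last` and the `SimpleNamespace` toy event are hypothetical, and leaving an already-set id untouched is an assumption rather than documented ObsPlus behavior.

```python
from types import SimpleNamespace

PREFERRED_KINDS = ("magnitude", "origin", "focal_mechanism")


def set_preferred_to_last(event, kinds=PREFERRED_KINDS):
    """Point each unset preferred_{kind}_id at the last item in {kind}s."""
    for kind in kinds:
        items = getattr(event, f"{kind}s", None) or []
        id_attr = f"preferred_{kind}_id"
        # assumption: an id that is already set is left alone
        if items and getattr(event, id_attr, None) is None:
            setattr(event, id_attr, items[-1].resource_id)
    return event


# hypothetical stand-in for an event with two origins and no magnitudes
toy_event = SimpleNamespace(
    origins=[
        SimpleNamespace(resource_id="origin-1"),
        SimpleNamespace(resource_id="origin-2"),
    ],
    magnitudes=[],
    focal_mechanisms=[],
    preferred_origin_id=None,
    preferred_magnitude_id=None,
    preferred_focal_mechanism_id=None,
)
set_preferred_to_last(toy_event)
```

Empty lists are skipped, so an event without magnitudes simply keeps `preferred_magnitude_id` unset.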

[4]:

# create catalog 1
def create_cat1():
    """ a catalog with an arrival that doesn't refer to any pick """
    time = obspy.UTCDateTime('2017-09-22T08:35:00')
    wid = ev.WaveformStreamID(network_code='UU', station_code='TMU',
                              location_code='', channel_code='HHZ')
    pick = ev.Pick(time=time, phase_hint='P', waveform_id=wid)
    arrival = ev.Arrival(pick_id=pick.resource_id, waveform_id=wid)
    origin = ev.Origin(time=time, arrivals=[arrival], latitude=45.5,
                       longitude=-111.1)
    description = ev.EventDescription(create_cat1.__doc__)
    event = ev.Event(origins=[origin], picks=[pick],
                     event_descriptions=[description])
    cat = ev.Catalog(events=[event])
    # create a copy of the catalog. In older versions this would screw up
    # the resource ids, but the issue seems to be fixed now.
    cat.copy()
    return cat


cat = create_cat1()
event = cat[0]
[5]:
arrival = event.origins[-1].arrivals[-1]
pick = event.picks[-1]

Validate

These two problems can be fixed in place with the validate_catalog function:

[6]:
obsplus.validate_catalog(cat)
[6]:
1 Event(s) in Catalog:
2017-09-22T08:35:00.000000Z | +45.500, -111.100
[7]:
print(event.preferred_origin())
arrival = event.origins[0].arrivals[0]
# now we will get the correct pick through the arrival object, even on older versions of obspy
print(arrival.pick_id.get_referred_object())
Origin
         resource_id: ResourceIdentifier(id="smi:local/0a5ed84d-731b-4c8c-8fc0-ac94935647d3")
                time: UTCDateTime(2017, 9, 22, 8, 35)
           longitude: -111.1
            latitude: 45.5
                ---------
            arrivals: 1 Elements
Pick
         resource_id: ResourceIdentifier(id="smi:local/def478c2-847d-4636-80ba-5aa8c1a890e1")
                time: UTCDateTime(2017, 9, 22, 8, 35)
         waveform_id: WaveformStreamID(network_code='UU', station_code='TMU', channel_code='HHZ', location_code='')
          phase_hint: 'P'

Fail fast

For issues that ObsPlus doesn’t know how to fix, an AssertionError will be raised. If you are generating or downloading catalogs, it may be useful to run them through the validation function right away so that you learn of any issue before attempting meaningful analysis.

For example, an arrival that doesn’t refer to any known pick could be a quality issue that you might like to know about.

[8]:
# create a problem with the catalog
old_pick_id = cat[0].origins[0].arrivals[0].pick_id
cat[0].origins[0].arrivals[0].pick_id = None

try:
    obsplus.validate_catalog(cat)
except AssertionError:
    print('something is wrong with this catalog')

# undo the problem
cat[0].origins[0].arrivals[0].pick_id = old_pick_id
something is wrong with this catalog
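
As a rough illustration of the kind of check that fails fast here, the sketch below asserts that every arrival refers to a pick present on the event. It uses hypothetical `SimpleNamespace` stand-ins and a hypothetical `check_arrival_pick_ids` function; it is not the check ObsPlus ships, only an example of the assert-based style.

```python
from types import SimpleNamespace


def check_arrival_pick_ids(event):
    """Raise AssertionError if any arrival refers to an unknown pick."""
    known = {pick.resource_id for pick in event.picks}
    for origin in event.origins:
        for arrival in origin.arrivals:
            assert arrival.pick_id in known, (
                f"arrival refers to unknown pick: {arrival.pick_id!r}"
            )


def make_event(pick_id):
    """Build a toy event with one pick and one arrival (hypothetical)."""
    pick = SimpleNamespace(resource_id="pick-1")
    arrival = SimpleNamespace(pick_id=pick_id)
    origin = SimpleNamespace(arrivals=[arrival])
    return SimpleNamespace(picks=[pick], origins=[origin])


good_event = make_event("pick-1")
bad_event = make_event(None)

# the consistent event passes silently
check_arrival_pick_ids(good_event)

# the broken event is reported immediately
try:
    check_arrival_pick_ids(bad_event)
    failed = False
except AssertionError:
    failed = True
```

Because the check raises rather than logging, a broken catalog stops the workflow at load time instead of surfacing later as a confusing analysis result.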

Adding custom validators

See the validators section to learn how to create your own validators. The following example shows how to use a subset of ObsPlus validators.

[9]:
# import the validators that are desired
import obspy.core.event as ev
from obsplus.utils.validate import validator, validate
from obsplus.events.validate import (
    attach_all_resource_ids,
    check_arrivals_pick_id,
    check_duplicate_picks,
)

# create new validator namespace
namespace = '_new_test'
validator(namespace, ev.Event)(attach_all_resource_ids)
validator(namespace, ev.Event)(check_arrivals_pick_id)
validator(namespace, ev.Event)(check_duplicate_picks)

# run the new validator
validate(cat, namespace)
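
One way to picture a validator namespace is as a registry keyed by namespace and class. The sketch below is a simplified assumption about the pattern, not ObsPlus's implementation: `toy_validator` and `toy_validate` are hypothetical names, and unlike the real `validate` this version only checks the object passed in rather than searching the whole event tree.

```python
from collections import defaultdict

# namespace -> class -> list of registered check functions
_REGISTRY = defaultdict(lambda: defaultdict(list))


def toy_validator(namespace, cls):
    """Return a decorator that registers a check for cls in namespace."""
    def _register(func):
        _REGISTRY[namespace][cls].append(func)
        return func
    return _register


def toy_validate(obj, namespace):
    """Run every check registered in namespace that applies to obj."""
    for cls, funcs in _REGISTRY[namespace].items():
        if isinstance(obj, cls):
            for func in funcs:
                func(obj)
    return obj


calls = []


@toy_validator("demo", dict)
def has_id(record):
    """A toy check: every record needs an 'id' key."""
    assert "id" in record, "missing id"
    calls.append("has_id")


toy_validate({"id": 1}, "demo")
```

Calling `toy_validator(namespace, cls)(func)` directly, as the ObsPlus example does with its own `validator`, is equivalent to using the decorator syntax.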
