obsplus.structures.dfextractor module

DataFrameExtractor class and friends.

class obsplus.structures.dfextractor.DataFrameExtractor(cls, required_columns=None, dtypes=None, pass_dataframe=True, column_funcs=None)[source]

Bases: UserDict

A class to extract dataframes from nested object trees.

Generally used to construct summary dataframes from nested object structures such as the obspy Catalog.

Parameters:
  • cls – The top-level class the extractor acts on.

  • required_columns (Optional[Sequence[str]]) – If not None, assert that the required columns are in the dataframe and order the columns to match required_columns, with any extra columns at the end.

  • dtypes – A dict of {column name: required data type}. Can also be specified when registering extractors.

  • pass_dataframe – If True, return dataframes passed to DataFrameExtractor.__call__. This allows the DataFrameExtractor to be idempotent.

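  • column_funcs (Optional[Mapping[str, Callable[[Series], Union[Series, ndarray]]]]) – A mapping of {column name: function}. Each function receives the column as a Series and returns a Series or ndarray. Useful, for example, for columns of UTCDateTime objects, so that UTCDateTime-able objects (like date-time strings, floats, etc.) are handled correctly.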
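
The column ordering described for required_columns can be illustrated with a small, library-independent sketch. It is not the obsplus implementation: the helper name order_columns is hypothetical, it works on plain lists of column names rather than a dataframe, and raising KeyError for a missing column is an assumption made for the example.

```python
from typing import List, Optional, Sequence


def order_columns(
    columns: Sequence[str],
    required_columns: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return required columns first, in their given order, then the extras.

    Hypothetical helper for illustration only; not part of obsplus.
    """
    if required_columns is None:
        # No schema requested: keep the columns as they are.
        return list(columns)
    # Check that every required column is present.
    missing = [c for c in required_columns if c not in columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    # Extra columns keep their original relative order and go last.
    extras = [c for c in columns if c not in required_columns]
    return list(required_columns) + extras


ordered = order_columns(
    ["depth", "time", "magnitude", "latitude"],
    required_columns=["time", "latitude"],
)
```

Here ordered starts with the two required columns, followed by depth and magnitude in their original order.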

exception SkipRow[source]

Bases: StopIteration

Exception to raise to skip a row.
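
A minimal sketch of the skip-a-row pattern follows. It uses a stand-in SkipRow class and a hypothetical build_rows loop over plain dicts; how obsplus catches the exception internally is not shown in this page, so the loop is an assumption.

```python
from typing import Dict, List, Optional


class SkipRow(StopIteration):
    """Stand-in for DataFrameExtractor.SkipRow, for illustration only."""


def extract_magnitude(event: Dict[str, Optional[float]]) -> float:
    """Return the magnitude, or signal that this row should be dropped."""
    magnitude = event.get("magnitude")
    if magnitude is None:
        raise SkipRow
    return magnitude


def build_rows(events: List[Dict[str, Optional[float]]]) -> List[Dict[str, float]]:
    """Hypothetical row-building loop that omits rows raising SkipRow."""
    rows = []
    for event in events:
        try:
            rows.append({"magnitude": extract_magnitude(event)})
        except SkipRow:
            # The extractor asked for this row to be left out.
            continue
    return rows


rows = build_rows([{"magnitude": 2.1}, {"magnitude": None}, {"magnitude": 3.4}])
```

The event without a magnitude produces no row, so rows holds two entries.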

copy()[source]

Return a deep copy of the extractor.

Return type:

DataFrameExtractor

property dtypes

Return a dictionary of datatypes.

extractor(dtypes=None)[source]

Register an extractor.

An extractor is a function which extracts values from instances of a class. It should return either a dict of {column names: values} or a single value. In the latter case, the name of the function (minus the get_ prefix, if one exists) is used as the column name.

Parameters:

dtypes (Optional[Dict[str, type]]) – A dict of {column name: dtype} to enforce a schema on the data.
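
The naming rule for extractors can be sketched with a toy registry. The ToyExtractor class and its row method are hypothetical and operate on plain dicts; they only mirror the behaviour described above (dict results are merged, single values are keyed by the function name minus a get_ prefix) and ignore dtypes.

```python
from typing import Any, Callable, Dict, List


class ToyExtractor:
    """Toy registry illustrating extractor naming; not the obsplus class."""

    def __init__(self) -> None:
        self.funcs: List[Callable[[Any], Any]] = []

    def extractor(self, func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        """Register func and return it unchanged so it works as a decorator."""
        self.funcs.append(func)
        return func

    def row(self, obj: Any) -> Dict[str, Any]:
        """Build one row by calling every registered extractor on obj."""
        out: Dict[str, Any] = {}
        for func in self.funcs:
            value = func(obj)
            if isinstance(value, dict):
                # A dict supplies its own column names.
                out.update(value)
            else:
                # A single value is named after the function.
                name = func.__name__
                if name.startswith("get_"):
                    name = name[len("get_"):]
                out[name] = value
        return out


toy = ToyExtractor()


@toy.extractor
def get_station(obj):
    return obj["code"]


@toy.extractor
def get_coordinates(obj):
    return {"latitude": obj["lat"], "longitude": obj["lon"]}


row = toy.row({"code": "ABC", "lat": 40.0, "lon": -111.5})
```

The single-value extractor get_station yields a station column, while get_coordinates contributes the latitude and longitude columns named in its dict.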

nslc = {'channel', 'location', 'network', 'seed_id', 'station'}

register(cls)[source]

Registers an alternate constructor.

Registers an alternate constructor that is called when the input is not an instance of the expected class. This is useful, for example, to make the DataFrameExtractor idempotent or to default to various read methods if a path/str is passed.

Parameters:

cls – The data type to register.
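
The idea of an alternate constructor can be sketched as a type-based dispatch. Everything here is hypothetical: ToyConverter is not the obsplus class, the decorator-style use of register is an assumption, and the expected class is simply list so the example needs no third-party packages.

```python
from typing import Any, Callable, Dict


class ToyConverter:
    """Toy dispatcher illustrating alternate constructors; not obsplus code."""

    def __init__(self, cls: type) -> None:
        self.cls = cls
        self.constructors: Dict[type, Callable[[Any], Any]] = {}

    def register(self, input_type: type) -> Callable:
        """Return a decorator that registers a constructor for input_type."""

        def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self.constructors[input_type] = func
            return func

        return decorator

    def __call__(self, obj: Any) -> Any:
        # Instances of the expected class pass straight through.
        if isinstance(obj, self.cls):
            return obj
        # Otherwise fall back to the first matching alternate constructor.
        for input_type, func in self.constructors.items():
            if isinstance(obj, input_type):
                return func(obj)
        raise TypeError(f"no constructor registered for {type(obj).__name__}")


to_records = ToyConverter(list)


@to_records.register(str)
def _from_string(text):
    # Stand-in for "read from a path/str": split a comma-separated string.
    return [part.strip() for part in text.split(",")]


records = to_records("a, b, c")
same = to_records(["a", "b"])
```

A string input is routed through the registered constructor, while a list is returned as-is because it already has the expected type.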