obsplus.utils.pd module

Generic Utilities for Pandas

obsplus.utils.pd.apply_funcs_to_columns(df, funcs, inplace=False)

Apply callables to columns.

Parameters:

df (DataFrame) – The input dataframe.
funcs (Optional[Mapping[str, Callable[[Series], Union[Series, ndarray]]]]) – A mapping of {column_name: function_to_apply}.
inplace (bool) – If True, perform operation in place.

Returns:

A dataframe in which each listed column is replaced by the output of its function.
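The behaviour described above can be approximated in plain pandas. This is a minimal sketch, not the obsplus implementation: the helper name apply_funcs_sketch is hypothetical, and skipping columns that are absent from the dataframe is an assumption.

```python
import pandas as pd


def apply_funcs_sketch(df, funcs, inplace=False):
    """Hypothetical stand-in: apply each callable to the column it is keyed on."""
    out = df if inplace else df.copy()
    for name, func in (funcs or {}).items():
        # Assumption: columns missing from the dataframe are ignored.
        if name in out.columns:
            out[name] = func(out[name])
    return out


df = pd.DataFrame({"station": ["abc", "def"], "depth": [1.0, 2.0]})
funcs = {
    "station": lambda s: s.str.upper(),  # Series in, Series out
    "depth": lambda s: s * 1000,  # e.g. km to m
}
result = apply_funcs_sketch(df, funcs)
```

With inplace=False the input dataframe is left untouched and only the returned copy carries the transformed columns.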

obsplus.utils.pd.cast_dtypes(df, dtype=None, inplace=False)

Cast data types for columns in a dataframe, skipping columns that don’t exist.

The following obsplus-specific datatypes are supported:

‘ops_datetime’ – call obsplus.utils.time.to_datetime64() on the column
‘ops_timedelta’ – call obsplus.utils.time.to_timedelta64() on the column

Notes

This function differs from pandas’ DataFrame.astype because it skips columns which don’t exist and handles custom obsplus dtypes.

Parameters:

df (DataFrame) – The input dataframe.
dtype (Optional[Mapping[str, Union[type, str]]]) – A dict of {column_name: datatype}.
inplace – If True, perform operation in place.

Return type:

DataFrame
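The difference from DataFrame.astype noted above can be illustrated with plain pandas. This is a sketch of the "skip missing columns" idea only; it does not cover the custom ‘ops_datetime’ and ‘ops_timedelta’ dtypes, and the column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"latitude": ["40.5", "41.0"], "station": ["A", "B"]})

# "elevation" is not a column of df; passing this mapping straight to
# df.astype would raise a KeyError.
wanted = {"latitude": float, "elevation": float}

# Keep only the entries whose column actually exists, then cast.
present = {col: dt for col, dt in wanted.items() if col in df.columns}
out = df.astype(present)
```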

obsplus.utils.pd.convert_bytestrings(df, columns, inplace=False)

Convert byte string columns to strings.

This removes the ‘b’ prefix and the quotation marks from string columns. For some reason encode doesn’t work on data returned from hdf5, so this approach is a bit hacky.

Parameters:

df – The input dataframe.
columns – The names of the columns to convert to string types.
inplace – If True, perform operation in place.
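As an illustration of the clean-up described above, the following sketch strips the ‘b’ prefix and surrounding quotes with a regular expression. It assumes the values arrive as text of the form b'ABC'; the exact form of data read from hdf5 may differ, and this is not necessarily how obsplus does it.

```python
import pandas as pd

# Assumed input: strings that look like the repr of a bytes object.
df = pd.DataFrame({"station": ["b'ABC'", "b'DEF'"]})

# Drop the leading b' and the trailing ' and keep what is in between.
cleaned = df["station"].str.replace(r"^b'(.*)'$", r"\1", regex=True)
```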

obsplus.utils.pd.filter_df(df, **kwargs)

Determine if each row of the dataframe meets some filter requirements.

Parameters:

df (DataFrame) – The input dataframe.
kwargs – Any condition to check against columns of df. Can be a single value or a collection of values (checked with isin on the column). String arguments can also use Unix-style matching.

Return type:

array

Returns:

A boolean array of the same length as df indicating if each row meets the requirements.
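The two kinds of condition described above (a collection checked with isin, and a string using Unix-style matching) can be sketched with pandas and the standard-library fnmatch module. This is an approximation of the documented behaviour, not the obsplus code, and the sample data is invented.

```python
import fnmatch

import pandas as pd

df = pd.DataFrame(
    {
        "station": ["ABC", "ABD", "XYZ"],
        "channel": ["HHZ", "BHZ", "HHZ"],
    }
)

# A collection of values: membership test on the column.
mask_isin = df["channel"].isin(["HHZ"]).to_numpy()

# A string with a Unix-style wildcard: "?" matches exactly one character.
mask_glob = (
    df["station"].map(lambda s: fnmatch.fnmatchcase(s, "AB?")).astype(bool).to_numpy()
)

# A row passes only if it meets every condition.
mask = mask_isin & mask_glob
```

The resulting boolean array can be used directly to select rows, for example df[mask].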

obsplus.utils.pd.filter_index(index, network=None, station=None, location=None, channel=None, starttime=None, endtime=None, **kwargs)

Filter a waveform index dataframe based on nslc codes and start/end times.

Parameters:

index – A dataframe to filter which should have the corresponding columns to any non-None parameters used in filter.
network – A network code as defined by the SEED standard.
station – A station code as defined by the SEED standard.
location – A location code as defined by the SEED standard.
channel – A channel code as defined by the SEED standard.
starttime – The starttime of interest.
endtime – The endtime of interest.
kwargs – Additional kwargs are used as filters.

Returns:

A numpy array of boolean values indicating if each row met the filter requirements.

obsplus.utils.pd.get_regex(seed_str)

Compile and cache a regex for str queries.
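The compile-and-cache pattern can be sketched with the standard library alone. The function name cached_regex is hypothetical, and translating Unix-style wildcards with fnmatch.translate is an assumption about how a query string might be turned into a regex; obsplus may do this differently.

```python
import fnmatch
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_regex(pattern: str) -> "re.Pattern":
    """Hypothetical helper: translate a wildcard pattern and compile it once."""
    return re.compile(fnmatch.translate(pattern))


rx = cached_regex("BH?")
# A second call with the same pattern returns the cached object.
rx_again = cached_regex("BH?")
```

Caching pays off when the same handful of query strings is applied to many dataframes, since each pattern is compiled only on first use.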

obsplus.utils.pd.get_seed_id_series(df, null_codes=(None, '--', 'None', 'nan', 'null', nan), subset=None)

Create a series of seed_ids from a dataframe with required columns.

The seed id series contains strings of the form: network.station.location.channel

Any “nullish” values (defined by the parameter null_codes) will be replaced with an empty string.

Parameters:

df (DataFrame) – Any dataframe that has columns with str dtype named: network, station, location, channel.
null_codes (Optional[Any]) – Codes which should be replaced with a blank string.
subset (Optional[Sequence[str]]) – Used to select a subset of the full seed_id. For example, (‘network’, ‘station’) would return a series of network.station.

Returns:

A series of concatenated seed_id codes.

Examples

>>> import obsplus
>>> import obspy
>>> from obsplus.utils.pd import get_seed_id_series
>>> # Get a dataframe with only network station location channel columns
>>> inv = obspy.read_inventory()
>>> NSLC = ['network', 'station', 'location', 'channel']
>>> df = obsplus.stations_to_df(inv)[NSLC]
>>> # Get a series of full seed ids
>>> out = get_seed_id_series(df)
>>> # Get a series of network.station
>>> net_sta = get_seed_id_series(df, subset=('network', 'station'))

obsplus.utils.pd.get_waveforms_bulk_args(df, time_dtype='utcdatetime')

Get the inputs for get_waveforms_bulk from a dataframe.

Parameters:

df (DataFrame) – A dataframe with required columns: network, station, location, channel, starttime, endtime.
time_dtype (str) – Dtype to use for the starttime and endtime.

Returns:

A list of tuples [(network, station, location, channel, starttime, endtime),]
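The shape of the returned list can be sketched in plain pandas. This approximation keeps the times as pandas timestamps; the documented function converts them according to time_dtype, which defaults to 'utcdatetime'. The sample row is invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "network": ["UU"],
        "station": ["TMU"],
        "location": ["01"],
        "channel": ["HHZ"],
        "starttime": [pd.Timestamp("2020-01-01T00:00:00")],
        "endtime": [pd.Timestamp("2020-01-01T00:01:00")],
    }
)

cols = ["network", "station", "location", "channel", "starttime", "endtime"]

# One plain tuple per row, in the column order a bulk request expects.
bulk = list(df[cols].itertuples(index=False, name=None))
```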

obsplus.utils.pd.join_str_columns(df, columns, join_char='.')

Join string columns on a dataframe together.

Parameters:

df (DataFrame) – The input dataframe with columns listed in columns parameter.
columns (Sequence[str]) – The columns to be joined. Must be part of df.
join_char (str) – The string to join the columns together.

Return type:

Series
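For comparison, the same kind of join can be written with pandas’ own string accessor. This is a sketch of the operation described above, not the obsplus implementation, and the sample data is invented.

```python
import pandas as pd

df = pd.DataFrame({"network": ["UU", "TA"], "station": ["TMU", "M11A"]})
columns = ["network", "station"]

# Concatenate the first column with the remaining ones, separated by ".".
first, rest = columns[0], columns[1:]
joined = df[first].str.cat([df[col] for col in rest], sep=".")
```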

obsplus.utils.pd.order_columns(df, required_columns, drop_columns=False, fill_missing=True)

Order a dataframe’s columns and ensure it has required columns.

Parameters:

df (DataFrame) – The input dataframe.
required_columns (Sequence) – A sequence that contains the required column names, in the desired order.
drop_columns – If True, drop columns not in required_columns.
fill_missing – If True, create missing required columns and fill with nullish values.

Return type:

DataFrame
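The ordering and filling behaviour can be approximated with DataFrame.reindex. This is a sketch under one assumption: that columns not listed in required_columns are kept after the required ones when drop_columns is False. The column names are invented.

```python
import pandas as pd

df = pd.DataFrame({"b": [1], "a": [2], "extra": [3]})
required = ["a", "b", "c"]  # "c" is missing from df

# Keep the extra columns (assumed to follow the required ones);
# the missing column "c" is created and filled with NaN.
extras = [col for col in df.columns if col not in required]
ordered = df.reindex(columns=required + extras)

# Equivalent of dropping columns that are not required.
dropped = df.reindex(columns=required)
```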

obsplus.utils.pd.replace_or_swallow(df, replace)

Replace values in a dataframe with new values.

Parameters:

df (DataFrame) – The dataframe for which the values will be replaced.
replace (dict) – A dict of {old_value: new_value}.

Return type:

DataFrame
