obsplus.utils.pd module
Generic Utilities for Pandas
- obsplus.utils.pd.apply_funcs_to_columns(df, funcs, inplace=False)
Apply callables to columns.
- Parameters:
df (DataFrame) – The input dataframe.
funcs (Optional[Mapping[str, Callable[[Series], Union[Series, ndarray]]]]) – A mapping of {column_name: function_to_apply}.
inplace (bool) – If True, perform operation in place.
- Returns:
A new dataframe with the columns replaced by the output of the functions.
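The mapping-based interface described above can be illustrated with a pandas-only sketch. The helper name apply_funcs_sketch is hypothetical and is not obsplus code; it only approximates the documented behavior of replacing each listed column with the output of its callable.

```python
import pandas as pd


def apply_funcs_sketch(df, funcs=None, inplace=False):
    """Hypothetical stand-in: replace each column with func(column)."""
    out = df if inplace else df.copy()
    for column, func in (funcs or {}).items():
        out[column] = func(out[column])
    return out


df = pd.DataFrame({"station": ["abc", "def"], "depth": [1000.0, 2500.0]})
funcs = {
    "station": lambda s: s.str.upper(),  # upper-case the station codes
    "depth": lambda s: s / 1000.0,  # convert meters to kilometers
}
result = apply_funcs_sketch(df, funcs)
```

Because inplace defaults to False in this sketch, the input dataframe is left unchanged.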
- obsplus.utils.pd.cast_dtypes(df, dtype=None, inplace=False)
Cast data types for columns in a dataframe, skipping columns that don’t exist.
- The following obsplus-specific datatypes are supported:
‘ops_datetime’ - call obsplus.utils.time.to_datetime64() on column
‘ops_timedelta’ - call obsplus.utils.time.to_timedelta64() on column
Notes
This function differs from pandas.DataFrame.astype because it skips columns which don’t exist and handles custom obsplus dtypes.
- Parameters:
df (DataFrame) – The input dataframe.
dtype (Optional[Mapping[str, Union[type, str]]]) – A dict of columns and datatypes.
inplace – If True, perform operation in place.
- Return type:
DataFrame
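The two behaviors noted above (skipping missing columns and handling custom dtype names) can be sketched with pandas alone. This is an approximation, not the obsplus implementation: cast_dtypes_sketch is a hypothetical name, pd.to_datetime and pd.to_timedelta stand in for the obsplus time helpers, and treating numeric timedelta values as seconds is an assumption.

```python
import pandas as pd


def cast_dtypes_sketch(df, dtype=None, inplace=False):
    """Hypothetical stand-in for the documented casting behavior."""
    out = df if inplace else df.copy()
    for column, target in (dtype or {}).items():
        if column not in out.columns:
            continue  # unlike DataFrame.astype, missing columns are skipped
        if target == "ops_datetime":
            # Stand-in for obsplus.utils.time.to_datetime64
            out[column] = pd.to_datetime(out[column])
        elif target == "ops_timedelta":
            # Stand-in for obsplus.utils.time.to_timedelta64;
            # assumes numeric values expressed in seconds
            out[column] = pd.to_timedelta(out[column], unit="s")
        else:
            out[column] = out[column].astype(target)
    return out


df = pd.DataFrame(
    {
        "time": ["2020-01-01T00:00:00", "2020-01-02T12:00:00"],
        "duration": [1.5, 60.0],
        "depth": ["1", "2"],
    }
)
spec = {"time": "ops_datetime", "duration": "ops_timedelta", "depth": float, "missing": int}
result = cast_dtypes_sketch(df, spec)
```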
- obsplus.utils.pd.convert_bytestrings(df, columns, inplace=False)
Convert byte-string columns to strings.
This removes the ‘b’ prefix and quotation marks from string columns. For some reason encode doesn’t work on data returned from hdf5, so this approach is a bit hacky.
- Parameters:
df – The input dataframe.
columns – The names of the columns to convert to string types.
inplace – If True, perform operation in place.
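As a rough illustration of the cleanup described above, the following pandas-only sketch strips a leading b and the surrounding quotes from values such as "b'UU'". The helper name strip_bytes_repr is hypothetical, and the assumption that the stored values look like the repr of a bytes object is mine, not something the source states.

```python
import pandas as pd


def _strip_one(value):
    """Remove a leading b and matching quotes, if present."""
    text = str(value)
    has_prefix = len(text) >= 3 and text[0] == "b" and text[1] in "'\""
    if has_prefix and text[-1] == text[1]:
        return text[2:-1]
    return text


def strip_bytes_repr(df, columns, inplace=False):
    """Hypothetical stand-in for the documented conversion."""
    out = df if inplace else df.copy()
    for column in columns:
        out[column] = out[column].map(_strip_one)
    return out


df = pd.DataFrame({"network": ["b'UU'", "b'TA'"], "station": ["b'ABC'", "XYZ"]})
result = strip_bytes_repr(df, ["network", "station"])
```

Values that do not carry the prefix, such as "XYZ" here, pass through unchanged.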
- obsplus.utils.pd.filter_df(df, **kwargs)
Determine if each row of the dataframe meets some filter requirements.
- Parameters:
df (DataFrame) – The input dataframe.
kwargs – Any condition to check against columns of df. Can be a single value or a collection of values (to check isin on columns). Str arguments can also use unix style matching.
- Return type:
array
- Returns:
A boolean array of the same length as df indicating if each row meets the requirements.
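The three kinds of condition described above (single value, collection, unix-style string pattern) can be sketched as follows. This is a pandas-only approximation with a hypothetical name, filter_df_sketch, and it uses the standard-library fnmatch module for the pattern matching; whether obsplus matches patterns the same way is an assumption.

```python
import fnmatch

import numpy as np
import pandas as pd


def filter_df_sketch(df, **kwargs):
    """Hypothetical stand-in: AND together one condition per column."""
    mask = np.ones(len(df), dtype=bool)
    for column, condition in kwargs.items():
        values = df[column]
        if isinstance(condition, str):
            # Strings are treated as unix-style patterns (*, ?, [abc])
            matched = [fnmatch.fnmatchcase(str(v), condition) for v in values]
            mask &= np.array(matched, dtype=bool)
        elif isinstance(condition, (list, tuple, set)):
            # Collections are checked with isin
            mask &= values.isin(list(condition)).to_numpy(dtype=bool)
        else:
            # Anything else is compared for equality
            mask &= (values == condition).to_numpy(dtype=bool)
    return mask


df = pd.DataFrame(
    {"station": ["ABC", "ABD", "XYZ"], "channel": ["HHZ", "HHN", "BHZ"]}
)
mask = filter_df_sketch(df, station="AB*", channel=["HHZ", "BHZ"])
selected = df[mask]
```

The returned mask can be used directly for boolean indexing, as in the last line.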
- obsplus.utils.pd.filter_index(index, network=None, station=None, location=None, channel=None, starttime=None, endtime=None, **kwargs)
Filter a waveform index dataframe based on nslc codes and start/end times.
- Parameters:
index – A dataframe to filter which should have the corresponding columns to any non-None parameters used in filter.
network – A network code as defined by seed standards.
station – A station code as defined by seed standards.
location – A location code as defined by seed standards.
channel – A channel code as defined by seed standards.
starttime – The starttime of interest.
endtime – The endtime of interest.
kwargs – Additional kwargs are used as filters.
- Returns:
A numpy array of boolean values indicating if each row met the filter requirements.
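A minimal sketch of combining code filters with a time window is given here. The name filter_index_sketch is hypothetical, and the rule that a row is kept when its time span overlaps the requested window is an assumption the source does not confirm.

```python
import fnmatch

import numpy as np
import pandas as pd


def filter_index_sketch(index, starttime=None, endtime=None, **codes):
    """Hypothetical stand-in: filter on code patterns and time overlap."""
    mask = np.ones(len(index), dtype=bool)
    for column, pattern in codes.items():
        if pattern is None:
            continue  # None means "do not filter on this column"
        matched = [fnmatch.fnmatchcase(str(v), pattern) for v in index[column]]
        mask &= np.array(matched, dtype=bool)
    # Assumed semantics: keep rows whose [starttime, endtime] overlaps the window
    if starttime is not None:
        mask &= (index["endtime"] >= pd.Timestamp(starttime)).to_numpy(dtype=bool)
    if endtime is not None:
        mask &= (index["starttime"] <= pd.Timestamp(endtime)).to_numpy(dtype=bool)
    return mask


index = pd.DataFrame(
    {
        "network": ["UU", "UU", "TA"],
        "station": ["ABC", "ABD", "XYZ"],
        "starttime": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"]),
        "endtime": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-02"]),
    }
)
mask = filter_index_sketch(index, network="UU", starttime="2020-01-02T12:00:00")
```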
- obsplus.utils.pd.get_seed_id_series(df, null_codes=(None, '--', 'None', 'nan', 'null', nan), subset=None)
Create a series of seed_ids from a dataframe with required columns.
- The seed id series contains strings of the form:
network.station.location.channel
Any “nullish” values (defined by the parameter null_codes) will be replaced with an empty string.
- Parameters:
df (DataFrame) – Any dataframe that has columns with str dtype named: network, station, location, channel
null_codes (Optional[Any]) – Codes which should be replaced with a blank string.
subset (Optional[Sequence[str]]) – Used to select a subset of the full seed_id. For example, (‘network’, ‘station’) would return a series of network.station.
- Returns:
A series of concatenated seed_id codes.
Examples
>>> import obsplus
>>> import obspy
>>> from obsplus.utils.pd import get_seed_id_series
>>> # Get a dataframe with only network station location channel columns
>>> inv = obspy.read_inventory()
>>> NSLC = ['network', 'station', 'location', 'channel']
>>> df = obsplus.stations_to_df(inv)[NSLC]
>>> out = get_seed_id_series(df)
>>> # Get a series of network.station
>>> net_sta = get_seed_id_series(df, subset=('network', 'station'))
- obsplus.utils.pd.get_waveforms_bulk_args(df, time_dtype='utcdatetime')
Get the inputs to get_waveforms_bulk from a dataframe.
- Parameters:
df (DataFrame) – A dataframe with required columns: network, station, location, channel, starttime, endtime
time_dtype (str) – Dtype to use for the starttime and endtime.
- Returns:
A list of tuples [(network, station, location, channel, starttime, endtime),]
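The shape of the returned list can be illustrated with a pandas-only sketch. The name bulk_args_sketch is hypothetical; it ignores time_dtype and leaves the times as pandas Timestamps, whereas the documented default suggests the real function produces obspy UTCDateTime objects.

```python
import pandas as pd

# Column order expected in each output tuple
BULK_COLUMNS = ["network", "station", "location", "channel", "starttime", "endtime"]


def bulk_args_sketch(df):
    """Hypothetical stand-in: one plain tuple per row, in bulk-request order."""
    return list(df[BULK_COLUMNS].itertuples(index=False, name=None))


df = pd.DataFrame(
    {
        "network": ["UU"],
        "station": ["ABC"],
        "location": [""],
        "channel": ["HHZ"],
        "starttime": pd.to_datetime(["2020-01-01T00:00:00"]),
        "endtime": pd.to_datetime(["2020-01-01T00:10:00"]),
        "extra": [1],  # columns outside the required six are ignored
    }
)
bulk = bulk_args_sketch(df)
```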
- obsplus.utils.pd.join_str_columns(df, columns, join_char='.')
Join string columns on a dataframe together.
- Parameters:
df (DataFrame) – The input dataframe with columns listed in columns parameter.
columns (Sequence[str]) – The columns to be joined. Must be part of df.
join_char (str) – The string to join the columns together.
- Return type:
Series
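The joining step can be sketched with plain pandas string concatenation. The name join_columns_sketch is hypothetical and the code is an illustration of the documented behavior, not the obsplus implementation.

```python
import pandas as pd


def join_columns_sketch(df, columns, join_char="."):
    """Hypothetical stand-in: concatenate string columns row by row."""
    columns = list(columns)
    joined = df[columns[0]].astype(str)
    for column in columns[1:]:
        joined = joined + join_char + df[column].astype(str)
    return joined


df = pd.DataFrame({"network": ["UU", "TA"], "station": ["ABC", "XYZ"]})
net_sta = join_columns_sketch(df, ("network", "station"))
dashed = join_columns_sketch(df, ["network", "station"], join_char="-")
```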
- obsplus.utils.pd.order_columns(df, required_columns, drop_columns=False, fill_missing=True)
Order a dataframe’s columns and ensure it has required columns.
- Parameters:
df (DataFrame) – The input dataframe.
required_columns (Sequence) – A sequence that contains the column names.
drop_columns – If True drop columns not in required_columns.
fill_missing – If True, create missing required columns and fill with nullish values.
- Return type:
pd.DataFrame
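How the two flags interact can be shown with a pandas-only sketch. The name order_columns_sketch is hypothetical, and two details are assumptions rather than documented behavior: extra columns are placed after the required ones, and missing columns are filled with NaN.

```python
import numpy as np
import pandas as pd


def order_columns_sketch(df, required_columns, drop_columns=False, fill_missing=True):
    """Hypothetical stand-in: required columns first, extras optionally kept."""
    out = df.copy()
    required = list(required_columns)
    if fill_missing:
        for column in required:
            if column not in out.columns:
                out[column] = np.nan  # assumed "nullish" fill value
    present = [c for c in required if c in out.columns]
    extras = [] if drop_columns else [c for c in out.columns if c not in required]
    return out[present + extras]


df = pd.DataFrame({"channel": ["HHZ"], "extra": [1], "network": ["UU"]})
required = ["network", "station", "channel"]
ordered = order_columns_sketch(df, required)
trimmed = order_columns_sketch(df, required, drop_columns=True)
```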