Data Transforms

Source-specific transforms that convert raw vendor data into canonical PIT format.

Utilities

alphaforge.data.transforms.utils

Shared PIT transform utilities.

melt_to_pit_format(df: pd.DataFrame, entity_col: str, obs_date_col: str, asof_col: str, value_vars: Sequence[str], key_prefix: str, source_name: str) -> pd.DataFrame

Melt a wide DataFrame into PIT observation long format.

Parameters

df : Wide DataFrame with one row per (entity, date).
entity_col : Column containing the entity identifier.
obs_date_col : Column containing the observation date.
asof_col : Column containing the as-of / publication timestamp.
value_vars : Metric columns to melt into long format.
key_prefix : Prefix for series_key (e.g. "cftc.cot.tff.").
source_name : Value for the 'source' column.

Returns

DataFrame with columns: series_key, obs_date, asof_utc, value, source
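The wide-to-long reshaping described above can be sketched with plain pandas. This is a stand-in written for illustration, not the alphaforge implementation: the function name `melt_to_pit_sketch`, the sample data, and in particular the series_key layout (prefix + entity + "." + metric) are assumptions, since the page does not spell out how keys are composed.

```python
from typing import Sequence

import pandas as pd


def melt_to_pit_sketch(
    df: pd.DataFrame,
    entity_col: str,
    obs_date_col: str,
    asof_col: str,
    value_vars: Sequence[str],
    key_prefix: str,
    source_name: str,
) -> pd.DataFrame:
    """Illustrative stand-in for melt_to_pit_format (hypothetical)."""
    # One output row per (entity, date, metric).
    long = df.melt(
        id_vars=[entity_col, obs_date_col, asof_col],
        value_vars=list(value_vars),
        var_name="metric",
        value_name="value",
    )
    # ASSUMPTION: series_key = prefix + entity + "." + metric name.
    long["series_key"] = (
        key_prefix + long[entity_col].astype(str) + "." + long["metric"]
    )
    long = long.rename(columns={obs_date_col: "obs_date", asof_col: "asof_utc"})
    long["source"] = source_name
    return long[["series_key", "obs_date", "asof_utc", "value", "source"]]


# Made-up wide input: one row per (entity, date).
wide = pd.DataFrame(
    {
        "entity_id": ["ES", "NQ"],
        "date": pd.to_datetime(["2024-01-02", "2024-01-02"]),
        "asof_utc": pd.to_datetime(
            ["2024-01-05 20:30", "2024-01-05 20:30"], utc=True
        ),
        "long_positions": [100.0, 40.0],
        "short_positions": [60.0, 50.0],
    }
)

pit = melt_to_pit_sketch(
    wide,
    entity_col="entity_id",
    obs_date_col="date",
    asof_col="asof_utc",
    value_vars=["long_positions", "short_positions"],
    key_prefix="cftc.cot.tff.",
    source_name="cftc_cot",
)
```

Two input rows with two metric columns yield four observation rows, each carrying its own series_key and the shared source label.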

safe_divide(numerator: pd.Series, denominator: pd.Series) -> pd.Series

Divide, returning NaN where denominator is zero.
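One way to get this behavior is to mask zero denominators before dividing. The sketch below is an assumption about the approach, not the library's code, and `safe_divide_sketch` is a hypothetical name. It matters because plain pandas division of a nonzero value by zero gives inf rather than NaN.

```python
import pandas as pd


def safe_divide_sketch(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Illustrative stand-in for safe_divide (hypothetical)."""
    # where() keeps nonzero denominators and replaces zeros with NaN,
    # so the division propagates NaN instead of producing inf.
    return numerator / denominator.where(denominator != 0)


# Made-up example: positions as a share of open interest.
longs = pd.Series([10.0, 5.0, 0.0])
open_interest = pd.Series([20.0, 0.0, 0.0])

ratio = safe_divide_sketch(longs, open_interest)
```

Here only the first element has a nonzero denominator, so it is the only non-NaN result.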

CFTC CoT

alphaforge.data.transforms.cot_pit

Convert CFTC CoT output to PIT observation format.

Moved from positioning to alphaforge because this transform is source-specific (CFTC CSV format) and is needed by any consumer of CoT data.

cot_to_pit_observations(df: pd.DataFrame, metrics: Sequence[str] | None = None, *, key_prefix: str = 'cftc.cot.tff.', source_name: str = 'cftc_cot') -> pd.DataFrame

Convert CFTC CoT fetch output to PIT observation rows.

Parameters

df : DataFrame from a CFTC CoT source fetch() with columns: date (publication date, Friday), entity_id, long_positions, short_positions, open_interest.
metrics : Which series to emit. Defaults to all five.
key_prefix : Prefix for generated series_key values.
source_name : Value for the output source column.

Returns

DataFrame with columns: series_key, obs_date, asof_utc, value, source

DTCC PPD

alphaforge.data.transforms.dtcc_pit

Convert DTCC PPD daily output to PIT observation format.

Moved from positioning to alphaforge because this transform is source-specific (DTCC PPD CSV format) and is needed by any consumer of DTCC data.

dtcc_daily_to_pit_observations(df: pd.DataFrame, metrics: Sequence[str] | None = None, *, key_prefix: str = 'dtcc.ppd.daily.', source_name: str = 'dtcc_ppd') -> pd.DataFrame

Convert DTCC PPD daily fetch output to PIT observation rows.

Parameters

df : DataFrame from DTCCPPDSource.fetch(table="dtcc.ppd.daily") with columns: date, entity_id, asof_utc, trade_count, notional_sum, etc.
metrics : Which series to emit. Defaults to all five.
key_prefix : Prefix for the emitted series keys.
source_name : Value for the PIT lineage source column.

Returns

DataFrame with columns: series_key, obs_date, asof_utc, value, source
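All transforms on this page document the same five output columns, which is what makes point-in-time queries possible downstream. As a rough illustration of how a consumer might use that schema, the sketch below returns, for each series, the latest observation that was already published at a given timestamp. The helper `latest_as_of`, the series key, and the sample rows (including the revised value) are hypothetical and not part of alphaforge.

```python
import pandas as pd


def latest_as_of(pit: pd.DataFrame, when: pd.Timestamp) -> pd.DataFrame:
    """Latest known observation per series_key as of `when` (hypothetical helper)."""
    # Keep only rows that had been published by `when`.
    known = pit[pit["asof_utc"] <= when]
    # Order so the last row per key is the newest obs_date, and within
    # that date the most recently published version.
    known = known.sort_values(["series_key", "obs_date", "asof_utc"])
    return known.groupby("series_key").tail(1).reset_index(drop=True)


# Made-up PIT rows in the documented column layout.
key = "dtcc.ppd.daily.X.trade_count"  # hypothetical series key
pit_rows = pd.DataFrame(
    {
        "series_key": [key, key, key],
        "obs_date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03"]),
        "asof_utc": pd.to_datetime(
            ["2024-01-03 01:00", "2024-01-04 01:00", "2024-01-04 01:00"],
            utc=True,
        ),
        "value": [10.0, 12.0, 7.0],
        "source": ["dtcc_ppd"] * 3,
    }
)

early = latest_as_of(pit_rows, pd.Timestamp("2024-01-03 12:00", tz="UTC"))
late = latest_as_of(pit_rows, pd.Timestamp("2024-01-05 00:00", tz="UTC"))
```

With this sample, the earlier query sees only the first published row, while the later query sees all three and returns the newest observation date.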