Data Transforms
Source-specific transforms that convert raw vendor data into the canonical point-in-time (PIT) observation format.
Utilities
alphaforge.data.transforms.utils
Shared PIT transform utilities.
melt_to_pit_format(df: pd.DataFrame, entity_col: str, obs_date_col: str, asof_col: str, value_vars: Sequence[str], key_prefix: str, source_name: str) -> pd.DataFrame
Melt a wide DataFrame into PIT observation long format.
Parameters
df : Wide DataFrame with one row per (entity, date).
entity_col : Column containing the entity identifier.
obs_date_col : Column containing the observation date.
asof_col : Column containing the as-of / publication timestamp.
value_vars : Metric columns to melt into long format.
key_prefix : Prefix for series_key (e.g. "cftc.cot.tff.").
source_name : Value for the 'source' column.
Returns
DataFrame with columns: series_key, obs_date, asof_utc, value, source
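To make the wide-to-long reshape concrete, here is a minimal pandas sketch of the contract described above. It is not alphaforge's implementation: the function name melt_to_pit_sketch, the toy column names, and especially the series_key layout (key_prefix followed by "<entity>.<metric>") are assumptions. The docstring only says key_prefix is a prefix, and since the output has no separate entity column, the sketch assumes the entity is folded into the key.

```python
from collections.abc import Sequence

import pandas as pd


def melt_to_pit_sketch(
    df: pd.DataFrame,
    entity_col: str,
    obs_date_col: str,
    asof_col: str,
    value_vars: Sequence[str],
    key_prefix: str,
    source_name: str,
) -> pd.DataFrame:
    """Hypothetical re-creation of the documented melt_to_pit_format contract."""
    # One output row per (entity, date, metric).
    long = df.melt(
        id_vars=[entity_col, obs_date_col, asof_col],
        value_vars=list(value_vars),
        var_name="metric",
        value_name="value",
    )
    # ASSUMPTION: series_key = key_prefix + "<entity>.<metric>"; the real layout is not documented.
    long["series_key"] = key_prefix + long[entity_col].astype(str) + "." + long["metric"]
    long["source"] = source_name
    out = long.rename(columns={obs_date_col: "obs_date", asof_col: "asof_utc"})
    return out[["series_key", "obs_date", "asof_utc", "value", "source"]].reset_index(drop=True)


# Toy wide input: one row per (entity, date), two metric columns.
wide = pd.DataFrame(
    {
        "entity_id": ["E1", "E2"],
        "date": pd.to_datetime(["2024-01-02", "2024-01-02"]),
        "published": pd.to_datetime(["2024-01-05", "2024-01-05"], utc=True),
        "long_positions": [100.0, 50.0],
        "short_positions": [80.0, 70.0],
    }
)
pit = melt_to_pit_sketch(
    wide, "entity_id", "date", "published",
    ["long_positions", "short_positions"], "cftc.cot.tff.", "cftc_cot",
)
```

With two entities and two metrics the result has four rows, ordered metric by metric as DataFrame.melt stacks them.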
safe_divide(numerator: pd.Series, denominator: pd.Series) -> pd.Series
Divide element-wise, returning NaN where the denominator is zero.
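A minimal sketch of the documented behavior, assuming ordinary index-aligned pandas division. The name safe_divide_sketch is hypothetical, and how the real safe_divide treats NaN (as opposed to zero) denominators is not stated in the docstring.

```python
import pandas as pd


def safe_divide_sketch(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Element-wise division that yields NaN (never +/-inf) where the denominator is zero."""
    # Mask zero denominators to NaN: x / NaN is NaN, whereas x / 0 would give +/-inf.
    return numerator / denominator.where(denominator != 0)


# Example: a ratio where the second and third denominators are zero.
ratio = safe_divide_sketch(pd.Series([1.0, 2.0, 0.0]), pd.Series([2.0, 0.0, 0.0]))
# ratio: 0.5, NaN, NaN
```

Masking the denominator rather than post-filtering infinities keeps genuine infinities from other sources distinguishable from the zero-denominator case.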
CFTC CoT
alphaforge.data.transforms.cot_pit
Convert CFTC CoT output to PIT observation format.
Moved from positioning to alphaforge because this transform is source-specific (CFTC CSV format) and is needed by any consumer of CoT data.
cot_to_pit_observations(df: pd.DataFrame, metrics: Sequence[str] | None = None, *, key_prefix: str = 'cftc.cot.tff.', source_name: str = 'cftc_cot') -> pd.DataFrame
Convert CFTC CoT fetch output to PIT observation rows.
Parameters
df : DataFrame from a CFTC CoT source fetch() with columns:
date (publication date, Friday), entity_id, long_positions,
short_positions, open_interest.
metrics : Which series to emit. Defaults to all five.
key_prefix : Prefix for generated series_key values.
source_name : Value for the output source column.
Returns
DataFrame with columns: series_key, obs_date, asof_utc, value, source
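Because cot_to_pit_observations expects the shape produced by a CFTC CoT fetch(), it can help to check a frame before converting it. The pre-flight check below is hypothetical (check_cot_input is not part of alphaforge) and tests only what the docstring states: the five input columns are present and date falls on a Friday.

```python
import pandas as pd

# Input columns documented for the df argument of cot_to_pit_observations.
COT_INPUT_COLUMNS = ("date", "entity_id", "long_positions", "short_positions", "open_interest")


def check_cot_input(df: pd.DataFrame) -> None:
    """Hypothetical pre-flight check; raises ValueError if df does not match the documented shape."""
    missing = [c for c in COT_INPUT_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing CoT columns: {missing}")
    dates = pd.to_datetime(df["date"])
    # `date` is documented as the Friday publication date (dayofweek: Monday=0 ... Friday=4).
    not_friday = dates[dates.dt.dayofweek != 4]
    if not not_friday.empty:
        raise ValueError(f"non-Friday publication dates: {sorted(not_friday.dt.date.unique())}")


# Toy input: 2024-01-05 and 2024-01-12 are both Fridays.
cot = pd.DataFrame(
    {
        "date": ["2024-01-05", "2024-01-12"],
        "entity_id": ["E1", "E1"],
        "long_positions": [120, 130],
        "short_positions": [90, 95],
        "open_interest": [400, 410],
    }
)
check_cot_input(cot)  # passes silently
```

A frame that passes could then be handed to cot_to_pit_observations(cot). Which five series metrics=None emits is not listed in this docstring, so the sketch does not guess at them.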
DTCC PPD
alphaforge.data.transforms.dtcc_pit
Convert DTCC PPD daily output to PIT observation format.
Moved from positioning to alphaforge because this transform is source-specific (DTCC PPD CSV format) and is needed by any consumer of DTCC data.
dtcc_daily_to_pit_observations(df: pd.DataFrame, metrics: Sequence[str] | None = None, *, key_prefix: str = 'dtcc.ppd.daily.', source_name: str = 'dtcc_ppd') -> pd.DataFrame
Convert DTCC PPD daily fetch output to PIT observation rows.
Parameters
df : DataFrame from DTCCPPDSource.fetch(table="dtcc.ppd.daily") with
columns: date, entity_id, asof_utc, trade_count, notional_sum, etc.
metrics : Which series to emit. Defaults to all five.
key_prefix : Prefix for the emitted series keys.
source_name : Value for the PIT lineage source column.
Returns
DataFrame with columns: series_key, obs_date, asof_utc, value, source
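All three functions return the same five columns, and keeping obs_date separate from asof_utc is what makes point-in-time queries possible. Below is a minimal sketch of such a query. It assumes the output may contain several asof_utc versions of the same (series_key, obs_date). The helper as_of_view and the series_key value in the toy data are hypothetical.

```python
import pandas as pd

PIT_COLUMNS = ["series_key", "obs_date", "asof_utc", "value", "source"]


def as_of_view(pit: pd.DataFrame, cutoff: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical helper: latest value per (series_key, obs_date) that was known at `cutoff`."""
    missing = set(PIT_COLUMNS) - set(pit.columns)
    if missing:
        raise ValueError(f"not a PIT frame, missing {sorted(missing)}")
    # Keep only rows already published at the cutoff, so no look-ahead leaks in.
    known = pit[pit["asof_utc"] <= cutoff]
    # If an observation appears more than once, the row with the greatest asof_utc wins.
    latest = known.sort_values("asof_utc", kind="stable").groupby(["series_key", "obs_date"]).tail(1)
    return latest.sort_values(["series_key", "obs_date"]).reset_index(drop=True)


# Toy PIT rows: obs_date 2024-01-02 is published once, then restated a day later.
pit = pd.DataFrame(
    {
        "series_key": ["dtcc.ppd.daily.E1.trade_count"] * 3,
        "obs_date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03"]),
        "asof_utc": pd.to_datetime(
            ["2024-01-03 06:00", "2024-01-04 06:00", "2024-01-04 06:00"], utc=True
        ),
        "value": [10.0, 12.0, 7.0],
        "source": "dtcc_ppd",
    }
)
early = as_of_view(pit, pd.Timestamp("2024-01-03 12:00", tz="UTC"))  # only the first print
later = as_of_view(pit, pd.Timestamp("2024-01-05", tz="UTC"))  # restated value plus the new day
```

The same helper applies unchanged to the output of cot_to_pit_observations or melt_to_pit_format, since only the shared column contract is used.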