Package

alphaforge

AlphaForge: general-purpose data/feature management for financial ML.

CFTCCoTSource

Bases: _BaseCFTCCoTSource

CFTC Commitments of Traders: Traders in Financial Futures (futures only).

CFTCDisaggregatedCoTSource

Bases: _BaseCFTCCoTSource

CFTC Commitments of Traders: disaggregated commodity futures.

CalendarDay dataclass

Bases: ReleaseRule

Published on a specific calendar day of the anchor month.

CustomRule dataclass

Bases: ReleaseRule

Free-text description for schedules that are not yet modeled.

DataContext dataclass

Runtime wiring for data sources, calendars, and optional PIT access.

Parameters

sources : Mapping[str, DataSource]
    Legacy data source mapping kept for backward compatibility and raw-loader workflows.
calendars : Mapping[str, TradingCalendar]
    Trading calendar lookup.
store : Store
    Backing store for persistence.
adapters : dict[str, SourceAdapter] | None
    Canonical public data-loading surface keyed by source_name (e.g. "cftc").
default_sources : dict[str, str] | None
    Maps dataset → default source_name for canonical adapter routing (e.g. {"cot.tff": "cftc"}).

fetch(query: Query, *, source: Optional[str] = None, max_staleness: Optional[timedelta] = None) -> 'FetchResult'

Canonical fetch path: resolve an adapter and delegate.

fetch_many(queries: list[Query], *, source: Optional[str] = None, max_staleness: Optional[timedelta] = None) -> list['FetchResult']

Canonical batch fetch path, grouped by resolved adapter.
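The routing described for fetch and fetch_many can be pictured with plain dictionaries. The sketch below is not the library's code: resolve_source and group_by_source are hypothetical helpers, the rule that an explicit source wins over default_sources is an assumption read off the signatures, and every mapping entry other than "cot.tff" → "cftc" is invented for illustration.

```python
from collections import defaultdict


def resolve_source(dataset, default_sources, source=None):
    """Pick a source name: an explicit `source` first, else the dataset default.

    Hypothetical helper; the precedence rule is an assumption.
    """
    if source is not None:
        return source
    try:
        return default_sources[dataset]
    except KeyError:
        raise KeyError(f"no default source configured for dataset {dataset!r}") from None


def group_by_source(datasets, default_sources, source=None):
    """Group dataset names by resolved source, preserving request order per group."""
    groups = defaultdict(list)
    for dataset in datasets:
        groups[resolve_source(dataset, default_sources, source)].append(dataset)
    return dict(groups)


# Only the "cot.tff" entry appears in the documentation; the rest are made up.
default_sources = {
    "cot.tff": "cftc",
    "cot.disagg": "cftc",
    "rates.fed_funds": "fred",
}

groups = group_by_source(["cot.tff", "rates.fed_funds", "cot.disagg"], default_sources)
print(groups)
```

Grouping before dispatch lets each adapter receive its queries in one batch, which is presumably why the batch path is described as "grouped by resolved adapter".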

fetch_panel(source: str, q: Query) -> PanelFrame

Legacy panel-building path for DataSource-backed loaders.

from_adapters(*adapters: 'SourceAdapter', calendars: Mapping[str, TradingCalendar] | None = None, store: Store | None = None, universe: Optional[Universe] = None, entity_meta: Optional[EntityMetadata] = None, default_sources: Optional[dict[str, str]] = None) -> 'DataContext' classmethod

Build a DataContext from adapters without manual mapping boilerplate.

load(dataset: str, *, columns: Sequence[str], start: Optional[pd.Timestamp | str] = None, end: Optional[pd.Timestamp | str] = None, entities: Optional[Sequence[str]] = None, asof: Optional[pd.Timestamp | str] = None, vintage: VintageMode = 'latest', vintage_id: Optional[str] = None, grid: Optional[str] = None, source: Optional[str] = None, max_staleness: Optional[timedelta] = None) -> 'FetchResult'

Happy-path source load without explicit Query construction.

prefetch(dataset: str, *, source: Optional[str] = None, asof_range: tuple[date, date] | None = None) -> 'CacheManifest'

Warm cache for a dataset via the resolved adapter.

DuckDBParquetStore dataclass

Store FeatureFrames as Parquet on disk, indexed by DuckDB.

Layout

root/
  alphaforge.duckdb
  frames//X.parquet
  frames//catalog.parquet
  frames//meta.json
  states//payload.bin
  states//meta.json

EntityEntry dataclass

Generic entity metadata entry.

EntityMetadata dataclass

Entity metadata table: index=entity_id, columns like sector/country/etc.

EntityRegistry

Generic registry mapping entity names to metadata.

FRBTermStructureBenchmarkSource

Bases: PublicWebSourceBase

Federal Reserve Board Kim-Wright three-factor benchmark series.

FREDDataSource

Bases: DataSource

Legacy/raw-loader FRED DataSource kept for compatibility.

New code should prefer alphaforge.data.sources.fred.FREDSourceAdapter via DataContext.fetch(...).

fetch(q: Query) -> pd.DataFrame

Fetch data from FRED based on the given query.

schemas() -> Dict[str, TableSchema]

Return a dictionary of table schemas. Each FRED series could be modeled as its own table; for simplicity, a single generic schema covers all series.

FeatureFrame dataclass

X + catalog + meta (+ optional artifacts).

set_tags(tags: Dict[str, Any], overwrite: bool = True) -> FeatureFrame

Broadcast tags to all rows in catalog:

  • catalog['tags'] holds the dict (in-memory convenience)
  • catalog['tags_json'] holds the JSON string (for persistence)

If overwrite=False, the tags are merged with any existing dict, and the requested tags win on conflicting keys.
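The overwrite and merge behaviour can be illustrated on a single catalog row. This is a minimal sketch and not FeatureFrame.set_tags itself: merge_tags is a hypothetical helper, the tag values are invented, and both the wholesale replacement under overwrite=True and the sort_keys serialization are assumptions.

```python
import json


def merge_tags(existing, requested, overwrite=True):
    """Return (tags dict, tags JSON string) for one catalog row.

    Hypothetical helper mirroring the documented semantics:
    - overwrite=True: the requested tags replace whatever was there (assumed).
    - overwrite=False: merge, with requested tags winning on key conflicts.
    """
    if overwrite or not existing:
        merged = dict(requested)
    else:
        merged = {**existing, **requested}
    # sort_keys gives a stable string for persistence; this is an assumption.
    return merged, json.dumps(merged, sort_keys=True)


existing = {"family": "momentum", "owner": "research"}
requested = {"owner": "desk-a", "horizon": "5d"}

replaced, replaced_json = merge_tags(existing, requested, overwrite=True)
merged, merged_json = merge_tags(existing, requested, overwrite=False)

print(replaced)
print(merged)
print(merged_json)
```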

FirstRateBarsAdapter

Bases: SourceAdapterBase

Read local First Rate 5-minute text files through the SourceAdapter API.

FirstRateBarsConfig dataclass

Resolved configuration for local First Rate raw 5-minute bars.

FirstRateFuturesAdapter

Bases: SourceAdapterBase

Read persisted local futures artifacts via the SourceAdapter protocol.

FirstRateFuturesConfig dataclass

Resolved futures loader configuration.

Resolution order:

  1. explicit arguments
  2. YAML config file entries
  3. environment variables
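A layered lookup of this kind might look as follows. The sketch is illustrative only: resolve_setting, the FIRSTRATE_ environment prefix, and the setting names are all hypothetical, and the YAML entries are represented by an already-parsed dict so the example needs no YAML library.

```python
def resolve_setting(name, explicit, yaml_entries, environ, env_prefix="FIRSTRATE_"):
    """Resolve one setting: explicit argument, then YAML entry, then environment.

    All names here are hypothetical; `environ` is passed in as a plain mapping
    (os.environ would work too) so the lookup is easy to test.
    """
    if explicit.get(name) is not None:
        return explicit[name]
    if yaml_entries.get(name) is not None:
        return yaml_entries[name]
    return environ.get(env_prefix + name.upper())


explicit = {"root": None, "symbol": "ES"}              # caller passed only `symbol`
yaml_entries = {"root": "/data/yaml", "symbol": "NQ"}  # parsed config file
environ = {"FIRSTRATE_ROOT": "/data/env", "FIRSTRATE_TZ": "America/New_York"}

symbol = resolve_setting("symbol", explicit, yaml_entries, environ)
root = resolve_setting("root", explicit, yaml_entries, environ)
tz = resolve_setting("tz", explicit, yaml_entries, environ)
missing = resolve_setting("calendar", explicit, yaml_entries, environ)

print(symbol, root, tz, missing)
```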

FirstRateFuturesLoader

Ingest a local directory of First Rate Data contract files.

FixedLagMonths dataclass

Bases: ReleaseRule

Published a fixed number of months after the observation month.
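The month arithmetic behind such a rule can be sketched with the standard library. The function below is a hypothetical stand-in, not the FixedLagMonths implementation; in particular, returning the first day of the release month is an assumption, since the documentation does not say which day within the month is used.

```python
from datetime import date


def release_month_start(obs_date, lag_months):
    """First day of the month that is `lag_months` after the observation month.

    Hypothetical helper: months are counted on a 0-based running index so that
    year rollover needs no special casing.
    """
    index = obs_date.year * 12 + (obs_date.month - 1) + lag_months
    return date(index // 12, index % 12 + 1, 1)


november_plus_two = release_month_start(date(2023, 11, 30), 2)
march_plus_one = release_month_start(date(2024, 3, 15), 1)

print(november_plus_two, march_plus_one)
```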

LagReturnsTemplate

Lagged return features from a canonical market-price dataset.

MOFJGBYieldCurveSource

Bases: PublicWebSourceBase

Daily JGB constant-maturity par yields from the MOF website.

fetch_wide(q: Query | None = None) -> pd.DataFrame

Return a date × tenor DataFrame (yields in percent).

This is the natural format for yield-curve analysis.

MissingnessReason

Bases: str, Enum

Why a point-in-time panel cell is missing.

NthBusinessDay dataclass

Bases: ReleaseRule

Published on the n-th US business day of an anchor month.
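For intuition, the n-th business day of a month can be found by walking the calendar. This is a sketch under stated assumptions rather than the NthBusinessDay code: nth_business_day is a hypothetical helper, weekends are Saturday and Sunday, and holidays must be supplied by the caller because no US holiday calendar is built in here.

```python
from datetime import date, timedelta


def nth_business_day(year, month, n, holidays=frozenset()):
    """Return the n-th weekday of the month that is not in `holidays` (1-based)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    day = date(year, month, 1)
    count = 0
    while day.month == month:
        # weekday() < 5 means Monday..Friday
        if day.weekday() < 5 and day not in holidays:
            count += 1
            if count == n:
                return day
        day += timedelta(days=1)
    raise ValueError(f"{year}-{month:02d} has fewer than {n} business days")


new_year = frozenset({date(2024, 1, 1)})

first_without_holidays = nth_business_day(2024, 1, 1)
first_with_holidays = nth_business_day(2024, 1, 1, holidays=new_year)
third_with_holidays = nth_business_day(2024, 1, 3, holidays=new_year)

print(first_without_holidays, first_with_holidays, third_with_holidays)
```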

NthWeekday dataclass

Bases: ReleaseRule

Published on the n-th occurrence of a weekday in the anchor month.
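The date such a rule targets can be computed directly from the weekday of the first of the month. The helper below is hypothetical and only illustrates the arithmetic; it uses Python's convention of Monday=0 through Sunday=6, which may differ from how NthWeekday encodes weekdays.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence (1-based) of `weekday` in the given month.

    `weekday` follows date.weekday(): Monday=0 ... Sunday=6.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    first = date(year, month, 1)
    # Days from the 1st to the first matching weekday (0..6).
    offset = (weekday - first.weekday()) % 7
    result = first + timedelta(days=offset + 7 * (n - 1))
    if result.month != month:
        raise ValueError(f"{year}-{month:02d} has no occurrence number {n} of weekday {weekday}")
    return result


first_friday_june = nth_weekday(2024, 6, 4, 1)
third_wednesday_march = nth_weekday(2024, 3, 2, 3)

print(first_friday_june, third_wednesday_march)
```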

PITAccessor dataclass

open(root: str | Path) -> 'PITAccessor' classmethod

Open a PIT accessor from a DuckDBParquetStore root.

PITCausalityError

Bases: PITValidationError

Raised when transform execution would violate PIT causality.

PITContractError

Bases: PITError

Raised when API contracts are violated.

PITDataSource dataclass

Bases: DataSource

Expose PIT rows through the legacy/raw-loader DataSource contract.

PITEngineError

Bases: PITError

Raised when requested transform engine cannot be satisfied.

PITError

Bases: Exception

Base class for PIT-related errors.

PITExperimentalFeatureError

Bases: PITContractError

Raised when an experimental PIT feature is used without opt-in.

PITTransformSpec dataclass

sanitized_params() -> dict[str, Any]

Params suitable for hashing and lineage serialization.

PITUnsupportedOperationError

Bases: PITContractError

Raised when an unsupported PIT operation is requested.

PITValidationError

Bases: PITError

Raised when PIT data fails validation.

PITValidationWarning

Bases: UserWarning

Warning emitted for non-blocking PIT validation outcomes.
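The "Bases" entries for the PIT error classes above imply a small hierarchy, which determines what a given except clause will catch. The sketch below re-declares local stand-in classes with the documented bases so that it runs without the package; real code would import the classes instead. run_transform is a hypothetical function used only to raise something.

```python
import warnings


# Local stand-ins mirroring the documented base classes.
class PITError(Exception):
    """Base class for PIT-related errors."""


class PITValidationError(PITError):
    """PIT data fails validation."""


class PITCausalityError(PITValidationError):
    """Transform execution would violate PIT causality."""


class PITContractError(PITError):
    """API contracts are violated."""


class PITEngineError(PITError):
    """Requested transform engine cannot be satisfied."""


class PITExperimentalFeatureError(PITContractError):
    """Experimental PIT feature used without opt-in."""


class PITUnsupportedOperationError(PITContractError):
    """Unsupported PIT operation requested."""


class PITValidationWarning(UserWarning):
    """Non-blocking PIT validation outcome."""


def run_transform(uses_future_data):
    """Hypothetical transform runner that refuses to look ahead."""
    if uses_future_data:
        raise PITCausalityError("transform would read data past the as-of point")
    return "ok"


# Catching the parent class also catches the more specific causality error.
try:
    run_transform(uses_future_data=True)
except PITValidationError as exc:
    caught = type(exc).__name__

# Warnings are separate from the error hierarchy and do not stop execution.
with warnings.catch_warnings(record=True) as captured:
    warnings.simplefilter("always")
    warnings.warn("non-blocking validation issue", PITValidationWarning)

print(caught, len(captured))
```

Catching PITError handles every error class listed here, while PITValidationWarning travels through the warnings machinery and has to be filtered or recorded separately.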

PanelFrame dataclass

Canonical panel: MultiIndex (ts_utc, entity_id). Dates stored tz-aware (UTC).

PhiladelphiaSPFMeanLevelSource

Bases: PublicWebSourceBase

Historical mean SPF forecasts from the Philadelphia Fed.

QuarterlyRelease dataclass

Bases: ReleaseRule

GDP-style multi-release schedule.

RefRevisionQuery dataclass

Ref-period revision request for alphaforge.pit.accessor.PITAccessor.revisions_ref.

RefSnapshotQuery dataclass

Ref-period snapshot request for alphaforge.pit.accessor.PITAccessor.snapshot_ref.

ReleaseLagPolicy dataclass

Policy for lag-adjusting effective asof timestamps per series.

ReleaseRule dataclass

Bases: ABC

Base class for publication schedule rules.

expected_release_date(obs_date: date, release_number: int | None = None) -> date abstractmethod

Return the expected publication date for an observation date.

from_dict(payload: dict[str, Any]) -> 'ReleaseRule' staticmethod

Reconstruct a rule from a YAML-style mapping.

to_dict() -> dict[str, Any]

Serialize the rule to a YAML-friendly mapping.

RollingVolatilityTemplate

Rolling realized-volatility features from canonical market-price data.

ShortRateDataset dataclass

Container for a constructed short-rate research dataset.

TradingCalendar dataclass

Minimal business-day calendar.

  • tz is the calendar's local timezone (e.g. America/New_York for XNYS).
  • session labels are returned as tz-aware UTC instants at 00:00 UTC by default. (They are labels; open/close time helpers are provided below.)

session_close_utc(session_label: pd.Timestamp | str) -> pd.Timestamp

Return session close as tz-aware UTC Timestamp (default 16:00 local -> UTC).

session_open_utc(session_label: pd.Timestamp | str) -> pd.Timestamp

Return session open as tz-aware UTC Timestamp.

session_label may be a session label returned by sessions() (tz-aware UTC midnight) or a date-like string / Timestamp.

trading_minutes_utc(start_utc: pd.Timestamp, end_utc: pd.Timestamp, freq: str = '5min') -> pd.DatetimeIndex

Return tz-aware UTC DatetimeIndex of trading minutes between start_utc and end_utc.

Generates minutes during each trading session (09:30..16:00 local) with frequency freq.

Universe dataclass

Time-varying membership: index=date, columns=entity_id, values=bool.

WeeklyRelease dataclass

Bases: ReleaseRule

Weekly series released on a fixed weekday after a lag.

align_panel(panel: PanelFrame, schema: TableSchema, grid: Grid, align: AlignSpec, asof: pd.Timestamp | None = None) -> AlignedPanel

Align a PanelFrame to a target Grid and produce missingness typing.

MVP semantics:

  • structural missingness: NO_UPDATE_EXPECTED for low-freq between observation dates
  • abnormal missingness:
      • daily series: missing -> MISSING_UNKNOWN
      • low-freq series: missing on an observation date -> TEMPORARY_OUTAGE
  • NOT_YET_RELEASED: reserved for true vintage-aware sources (to add next)
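The MVP rules above reduce to a three-way decision for each missing cell. The sketch below is not align_panel: classify_mvp is a hypothetical helper, the enum is a local stand-in whose member names come from the text but whose string values are invented, and the observation dates are made-up quarter ends.

```python
from datetime import date
from enum import Enum


class MissingnessReason(str, Enum):
    # Member names follow the documentation; the string values are assumptions.
    NO_UPDATE_EXPECTED = "no_update_expected"
    MISSING_UNKNOWN = "missing_unknown"
    TEMPORARY_OUTAGE = "temporary_outage"
    NOT_YET_RELEASED = "not_yet_released"  # reserved, never produced below


def classify_mvp(grid_date, is_daily, observation_dates):
    """Classify one missing cell according to the MVP rules.

    - daily series: any missing cell is MISSING_UNKNOWN
    - low-frequency series on an observation date: TEMPORARY_OUTAGE
    - low-frequency series between observation dates: NO_UPDATE_EXPECTED
    """
    if is_daily:
        return MissingnessReason.MISSING_UNKNOWN
    if grid_date in observation_dates:
        return MissingnessReason.TEMPORARY_OUTAGE
    return MissingnessReason.NO_UPDATE_EXPECTED


quarter_ends = {date(2024, 3, 31), date(2024, 6, 30)}

daily_gap = classify_mvp(date(2024, 4, 15), True, set())
between_quarters = classify_mvp(date(2024, 4, 15), False, quarter_ends)
on_quarter_end = classify_mvp(date(2024, 6, 30), False, quarter_ends)

print(daily_gap.name, between_quarters.name, on_quarter_end.name)
```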

build_first_rate_bars_context(config: FirstRateBarsConfig, *, calendars: Mapping[str, TradingCalendar] | None = None, store: Store | None = None, source_name: str = 'first_rate_bars') -> DataContext

Build a DataContext for locally mounted First Rate 5-minute bar directories.

classify_missingness(*, obs_date: date, asof_date: date, series_frequency: str, panel_frequency: str = 'M', release_rule: ReleaseRule | None = None, publication_lag_months: int | None = None, realized_release_date: date | None = None) -> MissingnessReason

Classify why a cell is missing at a given as-of date.

coerce_ref_revision_query(query: RefRevisionQuery | Mapping[str, Any]) -> RefRevisionQuery

Normalize a ref-period revision query into a validated typed object.

coerce_ref_snapshot_query(query: RefSnapshotQuery | Mapping[str, Any]) -> RefSnapshotQuery

Normalize a ref-period snapshot query into a validated typed object.

default_public_web_sources() -> dict[str, DataSource]

Construct the default public-web source registry.

effective_asof(asof: pd.Timestamp, series_key: str, policy: ReleaseLagPolicy) -> pd.Timestamp

Compute effective asof after lag/cutoff/embargo policy adjustments.

materialize(ctx: DataContext, template: FeatureTemplate, realization: FeatureRealization, policy: MaterializationPolicy, lineage: Optional[LineageGraph] = None, fit_slice: Optional[SliceSpec] = None) -> FeatureFrame

Materialize a FeatureRealization with caching and (optional) stateful fit.

pit_leakage_report(df: pd.DataFrame, ts_col: str = 'obs_date', asof_col: str = 'asof_utc') -> pd.DataFrame

Structured leakage and PIT hygiene diagnostics for PIT-shaped data.

revision_event_stream(pit: PITAccessor, series_key: str, *, start: pd.Timestamp | None = None, end: pd.Timestamp | None = None, min_abs_change: float = 0.0) -> pd.DataFrame

Return revision events across all obs_date timelines.

revision_volatility(pit: PITAccessor, series_key: str, *, start: pd.Timestamp | None = None, end: pd.Timestamp | None = None) -> pd.Series

Compute standard deviation of revision deltas by obs_date.
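As a rough picture of the computation, consider each obs_date with its published values in vintage order. The sketch below works on a plain dict and is not the library function: revision_volatility_sketch is a hypothetical name, the numbers are invented, and the use of the sample standard deviation (with NaN when fewer than two deltas exist) is an assumption, since the documentation does not state which estimator is used.

```python
import math
from statistics import stdev


def revision_deltas(values):
    """Differences between successive vintages of one observation."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def revision_volatility_sketch(vintages_by_obs_date):
    """Sample standard deviation of revision deltas for each obs_date.

    Returns NaN where fewer than two deltas are available (assumed behaviour).
    """
    result = {}
    for obs_date, values in vintages_by_obs_date.items():
        deltas = revision_deltas(values)
        result[obs_date] = stdev(deltas) if len(deltas) >= 2 else math.nan
    return result


# Invented values: each list holds successive published values for one obs_date.
vintages = {
    "2024-01": [2.0, 2.5, 2.0],       # revised up, then back down
    "2024-02": [1.0, 1.0, 1.0, 1.0],  # never revised
    "2024-03": [3.0, 3.2],            # a single revision only
}

volatility = revision_volatility_sketch(vintages)
print(volatility)
```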