Package¶
alphaforge
¶
AlphaForge: general-purpose data/feature management for financial ML.
CFTCCoTSource
¶
Bases: _BaseCFTCCoTSource
CFTC Commitments of Traders: Traders in Financial Futures (futures only).
CFTCDisaggregatedCoTSource
¶
Bases: _BaseCFTCCoTSource
CFTC Commitments of Traders: disaggregated commodity futures.
CalendarDay
dataclass
¶
CustomRule
dataclass
¶
DataContext
dataclass
¶
Runtime wiring for data sources, calendars, and optional PIT access.
Parameters¶
sources : Mapping[str, DataSource]
Legacy data source mapping kept for backward compatibility and
raw-loader workflows.
calendars : Mapping[str, TradingCalendar]
Trading calendar lookup.
store : Store
Backing store for persistence.
adapters : dict[str, SourceAdapter] | None
Canonical public data-loading surface keyed by source_name
(e.g. "cftc").
default_sources : dict[str, str] | None
Maps dataset → default source_name for canonical adapter routing
(e.g. {"cot.tff": "cftc"}).
fetch(query: Query, *, source: Optional[str] = None, max_staleness: Optional[timedelta] = None) -> 'FetchResult'
¶
Canonical fetch path: resolve an adapter and delegate.
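The adapter-resolution step can be pictured with a minimal sketch. It is not the library's implementation: resolve_source_name is a hypothetical helper, and the rule that an explicit source argument takes precedence over default_sources is an assumption. Only the mapping shapes come from the parameter descriptions above.

```python
from typing import Mapping, Optional


def resolve_source_name(
    dataset: str,
    source: Optional[str],
    default_sources: Mapping[str, str],
    adapters: Mapping[str, object],
) -> str:
    """Pick the adapter key for a dataset (hypothetical helper)."""
    # Assumption: an explicit `source` wins over the per-dataset default.
    name = source if source is not None else default_sources.get(dataset)
    if name is None:
        raise KeyError(f"no source given and no default for dataset {dataset!r}")
    if name not in adapters:
        raise KeyError(f"no adapter registered under {name!r}")
    return name


adapters = {"cftc": object()}          # stand-in for a SourceAdapter instance
default_sources = {"cot.tff": "cftc"}  # shape taken from the parameter docs

resolved = resolve_source_name("cot.tff", None, default_sources, adapters)
```

Here resolved is "cftc" because no explicit source was passed and the dataset has a registered default.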
fetch_many(queries: list[Query], *, source: Optional[str] = None, max_staleness: Optional[timedelta] = None) -> list['FetchResult']
¶
Canonical batch fetch path, grouped by resolved adapter.
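Grouping by resolved adapter might look roughly like the sketch below. It is illustrative only: fetch_grouped, the plain-dict queries, the string results, and the dataset names other than "cot.tff" are all hypothetical. Returning results in input order is also an assumption, not documented behavior.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def fetch_grouped(
    queries: Sequence[dict],
    resolve: Callable[[dict], str],
    fetch_batch: Callable[[str, List[dict]], List[str]],
) -> List[str]:
    # Bucket (position, query) pairs by resolved adapter name.
    groups: Dict[str, List[Tuple[int, dict]]] = defaultdict(list)
    for position, query in enumerate(queries):
        groups[resolve(query)].append((position, query))

    # One batch call per adapter; write results back in input order (assumption).
    results: List[str] = [""] * len(queries)
    for name, items in groups.items():
        batch = fetch_batch(name, [query for _, query in items])
        for (position, _), result in zip(items, batch):
            results[position] = result
    return results


# Hypothetical routing table and a fake batch fetcher that records its calls.
defaults = {"cot.tff": "cftc", "cot.disagg": "cftc", "rates.jgb": "mof"}
calls: List[Tuple[str, int]] = []


def fake_fetch_batch(name: str, batch: List[dict]) -> List[str]:
    calls.append((name, len(batch)))
    return [f"{name}:{query['dataset']}" for query in batch]


queries = [{"dataset": "cot.tff"}, {"dataset": "rates.jgb"}, {"dataset": "cot.disagg"}]
results = fetch_grouped(queries, lambda q: defaults[q["dataset"]], fake_fetch_batch)
```

The two "cftc" queries share a single batch call, while the caller still receives one result per query at the position it was submitted.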
fetch_panel(source: str, q: Query) -> PanelFrame
¶
Legacy panel-building path for DataSource-backed loaders.
from_adapters(*adapters: 'SourceAdapter', calendars: Mapping[str, TradingCalendar] | None = None, store: Store | None = None, universe: Optional[Universe] = None, entity_meta: Optional[EntityMetadata] = None, default_sources: Optional[dict[str, str]] = None) -> 'DataContext'
classmethod
¶
Build a DataContext from adapters without manual mapping boilerplate.
load(dataset: str, *, columns: Sequence[str], start: Optional[pd.Timestamp | str] = None, end: Optional[pd.Timestamp | str] = None, entities: Optional[Sequence[str]] = None, asof: Optional[pd.Timestamp | str] = None, vintage: VintageMode = 'latest', vintage_id: Optional[str] = None, grid: Optional[str] = None, source: Optional[str] = None, max_staleness: Optional[timedelta] = None) -> 'FetchResult'
¶
Happy-path source load without explicit Query construction.
prefetch(dataset: str, *, source: Optional[str] = None, asof_range: tuple[date, date] | None = None) -> 'CacheManifest'
¶
Warm cache for a dataset via the resolved adapter.
DuckDBParquetStore
dataclass
¶
Store FeatureFrames as Parquet on disk, indexed by DuckDB.
Layout:
root/
    alphaforge.duckdb
    frames/
EntityEntry
dataclass
¶
Generic entity metadata entry.
EntityMetadata
dataclass
¶
Entity metadata table: index=entity_id, columns like sector/country/etc.
EntityRegistry
¶
Generic registry mapping entity names to metadata.
FRBTermStructureBenchmarkSource
¶
Bases: PublicWebSourceBase
Federal Reserve Board Kim-Wright three-factor benchmark series.
FREDDataSource
¶
Bases: DataSource
Legacy/raw-loader FRED DataSource kept for compatibility.
New code should prefer alphaforge.data.sources.fred.FREDSourceAdapter
via DataContext.fetch(...).
FeatureFrame
dataclass
¶
X + catalog + meta (+ optional artifacts).
set_tags(tags: Dict[str, Any], overwrite: bool = True) -> FeatureFrame
¶
Broadcast tags to all rows in catalog:
- catalog['tags'] holds the dict (in-memory convenience)
- catalog['tags_json'] holds the JSON string (for persistence)
If overwrite=False, the new tags are merged into any existing dict; on key conflicts the tags passed in this call take precedence.
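The merge semantics can be illustrated on plain dicts. This is a sketch, not the method body: merge_tags is a hypothetical helper, and serializing with sort_keys=True is an assumption made here so the JSON string is deterministic.

```python
import json
from typing import Any, Dict, Optional, Tuple


def merge_tags(
    existing: Optional[Dict[str, Any]],
    tags: Dict[str, Any],
    overwrite: bool = True,
) -> Tuple[Dict[str, Any], str]:
    """Return the tag dict and its JSON form (hypothetical helper)."""
    if overwrite or not existing:
        merged = dict(tags)             # replace whatever was there
    else:
        merged = {**existing, **tags}   # keep old keys; new tags win on conflict
    # Assumption: sorted keys, so equal dicts serialize identically.
    return merged, json.dumps(merged, sort_keys=True)


merged, merged_json = merge_tags({"a": 1, "b": 2}, {"b": 3}, overwrite=False)
replaced, replaced_json = merge_tags({"a": 1, "b": 2}, {"b": 3}, overwrite=True)
```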
FirstRateBarsAdapter
¶
FirstRateBarsConfig
dataclass
¶
Resolved configuration for local First Rate raw 5-minute bars.
FirstRateFuturesAdapter
¶
FirstRateFuturesConfig
dataclass
¶
Resolved futures loader configuration.
Resolution order:
1. explicit arguments
2. YAML config file entries
3. environment variables
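A three-level resolution order of this kind is often written as a simple fall-through. The sketch below assumes as much; resolve_setting, the setting name data_root, and the EXAMPLE_ environment prefix are hypothetical and do not come from the library.

```python
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    explicit: Optional[str],
    yaml_entries: Mapping[str, str],
    environ: Mapping[str, str],
    env_prefix: str = "EXAMPLE_",  # hypothetical prefix, not the library's
) -> Optional[str]:
    # 1. An explicit argument always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise fall back to the parsed YAML config entries.
    if name in yaml_entries:
        return yaml_entries[name]
    # 3. Finally consult the environment; None if nothing is set.
    return environ.get(env_prefix + name.upper())


yaml_entries = {"data_root": "/from/yaml"}
environ = {"EXAMPLE_DATA_ROOT": "/from/env"}

from_explicit = resolve_setting("data_root", "/from/arg", yaml_entries, environ)
from_yaml = resolve_setting("data_root", None, yaml_entries, environ)
from_env = resolve_setting("data_root", None, {}, environ)
```

Passing the environment as a mapping rather than reading os.environ directly keeps the sketch easy to test.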
FirstRateFuturesLoader
¶
Ingest a local directory of First Rate Data contract files.
FixedLagMonths
dataclass
¶
LagReturnsTemplate
¶
Lagged return features from a canonical market-price dataset.
MOFJGBYieldCurveSource
¶
Bases: PublicWebSourceBase
Daily JGB constant-maturity par yields from the MOF website.
fetch_wide(q: Query | None = None) -> pd.DataFrame
¶
Return a date × tenor DataFrame (yields in percent).
This is the natural format for yield-curve analysis.
MissingnessReason
¶
Bases: str, Enum
Why a point-in-time panel cell is missing.
NthBusinessDay
dataclass
¶
NthWeekday
dataclass
¶
PITAccessor
dataclass
¶
open(root: str | Path) -> 'PITAccessor'
classmethod
¶
Open a PIT accessor from a DuckDBParquetStore root.
PITCausalityError
¶
Bases: PITValidationError
Raised when transform execution would violate PIT causality.
PITDataSource
dataclass
¶
Bases: DataSource
Expose PIT rows through the legacy/raw-loader DataSource contract.
PITError
¶
Bases: Exception
Base class for PIT-related errors.
PITExperimentalFeatureError
¶
Bases: PITContractError
Raised when an experimental PIT feature is used without opt-in.
PITTransformSpec
dataclass
¶
sanitized_params() -> dict[str, Any]
¶
Params suitable for hashing and lineage serialization.
PITUnsupportedOperationError
¶
Bases: PITContractError
Raised when an unsupported PIT operation is requested.
PITValidationWarning
¶
Bases: UserWarning
Warning emitted for non-blocking PIT validation outcomes.
PanelFrame
dataclass
¶
Canonical panel: MultiIndex (ts_utc, entity_id). Dates stored tz-aware (UTC).
PhiladelphiaSPFMeanLevelSource
¶
Bases: PublicWebSourceBase
Historical mean SPF forecasts from the Philadelphia Fed.
QuarterlyRelease
dataclass
¶
RefRevisionQuery
dataclass
¶
Ref-period revision request for alphaforge.pit.accessor.PITAccessor.revisions_ref.
RefSnapshotQuery
dataclass
¶
Ref-period snapshot request for alphaforge.pit.accessor.PITAccessor.snapshot_ref.
ReleaseLagPolicy
dataclass
¶
Policy for lag-adjusting effective asof timestamps per series.
ReleaseRule
dataclass
¶
Bases: ABC
Base class for publication schedule rules.
expected_release_date(obs_date: date, release_number: int | None = None) -> date
abstractmethod
¶
Return the expected publication date for an observation date.
from_dict(payload: dict[str, Any]) -> 'ReleaseRule'
staticmethod
¶
Reconstruct a rule from a YAML-style mapping.
to_dict() -> dict[str, Any]
¶
Serialize the rule to a YAML-friendly mapping.
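The shape of this interface (an abstract release-date method plus a dict round-trip) can be sketched without the library. Everything below is hypothetical: SketchRule, FixedLagDays, rule_from_dict, and the "kind" discriminator key are illustrations, not the package's classes or its serialization format.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Dict, Optional


class SketchRule(ABC):
    """Stand-in for a publication-schedule rule base class."""

    @abstractmethod
    def expected_release_date(
        self, obs_date: date, release_number: Optional[int] = None
    ) -> date: ...

    @abstractmethod
    def to_dict(self) -> Dict[str, Any]: ...


@dataclass(frozen=True)
class FixedLagDays(SketchRule):
    """Hypothetical rule: publish a fixed number of calendar days after obs_date."""

    lag_days: int

    def expected_release_date(
        self, obs_date: date, release_number: Optional[int] = None
    ) -> date:
        return obs_date + timedelta(days=self.lag_days)

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: a "kind" key tells the loader which rule to rebuild.
        return {"kind": "fixed_lag_days", "lag_days": self.lag_days}


def rule_from_dict(payload: Dict[str, Any]) -> SketchRule:
    """Rebuild a rule from a YAML-style mapping (hypothetical dispatcher)."""
    if payload.get("kind") == "fixed_lag_days":
        return FixedLagDays(lag_days=int(payload["lag_days"]))
    raise ValueError(f"unknown rule kind: {payload.get('kind')!r}")


rule = FixedLagDays(lag_days=30)
release = rule.expected_release_date(date(2024, 1, 31))
round_tripped = rule_from_dict(rule.to_dict())
```

Because the sketch rule is a frozen dataclass, the round-tripped rule compares equal to the original, which is a convenient property to check in tests.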
RollingVolatilityTemplate
¶
Rolling realized-volatility features from canonical market-price data.
ShortRateDataset
dataclass
¶
Container for a constructed short-rate research dataset.
TradingCalendar
dataclass
¶
Minimal business-day calendar.
- tz is the calendar's local timezone (e.g. America/New_York for XNYS).
- session labels are returned as tz-aware UTC instants at 00:00 UTC by default. (They are labels; open/close time helpers are provided below.)
session_close_utc(session_label: pd.Timestamp | str) -> pd.Timestamp
¶
Return session close as tz-aware UTC Timestamp (default 16:00 local -> UTC).
session_open_utc(session_label: pd.Timestamp | str) -> pd.Timestamp
¶
Return session open as tz-aware UTC Timestamp.
session_label may be a session label returned by sessions() (tz-aware UTC midnight) or a date-like string / Timestamp.
trading_minutes_utc(start_utc: pd.Timestamp, end_utc: pd.Timestamp, freq: str = '5min') -> pd.DatetimeIndex
¶
Return tz-aware UTC DatetimeIndex of trading minutes between start_utc and end_utc.
Generates minutes during each trading session (09:30..16:00 local) with frequency freq.
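The local-to-UTC conversion behind session open and close times can be sketched with the standard library alone. The helper session_bound_utc is hypothetical; the 09:30 and 16:00 local times and the America/New_York zone are the ones mentioned in the descriptions above.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def session_bound_utc(session_date: date, local_time: time, tz_name: str) -> datetime:
    """Attach the calendar's local zone to a wall-clock time, then convert to UTC."""
    local = datetime.combine(session_date, local_time, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


# Winter session: New York is on EST (UTC-5).
open_winter = session_bound_utc(date(2024, 1, 2), time(9, 30), "America/New_York")
close_winter = session_bound_utc(date(2024, 1, 2), time(16, 0), "America/New_York")

# Summer session: New York is on EDT (UTC-4), so the UTC close moves an hour earlier.
close_summer = session_bound_utc(date(2024, 7, 1), time(16, 0), "America/New_York")
```

The same 16:00 local close maps to 21:00 UTC in January and 20:00 UTC in July, which is why the calendar carries a local timezone rather than a fixed UTC offset.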
Universe
dataclass
¶
Time-varying membership: index=date, columns=entity_id, values=bool.
WeeklyRelease
dataclass
¶
align_panel(panel: PanelFrame, schema: TableSchema, grid: Grid, align: AlignSpec, asof: pd.Timestamp | None = None) -> AlignedPanel
¶
Align a PanelFrame to a target Grid and produce missingness typing.
MVP semantics:
- structural missingness: NO_UPDATE_EXPECTED for low-freq between observation dates
- abnormal missingness:
    - daily series: missing -> MISSING_UNKNOWN
    - low-freq series: missing on an observation date -> TEMPORARY_OUTAGE
- NOT_YET_RELEASED: reserved for true vintage-aware sources (to add next)
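The MVP rules above reduce to a small decision function. The sketch below is an illustration under assumptions: Reason is a local stand-in enum that reuses the documented member names, and classify_cell with its boolean inputs is hypothetical rather than part of align_panel.

```python
from enum import Enum
from typing import Optional


class Reason(str, Enum):
    """Local stand-in using the reason names from the MVP semantics."""

    NO_UPDATE_EXPECTED = "NO_UPDATE_EXPECTED"
    MISSING_UNKNOWN = "MISSING_UNKNOWN"
    TEMPORARY_OUTAGE = "TEMPORARY_OUTAGE"


def classify_cell(
    has_value: bool, is_low_freq: bool, is_observation_date: bool
) -> Optional[Reason]:
    """Return why a grid cell is missing, or None when it holds a value."""
    if has_value:
        return None
    if not is_low_freq:
        # Daily series: any gap is abnormal and unexplained.
        return Reason.MISSING_UNKNOWN
    if is_observation_date:
        # Low-frequency series missing exactly when an observation was due.
        return Reason.TEMPORARY_OUTAGE
    # Low-frequency series between observation dates: structurally empty.
    return Reason.NO_UPDATE_EXPECTED


daily_gap = classify_cell(has_value=False, is_low_freq=False, is_observation_date=True)
monthly_between = classify_cell(has_value=False, is_low_freq=True, is_observation_date=False)
monthly_due = classify_cell(has_value=False, is_low_freq=True, is_observation_date=True)
```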
build_first_rate_bars_context(config: FirstRateBarsConfig, *, calendars: Mapping[str, TradingCalendar] | None = None, store: Store | None = None, source_name: str = 'first_rate_bars') -> DataContext
¶
Build a DataContext for locally mounted First Rate 5-minute bar directories.
classify_missingness(*, obs_date: date, asof_date: date, series_frequency: str, panel_frequency: str = 'M', release_rule: ReleaseRule | None = None, publication_lag_months: int | None = None, realized_release_date: date | None = None) -> MissingnessReason
¶
Classify why a cell is missing at a given as-of date.
coerce_ref_revision_query(query: RefRevisionQuery | Mapping[str, Any]) -> RefRevisionQuery
¶
Normalize a ref-period revision query into a validated typed object.
coerce_ref_snapshot_query(query: RefSnapshotQuery | Mapping[str, Any]) -> RefSnapshotQuery
¶
Normalize a ref-period snapshot query into a validated typed object.
default_public_web_sources() -> dict[str, DataSource]
¶
Construct the default public-web source registry.
effective_asof(asof: pd.Timestamp, series_key: str, policy: ReleaseLagPolicy) -> pd.Timestamp
¶
Compute effective asof after lag/cutoff/embargo policy adjustments.
materialize(ctx: DataContext, template: FeatureTemplate, realization: FeatureRealization, policy: MaterializationPolicy, lineage: Optional[LineageGraph] = None, fit_slice: Optional[SliceSpec] = None) -> FeatureFrame
¶
Materialize a FeatureRealization with caching and (optional) stateful fit.
pit_leakage_report(df: pd.DataFrame, ts_col: str = 'obs_date', asof_col: str = 'asof_utc') -> pd.DataFrame
¶
Structured leakage and PIT hygiene diagnostics for PIT-shaped data.
revision_event_stream(pit: PITAccessor, series_key: str, *, start: pd.Timestamp | None = None, end: pd.Timestamp | None = None, min_abs_change: float = 0.0) -> pd.DataFrame
¶
Return revision events across all obs_date timelines.
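One way to read "revision events" with a min_abs_change threshold is sketched below on plain Python data. The input layout, the revision_events helper, the inclusive threshold comparison, and the decision to skip zero-change vintages are all assumptions; the real function reads from a PITAccessor and returns a DataFrame.

```python
from typing import Dict, List, Tuple

# obs_date -> [(asof, value), ...] ordered by asof (hypothetical layout)
Vintages = Dict[str, List[Tuple[str, float]]]


def revision_events(
    vintages: Vintages, min_abs_change: float = 0.0
) -> List[Tuple[str, str, float]]:
    """List (obs_date, asof, change) for each revision that passes the threshold."""
    events: List[Tuple[str, str, float]] = []
    for obs_date, timeline in vintages.items():
        # Compare each vintage with the one published just before it.
        for (_, previous), (asof, current) in zip(timeline, timeline[1:]):
            change = current - previous
            # Assumptions: unchanged values are not events; threshold is inclusive.
            if change != 0 and abs(change) >= min_abs_change:
                events.append((obs_date, asof, change))
    return events


vintages: Vintages = {
    "2024-01-31": [("2024-02-15", 100.0), ("2024-03-15", 103.0), ("2024-04-15", 101.0)],
    "2024-02-29": [("2024-03-15", 50.0), ("2024-04-15", 50.0)],
}

all_events = revision_events(vintages)
large_events = revision_events(vintages, min_abs_change=2.5)
```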
revision_volatility(pit: PITAccessor, series_key: str, *, start: pd.Timestamp | None = None, end: pd.Timestamp | None = None) -> pd.Series
¶
Compute standard deviation of revision deltas by obs_date.
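The statistic can be sketched with the standard library. This is a reading of the one-line description, not the function's code: revision_std_by_obs_date and its input layout are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and obs_dates with fewer than two revision deltas are simply omitted here.

```python
from statistics import stdev
from typing import Dict, List


def revision_std_by_obs_date(vintages: Dict[str, List[float]]) -> Dict[str, float]:
    """Map obs_date -> std of successive revision deltas (hypothetical helper)."""
    out: Dict[str, float] = {}
    for obs_date, values in vintages.items():
        # Delta between each published value and the one before it.
        deltas = [later - earlier for earlier, later in zip(values, values[1:])]
        # Assumption: sample std needs at least two deltas; otherwise skip.
        if len(deltas) >= 2:
            out[obs_date] = stdev(deltas)
    return out


# Values per obs_date, listed in publication (asof) order.
vintage_values = {
    "2024-01-31": [100.0, 103.0, 101.0],  # deltas: +3.0, -2.0
    "2024-02-29": [50.0, 51.0],           # a single delta, so no std
}

volatility = revision_std_by_obs_date(vintage_values)
```

For the first obs_date the deltas are 3.0 and -2.0, whose sample variance is 12.5, so the reported value is the square root of 12.5.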