Feature modes
HUGIML supports three downstream feature representations. The mined pattern matrix is always available through transform(X). The feature_mode parameter controls only the matrix that the downstream estimator receives inside fit, predict, predict_proba, and score.
| `feature_mode` | Downstream estimator input | Use when |
|---|---|---|
| `patterns_only` | Binary HUG pattern matrix only. | You want the standard HUGIML representation and the most compact explanation surface. |
| `original_plus_patterns` | Original features plus all mined binary patterns. | Original features contain strong marginal signal and HUG patterns add nonlinear refinements. |
| `original_plus_interactions` | Original features plus only higher-order patterns. | Original features should capture marginal effects while HUGIML contributes interaction regions. |
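The column selection implied by the table can be sketched in plain Python. This is an illustration only, not HUGIML's implementation: the helper `downstream_columns`, the `(name, order)` pattern tuples, and the reading of "higher-order" as an order of at least 2 are all assumptions made for the example.

```python
from typing import List, Tuple

Pattern = Tuple[str, int]  # (pattern name, number of items in the pattern)


def downstream_columns(mode: str, original: List[str], patterns: List[Pattern]) -> List[str]:
    """Return the column names a downstream estimator would see (hypothetical helper)."""
    if mode == "patterns_only":
        return [name for name, _ in patterns]
    if mode == "original_plus_patterns":
        return original + [name for name, _ in patterns]
    if mode == "original_plus_interactions":
        # Assumption: "higher-order" means patterns that combine two or more items.
        return original + [name for name, order in patterns if order >= 2]
    raise ValueError(f"unknown feature_mode: {mode!r}")


original = ["age", "income"]
patterns = [("age=[30,40)", 1), ("age=[30,40) & income=[50k,80k)", 2)]

for mode in ("patterns_only", "original_plus_patterns", "original_plus_interactions"):
    print(mode, downstream_columns(mode, original, patterns))
```

In this toy setup, only the two-item pattern survives in `original_plus_interactions`, because the single-item pattern's marginal effect is assumed to be covered by the original `age` column.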
Augmented pair features
When L > 1, adaptive_binning=True, and augmented_pair_transforms=True, HUGIML can append native augmented-pair transforms to the downstream feature matrix. These are continuous product or absolute-difference features selected from informative numeric source features. They are not fed back into HUG pattern mining.
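As a rough illustration of what a product or absolute-difference feature looks like, the sketch below builds both transforms for one pair of numeric columns. The helper `augmented_pair_features` and its column naming are hypothetical. How HUGIML chooses informative source features, and whether it keeps one or both transforms per pair, is not shown here.

```python
from typing import Dict, List, Sequence, Tuple


def augmented_pair_features(
    columns: Dict[str, Sequence[float]],
    pairs: List[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """Build product and absolute-difference columns for the given pairs.

    Hypothetical helper: pair selection and column naming are assumptions.
    """
    out: Dict[str, List[float]] = {}
    for a, b in pairs:
        xa, xb = columns[a], columns[b]
        out[f"{a}*{b}"] = [u * v for u, v in zip(xa, xb)]
        out[f"|{a}-{b}|"] = [abs(u - v) for u, v in zip(xa, xb)]
    return out


columns = {"height": [1.5, 2.0, 1.0], "width": [2.0, 0.5, 1.0]}
features = augmented_pair_features(columns, [("height", "width")])
print(features["height*width"])    # [3.0, 1.0, 1.0]
print(features["|height-width|"])  # [0.5, 1.5, 0.0]
```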
Use topk_budget_strict=True when a single global topK budget should apply to the final downstream feature space across original features, HUG patterns, and augmented-pair features.
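from hugiml import HUGIMLClassifier

clf = HUGIMLClassifier(
    B=-1,
    adaptive_binning=True,
    L=2,
    topK=50,
    feature_mode="original_plus_patterns",
    # Append augmented-pair transforms (needs L > 1 and adaptive_binning=True).
    augmented_pair_transforms=True,
    # Apply one global topK budget across original, pattern, and augmented-pair features.
    topk_budget_strict=True,
)
# X_train and y_train are your training features and labels.
clf.fit(X_train, y_train)
print(clf.get_model_composition())
print(clf.explain_augmented_pair_effects())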
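The idea of a single global budget can be sketched as pooling all candidate features and keeping the best `top_k` regardless of kind. This is a minimal sketch under stated assumptions: the helper `apply_global_budget`, the candidate tuples, and ranking by one importance score are all hypothetical, and the criterion HUGIML actually uses is not described here.

```python
from typing import List, Tuple

Candidate = Tuple[str, str, float]  # (feature name, kind, importance score)


def apply_global_budget(candidates: List[Candidate], top_k: int) -> List[Candidate]:
    """Keep the top_k highest-scoring candidates regardless of kind (hypothetical)."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    return ranked[:top_k]


candidates = [
    ("age", "original", 0.40),
    ("income", "original", 0.10),
    ("age=[30,40)", "pattern", 0.35),
    ("age=[30,40) & income=high", "pattern", 0.30),
    ("age*income", "augmented_pair", 0.20),
]
kept = apply_global_budget(candidates, top_k=3)
print([name for name, _, _ in kept])
```

With a shared budget, a strong original feature can displace a weaker pattern and vice versa, so the final width never exceeds `top_k`.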
Examples
from hugiml import HUGIMLClassifier

# Standard representation: binary HUG patterns only.
clf = HUGIMLClassifier(
    B=10,
    L=2,
    G=1e-3,
    topK=150,
    adaptive_binning=True,
    feature_mode="patterns_only",
)

# Hybrid: original features plus all mined binary patterns.
clf_hybrid = HUGIMLClassifier(
    B=10,
    L=2,
    G=1e-3,
    topK=150,
    adaptive_binning=True,
    feature_mode="original_plus_patterns",
)

# Hybrid: original features plus higher-order patterns only.
clf_interactions = HUGIMLClassifier(
    B=10,
    L=2,
    G=1e-3,
    topK=150,
    adaptive_binning=True,
    feature_mode="original_plus_interactions",
)
Downstream matrix policy
In v1.1.9, patterns_only keeps the downstream representation sparse. Hybrid modes choose dense representation for small or moderate selected widths and CSR representation for larger selected widths. This keeps ordinary sklearn workflows convenient while reducing memory pressure for wider selected feature spaces.
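To see why a width-based switch reduces memory pressure, the back-of-the-envelope sketch below compares dense and CSR storage. The byte formulas assume float64 values and int32 indices. The `choose_layout` helper and its `width_threshold` of 256 are hypothetical placeholders, not the cutoff HUGIML uses, and the 5% density is an arbitrary example value.

```python
def dense_bytes(n_rows: int, n_cols: int, value_bytes: int = 8) -> int:
    """Memory for a dense matrix with value_bytes per entry (8 = float64)."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows: int, nnz: int, value_bytes: int = 8, index_bytes: int = 4) -> int:
    """Memory for CSR: values + column indices + row pointer array."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


def choose_layout(n_cols: int, width_threshold: int = 256) -> str:
    """Hypothetical policy: dense up to a width threshold, CSR beyond it."""
    return "dense" if n_cols <= width_threshold else "csr"


n_rows = 100_000
for n_cols in (50, 2_000):
    nnz = n_rows * n_cols // 20  # assume 5% of entries are nonzero
    print(n_cols, choose_layout(n_cols), dense_bytes(n_rows, n_cols), csr_bytes(n_rows, nnz))
```

At 50 columns the dense matrix costs about 40 MB, which is easy to hold in memory. At 2,000 columns the dense form needs 1.6 GB while CSR needs roughly 120 MB under the same density assumption.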
Interpretation notes
get_hug_features() and get_pattern_info() are always pattern-only APIs. In hybrid modes, feature_importances() and model_summary() report the downstream feature representation used by the fitted model, which can include original features, mined patterns, and augmented-pair features. Use explain_augmented_pair_effects() for raw-scale interpretation of augmented-pair rows.
Compatibility with transform
transform(X) and fit_transform(X, y) intentionally remain pattern-space APIs in every feature mode. This means existing explanation workflows that expect the binary HUG pattern matrix continue to work after enabling a hybrid downstream representation.
In hybrid modes, the fitted downstream estimator receives a private design matrix built from standardized original features, selected pattern columns, and any active augmented-pair columns. feature_importances(), model_summary(), and get_model_composition() report that downstream feature space so that diagnostics match what the estimator actually used.
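The sketch below shows one way such a design matrix and its feature names could be kept aligned. It is not HUGIML's code: `standardize`, `build_design_matrix`, and the use of a population z-score are assumptions for illustration, and augmented-pair columns are left out for brevity.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score a column; a constant column maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def build_design_matrix(
    original: Dict[str, Sequence[float]],
    patterns: Dict[str, Sequence[int]],
) -> Tuple[List[str], List[List[float]]]:
    """Return (column names, rows): standardized originals first, then patterns."""
    names = list(original) + list(patterns)
    columns = [standardize(col) for col in original.values()]
    columns += [[float(v) for v in col] for col in patterns.values()]
    rows = [list(row) for row in zip(*columns)]
    return names, rows


names, rows = build_design_matrix(
    original={"age": [20.0, 30.0, 40.0]},
    patterns={"age=[30,40)": [0, 1, 0]},
)
print(names)
print(rows)
```

Building the name list and the columns in the same order is what lets importance reports refer to the exact columns the estimator saw. A real pipeline would also store the fitted mean and standard deviation so that prediction reuses them instead of recomputing them on new data.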
Operational guidance
- Use `patterns_only` as the default for audits, model cards, and compact explanations.
- Use `original_plus_patterns` when original columns contain strong marginal signal that should remain directly available to the downstream classifier.
- Use `original_plus_interactions` when the original columns should carry marginal effects and HUGIML should contribute higher-order regions only.
- Serialized models preserve the selected feature mode, original-feature preprocessing state, pattern-order masks, and downstream feature names.