Getting started
Installation
Install the package from PyPI:
pip install hugiml-core
Install optional extras when needed:
pip install "hugiml-core[plots]" # Plotly dashboards and profile plots
pip install "hugiml-core[benchmarks]" # benchmark comparison dependencies
pip install "hugiml-core[imbalanced]" # imbalanced-learn helper pipeline
pip install "hugiml-core[explainability]" # SHAP bridge
pip install "hugiml-core[server]" # FastAPI inference server dependencies
pip install "hugiml-core[all]" # all optional extras
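To see which optional extras are already usable in the current environment, a small check like the one below can help. It is a minimal sketch: the mapping from extra name to import name is an assumption based on the libraries named in the install comments above (Plotly, imbalanced-learn, SHAP, FastAPI), and the package's own extras metadata may list additional dependencies. The benchmarks extra is left out because its dependencies are not named here.

```python
from importlib.util import find_spec

# Assumed import names for the libraries mentioned in the install comments.
# The real extras definitions in the package metadata may include more.
EXTRA_MODULES = {
    "plots": "plotly",
    "imbalanced": "imblearn",
    "explainability": "shap",
    "server": "fastapi",
}


def available_extras(modules=EXTRA_MODULES):
    """Return {extra_name: bool} telling whether each extra's module is importable."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        status = "installed" if ok else f'missing (pip install "hugiml-core[{extra}]")'
        print(f"{extra}: {status}")
```

The check only confirms that a top-level module can be found; it does not verify versions.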
Build from source when you need to edit the C++ extension or package internals:
git clone https://github.com/srikumar2050/hugiml-core.git
cd hugiml-core
pip install -e ".[dev]"
python setup.py build_ext --inplace
Minimal classifier workflow
prepareXy performs schema and type preparation only. It does not mine patterns or fit the model. Mining and downstream classifier fitting happen inside fit.
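import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from hugiml import HUGIMLClassifier

# X_df is a pandas DataFrame of raw features and y the matching labels;
# load both from your own data source before running this snippet.
clf = HUGIMLClassifier(B=7, L=1, G=5e-3)

# Schema and type preparation only; no mining or fitting happens here.
X_enc, y_enc = clf.prepareXy(X_df, y)

X_train, X_test, y_train, y_test = train_test_split(
    X_enc,
    y_enc,
    test_size=0.25,
    stratify=y_enc,
    random_state=42,
)

# Pattern mining and downstream classifier fitting both happen inside fit.
clf.fit(X_train, y_train)

# Column 1 is used as the positive-class score, assuming a binary target.
proba = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
print(clf.model_summary())
print(clf.get_pattern_info().head())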
Cross-validation and production schemas
When you already know the feature schema, pass allCols and origColumns explicitly. This is often cleaner in cross-validation loops and production pipelines.
# integer_columns, float_columns, and categorical_columns are placeholders
# for the integer, float, and categorical column groups in your own data.
clf = HUGIMLClassifier(
    allCols=[integer_columns, float_columns, categorical_columns],
    origColumns=X_train.columns.tolist(),
    B=15,
    L=1,
    G=1e-5,
    topK=150,
    adaptive_binning=True,
    b_candidates=[2, 3, 5, 7, 10, 15],
)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
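The three column groups passed to allCols can be derived from dtype information rather than typed by hand. The helper below is a hypothetical sketch, not part of the library: it assumes allCols takes lists of column names in the order integer, float, categorical, as the snippet above suggests, and it treats every non-numeric dtype as categorical.

```python
def group_columns_by_dtype(dtypes):
    """Split column names into [integer, float, categorical] lists.

    `dtypes` maps column name -> dtype string, for example the result of
    `X_train.dtypes.astype(str).to_dict()` for a pandas DataFrame.
    """
    integer_columns, float_columns, categorical_columns = [], [], []
    for name, dtype in dtypes.items():
        kind = dtype.lower()
        if kind.startswith(("int", "uint")):
            integer_columns.append(name)
        elif kind.startswith("float"):
            float_columns.append(name)
        else:
            # Everything else (object, category, bool, string) is treated as
            # categorical here; adjust if your schema needs finer handling.
            categorical_columns.append(name)
    return [integer_columns, float_columns, categorical_columns]


all_cols = group_columns_by_dtype(
    {"age": "int64", "income": "float64", "city": "object", "visits": "Int32"}
)
print(all_cols)  # [['age', 'visits'], ['income'], ['city']]
```

Computing the groups once and reusing them across folds keeps the schema identical for every model in a cross-validation loop.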
Recommended first checks
After fitting, inspect both predictive behavior and explanation complexity:
print(clf.get_transformed_shape())
print(clf.get_hug_features()[:10])
print(clf.feature_importances().head(20))
print(clf.get_pattern_info().head(20))
Performance-oriented starting point
For the current implementation, start with the native L=1 hot path, a bounded pattern budget, and adaptive binning only when per-feature bin selection is useful. Increase complexity only when validation results justify it:
clf = HUGIMLClassifier(
    B=7,
    L=1,
    G=5e-3,
    topK=100,
    n_jobs=-1,
    use_hotpath=True,
)
clf.fit(X_train, y_train)
print(clf.fit_metadata_.summary())
print(clf.fit_metadata_)
Use adaptive_binning=True with L=1 when you want supervised per-feature bin resolution without paying the cost of a fully materialized adaptive pre-binned matrix. Use L=2 when interaction patterns are important, and compensate by tightening G or keeping topK bounded. Use topK=-1 only for smaller datasets or controlled benchmark runs, because it allows the automatic budget to grow with the item universe.
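One way to apply the "increase complexity only when validation results justify it" rule is to prefer the simplest configuration whose validation score is close enough to the best one. The sketch below is illustrative: pick_simplest_adequate, the complexity ranks, and the tolerance parameter are hypothetical, and the scores are made-up placeholders rather than measured results.

```python
def pick_simplest_adequate(results, tolerance=0.005):
    """Pick the least complex candidate whose score is within `tolerance` of the best.

    `results` is a list of (label, complexity_rank, validation_score) tuples,
    where a lower complexity_rank means a simpler configuration.
    """
    if not results:
        raise ValueError("results must not be empty")
    best_score = max(score for _, _, score in results)
    adequate = [r for r in results if r[2] >= best_score - tolerance]
    # Among adequate candidates prefer the simplest, then the higher score.
    return min(adequate, key=lambda r: (r[1], -r[2]))


# Illustrative numbers only, not measured results.
candidates = [
    ("L=1, topK=100", 1, 0.912),
    ("L=1, adaptive_binning=True", 2, 0.915),
    ("L=2, topK=100", 3, 0.931),
]
print(pick_simplest_adequate(candidates))                  # keeps L=2: the gain exceeds the tolerance
print(pick_simplest_adequate(candidates, tolerance=0.02))  # falls back to the plain L=1 setup
```

The tolerance expresses how much validation score you are willing to trade for a smaller, easier-to-explain pattern set.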
If your logs show HUGIMLConvergenceWarning for a constant column, the model is reporting that the column has zero utility. If the constant value is expected, drop the column upstream; otherwise, treat the warning as a data-quality signal.
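Constant columns can also be caught before fitting. The helper below is a minimal, library-independent sketch: constant_columns is a hypothetical name, and it works on a plain mapping of column name to values so that it runs without any dependencies.

```python
def constant_columns(columns):
    """Return names of columns whose values are all identical (or empty).

    `columns` maps column name -> list of values for that column.
    """
    constant = []
    for name, values in columns.items():
        # repr() lets unhashable or mixed-type values be compared safely.
        if len({repr(v) for v in values}) <= 1:
            constant.append(name)
    return constant


data = {
    "age": [31, 45, 27],
    "country": ["NZ", "NZ", "NZ"],  # constant: carries no signal
    "score": [0.2, 0.9, 0.4],
}
print(constant_columns(data))  # ['country']
```

With a pandas DataFrame X, a similar filter can be written as X.loc[:, X.nunique(dropna=False) > 1], which keeps only the columns that have more than one distinct value.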