Getting started
===============

Installation
------------

Install the package from PyPI:

.. code-block:: bash

    pip install hugiml-core

Install optional extras when needed:

.. code-block:: bash

    pip install "hugiml-core[plots]"           # Plotly dashboards and profile plots
    pip install "hugiml-core[benchmarks]"      # benchmark comparison dependencies
    pip install "hugiml-core[imbalanced]"      # imbalanced-learn helper pipeline
    pip install "hugiml-core[explainability]"  # SHAP bridge
    pip install "hugiml-core[server]"          # FastAPI inference server dependencies
    pip install "hugiml-core[all]"             # all optional extras

Build from source when you need to edit the C++ extension or package internals:

.. code-block:: bash

    git clone https://github.com/srikumar2050/hugiml-core.git
    cd hugiml-core
    pip install -e ".[dev]"
    python setup.py build_ext --inplace

Minimal classifier workflow
---------------------------

``prepareXy`` performs schema and type preparation only. It does not mine
patterns or fit the model. Mining and downstream classifier fitting happen
inside ``fit``.

.. code-block:: python

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from hugiml import HUGIMLClassifier

    clf = HUGIMLClassifier(B=7, L=1, G=5e-3)
    X_enc, y_enc = clf.prepareXy(X_df, y)

    X_train, X_test, y_train, y_test = train_test_split(
        X_enc,
        y_enc,
        test_size=0.25,
        stratify=y_enc,
        random_state=42,
    )

    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]

    print("AUC:", roc_auc_score(y_test, proba))
    print(clf.model_summary())
    print(clf.get_pattern_info().head())

Cross-validation and production schemas
---------------------------------------

When you already know the feature schema, pass ``allCols`` and ``origColumns``
explicitly. This is often cleaner in cross-validation loops and production
pipelines.

.. code-block:: python

    clf = HUGIMLClassifier(
        allCols=[integer_columns, float_columns, categorical_columns],
        origColumns=X_train.columns.tolist(),
        B=15,
        L=1,
        G=1e-5,
        topK=150,
        adaptive_binning=True,
        b_candidates=[2, 3, 5, 7, 10, 15],
    )

    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    probabilities = clf.predict_proba(X_test)

Recommended first checks
------------------------

After fitting, inspect both predictive behavior and explanation complexity:

.. code-block:: python

    print(clf.get_transformed_shape())
    print(clf.get_hug_features()[:10])
    print(clf.feature_importances().head(20))
    print(clf.get_pattern_info().head(20))

Performance-oriented starting point
-----------------------------------

For the current implementation, start with the native ``L=1`` hot path, a
bounded pattern budget, and adaptive binning only when per-feature bin
selection is useful. Increase complexity only when validation results justify
it:

.. code-block:: python

    clf = HUGIMLClassifier(
        B=7,
        L=1,
        G=5e-3,
        topK=100,
        n_jobs=-1,
        use_hotpath=True,
    )

    clf.fit(X_train, y_train)
    print(clf.fit_metadata_.summary())
    print(clf.fit_metadata_)

Use ``adaptive_binning=True`` with ``L=1`` when you want supervised per-feature
bin resolution without paying the cost of a fully materialized adaptive
pre-binned matrix. Use ``L=2`` when interaction patterns are important, and
compensate by tightening ``G`` or keeping ``topK`` bounded. Use ``topK=-1``
only for smaller datasets or controlled benchmark runs, because it allows the
automatic budget to grow with the item universe.

If your logs show ``HUGIMLConvergenceWarning`` for a constant column, the model
is telling you that the column has zero utility. Drop the column upstream if it
is expected; otherwise, treat it as a data-quality signal.
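
Dropping constant columns upstream can be sketched with plain pandas before
the data reaches the classifier. This is a minimal illustration and not part of
``hugiml-core``: ``drop_constant_columns`` is a hypothetical helper, and the
column names and values are made-up toy data.

.. code-block:: python

    import pandas as pd


    def drop_constant_columns(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
        """Return a copy of df without constant columns, plus the dropped names."""
        # nunique(dropna=False) counts NaN as a value, so an all-NaN column
        # (a single distinct value) is also treated as constant.
        n_unique = df.nunique(dropna=False)
        constant = n_unique[n_unique <= 1].index.tolist()
        return df.drop(columns=constant), constant


    # Toy data for illustration only (hypothetical columns).
    X_df = pd.DataFrame(
        {
            "age": [31, 45, 27, 52],
            "region": ["north", "south", "south", "east"],
            "source_system": ["crm", "crm", "crm", "crm"],  # constant column
        }
    )

    X_clean, dropped = drop_constant_columns(X_df)
    print("Dropped constant columns:", dropped)
    print("Remaining columns:", X_clean.columns.tolist())

If you take this approach, it is probably safest to compute the list of dropped
columns once on the training data and reuse it at inference time, so that both
stages see the same schema.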