Deployment

FastAPI inference server

A FastAPI-based inference server is included for containerized scoring.

docker build -t hugiml-core:latest -f docker/Dockerfile .
docker run -p 8080:8080 -v /path/to/models:/models hugiml-core:latest

Example request:

curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"age": 35, "savings": "moderate"}]}'

Kubernetes

A starter manifest is provided in kubernetes/deployment.yaml. Review CPU/memory limits, model volume paths, network policy, and secrets before production use.

Production checklist

Pin hugiml-core and dependency versions.
Save the trained model with save_model and record the model schema version.
Export a model card and audit artifact.
Capture calibration metrics and drift thresholds.
Exercise prediction-time schema validation against representative production payloads.
Configure monitoring, latency budgets, and rollback procedures.
Validate any pattern-pruning actions and retain the pruning audit trail.