💿 Tutorial: Simple logistic classifier with ZK verification
Train A Model
While notebooks are excellent for data exploration and experimentation, at Ritual we believe it's important to structure your training and inference workflows for reproducibility and standardization.
(Note: If you have an existing MLOps / workflow setup and have a trained torch or scikit-learn model ready, you can use the deployer tool to deploy models, explained in more detail in the next section.)
Start by creating a custom classic ML workflow that extends
ml.workflows.training.BaseClassicTrainingWorkflow. This class defines several common stages of an ML training workflow that are meant to be overridden and implemented.
What's in a typical AI pipeline? Infernet defines methods for each of the following stages:
- Ingest: Collecting or accessing data from various sources to be used in the pipeline.
- Validate: Ensuring that the ingested data meets certain quality criteria or format specifications.
- Transform: Applying processing steps to convert raw data into a format suitable for analysis, which may include normalization or scaling.
- Feature Generation: Creating new features from the transformed data that can be used to train a machine learning model.
- Feature Engineering: Selecting the most relevant features or constructing new features to improve model performance.
- Label Generation: Creating or assigning labels to the data points which will be used for supervised learning.
- Label Engineering: Modifying or ensuring that the labels are correctly assigned and are in the right format for model training.
- Split: Dividing the dataset into training, validation, and testing sets to prepare for model training and evaluation.
- Training: The process where a machine learning model learns from the data by adjusting its parameters with respect to a predefined loss function.
- Scoring: Evaluating the trained model using a held-out dataset (usually the test set) to measure its performance.
- Decision Loop "Performance Acceptable?": Checks if the model's performance is up to the required standard before deployment; if not, the process may iterate through additional rounds of training and evaluation.
- Deploy: If the performance is acceptable, the model is deployed into a production environment for practical use.
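The stages above map onto overridable methods. As a rough illustration of the flow (the class and method names here are illustrative stand-ins, not the actual BaseClassicTrainingWorkflow API):

```python
# Minimal sketch of the staged pipeline above; class and method names are
# illustrative stand-ins, not the real Infernet base-class API.
class SketchTrainingWorkflow:
    def ingest(self):
        return [(1, 0), (9, 1), (2, 0), (8, 1)]  # (raw feature, label) rows

    def validate(self, rows):
        assert all(len(row) == 2 for row in rows)
        return rows

    def transform(self, rows):
        return [(x / 10, y) for x, y in rows]  # toy normalization to [0, 1]

    def split(self, rows):
        return rows[:3], rows[3:]  # train / test

    def train(self, train_rows):
        # "training": place a threshold between the two classes
        ones = [x for x, y in train_rows if y == 1]
        zeros = [x for x, y in train_rows if y == 0]
        return (min(ones) + max(zeros)) / 2

    def score(self, threshold, test_rows):
        hits = sum((x > threshold) == bool(y) for x, y in test_rows)
        return hits / len(test_rows)

    def run(self):
        rows = self.transform(self.validate(self.ingest()))
        train_rows, test_rows = self.split(rows)
        threshold = self.train(train_rows)
        return self.score(threshold, test_rows)  # "Performance Acceptable?" gate
```

Each stage receives the previous stage's output, so overriding a single method swaps out one step without touching the rest of the pipeline.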
Infernet also offers integration hooks for existing sklearn pipelines and workflows via
ml.workflows.training.base_sklearn_training_workflow.BaseSklearnTrainingWorkflow. Examples can be found in the ml.workflows.training.example_workflow module.
Once you've implemented the necessary steps to train a model and are satisfied with its performance, you can move on to implementing proof generation as part of your deployment. Currently, the open source library EZKL is used as a backend for implementing proofs, though in the future we envision plugging in different proof systems depending on use case and feature set.
Rather than requiring you to implement Zero-Knowledge circuits directly, EZKL compiles model circuits from the ONNX format. Combined with conversion tools such as sk2torch, this allows us to leverage existing popular machine learning toolkits like Scikit-Learn and Torch. Again, you may reference
ml.workflows.training.example_workflow.BalanceClassifierEzklWorkflow for an implementation.
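For the sklearn-to-torch conversion step, sk2torch exposes a wrap function that turns a fitted scikit-learn model into a torch.nn.Module ready for ONNX export. A minimal helper, with the import deferred so the sketch stands alone:

```python
def sklearn_to_torch(sk_model):
    """Convert a fitted scikit-learn model to a torch.nn.Module via sk2torch.

    sk2torch is imported lazily so this sketch can be read (and the module
    loaded) without the package installed; sk2torch.wrap is its documented
    entry point.
    """
    import sk2torch

    return sk2torch.wrap(sk_model)
```

The wrapped module can then be passed to torch.onnx.export like any other torch model.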
The lifecycle of a zero-knowledge proof can be divided into 3 stages:
- proof setup
- proof generation
- proof verification
To set up our workflow for these 3 stages, follow the steps below.
```python
torch.onnx.export(
    model,  # the trained torch model (illustrative name)
    sample_input,  # an example input tensor for tracing (illustrative name)
    paths.onnx_path,  # where to write the exported ONNX model
    export_params=True,  # store the trained parameter weights inside the model file
    opset_version=10,  # the ONNX version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],  # the model's input names
    output_names=["output"],  # the model's output names
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},  # variable length axes
)
```
- Next, we need to compile our model into a circuit. Depending on the use case, this may involve tweaking model visibility settings or providing additional calibration data (this affects the quantization used by EZKL and may affect the precision of the compiled model).
```python
# generate and calibrate settings (with witness data); path attribute names
# are illustrative, and exact signatures may vary across EZKL versions
res = ezkl.gen_settings(paths.onnx_path, paths.settings_path)
res = ezkl.calibrate_settings(paths.witness_data_path, paths.onnx_path, paths.settings_path, "resources")

# generate circuit
res = ezkl.compile_circuit(paths.onnx_path, paths.compiled_model_path, paths.settings_path)
```
- Once the model is compiled, we can generate the structured reference string (SRS) that is also required for the setup ceremony.
```python
res = ezkl.get_srs(paths.srs_path, paths.settings_path)
```
We generate the proving and verifying keys as part of the setup process.
Together with the proving key, a proof can be generated.
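Put together, the three stages can be sketched with the EZKL Python bindings roughly as follows. The `paths` attribute names are illustrative, exact signatures may vary across EZKL versions, and the import is deferred so this remains a standalone sketch:

```python
def run_proof_lifecycle(paths):
    """Sketch of EZKL's setup / prove / verify stages (signatures may vary by EZKL version)."""
    import ezkl  # deferred: only needed when the lifecycle actually runs

    # setup: derive the proving key (pk) and verifying key (vk)
    # from the compiled circuit and the SRS
    ezkl.setup(paths.compiled_model_path, paths.vk_path, paths.pk_path, paths.srs_path)

    # prove: generate a witness for the input, then a proof with the proving key
    ezkl.gen_witness(paths.input_path, paths.compiled_model_path, paths.witness_path)
    ezkl.prove(paths.witness_path, paths.compiled_model_path, paths.pk_path,
               paths.proof_path, paths.srs_path)

    # verify: check the proof against the settings and verifying key
    return ezkl.verify(paths.proof_path, paths.settings_path, paths.vk_path, paths.srs_path)
```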
This verifier can perform verification if provided the proper proof call data and verifier address.
Our training workflows use MLFlow to track file uploads for the proof generation and verification steps. MLFlow is an open-source machine learning experiment tracking tool, similar to tools such as AzureML and Weights & Biases.
In MLFlow, workflow run artifacts are organized by experiment, allowing you to compare results for related runs.
ml.drivers.base_driver.BaseTrainingDriver provides a base class that by default associates workflow runs with an experiment name equal to the workflow class name. See ml.drivers.example_workflow_driver.ExampleWorkflowDriver for an example implementation.
By default, your artifacts are stored locally in the ml/mlruns directory. You may prefer to run an MLFlow server directly (a Docker image is provided to simplify this), in which case the workflows can be configured to communicate with the server by setting the MLFLOW_TRACKING_URI environment variable.
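For example, assuming a tracking server is reachable at a local address (the URL here is illustrative):

```python
import os

# Route workflow artifact logging to an MLFlow tracking server instead of
# the default local ml/mlruns directory (the URL is illustrative).
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
```

Set this in the environment that launches the workflow driver so every run logs to the shared server.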
Deploying models and proofs from the workflow
There are 3 services included in the Infernet ML API, meant to serve classic model inference, LLM inference, and classic model ZK proofs.
Artifacts are parameters or configuration pieces needed by the workflow, stored as separate files. To serve proofs and inference, 5 artifacts are required in addition to the torch model (used for inference):
- Circuit settings
- Compiled circuit
- Proving key
- Verifying key
- SRS string
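Concretely, a deployed workflow's artifact set might look like the following layout (file names here are illustrative; EZKL does not mandate specific names):

```
artifacts/
├── settings.json     # circuit settings (visibility, quantization scale)
├── model.compiled    # compiled circuit
├── pk.key            # proving key
├── vk.key            # verifying key
└── kzg.srs           # structured reference string
```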
The above are the minimum set of artifacts needed to deploy a workflow. Artifact version control can be easily managed via Huggingface; the upload_prover_files_hf helper in the ezkl_utils library can help you do this.
Details on deployment options and model support
This is a simple program that deploys a classic model for inference and EZKL-based Zero-Knowledge Proof serving. Currently, Torch and Sklearn (via sk2torch) models are supported.
Make sure to set the HUGGING_FACE_HUB_TOKEN environment variable appropriately.
Running this program will:
- (Optionally) convert an sklearn model to a torch model
- Convert the torch model to an ONNX model, and compile the EZKL circuit
- Generate the artifacts required for EZKL proving setups
- (If model_name specified) upload the torch model and artifacts to the specified Huggingface model repo (can be turned off)
- Generate verifier contracts for further deployment
- (If onchain data provided) generate data attester contracts for further deployment
The onchain data JSON file should match EZKL's expected onchain data format. See the EZKL Data Attest Example for more details.
--use_sk2torch / --no_sk2torch