Orion is a dedicated Cairo-based library for building machine learning models for Validity ML. Its purpose is to facilitate verifiable inference. For better performance, we will work with an 8-bit quantized model. In this tutorial, you will learn how to train a model on the MNIST dataset with Quantization-Aware Training, how to convert the pre-trained model to Cairo 1, and how to perform inference with Orion.
You can find all the code and the notebook in the dedicated repository.
The MNIST dataset is an extensive collection of handwritten digits, very popular in the field of image processing. Often, it's used as a reference point for machine learning algorithms. This dataset conveniently comes already partitioned into training and testing sets, a feature we'll delve into later in this tutorial.
The MNIST database comprises a collection of 70,000 images of handwritten digits, ranging from 0 to 9. Each image measures 28 x 28 pixels.
Train the model with Quantization-Aware Training
In this tutorial, we will use TensorFlow, a very popular deep learning framework, to train a neural network that recognizes MNIST's handwritten digits.
Dataset Preparation
In a notebook, import the required libraries and load the dataset.
```python
from tensorflow import keras
from keras.datasets import mnist
from scipy.ndimage import zoom
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()
```
We have a total of 70,000 grayscale images, each with a dimension of 28 x 28 pixels. 60,000 images are for training and the remaining 10,000 are for testing.
We now need to pre-process our data. For the purposes of this tutorial and for performance, we'll resize the images to 14 x 14 pixels. As you'll see later, the neural network's input layer expects a 1D tensor, so we also need to flatten and normalize our data.
```python
# Resizing function
def resize_images(images):
    return np.array([zoom(image, 0.5) for image in images])

# Resize
x_train = resize_images(x_train)
x_test = resize_images(x_test)

# Then reshape
x_train = x_train.reshape(60000, 14 * 14)
x_test = x_test.reshape(10000, 14 * 14)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize to range [0, 1]
x_train /= 255
x_test /= 255
```
Model Definition and Training
We will design a straightforward feedforward neural network. Here's the model architecture we'll use:
This model is composed of an input layer with a shape of 14*14, followed by two dense layers, each containing 10 neurons. The first dense layer uses a ReLU activation function, while the second employs a softmax activation function. Let's implement this architecture in code.
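Here is a minimal sketch of this architecture in Keras; the optimizer, loss, and number of epochs are illustrative assumptions rather than the exact settings used for the final model:

```python
from keras.models import Sequential
from keras.layers import Dense

# Input of 14*14 flattened pixels, followed by two dense layers of 10 neurons
model = Sequential([
    Dense(10, activation='relu', input_shape=(14 * 14,)),
    Dense(10, activation='softmax')
])

# Optimizer, loss, and epoch count are assumptions for illustration
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```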
The aim of this tutorial is to guide you through the process of performing verifiable inference with the Orion library. As stated before, Orion exclusively performs inference on 8-bit quantized models. Quantization is typically executed via two distinct methods: Quantization-Aware Training (QAT) or Post-Training Quantization (PTQ), which occurs after the training phase. In this tutorial, we will use the QAT method.
Concretely QAT is a method where the quantization error is emulated during the training phase itself. In this process, the weights and activations of the model are quantized, and this information is used during both the forward and backward passes of training. This allows the model to learn and adapt to the quantization error. It ensures that once the model is fully quantized post-training, it has already accounted for the effects of quantization, resulting in improved accuracy.
We will use the TensorFlow Model Optimization Toolkit to fine-tune the pre-trained model for QAT.
```python
import tensorflow_model_optimization as tfmot

# Apply quantization to the layers
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)

# 'quantize_model' requires a recompile
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])

q_aware_model.summary()
```
We have now created a new model, q_aware_model, which is a quantization-aware version of our original model. Now we can train this model exactly like our original model.
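As a minimal sketch, the quantization-aware model can be fitted with the same Keras API as before; the epoch count here is an illustrative assumption:

```python
# Train the quantization-aware model just like the original one.
# The number of epochs is an illustrative assumption.
q_aware_model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```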
Now, we will convert our model to TFLite format, which is a format optimized for on-device machine learning.
```python
import tensorflow as tf

# Create a converter
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)

# Indicate that you want to perform default optimizations,
# which include quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Define a generator function that provides your test data's numpy arrays
def representative_data_gen():
    for i in range(500):
        yield [x_test[i:i+1]]

# Use the generator function to guide the quantization process
converter.representative_dataset = representative_data_gen

# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Set the input and output tensors to int8
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert the model
tflite_model = converter.convert()

# Save the model to disk
open("q_aware_model.tflite", "wb").write(tflite_model)
```
Testing the Quantized Model
Now that we have trained a quantization-aware model and converted it to the TFLite format, we can perform inference using the TensorFlow Lite interpreter to test it.
We first load the TFLite model and allocate the required tensors. The Interpreter class provides methods for loading a model and running inferences.
```python
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="q_aware_model.tflite")
interpreter.allocate_tensors()
```
Next, we get the details of the input and output tensors. Each tensor in a TensorFlow Lite model has a name, index, shape, data type, and quantization parameters. These can be accessed via the get_input_details and get_output_details methods.
```python
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
```
Before performing the inference, we need to normalize the input to match the data type of our model's input tensor, which in our case is int8. Then, we use the set_tensor method to provide the input data to the model. We perform the inference using the invoke method.
```python
# Normalize the input value to int8
input_shape = input_details[0]['shape']
input_data = np.array(x_test[0:1], dtype=np.int8)
interpreter.set_tensor(input_details[0]['index'], input_data)

# Perform the inference
interpreter.invoke()

# Get the result
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

>>> [[-128 -128   -6   -6 -128 -116 -128 -128 -128 -128]]
```
Now, we are going to run the inference for the entire test set.
We normalize the entire test set and initialize an array to store the predictions.
```python
(_, _), (x_test_image, y_test_label) = mnist.load_data()

# Resize and normalize x_test_image to int8
x_test_image = resize_images(x_test_image)
x_test_image_norm = (x_test_image / 255.0 * 255 - 128).astype(np.int8)

# Initialize an array to store the predictions
predictions = []
```
We then iterate over the test set, making predictions. For each image, we flatten the image, normalize it, and then expand its dimensions to match the shape of our model's input tensor.
```python
# Iterate over the test data and make predictions
for i in range(len(x_test_image_norm)):
    test_image = np.expand_dims(x_test_image_norm[i].flatten(), axis=0)

    # Set the value for the input tensor
    interpreter.set_tensor(input_details[0]['index'], test_image)

    # Run the inference
    interpreter.invoke()

    output = interpreter.get_tensor(output_details[0]['index'])
    predictions.append(output)
```
Finally, we use a function to plot the test images along with their predicted labels. This will give us a visual representation of how well our model is performing.
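Such a plotting helper is not part of the pipeline itself; a minimal sketch with matplotlib, assuming the predictions list built above, could look like this:

```python
import matplotlib.pyplot as plt

# Plot the first ten test images with their predicted labels
# (the grid size is arbitrary, for illustration only)
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_test_image[i], cmap='gray')
    plt.title(f"Pred: {np.argmax(predictions[i])}")
    plt.axis('off')
plt.tight_layout()
plt.show()
```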
We have successfully trained a quantization-aware model, converted it to the TFLite format, and performed inference using the TensorFlow Lite interpreter.
Now let's convert the pre-trained model to Cairo, in order to perform verifiable inference with the Orion library.
Convert your model to Cairo
In this section, you will generate Cairo files for each bias and weight of the model.
Create a new Scarb project
Scarb is a Cairo package manager. We will use Scarb to run inference with Orion. You can find all information about Scarb and Cairo installation here.
Let's create a new Scarb project. In your terminal run:
```bash
scarb new mnist_nn
```
Replace the content of the Scarb.toml file with the following:
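A minimal sketch of the manifest; the Orion dependency is assumed to be fetched from its GitHub repository, and you should pin the revision or tag that matches your Cairo toolchain:

```toml
[package]
name = "mnist_nn"
version = "0.1.0"

[dependencies]
# Assumed source for Orion; pin a rev/tag compatible with your setup
orion = { git = "https://github.com/gizatechxyz/orion.git" }
```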
Finally, place the notebook and q_aware_model.tflite file in the mnist_nn directory. We are now ready to generate Cairo files from the pre-trained model.
Generate Cairo files
In a new notebook cell, load the TFLite model and allocate tensors.
```python
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="q_aware_model.tflite")
interpreter.allocate_tensors()
```
Then, create an object with an input from the dataset, and all weights and biases.
```python
# Create an object with all tensors
# (an input + all weights and biases)
tensors = {
    "input": x_test_image[0].flatten(),
    "fc1_weights": interpreter.get_tensor(1),
    "fc1_bias": interpreter.get_tensor(2),
    "fc2_weights": interpreter.get_tensor(4),
    "fc2_bias": interpreter.get_tensor(5),
}
```
Now let's generate Cairo files for each tensor in the object.
```python
import os

# Create the directory if it doesn't exist
os.makedirs('src/generated', exist_ok=True)

for tensor_name, tensor in tensors.items():
    with open(os.path.join('src', 'generated', f"{tensor_name}.cairo"), "w") as f:
        f.write(
            "use core::array::ArrayTrait;\n" +
            "use orion::operators::tensor::{TensorTrait, Tensor, I32Tensor};\n" +
            "use orion::numbers::i32;\n\n" +
            "\nfn {0}() -> Tensor<i32> ".format(tensor_name) +
            "{\n" +
            "    let mut shape = ArrayTrait::<usize>::new();\n"
        )
        for dim in tensor.shape:
            f.write("    shape.append({0});\n".format(dim))
        f.write(
            "    let mut data = ArrayTrait::<i32>::new();\n"
        )
        for val in np.nditer(tensor.flatten()):
            f.write("    data.append(i32 {{ mag: {0}, sign: {1} }});\n".format(
                abs(int(val)), str(val < 0).lower()))
        f.write(
            "    TensorTrait::new(shape.span(), data.span())\n" +
            "}\n"
        )

with open(os.path.join('src', 'generated.cairo'), 'w') as f:
    for param_name in tensors.keys():
        f.write(f"mod {param_name};\n")
```
Your Cairo files are generated in the src/generated directory.
In src/lib.cairo replace the content with the following code:
```rust
mod generated;
```
We have just created a file called lib.cairo, which contains a module declaration referencing another module named generated.
fc1_bias, for instance, is a Tensor<i32>. This brings up two concepts that deserve a closer look: signed integers and tensors in Orion.
Signed Integer in Orion
In Cairo, there are no built-in signed integers. However, in the field of machine learning, they are very useful. So Orion introduced a full implementation of Signed Integer. It is represented by a struct containing both the magnitude and its sign as a boolean.
The magnitude represents the absolute value of the number, and the sign indicates whether the number is positive or negative.
```rust
// Example of an i32.
struct i32 {
    mag: u32,
    sign: bool, // true means a negative sign.
}
```
Tensor in Orion
The second concept Orion introduces is the Tensor, a central object in machine learning that we've already used extensively in the previous sections. It is represented in Orion as a struct containing the tensor's shape, a flattened array of its data, and, depending on the version, extra parameters. The generic Tensor is defined as follows:
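A sketch of what this struct looks like, based on how the generated files construct tensors in this tutorial; the exact field set may differ between Orion versions:

```rust
// Generic Tensor: a shape and a flattened data array.
// Field layout assumed from this tutorial's generated code; some Orion
// versions also carry an extra-parameters field.
struct Tensor<T> {
    shape: Span<usize>,
    data: Span<T>,
}
```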
You should now be able to understand the content in generated files.
Perform Inference with Orion
We have now reached the last part of our tutorial, performing ML inference in Cairo 1.0.
How to Build a Neural Network with Orion
In this subsection, we will reproduce the same model architecture defined earlier in the training phase with TensorFlow, but with Orion, as the aim is to perform the inference in Cairo.
In the src folder, create an nn.cairo file and reference the module in lib.cairo as follows:
```rust
mod generated;
mod nn;
```
Now, let's build the layers of our neural network in nn.cairo. As a reminder, this is the architecture we defined earlier: a 14*14 input, a dense layer of 10 neurons with ReLU, and a dense output layer of 10 neurons with softmax.
Dense Layer 1
To build the first layer, we need a Linear function and a ReLU activation from NNTrait.
```rust
use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};

fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    let x = NNTrait::linear(i, w, b);
    NNTrait::relu(@x)
}
```
Dense Layer 2
In a similar way, we can build the second layer fc2. In the Keras model it combines a Linear function with a Softmax from NNTrait; performing softmax in Cairo would require converting the tensor to fixed point, but since softmax doesn't change which logit is largest, it's not necessary for this simple tutorial and we keep only the Linear function.
```rust
use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};

fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    let x = NNTrait::linear(i, w, b);
    NNTrait::relu(@x)
}

fn fc2(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    NNTrait::linear(i, w, b)
}
```
We are now ready to perform inference!
Make Prediction
In the src folder, create a test.cairo file and reference the module in lib.cairo as follows:
```rust
mod generated;
mod nn;
mod test;
```
In your test file, create a function mnist_nn_test that loads the generated input and parameters, chains fc1 and fc2, and checks the result.
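Below is a sketch of such a test, assuming the module paths produced by the generation step above; the gas value and the final assertion are illustrative placeholders to adapt to the image you exported:

```rust
use core::array::SpanTrait;

use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::nn::{fc1, fc2};

#[test]
#[available_gas(99999999999999)]
fn mnist_nn_test() {
    // Load the generated input image and model parameters
    let input = input();
    let fc1_weights = fc1_weights();
    let fc1_bias = fc1_bias();
    let fc2_weights = fc2_weights();
    let fc2_bias = fc2_bias();

    // Run the two layers defined in nn.cairo
    let x = fc1(input, fc1_weights, fc1_bias);
    let x = fc2(x, fc2_weights, fc2_bias);

    // The output holds 10 logits; the index of the largest one is the
    // predicted digit. Replace this placeholder check with an assertion
    // against the label of the image you exported.
    assert(x.data.len() == 10, 'expected 10 output logits');
}
```

You can then run the test with Scarb's test command (for example scarb test or scarb cairo-test, depending on your Scarb version).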
Bravo 👏 You can be proud of yourself! You just built your first Neural Network in Cairo 1.0 with Orion.
Orion leverages Cairo to guarantee the reliability of inference, providing developers with a user-friendly framework to build complex and verifiable machine learning models. We invite the community to join us in shaping a future where trustworthy AI becomes a reliable resource for all.