MNIST Classification with Orion

Orion is a dedicated Cairo-based library designed specifically to build machine learning models for ValidityML. Its purpose is to facilitate verifiable inference. For better performance, we will work with an 8-bit quantized model. In this tutorial, you will learn how to train a model on the MNIST dataset with Quantization Aware Training (QAT), how to convert the trained model to Cairo 1.0, and how to perform inference with Orion.
The MNIST dataset is a large collection of handwritten digits that is very popular in image processing and is often used as a benchmark for machine learning algorithms. It conveniently comes already split into training and test sets, which we'll use later in this tutorial.
The MNIST database comprises a collection of 70,000 images of handwritten digits, ranging from 0 to 9. Each image measures 28 x 28 pixels.

Source: Wikimedia
In this tutorial, we will use TensorFlow, a very popular deep learning framework, to train a neural network that recognizes MNIST's handwritten digits.
In a notebook, import the required libraries and load the dataset.
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from scipy.ndimage import zoom
(x_train, y_train), (x_test, y_test) = mnist.load_data()
We have a total of 70,000 grayscale images, each with a dimension of 28 x 28 pixels. 60,000 images are for training and the remaining 10,000 are for testing.
We now need to pre-process our data. To keep the model small and inference fast, we'll resize the images to 14 x 14 pixels. As you'll see later, the network's input layer expects a 1D tensor, so we also need to flatten the images and normalize their pixel values.
# Resizing function: zoom(image, 0.5) halves each 28 x 28 image to 14 x 14
def resize_images(images):
    return np.array([zoom(image, 0.5) for image in images])
# Resize
x_train = resize_images(x_train)
x_test = resize_images(x_test)
# Then flatten each image into a 196-element vector
x_train = x_train.reshape(60000, 14*14)
x_test = x_test.reshape(10000, 14*14)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalize to range [0, 1]
x_train /= 255
x_test /= 255
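The flatten-and-normalize step can be checked on a small synthetic batch without downloading MNIST. The helper below is a hypothetical name introduced for illustration; it assumes the images have already been resized to 14 x 14 and uses only NumPy:

```python
import numpy as np

def flatten_and_normalize(images):
    """Flatten (N, 14, 14) uint8 images into (N, 196) float32 values in [0, 1]."""
    flat = images.reshape(images.shape[0], -1).astype('float32')
    return flat / 255

# Synthetic stand-in for resized MNIST images: 3 images of 14 x 14 pixels
fake_images = (np.arange(3 * 14 * 14) % 256).astype(np.uint8).reshape(3, 14, 14)
features = flatten_and_normalize(fake_images)
print(features.shape, features.dtype)  # (3, 196) float32
```

Each row of `features` is one image, which is the 1D layout the input layer expects.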
We will design a straightforward feedforward neural network. Here's the model architecture we'll use:

Model architecture visualization from Netron.app
This model consists of an input layer of shape 14*14 (196 values), followed by two dense layers of 10 neurons each. The first dense layer uses a ReLU activation function, while the second uses a softmax activation function. Let's implement this architecture in code.
from tensorflow.keras import layers
num_classes = 10
model = keras.Sequential([
    layers.InputLayer(input_shape=(14*14,)),
    layers.Dense(10, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Now let's train this model on our training data.
# batch_size is not passed, so Keras uses its default of 32
epochs = 10
history = model.fit(x_train, y_train,
                    epochs=epochs,
                    validation_split=0.2)
Epoch 1/10
1500/1500 [==============================] - 1s 438us/step - loss: 0.8641 - accuracy: 0.7506 - val_loss: 0.3855 - val_accuracy: 0.8947
Epoch 2/10
1500/1500 [==============================] - 1s 391us/step - loss: 0.3713 - accuracy: 0.8953 - val_loss: 0.3160 - val_accuracy: 0.9096
Epoch 3/10
1500/1500 [==============================] - 1s 397us/step - loss: 0.3252 - accuracy: 0.9070 - val_loss: 0.2916 - val_accuracy: 0.9150
Epoch 4/10
1500/1500 [==============================] - 1s 389us/step - loss: 0.3041 - accuracy: 0.9122 - val_loss: 0.2758 - val_accuracy: 0.9207
Epoch 5/10
1500/1500 [==============================] - 1s 393us/step - loss: 0.2917 - accuracy: 0.9153 - val_loss: 0.2672 - val_accuracy: 0.9237
Epoch 6/10
1500/1500 [==============================] - 1s 386us/step - loss: 0.2827 - accuracy: 0.9187 - val_loss: 0.2599 - val_accuracy: 0.9258
Epoch 7/10
1500/1500 [==============================] - 1s 391us/step - loss: 0.2752 - accuracy: 0.9201 - val_loss: 0.2554 - val_accuracy: 0.9273
Epoch 8/10
1500/1500 [==============================] - 1s 390us/step - loss: 0.2685 - accuracy: 0.9218 - val_loss: 0.2525 - val_accuracy: 0.9282
Epoch 9/10
1500/1500 [==============================] - 1s 391us/step - loss: 0.2635 - accuracy: 0.9235 - val_loss: 0.2491 - val_accuracy: 0.9303
Epoch 10/10
1500/1500 [==============================] - 1s 392us/step - loss: 0.2593 - accuracy: 0.9256 - val_loss: 0.2467 - val_accuracy: 0.9302
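The 1500/1500 in each log line follows from the data split: `validation_split=0.2` holds out 20% of the 60,000 training images, and since `fit` is called without `batch_size`, Keras uses its default batch size of 32. A small helper (hypothetical, for illustration) reproduces the arithmetic, assuming the training portion is rounded down:

```python
import math

def steps_per_epoch(num_samples, validation_split, batch_size=32):
    """Number of training batches per epoch once validation_split is held out."""
    # Training portion, rounded down (assumption about Keras' split)
    num_train = math.floor(num_samples * (1.0 - validation_split))
    return math.ceil(num_train / batch_size)

print(steps_per_epoch(60000, 0.2))                  # 1500 (default batch size 32)
print(steps_per_epoch(60000, 0.2, batch_size=256))  # 188
```

With a batch size of 256, the log would instead show 188 steps per epoch.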
At this point, we have trained a regular model.
The aim of this tutorial is to guide you through the process of performing verifiable inference with the Orion library. Orion exclusively performs inference on 8-bit quantized models. Quantization is typically done with one of two methods: Quantization Aware Training (QAT), which takes place during training, or Post-Training Quantization (PTQ), which takes place after the training phase. In this tutorial, we will use QAT.
Concretely, QAT emulates the quantization error during the training phase itself: the model's weights and activations are quantized, and this information is used in both the forward and backward passes of training. This lets the model learn to adapt to the quantization error, so that once it is fully quantized after training, it has already accounted for the effects of quantization, which typically yields better accuracy than quantizing a model that never saw quantization during training.
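To make "emulating the quantization error" concrete, here is a minimal NumPy sketch of the quantize-then-dequantize ("fake quantization") step applied in the forward pass. It assumes an asymmetric int8 scheme over a fixed range [x_min, x_max]; the TensorFlow Model Optimization Toolkit tracks these ranges itself and typically passes gradients through the rounding with a straight-through estimator, which this sketch does not model. `fake_quantize` is a hypothetical helper:

```python
import numpy as np

def fake_quantize(x, x_min, x_max, num_bits=8):
    """Round x onto a signed num_bits integer grid, then map it back to float."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)  # integer grid
    return (q - zero_point) * scale, scale                      # back to float

x = np.linspace(0.0, 1.0, 11)  # e.g. normalized pixel values
x_hat, scale = fake_quantize(x, 0.0, 1.0)
# The emulated error never exceeds half a quantization step
print(np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-12)  # True
```

During QAT, the loss is computed on values like `x_hat`, so the weights are optimized for the rounded grid they will eventually live on.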
We will use the TensorFlow Model Optimization Toolkit to fine-tune the pre-trained model with QAT.
import tensorflow_model_optimization as tfmot
# Apply quantization to the layers
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
# 'quantize_model' requires a recompile
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
q_aware_model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                        Output Shape         Param #
=================================================================
 quantize_layer (QuantizeLayer)      (None, 196)          3
 quant_dense (QuantizeWrapperV2)     (None, 10)           1975
 quant_dense_1 (QuantizeWrapperV2)   (None, 10)           115
=================================================================
Total params: 2093 (8.18 KB)
Trainable params: 2080 (8.12 KB)
Non-trainable params: 13 (52.00 Byte)
_________________________________________________________________
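The parameter counts in this summary can be checked by hand. A dense layer with n_in inputs and n_out neurons has n_in * n_out weights plus n_out biases. The remaining 3 + 5 + 5 = 13 non-trainable parameters are presumably bookkeeping variables (such as tracked quantization ranges) added by the QAT wrappers. A quick check, with an illustrative helper name:

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights + biases."""
    return n_in * n_out + n_out

fc1 = dense_params(14 * 14, 10)  # 1970
fc2 = dense_params(10, 10)       # 110
print(fc1, fc2, fc1 + fc2)       # 1970 110 2080 -> "Trainable params: 2080"
# Extra, non-trainable parameters per layer in the summary: 5, 5 and 3
print((1975 - fc1) + (115 - fc2) + 3)  # 13 -> "Non-trainable params: 13"
```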
We have now created a new model, q_aware_model, which is a quantization-aware version of our original model. We can now fine-tune it exactly like the original model.
# As before, no batch_size is passed, so Keras' default of 32 is used
epochs = 10
history = q_aware_model.fit(x_train, y_train,
                            epochs=epochs,
                            validation_split=0.2)
loss, acc = q_aware_model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', loss)
print('Test accuracy:', acc)
Epoch 1/10
1500/1500 [==============================] - 1s 563us/step - loss: 0.2623 - accuracy: 0.9245 - val_loss: 0.2499 - val_accuracy: 0.9293
Epoch 2/10
1500/1500 [==============================] - 1s 503us/step - loss: 0.2539 - accuracy: 0.9260 - val_loss: 0.2473 - val_accuracy: 0.9296
Epoch 3/10
1500/1500 [==============================] - 1s 508us/step - loss: 0.2509 - accuracy: 0.9260 - val_loss: 0.2454 - val_accuracy: 0.9289
Epoch 4/10
1500/1500 [==============================] - 1s 520us/step - loss: 0.2484 - accuracy: 0.9278 - val_loss: 0.2444 - val_accuracy: 0.9293
Epoch 5/10
1500/1500 [==============================] - 1s 533us/step - loss: 0.2464 - accuracy: 0.9284 - val_loss: 0.2428 - val_accuracy: 0.9293
Epoch 6/10
1500/1500 [==============================] - 1s 516us/step - loss: 0.2440 - accuracy: 0.9282 - val_loss: 0.2409 - val_accuracy: 0.9309
Epoch 7/10
1500/1500 [==============================] - 1s 540us/step - loss: 0.2424 - accuracy: 0.9286 - val_loss: 0.2417 - val_accuracy: 0.9308
Epoch 8/10
1500/1500 [==============================] - 1s 517us/step - loss: 0.2409 - accuracy: 0.9294 - val_loss: 0.2391 - val_accuracy: 0.9304
Epoch 9/10
1500/1500 [==============================] - 1s 539us/step - loss: 0.2391 - accuracy: 0.9292 - val_loss: 0.2406 - val_accuracy: 0.9316
Epoch 10/10
1500/1500 [==============================] - 1s 518us/step - loss: 0.2380 - accuracy: 0.9294 - val_loss: 0.2428 - val_accuracy: 0.9304
Test loss: 0.246782124042511
Test accuracy: 0.928600013256073
Now, we will convert our model to TFLite format, which is a format optimized for on-device machine learning.
import tensorflow as tf
# Create a converter
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
# Enable the default optimizations, which include quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Generator that yields sample inputs (here, the first 500 test images)
# so the converter can calibrate the quantization ranges
def representative_data_gen():
    for i in range(500):
        yield [x_test[i:i+1]]
# Use the generator function to guide the quantization process
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to int8
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Convert the model
tflite_model = converter.convert()
# Save the model to disk
with open("q_aware_model.tflite", "wb") as f:
    f.write(tflite_model)
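The representative dataset lets the converter observe real value ranges and derive an int8 scale and zero point from them. The sketch below shows the idea with plain min/max calibration, assuming an asymmetric int8 scheme whose range is widened to include 0. TFLite's actual calibration logic is more involved, and `calibrate_int8` is a hypothetical helper, not a TFLite API:

```python
import numpy as np

def calibrate_int8(batches):
    """Derive (scale, zero_point) for int8 from the min/max seen in sample batches."""
    lo = min(0.0, min(float(np.min(b)) for b in batches))  # range must contain 0
    hi = max(0.0, max(float(np.max(b)) for b in batches))
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point

# Stand-in for representative_data_gen: batches of pixels normalized to [0, 1]
batches = [np.array([[0.0, 0.25, 1.0]]), np.array([[0.5, 0.75, 0.1]])]
scale, zero_point = calibrate_int8(batches)
print(scale, zero_point)  # scale = 1/255, zero_point = -128
```

For inputs normalized to [0, 1], this maps 0.0 to -128 and 1.0 to 127.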
Now that we have trained a quantization-aware model and converted it to the TFLite format, we can perform inference using the TensorFlow Lite interpreter to test it.
We first load the TFLite model and allocate the required tensors. The Interpreter class provides methods for loading a model and running inferences.
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="q_aware_model.tflite")
interpreter.allocate_tensors()
Next, we get the details of the input and output tensors. Each tensor in a TensorFlow Lite model has a name, index, shape, data type, and quantization parameters. These can be accessed via the interpreter's get_input_details() and get_output_details() methods.
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
Before performing inference, we need to convert the input to the data type of our model's input tensor, which in our case is int8. Then, we use the set_tensor method to provide the input data to the model, and we run the inference with the invoke method.
# Cast the input value to int8
input_shape = input_details[0]['shape']
input_data = np.array(x_test[0:1], dtype=np.int8)
interpreter.set_tensor(input_details[0]['index'], input_data)
# Perform the inference
interpreter.invoke()
# Get the result
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
[[-128 -128 -6 -6 -128 -116 -128 -128 -128 -128]]
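A plain np.int8 cast truncates the float pixels toward zero, so every value in [0, 1) becomes 0 and the model's input quantization parameters are not used. The usual affine mapping is q = round(x / scale) + zero_point, clipped to the int8 range, where (scale, zero_point) can be read from input_details[0]['quantization']. A minimal sketch, where `quantize_input` is a hypothetical helper and the demo assumes an input scale of 1/255 and a zero point of -128:

```python
import numpy as np

def quantize_input(x, scale, zero_point):
    """Affine-quantize float values to int8: clip(round(x / scale) + zero_point)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

pixels = np.array([0, 37, 128, 255], dtype=np.uint8)  # raw 8-bit pixel values
x = pixels / 255.0                                    # normalized as in training
print(x.astype(np.int8))                              # direct cast -> 0, 0, 0, 1
print(quantize_input(x, 1 / 255, -128))               # affine -> -128, -91, 0, 127
```

With these parameters, the affine mapping is equivalent to subtracting 128 from the raw uint8 pixels, which is what the full test-set loop below does.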
Now, we are going to run the inference for the entire test set.
We quantize the entire test set to int8 and initialize a list to store the predictions.
(_, _), (x_test_image, y_test_label) = mnist.load_data()
# Resize x_test_image, then shift the uint8 pixels [0, 255] to int8 [-128, 127].
# This assumes an input scale of 1/255 and a zero point of -128;
# check input_details[0]['quantization'] for your model.
x_test_image = resize_images(x_test_image)
x_test_image_norm = (x_test_image.astype(np.int16) - 128).astype(np.int8)
# Initialize a list to store the predictions
predictions = []
We then iterate over the test set and make a prediction for each image: we flatten the image and add a batch dimension so that it matches the shape of our model's input tensor.
# Iterate over the test data and make predictions
for i in range(len(x_test_image_norm)):
    test_image = np.expand_dims(x_test_image_norm[i].flatten(), axis=0)
    # Set the value for the input tensor
    interpreter.set_tensor(input_details[0]['index'], test_image)
    # Run the inference
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])
    predictions.append(output)
Finally, we use a function to plot the test images along with their predicted labels. This will give us a visual representation of how well our model is performing.
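The plotting helper itself is not reproduced here. As a numeric complement, the predicted label for each image is the index of its largest int8 score, and comparing those indices with y_test_label gives the test accuracy. A minimal sketch, assuming predictions is the list of (1, 10) output arrays built in the loop above (`int8_accuracy` is a hypothetical helper):

```python
import numpy as np

def int8_accuracy(predictions, labels):
    """Fraction of images whose highest-scoring class matches the true label."""
    scores = np.concatenate(predictions, axis=0)  # (N, 10) int8 scores
    predicted = np.argmax(scores, axis=1)         # predicted digit per image
    return float(np.mean(predicted == np.asarray(labels)))

# Tiny synthetic example: two images, the first classified correctly
row_a = np.full((1, 10), -128, dtype=np.int8); row_a[0, 2] = 5
row_b = np.full((1, 10), -128, dtype=np.int8); row_b[0, 7] = 90
print(int8_accuracy([row_a, row_b], [2, 3]))  # 0.5
```

On the real data, the call would be `int8_accuracy(predictions, y_test_label)`.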

We have successfully trained a quantization-aware model, converted it to the TFLite format, and performed inference using the TensorFlow Lite interpreter.
Now let's convert the pre-trained model to Cairo, in order to perform verifiable inference with Orion library.
In this section, you will generate Cairo files for each bias and weight of the model.
Scarb is the Cairo package manager. We will use Scarb to run inference with Orion. You can find all the information about installing Scarb and Cairo in the Scarb documentation.
Let's create a new Scarb project. In your terminal run:
scarb new mnist_nn
Replace the content in Scarb.toml file with the following code:
[package]
name = "mnist_nn"
version = "0.1.0"
[dependencies]
orion = { git = "https://github.com/gizatechxyz/orion.git" }
[scripts]
test = "scarb cairo-test -f mnist_nn_test"
Finally, place the notebook and the q_aware_model.tflite file in the mnist_nn directory. We are now ready to generate Cairo files from the pre-trained model.
In a new notebook cell, load the TFLite model and allocate tensors.
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="q_aware_model.tflite")
interpreter.allocate_tensors()
Then, create a dictionary holding an input image from the dataset and all the weights and biases.
# Create a dictionary with all tensors
# (an input + all weights and biases).
# Tensor indices 1, 2, 4 and 5 are specific to this model;
# interpreter.get_tensor_details() lists the indices of your own model.
tensors = {
    "input": x_test_image[0].flatten(),
    "fc1_weights": interpreter.get_tensor(1),
    "fc1_bias": interpreter.get_tensor(2),
    "fc2_weights": interpreter.get_tensor(4),
    "fc2_bias": interpreter.get_tensor(5)
}
Now let's generate a Cairo file for each tensor in the dictionary.
import os
# Create the directory if it doesn't exist
os.makedirs('src/generated', exist_ok=True)
for tensor_name, tensor in tensors.items():
    with open(os.path.join('src', 'generated', f"{tensor_name}.cairo"), "w") as f:
        f.write(
            "use array::ArrayTrait;\n" +
            "use orion::operators::tensor::{TensorTrait, Tensor, I32Tensor};\n" +
            "use orion::numbers::i32;\n\n" +
            "\nfn {0}() -> Tensor<i32> ".format(tensor_name) + "{\n" +
            "    let mut shape = ArrayTrait::<usize>::new();\n"
        )
        for dim in tensor.shape:
            f.write("    shape.append({0});\n".format(dim))
        f.write(
            "    let mut data = ArrayTrait::<i32>::new();\n"
        )
        # Write each value as an Orion i32 (magnitude + sign flag)
        for val in np.nditer(tensor.flatten()):
            f.write("    data.append(i32 {{ mag: {0}, sign: {1} }});\n".format(abs(int(val)), str(val < 0).lower()))
        f.write(
            "    TensorTrait::new(shape.span(), data.span())\n" +
            "}\n"
        )
# Declare one submodule per generated file
with open(os.path.join('src', 'generated.cairo'), 'w') as f:
    for param_name in tensors.keys():
        f.write(f"mod {param_name};\n")
Your Cairo files are generated in the src/generated directory.
In src/lib.cairo, replace the content with the following code:
mod generated;
lib.cairo now contains a module declaration referencing another module named generated.
Here is one of the files we generated:
fc1_bias.cairo
use array::ArrayTrait;
use orion::operators::tensor::{TensorTrait, Tensor, I32Tensor};
use orion::numbers::i32;

fn fc1_bias() -> Tensor<i32> {
    let mut shape = ArrayTrait::<usize>::new();
    shape.append(10);
    let mut data = ArrayTrait::<i32>::new();
    data.append(i32 { mag: 1287, sign: false });
    data.append(i32 { mag: 3667, sign: true });
    data.append(i32 { mag: 2954, sign: false });
    data.append(i32 { mag: 7938, sign: false });
    data.append(i32 { mag: 3959, sign: false });
    data.append(i32 { mag: 5862, sign: true });
    data.append(i32 { mag: 4886, sign: false });
    data.append(i32 { mag: 4992, sign: false });
    data.append(i32 { mag: 10126, sign: false });
    data.append(i32 { mag: 2237, sign: true });
    TensorTrait::new(shape.span(), data.span())
}
fc1_bias is a Tensor of i32 values, which brings up two concepts that deserve a closer look: i32 and Tensor.
At the time of writing, Cairo had no built-in signed integers, yet they are very useful in machine learning. So Orion introduced a full implementation of signed integers. A signed integer is represented by a struct containing its magnitude and its sign as a boolean.
The magnitude represents the absolute value of the number, and the sign indicates whether the number is positive or negative.
// Example of an i32.
struct i32 {
    mag: u32,
    sign: bool, // true means a negative sign.
}
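The sign-magnitude encoding is easy to reproduce in Python, which can help when checking generated files by hand. The helpers below are hypothetical and mirror the mag/sign pair written by the generator (sign is true for negative values, and zero is stored with sign false). Decoding the first four entries of fc1_bias.cairo shown above:

```python
def to_orion_i32(value):
    """Split a Python int into Orion's (mag, sign) pair; sign=True means negative."""
    return abs(value), value < 0

def from_orion_i32(mag, sign):
    """Rebuild the signed integer from its magnitude and sign flag."""
    return -mag if sign else mag

# First entries of fc1_bias.cairo, as (mag, sign) pairs
pairs = [(1287, False), (3667, True), (2954, False), (7938, False)]
print([from_orion_i32(m, s) for m, s in pairs])  # [1287, -3667, 2954, 7938]
print(to_orion_i32(-3667))                       # (3667, True)
```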
The second concept Orion introduced is the Tensor, a central object in machine learning that we've already used extensively in previous sections. In Orion, it is represented as a struct containing the tensor's shape and a flattened array of its data. The generic Tensor is defined as follows:
struct Tensor<T> {
    shape: Span<usize>,
    data: Span<T>
}
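Because data is a flat array, an element at a multi-dimensional index has to be located through the shape. Assuming Orion stores data in row-major (C) order, which matches the tensor.flatten() call used by the generator, the offset can be computed as below. `flat_index` is a hypothetical helper, and the (10, 196) shape assumes TFLite stores fully connected weights as (outputs, inputs):

```python
import numpy as np

def flat_index(shape, index):
    """Row-major offset of a multi-dimensional index into a flattened array."""
    offset, stride = 0, 1
    for dim, i in zip(reversed(shape), reversed(index)):
        offset += i * stride
        stride *= dim
    return offset

shape = (10, 196)                         # e.g. a 10 x 196 weight matrix
w = np.arange(10 * 196).reshape(shape)
data = w.flatten()                        # what the generator writes to data
print(flat_index(shape, (3, 5)), data[flat_index(shape, (3, 5))] == w[3, 5])  # 593 True
```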
You should now be able to understand the content in generated files.
We have now reached the last part of our tutorial, performing ML inference in Cairo 1.0.
In this subsection, we will reproduce with Orion the same model architecture we defined earlier with TensorFlow during the training phase, since the aim is to perform the inference in Cairo.
In the src folder, create a nn.cairo file and reference the module in lib.cairo as follows:
mod generated;
mod nn;
Now, let's build the layers of our neural network in nn.cairo. As a reminder, this was the architecture of the model we defined earlier:
Input -> FC1 (activation = 'relu') -> FC2 (activation = 'softmax') -> Output
In nn.cairo, let's create a function fc1 that takes three parameters:
i: Tensor<i32> - A tensor of i32 values representing the input data.
w: Tensor<i32> - A tensor of i32 values representing the weights of the first layer.
b: Tensor<i32> - A tensor of i32 values representing the biases of the first layer.
It should return a Tensor<i32>.
use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};
fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    // ...
}
Inside fc1, apply a fully connected layer with NNTrait::linear, then pass the result to NNTrait::relu:
use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};
fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    let x = NNTrait::linear(i, w, b);
    NNTrait::relu(@x)
}
Then add fc2, which applies the second fully connected layer. It skips the softmax activation: softmax preserves the order of its inputs, so the index with the highest fc2 output is also the index with the highest probability, and that index is all we need for a prediction.
use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};
fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    let x = NNTrait::linear(i, w, b);
    NNTrait::relu(@x)
}
fn fc2(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    NNTrait::linear(i, w, b)
}
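Before running the Cairo test, it can help to have a NumPy reference of the same forward pass to compare against. The sketch assumes that NNTrait::linear computes a plain integer product weights · input + bias, with weights in TFLite's (outputs, inputs) layout and no rescaling between layers. `reference_forward` is a hypothetical helper, and the small matrices are synthetic:

```python
import numpy as np

def reference_forward(x, w1, b1, w2, b2):
    """Integer mirror of fc1 (linear + ReLU) followed by fc2 (linear only)."""
    h = np.maximum(w1.astype(np.int64) @ x + b1, 0)  # fc1: linear, then ReLU
    return w2.astype(np.int64) @ h + b2              # fc2: linear, no softmax

# Synthetic network: 3 inputs, 2 hidden neurons, 2 classes
x = np.array([1, -2, 3])
w1 = np.array([[1, 0, 1], [0, 1, 0]])
b1 = np.array([0, 1])
w2 = np.array([[1, 0], [-1, 2]])
b2 = np.array([0, 0])
logits = reference_forward(x, w1, b1, w2, b2)
print(logits, int(np.argmax(logits)))  # logits 4 and -4, predicted class 0
```

Feeding it the same tensors as the tensors dictionary should give the scores whose argmax the Cairo test checks.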
We are now ready to perform inference!
In the src folder, create a test.cairo file and reference the module in lib.cairo as follows:
mod generated;
mod nn;
mod test;
In your test file, create a function mnist_nn_test.
#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    //...
}
Now let's import and set the input data and the parameters generated previously.
use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;
#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    let input = input();
    let fc1_bias = fc1_bias();
    let fc1_weights = fc1_weights();
    let fc2_bias = fc2_bias();
    let fc2_weights = fc2_weights();
}
Then import and set the neural network we built just above.
use mnist_nn::nn::fc1;
use mnist_nn::nn::fc2;
use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;
#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    let input = input();
    let fc1_bias = fc1_bias();
    let fc1_weights = fc1_weights();
    let fc2_bias = fc2_bias();
    let fc2_weights = fc2_weights();
    let x = fc1(input, fc1_weights, fc1_bias);
    let x = fc2(x, fc2_weights, fc2_bias);
}
Finally, let's make a prediction. The input data represents the digit 7, so index 7 should have the highest score.
use core::array::SpanTrait;
use mnist_nn::nn::fc1;
use mnist_nn::nn::fc2;
use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;
use orion::operators::tensor::I32Tensor;
#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    let input = input();
    let fc1_bias = fc1_bias();
    let fc1_weights = fc1_weights();
    let fc2_bias = fc2_bias();
    let fc2_weights = fc2_weights();
    let x = fc1(input, fc1_weights, fc1_bias);
    let x = fc2(x, fc2_weights, fc2_bias);
    let x = *x.argmax(0, Option::None(()), Option::None(())).data.at(0);
    assert(x == 7, 'should predict 7');
}
Test your model by running scarb test.
testing mnist_nn ...
running 1 tests
test mnist_nn::test::mnist_nn_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 filtered out;
Bravo 👏 You can be proud of yourself! You just built your first Neural Network in Cairo 1.0 with Orion.
Orion leverages Cairo to guarantee the reliability of inference, providing developers with a user-friendly framework to build complex and verifiable machine learning models. We invite the community to join us in shaping a future where trustworthy AI becomes a reliable resource for all.