MNIST Classification with Orion

Orion is a dedicated Cairo-based library designed specifically to build machine learning models for ValidityML. Its purpose is to facilitate verifiable inference. For better performance we will operate with an 8-bit quantized model. In this tutorial, you will be guided on how to train your model using Quantized Aware Training using MNIST dataset, how to convert your pre-trained model to Cairo 1, and how to perform inference with Orion.

You can find all the code and the notebook in the dedicated repository.

Here is the content of the tutorial:

What is MNIST dataset?

The MNIST dataset is an extensive collection of handwritten digits, very popular in the field of image processing. Often, it's used as a reference point for machine learning algorithms. This dataset conveniently comes already partitioned into training and testing sets, a feature we'll delve into later in this tutorial.

The MNIST database comprises a collection of 70,000 images of handwritten digits, ranging from 0 to 9. Each image measures 28 x 28 pixels.

Train the model with Quantization-Aware Training

We will be using Tensorflow to train a neural network to recognize MNIST's handwritten digits in this tutorial. TensorFlow is a very popular framework for deep learning.

Dataset Preparation

In a notebook, import the required libraries and load the dataset.

from tensorflow import keras
from keras.datasets import mnist
from scipy.ndimage import zoom
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()

We have a total of 70,000 grayscale images, each with a dimension of 28 x 28 pixels. 60,000 images are for training and the remaining 10,000 are for testing.

We now need to pre-process our data. For the purposes of this tutorial and performance, we'll resize the images to 14 x 14 pixels. You'll see later that the neural network's input layer supports a 1D tensor. We, therefore, need to flatten and normalize our data.

# Resizing function
def resize_images(images):
    return np.array([zoom(image, 0.5) for image in images])

# Resize
x_train = resize_images(x_train)
x_test = resize_images(x_test)

# Then reshape
x_train = x_train.reshape(60000, 14*14)
x_test = x_test.reshape(10000, 14*14)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize to range [0, 1]
x_train /= 255
x_test /= 255

Model Definition and Training

We will design a straightforward feedforward neural network. Here's the model architecture we'll use:\

This model is composed of an input layer with a shape of 14*14, followed by two dense layers, each containing 10 neurons. The first dense layer uses a ReLU activation function, while the second employs a softmax activation function. Let's implement this architecture in the code.

from tensorflow.keras import layers

num_classes = 10

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(14*14,)),
    keras.layers.Dense(10, activation='relu'), 
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy', 
              metrics=['accuracy'])
              

Now let's train this model on our training data.

batch_size = 256
epochs = 10
history = model.fit(x_train, y_train,
                    epochs=epochs,
                    validation_split=0.2)
                    
Epoch 1/10
1500/1500 [==============================] - 1s 438us/step - loss: 0.8641 - accuracy: 0.7506 - val_loss: 0.3855 - val_accuracy: 0.8947
Epoch 2/10
1500/1500 [==============================] - 1s 391us/step - loss: 0.3713 - accuracy: 0.8953 - val_loss: 0.3160 - val_accuracy: 0.9096
Epoch 3/10
1500/1500 [==============================] - 1s 397us/step - loss: 0.3252 - accuracy: 0.9070 - val_loss: 0.2916 - val_accuracy: 0.9150
Epoch 4/10
1500/1500 [==============================] - 1s 389us/step - loss: 0.3041 - accuracy: 0.9122 - val_loss: 0.2758 - val_accuracy: 0.9207
Epoch 5/10
1500/1500 [==============================] - 1s 393us/step - loss: 0.2917 - accuracy: 0.9153 - val_loss: 0.2672 - val_accuracy: 0.9237
Epoch 6/10
1500/1500 [==============================] - 1s 386us/step - loss: 0.2827 - accuracy: 0.9187 - val_loss: 0.2599 - val_accuracy: 0.9258
Epoch 7/10
1500/1500 [==============================] - 1s 391us/step - loss: 0.2752 - accuracy: 0.9201 - val_loss: 0.2554 - val_accuracy: 0.9273
Epoch 8/10
1500/1500 [==============================] - 1s 390us/step - loss: 0.2685 - accuracy: 0.9218 - val_loss: 0.2525 - val_accuracy: 0.9282
Epoch 9/10
1500/1500 [==============================] - 1s 391us/step - loss: 0.2635 - accuracy: 0.9235 - val_loss: 0.2491 - val_accuracy: 0.9303
Epoch 10/10
1500/1500 [==============================] - 1s 392us/step - loss: 0.2593 - accuracy: 0.9256 - val_loss: 0.2467 - val_accuracy: 0.9302

At this point, we have trained a regular model.

Making the model Quantization Aware

The aim of this tutorial is to guide you through the process of performing verifiable inference with the Orion library. As stated before, Orion exclusively performs inference on 8-bit quantized models. Typically, quantization is executed via two distinct methods: Quantization Aware Training (QAT) or Post-Training Quantization (PTQ), which occurs after the training phase. In this tutorial we will use QAT method.

Concretely QAT is a method where the quantization error is emulated during the training phase itself. In this process, the weights and activations of the model are quantized, and this information is used during both the forward and backward passes of training. This allows the model to learn and adapt to the quantization error. It ensures that once the model is fully quantized post-training, it has already accounted for the effects of quantization, resulting in improved accuracy.

We will use TensorFlow Model Optimization Toolkit to finetune the pre-trained model for QAT.

import tensorflow_model_optimization as tfmot

# Apply quantization to the layers
quantize_model = tfmot.quantization.keras.quantize_model

q_aware_model = quantize_model(model)

# 'quantize_model' requires a recompile
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])

q_aware_model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer (QuantizeLa  (None, 196)               3         
 yer)                                                            
                                                                 
 quant_dense (QuantizeWrapp  (None, 10)                1975      
 erV2)                                                           
                                                                 
 quant_dense_1 (QuantizeWra  (None, 10)                115       
 pperV2)                                                         
                                                                 
=================================================================
Total params: 2093 (8.18 KB)
Trainable params: 2080 (8.12 KB)
Non-trainable params: 13 (52.00 Byte)
_________________________________________________________________

We have now created a new model, q_aware_model, which is a quantization-aware version of our original model. Now we can train this model exactly like our original model.

batch_size = 256
epochs = 10
history = q_aware_model.fit(x_train, y_train,
                            epochs=epochs,
                            validation_split=0.2)

scores, acc = q_aware_model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores)
print('Test accuracy:', acc)
Epoch 1/10
1500/1500 [==============================] - 1s 563us/step - loss: 0.2623 - accuracy: 0.9245 - val_loss: 0.2499 - val_accuracy: 0.9293
Epoch 2/10
1500/1500 [==============================] - 1s 503us/step - loss: 0.2539 - accuracy: 0.9260 - val_loss: 0.2473 - val_accuracy: 0.9296
Epoch 3/10
1500/1500 [==============================] - 1s 508us/step - loss: 0.2509 - accuracy: 0.9260 - val_loss: 0.2454 - val_accuracy: 0.9289
Epoch 4/10
1500/1500 [==============================] - 1s 520us/step - loss: 0.2484 - accuracy: 0.9278 - val_loss: 0.2444 - val_accuracy: 0.9293
Epoch 5/10
1500/1500 [==============================] - 1s 533us/step - loss: 0.2464 - accuracy: 0.9284 - val_loss: 0.2428 - val_accuracy: 0.9293
Epoch 6/10
1500/1500 [==============================] - 1s 516us/step - loss: 0.2440 - accuracy: 0.9282 - val_loss: 0.2409 - val_accuracy: 0.9309
Epoch 7/10
1500/1500 [==============================] - 1s 540us/step - loss: 0.2424 - accuracy: 0.9286 - val_loss: 0.2417 - val_accuracy: 0.9308
Epoch 8/10
1500/1500 [==============================] - 1s 517us/step - loss: 0.2409 - accuracy: 0.9294 - val_loss: 0.2391 - val_accuracy: 0.9304
Epoch 9/10
1500/1500 [==============================] - 1s 539us/step - loss: 0.2391 - accuracy: 0.9292 - val_loss: 0.2406 - val_accuracy: 0.9316
Epoch 10/10
1500/1500 [==============================] - 1s 518us/step - loss: 0.2380 - accuracy: 0.9294 - val_loss: 0.2428 - val_accuracy: 0.9304
Test loss: 0.246782124042511
Test accuracy: 0.928600013256073

Converting to TFLite Format

Now, we will convert our model to TFLite format, which is a format optimized for on-device machine learning.

import tensorflow as tf

# Create a converter
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)

# Indicate that you want to perform default optimizations,
# which include quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Define a generator function that provides your test data's numpy arrays
def representative_data_gen():
  for i in range(500):
    yield [x_test[i:i+1]]

# Use the generator function to guide the quantization process
converter.representative_dataset = representative_data_gen

# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Set the input and output tensors to int8
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert the model
tflite_model = converter.convert()

# Save the model to disk
open("q_aware_model.tflite", "wb").write(tflite_model)

Testing the Quantized Model

Now that we have trained a quantization-aware model and converted it to the TFLite format, we can perform inference using the TensorFlow Lite interpreter to test it.

We first load the TFLite model and allocate the required tensors. The Interpreter class provides methods for loading a model and running inferences.

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="q_aware_model.tflite")
interpreter.allocate_tensors()

Next, we get the details of the input and output tensors. Each tensor in a TensorFlow Lite model has a name, index, shape, data type, and quantization parameters. These can be accessed via the input_details and output_details methods.

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

Before performing the inference, we need to normalize the input to match the data type of our model's input tensor, which in our case is int8. Then, we use the set_tensor method to provide the input data to the model. We perform the inference using the invoke method.

# Normalize the input value to int8
input_shape = input_details[0]['shape']
input_data = np.array(x_test[0:1], dtype=np.int8)
interpreter.set_tensor(input_details[0]['index'], input_data)

# Perform the inference
interpreter.invoke()

# Get the result
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

>>>>
[[-128 -128   -6   -6 -128 -116 -128 -128 -128 -128]]

Now, we are going to run the inference for the entire test set.

We normalize the entire test set and initialize an array to store the predictions.

(_, _), (x_test_image, y_test_label) = mnist.load_data()

# Resize and Normalize x_test_image to int8
x_test_image = resize_images(x_test_image)
x_test_image_norm = (x_test_image / 255.0 * 255 - 128).astype(np.int8)

# Initialize an array to store the predictions
predictions = []

We then iterate over the test set, making predictions. For each image, we flatten the image, normalize it, and then expand its dimensions to match the shape of our model's input tensor.

# Iterate over the test data and make predictions
for i in range(len(x_test_image_norm)):
    test_image = np.expand_dims(x_test_image_norm[i].flatten(), axis=0)
    
    # Set the value for the input tensor
    interpreter.set_tensor(input_details[0]['index'], test_image)
    
    # Run the inference
    interpreter.invoke()

    output = interpreter.get_tensor(output_details[0]['index'])
    predictions.append(output)
    

Finally, we use a function to plot the test images along with their predicted labels. This will give us a visual representation of how well our model is performing.

We have successfully trained a quantization-aware model, converted it to the TFLite format, and performed inference using the TensorFlow Lite interpreter.

Now let's convert the pre-trained model to Cairo, in order to perform verifiable inference with Orion library.

Convert your model to Cairo

In this section, you will generate Cairo files for each bias and weight of the model.

Create a new Scarb project

Scarb is a Cairo package manager. We will use Scarb to run inference with Orion. You can find all information about Scarb and Cairo installation here.

Let's create a new Scarb project. In your terminal run:

scarb new mnist_nn

Replace the content in Scarb.toml file with the following code:

[package]
name = "mnist_nn"
version = "0.1.0"

[dependencies]
orion = { git = "https://github.com/gizatechxyz/orion.git" }

[scripts]
test = "scarb cairo-test -f mnist_nn_test"

Finally, place the notebook and q_aware_model.tflite file in the mnist_nn directory. We are now ready to generate Cairo files from the pre-trained model.

Generate Cairo files

In a new notebook's cell load TFLite and allocate tensors.

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="q_aware_model.tflite")
interpreter.allocate_tensors()

Then, create an object with an input from the dataset, and all weights and biases.

# Create an object with all tensors 
#(an input + all weights and biases)
tensors = {
    "input": x_test_image[0].flatten(),
    "fc1_weights": interpreter.get_tensor(1), 
    "fc1_bias": interpreter.get_tensor(2), 
    "fc2_weights": interpreter.get_tensor(4), 
    "fc2_bias": interpreter.get_tensor(5)
}

Now let's generate Cairo files for each tensor in the object.

# Create the directory if it doesn't exist
os.makedirs('src/generated', exist_ok=True)

for tensor_name, tensor in tensors.items():
    with open(os.path.join('src', 'generated', f"{tensor_name}.cairo"), "w") as f:
        f.write(
            "use core::array::ArrayTrait;\n" +
            "use orion::operators::tensor::{TensorTrait, Tensor, I32Tensor};\n" +
            "use orion::numbers::i32;\n\n" +
            "\nfn {0}() -> Tensor<i32> ".format(tensor_name) + "{\n" +
            "    let mut shape = ArrayTrait::<usize>::new();\n"
        )
        for dim in tensor.shape:
            f.write("    shape.append({0});\n".format(dim))
        f.write(
            "    let mut data = ArrayTrait::<i32>::new();\n"
        )
        for val in np.nditer(tensor.flatten()):
            f.write("    data.append(i32 {{ mag: {0}, sign: {1} }});\n".format(abs(int(val)), str(val < 0).lower()))
        f.write(
            "    TensorTrait::new(shape.span(), data.span())\n" +
            "}\n"
        )
      
with open(os.path.join('src', 'generated.cairo'), 'w') as f:
    for param_name in tensors.keys():
        f.write(f"mod {param_name};\n")

Your Cairo files are generated in src/generated directory.

In src/lib.cairo replace the content with the following code:

mod generated;

We have just created a file called lib.cairo, which contains a module declaration referencing another module named generated.

Let's analyze the generated files

Here is a file we generated: fc1_bias.cairo

use core::array::ArrayTrait;
use orion::operators::tensor::{TensorTrait, Tensor, I32Tensor};
use orion::numbers::i32;


fn fc1_bias() -> Tensor<i32> {
    let mut shape = ArrayTrait::<usize>::new();
    shape.append(10);
    let mut data = ArrayTrait::<i32>::new();
    data.append(i32 { mag: 1287, sign: false });
    data.append(i32 { mag: 3667, sign: true });
    data.append(i32 { mag: 2954, sign: false });
    data.append(i32 { mag: 7938, sign: false });
    data.append(i32 { mag: 3959, sign: false });
    data.append(i32 { mag: 5862, sign: true });
    data.append(i32 { mag: 4886, sign: false });
    data.append(i32 { mag: 4992, sign: false });
    data.append(i32 { mag: 10126, sign: false });
    data.append(i32 { mag: 2237, sign: true });
    TensorTrait::new(shape.span(), data.span())
}

fc1_bias is a i32 Tensor two concepts that deserve a closer look.

Signed Integer in Orion

In Cairo, there are no built-in signed integers. However, in the field of machine learning, they are very useful. So Orion introduced a full implementation of Signed Integer. It is represented by a struct containing both the magnitude and its sign as a boolean.

The magnitude represents the absolute value of the number, and the sign indicates whether the number is positive or negative.

// Example of an i32.
struct i32 {
    mag: u32,
    sign: bool, // true means a negative sign.
}

Tensor in Orion

The second concept Orion introduced is the Tensor. We've used it extensively in previous sections, the tensor is a central object in machine learning. It is represented in Orion as a struct containing the tensor's shape, a flattened array of its data, and extra parameters. The generic Tensor is defined as follows:

struct Tensor<T> {
    shape: Span<usize>,
    data: Span<T>
}

You should now be able to understand the content in generated files.

Perform Inference with Orion

We have now reached the last part of our tutorial, performing ML inference in Cairo 1.0.

How to Build a Neural Network with Orion

In this subsection, we will reproduce the same model architecture defined earlier in the training phase with Tensorflow, but with Orion, as the aim is to perform the inference in Cairo.

In src folder, create a nn.cairo file and reference the module in lib.cairo as follow:

mod generated;
mod nn;

Now, let's build the layers of our neural network in nn.cairo. As a reminder, this was the architecture of the model we defined earlier:

Input -> FC1 (activation = 'relu') -> FC2 (activation= 'softmax') -> Output

Dense Layer 1

In nn.cairo let's create a function fc1 that takes three parameters:

  • i: Tensor<i32> - A tensor of i32 values representing the input data.

  • w: Tensor<i32> - A tensor of i32 values representing the weights of the first layer.

  • b: Tensor<i32> - A tensor of i32 values representing the biases of the first layer.

It should return a Tensor<i32>.

use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};

fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    // ...
}

To build the first layer, we need a Linear function and a ReLU from NNTrait.

use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};

fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    let x = NNTrait::linear(i, w, b);
    NNTrait::relu(@x)
}

Dense Layer 2

In a similar way, we can build the second layer fc2, which contains a Linear function and a Softmax from NNTrait. We could convert the tensor to fixed point in order to perform softmax, but for this simple tutorial it's not necessary.

use orion::operators::tensor::core::Tensor;
use orion::numbers::signed_integer::{integer_trait::IntegerTrait, i32::i32};
use orion::operators::nn::{NNTrait, I32NN};

fn fc1(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    let x = NNTrait::linear(i, w, b);
    NNTrait::relu(@x)
}

fn fc2(i: Tensor<i32>, w: Tensor<i32>, b: Tensor<i32>) -> Tensor<i32> {
    NNTrait::linear(i, w, b)
}

We are now ready to perform inference!

Make Prediction

In src folder, create a test.cairo file and reference the module in lib.cairo as follow:

mod generated;
mod nn;
mod test;

In your test file, create a function mnist_nn_test.

#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    //...
}

Now let's import and set the input data and the parameters generated previously.

use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;

#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    let input = input();
    let fc1_bias = fc1_bias();
    let fc1_weights = fc1_weights();
    let fc2_bias = fc2_bias();
    let fc2_weights = fc2_weights();
}

Then import and set the neural network we built just above.

use mnist_nn::nn::fc1;
use mnist_nn::nn::fc2;
use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;

#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    let input = input();
    let fc1_bias = fc1_bias();
    let fc1_weights = fc1_weights();
    let fc2_bias = fc2_bias();
    let fc2_weights = fc2_weights();

    let x = fc1(input, fc1_weights, fc1_bias);
    let x = fc2(x, fc2_weights, fc2_bias);
}

Finally, let's make a prediction. The input data represents the digit 7. So the index 7 should have the highest probability.

use core::array::SpanTrait;

use mnist_nn::nn::fc1;
use mnist_nn::nn::fc2;
use mnist_nn::generated::input::input;
use mnist_nn::generated::fc1_bias::fc1_bias;
use mnist_nn::generated::fc1_weights::fc1_weights;
use mnist_nn::generated::fc2_bias::fc2_bias;
use mnist_nn::generated::fc2_weights::fc2_weights;

use orion::operators::tensor::I32Tensor;

#[test]
#[available_gas(99999999999999999)]
fn mnist_nn_test() {
    let input = input();
    let fc1_bias = fc1_bias();
    let fc1_weights = fc1_weights();
    let fc2_bias = fc2_bias();
    let fc2_weights = fc2_weights();

    let x = fc1(input, fc1_weights, fc1_bias);
    let x = fc2(x, fc2_weights, fc2_bias);

    let x = *x.argmax(0, Option::None(()), Option::None(())).data.at(0);

    assert(x == 7, 'should predict 7');
}

Test your model by running scarb test.

testing mnist_nn ...
running 1 tests
test mnist_nn::test::mnist_nn_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 filtered out;

Bravo 👏 You can be proud of yourself! You just built your first Neural Network in Cairo 1.0 with Orion.

Orion leverages Cairo to guarantee the reliability of inference, providing developers with a user-friendly framework to build complex and verifiable machine learning models. We invite the community to join us in shaping a future where trustworthy AI becomes a reliable resource for all.

Last updated