Introduction

Welcome to TinyMLHub! TinyMLHub lets you compress your vision, language, speech, and other AI models into smaller, more efficient models with comparable performance, suited to power-, latency-, and computation-constrained edge and mobile devices.

Tutorials

Pruning

The Pruning page divides the workflow into four steps:

  • Upload files
  • Select prune parameters
  • Prune the model
  • Compare metrics of the original model and the pruned model

Example

1. Upload files

Drag and drop (or click to upload) two files into the two white upload boxes: (1) a Python file containing the model architecture class, in .py format, and (2) a PyTorch model weights file in .pth format.

For example, download a sample .zip file from here. It contains a VGG model class and its weights. After the download completes, extract the archive and upload both files found inside.
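
If you prefer to prepare your own files, the sketch below shows the expected shape of each one, assuming TinyMLHub needs only an nn.Module subclass and a matching state_dict; the class name, layer sizes, and file name are illustrative, not the contents of the sample archive.

    # model.py -- illustrative architecture file (names and sizes are assumptions)
    import torch
    import torch.nn as nn

    class SmallVGG(nn.Module):
        """A tiny VGG-style network; your real architecture will differ."""
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, padding=1),  # a conv layer to prune
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(64 * 16 * 16, num_classes)  # assumes 32x32 inputs

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.backbone(x)
            return self.classifier(torch.flatten(x, 1))

    if __name__ == "__main__":
        # The .pth weights file is a plain state_dict saved with torch.save.
        torch.save(SmallVGG().state_dict(), "model.pth")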

[Screenshot: Upload files]

2. Select prune parameters

Adjust the prune parameter for each layer using the slider to the right of its diagram.

In the VGG example, we recommend pruning only the weights in conv layers; pruning biases can degrade performance significantly. The decimal value is the fraction of weights to keep in the pruned model: for example, 0.9 keeps 90% of the weights in the backbone.conv0.weight layer. We recommend gradually decreasing the prune parameter from the first layer to the last, since the first few layers matter most to model performance.
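
TinyMLHub runs the pruning server-side and does not document the exact algorithm it uses; as a rough local approximation, the sketch below applies per-layer L1 magnitude pruning with PyTorch's built-in torch.nn.utils.prune, treating each keep ratio as the slider value (the layer names and ratios are illustrative).

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Illustrative keep ratios, decreasing from the first conv layer to the last.
    KEEP = {"backbone.conv0": 0.9, "backbone.conv1": 0.8}

    def prune_conv_weights(model: nn.Module, keep: dict) -> nn.Module:
        for name, module in model.named_modules():
            if isinstance(module, nn.Conv2d) and name in keep:
                # `amount` is the fraction to REMOVE, i.e. 1 minus the keep ratio;
                # only "weight" is pruned, leaving biases untouched.
                prune.l1_unstructured(module, name="weight", amount=1.0 - keep[name])
                prune.remove(module, "weight")  # bake the zeros into the weight tensor
        return model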

[Screenshot: Select prune parameters]

3. Prune the model

After setting all prune parameters, scroll to the bottom of the page and click Prune Model.

[Screenshot: Prune model]

4. Compare metrics of the original model and the pruned model
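
The page reports metrics for both models side by side. To sanity-check the result offline, here is a minimal sketch that compares per-layer nonzero counts between the two checkpoints, assuming each downloaded .pth file is a plain state_dict (the file names are placeholders).

    import torch

    def sparsity_report(original_path: str, pruned_path: str) -> None:
        """Print each layer's parameter count and the fraction left nonzero."""
        original = torch.load(original_path, map_location="cpu")
        pruned = torch.load(pruned_path, map_location="cpu")
        for name, tensor in pruned.items():
            nonzero = (tensor != 0).sum().item() / tensor.numel()
            print(f"{name}: {tensor.numel()} params, {nonzero:.1%} nonzero "
                  f"(original: {original[name].numel()} params)")

    # Example call; replace with your actual file names.
    # sparsity_report("model.pth", "model_pruned.pth")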

Quantization

This functionality is under development. Please contact us at support@ampleai.io if you have any questions or feature requests.

Distillation

This functionality is under development. Please contact us at support@ampleai.io if you have any questions or feature requests.

Status and Error Codes

Glossary

The explanations below are from Wikipedia.

  1. Pruning: Pruning is the practice of removing parameters (which may entail removing individual parameters or parameters in groups, such as by neurons) from an existing network.
  2. Quantization: In mathematics and digital signal processing, quantization is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. (A numeric sketch follows this list.)
  3. Distillation: In machine learning, knowledge distillation or model distillation is the process of transferring knowledge from a large model to a smaller one.
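
To make the quantization definition concrete, here is a tiny numeric sketch of uniform symmetric quantization, the rounding-based scheme commonly used for int8 neural-network weights; the values and scale are made up for illustration.

    import torch

    x = torch.tensor([-1.20, -0.30, 0.00, 0.45, 1.10])  # made-up float weights

    # Map floats to int8 by scaling and rounding (symmetric quantization).
    scale = x.abs().max() / 127            # one scale for the whole tensor
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)

    x_hat = q.float() * scale              # dequantize to inspect rounding error
    print(q)                               # tensor([-127, -32, 0, 48, 116], dtype=torch.int8)
    print((x - x_hat).abs().max())         # worst-case quantization error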