Quantization
Shrinking a model by storing its numbers with less precision.
Quantization can reduce memory use and make inference faster, helping models run on smaller devices. The trade-off is a possible loss of accuracy or of subtler capabilities.
For example
A quantized model fits on a laptop that could not hold the full-precision version.
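To make the memory saving and the accuracy cost concrete, here is a minimal sketch of one simple scheme: symmetric 8-bit quantization with a single shared scale. The function names and the sample weights are hypothetical and chosen only for illustration. Real toolkits may use different schemes, such as per-channel scales or 4-bit formats.

```python
from array import array

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # "b" stores each value as a signed 8-bit integer (1 byte).
    codes = array("b", (round(w / scale) for w in weights))
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the stored integers."""
    return [c * scale for c in codes]

# Made-up weights standing in for a model's parameters.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Storage cost: 32-bit floats versus 8-bit integers.
full_bytes = array("f", weights).itemsize * len(weights)
small_bytes = codes.itemsize * len(codes)

# Accuracy cost: the largest gap between an original and a restored value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"bytes: {full_bytes} -> {small_bytes}, max error: {max_error:.5f}")
```

In this sketch the integers take a quarter of the space of 32-bit floats, and each restored value differs from its original by at most half of `scale`. That small rounding error is the source of the accuracy loss described above.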