Data-Free Quantization-Aware Training Research Project Weekly Stand-up
Quantization is a technique for reducing the size of machine learning models in general, and Large Language Models in particular. It lowers the numerical precision of the weights, shrinking the amount of data so the model fits more easily in the memory of the system running inference.
Most quantized models you can find on Hugging Face are created with a technique called Post-Training Quantization, or PTQ for lack of imagination. It does pretty well.
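Roughly speaking, PTQ takes the weights of an already-trained model and rounds them onto a coarser grid. Here is a minimal sketch of the idea using symmetric int8 round-to-nearest; the function names and the per-tensor scaling are illustrative assumptions, not the scheme used by any particular model or by this project.

```python
# Sketch of post-training quantization: round trained float weights to int8.
# Illustrative only -- a single per-tensor scale, symmetric around zero.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights onto int8 levels using a per-tensor scale."""
    scale = np.abs(weights).max() / 127.0            # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)         # stand-in for trained weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max rounding error:", np.abs(w - w_hat).max())
```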
Another technique is Quantization-Aware Training, or QAT: in other words, train the model with the understanding that it will later be quantized. This is expensive, since it requires additional training, and that training in turn requires data.
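A common way to do QAT is to "fake-quantize" the weights in the forward pass while keeping a full-precision copy for the gradient updates (the straight-through estimator). The sketch below is an assumption about how such a layer might look in PyTorch, not this project's actual method; the names and the int8 scheme are made up for illustration.

```python
# Sketch of quantization-aware training: the forward pass sees fake-quantized
# weights, while gradients flow to the full-precision copy via the
# straight-through estimator. Illustrative only.
import torch
import torch.nn as nn

def fake_quant(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses w_q, backward treats it as identity.
    return w + (w_q - w).detach()

class QATLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ fake_quant(self.weight).t() + self.bias

# One toy training step: the model learns weights that survive quantization,
# which is exactly the part that normally needs training data.
layer = QATLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()
```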
But what if we could achieve Quantization-Aware Training without data?
That is what this open research project is investigating.
Check out the proposal on our GitHub here: https://github.com/aifoundry-org/.github/wiki/Proposal-on-QAT-LLM-quantization
Drop by our weekly stand-up meeting at the AIFoundry.org Discord Server to learn more and see how you can participate.