NLP acceleration with HuggingFace and ONNX Runtime
Transformer-based language models have delivered remarkable performance gains, but as model sizes grow exponentially, the cost of serving them is becoming a serious concern. BERT-base and the smallest GPT-2 each have roughly 100 million parameters, so the model size, memory bandwidth,...