optimization

Oct 5, 2020

NLP acceleration with HuggingFace and ONNX Runtime

The performance gains delivered by Transformer-based language models are remarkable, but as model sizes grow exponentially, serving cost is becoming a serious concern. BERT-base and GPT-2 each have about 100 million parameters, so the model size, memory bandwidth,...