Combining multiple network models into an ensemble improves accuracy, but in practice the increased total network size and inference time make deployment difficult.
MEAL (Multi-Model Ensemble via Adversarial Learning) applies teacher-student learning to solve this problem. Specifically, it treats several pretrained networks as multiple teachers, uses a selection module to pick one of them at each step, and then distills that teacher's knowledge into a single student network.
The shared link is the MEAL V2 GitHub repository. MEAL V2 adds several improvements, such as distilling from the teachers' ensemble directly instead of using a selection module, and excluding one-hot/hard labels from the distillation loss. As a result, ResNet-50 surpasses 80% top-1 accuracy on ImageNet-1K (224×224 single crop) without any change to the network architecture; fittingly, the paper is titled "MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks".
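The core idea above, averaging the teachers' softmax outputs into soft labels and training the student against them without one-hot targets, can be sketched in plain Python. This is a minimal illustration, not the repository's actual implementation; the function names, the 3-class logits, and the plain-KL loss (in place of the full training setup) are my own simplifications.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_labels(teacher_logits):
    # Ensemble the teachers directly: average their softmax
    # probabilities instead of selecting a single teacher.
    probs = [softmax(l) for l in teacher_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

def kl_distillation_loss(student_logits, soft_labels, eps=1e-12):
    # KL(teacher_ensemble || student): only soft labels are used;
    # no one-hot/hard labels enter the loss.
    student_probs = softmax(student_logits)
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(soft_labels, student_probs))

# Hypothetical logits for a 3-class toy problem (illustrative only).
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
student = [1.0, 0.8, -0.2]
target = ensemble_soft_labels(teachers)
loss = kl_distillation_loss(student, target)
```

In training, `loss` would be minimized over the student's parameters, pulling its output distribution toward the averaged teacher distribution.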
The code, as well as pretrained models for ResNet-50, MobileNet V3, and EfficientNet-B0, is all publicly available, so they look well suited for transfer learning.