Knowledge distillation, also known as model distillation, is a machine learning process that transfers knowledge from a large model (usually called the "teacher" model) to a smaller model (called the "student" model). The goal is a smaller model that is nearly as accurate as the larger one but has a lower computational cost. This makes the smaller model more suitable for deployment on weaker hardware, such as mobile devices. The student is trained on a dataset called the transfer set, which is different from the dataset used to train the teacher model.
Knowledge distillation trains the student model on the transfer set. The loss function is usually the cross entropy between the output of the student model and the output the teacher model produces for the same record, with both models using a high softmax temperature. There are several types of knowledge distillation techniques.
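As a minimal sketch of the loss described above, the following Python computes the cross entropy between the teacher's and the student's temperature-softened outputs for a single record. The function names, the logit values, and the temperature of 4.0 are illustrative assumptions, not taken from any particular implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross entropy between teacher and student outputs, both softened at the same temperature."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Hypothetical logits for one record of the transfer set (3 classes).
teacher_logits = [6.0, 2.0, -1.0]
close_student = [5.5, 2.2, -0.8]   # student that roughly agrees with the teacher
far_student = [-1.0, 2.0, 6.0]     # student that ranks the classes in reverse

loss_close = distillation_loss(close_student, teacher_logits)
loss_far = distillation_loss(far_student, teacher_logits)
print(loss_close < loss_far)  # True
```

The loss shrinks as the student's softened distribution approaches the teacher's, which is the behavior that training on the transfer set relies on.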
Distillation techniques differ in how knowledge is transferred through the teacher-student network, but in each case the student model learns to imitate the teacher model. Once the large deep neural network has been compressed in this way, it can be deployed on low-end hardware devices to run inference in real-world settings.
Knowledge distillation has been applied successfully in many areas of machine learning, achieving similar or even higher accuracy, including object detection, acoustic modeling, natural language processing, and graph neural networks for non-grid data. In natural language processing (NLP), it is especially popular for obtaining fast, lightweight models that are easier to train and have a lower computational cost. Other NLP use cases include neural machine translation and text generation.
Knowledge distillation is a method of extracting knowledge from complex models and compressing it into a single model so that it can be deployed in practical applications. Geoffrey Hinton, often called the "godfather of AI", and two colleagues at Google, Oriol Vinyals and Jeff Dean, introduced knowledge distillation in 2015.
Knowledge distillation refers to transferring the learned behavior of a cumbersome model (the teacher) to a smaller model (the student), where the outputs produced by the teacher serve as "soft targets" for training the student. Applying this method, the authors reported surprising results on the MNIST dataset and showed that significant improvement can be achieved by distilling the knowledge of an ensemble of models into a single model.
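To illustrate what a "soft target" looks like, the sketch below softens one set of teacher logits at two temperatures. The logit values and the helper name soft_targets are hypothetical, chosen only to show the effect. At temperature 1 the output is close to a one-hot label, while at a higher temperature the smaller probabilities become visible, and these are what carry information about how the teacher relates the classes.

```python
import math

def soft_targets(logits, temperature):
    """Softmax of the teacher logits at the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one input (3 classes).
teacher_logits = [9.0, 4.0, 1.0]

hard = soft_targets(teacher_logits, temperature=1.0)  # nearly one-hot
soft = soft_targets(teacher_logits, temperature=5.0)  # probability spread more evenly

print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

The class ranking is the same in both cases; only the sharpness of the distribution changes, so the student sees the teacher's preferences among the unlikely classes as well as its top choice.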