Large-scale deep learning language models, represented by BERT, show excellent performance on a variety of natural language tasks such as question answering, document summarization, document generation, and dialogue. Some people rate them as if they have achieved genuine language understanding.
Walid Saba, chief AI scientist at ONTOLOGIK and a specialist in natural language understanding (NLU), has posted a different perspective: language models such as BERT (and the surrounding research, collectively called BERTology) are in fact far from the concept of "understanding", and from that point of view they are not a good way to solve NLU. I share the link to the original article.
The original article makes three points: Missing Text, Intention, and Statistical Insignificance. An example illustrating the first point, Missing Text, is as follows:
Sentence: Xanadu quits graduate school to join a software company.
The sentence above is short, but it carries various implications beyond its literal meaning: for example, that Xanadu was a graduate student, that Xanadu is an adult, and that a software company was hiring. In other words, Saba points out that the given sentence alone does not convey all of this content, and that we can only speak of "understanding" when such facts can be derived through "common sense". The original article includes further examples, which are worth reading.
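The gap between what the text says and what a reader infers can be sketched as a toy rule base. The fact names and rules below are illustrative assumptions made for this sketch, not part of Saba's article:

```python
# Toy illustration of the "missing text" problem: commonsense facts that a
# reader infers but that never appear in the sentence itself.

SENTENCE_FACTS = {"quit_graduate_school", "joined_software_company"}

# Hand-written commonsense rules (illustrative assumptions): premise -> implied facts.
COMMONSENSE_RULES = {
    "quit_graduate_school": ["was_a_graduate_student", "is_an_adult"],
    "joined_software_company": ["company_was_hiring"],
}

def derive(facts):
    """Close the fact set under the commonsense rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, implications in COMMONSE_RULES.items() if False else COMMONSENSE_RULES.items():
            if premise in derived:
                for fact in implications:
                    if fact not in derived:
                        derived.add(fact)
                        changed = True
    return derived

# Facts a human infers that never occur in the surface text:
implied = sorted(derive(SENTENCE_FACTS) - SENTENCE_FACTS)
print(implied)
```

A model trained only on the surface text has nothing from which to learn the implied facts; here they exist only because the rules were written by hand.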
The second point, Intention, is that a word is not just a symbol: it is also a concept, and an object mapped to the real world, so the same word must be interpreted differently depending on what it denotes:
Sentence 1: Mary taught her little brother that 7+9=16.
Sentence 2: Mary taught her little brother that 7+9=ROOT(256).
16 and ROOT(256) have the same value, but you can guess that the first sentence is about teaching "addition" and the second about "square roots". Current language models, trained by filling in blanked-out words, cannot distinguish what is being taught in each case; to cope with this, the intention behind the words must be considered.
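The distinction can be made concrete in a few lines: two expressions can denote the same value while being different expressions. This is a minimal sketch using Python's standard `ast` module, with `256 ** 0.5` standing in for ROOT(256):

```python
import ast

# "16" and "ROOT(256)" denote the same number, but as expressions they are
# different objects; a model that only sees values loses that difference.
expr_plain, expr_root = "16", "256 ** 0.5"

same_value = eval(expr_plain) == eval(expr_root)  # compare denotations
same_expression = ast.dump(ast.parse(expr_plain)) == ast.dump(ast.parse(expr_root))  # compare structure

print(same_value)       # True: the two expressions have one value
print(same_expression)  # False: the expressions themselves differ
```

Replacing one expression with the other preserves the value but changes what the sentence teaches, which is exactly the information the author argues must be modeled.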
Finally, the third point, Statistical Insignificance, tells us that simply learning patterns from large data sets is often not enough:
Sentence 1: The trophy did not fit in the suitcase because it was too large.
Sentence 2: The trophy did not fit in the suitcase because it was too small.
A human would reason about why the trophy did not fit in the suitcase, and as a result easily see that "it" in sentence 1 refers to the trophy and "it" in sentence 2 to the suitcase. However, current language models, which rely purely on statistical pattern learning, have no explicit reasoning process: the choice between "large" and "small" carries no usable statistical signal for them, so they cannot determine whether "it" is the trophy or the suitcase.
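The point can be made mechanically: the two sentences are identical except for one adjective, so there is no surface pattern to learn from, and resolving "it" requires a reasoning step about containers. The tiny resolver below is an illustrative hand-coded assumption, not a general method:

```python
s1 = "The trophy did not fit in the suitcase because it was too large".split()
s2 = "The trophy did not fit in the suitcase because it was too small".split()

# Surface-level, the two sentences differ in exactly one token:
diff = [(a, b) for a, b in zip(s1, s2) if a != b]
print(diff)  # [('large', 'small')]

# A hand-coded reasoning step (illustrative assumption): "X did not fit in Y"
# is explained either by the content X being too large or the container Y
# being too small.
def resolve_it(adjective):
    return {"large": "trophy", "small": "suitcase"}[adjective]

print(resolve_it("large"), resolve_it("small"))
```

Because the disambiguating knowledge lives in the rule, not in the word distribution, a purely statistical learner sees two equally plausible sentences with nothing to choose between.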
The author concludes that "language is not just data." Personally, I think some of the examples in this article can be overcome with complementary measures (a common-sense knowledge graph, intention analysis, logical reasoning), but it is not easy for current language models to apply them in a unified manner. If these structural shortcomings are addressed, I think much more human-like conversation will be possible with far less data.