One recent trend is to use extremely large models, meaning models with a huge number of parameters, and to train them with conventional learning methods. Setting aside the "software" capabilities of the human brain, I wondered how far we would need to scale up to match its "hardware" capabilities. (Once the hardware capabilities are matched, comparing the algorithmic efficiency of humans vs. AI becomes a fair comparison.)
The neuron can be viewed as the basic unit that makes up the human brain, and the number of neurons is said to be about 100B.
The number of transistors on a semiconductor chip doubles roughly every two years (Moore's Law). For example, the Apple A12X Bionic used in the iPad Pro has 10B transistors, and the AMD Epyc Rome, a server CPU, has 40B. The RTX 20 series is expected to reach 20B, and the future 30 series 50B. If this continues, the number of transistors on a single chip will exceed the number of neurons within the next four years. (This does not mean that transistors and neurons play the same role.)
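The "within four years" estimate can be checked with a little arithmetic. The sketch below assumes a clean doubling every two years and uses only the transistor counts quoted above; the function name `years_to_reach` is my own, and real chips will not follow the curve this neatly.

```python
import math

NEURONS = 100e9                 # approximate neuron count quoted above
DOUBLING_PERIOD_YEARS = 2.0     # Moore's Law, taken literally (an assumption)

def years_to_reach(start_count, target=NEURONS, period=DOUBLING_PERIOD_YEARS):
    """Years until start_count grows to target when doubling every `period` years."""
    if start_count >= target:
        return 0.0
    return period * math.log2(target / start_count)

# Transistor counts as quoted in the text
chips = {
    "Apple A12X Bionic": 10e9,
    "AMD Epyc Rome": 40e9,
    "RTX 30 series (expected)": 50e9,
}

for name, count in chips.items():
    print(f"{name}: about {years_to_reach(count):.1f} years to reach 100B")
```

Starting from the 40B or 50B figures, the 100B mark is reached in well under four years; starting from the 10B mobile chip it takes longer.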
Comparing the raw numbers alone, chips are catching up with the human brain, at least on a unit-to-unit basis. Of course, a neuron performs a much more complex function than a transistor, but neurons operate at about 1 kHz whereas processors are clocked at 3 GHz or more, a speed difference of about 3 million times or more. In other words, if the behavior of one neuron can be simulated with 3 million transistors, there is no significant difference in capability between the two kinds of unit. (For reference, the Intel Pentium has about 3 million transistors.)
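The 3-million figure is just the ratio of the two rates. A minimal sketch, using the round numbers from the text (the constant names are mine):

```python
NEURON_RATE_HZ = 1_000            # about 1 kHz, as quoted above
CPU_CLOCK_HZ = 3_000_000_000      # 3 GHz
PENTIUM_TRANSISTORS = 3_000_000   # approximate count quoted above

# How many clock cycles a processor gets for each neuron "tick"
speed_ratio = CPU_CLOCK_HZ // NEURON_RATE_HZ

print(f"Clock cycles per neuron tick: {speed_ratio:,}")
print(f"Pentium transistor count:     {PENTIUM_TRANSISTORS:,}")
```

Both numbers come out to 3,000,000, which is why the Pentium makes a convenient mental picture for the transistor budget of one simulated neuron.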
A quick look at the parameter counts of deep learning models: GPT-2 has 1.5B and GPT-3 has 175B. If the number of parameters roughly corresponds to the number of basic units in a neural network, then the number of basic units in GPT-3 has already exceeded the number of neurons.
The "number of units" or "number of model parameters" that we have reached is already close to the level of the human brain (or at least the gap is not very big), so with the current structure, scaling up the number of units alone will not get us there. Even when the number of units and model parameters are the same, the question is whether innovation should go in the direction of more efficient operation.
For example, the number of neurons is 100B, but the number of connections between neurons is 100T, which is 1,000 times as many, and all of them can operate in parallel. Current processor architectures perform many operations serially. In addition, a multi-layered neural network can be seen as a model in which the layers are connected in series (whether in hardware or software). I think current neural network models reflect the behavior of individual neurons to some extent, but what if we could also build the interconnection and parallel operation of all neurons into the model?
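To put the figures side by side, here is a quick back-of-the-envelope comparison. It uses only the counts quoted in this post; treating a model parameter as comparable to a connection rather than a neuron is my own assumption, not an established equivalence.

```python
NEURONS = 100 * 10**9        # about 100B neurons
CONNECTIONS = 100 * 10**12   # about 100T connections between neurons
GPT2_PARAMS = 15 * 10**8     # 1.5B parameters
GPT3_PARAMS = 175 * 10**9    # 175B parameters

connections_per_neuron = CONNECTIONS // NEURONS
gpt3_vs_neurons = GPT3_PARAMS / NEURONS
connections_vs_gpt3 = CONNECTIONS / GPT3_PARAMS

print(f"Connections per neuron:       {connections_per_neuron:,}")
print(f"GPT-3 parameters per neuron:  {gpt3_vs_neurons:.2f}")
print(f"Connections per GPT-3 param:  {connections_vs_gpt3:.0f}")
```

Counted against neurons, GPT-3 is already ahead; counted against connections, it is still several hundred times smaller.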
The link doesn't necessarily match the above, but it is a related topic and I enjoyed reading it.