AI Scaling Laws and Human Growth
In the early days of LLMs, the scaling hypothesis was not intuitive. Scaling, in the simplest terms, is the idea that throwing more data and compute at a model will make it better. Surely super-intelligence would be achieved through clever architecture, not just... more?
However, the huge jump from GPT-2 to GPT-3 did not come from a fundamentally different design; the architecture was essentially the same, just scaled up, with far more parameters trained on far more data. The model developed a greater understanding not through cleverer engineering but through the sheer volume of what it had been exposed to. Past a certain scale, it could do things it was never explicitly taught.
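For the curious, the scaling-law literature describes this pattern as a smooth power law: test loss keeps falling as parameters, data, and compute grow. Below is a minimal sketch in Python of what that curve looks like; the exponent and constant are illustrative ballpark values chosen for the plot, not fitted results to rely on.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative power-law scaling curve: loss falls smoothly as a
# power law in parameter count N. The constants below are ballpark,
# illustrative values, not authoritative fits from any paper.
alpha = 0.076      # assumed scaling exponent
N_c = 8.8e13       # assumed normalization constant (in parameters)

N = np.logspace(6, 12, 200)    # model sizes from 1M to 1T parameters
loss = (N_c / N) ** alpha      # sketched test loss under the power law

plt.loglog(N, loss)
plt.xlabel("Model parameters (N)")
plt.ylabel("Test loss (arbitrary units)")
plt.title("Power-law scaling: 'more' keeps helping, smoothly")
plt.show()
```

The striking thing about curves like this is that nothing in the architecture changes along the x-axis; the improvement comes purely from scale.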
I learned this in my Digital Transformation class this semester, and it had me thinking about what that means for humans. After all, artificial neural networks were themselves inspired by the structure of the human brain.
So surely the brain functions in the same way when exposed to enough of the world?
Read enough history, and you find yourself thinking more critically about the news you see. Take some political science classes, and suddenly you approach problems in your business class differently. Spend a semester abroad, and you look at your home country differently than you did before.
Scaling laws cut the other way, too: volume amplifies whatever is already in the data, so a model trained on narrow, low-quality data doesn't get wiser with scale, it just gets more confident in its narrowness. If you only read one worldview or hone a single skill, you are reinforcing what you already are rather than growing.
Like the engineers who built GPT-3, you cannot predict who you will become after reading widely or living adventurously. But something will emerge that wasn't there before, and isn't that uncertainty the exciting part?