OpenAI today shipped GPT-4, its much-anticipated text-generating AI model, and it’s a curious piece of work.
GPT-4 improves upon its predecessor, GPT-3, in key ways, for example giving more factually true statements and allowing developers to prescribe its style and behavior more easily. It’s also multimodal in the sense that it can understand images, allowing it to caption and even explain in detail the contents of a photo.
But GPT-4 has serious shortcomings. Like GPT-3, the model “hallucinates” facts and makes basic reasoning errors. In one example on OpenAI’s own blog, GPT-4 describes Elvis Presley as the “son of an actor.” (Neither of his parents was an actor.)
To get a better handle on GPT-4’s development cycle and its capabilities, as well as its limitations, TechCrunch spoke with Greg Brockman, one of the co-founders of OpenAI and its president, via a video call on Tuesday.
Asked to compare GPT-4 to GPT-3, Brockman had one word: Different.
“It’s just different,” he told TechCrunch. “There’s still a lot of problems and mistakes that [the model] makes … but you can really see the jump in skill in things like calculus or law, where it went from being really bad at certain domains to actually quite good relative to humans.”
Test results support his case. On the AP Calculus BC exam, GPT-4 scores a 4 out of 5 while GPT-3 scores a 1. (GPT-3.5, the intermediate model between GPT-3 and GPT-4, also scores a 4.) And in a simulated bar exam, GPT-4 passes with a score around the top 10 percent of test takers; GPT-3.5’s score hovered around the bottom 10 percent.