Microsoft, Apple look to go big with smaller AI models | Semafor

Competition is heating up between big technology companies building smaller AI models, which can be used in a broader array of devices like smartphones, cameras, and sensors, potentially allowing them to tap more users.

This week, Microsoft and Apple launched Phi-3 and OpenELM, respectively, new language models that use less compute than the likes of OpenAI's GPT-4. The moves come as the AI industry recognizes that model size should be tailored to the application, and keeps finding ways to make smaller, cheaper LLMs more capable.

"The approach that we're taking in the Phi series is different," Sébastién Bubeck, Microsoft's vice president of generative AI research, told Semafor. "The rest of the industry seems to be mostly about scaling up, trying to add more data and keep making the model bigger." Bubeck, however, wants to squeeze as much performance out of small models as possible.

For Microsoft, investing in smaller models means it can give customers more options beyond the larger systems it offers through its partnership with OpenAI. Customers that can't afford top-tier models can turn to smaller alternatives like Phi-3.

For Apple, OpenELM is relatively slow and limited, leaving the company still behind in the AI race. But the model can run on iPhones, an ecosystem the company is keen to develop.

The trick to building small but mighty models lies in the quality of the text used to train them. Researchers at Apple filtered text from publicly available datasets, keeping only sentences that use a wider variety of words and more complex structure.
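As an illustration only, the sketch below shows what that kind of quality filtering might look like in code: it keeps sentences that are long enough and score above a vocabulary-diversity threshold. The heuristic and the cutoffs are assumptions chosen for the example, not Apple's published OpenELM pipeline.

```python
# Minimal sketch of quality filtering in the spirit described above:
# keep only sentences that are long enough and use a varied vocabulary.
# The scoring heuristic and thresholds are illustrative assumptions.

def lexical_diversity(sentence: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = sentence.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def keep_sentence(sentence: str, min_words: int = 8, min_diversity: float = 0.6) -> bool:
    """Keep sentences that are reasonably long and lexically varied."""
    return len(sentence.split()) >= min_words and lexical_diversity(sentence) >= min_diversity

def filter_corpus(sentences: list[str]) -> list[str]:
    """Apply the filter across a raw corpus of candidate sentences."""
    return [s for s in sentences if keep_sentence(s)]

if __name__ == "__main__":
    raw = [
        "the cat sat on the mat and the dog sat on the mat",
        "Sparse attention lets transformers process long documents without quadratic memory cost.",
    ]
    print(filter_corpus(raw))  # only the second, more varied sentence survives
```

Real pipelines layer many such signals on top of one another, but the principle is the same: score the text and keep the richer examples.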

Microsoft used a mixture of real data scraped from the web and synthetic data generated by AI to train Phi-3. Prompting models to produce data means developers can better control the text used for training. "The reason why Phi-3 is so good for its given size is because we have crafted the data much more carefully," Bubeck said.
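By way of illustration, the sketch below shows one way a developer might mix web-scraped text with synthetic passages produced by prompting a model. The generate_text stand-in, the prompt template, and the mixing ratio are all assumptions made for the example, not Microsoft's Phi-3 recipe.

```python
# Illustrative sketch of mixing scraped and synthetic training text.
# `generate_text` is a stand-in for whichever LLM API a developer uses;
# the prompt template and mixing ratio are assumptions, not Phi-3's recipe.
import random

def generate_text(prompt: str) -> str:
    """Placeholder for a real model call; returns a stub string here."""
    return f"[synthetic passage for prompt: {prompt!r}]"

def synthesize_examples(topics: list[str]) -> list[str]:
    """Prompt for controlled, textbook-style passages on chosen topics."""
    template = "Write a short, clear explanation of {topic} for a beginner."
    return [generate_text(template.format(topic=t)) for t in topics]

def build_training_mix(web_text: list[str], synthetic_text: list[str],
                       synthetic_fraction: float = 0.3, seed: int = 0) -> list[str]:
    """Combine real and synthetic text at a chosen ratio, then shuffle."""
    n_synthetic = int(len(web_text) * synthetic_fraction)
    mix = web_text + synthetic_text[:n_synthetic]
    random.Random(seed).shuffle(mix)
    return mix

if __name__ == "__main__":
    web = ["passage scraped from the web"] * 10
    synthetic = synthesize_examples(["photosynthesis", "binary search", "supply and demand"])
    print(len(build_training_mix(web, synthetic)))  # 13 documents in the mix
```

The appeal of prompting for data, as Bubeck describes, is control: developers decide the topics, style, and difficulty of the text rather than taking whatever the web happens to contain.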

It's unclear how much data is needed to make a model as powerful as possible, or what capabilities might emerge as these models improve. What the new small models do show is that there are significant performance gains to be had by training on higher-quality data rather than on raw text scraped from the internet.

"This kind of iterative process of finding the right complexity of data for model size is a journey that the community hasn't really embarked on yet," Bubeck said. "And this is why we released these models. It's to empower all of the developers to use them to see how far you can go once you get into this data optimal regime."
