In May, I had the opportunity to attend the Fully Connected conference in London. The conference was run by Weights & Biases, a company that provides tools and services to help machine learning (ML) engineers manage the complexity of training and deploying models.
They’ve expanded their view of the market for ML tools and services to reflect the ways that it is maturing. Some of these ideas are influencing the way that I think about how companies develop and integrate ML.
I’ve been following them and using their tools off and on since shortly after they began in 2017. Their initial tools were a great support for people training machine learning models. At that time, there had been huge progress in vision problems, like classification and object detection using deep convolutional neural nets (CNNs). Foundation vision models were available in a flourishing array of architectures. They provided a starting point and could recognize many types of objects, like cats, flowers, buses, or people. Fine-tuning such a pre-trained foundation model for a new task can be relatively straightforward.
For many practical tasks, you can get good results by fine-tuning with hundreds or a few thousand training examples. Of course there is more to it. In addition to the training data, there is a large space of hyperparameters that you can select from to try to improve on your current best result. For example, you can select different model architectures, model sizes, learning rates, optimizers, batch size, number of epochs, size of input images, and more.
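To give a feel for how large that space is, here is a minimal sketch in plain Python of sampling distinct combinations from a hyperparameter grid rather than enumerating all of them. The hyperparameter names and values are illustrative assumptions, not taken from any particular framework:

```python
import random

# Hypothetical hyperparameter grid for fine-tuning a vision model
# (names and values are made up for illustration).
grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
    "epochs": [5, 10],
    "image_size": [224, 320],
}

def random_configs(grid, n, seed=0):
    """Sample n distinct hyperparameter combinations uniformly at random."""
    rng = random.Random(seed)
    keys = list(grid)
    seen, configs = set(), []
    while len(configs) < n:
        combo = tuple(rng.choice(grid[k]) for k in keys)
        if combo not in seen:  # skip combinations already drawn
            seen.add(combo)
            configs.append(dict(zip(keys, combo)))
    return configs

for cfg in random_configs(grid, 3):
    print(cfg)
```

Even this small grid contains 36 combinations; add a few more architectures or optimizers and exhaustive search quickly stops being practical, which is why random or guided search is common.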
I guess that just about everyone who was training models in 2017 spent time wandering through the hyperparameter space – losing their notes, forgetting which combinations of hyperparameters and models they had run before, which datasets they had trained on, and wondering which combinations were most effective. In any case, I know that I did! To be fair, there was a lot to track, and not everything that needed tracking was obvious.
Weights & Biases provides services that help ML engineers manage this complexity. It made a huge difference in making sense of model training. With only a few lines of code, you could readily log and monitor a training run. They captured most of the relevant hyperparameters and logged both ML information (e.g., loss, epoch) and system information (e.g., GPU usage and temperature). They also captured thumbnails of sampled input and output images, which helped a lot in understanding what was going on and in catching certain sorts of bugs. They also offer hyperparameter search capabilities to help you explore the space of possibilities systematically. One of the things I liked best about their tools was how robust they were and how little impact they had on the servers I used to train models.
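Before tools like this, that bookkeeping was typically done by hand. The sketch below (standard-library Python only; the hyperparameters and metrics are made up) records a run's configuration and per-step metrics to a JSON file. It is roughly the record-keeping that W&B's few lines of instrumentation automate and centralize:

```python
import json
import time
from pathlib import Path

class RunLogger:
    """Minimal hand-rolled experiment tracker: one JSON file per run."""

    def __init__(self, run_dir, config):
        self.record = {"config": config, "started": time.time(), "history": []}
        self.path = Path(run_dir) / "run.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, **metrics):
        """Append one step's metrics (e.g. loss, epoch, GPU temperature)."""
        self.record["history"].append(metrics)

    def finish(self):
        """Write the full run record to disk."""
        self.path.write_text(json.dumps(self.record, indent=2))

# Hypothetical usage inside a training loop.
logger = RunLogger("runs/exp-001", {"learning_rate": 1e-3, "batch_size": 32})
for epoch in range(3):
    logger.log(epoch=epoch, loss=1.0 / (epoch + 1))  # stand-in for a real loss
logger.finish()
```

A hosted tracker replaces this with a couple of library calls per run, and adds the dashboards, system metrics, and image thumbnails described above for free.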
Weights & Biases has made substantial progress since those early days, and I was impressed by the clarity with which they see the market developing. They used to target people who were training small models or perhaps fine-tuning medium-sized ones. Today they see their market as having three components:
- Foundation model builders. These are the people who train large models from scratch. They typically work for large corporations or well-funded university research groups. Today, training these large models on internet-scale datasets can cost millions of dollars. Perhaps a few thousand people work in this prestige market, including engineers at Google, Meta, OpenAI, and Anthropic.
- Specialized model builders. These are the people taking a foundation model (either a large language model (LLM) or a vision model) and adapting it for a specialized setting. For example, further training a vision model to identify cancerous growths in scans, or to segment an image into subject and background. These settings often have stringent operational requirements and need both additional training and rigorous evaluation. You might see specialized models being developed for many markets, including hospitals, law firms, and retailers. Perhaps a hundred thousand people work in this market, addressing specific sector and organisational needs.
- Application builders. This is the fastest-growing market – people building applications that include an off-the-shelf ML/AI component as just another widget that helps them to solve a business problem or make their users happier. There could eventually be millions of these people building everything from in-house tools to scalable software-as-a-service offerings.
Providers targeting ML developers will benefit from recognizing these as three distinct markets.