Timeline of Recent LLM Releases

Categories: ai, llm, tech

Author: Josh Baker
Published: August 10, 2024
Modified: August 11, 2024

Intro

Recently there seems to have been a truly dizzying number of LLM releases. I normally come across these announcements on X or Hacker News, and I find it increasingly difficult to keep track of who released which model and when. I find it even harder to fit each new release into the larger context.

  • What version of Llama are we on?
  • Did Qwen 2 come out before or after Phi-3?
  • When might we be due for the next release from Anthropic?

The Dataset

I searched for a dataset with organization, model series, model name, release date, and number of parameters, but had trouble finding something comprehensive. So I set about curating my own.

I focused on “open” series, defined here as those that support commercial use in most cases. I also threw in some of the primary closed models for additional context. To keep the visualization interpretable I avoided recording every model and focused on “high profile” releases.

Wikipedia’s list of LLMs was a great starting point. Ollama and Hugging Face model cards helped fill in some gaps, along with release pages from the organizations themselves.
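
To give a sense of the shape of the data, here is a minimal sketch of how such a dataset could be laid out as a pandas DataFrame. The column names and example rows are my own illustration, not the exact schema or contents behind the figure below.

```python
import pandas as pd

# Illustrative schema only -- column names and rows are hypothetical examples,
# not the actual curated dataset.
models = pd.DataFrame(
    [
        # organization, series, model, release_date, parameters (B), commercial use
        ("Meta", "Llama 3", "Llama 3 8B", "2024-04-18", 8.0, True),
        ("Meta", "Llama 3", "Llama 3 70B", "2024-04-18", 70.0, True),
        ("Alibaba", "Qwen2", "Qwen2 7B", "2024-06-06", 7.0, True),
        ("Microsoft", "Phi-3", "Phi-3-mini", "2024-04-23", 3.8, True),
    ],
    columns=[
        "organization", "series", "model",
        "release_date", "parameters_b", "commercial_use",
    ],
)
models["release_date"] = pd.to_datetime(models["release_date"])
```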

Timeline

I quite like timeline visualizations for making sense of the world.

I’m a big fan of the plotly.express.timeline implementation in particular. px.timeline yields an interactive, web-based visualization with filtering, zooming, and hover capabilities. It has sensible defaults for a quick first draft, but is also highly configurable to enable fine-tuning of the visualization.
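
As a rough sketch of the kind of call involved, using the hypothetical DataFrame above: px.timeline expects both a start and an end date, so giving each point-in-time release a fixed bar width is one simple workaround. This is an illustration, not the exact code behind the figure.

```python
import pandas as pd
import plotly.express as px

# px.timeline draws bars between x_start and x_end, so point-in-time releases
# need an artificial end date; ~2 weeks gives each bar a visible width.
models["release_end"] = models["release_date"] + pd.Timedelta(days=14)

fig = px.timeline(
    models,
    x_start="release_date",
    x_end="release_end",
    y="organization",   # one swimlane per releasing organization
    color="series",     # bars colored by model series
    hover_name="model",
    hover_data={"release_date": True, "parameters_b": True, "release_end": False},
)
fig.update_yaxes(autorange="reversed")  # list organizations top-to-bottom
fig.show()
```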


Figure 1: Timeline of Recent LLM Releases

As you can see above, each releasing organization gets its own swimlane. Each model is represented by a bar and colored by its model series. Cross-hatches indicate that commercial use is not allowed under the license. Additional metadata, including the exact release date and model sizes, becomes visible when hovering over the individual bars.
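
Recent plotly versions support fill patterns on bar-like traces, so one way to approximate the cross-hatching (assuming px.timeline accepts pattern_shape, as px.bar does) is to map a license column to a pattern. Again, a sketch built on the hypothetical DataFrame above rather than the exact code behind the figure:

```python
# Label each row by license type, then map that label to a fill pattern:
# solid for commercial-friendly licenses, cross-hatched otherwise.
models["license"] = models["commercial_use"].map(
    {True: "commercial use allowed", False: "non-commercial"}
)

fig = px.timeline(
    models,
    x_start="release_date",
    x_end="release_end",
    y="organization",
    color="series",
    pattern_shape="license",
    pattern_shape_map={"commercial use allowed": "", "non-commercial": "x"},
    hover_name="model",
    hover_data={"release_date": True, "parameters_b": True, "release_end": False},
)
fig.update_yaxes(autorange="reversed")
fig.show()
```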

I like how the timeline allows me to see the cadence of releases from each organization and how the release rate is generally speeding up across organizations (see Microsoft’s Phi and Alibaba’s Qwen series).

Next steps

Now that the initial dataset and visualization are built, keeping them up to date should be easy, and I’ll likely refresh them every few months.

I think there are also some interesting extensions to this work:

  • Prompt an LLM to generate the dataset. Knowledge cut-offs mean LLMs are blind to the recent past, so I’d have to configure the LLM to use web browsing tools (say with Ollama or LangChain) to query for the very latest announcements. If the data curation process can be automated, then generating similar timelines for other families of models (such as speech-to-text, image generation, etc.) would be feasible. A rough sketch of the extraction step appears after this list.
  • I’m also interested in generating other types of visualizations that I haven’t yet come across, for example:
    • a scatter plot visualization of model performance versus quantized model size to aid in model selection. Is full size 16-bit Llama3 8B better or worse than 4-bit Llama3 70B?
    • tree diagrams to illustrate the lineage of models through pre-training, fine-tuning, RLHF, quantization, and any other steps.
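
On the first of those ideas, here is a minimal sketch of the extraction step using the ollama Python client. The model name, prompt wording, and output fields are my own assumptions, and the web-browsing/tool-use piece that would fetch the announcements is left out.

```python
import json

import ollama  # assumes the ollama Python client and a local Ollama server


def extract_release_info(announcement_text: str) -> dict:
    """Hypothetical helper: pull structured release metadata out of the text
    of a model announcement. Field names mirror the dataset sketch above."""
    response = ollama.chat(
        model="llama3",   # illustrative choice of local model
        format="json",    # ask for a JSON-only reply
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract the releasing organization, model series, model name, "
                    "release date (YYYY-MM-DD), and parameter count in billions "
                    "from the announcement. Respond with one JSON object using the "
                    "keys: organization, series, model, release_date, parameters_b."
                ),
            },
            {"role": "user", "content": announcement_text},
        ],
    )
    return json.loads(response["message"]["content"])
```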

Conclusion

Please let me know if you have any feedback! I’m interested to know if others find this useful, if I’m missing any major models, or if you have any other ideas for extensions!