Hi, I'm Matt. toodle represents me and the projects, applications, tools and other bits and pieces I'm playing around with.
Why toodle? Because according to a popular LLM, it is a word that is "memorable, pleasant, and conveys a warmth that would likely be appealing for a personal portfolio".
Latest developments and exciting updates.
Projects within the toodle website family.
The UK's Leading Competition Directory
Tracking strikes
Live Music in Sheffield
Mass market competitions, anything and everything
Yoga for your soul
Watching what Donald Trump is getting up to
What I've been up to
Thursday, 10 July 2025
A little research into using Macs for running LLMs locally.
For short conversations, roughly 20% of the processing time goes to prompt processing and 80% to generating the response tokens. Prompt processing is compute-bound, but token generation is memory-bandwidth-bound, which is why bandwidth is the headline figure for these chips.
Going by this summary, the M1 Max and Ultra, despite being five years old, are still competitive for token generation: at 400GB/s and 800GB/s they match the M2 Max & Ultra and the M3 Max & Ultra, and are only barely surpassed by the M4 Max & Ultra (410GB/s and 820GB/s).
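To see why bandwidth maps so directly onto generation speed: every generated token streams the model's weights through memory once, so a crude upper bound on tokens per second is bandwidth divided by model size. A minimal sketch of that arithmetic, using the bandwidth figures quoted above and an assumed ~7GB 8-bit 7B model (real throughput will be lower):

```python
# Back-of-envelope: token generation is memory-bandwidth-bound, so the
# ceiling on tokens/sec is roughly memory bandwidth divided by the bytes
# streamed per generated token (approximately the size of the weights).

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound estimate: each token reads all weights once."""
    return bandwidth_gb_s / model_size_gb

# Illustrative only: a 7B model at 8-bit quantization is ~7GB of weights.
for chip, bandwidth in [("M1 Max", 400), ("M1 Ultra", 800), ("M4 Max", 410)]:
    print(f"{chip}: ~{tokens_per_second(bandwidth, 7.0):.0f} tok/s (7B @ 8-bit)")
```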
While memory bandwidth determines how fast you can run models, RAM size determines which models you can run at all.
Modern LLMs require substantial memory: a 7B-parameter model needs roughly 14GB of RAM at 16-bit precision, or about 7GB with 8-bit quantization. The M1/M2 base models with 8GB of RAM can only handle smaller quantized models, while the 32GB+ Max and Ultra configurations can comfortably run 13B models, and even some 30B models with aggressive quantization.
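Those figures fall straight out of the parameter count: weight bytes are roughly parameters times bits per parameter divided by 8, before adding headroom for the KV cache and the OS. A quick sketch of the floor estimates (actual usage runs higher):

```python
# Rough weight footprint: parameter count x bytes per parameter.
# Ignores the KV cache and runtime overhead, so treat these as floors.

def model_ram_gb(params_billions: float, bits_per_param: int) -> float:
    # 1e9 params x (bits/8) bytes, divided by 1e9 bytes/GB, cancels neatly.
    return params_billions * bits_per_param / 8

for params in (7, 13, 30):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{model_ram_gb(params, bits):.1f}GB")
```

At 4-bit, a 30B model shrinks to roughly 15GB of weights, which is how the 32GB machines squeeze those models in.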
For local LLM work, that older M1 Max or Ultra might actually be a better buy than the latest M4, especially on the used market, where you can get serious memory bandwidth and capacity for much less.
NVIDIA GPUs, however, still offer far higher memory bandwidth, though they also draw much more power.
Anything that doesn't fit anywhere else
Music that inspires and drives progress
Healthy body. Healthy mind.