>, making local LLMs far more viable on future Macs
If you want to run on a Mac Pro or iMac, this will be fine, but at those price points you'd be silly to spend money on either when you can get two Nvidia cards with the same amount of RAM, and that will be dedicated RAM.
For portable Apple devices, the max memory you can get currently is 24GB IIRC, and that's probably not going to change any time soon. The only decent model that can run locally is Gemma 27B QAT, which will eat up 17GB at a minimum, and that model really struggles with some things you can do for free on ChatGPT or Gemini
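For a rough sense of where that ~17GB figure comes from, here is a back-of-the-envelope sketch; the parameter count, bytes-per-weight, and overhead numbers are assumptions for illustration, and the real footprint depends on context length and runtime.

    # Rough sketch: why a 4-bit (QAT) 27B model needs roughly 17GB.
    # All figures are approximate assumptions, not measurements.
    params = 27e9              # Gemma 27B parameter count (approx.)
    bytes_per_weight = 0.5     # 4-bit quantized weights
    weights_gb = params * bytes_per_weight / 1e9   # ~13.5 GB of weights
    kv_cache_and_overhead_gb = 3.5                 # assumed KV cache + runtime overhead
    print(f"~{weights_gb + kv_cache_and_overhead_gb:.1f} GB total")   # ~17 GB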
So yeah, speed is not gonna matter when results are shit.
24GB is the minimum RAM in the Pro lineup; the max is 128GB, which can run 200B models.
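The same 4-bit arithmetic, generalized, shows why a ~200B model can plausibly fit in 128GB of unified memory. Again a sketch with an assumed overhead figure, not a measurement, and the OS reserves part of that memory for itself.

    # Rough sketch: does an N-billion-parameter model fit in a given memory budget at 4-bit?
    # The overhead figure is an assumption; real usage depends on context length and runtime.
    def approx_footprint_gb(params_billion, bytes_per_weight=0.5, overhead_gb=10):
        return params_billion * bytes_per_weight + overhead_gb

    for params_b, budget_gb in [(27, 24), (200, 128)]:
        need = approx_footprint_gb(params_b)
        print(f"{params_b}B @ 4-bit: ~{need:.0f} GB needed vs {budget_gb} GB available")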
The Mac Studio with the Ultra chip has up to 512GB and, while expensive, offers more than twice the memory of any GPU alternative at a similar price point.
What was holding it back was the lack of matmul acceleration, which it seems will change soon. Nvidia cards will likely still be faster and have better support, but at a very big premium (ironic that Apple is the cheaper option for once).
The first SoC to include a Neural Engine was the A11 Bionic, used in the iPhone 8, 8 Plus, and iPhone X, introduced in 2017. Since then, every Apple A-series SoC has included a Neural Engine.
It isn’t cheap, but you can buy a 16-inch MacBook Pro with 128GB of unified memory today.
Useful /r/LocalLlama discussion: https://www.reddit.com/r/LocalLLaMA/comments/1ncprrq/apple_a...