What will be the lifetime of AVX512? There have been many similar extensions before it. So it's a great result, but heavily marked by the target platform. I have the hope that RISC-V vector extensions will prove to be the more durable substrate to develop on, and a result there would be much more relevant for the future.
AVX-512, originally called the "Larrabee New Instructions", has been the only decent vector extension of the Intel-AMD ISA: the only one that was coherently planned instead of being a heap of more or less randomly chosen instructions, each thought to be useful for accelerating some particular benchmark or a certain workload of one of the big customers.
MMX (Pentium MMX, 1997) sucked badly (because in designing it ease of implementation was prioritized over usefulness), SSE (Pentium III, 1999) was much worse than the simultaneously launched Motorola AltiVec, and AVX (Sandy Bridge, 2011) was much worse than the simultaneously developed Larrabee New Instructions (despite the fact that Sandy Bridge was developed by the A-team, while Larrabee was developed by the C- or D-team, which however had hired competent consultants from outside Intel, experienced in programming games and graphics applications).
AVX-512 is for now better than any competing vector ISA, both in the achievable energy efficiency and in the achievable performance. Obviously, it is possible that some future Aarch64 (Arm) or even RISC-V CPUs will change this, by implementing wider registers and execution units and by adding any missing operations.
The SME ISA extension (Scalable Matrix Extension), which is available in the latest Apple CPUs and in the current 2026 generation of Arm C1 CPUs, has the potential to be more efficient than AVX-512, exploiting the fact that the current Intel AMX ISA is intended only for ML/AI and not also for general-purpose computing. Nonetheless this may happen only in a rather distant future, because neither Apple nor Qualcomm nor Arm seems interested in making products suitable for the needs of technical and scientific computing, as Intel and AMD do. Because of that, in the existing CPUs with SME the ratio between SME execution units and the general-purpose CPU cores is low, resulting in a low total throughput.
It will be literal decades before RISC-V becomes mainstream. Not because it’s not a perfectly fine ISA, but because business incentive structures are nowhere near supporting it.
Literal man-millennia have been poured into writing software for both x86 and ARM, and nobody seems close to designing a competitive RISC-V chip.
I wonder if this can be categorized as a galactic algorithm. I can't imagine systems where the bulk of processing goes into integer to decimal string conversion, but maybe there are such.
My understanding of a Galactic Algorithm is that it has better performance scaling based on input size/complexity, but its overhead is such that it will not actually be faster unless you use it for impractically large inputs.
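To make the scaling-versus-overhead distinction concrete, here is a minimal sketch with made-up cost models. Both cost functions and the constant factor are illustrative assumptions, not measurements of any real algorithm. It searches for the input size at which an asymptotically better algorithm with a huge constant factor starts to win.

```python
import math
from typing import Optional


def simple_cost(n: int) -> float:
    """Hypothetical cost of a simple quadratic algorithm."""
    return float(n * n)


def galactic_cost(n: int) -> float:
    """Hypothetical cost of an asymptotically better algorithm
    that carries an enormous (assumed) constant factor."""
    return 1e12 * n * math.log2(n + 1)


def crossover(limit: int = 1 << 80) -> Optional[int]:
    """Return the smallest power of two at which the 'galactic'
    algorithm becomes cheaper, or None if that never happens below limit."""
    n = 2
    while n <= limit:
        if galactic_cost(n) < simple_cost(n):
            return n
        n *= 2
    return None
```

With these assumed constants the better-scaling algorithm only wins for inputs in the tens of trillions of elements, which is the sense in which such an algorithm is "galactic". An algorithm that already wins at three digits is a different situation.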
I don’t think it has much to do with the case of an algorithm that offers a faster solution to a problem that is rarely a bottleneck (not sure if that’s true in this case anyway).
It takes a substantial amount of time when emitting lots of numbers in JSON, which happens very commonly.
And this algorithm has low constant costs, and does not take dramatically more icache than the simple versions. There is no reason not to use this if your compile target can handle avx-512.
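For reference, the "simple version" that a SIMD routine competes with is usually a digit-at-a-time loop. The sketch below is a generic scalar baseline written for illustration; the name `u64_to_decimal` is hypothetical and this is not the code from the linked article. Each digit depends on the result of the previous division, which is the serial dependency chain a vectorized routine tries to break.

```python
def u64_to_decimal(value: int) -> str:
    """Convert an unsigned 64-bit integer to decimal text,
    producing one digit per division by 10 (least significant first)."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    buf = bytearray(20)  # 2**64 - 1 has 20 decimal digits
    pos = len(buf)
    while True:
        value, digit = divmod(value, 10)
        pos -= 1
        buf[pos] = ord("0") + digit  # fill the buffer from the right
        if value == 0:
            break
    return buf[pos:].decode("ascii")
```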
It’s faster for 3 digits and more. 3 digits is not galactic scale. Otoh, if over half of your numbers are single digits, it will lose to other implementations. I think that is more often the case than we’d like it to be.
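Whether the 3-digit break-even helps depends on the data being serialized. A rough way to check, taking the threshold of 3 digits quoted here at face value and using hypothetical helper names, is to measure what fraction of a sample has at least that many digits:

```python
from typing import Iterable


def decimal_digits(value: int) -> int:
    """Number of decimal digits in a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return len(str(value))


def share_with_min_digits(values: Iterable[int], min_digits: int = 3) -> float:
    """Fraction of values that have at least min_digits decimal digits."""
    items = list(values)
    if not items:
        return 0.0
    hits = sum(1 for v in items if decimal_digits(v) >= min_digits)
    return hits / len(items)


# Illustrative sample only; real workloads should be profiled.
sample = [0, 7, 3, 42, 1024, 5, 9, 65535]
share = share_with_min_digits(sample)
```

If the share is small, most conversions fall below the break-even point and a short scalar path for one- and two-digit values is likely the better default.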
I always use binary interchange formats between programs so I am not familiar with the overhead caused by format conversions. Even when displaying numbers for reading them, in the case of floating-point numbers that are displayed in the "scientific" format, i.e. with exponents, I prefer to have only the exponent as a decimal number, but the significand as a hexadecimal number. So I do not need fast algorithms for number conversions.
Nonetheless, there are plenty of people who advocate the use of JSON, XML and similar formats, in which case I assume that number conversions can take a non-negligible time, which might be decreased by such fast algorithms.
You know, if I could change the code at both ends of the pipeline without overhead, using the language & library of my choice, I’d do this too. For many of us this isn’t always the case.
There already exists a large installed base of AMD Zen 4 and Zen 5 CPUs.
Next year, these AVX-512 supporting CPUs will be joined by AMD Zen 6 and Intel Nova Lake. Starting with Intel Nova Lake, all future Intel CPUs will support AVX-512.
I don’t think that’s correct, Intel is transitioning to AVX10, which is essentially the instruction set of AVX-512 but without mandating 512-bit vector width. Future E cores, afaik, will still only be capable of 256-bit vector ops. EDIT: ok maybe not, it sounds like that was the plan a year or so ago but newer articles are saying future E cores will actually support 512b.
About half a year ago Intel announced that they will mandate the 512-bit vector width and full AVX-512 support in all future CPUs, starting with Nova Lake.
Obviously, they were forced to do this to align with AMD. Moreover, Intel has announced that they will coordinate future ISA extensions with AMD and with the major customers, so that all future Intel and AMD CPUs will remain mostly ISA compatible, at least for user applications.
Not long ago, a joint AMD-Intel whitepaper was published about the future "AI Compute Extensions for x86", which will be present in future AMD and Intel CPUs for accelerating AI inference and which extend the AVX-512 ISA. They are similar to the Advanced Matrix Extension currently supported by some of the Intel server CPUs, but the new ISA extensions are more compatible with AVX-512.
This document demonstrates that at least for now Intel and AMD have understood that implementing a compatible ISA is their greatest moat against Arm and other competitors, so they had better coordinate their extensions rather than pull in different directions.
The problem is AVX-512 was disabled in later Intel Alder Lake CPUs, and later generation Intel desktop CPUs, so very few Intel desktop CPUs have AVX-512 now. Ironic that AMD has better support/performance for an ISA extension that Intel invented.
Sure, it's not just the support though. As I understand it, it also has serious power and frequency implications. Also, if your process uses AVX-512 you suddenly have an extra 2kB of data to save/restore on context switches. Maybe not super significant, but I really doubt this will ever make it into standard libraries.
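The 2 kB figure can be checked with simple arithmetic from the architectural register counts in 64-bit mode: 32 ZMM registers of 64 bytes each and 8 opmask registers of 8 bytes each. The sketch below only counts register bytes. How a given kernel lays out its save area, and whether it avoids saving unused state, is outside its scope.

```python
ZMM_REGISTERS = 32     # zmm0..zmm31 in 64-bit mode
ZMM_BYTES = 64         # 512 bits each
YMM_REGISTERS = 16     # ymm0..ymm15, already saved for AVX/AVX2 code
YMM_BYTES = 32         # 256 bits each
OPMASK_REGISTERS = 8   # k0..k7
OPMASK_BYTES = 8       # 64 bits each


def zmm_file_bytes() -> int:
    """Total size of the ZMM register file."""
    return ZMM_REGISTERS * ZMM_BYTES


def extra_bytes_over_avx() -> int:
    """State added on top of what an AVX/AVX2 process already carries:
    the full ZMM file plus opmasks, minus the 16 YMM registers."""
    avx_bytes = YMM_REGISTERS * YMM_BYTES
    return zmm_file_bytes() + OPMASK_REGISTERS * OPMASK_BYTES - avx_bytes
```

The full ZMM file is 2048 bytes, matching the 2 kB quoted above. Relative to a process that already uses 256-bit AVX, the increment is 1600 bytes.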
AVX-512 is being discontinued in newer Intel consumer CPUs, particularly with the Alder Lake series, where it has been completely disabled through BIOS updates.
AVX-512 had been discontinued in the CPU generations from Alder Lake up to and including the Panther Lake, Wildcat Lake and Clearwater Forest CPUs introduced during the first half of 2026, but Intel has committed that all future Intel CPUs will implement the complete 512-bit variant of the AVX-512 (a.k.a. AVX10) ISA, starting with the Nova Lake desktop and laptop CPUs, to be launched by the end of this year.
Obviously, the competition from the AMD Zen 4, Zen 5 and Zen 6 CPUs, all of which implement AVX-512 and easily beat any Intel CPU in any workload that has been updated to use the AVX-512 ISA, has forced Intel to reconsider its previous decision.
Of all the workloads that I execute on my laptops or desktops, there is only one where the speed matters yet is not significantly affected by the use of the AVX-512 ISA: the compilation of big software projects.
All the other things that I do that can take noticeable CPU time (i.e. not time spent waiting on SSDs or other peripherals) can be accelerated by AVX-512. This includes things like computing file hashes, data compression and encryption algorithms, graphics/audio/video algorithms and also EDA/CAD applications.
SIMD-accelerated integer-to-string conversion https://lemire.me/blog/2026/05/18/simd-accelerated-integer-t...
Other speedy things:
On-Demand JSON: A Better Way to Parse Documents? https://lemire.me/en/publication/arxiv231217149/
Parsing Millions of URLs per Second https://lemire.me/en/publication/arxiv231110533/
Transcoding Unicode Characters with AVX-512 Instructions https://lemire.me/en/publication/arxiv221205098/
https://en.wikipedia.org/wiki/Galactic_algorithm