Last summer, I had the opportunity to give the keynote at HIPS 2025—the 30th International Workshop on High-Level Parallel Programming Models and Supportive Environments. This was quite an honor since, over its history, HIPS has been a key workshop for projects like Chapel that strive to create high-level parallel programming models[note:For readers unfamiliar with HIPS, its publications focus on high-level programming of multiprocessors, compute clusters, and massively parallel machines via language design, compilers, runtime systems, and programming tools. A long-term refrain from its call for papers has been “We especially invite papers demonstrating innovative approaches in the area of emerging programming models for large-scale parallel systems and many-core architectures.”].

To commemorate the 30th instance of HIPS, I took the approach of using my talk to reflect on the past 30 years of programming within the field of HPC, or High-Performance Computing. This was a sobering exercise, but one that was well-received. In November, I reprised the talk in a condensed lightning talk format for CLSAC 2025. In this blog article, I’ll attempt to capture some of the main elements of those talks for a wider audience.

30 Years of Top HPC Systems

Like so many “n years of HPC” retrospectives, let’s start by looking to the TOP500[note:The TOP500 is a ranking of HPC systems, as measured by their performance on the Linpack benchmark. All TOP500 results and images in this article originate from top500.org and are used with permission. Note that I’ve updated the original talk contents to reflect the latest results from November 2025.] to see how HPC systems themselves have changed over the past three decades. For simplicity, I’ll just focus on the top five systems from each list.

Top HPC Systems in 1995

Browsing the results from 30 years ago—November 1995—we see that systems from Fujitsu, Intel, and Cray make up the top five, where their network interconnects used crossbar, 2D mesh, and 3D torus topologies, respectively. Core counts ranged from 80 to 3,680, and performance as measured by Rmax values ranged from 98.9 to 170 GFlop/s. The following screenshot from the TOP500 website summarizes these systems and results:

[Screenshot: the top five systems from the November 1995 TOP500 list]

Top HPC Systems Today

Jumping forward to the latest TOP500 list, published in November 2025, we see systems from HPE Cray, Eviden/Bull, and Microsoft. These systems use Slingshot-11 and InfiniBand NDR interconnects, with topologies based on dragonfly[+] and/or fat-trees. Core counts have jumped to the millions (2,073,600–11,340,000 cores), and Rmax values range from 561 to 1809 PFlop/s:

[Screenshot: the top five systems from the November 2025 TOP500 list]

HPC Systems: Then vs. Now

Summarizing the changes over these 30 years, core counts have increased by factors of 100s to 100s of thousands, while performance has improved by factors of millions to 10s of millions—a massive improvement!

| | 1995 top 5 | 2025 top 5 | Delta |
|---|---|---|---|
| Cores | 80–3680 | 2,073,600–11,340,000 | ~563–141,750× |
| Rmax | 98.9–170 GFlop/s | 561.2–1809 PFlop/s | ~3,300,000–18,300,000× |
| Vendors | Fujitsu, Intel, Cray | HPE, Eviden, Microsoft | |
| Networks | crossbar, mesh, torus | dragonfly[+], fat-trees | higher-radix, lower-diameter |


Million-fold improvements like these don’t happen without significant effort, even with the passage of decades; so it’s worth reflecting on what changes in hardware and HPC system architecture took place over this period to generate the massive gains seen here. Though I’m not a hardware architect, from my perspective, the main factors have been the adoption of vector instructions, the growth to multicore and chiplet-based processor designs, the advent of GPUs, and the move to high-radix, low-diameter networks.

Beyond the performance improvements that can be attributed to these changes, it’s interesting to consider their impacts on programmers. Specifically, which changes have made HPC programming easier, and which have made it harder? Think about your answers, and I’ll return to this question in a bit.

30 Years of HPC Programming

Next, let’s consider the dominant HPC programming notations over this same time period. Unfortunately, there isn’t an obvious analogue to the TOP500 for HPC programming, so for this article, I’ll give you my take on things based on my experiences, research, and memory.

HPC Programming circa 1995

From my perspective in November 1995, the dominant and most broadly adopted HPC programming languages were Fortran, C, and C++. For scripting, the dominant technologies seemed to be Perl, sh/csh/tcsh, or Tcl/TK.

MPI, PVM, and SHMEM were the dominant[note:It’s fair to wonder to what degree hindsight affects my characterizations here. Were MPI or SHMEM truly “dominant” in 1995? Or is it only because we can validate their longevity today that I consider them to be?] ways of programming distributed-memory systems at the time. High Performance Fortran (HPF) was getting a lot of attention and funding, but my perception is that it was not getting much use in practical applications developed outside of the teams who were researching and developing it.

For shared-memory parallelism, I was surprised to be reminded that OpenMP was still a few years in the future at this time, forming its Architecture Review Board and publishing its 1.0 specification in 1997. In 1995, you likely would have turned to POSIX threads or vendor-specific compiler pragmas and markups (such as Cray Microtasking) if you wanted loop- or thread-level parallelism. Then again, since processors were typically single-core at that time, you also might not bother unless they supported vector instructions.

HPC Programming Today

If we think about what is broadly adopted in HPC today, the list is disappointingly similar to 1995. As far as programming languages go, Fortran, C, and C++ still dominate the landscape in HPC. Though PVM has fallen off and HPF failed to catch on, MPI and SHMEM are still alive and well, dominating distributed-memory HPC programming. After its 1997 launch, OpenMP quickly became dominant for shared-memory programming and remains so today, making it a mainstay for most of the past 30 years. Kokkos, a C++ library-based notation, is one of the few programming models to make significant inroads toward HPC adoption over the past decade or so, serving as an alternative to OpenMP for shared-memory parallelism.

The biggest change in HPC programming notations since 1995 has been caused by the advent of GPUs on HPC systems, and the resulting need to program them. Unfortunately, none of the 1995-era technologies were sufficient to target GPUs, leading to a plethora of new technologies being created to fill the gap. These arrived in the form of language extensions and libraries, such as CUDA, HIP, SYCL, OpenACC, OpenCL, and Kokkos. Other technologies like OpenMP evolved significantly in order to support GPUs, becoming a bit more imperative by nature in the process.

In the realm of scripting, Python largely displaced Perl and Tcl/TK, while bash has generally replaced sh, csh, and tcsh as the dominant shell scripting language.

HPC Programming: Then vs. Now

Summarizing, I’d consider the broadly adopted HPC programming notations of 30 years ago vs. today to be as follows:

| Category | 1995 Notations | 2025 Notations |
|---|---|---|
| Languages | Fortran, C, C++ | Fortran, C, C++ |
| Inter-node | MPI, PVM, SHMEM | MPI, SHMEM |
| Intra-node | Pthreads, vendor extensions (with OpenMP on the horizon) | Pthreads, OpenMP, Kokkos |
| GPUs | N/A | CUDA, HIP, SYCL, OpenMP, OpenACC, OpenCL, Kokkos |
| Scripting | Perl, sh/csh/tcsh, Tcl/TK | Python, bash |


So, while HPC hardware has become far more capable over the past 30 years, resulting in amazing strides in terms of system performance, efficiency, and scalability, the HPC notations used in practice have largely stayed the same[note:Champions of Fortran, C++, MPI, or other entries on this list could argue that while the names may be the same, the technologies themselves have evolved and improved significantly over the past 30 years. For example, Fortran 2008 evolved to support distributed programming, and C++ added features for shared-memory parallelism. While such advances are important and notable, I’d say that the overall paradigm presented to users by these models remains very similar, relying on SPMD programming models, explicit communication, and relatively low-level base languages compared to more modern alternatives.], modulo the introduction of GPU computing. Perhaps most notably, as a community, we have failed to broadly adopt any new compiled programming languages for HPC.

Standing Still? Or Losing Ground?

In addition to not taking a great leap forward in the past 30 years, HPC programming has arguably lost ground due to the increased complexity of the hardware. Of the hardware changes listed above, most have made programming more difficult. Vector instructions, multicore processors, and GPUs have introduced new styles of parallelism that programmers must express in order to use their processors effectively. Meanwhile, the growth in cores per CPU, chiplet-based designs, and GPUs have introduced Non-Uniform Memory Access (NUMA) characteristics, which require greater sensitivity to data placement and affinity on the programmer’s part.

In fact, of the hardware advances on my list, I’d say that only the high-radix, low-diameter networks have been a boon to programmability, in the sense that they have made sensitivity to network topology much less of an issue than it was in the 1990’s. Back then, HPC programmers would often spend effort optimizing for a particular network topology—e.g., mesh, hypercube, or ring-of-rings. Such concerns are much rarer today, thankfully, where “local vs. remote” tends to be the dominant issue rather than the specifics of which nodes are communicating.

The fact that most of our hardware advances have required us to supplement programming notations of the past with new features or approaches suggests that our programming models haven’t been sufficiently abstracted from the hardware they target. Arguably, if they were able to express parallelism and locality in ways that were more general-purpose and hardware-neutral, we wouldn’t need to be writing programs using a mix of programming notations, such as C++, MPI, OpenMP and/or CUDA.

Why the Stasis in HPC Languages?

Focusing on the ‘Languages’ row of the summary table above, it’s interesting to speculate about why no new programming languages have been broadly adopted in HPC over the past 30 years. Here are some possible explanations, as well as why I don’t think they necessarily hold up:

Is Language Design Dead?

Could the reason be that language design is dead, as was asserted by an anonymous reviewer on one of our team’s papers ~30 years ago?

“Programming language design ceased to be relevant in the 1980’s.”

— Anonymous reviewer, circa 1995 (paraphrased, from memory)

If we look to programming outside of HPC, the answer seems to be an obvious “no.” Specifically, a plethora of new languages have emerged or risen to prominence in the mainstream during the past 30 years, including Java, JavaScript, Python, C#, Go, Rust, Julia, and Swift.

Such languages have become favorite day-to-day languages of many users across multiple disciplines, suggesting that language design is far from dead.

Moreover, if we look at what motivated these language designs and why they took hold, recurring themes include productivity, safety, portability, and performance—things that are also very important and desirable to HPC programmers:

Language Productivity Safety Portability Performance
Java
JavaScript
Python
C#
Go
Rust
Julia
Swift


Despite that thematic resonance, these languages aren’t particularly HPC-ready, at least without continuing to mix in other technologies like MPI. Although most of them have built-in features for concurrency, parallelism, or asynchrony, they provide little to no help with controlling locality or affinity, which is crucial for scalable performance in HPC, and arguably where existing HPC notations result in the most headache for users.

Maybe HPC Doesn’t Need New Languages?

Another explanation might be that HPC doesn’t really need new languages; that Fortran, C, and C++ are somehow optimal choices for HPC. But this is hard to take very seriously given some of the languages’ demerits, combined with the fact that they are being (or have been) supplanted by more modern alternatives in mainstream sectors.

I think it’s definitely fair to say that Fortran, C, and C++ are sufficient for HPC, in the sense that the vast majority of notable HPC computations from the past 30 years have been achieved using them (in combination with libraries, directives, and extensions). However, to me, that’s a bit like saying assembly programmers in the 1950’s didn’t really need Fortran. Though assembly may have been sufficient, raising the level of abstraction to provide cleaner syntax and semantic checks, while also enabling compiler optimizations, was, in hindsight, clearly the right evolutionary step to take.

Continuing with the Fortran analogy, at their core, most HPC notations tend to be fairly mechanism-oriented, focused on directing the use of specific system capabilities rather than expressing computations at a higher level.

This is arguably a big part of why we have to keep adding new notations whenever system architectures evolve. Though HPC programming isn’t literally assembly, it’s similarly focused on manually directing the use of system capabilities. It’s also similar in its focus on explicitly moving data across the memory hierarchy—simply at different levels than before. Where assembly programmers move values between memory and registers, HPC programmers express copies between distinct memories using various mechanisms like MPI_Send/Recv(), shmem_put(), or cudaMemcpy().

A good language would bring similar benefits to the HPC field as Fortran did for assembly: improved syntax for productivity, semantic checks for safety, and compiler optimizations for performance. In the same way that most modern programmers would be shocked if they were expected to manually move values in and out of registers today, we should be striving for languages and compilers that produce a similar response in future HPC programmers by handling data transfers across nodes, or between GPU and CPU memories.
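As one concrete illustration of that aspiration, here is a minimal Chapel sketch using the standard BlockDist module to declare a distributed array. The computation is written with a global view, and any inter-node transfers are generated by the compiler and runtime rather than being spelled out by the programmer:

```chapel
use BlockDist;

config const n = 1_000_000;

// a domain (index set) whose indices are block-distributed across
// the nodes ("locales") the program is running on
const D = blockDist.createDomain({1..n});

// a distributed array declared over that domain
var A: [D] real;

// a global-view parallel loop: each iteration executes on the node
// that owns A[i]; no sends, receives, or memcpys are written by hand
forall i in D do
  A[i] = i;

// a reduction whose inter-node communication is handled by the
// compiler and runtime
const total = + reduce A;
writeln(total);
```

The same code runs unchanged on a laptop or across the nodes of a cluster; only the number of locales it is launched on differs.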

The Fortran analogy also extends to programmer attitudes: Just as assembly programmers were reluctant to give up their control and place faith in optimizing compilers, so have HPC programmers been reluctant to give up their Fortran, C++, and MPI—and not without reason! Having control is important in HPC, since (in theory) it gives programmers access to the system’s raw capabilities with nothing standing in the way. But just as Fortran didn’t remove the ability to drop down to assembly when needed, good HPC languages would similarly support calling out to existing low-level notations, or embedding them directly.

Is it for Lack of Trying?

A third potential explanation for why new HPC languages haven’t taken off could be a lack of attempts to create them. But as anyone paying attention to the past 30 years of HPC research knows, this is clearly not the case. Focusing on what I’d consider to be the most notable HPC programming language designs from the past 30 years, we have:

And there have been many more in addition to these.

In creating this list, I don’t mean to imply that all of these attempts were suitable for broad adoption. As a personal example, while I consider my graduate school team’s work on ZPL to have been a great academic project that made notable contributions, it’s not a language that was positioned to be broadly adopted, for a variety of reasons[note:Among them: a lack of generality; a lack of typical commonplace mainstream features like object-oriented programming; insufficiently rich forms of parallelism for the architectures that were on the horizon at the time; and insufficient capabilities for programming at a lower level or interoperating with other languages.].

Failure to broadly adopt new HPC languages thus far doesn’t mean that we should stop trying. Failures should be considered an opportunity for learning and inspiration rather than “proof” that pursuing HPC languages is pointless or without value.

OK, Then Why?

In my opinion, the relative stasis in HPC programming languages can be attributed to a number of factors:

So What Should We Do?

If you believe, as I do, that we can and should do more to nurture the creation and adoption of new languages for scalable parallel programming, here are some things for us to do:

Hold on, What About Chapel?

Those who know me, or my team’s work on the Chapel language, may be surprised not to see it mentioned more in this article, and curious to know how it fits into this narrative. I didn’t want Chapel to dominate this article, but I would like to touch on its place in the landscape before wrapping up.

Chapel is a prime example of several benefits that languages can bring to scalable computing that I mentioned in this article:

I didn’t put Chapel on my list of broadly adopted HPC programming notations above, in large part to avoid being presumptuous. But it’s also because, regrettably, I don’t consider Chapel’s support within the community to be as solid as the others on my list. Despite those hesitations, I think Chapel is competitive with them in many respects. For example, I believe we have grown a larger user community than some of the other notations on my list, and in a more organic manner, with less marketing from large institutions. Unfortunately, most of Chapel’s users tend to be academic groups who can afford to try an emerging language in their work, yet without being in a position to fund its development themselves.

When I think of the biggest risks to Chapel’s longevity, they overlap heavily with the factors above related to stasis in HPC language design. Finding research funding for Chapel was not terribly difficult, but finding funding to support users and improve our implementation over the long haul has been far more so. Chapel is considered an expensive software project, and perhaps it has been relative to many HPC software teams; yet it’s dwarfed by most HPC hardware projects, despite continually building on its investments rather than needing to start from scratch with each new hardware generation. Ironically, its longevity has also become something of a hindrance because we’re no longer the flashy new kid on the block, so it’s easy to lazily think things like “if it hasn’t taken over the world by now, something must be wrong with it;” or, on the opposite end of the spectrum, “it’s been around for quite a while, so it probably will be around forever.”

Meanwhile, some of my factors for stasis also work to our advantage. Chapel does meet the unique needs of HPC, while also having a role to play in desktop, cloud, and AI computing. Not many other languages are vying for the title of general-purpose scalable language anymore. And when it comes to modifying or maintaining code, whether written against libraries or generated by AI, Chapel has distinct strengths relative to conventional languages.

At this point, Chapel’s future depends primarily on our ability to grow the community of contributors, stakeholders, and investors, which in large part depends on the degree to which the parallel programming community has an appetite for alternatives to the status quo, and a desire to support such an alternative.

In Closing

Though the lack of new, broadly adopted programming languages in HPC over the past 30 years is disheartening to me, I still retain hope. I believe that the benefits of using a language that’s purpose-built for parallelism and scalability are significant. I also believe those benefits are largely unknown to most HPC programmers, simply because they haven’t had the opportunity to try such a language. In our project’s experience, we’ve seen the impact that Chapel can have on users’ ability to get things done productively and efficiently, and we want to replicate that experience from tens of applications to hundreds or thousands.

I’d like to close by asserting that for all the reasons that new HPC languages have not been adopted, I consider current and aspiring parallel programmers to be at least as worthy of modern, post-Fortran/C/C++ languages as the Python, Rust, Swift, and Julia communities are. I also desperately hope that when 30 more years have passed—or ideally, well before then—we’ll have at least one broadly adopted language that supports scalable parallel programming rather than our current count of zero.

For More Information

On the Chapel website, you can browse the slides from the HIPS and CLSAC talks that this article was based upon. If you’d like to read more about why I think Chapel is well-positioned to be a broadly adopted HPC language despite all the challenges around doing so, check out my 10 Myths About Scalable Parallel Programming Languages (Redux) series on this blog, or jump to the final article’s summary to get the takeaways and pick an entry point that’s attractive to you. And, if you’d like to discuss this topic more, I’m always interested in good conversations on it.


Acknowledgments: I’d like to thank Engin Kayraklioglu for providing helpful feedback and advice on this article, and also for encouraging me to capture these talks in blog form to begin with. I’d also like to thank Michael Gerndt, Amir Raoofy, and the HIPS 2025 committee for the opportunity to create and present this talk in its original form.