The Chapel Parallel Programming Language


Archived Publications and Papers (reverse chronologically)

Towards Chapel-based Exascale Tree Search Algorithms: dealing with multiple GPU accelerators, [slides]. Tiago Carneiro, Nouredine Melab, Akihiro Hayashi, Vivek Sarkar. HPCS 2020, Outstanding Paper Award winner. March 22–27, 2021.
This paper revisits the design and implementation of tree search algorithms dealing with multiple GPUs, in addition to scalability and productivity-awareness using Chapel. The proposed algorithm exploits Chapel's distributed iterators by combining a partial search strategy with pre-compiled CUDA kernels for more efficient exploitation of the intra-node parallelism.
A Machine-Learning-Based Framework for Productive Locality Exploitation. Engin Kayraklioglu, Erwan Favry, Tarek El-Ghazawi. IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS). Volume 32, Issue 6. June 2021.
This paper describes an approach that can efficiently train machine learning models that can be used to improve application execution times and scalability on distributed memory systems. This is achieved by analyzing the fine-grained communication profile of the application with small input data, and then predicting the communication patterns for more realistic inputs and coarsening the communication.
Development of Parallel CFD Applications with the Chapel Programming Language, [video | slides]. Matthieu Parenteau, Simon Bourgault-Côté, Frédéric Plante, Engin Kayraklioglu, Éric Laurendeau. AIAA Scitech 2021 Forum. January 13, 2021.
This paper describes a Computational Fluid Dynamics framework being developed using Chapel by a team at Polytechnique Montreal. The use of Chapel is described, and scaling results are given on up to 9k cores of a Cray XC. Comparisons are made against well-established CFD software packages.
Hypergraph Analytics of Domain Name System Relationships. Cliff A Joslyn, Sinan Aksoy, Dustin Arendt, Jesun Firoz, Louis Jenkins, Brenda Praggastis, Emilie Purvine, Marcin Zalewski. 17th Workshop on Algorithms and Models for the Web Graph (WAW 2020). September 21–24, 2020.
This paper applies hypergraph analytics to gigascale DNS data using CHGL, performing compute-intensive calculations for data reduction and segmentation. Identified portions are then sent to HNX for both exploratory analysis and knowledge discovery targeting known tactics, techniques, and procedures.
A Comparative Study of High-Productivity High-Performance Programming Languages for Parallel Metaheuristics [PDF]. Jan Gmys, Tiago Carneiro, Nouredine Melab, El-Ghazali Talbi, Daniel Tuyttens. Swarm and Evolutionary Computation, volume 57. September 2020.
This paper compares Chapel with Julia, Python/Numba, and C+OpenMP in terms of performance, scalability and productivity. Two parallel metaheuristics are implemented for solving the 3D Quadratic Assignment Problem (Q3AP), using thread-based parallelism on a multi-core shared-memory computer. The paper also evaluates and compares the performance of the languages for a parallel fitness evaluation loop, using four different test functions with different computational characteristics. The authors provide feedback on the implementation and parallelization process in each language.
Towards ultra-scale Branch-and-Bound using a high-productivity language. Tiago Carneiro, Jan Gmys, Nouredine Melab, and Daniel Tuyttens. Future Generation Computer Systems, volume 105, pages 196-209. April 2020.
This paper uses Chapel to study the design and implementation of distributed Branch-and-Bound algorithms for solving large combinatorial optimization problems. Experiments on the proposed algorithms are performed using the Flow-shop scheduling problem as a test-case. The Chapel-based application is compared to a state-of-the-art MPI+Pthreads-based counterpart in terms of performance, scalability, and productivity.
withall: A Shorthand for Nested for Loop + If Statement [slides]. Tomsy Paul and Sheena Mathew. The International Conference on Innovative Data Communication Technologies and Application (ICIDCA 2019). October 17–18, 2019.
This paper describes a new programming language construct, 'withall', designed to support a combined for-loop and conditional for array processing. The authors implemented the feature in the context of the Chapel compiler.
Graph Algorithms in PGAS: Chapel and UPC++. Louis Jenkins, Jesun Sahariar Firoz, Marcin Zalewski, Cliff Joslyn, Mark Raugas. 2019 IEEE High Performance Extreme Computing Conference (HPEC '19). September 24–26, 2019.
This paper compares implementations of Breadth-First Search and Triangle Counting in Chapel and UPC++.
An Incremental Parallel PGAS-based Tree Search Algorithm. Tiago Carneiro and Nouredine Melab. The International Conference on High Performance Computing & Simulation (HPCS 2019). July 15–19, 2019.
This paper describes the use of Chapel in implementing a parallel tree search algorithm for solving combinatorial problems.
Productivity-Aware Design and Implementation of Distributed Tree-Based Search Algorithms. Tiago Carneiro and Nouredine Melab. International Conference on Computational Science (ICCS 2019). June 12–14, 2019.
This paper describes an initial exploration of using Chapel to implement a parallel tree search algorithm for solving combinatorial problems.
A Machine Learning Approach for Productive Data Locality Exploitation in Parallel Computing Systems. Engin Kayraklioglu, Erwan Favry, and Tarek El-Ghazawi. CCGrid 2019, May 14–17, 2019.
This paper describes a machine learning-based approach to automatically add bulk communication in distributed memory applications, as prototyped in Chapel.
High Performance Hypergraph Analytics of Domain Name System Relationships [slides]. Cliff A Joslyn, Sinan Aksoy, Dustin Arendt, Louis Jenkins, Brenda Praggastis, Emilie Purvine, Marcin Zalewski. HICSS Symposium on Cybersecurity Big Data, January 8, 2019.
This paper reports on the use of novel mathematical methods over a large quantity of DNS data in the Chapel Hypergraph Library, a new platform for high performance hypergraph analytics.
Chapel Aggregation Library (CAL) [slides]. Louis Jenkins, Marcin Zalewski, and Michael Ferguson. Parallel Applications Workshop, Alternatives to MPI (PAW-ATM 2018), held at SC18. November 16, 2018.
This paper describes a minimal, generic, and easy-to-use aggregation library written entirely in Chapel, for Chapel.
Chapel HyperGraph Library (CHGL) [slides]. Louis Jenkins, Tanveer Bhuiyan, Sarah Harun, Christopher Lightsey, David Mentgen, Sinan Aksoy, Timothy Stavenger, Marcin Zalewski, Hugh Medal, and Cliff Joslyn. 2018 IEEE High Performance Extreme Computing Conference (HPEC '18). September 25–27, 2018.
This paper describes the design and implementation of a HyperGraph library provided as a scalable distributed data structure.
LAPPS: Locality-Aware Productive Prefetching Support for PGAS. Engin Kayraklioglu, Michael Ferguson, and Tarek El-Ghazawi. ACM Transactions on Architecture and Code Optimization. Volume 15, Issue 3. September 2018.
This paper describes a high-level, easy-to-use language feature to improve data locality efficiently.
ChplBlamer: A Data-centric and Code-centric Combined Profiler for Multi-locale Chapel Programs [slides]. Hui Zhang and Jeffrey K. Hollingsworth. In Proceedings of the 32nd ACM International Conference on Supercomputing (ICS'18), pages 252–262. June 2018.
This paper describes a tool that uses a combination of data-centric and code-centric information to relate performance profiling information back to user-level data structures and source code in Chapel programs.
Chapel Comes of Age: Productive Parallelism at Scale [slides (with outtakes)]. Brad Chamberlain, Elliot Ronaghan, Ben Albrecht, Lydia Duncan, Michael Ferguson, Ben Harshbarger, David Iten, David Keaton, Vassily Litvinov, Preston Sahabu, and Greg Titus. CUG 2018, Stockholm, Sweden, May 22, 2018.
This paper describes the progress that has been made with Chapel since the HPCS program wrapped up.
APAT: an access pattern analysis tool for distributed arrays. Engin Kayraklioglu and Tarek El-Ghazawi. In Proceedings of the 15th ACM International Conference on Computing Frontiers (CF'18), pages 248–251. May 2018.
This paper proposes a high-level, data-centric profiler to analyze how distributed arrays are used by each locale.
Data-Centric Performance Measurement Techniques for Chapel Programs [slides]. Hui Zhang and Jeffrey K. Hollingsworth. The 31st IEEE International Parallel and Distributed Processing Symposium, Orlando, FL, May 30, 2017.
This paper describes a profiling tool that associates performance with data structures (e.g., arrays) rather than code locations and its use in optimizing Chapel code.
A Study of the Bucket-Exchange Pattern in the PGAS Model Using the ISx Integer Sort Mini-Application. Jacob Hemstad, Ulf R. Hanebutte, Ben Harshbarger, and Bradford L. Chamberlain. PGAS Applications Workshop (PAW) at SC16, November 14, 2016.
This is a short paper on the ISx benchmark in SHMEM and Chapel, including early performance results for the Chapel version (for updates see recent Chapel release notes).
Optimizing PGAS Overhead in a Multi-Locale Chapel Implementation of CoMD [slides]. Riyaz Haque and David Richards. PGAS Applications Workshop (PAW) at SC16, November 14, 2016.
This is a study of the CoMD proxy application in Chapel conducted by LLNL.
Chapel chapter, Bradford L. Chamberlain, Programming Models for Parallel Computing, edited by Pavan Balaji, published by MIT Press, November 2015.
This is currently the best introduction to Chapel's history, motivating themes, and features. It also provides a brief summary of current and future activities at the time of writing. An early pre-print of this chapter was made available under the name A Brief Overview of Chapel.
LLVM-based Communication Optimizations for PGAS Programs. Akihiro Hayashi, Jisheng Zhao, Michael Ferguson, Vivek Sarkar. 2nd Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC2), November 2015.
This paper describes how LLVM passes can optimize communication in PGAS languages like Chapel. In particular, by representing potentially remote addresses using a distinct address space, existing LLVM optimization passes can be used to reduce communication.
Caching Puts and Gets in a PGAS Language Runtime [slides]. Michael P. Ferguson, Daniel Buettner. 9th International Conference on Partitioned Global Address Space Programming Models (PGAS 2015), Sept 2015.
This paper describes an optimization implemented for Chapel in which the runtime library aggregates puts and gets in accordance with Chapel's memory consistency model in order to reduce the potential overhead of doing fine-grained communications.
Parameterized Diamond Tiling for Stencil Computations with Chapel Parallel Iterators. [slides]. Ian J. Bertolacci, Catherine Olschanowsky, Ben Harshbarger, Bradford L. Chamberlain, David G. Wonnacott, Michelle Mills Strout. ICS 2015, June 2015.
This paper explores the expression of parameterized diamond-shaped time-space tilings in Chapel, demonstrating competitive performance with C+OpenMP along with significant software engineering benefits due to Chapel's support for parallel iterators.
Towards Resilient Chapel: Design and Implementation of a Transparent Resilience Mechanism for Chapel, Konstantina Panagiotopoulou and Hans-Wolfgang Loidl. EASC '15, April 21-23 2015.
This paper describes the design and prototype implementation of resilience support for Chapel in a transparent manner.
A Study of Successive Over-relaxation (SOR) Method Parallelization Over Modern HPC Languages [code], Sparsh Mittal, International Journal of High Performance Computing and Networking (IJHPCN), vol. 7, no. 4, 2014
This paper compares Chapel, D, and Go in the context of Successive Over-relaxation.
Affine Loop Optimization Based on Modulo Unrolling in Chapel [slides], Aroon Sharma, Darren Smith, Joshua Koehler, Rajeev Barua, and Michael Ferguson, PGAS 2014, October 7-10, 2014
This paper describes an optimization that coarsens communications via modifications to Chapel's leader/follower iterators.
Benchmarking Usability and Performance of Multicore Languages (awarded "Best Paper"), Sebastian Nanz, Scott West, Kaue Soares da Silveira, and Bertrand Meyer. ESEM 2013, October 2013.
This paper compares Chapel, Cilk, Go, and TBB across a suite of six benchmarks (with both beginner and expert versions of each), comparing code size, coding time, execution time, and speedup.
Examining the Expert Gap in Parallel Programming, Sebastian Nanz, Scott West, and Kaue Soares da Silveira. Euro-Par 2013, August 2013.
This paper studies the impact of expert opinions on benchmark codes written in Chapel, Cilk, Go, and TBB.
The State of the Chapel Union [slides]. Bradford L. Chamberlain, Sung-Eun Choi, Martha Dumler, Thomas Hildebrandt, David Iten, Vassily Litvinov, Greg Titus. CUG 2013, May 2013.
This paper provides a snapshot of the Chapel project at the juncture between the end of the HPCS project and the start of the next phase in Chapel's development. It covers past successes, current status, and future directions.
A Brief Overview of Chapel (revision 1.0). Bradford L. Chamberlain. (pre-print of a chapter that is to appear in an upcoming programming models book), January 2013.
This pre-print chapter serves as a good overview of Chapel's history, motivating themes, and features. It also provides a brief summary of future activities. It's currently the best overview in print about the Chapel project.
Run, Stencil, Run! HPC Productivity Studies in the Classroom [slides], Helmar Burkhart, Madan Sathe, Matthias Christen, Olaf Schenk, and Max Rietmann. PGAS 2012, October 2012.
This paper describes classroom productivity studies conducted at the University of Basel, comparing Chapel with Java, OpenMP, MPI, UPC, and PATUS.
Global Data Re-allocation via Communication Aggregation in Chapel [slides], Alberto Sanz, Rafael Asenjo, Juan Lopez, Rafael Larrosa, Angeles Navarro, Vassily Litvinov, Sung-Eun Choi, and Bradford L. Chamberlain. UMA-DAC-12/02 (this is an extended version of the paper that appeared at the 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'2012), New York City, NY), October 2012.
This paper describes a Chapel optimization that aggregates communication for array-to-array assignments (or slices thereof) to reduce communication overheads.
An Empirical Performance Study of Chapel Programming Language [slides], Nan Dun, Kenjiro Taura. HIPS 2012, May 2012.
This paper presents a performance study of various Chapel features, with the goal of understanding the performance obtained at the time and identifying future optimization opportunities for the development team.
Performance Portability with the Chapel Language. Albert Sidelnik, Saeed Maleki, Bradford L. Chamberlain, María J. Garzarán, David Padua. IPDPS 2012, May 2012.
This paper describes the use of Chapel to target GPUs and multicore processors using a unified set of language concepts.
User-Defined Parallel Zippered iterators in Chapel [slides]. Bradford L. Chamberlain, Sung-Eun Choi, Steven J. Deitz, Angeles Navarro. PGAS 2011: Fifth Conference on Partitioned Global Address Space Programming Models, October 2011.
This paper describes how users can create parallel iterators that support zippered iteration in Chapel, demonstrating them via several examples that partition iteration spaces statically and dynamically.
Interfacing Chapel with Traditional HPC Programming Languages [slides], Adrian Prantl, Thomas Epperly, Shams Imam, Vivek Sarkar. PGAS 2011: Fifth Conference on Partitioned Global Address Space Programming Models, October 2011.
This paper describes work being done by LLNL and Rice to extend Babel's interoperability capabilities to support calls between Chapel and other HPC-oriented languages.
Composite Parallelism: Creating Interoperability Between PGAS Languages, HPCS Languages, and Message Passing Libraries, Thomas Epperly, Adrian Prantl, Bradford Chamberlain, LLNL Progress Report, September 2011.
This is a progress report that covers the work described in the Prantl et al. PGAS 2011 paper in more detail.
A First Implementation of Parallel IO in Chapel for Block Data Distribution, Rafael Larrosa, Rafael Asenjo, Angeles Navarro, Bradford L. Chamberlain. ParCo 2011, September 2011.
This paper reports on some initial work to parallelize file I/O for Block-distributed arrays in Chapel.
Authoring User-Defined Domain Maps in Chapel [slides]. Bradford L. Chamberlain, Sung-Eun Choi, Steven J. Deitz, David Iten, Vassily Litvinov. CUG 2011, June 2011.
This paper builds on our HotPar 2010 paper by describing the programmer's role in implementing user-defined distributions and layouts in Chapel.
The Chapel Tasking Layer Over Qthreads [slides], Kyle B. Wheeler, Richard C. Murphy, Dylan Stark, Bradford L. Chamberlain. CUG 2011, May 2011.
This paper reports on our initial work mapping Chapel's parallel tasks down to the Qthreads user-level tasking library being developed at Sandia National Laboratories.
A Scalable Implementation of Language-Based Software Transactional Memory for Distributed Memory Systems. Srinivas Sridharan, Jeffrey Vetter, Bradford L. Chamberlain, Peter Kogge, Steve Deitz. Technical Report Series No. FTGTR-2011-02, Oak Ridge, TN: Future Technologies Group, Oak Ridge National Lab, May 2011.
This paper reports on an implementation of Chapel's atomic statements using distributed Software Transactional Memory (STM) techniques.
Translating Chapel to Use FREERIDE: A Case Study in Using an HPC Language for Data-Intensive Computing. Bin Ren, Gagan Agrawal, Brad Chamberlain, Steve Deitz. 16th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2011), May 2011.
This paper reports on a study investigating compiling Chapel features like reductions down to the FREERIDE library developed at OSU in support of data-intensive computing.
Using the High Productivity Language Chapel to Target GPGPU Architectures. Albert Sidelnik, Maria J. Garzaran, David Padua. UIUC Dept. of Computer Science Technical Report, April 2011.
This report presents initial work to target Chapel computation to GPUs using specialized domain maps.
User-Defined Distributions and Layouts in Chapel: Philosophy and Framework [slides]. Bradford L. Chamberlain, Steven J. Deitz, David Iten, Sung-Eun Choi. 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar'10), June 2010.
This paper describes our approach and software framework for implementing user-defined distributions and memory layouts using Chapel's domain map concept.
Five Powerful Chapel Idioms [slides]. Steven J. Deitz, Bradford L. Chamberlain, Sung-Eun Choi, David Iten. CUG 2010, May 2010.
This paper highlights some powerful Chapel features through five short example codes.
Mechanisms that Separate Algorithms from Implementations for Parallel Patterns. Christopher D. Krieger, Andrew Stone, and Michelle Mills Strout. Workshop on Parallel Programming Patterns (ParaPLOP), March 2010.
This paper examines some common parallel programming patterns in Chapel and other programming models to assess how entangled the different concerns become.
HPC Challenge Benchmarks in Chapel (2009 entry) [slides]
This paper reports on our 2009 entry for the class 2 HPC Challenge competition, which was awarded "most elegant implementation." Our entries to previous years' competitions can be downloaded as well:
HPCC STREAM and RA in Chapel: Performance and Potential [slides], Steven J. Deitz, Bradford L. Chamberlain, Samuel Figueroa, David Iten, CUG 2009, May 2009.
This is an update to our May 2007 CUG paper, presenting initial results on the HPC Challenge benchmarks using distributed domains and arrays, along with pointers to next steps.
Scalable Software Transactional Memory for Global Address Space Architectures. Srinivas Sridharan, Jeffrey Vetter, Peter Kogge. Technical Report Series No. FTGTR-2009-04. Oak Ridge, TN: Future Technologies Group, Oak Ridge National Lab, April 2009.
This report describes GTM, a library designed to support scalable asynchronous distributed software transactional memory (STM).
Software Transactional Memory for Large-Scale Clusters, Robert L. Bocchino Jr., Vikram S. Adve, and Bradford L. Chamberlain, The 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008), Salt Lake City, UT, February 2008.
This paper describes an initial effort to develop software to support distributed memory software transactional memory (STM) for use in Chapel.
Chapel: Productive Parallel Programming at Scale [slides | video], Bradford L. Chamberlain, Google Seattle Conference on Scalability, Seattle, WA, June 2008.
This is an abridged overview of Chapel, aimed at a more mainstream technical audience, possibly with datacenter leanings, rather than the HPC community.
Multiresolution Languages for Portable yet Efficient Parallel Programming, Bradford L. Chamberlain, whitepaper, October 2007.
This is a position paper written in Q&A format that serves as the first written description of Chapel's multiresolution language design philosophy.
Parallel Programmability and the Chapel Language. Bradford L. Chamberlain, David Callahan, Hans P. Zima. International Journal of High Performance Computing Applications, August 2007, 21(3): 291-312.
This is an early overview of Chapel's themes and main language concepts.
An Approach to Data Distributions in Chapel. Roxana E. Diaconescu and Hans P. Zima. International Journal of High Performance Computing Applications, August 2007, 21(3): 313-335.
This paper presents early exploratory work in developing a philosophy and foundation for Chapel's user-defined distributions.
Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT [slides]. Bradford L. Chamberlain, Steven J. Deitz, Mary Beth Hribar, Wayne A. Wong, CUG 2007, Seattle, WA, May 2007.
This paper provided the CUG community with an early look at three of the HPC Challenge benchmarks in Chapel.
Chapel: Cascade High-Productivity Language; An Overview of the Chapel Parallel Programming Model [slides]. Steven J. Deitz, Bradford L. Chamberlain, Mary Beth Hribar, CUG 2006, Lugano, Switzerland, May 2006.
This was a language overview to introduce the CUG community to Chapel.
Iterators in Chapel. Mackale Joyner, Bradford L. Chamberlain, Steven J. Deitz. Eleventh International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2006), Rhodes Island, Greece, April 25, 2006.
This paper presents some early work and approaches for implementing Chapel's iterators.
Global-view Abstractions for User-Defined Reductions and Scans. Steven J. Deitz, David Callahan, Bradford L. Chamberlain, Lawrence Snyder. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2006), March 2006
This paper outlines our general strategy for supporting user-defined reductions and scans in Chapel.
Reusable and Extensible High Level Data Distributions, Roxana E. Diaconescu, Bradford Chamberlain, Mark L. James, Hans P. Zima. In Proceedings of the Workshop on Patterns in High Performance Computing (patHPC), May 2005.
This paper sought to express the early ideas we were pursuing for user-defined data distributions using a patterns framework.
The Cascade High Productivity Language. David Callahan, Bradford L. Chamberlain, Hans P. Zima. In 9th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2004), pages 52-60. IEEE Computer Society, April 2004.
This is the original Chapel paper which lays out some of our motivation and foundations for exploring the language. The language has evolved significantly since this paper was published, but it remains a good starting point for learning about Chapel.