chplspell

Overview

util/devel/chplspell is a script to assist in spell-checking the Chapel documentation and source code. It is a wrapper around the scspell source-code spell-checker.

chplspell provides four main conveniences over simply using scspell:

  1. It has built-in knowledge of which files and directories in the Chapel repository benefit from being spell-checked.

  2. It directs scspell to use the project dictionary file $CHPL_HOME/util/devel/chplspell-dictionary.

  3. It recurses through directories given on the command line.

  4. It invokes the scspell that’s installed in the Chapel virtualenv.

This document covers basic information about chplspell, common use cases, the format of the dictionary files, and some less common use cases related to managing the dictionary files.

Basic information

chplspell depends on scspell being installed in the virtualenv. To install it, run:

make chplspell-venv

scspell (and thus chplspell) has two main modes of invocation: interactive and non-interactive. chplspell further provides two ways of using each mode: spell-checking the whole project or only specific files or directories.

chplspell maintains a project dictionary for Chapel in $CHPL_HOME/util/devel/chplspell-dictionary. This dictionary file contains several types of word lists, as supported by scspell:

  • Natural language dictionary: words that may be found in any file.

  • Programming-language dictionaries: words that may be found in certain types of files, identified by file extension.

  • File-specific dictionaries: words that may be found only in particular files.

chplspell passes nearly all command line options through to scspell, and reports scspell’s usage message when invoked with --help. See the scspell page on python.org for information about scspell’s command line arguments, approach to spell checking source code, and user interface.

chplspell adjusts the command line in several ways:

  1. chplspell passes options to scspell directing it at $CHPL_HOME/util/devel/chplspell-dictionary.

  2. If no files or directories are given on the command line, chplspell invokes scspell on a default set of files and directories that make sense for the Chapel repository.

  3. If directories are given on the command line, chplspell invokes scspell on all files of certain types within those directories, recursively.

  4. In case 2 or 3 above, chplspell may invoke scspell twice: once for most of the files it finds, and again for any LaTeX files it finds. This is because LaTeX does not use C-style escapes.

    • This is a minor point. It matters only for understanding why chplspell may keep spell-checking after you hit ^C, and why words ignored with the “Ignore all” interactive command are forgotten when the LaTeX portion begins.

The configuration of the “default set of files and directories” is at the top of the chplspell script, and may be easily altered.
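To make adjustments 3 and 4 concrete, here is a minimal Python sketch of how a wrapper might expand directory arguments recursively and split the results into a LaTeX group and an everything-else group, so that scspell can be run once per group. The extension sets and the names collect_files and is_latex are hypothetical; the real default file list and file types are configured at the top of the chplspell script.

```python
import os

# Hypothetical extension sets for illustration; the real chplspell
# configuration is at the top of the chplspell script and may differ.
LATEX_EXTENSIONS = {".tex", ".bib"}
CHECKED_EXTENSIONS = {".chpl", ".md", ".rst", ".txt"} | LATEX_EXTENSIONS


def is_latex(path):
    """True if the path has a LaTeX-family extension."""
    return os.path.splitext(path)[1].lower() in LATEX_EXTENSIONS


def collect_files(paths):
    """Expand directories recursively; return (other_files, latex_files).

    Each group could then be handed to its own scspell invocation.
    """
    other_files, latex_files = [], []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # visit subdirectories in a deterministic order
                for name in sorted(files):
                    if os.path.splitext(name)[1].lower() not in CHECKED_EXTENSIONS:
                        continue  # skip file types not worth spell-checking
                    full = os.path.join(root, name)
                    (latex_files if is_latex(full) else other_files).append(full)
        else:
            # Assumption: files named explicitly are checked whatever their type.
            (latex_files if is_latex(path) else other_files).append(path)
    return other_files, latex_files
```

Splitting into two lists up front is what makes the two-pass behavior visible: the LaTeX pass starts fresh, which is why “Ignore all” choices from the first pass do not carry over.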

Non-interactive invocation

The simplest use is to produce a non-interactive report for the default files and directories of the Chapel repository. At the time of writing, some words had yet to be added to the project dictionary or corrected; they make a good example:

$ chplspell --report-only

doc/rst/developer/bestPractices/README.md:50: 'chplspell' not found in dictionary (from token 'chplspell')
CHANGES.md:445: 'chpldocumentation' not found in dictionary (from token 'chpldocumentation')
CHANGES.md:2001: 'pshm' not found in dictionary (from token 'pshm')
CHANGES.md:2341: 'pshm' not found in dictionary (from token 'pshm')
CHANGES.md:3360: 'circularities' not found in dictionary (from token 'circularities')

chplspell may also be invoked on only the files or directories named on the command line. For example, checking this file, which contains one tyop:

$ chplspell --report-only doc/rst/developer/bestPractices/SpellChecking.rst
doc/rst/developer/bestPractices/SpellChecking.rst:109: 'tyop' not found in dictionary (from token 'tyop')
$

(The project dictionary now includes the word “tyop” for this file, so the above command no longer produces that result.)
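When a report is long, it can help to see which words recur before deciding what to correct and what to add to the dictionary. The following minimal Python sketch assumes the report text has been captured and that every hit uses the line format shown above; tally_unknown_words is a hypothetical helper, not part of chplspell.

```python
import re
from collections import Counter

# Matches report lines of the form shown above:
#   path:LINE: 'word' not found in dictionary (from token 'token')
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): '(?P<word>[^']+)' not found in dictionary"
    r" \(from token '(?P<token>[^']*)'\)$"
)


def tally_unknown_words(report_text):
    """Count how often each unrecognized word appears in a report."""
    counts = Counter()
    for line in report_text.splitlines():
        match = REPORT_LINE.match(line)
        if match:  # prompts, blank lines, and other output are ignored
            counts[match.group("word")] += 1
    return counts
```

For example, tally_unknown_words(report).most_common(10) lists the ten most frequent unrecognized words.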

Interactive invocation

scspell provides an interactive mode for making corrections and for adding words to the various dictionaries. This mode is also available through chplspell.

See the scspell page on python.org for details.

chplspell’s invocation of scspell writes any requested dictionary changes to $CHPL_HOME/util/devel/chplspell-dictionary.

Dictionary file details

This section provides a few details about the format of scspell’s dictionary file. You do not need them to use chplspell as described above, but they will help with the more advanced options in the next section.

The natural language word list contains the words that may appear in any file being spell checked. It is the last word list in the dictionary file, under the heading NATURAL:.

A “programming language” word list is used in addition to the natural language word list when the file being checked matches one of the file extensions given for that word list. They appear in the dictionary file on lines beginning with FILETYPE:, e.g.,

FILETYPE: TeX/LaTeX; .tex, .bib

A file-specific word list is used in addition when a file has a matching “file id”. These are stored in the dictionary file under FILEID: headings, e.g.

FILEID: 42424242-4242-4242-4242-424242424242
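The three kinds of word list combine additively. As a minimal sketch of that lookup, the Python below groups a dictionary file's words by heading and collects the words that would apply to a given file. It assumes each NATURAL:, FILETYPE:, or FILEID: heading sits on its own line and that every non-blank line beneath it holds one word; scspell's actual parser may handle details differently, and the function names are hypothetical.

```python
HEADINGS = ("NATURAL:", "FILETYPE:", "FILEID:")


def parse_dictionary(text):
    """Map each heading line to the list of words listed beneath it."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADINGS):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)  # assumption: one word per line
    return sections


def extensions_for(heading):
    """Extract extensions from a heading like 'FILETYPE: TeX/LaTeX; .tex, .bib'."""
    _, _, exts = heading.partition(";")
    return [e.strip() for e in exts.split(",") if e.strip()]


def words_for(sections, path, file_id=None):
    """Natural-language words plus matching file-type and file-specific words."""
    words = set(sections.get("NATURAL:", []))
    for heading, entries in sections.items():
        if heading.startswith("FILETYPE:") and any(
            path.endswith(ext) for ext in extensions_for(heading)
        ):
            words.update(entries)
    if file_id is not None:
        words.update(sections.get(f"FILEID: {file_id}", []))
    return words
```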

There are two ways that a file id’s association with a file may be represented to scspell:

  1. The file contains the string “scspell-id: ” followed by a file id, e.g., in a comment.

  2. There is an entry in the “file id mapping file”, $CHPL_HOME/util/devel/chplspell-dictionary.fileids.json, associating the file name with the file id. For example, the following file id is associated with two files in the Chapel repository:

"63b96a22-1e46-11e6-a3a6-10ddb1d4c3d5": [
  "doc/rst/developer/hdfs_and_chapel/API.tex",
  "doc/rst/developer/hdfs_and_chapel/examples.tex"
],

If a file has an associated file id, then when scspell offers to add an unrecognized word to a dictionary, one of the offered dictionaries is this (f)ile-specific dictionary.

If there is no file id associated with the file, scspell will instead offer the option to create a (N)ew file-specific dictionary. This option creates a new file id, adds it to the dictionary.fileids.json file, and adds the unrecognized word to that file-specific word list in the dictionary file.
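Here is a minimal Python sketch of what the (N)ew choice does to the two files, with the JSON mapping and the file-specific word lists modeled as in-memory dicts keyed by file id. The helper name is hypothetical, and generating the id with uuid.uuid1() is an assumption based on the time-based form of the 63b96a22-1e46-11e6-... id above (its third group begins with 1); scspell may generate ids differently.

```python
import uuid


def new_file_specific_list(mapping, word_lists, path, word):
    """Create a file id for path, map it, and start its word list with word.

    mapping:    file id -> list of paths (the parsed fileids.json)
    word_lists: file id -> list of words (the FILEID: sections)
    """
    file_id = str(uuid.uuid1())  # assumption: a time-based (version 1) UUID
    mapping[file_id] = [path]
    word_lists[file_id] = [word]
    return file_id
```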

If a file with a file-specific word list is moved or copied (e.g., the shootout benchmarks), and the association is via the file id mapping, chplspell won’t associate the existing word list with the new file. The next section describes several ways to remedy this situation and similar ones without creating duplicate file-specific word lists.

As of this writing, no files in the Chapel repository contain a file id literal; all file id mappings are done through the file id mapping file.
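Putting the two association methods together, here is a minimal Python sketch of finding a file's id: look for an embedded scspell-id: literal first, then fall back to the mapping file. The literal-first precedence, the UUID-shaped regular expression, and the function name find_file_id are assumptions for illustration, not scspell's documented behavior.

```python
import json
import re

# Assumption: an embedded id looks like "scspell-id: " followed by a UUID.
ID_LITERAL = re.compile(
    r"scspell-id: ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def find_file_id(path, file_text, mapping_json):
    """Return the file id for path, or None if it has no file-specific list.

    mapping_json is the text of chplspell-dictionary.fileids.json, which maps
    each file id to a list of repository-relative paths.
    """
    match = ID_LITERAL.search(file_text)
    if match:
        return match.group(1)  # an id literal in the file itself
    mapping = json.loads(mapping_json)
    for file_id, paths in mapping.items():
        if path in paths:
            return file_id  # an entry in the file id mapping file
    return None
```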

Dictionary file management options

--rename-file

chplspell makes scspell’s --rename-file option available to update the file id map after a file has been renamed:

git mv path/to/old.chpl new/path/and/new.chpl
chplspell --rename-file path/to/old.chpl new/path/and/new.chpl
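The effect of --rename-file on the mapping can be pictured with a short Python sketch that operates on the parsed JSON (a dict from file id to a list of paths, e.g. as returned by json.load). This illustrates the described effect only; rename_in_mapping is a hypothetical name, and scspell's implementation may differ.

```python
def rename_in_mapping(mapping, old_path, new_path):
    """Replace old_path with new_path wherever it appears; return True if found.

    mapping: file id -> list of paths (the parsed fileids.json), edited in place.
    """
    changed = False
    for paths in mapping.values():
        if old_path in paths:
            paths[paths.index(old_path)] = new_path  # keep the list order
            changed = True
    return changed
```

The word lists themselves are untouched; only the path recorded against the file id changes, which is why a rename needs no dictionary edits.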

Unfortunately, there is not yet a corresponding --copy-file option.

--merge-file-ids

scspell also provides a --merge-file-ids option for the case where two files have file-specific word lists that are similar enough to be merged. The file ids may be given either as file id literal strings or as the names of files associated with those file ids:

chplspell --merge-file-ids one/file.chpl a/similar/file.chpl

The order of the arguments matters only in deciding which file id hex string ends up associated with the files.
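Here is a minimal Python sketch of the merge, with the mapping and the file-specific word lists modeled as dicts keyed by file id. Because this document does not say which argument's id survives, the sketch takes the surviving id (keep_id) explicitly; the function name and data layout are hypothetical.

```python
def merge_file_ids(mapping, word_lists, keep_id, drop_id):
    """Fold drop_id's files and words into keep_id, editing both dicts in place.

    mapping:    file id -> list of paths
    word_lists: file id -> list of words
    """
    if keep_id == drop_id:
        return  # nothing to merge
    # Every file that used drop_id now uses keep_id.
    mapping.setdefault(keep_id, []).extend(mapping.pop(drop_id, []))
    # The surviving word list is the union of both lists.
    merged = set(word_lists.get(keep_id, [])) | set(word_lists.pop(drop_id, []))
    word_lists[keep_id] = sorted(merged)
```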

--delete-files

The --delete-files option to chplspell may be used to remove the association between a file id and a deleted file from the dictionary file. If that was the only file associated with that file id, chplspell will also remove the file id itself and the file-specific dictionary.

git rm doc/obsolete doc/archaic.md
chplspell --delete-files doc/obsolete doc/archaic.md
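The bookkeeping that --delete-files performs can be sketched in Python as follows, again modeling the mapping and word lists as dicts keyed by file id. Treating a deleted path as also covering every mapped file beneath it is an assumption prompted by the doc/obsolete directory in the example; the helper name is hypothetical.

```python
def delete_files(mapping, word_lists, deleted_paths):
    """Remove deleted paths from the mapping (in place).

    A file id left with no files is removed along with its word list.
    Assumption: a deleted path also covers every mapped file beneath it,
    so a removed directory such as doc/obsolete can be named directly.
    """
    prefixes = [d.rstrip("/") + "/" for d in deleted_paths]

    def is_deleted(path):
        return path in deleted_paths or any(path.startswith(p) for p in prefixes)

    for file_id in list(mapping):
        kept = [p for p in mapping[file_id] if not is_deleted(p)]
        if len(kept) == len(mapping[file_id]):
            continue  # this file id is unaffected
        if kept:
            mapping[file_id] = kept
        else:
            del mapping[file_id]
            word_lists.pop(file_id, None)
```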

Edit the dictionary.fileids.json file

You can edit the file by hand to add a filename to a file id, or change a filename. The format is straightforward JSON.

One minor detail (likely of interest only to those so hung up on minutiae as to write a spell checking utility) is that while scspell emits the file id mapping file with no trailing newline, most text editors take some convincing to save a file that way. To avoid git commits fighting over that last byte, it’d be considerate to get rid of that newline before committing.

pico -L is the simplest way I’ve found. Otherwise, you can make the change, then invoke chplspell to get it to rewrite the file. The file is rewritten only if there are changes to make to it, so you’ll likely need to make two changes that cancel each other out, such as the sequence:

chplspell --rename-file CONTRIBUTORS.md SCHMONTRIBUTORS.md
chplspell --rename-file SCHMONTRIBUTORS.md CONTRIBUTORS.md

(No files are renamed by this; these operations change only the file id mapping.)
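A few lines of Python can also remove the trailing newline directly. This is a minimal sketch, not part of chplspell; it assumes the mapping file lives at $CHPL_HOME/util/devel/chplspell-dictionary.fileids.json unless another path is passed, and it strips a single LF or CRLF.

```python
import os

# Assumed location of the file id mapping file under $CHPL_HOME.
DEFAULT_MAPPING = os.path.join(
    os.environ.get("CHPL_HOME", "."), "util", "devel",
    "chplspell-dictionary.fileids.json",
)


def strip_trailing_newline(path=DEFAULT_MAPPING):
    """Remove one trailing newline (LF or CRLF) from path; return True if changed."""
    with open(path, "rb") as f:
        data = f.read()
    if data.endswith(b"\r\n"):
        data = data[:-2]
    elif data.endswith(b"\n"):
        data = data[:-1]
    else:
        return False  # already ends without a newline; leave it alone
    with open(path, "wb") as f:
        f.write(data)
    return True
```

Calling strip_trailing_newline() once after hand-editing the file, and before committing, leaves it in the same no-trailing-newline form that scspell writes.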