chplspell
Overview
util/devel/chplspell is a script to assist in spell-checking the Chapel documentation and source code. It is a wrapper around the scspell source-code spell checker.
chplspell provides four main conveniences over simply using scspell:
- It has built-in knowledge of which files and directories in the Chapel repository benefit from being spell-checked.
- It directs scspell to use the project dictionary file $CHPL_HOME/util/devel/chplspell-dictionary.
- It recurses through directories given on the command line.
- It invokes the scspell that is installed in the Chapel virtualenv.
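As a rough illustration of the directory recursion mentioned above, the following sketch expands a mix of files and directories into a list of files to check. Files named explicitly are kept as-is, and directories are walked recursively, keeping only files with selected extensions. The function name collect_files and the extension set are hypothetical. chplspell's real file-type configuration lives in the script itself and may differ.

```python
import os
from pathlib import Path

# Hypothetical extension set for illustration only; chplspell's actual
# configuration of which file types to check is defined in the script.
CHECKED_EXTENSIONS = {".chpl", ".rst", ".md", ".tex", ".bib", ".txt"}


def collect_files(paths, extensions=CHECKED_EXTENSIONS):
    """Expand files and directories into a sorted list of files to check.

    Explicitly named files are kept regardless of extension; directories
    are walked recursively and only files with a matching extension kept.
    """
    result = []
    for p in map(Path, paths):
        if p.is_dir():
            for root, _dirs, files in os.walk(p):
                for name in files:
                    if Path(name).suffix in extensions:
                        result.append(str(Path(root) / name))
        else:
            result.append(str(p))
    return sorted(result)
```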
This document covers basic information about chplspell, common use cases, a description of the dictionary files, and some less common use cases related to managing those files.
Basic information
chplspell depends on scspell being installed in the Chapel virtualenv. To install it, run:
make chplspell-venv
scspell (and thus chplspell) has two main modes of invocation: interactive and non-interactive. chplspell further provides two ways of using each mode: spell-checking the whole project, or only specific files or directories.
chplspell maintains a project dictionary for Chapel in $CHPL_HOME/util/devel/chplspell-dictionary. This dictionary file contains several types of word lists, as supported by scspell:
- Natural language dictionary: words that may be found in any file.
- Programming-language dictionaries: words that may be found in certain types of files, identified by file extension.
- File-specific dictionaries: words that may be found only in particular files.
chplspell passes nearly all command line options through to scspell, and reports scspell's usage message when invoked with --help. See the scspell page on python.org for information about scspell's command line arguments, its approach to spell-checking source code, and its user interface.
chplspell adjusts the command line in several ways:
1. chplspell passes options to scspell directing it at $CHPL_HOME/util/devel/chplspell-dictionary.
2. If no files or directories are given on the command line, chplspell invokes scspell on a default set of files and directories that make sense for the Chapel repository.
3. If directories are given on the command line, chplspell invokes scspell on all files of certain types within those directories, recursively.
4. In case 2 or 3 above, chplspell may invoke scspell twice: once for most of the files it finds, and again for any LaTeX files it finds. This is because LaTeX does not use C-style escapes. This is a minor point, relevant only to understanding why chplspell keeps spell-checking when you hit ^C, and why words ignored with the “Ignore all” interactive command are forgotten when the LaTeX portion begins.
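The two-pass behavior described above, with one scspell run for most files and another for LaTeX files, can be pictured as partitioning the collected files by extension before invoking scspell. This is a minimal sketch. Treating .tex and .bib as the LaTeX files is an assumption borrowed from the example FILETYPE line later in this document. The helper names split_latex and planned_invocations are hypothetical.

```python
from pathlib import Path

# Assumption: files with these extensions get their own LaTeX pass
# (mirrors the "FILETYPE: TeX/LaTeX; .tex, .bib" example).
LATEX_EXTENSIONS = {".tex", ".bib"}


def split_latex(files):
    """Return (other_files, latex_files), preserving input order."""
    other, latex = [], []
    for f in files:
        target = latex if Path(f).suffix.lower() in LATEX_EXTENSIONS else other
        target.append(f)
    return other, latex


def planned_invocations(files):
    """List the file groups that would each get a separate scspell run.

    Empty groups are skipped, so at most two runs are planned.
    """
    return [group for group in split_latex(files) if group]
```

Because each group corresponds to a separate scspell process in this model, interrupting the first run does not prevent the second from starting, and any “Ignore all” state from the first run is not carried into the second.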
The configuration of the “default set of files and directories” is at the top of the chplspell script, and can easily be altered.
Non-interactive invocation
The simplest use is to produce a non-interactive report for the default files and directories of the Chapel repository. As of the date of this writing, there were still some words yet to be added to the project dictionary or corrected, which make a good example:
$ chplspell --report-only
doc/rst/developer/bestPractices/README.md:50: 'chplspell' not found in dictionary (from token 'chplspell')
CHANGES.md:445: 'chpldocumentation' not found in dictionary (from token 'chpldocumentation')
CHANGES.md:2001: 'pshm' not found in dictionary (from token 'pshm')
CHANGES.md:2341: 'pshm' not found in dictionary (from token 'pshm')
CHANGES.md:3360: 'circularities' not found in dictionary (from token 'circularities')
chplspell may also be invoked on only particular files or directories named on the command line. For example, this file with one tyop:
$ chplspell --report-only doc/rst/developer/bestPractices/SpellChecking.rst
doc/rst/developer/bestPractices/SpellChecking.rst:109: 'tyop' not found in dictionary (from token 'tyop')
$
(The project dictionary now includes the word “tyop” for this file, so the above command no longer produces that result.)
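Each line of the --report-only output shown above has the form path:line: 'word' not found in dictionary (from token 'token'). When triaging a long report, it can help to group the lines by word. The following sketch assumes exactly that line format, as seen in the examples above. The regular expression and the function name group_by_word are illustrative and are not part of chplspell.

```python
import re
from collections import defaultdict

# Matches report lines such as:
# CHANGES.md:2001: 'pshm' not found in dictionary (from token 'pshm')
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): '(?P<word>[^']*)' not found in dictionary"
    r" \(from token '(?P<token>[^']*)'\)$"
)


def group_by_word(report_text):
    """Map each unrecognized word to a list of (path, line) locations.

    Lines that do not match the assumed report format are ignored.
    """
    locations = defaultdict(list)
    for raw in report_text.splitlines():
        m = REPORT_LINE.match(raw.strip())
        if m:
            locations[m.group("word")].append(
                (m.group("path"), int(m.group("line")))
            )
    return dict(locations)
```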
Interactive invocation
scspell provides an interactive mode for making corrections and for adding words to the various dictionaries. This mode is also available through chplspell; see the scspell page on python.org for details.
When scspell is invoked through chplspell, any requested dictionary changes are made to $CHPL_HOME/util/devel/chplspell-dictionary.
Dictionary file details
This section provides a few details about the format of scspell's dictionary file. Understanding these details is not necessary to use chplspell as described above, but it will help with the more advanced options in the next section.
The natural language word list contains the words that may appear in any file being spell-checked. It is the last word list in the dictionary file, under the NATURAL: heading.
A “programming language” word list is used in addition to the natural language word list when the file being checked matches one of the file extensions given for that word list. Each such list appears in the dictionary file under a heading line beginning with FILETYPE:, e.g.,
FILETYPE: TeX/LaTeX; .tex, .bib
A file-specific word list is used in addition when a file has a matching “file id”. These lists are stored in the dictionary file under FILEID: headings, e.g.,
FILEID: 42424242-4242-4242-4242-424242424242
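To make the relationship between the three kinds of word lists concrete, the following sketch parses a dictionary text by its NATURAL:, FILETYPE:, and FILEID: heading lines and computes the set of words that would apply to a given file. The layout it accepts, with one word per line after each heading, optionally indented and with blank lines between lists, is an assumption and may not match scspell's exact format. The function names are hypothetical, and this is not scspell's own parser.

```python
import os


def parse_dictionary(text):
    """Parse dictionary text into its word lists (assumed layout).

    Returns a dict with keys:
      "natural":   set of words under the NATURAL: heading
      "filetypes": list of (extension set, word set) pairs from FILETYPE:
      "fileids":   dict mapping file id -> word set from FILEID:
    """
    natural, filetypes, fileids = set(), [], {}
    current = None  # word set that non-heading lines are added to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("NATURAL:"):
            current = natural
        elif line.startswith("FILETYPE:"):
            # e.g. "FILETYPE: TeX/LaTeX; .tex, .bib"
            _, _, ext_part = line.partition(";")
            exts = {e.strip().lower() for e in ext_part.split(",") if e.strip()}
            current = set()
            filetypes.append((exts, current))
        elif line.startswith("FILEID:"):
            file_id = line[len("FILEID:"):].strip()
            current = fileids.setdefault(file_id, set())
        elif current is not None:
            current.add(line)
    return {"natural": natural, "filetypes": filetypes, "fileids": fileids}


def applicable_words(dictionary, path, file_id=None):
    """Union of the natural, matching file-type, and file-specific lists."""
    words = set(dictionary["natural"])
    suffix = os.path.splitext(path)[1].lower()
    for exts, type_words in dictionary["filetypes"]:
        if suffix in exts:
            words |= type_words
    if file_id is not None:
        words |= dictionary["fileids"].get(file_id, set())
    return words
```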
There are two ways that a file id's association with a file may be represented to scspell:
- The file contains the string “scspell-id: ” followed by a file id, e.g., in a comment.
- There is an entry in the “file id mapping file”, $CHPL_HOME/util/devel/chplspell-dictionary.fileids.json, associating the file name with the file id. For example, the following file id is associated with two files in the Chapel repository:
"63b96a22-1e46-11e6-a3a6-10ddb1d4c3d5": [
"doc/rst/developer/hdfs_and_chapel/API.tex",
"doc/rst/developer/hdfs_and_chapel/examples.tex"
],
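The two association mechanisms above can be modeled as a lookup that checks the file's contents for an id literal and otherwise consults the mapping file, inverted so it is keyed by path. This is a sketch under several assumptions. Checking the literal before the mapping is a choice made here, not documented scspell behavior. File ids are assumed to be UUID-style strings like the examples in this document. The function names are hypothetical.

```python
import json
import re

# Assumed id format: 36-character UUID-style string, as in the examples.
ID_LITERAL = re.compile(r"scspell-id: ([0-9a-fA-F-]{36})")


def load_fileid_map(json_text):
    """Invert the mapping file: {file id: [paths]} -> {path: file id}."""
    by_path = {}
    for file_id, paths in json.loads(json_text).items():
        for p in paths:
            by_path[p] = file_id
    return by_path


def find_file_id(path, contents, by_path):
    """Return the file id for `path`, or None if it has none.

    An id literal in the file's contents is checked first (an assumption),
    then the path-keyed mapping.
    """
    m = ID_LITERAL.search(contents)
    if m:
        return m.group(1)
    return by_path.get(path)
```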
If a file has an associated file id, then when scspell offers to add an unrecognized word to a dictionary, one of the offered choices is this (f)ile-specific dictionary.
If there is no file id associated with the file, scspell will instead offer the option to create a (N)ew file-specific dictionary. This option creates a new file id, adds it to the dictionary.fileids.json file, and adds the unrecognized word to the new file-specific word list in the dictionary file.
If a file with a file-specific word list is moved or copied (e.g., the shootout benchmarks), and the association is via the file id mapping, chplspell won't associate the existing word list with the new file. The next section describes several ways to remedy this situation and similar ones without creating duplicate file-specific word lists.
As of this writing, no files in the Chapel repository contain a file id literal; all file id mappings are done through the file id mapping file.
Dictionary file management options
--rename-file
chplspell makes scspell's --rename-file option available to update the file id map after a file has been renamed:
git mv path/to/old.chpl new/path/and/new.chpl
chplspell --rename-file path/to/old.chpl new/path/and/new.chpl
Unfortunately, there is not yet a straight-up --copy-file option.
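The effect of a rename on the file id mapping, and the hand edit that can stand in for a copy, can both be modeled on the {file id: [paths]} structure shown in the mapping file excerpt above. This sketch only models the data change. It is not how scspell implements --rename-file, and copy_in_map is a hypothetical helper for the manual workaround of listing the copied file under the same file id.

```python
def rename_in_map(id_map, old_path, new_path):
    """Return a new {file id: [paths]} map with old_path renamed to new_path."""
    return {
        file_id: [new_path if p == old_path else p for p in paths]
        for file_id, paths in id_map.items()
    }


def copy_in_map(id_map, src_path, dst_path):
    """Return a new map in which dst_path shares src_path's file id, if any."""
    result = {file_id: list(paths) for file_id, paths in id_map.items()}
    for paths in result.values():
        if src_path in paths and dst_path not in paths:
            paths.append(dst_path)
    return result
```

Both helpers return new dictionaries and leave their input unchanged, so the result can be compared with the original before anything is written back to disk.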
--merge-file-ids
scspell also provides a --merge-file-ids option for the case where two files have file-specific word lists that are similar enough that they should be merged. The file ids may be given by the file id literal string, or by the name of a file associated with the file id:
chplspell --merge-file-ids one/file.chpl a/similar/file.chpl
The order of the arguments matters only in determining which file id hex string ends up associated with the files.
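A merge can be modeled as moving one file id's paths and words onto another id and then dropping the first. In this sketch, the surviving id is passed explicitly as keep_id. It does not model which id --merge-file-ids actually keeps for a given argument order. The function name and parameters are hypothetical.

```python
def merge_file_ids(id_map, word_lists, keep_id, drop_id):
    """Merge drop_id into keep_id in both the mapping and the word lists.

    id_map:     {file id: [paths]}
    word_lists: {file id: set of words}
    Returns new (id_map, word_lists); the inputs are not modified.
    """
    new_map = {fid: list(paths) for fid, paths in id_map.items() if fid != drop_id}
    merged_paths = new_map.setdefault(keep_id, [])
    for p in id_map.get(drop_id, []):
        if p not in merged_paths:
            merged_paths.append(p)

    new_words = {fid: set(ws) for fid, ws in word_lists.items() if fid != drop_id}
    new_words.setdefault(keep_id, set()).update(word_lists.get(drop_id, set()))
    return new_map, new_words
```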
--delete-files
The --delete-files option to chplspell may be used to remove the association between a file id and a deleted file from the dictionary file. If that was the only file associated with that file id, chplspell will also remove the file id itself and the file-specific dictionary.
git rm doc/obsolete doc/archaic.md
chplspell --delete-files doc/obsolete doc/archaic.md
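The rule described above can be modeled as follows. Deleted paths are dropped from the mapping. A file id that loses all of its files is removed along with its word list. Word lists for ids that were never in the mapping are left alone. This is a sketch: the arguments are matched as exact paths, and it does not model how chplspell handles a directory argument such as doc/obsolete. The function name is hypothetical.

```python
def delete_files(id_map, word_lists, deleted):
    """Drop deleted paths, plus any file id whose files were all deleted.

    id_map:     {file id: [paths]}
    word_lists: {file id: set of words}
    Returns new (id_map, word_lists); the inputs are not modified.
    """
    deleted = set(deleted)
    new_map, emptied = {}, set()
    for fid, paths in id_map.items():
        remaining = [p for p in paths if p not in deleted]
        if paths and not remaining:
            emptied.add(fid)  # its only file(s) were deleted
        else:
            new_map[fid] = remaining
    new_words = {fid: set(ws) for fid, ws in word_lists.items() if fid not in emptied}
    return new_map, new_words
```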
Edit the dictionary.fileids.json file
You can edit the file by hand to add a filename to a file id, or change a filename. The format is straightforward JSON.
One minor detail (likely of interest only to those so hung up on
minutiae as to write a spell checking utility) is that while scspell
emits the file id mapping file with no trailing newline, most text
editors take some convincing to save a file that way. To avoid git
commits fighting over that last byte, it’d be considerate to get rid
of that newline before committing.
pico -L is the simplest way I've found. Otherwise, you can make the change, then invoke chplspell to get it to rewrite the file. The file will be rewritten only if there are changes to make to it, so you'll likely need to make two changes that add up to no effect, such as the sequence:
chplspell --rename-file CONTRIBUTORS.md SCHMONTRIBUTORS.md
chplspell --rename-file SCHMONTRIBUTORS.md CONTRIBUTORS.md
(No files are renamed by this; these operations manipulate only the file id mapping.)
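If neither the editor nor the rename round trip is convenient, a few lines of Python can remove the trailing newline directly. This is a sketch. The function name is hypothetical, and it assumes the only unwanted bytes are trailing \n or \r\n characters. Trailing whitespace is insignificant in JSON, so the mapping's content is unaffected.

```python
from pathlib import Path


def strip_trailing_newlines(path):
    """Remove trailing newline characters from a file, in place.

    Returns True if the file was changed, False if it already had none.
    """
    p = Path(path)
    data = p.read_bytes()
    stripped = data.rstrip(b"\r\n")
    if stripped == data:
        return False
    p.write_bytes(stripped)
    return True
```

For example, running strip_trailing_newlines("util/devel/chplspell-dictionary.fileids.json") from $CHPL_HOME before committing would leave the file ending the way scspell writes it.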