RFC Reader Script

It is often useful to read RFCs of the Internet Engineering Task Force (IETF) when trying to understand some networking issue or differences in vendor implementations. RFCs can be read using a web browser, via the RFC Editor web site (RFC URLs look like https://www.rfc-editor.org/rfc/rfc<NUMBER>). The problem with this approach is RFC pagination, at least for RFC numbers below 8650, which is not well supported by web browsers. Thus the header and footer lines included as normal text in an RFC disturb the reading flow in a web browser. If only a web browser is available, the PDF rendering of an RFC is often a better choice than the HTML version.

Starting with the number 8650, RFCs are published as XML files that are not intended to be read directly, but require rendering into a different format, e.g., HTML or PDF. While there exists a rendering into text, this may omit figures and diagrams. Thus it seems safer to read those newer RFCs in the official PDF or HTML format. The PDF rendering uses pagination, while neither the HTML nor the text rendering is paginated. The official HTML rendering can be found at the RFC Editor web site (the URLs look like https://rfc-editor.org/rfc/rfc<NUMBER>).

Reading RFC Documents on GNU/Linux

When using a GNU/Linux system I prefer to retrieve the RFC text version with GNU wget, (slightly) reformat the text using a small AWK script and sed, and display it inside an appropriately sized XTerm (or other terminal emulator window) using the less pager. This allows flipping through the pages similar to a printed RFC. If the system supports UTF-8, e.g., using a relatively recent GNU/Linux distribution with default configuration, this method automatically supports the use of UTF-8 as found in, e.g., RFC 8187. This can be done using the rfc-reader script found on this page:

rfc-reader <NUMBER>

For new RFCs published in XML with an official (or original) HTML rendering, a more appropriate way to read an RFC on a GNU/Linux desktop system is to use xdg-open from xdg-utils to open the HTML version in a web browser:

xdg-open https://www.rfc-editor.org/rfc/rfc<NUMBER>.html
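The choice between the two reading methods can be sketched as a small helper. This is only an illustration, not part of rfc-reader: the function name rfc_url is hypothetical, the threshold 8650 is taken from the text above, and the .txt URL pattern is assumed to mirror the .html one.

```shell
# Hypothetical helper: print the URL of the rendering to read for an RFC.
# RFCs numbered 8650 or higher get the HTML rendering, older RFCs the
# paginated text version.
rfc_url() {
    number=$1
    base="https://www.rfc-editor.org/rfc/rfc${number}"
    if [ "$number" -ge 8650 ]; then
        printf '%s.html\n' "$base"
    else
        printf '%s.txt\n' "$base"
    fi
}
```

With such a helper, xdg-open "$(rfc_url 8655)" would open the HTML rendering of a new-format RFC in a web browser.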

RFC-Reader Download

I have written a small Bash script called rfc-reader to comfortably read RFC documents in text format on GNU/Linux. It uses the method described above. It relies on GNU Wget to download files. When opening a new terminal window, it relies on the X Window System (also known as X Windows or just X) and works best with XTerm, but GNOME Terminal can work, too.

Download link

Download link for the rfc-reader Bash script: rfc-reader

Also in Single File Tools Collection

Besides the download link on this page, it is part of my collection of single file tools in a git repository on GitHub. A change log for rfc-reader can be found in form of the git commit log.

Since I am using GNU/Linux and prefer the GNU versions of Awk and sed, I am only testing with those implementations. Non-GNU implementations are often inferior and their use may break rfc-reader. Since version 0.40 rfc-reader requires the GNU Bash shell.

Usage

rfc-reader [OPTION...] [{rfc|bcp|fyi|ien|std}][{-|.| }]NUMBER[.txt]
rfc-reader [OPTION...] {I-D.|draft-}DRAFT_NAME[-DRAFT_NUMBER][.txt]
rfc-reader [OPTION...]

Options

-h
print help and exit
-V
print version information and exit
-L
print license and exit
-t
use current terminal instead of spawning a new one of appropriate size

Examples

rfc-reader                       # print version, copyright, help
rfc-reader 1                     # downloads and displays RFC 1
rfc-reader rfc1                  # downloads and displays RFC 1
rfc-reader rfc 1                 # downloads and displays RFC 1
rfc-reader rfc-1                 # downloads and displays RFC 1
rfc-reader rfc.1                 # downloads and displays RFC 1
rfc-reader rfc1.txt              # downloads and displays RFC 1
rfc-reader rfc0001.txt           # downloads and displays RFC 1
rfc-reader bcp78                 # downloads and displays BCP 78
rfc-reader fyi3                  # downloads and displays FYI 3
rfc-reader ien137                # downloads and displays IEN 137
rfc-reader std1                  # downloads and displays STD 1
rfc-reader draft-rep-wg-topic-00 # downloads & displays REP I-D
rfc-reader I-D.rep-wg-topic-00   # downloads & displays REP I-D
rfc-reader draft-rep-wg-topic    # downloads & displays REP I-D
rfc-reader I-D.rep-wg-topic      # downloads & displays REP I-D
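The accepted spellings for the numbered document series can be reduced to a series name and a number. The following is a minimal sketch of such a normalization, not the code used by rfc-reader; the function name parse_doc is hypothetical, and Internet-Draft names are deliberately not handled.

```shell
# Hypothetical helper: normalize arguments such as "1", "rfc 1", "rfc-1",
# "rfc.1" or "rfc0001.txt" to the form "SERIES NUMBER", e.g. "rfc 1".
# Returns non-zero if the arguments do not name a numbered document.
parse_doc() {
    arg=$*                      # join arguments, so "rfc 1" becomes one string
    arg=${arg%.txt}             # drop optional ".txt" suffix
    case $arg in
        rfc*|bcp*|fyi*|ien*|std*)
            series=${arg%"${arg#???}"}   # first three characters
            number=${arg#???}            # everything after them
            ;;
        *)
            series=rfc                   # a bare number means an RFC
            number=$arg
            ;;
    esac
    number=${number#[-. ]}      # drop optional separator
    case $number in
        ''|*[!0-9]*) return 1 ;;         # not a plain number
    esac
    # strip leading zeros, but keep a single "0"
    while [ "${number#0}" != "$number" ] && [ "${#number}" -gt 1 ]; do
        number=${number#0}
    done
    printf '%s %s\n' "$series" "$number"
}
```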

Local RFC Storage

Some GNU/Linux distributions include packages containing RFCs. rfc-reader looks for RFC files in the following directories, used by OpenSUSE and Debian/Ubuntu respectively:

/usr/share/doc/rfc
/usr/share/doc/RFC/links
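A lookup in such directories could look like the following sketch. It is not taken from rfc-reader: the function name find_local_rfc is hypothetical, and the file names rfcNUMBER.txt and rfcNUMBER.txt.gz are assumptions about how distribution packages name the files.

```shell
# Hypothetical helper: print the path of the first readable local copy of
# an RFC found in the given directories; return non-zero if none exists.
# Usage: find_local_rfc NUMBER DIR...
find_local_rfc() {
    number=$1
    shift
    for dir in "$@"; do
        # assumed file names, plain and gzip compressed
        for name in "rfc${number}.txt" "rfc${number}.txt.gz"; do
            if [ -r "${dir}/${name}" ]; then
                printf '%s\n' "${dir}/${name}"
                return 0
            fi
        done
    done
    return 1
}
```

For example, find_local_rfc 1 /usr/share/doc/rfc /usr/share/doc/RFC/links searches both directories listed above.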

If you know of additional locations used by GNU/Linux distributions or other operating systems to provide local RFC copies, please let me know.

Cache for Downloaded RFC Files

If one of the directories $HOME/.rfc-reader/cache or $XDG_CACHE_DIR/rfc-reader (if $XDG_CACHE_DIR is unset or set to the empty string, $HOME/.cache is used instead) exists and is readable, rfc-reader will look for RFC files there. If one of the directories is writable, rfc-reader will use it to save downloaded RFC files. If both directories are writable, rfc-reader will prefer $XDG_CACHE_DIR/rfc-reader as download cache, but will still read RFCs from both directories.
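The selection of the download cache described above can be sketched as follows. This is an illustration only, using the variable names from the paragraph above; the function name download_cache_dir is hypothetical and the code is not copied from rfc-reader.

```shell
# Hypothetical helper: print the directory to use as download cache,
# preferring $XDG_CACHE_DIR/rfc-reader over $HOME/.rfc-reader/cache.
# Returns non-zero if neither directory exists and is writable.
download_cache_dir() {
    # fall back to $HOME/.cache if XDG_CACHE_DIR is unset or empty
    xdg_base=${XDG_CACHE_DIR:-"${HOME}/.cache"}
    for dir in "${xdg_base}/rfc-reader" "${HOME}/.rfc-reader/cache"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```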

If a file from a series with changing contents for a constant document number is found in the directory selected as download cache, a refresh of the file is attempted. If this fails (e.g., while offline), the cached copy is used. Currently (2022-08-14) I know that Best Current Practice (BCP) and Internet Standard (STD) document contents for a given document number can change.
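The refresh-with-fallback behavior can be illustrated with a short sketch. The function name refresh_cached is hypothetical, and the actual rfc-reader code may work differently; the download command is passed in as arguments so that the same logic works with any downloader.

```shell
# Hypothetical helper: try to refresh a cached file, keep the old copy on
# failure. The command given after FILE must write the document to stdout.
# Returns zero if a readable (new or old) copy of FILE exists afterwards.
# Usage: refresh_cached FILE COMMAND [ARG...]
refresh_cached() {
    file=$1
    shift
    tmp="${file}.tmp.$$"
    if "$@" > "$tmp" && [ -s "$tmp" ]; then
        mv -f "$tmp" "$file"    # download succeeded: replace cached copy
    else
        rm -f "$tmp"            # download failed: keep cached copy, if any
    fi
    [ -r "$file" ]
}
```

A call could look like refresh_cached "$cache/bcp78.txt" wget -q -O - "$url", where $cache and $url are placeholders for the cache directory and the document URL.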

The download cache, if available, is used for support files as well, e.g., the list of Internet Drafts (all_id.txt) used to determine the latest version of an Internet Draft, if no specific version is requested.

Use Without the X Window System

If rfc-reader is started without an X Window System environment, i.e., with empty or unset DISPLAY environment variable, instead of opening a new terminal window of appropriate size, rfc-reader will use the terminal it is started in. If the RFC is paginated, and the terminal is too small to show a full RFC page inside the selected pager, rfc-reader will exit with an error.
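The two checks described above can be sketched like this. Both function names are hypothetical, and the minimum size of 59 lines by 72 columns is an assumption derived from the 58x72 RFC page plus one line for the pager's status line; rfc-reader itself may use different limits.

```shell
# Hypothetical helper: succeed if no X display is available, i.e., if
# DISPLAY is unset or empty.
use_current_terminal() {
    [ -z "${DISPLAY:-}" ]
}

# Hypothetical helper: succeed if a terminal with the given number of
# lines and columns can show a full RFC page inside a pager.
# Usage: term_fits LINES COLUMNS
term_fits() {
    [ "$1" -ge 59 ] && [ "$2" -ge 72 ]
}
```

On most systems the current terminal size can be obtained with tput, e.g., term_fits "$(tput lines)" "$(tput cols)".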

Screenshots

Using XTerm and less

[image of rfc-reader using XTerm and less]

Significant Options

XTerm:
-fn c-9x18 -fg green -bg black -bd green -g 72x59
less:
-M -S
(For full options see the actual script.)

Using GNOME Terminal and less

[image of rfc-reader using GNOME Terminal and less]

Significant Options

GNOME Terminal:
--hide-menubar --disable-factory --geometry 72x59
less:
-M -S
(For full options see the actual script.)

Note on GNOME Terminal Use

Starting rfc-reader as a background job does not work well with GNOME Terminal. There seems to be a race regarding setting the terminal dimensions and starting the pager inside the terminal. The result is a wrong output size inside the pager. As a workaround you can start rfc-reader in the foreground, suspend it (usually CTRL+Z in the terminal it was started from), and then make it a background process with bg.

The problem mentioned above does not occur with XTerm.

On Re-Formatting the RFC

The RFCs use a format that works well when printed on paper. It uses pages with headers and footers. All pages are roughly the same size, one which fits the paper size of most text printers (but actual RFC page size varies slightly). Defining page breaks and thus pagination for all of the RFC is actually necessary to ensure that ASCII art drawings, diagrams, and tables are not affected by page breaks automatically inserted when printing.

The Problem

Pagination, specifically using header and footer lines, does not work well with the variable size of screens, terminal windows, and fonts. So-called pagers (like less) usually adapt to the current screen or window size and allow continuous scrolling through the text, losing the benefits of pagination. Worse, headers and footers interrupt reading instead of unobtrusively providing reference information. Therefore something a little more sophisticated than just viewing the original RFC text file with a pager inside an arbitrarily sized terminal window is needed. The good news is that simple adjustments to both the RFC text file and the terminal window suffice.

It should be noted that some pagers, e.g., some versions of more and pg, support pausing on Form Feed characters. Using such a pager in a sufficiently large terminal allows reading an RFC page-by-page. With more -c, only the current page is displayed. Sadly, the more comfortable pagers less (de facto default on GNU/Linux) and most do not support pausing on Form Feed.

The Solution

A text document comprised of pages with a fixed number of lines, the first and last used as header and footer, looks good in a pager providing exactly the same number of lines. Since RFCs have a defined maximum number of lines (58) and maximum number of characters per line (72) [see RFC 2223], using a text display area of this size should work fine, but not all pages in an RFC have the same length. For printing, use of the form feed character (FF, ASCII code 12) ends a page. Modern pagers usually do not use this convention. Therefore my rfc-reader script fills all pages to be of equal length (58 lines), allowing paginated reading on screen in an appropriately sized terminal window showing the pager output, and opens a terminal window of just the right size wherein the re-formatted text is shown inside a pager.

Using an Existing Terminal

To display an RFC in text format with a pager inside a sufficiently large terminal window, the form feed character should be expanded to fill the terminal (excluding the status line of the pager). When using a shell that sets the LINES environment variable, e.g. GNU Bash, this can be done as follows (using less as pager):

awk -v n_lines="${LINES}" '!/\f/ {line++; print}; /\f/ {for (i=line; i+1<n_lines; i++) print ""; line=0}; END {for (i=line; i+1<n_lines; i++) print ""}' RFC.txt | sed '/^$/!s/^/  /' | less -MS
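To see what the padding does, the AWK part of the pipeline above can be wrapped in a function and fed a tiny synthetic input. The function name pad_pages is hypothetical; the AWK program is the one from the command above, only spread over several lines.

```shell
# Hypothetical wrapper around the AWK filter shown above: pad every page
# (text up to a line containing a form feed) with empty lines so that it
# occupies N_LINES - 1 lines, leaving one line for the pager's status line.
# Usage: pad_pages N_LINES < RFC.txt
pad_pages() {
    awk -v n_lines="$1" '
        !/\f/ { line++; print }
        /\f/  { for (i = line; i + 1 < n_lines; i++) print ""; line = 0 }
        END   { for (i = line; i + 1 < n_lines; i++) print "" }
    '
}
```

For example, printf 'a\nb\n\f\nc\n' | pad_pages 5 turns the two short pages into two pages of four lines each, without any form feed characters.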

Alternatively, you can use env to unset the DISPLAY environment variable and use rfc-reader. Thus rfc-reader cannot open a new terminal and will try to use the existing one in which it was started instead:

env -u DISPLAY rfc-reader RFC

Alternative Solution

While I prefer to read paginated RFC documents in a page-based viewer, you might prefer to remove pagination from the RFC and read the contents as one long stream of text. To do this you can use the rfcstrip tool (not to be confused with the different rfcstrip mentioned in RFC 8407 section 3.11) in combination with a pager. Since rfcstrip removes the page breaks (i.e., form feed characters) along with running headers and footers, and uses a sophisticated squeezing of empty lines, the results constitute nice input for any pager, including less, without need for specific options:

rfcstrip RFC.txt | less

Future RFC Format Change

See the RFC Format Change FAQ for an overview and links to the details of the planned changes to the format of (new) RFC documents. Information pertaining to rfc-reader and reading new-format RFCs on text-based systems can be found below.

RFC 6949 drops the pagination requirements (section 3.3), thus the rfc-reader script will lose some of its usefulness for new RFCs some time in the future, but will remain very helpful for RFCs published according to RFC 2223. This includes the already published well-paginated RFCs since there is no plan to re-publish them in the new format.

RFC 7990 names XML using the xml2rfc v3 vocabulary as described in RFC 7991 the canonical format for future RFC documents. While XML is a text-based format, it is not conveniently readable as-is.

A plain-text version of the XML RFC shall be produced as one of several RFC publication formats. This plain-text version is intended to be a low-fidelity version of the RFC, possibly omitting figures instead of using ASCII art as known from previous RFCs. This plain-text version might still use pagination, i.e., form feed characters, with a maximum of 58 lines per page, but omit page headers and footers (see RFC 7994, but RFC 7990 specifies that unpaginated plain-text will be created, and RFC 6949 retired the pagination requirement). Thus rfc-reader might still be able to correctly format the new plain-text RFCs, but with little, if any, benefit over just using a pager that ignores the meaning of form feed characters. Perhaps the best way to view those new-style plain-text RFCs will be to remove the form feeds, reduce consecutive empty lines to a single empty line (often called squeezing), and use a pager to view the result. This squeezing is done automatically by rfc-reader if the RFC text does not contain form feed characters.

RFC Format Change

As of October 2019, the RFC document format change no longer lies in the future.

Basic Pagination (RFC 7994 section 4)

According to section 4 of RFC 7994, the text rendering of new RFCs in canonical XML format was supposed to use basic pagination, i.e., form feed characters as page breaks, but neither per page headers nor footers. This section promised instructions or a script on how to remove this basic pagination.

Removing (stripping) Pagination

The rfcstrip script can remove basic pagination from such a text rendering. It can remove classical pagination as well. Thus a script is already available.

Since basic pagination means just the addition of form feed characters, it can be stripped out using a POSIX-compatible tr:

tr -d '\f' < RFC.txt
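Deleting the form feed characters can leave runs of empty lines behind. A minimal sketch that also squeezes such runs into a single empty line, using only tr and awk, could look like this; the function name strip_pagination is hypothetical and this is not the rfcstrip implementation.

```shell
# Hypothetical helper: remove form feeds and squeeze runs of blank lines
# (empty or white space only) into a single empty line.
# Usage: strip_pagination < RFC.txt
strip_pagination() {
    tr -d '\f' |
        awk '
            NF       { blank = 0; print; next }  # non-blank line: print as-is
            !blank++ { print "" }                # first blank line of a run
        '
}
```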

Adding Pagination

However, it seems that the new text rendering of RFC documents in canonical XML format does not use any pagination, not even basic pagination. To prepare the text version for printing on a text printer, it may help to use pr to add pagination. This, of course, risks introducing page breaks inside of figures, since pr does not analyze the content to avoid this.
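A pr invocation for this purpose might look like the following sketch. The page length of 58 lines is borrowed from the classical RFC format, the header text is an arbitrary example, and the function name paginate_text is hypothetical; pr adds its own page header, which differs from the classical RFC header and footer lines.

```shell
# Hypothetical helper: paginate text read from stdin with pr.
#   -l 58  use a page length of 58 lines
#   -f     separate pages with form feed characters
#   -h     text to show in the page header written by pr
# Usage: paginate_text HEADER < RFC.txt
paginate_text() {
    pr -f -l 58 -h "$1"
}
```

For example, paginate_text "RFC 8655" < rfc8655.txt > rfc8655-paginated.txt would create a paginated copy, assuming rfc8655.txt is the downloaded text rendering.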

There is also a simple Python script called pagerfc, available under a free software license, that adds pagination to the unpaginated text rendering of an RFC. Currently (2024-04-08), this script does not seem to consider artwork when inserting page breaks.

The free software xml2rfc Python script can still create pagination, at least for Internet-Drafts, even though the RFC text renderings created by the RFC Editor are not paginated. It could be possible to use it to create a paginated text rendering from the XML version of an RFC, too.

New RFC Format without Pagination for Text Rendering Observed in the Wild (October 2019)

Looking at RFC 8655, pagination seems to be dropped from the text version; at least the text version of this RFC accessed on 2019-10-25 does not contain any form feed characters. The same holds for RFC 8651, the first RFC published using the new format. The lowest numbered RFC using the new format seems to be RFC 8650.

The plain-text version of an RFC will no longer be restricted to ASCII, but will use UTF-8 encoding, with the (at best useless for UTF-8) Unicode Byte Order Mark (BOM) prepended. It is probably best to remove the BOM before viewing the text-format RFC.

Viewing the (possibly crippled, because figures and diagrams might be omitted) plain-text version of a future RFC could thus be done as follows:

sed '1s/^'$'\uFEFF''//;/^[[:space:]]*\f[[:space:]]*$/d' RFC.txt | less -s
(The above works with GNU sed and a shell that supports $'\uFEFF' quoting, e.g., GNU Bash. It uses sed to remove the BOM as well as lines with a form feed character, possibly surrounded by white space, used for pagination, and uses a function of the less pager to squeeze blank lines. The rfc-reader script automatically removes a prepended BOM in UTF-8 encoding.)
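If the shell does not support $'\uFEFF' quoting, the BOM can be written as its three UTF-8 bytes (EF BB BF, octal 357 273 277) using printf. The following is a minimal sketch of this variant; the function name strip_bom is hypothetical.

```shell
# Hypothetical helper: remove a UTF-8 encoded BOM from the start of the
# first line of the text read from stdin.
# Usage: strip_bom < RFC.txt
strip_bom() {
    bom=$(printf '\357\273\277')    # U+FEFF encoded in UTF-8
    sed "1s/^${bom}//"
}
```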

Questionable Usefulness of Text Rendering Going Forward

As RFC 7994 notes in its Security Considerations section, “[U]nintended changes to the text as a result of the transformation from the base XML file could in turn corrupt a standard, practice, or critical piece of information about a protocol.”

The PDF/A-3 (see RFC 7995) and HTML (see RFC 7992 and RFC 7993) versions of those future RFCs, rendered from the canonical XML version, are probably the only realistic options left to read all content of a future RFC, including figures and diagrams. This, of course, requires the use of complex software and graphical displays.

The 50th anniversary of the RFC series in 2019 thus marks the end of an era.

