Patches for GNU Datamash and Decorate

I have been trying to contribute to GNU datamash (and the bundled decorate tool), but for over two years I did not receive a reply to any of the patches I sent to the bug-datamash mailing list. The patches were not applied to the upstream git repository either. I therefore publish the patches here to make them available to whoever might be interested, although I would prefer them to be applied upstream.

Update 2022-05-14: Shawn Wagner has started to apply some of the patches to the GNU datamash git repository. :-)

Update 2022-05-27: Tim Rice has added me to the GNU Datamash group on Savannah with push rights to the git repository. I will now apply my patches myself starting on 2022-05-28.


Patches Not (Yet) Applied Upstream

As of 2022-05-28, all of my patches have been applied upstream. Some have been applied directly by me.


Patches Already Applied Upstream

Fix Typo devian for deviation in Documentation

This trivial patch fixes the typo devian (for deviation), which occurs twice in the GNU datamash documentation. (Mailing list post.)

Download datamash-doc-devian_typo_fix.patch.

This patch has been applied in git commit 0c838dbb63669391640d132b8b80c9d0b848c5dc (mailing list post).

Possible Build Fix for Fedora Datamash Package on armv7hl

A datamash package maintainer for Fedora has reported build failures for GNU datamash version 1.7, more specifically for the decorate program, on armv7hl (but not on other hardware platforms). The problem is a warning about the use of a wrong format conversion specifier (-Wformat=), combined with the compiler option -Werror=format-security.

Debian bug 982869 might be related to this. In this bug, the line number reported for invalid input is wrong, but only on non-x86 32-bit architectures. This error message comes from the line where the wrong format specifier is reported during compilation.

Both of the following patches are attempts to fix this, but I do not have an armv7hl test system to verify them. The two patches are mutually exclusive: each one replaces the incorrect %zu with a correct format conversion specifier. The first uses %jd, the second PRIdMAX. Since PRIuMAX is already used in the GNU datamash sources, but %jd is not, I would use the second patch. Both work on my x86 64-bit system, but there the original code compiles without the warning, so I cannot reproduce the problem and thus cannot verify whether the suggested solutions really work. (Mailing list post.)

Download datamash-decorate-possible_armv7hl_build_fix-variant_1.patch or datamash-decorate-possible_armv7hl_build_fix-variant_2.patch.

The second patch has been applied in git commit a6b1742741da516929bdbb6ee455c2f49707ae5d (mailing list post).

Fix Typos in GNU Datamash Info Documentation

A couple of typos in the GNU datamash manual were reported to the mailing list. This patch fixes them. (Mailing list post.)

Download datamash-doc-info-fix_typos.patch.

This patch has been applied in git commit 708ae4eeda42f8719c7515ef1f5e7bb74168be93 (mailing list post).

Fix Typo -h vs -H in Datamash Man Page

In one example in the datamash man page, the (upper case) short option -H is written in lower case, i.e., as -h, but case is significant for options. This trivial patch fixes this. (Mailing list post.)

Download datamash-man-short_option-H_typo.patch.

This patch has been applied in git commit f0fcfac4f62151cb930cfbb668a58cc2823b65f5 (mailing list post).

Fix --filler=X Default in datamash --help Output

The help output for the --filler=X option contains a format specifier (%s) to print the default filler value, but it is printed using fputs() and no value for %s is provided. This small patch fixes this by using printf() and providing the variable holding the filler value. (Mailing list post.)

Download 0001-datamash-fix-filler-X-default-in-help-output.patch.

This patch has been applied in git commit 9870438b80faaefa697c3f35465a09bc6612f3b1 (mailing list post).

Fix Description of Floor and Ceil in Man Page

The man page of datamash confuses the descriptions of the floor and ceil operations. This small patch fixes this. (Mailing list post.)

Download datamash-man-floor_ceil_confusion.patch.

This patch has been applied in git commit 7fd8288e6ef3d31f263b76e351409058f9cc46bd.

Mention LC_NUMERIC in the Documentation

A GNU datamash user was surprised to learn that the locale settings, i.e., the contents of the LC_NUMERIC environment variable, affect the floating point format according to local customs (e.g., the decimal-point character), and asked if this could be added to the documentation. This patch adds a mention of the LC_NUMERIC locale setting to the GNU datamash info documentation. (Mailing list post.)

Download datamash-doc-mention_LC_NUMERIC.patch.

This patch has been applied in git commit 998ab56678ee21d1d1e1c5661a81373f0c11c851.

Fix Man Page Name Given in --help Output

The --help output of both GNU datamash and decorate mentions a man page, but the man page name given is not completely correct.

For GNU datamash, the help output gives the command man GNU datamash, but the man program gives an error message for the GNU part of this command:

$ datamash --help | grep 'man '
  man GNU datamash
$ man GNU datamash
No manual entry for GNU
$ man 'GNU datamash'
No manual entry for GNU datamash
    

For decorate, the help output uses argv[0] as the man page name:

$ ./decorate --help | grep ' man '
  man ./decorate
    

Following the man invocation given in decorate's help output passes the decorate executable to man instead of the name of the man page, which does not work as intended. This small patch fixes both problems. (Mailing list post.)

Download datamash-decorate-help-fix_man_page_name.patch.

This patch has been applied in git commit 3ee20a96f5299729e18b2f74002dc2c6bdde8907.

Combined Sort of IPv4 and IPv6 Addresses Using Decorate

IPv4 and IPv6 addresses can be seen as IP addresses. Logs from a dual-stack application, e.g., a web server, may contain either an IPv4 or IPv6 address at a given position in each line. Thus if one wants to sort the log file on the IP address, both IPv4 and IPv6 addresses need to be accepted as sort key and sorted consistently. One approach is to transform one address type into the other before sorting. IPv6 supports the transformation of IPv4 addresses into IPv6 addresses.

There are two common methods for accommodating IPv4 addresses in IPv6: IPv4-Mapped addresses and the deprecated IPv4-Compatible addresses. Both can be used to convert a given IPv4 address to an IPv6 address. Both the IPv4-Mapped and the IPv4-Compatible IPv6 address ranges are reserved by IANA and always represent IPv4 addresses in a dual-stack application. IPv4-Compatible addresses just add 96 leading zero bits to the 32-bit IPv4 address to create a 128-bit IPv6 address. This results in two ambiguities: the unspecified address is all-zero in both IPv4 and IPv6, and the IPv6 localhost address ::1 coincides with the IPv4 address 0.0.0.1, the first host address of the all-zero network. IPv4-Mapped addresses avoid this ambiguity. But since IPv4-Compatible IPv6 addresses can be seen as treating an IP address (both version 4 and version 6) as a specific way to represent an integer value, I think it is useful to support this transformation as well.

This patch (generated with git format-patch to allow convenient application to a git repository using git am) adds two conversion methods to decorate: ipv6v4map and ipv6v4comp. The conversion logically converts an IPv4 address to an IPv6 address, but the code actually creates a textual representation of a 128-bit integer from either an IPv4 or an IPv6 address. (Mailing list post.)

Functionality like this was requested for sort from GNU Coreutils in 2011 and in 2015, but it was not accepted into GNU Coreutils.

Thus it seems appropriate for decorate to provide this functionality, since decorate has been created to ease using the decorate, sort, undecorate pattern to sort data in ways not directly supported by the system's sort program.

Download 0001-decorate-combined-sort-of-IPv4-and-IPv6-addresses.patch.

This patch has been applied in git commit 60245ffc120f80bc3d4e7a38154b87cd1aa4e680.

