datamash.git
8 months ago  datamash: keep "getnum" operation inside fields  master
Erik Auerswald [Sun, 17 Dec 2023 16:39:40 +0000 (17:39 +0100)]
datamash: keep "getnum" operation inside fields

Before, the "getnum" operation could scan into adjacent fields,
because it used string operations on fields terminated by a
field separator, not a NUL byte.

* NEWS: Mention bug fix.
* src/utils.c (extract_number): Copy field contents to
  NUL-terminated string to ensure string operations stay
  inside the field.
* tests/datamash-tests-2.pl: Add new, and activate existing,
  tests for the corrected "getnum" behavior.
* tests/datamash-vnlog.pl: Add new tests for the corrected
  "getnum" behavior.
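The effect of the fix can be modelled outside C. The following is a minimal Python sketch, not the code in src/utils.c: the function names and the number pattern are assumptions made for this illustration. A field is identified by an offset and a length inside a line buffer, and scanning without a bound finds digits that belong to the next field.

```python
import re
from typing import Optional

# Hypothetical pattern for a decimal number; the number types supported by
# the real "getnum" operation are not modelled here.
NUMBER = re.compile(rb"-?\d+(?:\.\d+)?")


def extract_number_unbounded(line: bytes, start: int) -> Optional[bytes]:
    """Model of the bug: scan from the field start to the end of the line."""
    match = NUMBER.search(line, start)
    return match.group(0) if match else None


def extract_number_bounded(line: bytes, start: int, length: int) -> Optional[bytes]:
    """Model of the fix: copy the field first, then scan only the copy."""
    field = bytes(line[start:start + length])
    match = NUMBER.search(field)
    return match.group(0) if match else None


line = b"abc\t42"  # two tab-separated fields: "abc" and "42"
print(extract_number_unbounded(line, 0))   # picks up the number of field 2
print(extract_number_bounded(line, 0, 3))  # None: "abc" contains no number
```

Copying costs an allocation per field, but it keeps every later string operation inside the field without having to audit each one.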

8 months ago  datamash: bug fix for decimal sep. as field sep.
Erik Auerswald [Sat, 16 Dec 2023 17:09:04 +0000 (18:09 +0100)]
datamash: bug fix for decimal sep. as field sep.

When the locale's decimal separator is used as field separator,
reading a numeric value would continue over the field separator,
resulting in an "invalid numeric value" error message.  This is
now prevented by re-trying to read the numeric value after the
field contents have been copied to a NUL-terminated buffer.

* NEWS: Mention bug fix.
* THANKS: Add bug reporters.
* src/field-ops.c (field_op_collect): If strtold reads past the
  field contents, copy field contents to temporary buffer and
  try strtold again.
* tests/datamash-i18n-de.pl: Activate two previously failing
  tests, and add a test case from the recent bug report.
* tests/datamash-tests-2.pl: Activate two previously failing
  tests.
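The retry logic described above can be sketched in Python. This is an illustration under stated assumptions, not the C code in src/field-ops.c: `parse_number_prefix` is a hypothetical, much simplified stand-in for strtold (no exponents, no hex), and `read_field_value` is a made-up name.

```python
import re
from typing import Tuple


def parse_number_prefix(text: str, decimal_sep: str) -> Tuple[float, int]:
    """Parse the longest numeric prefix; return (value, characters consumed)."""
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_sep) + r"\d+)?"
    match = re.match(pattern, text)
    if match is None:
        raise ValueError("invalid numeric value")
    return float(match.group(0).replace(decimal_sep, ".")), match.end()


def read_field_value(line: str, start: int, length: int, decimal_sep: str) -> float:
    """Read the number stored in line[start:start+length]."""
    value, consumed = parse_number_prefix(line[start:], decimal_sep)
    if consumed > length:
        # The parser ran past the field: retry on a copy of the field only.
        value, consumed = parse_number_prefix(line[start:start + length], decimal_sep)
    if consumed != length:
        raise ValueError("invalid numeric value")
    return value


# Field separator and decimal separator are both a comma.
print(read_field_value("1,2", 0, 1, ","))  # first field
print(read_field_value("1,2", 2, 1, ","))  # second field
```

The copy is made only when the first attempt overruns the field, so input that never triggers the problem pays no extra cost.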

9 months ago  maint: fix "make syntax-check" errors
Erik Auerswald [Sun, 10 Dec 2023 19:17:56 +0000 (20:17 +0100)]
maint: fix "make syntax-check" errors

* cfg.mk: Except file doc/fdl.texi from line length check.
* src/utils.c (update_best_seq): Add space before opening parenthesis.

9 months ago  tweak NEWS entry for datamash antimode bug fix
Erik Auerswald [Sun, 10 Dec 2023 11:21:50 +0000 (12:21 +0100)]
tweak NEWS entry for datamash antimode bug fix

A "set of numbers" implies that no number is repeated.  Both
the mode and antimode operations look at repeated numbers.
Thus use "sequence" instead of "set" in the description.

* NEWS: Replace "set" with "sequence" in latest bug fix entry.

9 months ago  datamash: fix "antimode" operation
Erik Auerswald [Sat, 9 Dec 2023 16:58:29 +0000 (17:58 +0100)]
datamash: fix "antimode" operation

Both "mode" and "antimode" operations are implemented in the same
function "mode_value()" in file "src/utils.c".  The algorithm
traverses a list of sequences of numbers and attempts to remember
the longest (for mode) or shortest (for antimode).

Before, the sequence information was updated for every new number
encountered, i.e., in general before the actual sequence length
was known.  This approach works for mode, but not for antimode.

Now, the sequence information is updated only after a sequence
has ended.  There are two ways a sequence can end: (1) a different
number is read, (2) there are no more numbers to read.  In both
cases, this new sequence information (length and value) needs
to be checked against the previous longest/shortest length and
possibly updated.

* NEWS: Mention bug fix.
* src/utils.c (mode_value): Update best sequence information only
  after end of sequence.  Move update code to new function, because
  it is now used twice.
  (update_best_seq): New function.
* tests/datamash-tests-2.pl: Activate the formerly broken, now
  working, "antimode" tests.
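The corrected algorithm can be sketched as follows. This is a Python illustration, not the C implementation: the names mode_value and update_best_seq are taken from the commit message, but the signatures, the tuple representation, and the tie-breaking rule (the first best sequence wins) are assumptions.

```python
def update_best_seq(best, length, value, want_longest):
    """Return the better of the best sequence so far and the one just ended."""
    if best is None:
        return (length, value)
    best_length, _ = best
    better = length > best_length if want_longest else length < best_length
    return (length, value) if better else best


def mode_value(sorted_values, want_longest=True):
    """Return the mode (or antimode) of an already sorted, non-empty list."""
    best = None
    seq_value = sorted_values[0]
    seq_length = 0
    for value in sorted_values:
        if value == seq_value:
            seq_length += 1
            continue
        # (1) A different number ends the current sequence.
        best = update_best_seq(best, seq_length, seq_value, want_longest)
        seq_value, seq_length = value, 1
    # (2) The end of the input ends the last sequence.
    best = update_best_seq(best, seq_length, seq_value, want_longest)
    return best[1]


data = [1, 1, 2, 3, 3, 3]
print(mode_value(data))                      # mode
print(mode_value(data, want_longest=False))  # antimode
```

Updating on every number, as before the fix, would compare a length of 1 for each new value, so every value would look like a shortest sequence at the moment it first appears.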

9 months ago  datamash: new "mode" and inactive "antimode" test
Erik Auerswald [Sat, 9 Dec 2023 16:06:53 +0000 (17:06 +0100)]
datamash: new "mode" and inactive "antimode" test

* tests/datamash-tests-2.pl: Add new test for both "mode" and "antimode".
  The "antimode" test currently fails and is commented out.

9 months ago  datamash: four more "mode" and "antimode" tests
Erik Auerswald [Fri, 8 Dec 2023 21:52:56 +0000 (22:52 +0100)]
datamash: four more "mode" and "antimode" tests

* tests/datamash-tests-2.pl: Duplicate regression test for "mode"
  to also test "antimode".  Add normal test with single input value
  to both operations.

9 months ago  datamash: more "mode" and "antimode" tests
Erik Auerswald [Thu, 7 Dec 2023 18:59:45 +0000 (19:59 +0100)]
datamash: more "mode" and "antimode" tests

Kingsley G. Morse Jr. has reported a bug regarding the
"antimode" operation in
<https://lists.gnu.org/archive/html/bug-datamash/2023-12/msg00003.html>.

Both the "mode" and "antimode" operations are implemented
in the function "mode_value()" in the file "src/utils.c".

Currently, there are only a few tests for each of the two
operations.  Add more tests to have working examples.
Also add commented out tests that trigger "antimode" bugs.

* tests/datamash-tests-2.pl: Add 11 tests for "mode", 11
  working tests for "antimode", and 5 commented out tests
  for "antimode" that are currently broken.

22 months ago  Remove obsolete autoconf AC_PROG_CC_STDC macro
Shawn Wagner [Mon, 7 Nov 2022 04:04:30 +0000 (20:04 -0800)]
Remove obsolete autoconf AC_PROG_CC_STDC macro

* configure.ac: Remove AC_PROG_CC_STDC

22 months ago  Remove deprecated gnulib fdl module
Shawn Wagner [Mon, 7 Nov 2022 03:18:35 +0000 (19:18 -0800)]
Remove deprecated gnulib fdl module

* bootstrap.conf: Remove fdl from the module list
* doc/fdl.texi: Explicitly include instead of relying on gnulib to
                generate it.

22 months ago  Remove deprecated gnulib non-recursive-gnulib-prefix-hack module
Shawn Wagner [Mon, 7 Nov 2022 02:55:19 +0000 (18:55 -0800)]
Remove deprecated gnulib non-recursive-gnulib-prefix-hack module

* Makefile.am: Add AUTOMAKE_OPTIONS needed for the new approach
* bootstrap.conf: Remove old module, enable suggested replacement for it.
* m4/.gitignore: Update for removed module

22 months ago  Update bootstrap notes for OpenBSD.
Shawn Wagner [Mon, 7 Nov 2022 02:38:14 +0000 (18:38 -0800)]
Update bootstrap notes for OpenBSD.

* HACKING.md: Updates for OpenBSD 7.2, related typo fixes.

22 months ago  Add gnulib checks for sys/random.h and getrandom().
Shawn Wagner [Mon, 7 Nov 2022 02:31:58 +0000 (18:31 -0800)]
Add gnulib checks for sys/random.h and getrandom().

Needed to compile rand on non-GNU OSes.

 * Makefile.am: Add $(LIB_GETRANDOM) to rand libraries
 * bootstrap.conf: Add getrandom and sys_random modules
 * m4/.gitignore: Updates for the new modules

2 years ago  datamash: more --vnlog tests (legend corner cases)
Erik Auerswald [Sat, 3 Sep 2022 10:24:00 +0000 (12:24 +0200)]
datamash: more --vnlog tests (legend corner cases)

The vnlog format is quite permissive regarding field names.
Field names can comprise non-alphanumeric characters, and
field names may be duplicated.  Add tests to check that GNU
Datamash can handle (at least some) such vnlog input.

The test with a duplicate field name is intended to help avoid
accidental changes to GNU Datamash behavior, not to prescribe
that the first matching field name should be used.

* tests/datamash-vnlog.pl: Add two tests, the first with non-
  alphanumeric characters, the second with a duplicate field
  name.

2 years ago  datamash: --vnlog "legend" parsing fixes
Erik Auerswald [Sat, 27 Aug 2022 12:41:11 +0000 (14:41 +0200)]
datamash: --vnlog "legend" parsing fixes

* src/text-lines.c (line_record_fread): Skip prefix matching
  regex '^\s*#\s*' instead of regex '^[\s#]+'.
* tests/datamash-vnlog.pl: Additional vnlog legend tests.
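The difference between the two prefix patterns can be checked directly. The snippet below is a Python illustration using the `re` module. The legend line is a hypothetical input, and nothing here implies that datamash itself uses a regex engine for this step.

```python
import re

OLD_PREFIX = re.compile(r"^[\s#]+")   # any run of whitespace and '#'
NEW_PREFIX = re.compile(r"^\s*#\s*")  # exactly one '#', optional whitespace

legend = "# #x y"  # hypothetical legend whose first field name is "#x"

print(OLD_PREFIX.sub("", legend))  # x y   (the '#' of the field name is lost)
print(NEW_PREFIX.sub("", legend))  # #x y
```

The old pattern treats every leading '#' as part of the prefix, so a field name that itself starts with '#' is truncated. The new pattern removes only the single '#' that introduces the legend.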

2 years ago  Make -g and groupby take ranges of fields.
Shawn Wagner [Wed, 17 Aug 2022 10:18:46 +0000 (03:18 -0700)]
Make -g and groupby take ranges of fields.

    So that 1-4 works instead of needin

  * src/op-parser.c: Accept ranges for -g and groupby
  * tests/datamash-tests.pl, datamash-error-msgs.pl, datamash-crosstab.pl:
    Some tests for the above.
  * doc/datamash.texi: Document the above.
  * NEWS: Changelog entry.

2 years ago  rand: Add new program
Timothy Rice [Sun, 7 Aug 2022 03:11:15 +0000 (13:11 +1000)]
rand: Add new program

2 years ago  docs: adjust datamash --vnlog texinfo description
Erik Auerswald [Sun, 14 Aug 2022 12:39:06 +0000 (14:39 +0200)]
docs: adjust datamash --vnlog texinfo description

* doc/datamash.texi: Add that both input and output are affected.
  Mark the link to the format description as a URL.

2 years ago  tests: comments and empty lines
Erik Auerswald [Sat, 13 Aug 2022 12:32:30 +0000 (14:32 +0200)]
tests: comments and empty lines

Without any options, GNU Datamash does not support comments,
and does not treat empty lines as different from any other
line.

With -C/--skip-comments, it ignores complete lines starting
with either '#' or ';' as first non-whitespace character.

With --vnlog, it ignores empty lines, complete lines where the
first non-whitespace character is a '#' (followed by either
'!' or '#' in the vnlog prologue), and trailing comments,
i.e., the part of a non-empty, non-comment, and non-header
line started with optional whitespace and a '#' character.

* tests/datamash-check.pl: Add tests with empty and comment
  lines in the input with different options.
* tests/datamash-vnlog.pl: Add trailing comment started with
  whitespace to existing input.  Add test to verify that a
  semicolon (';') does not start a trailing comment.  Add a
  test that the header line does not support comments.
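The -C/--skip-comments rule described above can be modelled in a few lines. This Python sketch is an assumption-laden illustration (the function names are hypothetical); it covers only whole-line comments and deliberately leaves empty lines and trailing comments untouched, as the description states for -C.

```python
def is_comment_line(line: str) -> bool:
    """True if the first non-whitespace character is '#' or ';'."""
    stripped = line.lstrip()
    return stripped[:1] in ("#", ";")


def skip_comments(lines):
    """Drop whole-line comments; keep everything else unchanged."""
    return [line for line in lines if not is_comment_line(line)]


lines = ["# comment", "  ; also a comment", "1 2", "", "3 # not stripped"]
print(skip_comments(lines))
```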

2 years ago  maint: Add vnlog to bash completion
Timothy Rice [Fri, 12 Aug 2022 22:30:08 +0000 (08:30 +1000)]
maint: Add vnlog to bash completion

2 years ago  maint: Add Dima Kogan to AUTHORS/THANKS files
Timothy Rice [Fri, 12 Aug 2022 22:25:51 +0000 (08:25 +1000)]
maint: Add Dima Kogan to AUTHORS/THANKS files

2 years ago  Add vnlog support
Dima Kogan [Mon, 8 Aug 2022 21:15:40 +0000 (07:15 +1000)]
Add vnlog support

Dima Kogan's vnlog data format is explained at
https://github.com/dkogan/vnlog

This support is experimental in GNU Datamash.

2 years ago  maint: Tidy up NEWS item for dotprod
Timothy Rice [Sun, 7 Aug 2022 20:40:21 +0000 (06:40 +1000)]
maint: Tidy up NEWS item for dotprod

2 years ago  datamash: add new option -S/--seed to set random seed
Timothy Rice [Sun, 7 Aug 2022 20:39:16 +0000 (06:39 +1000)]
datamash: add new option -S/--seed to set random seed

2 years ago  tests: datamash -C has no inline comments
Erik Auerswald [Sun, 7 Aug 2022 12:33:39 +0000 (14:33 +0200)]
tests: datamash -C has no inline comments

* tests/datamash-tests-2.pl: Add two tests for errors when
  trying to use inline comments.

2 years ago  maint: wrap two lines to make syntax-check happy
Erik Auerswald [Sun, 7 Aug 2022 10:38:19 +0000 (12:38 +0200)]
maint: wrap two lines to make syntax-check happy

2 years ago  maint: Fix small typo 'data' -> 'date' in NEWS
Timothy Rice [Sat, 6 Aug 2022 20:18:59 +0000 (06:18 +1000)]
maint: Fix small typo 'data' -> 'date' in NEWS

2 years ago  datamash: new operation: dotprod
Timothy Rice [Sat, 6 Aug 2022 02:11:25 +0000 (12:11 +1000)]
datamash: new operation: dotprod

2 years ago  Add NEWS item for getrandom change
Timothy Rice [Sat, 6 Aug 2022 06:20:00 +0000 (16:20 +1000)]
Add NEWS item for getrandom change

2 years ago  Switch to getrandom for seed source
Timothy Rice [Sat, 6 Aug 2022 06:09:49 +0000 (16:09 +1000)]
Switch to getrandom for seed source

2 years ago  Move init_random function into new randutils source
Timothy Rice [Sat, 6 Aug 2022 05:04:56 +0000 (15:04 +1000)]
Move init_random function into new randutils source

2 years ago  Remove sc_indent from syntax checks
Timothy Rice [Sat, 6 Aug 2022 05:49:53 +0000 (15:49 +1000)]
Remove sc_indent from syntax checks

The sc_indent check was added to gnulib in 8f043c6 on 2021-09-03. By default it
forces code to conform to `indent -ppi 1`. This contradicts the list of format
constraints suggested at [1], in particular the GNU preference of having
two spaces for each indent level.

[1] https://www.gnu.org/prep/standards/standards.html#Formatting

2 years ago  datamash, decorate: add -h/-V for --help/--version
Timothy Rice [Sat, 6 Aug 2022 03:23:20 +0000 (13:23 +1000)]
datamash, decorate: add -h/-V for --help/--version

2 years ago  maint: Fix typo 'syntaax'
Timothy Rice [Sat, 6 Aug 2022 00:43:47 +0000 (10:43 +1000)]
maint: Fix typo 'syntaax'

2 years ago  Update gnulib to latest
Timothy Rice [Tue, 2 Aug 2022 21:06:49 +0000 (07:06 +1000)]
Update gnulib to latest

Includes changing or casting a couple of size_t variables/outputs to idx_t.

2 years ago  maint: Ignore vc-diffs
Timothy Rice [Sat, 23 Jul 2022 03:30:31 +0000 (13:30 +1000)]
maint: Ignore vc-diffs

2 years ago  maint: post-release administrivia
Timothy Rice [Sat, 23 Jul 2022 02:01:28 +0000 (12:01 +1000)]
maint: post-release administrivia

Automatically done by `make release RELEASE='1.8 stable'`

* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.

2 years ago  version 1.8  v1.8
Timothy Rice [Sat, 23 Jul 2022 00:54:30 +0000 (10:54 +1000)]
version 1.8

* NEWS: Record release date.

2 years ago  datamash: tests: decimal point as field separator
Erik Auerswald [Sun, 17 Jul 2022 13:44:08 +0000 (15:44 +0200)]
datamash: tests: decimal point as field separator

* tests/datamash-tests-2.pl: Add a commented out test for the
  "getnum" operation with a decimal point as field separator.
  Adjust alignment of two existing commented out tests.

2 years ago  datamash: tests: commented out getnum i18n tests
Erik Auerswald [Sun, 17 Jul 2022 13:35:42 +0000 (15:35 +0200)]
datamash: tests: commented out getnum i18n tests

The "getnum" operation is neither properly locale-aware nor does
it ignore the locale setting.  As a result floating point numbers
are not extracted correctly unless the locale-specific decimal
separator is a period.

* tests/datamash-i18n-de.pl: Add commented out getnum tests.

2 years ago  tests: numbers with two decimal points in input
Erik Auerswald [Sun, 17 Jul 2022 11:39:49 +0000 (13:39 +0200)]
tests: numbers with two decimal points in input

* tests/datamash-error-msgs.pl: Add tests for error message when
  a number with two decimal points is given as input data.

2 years ago  tests: decimal separator as field separator
Erik Auerswald [Sun, 17 Jul 2022 11:19:20 +0000 (13:19 +0200)]
tests: decimal separator as field separator

* tests/datamash-i18n-de.pl: Add tests using the locale's decimal
  separator (a comma) as field separator.
* tests/datamash-tests-2.pl: Add tests using the locale's decimal
  separator (a period) as field separator.

2 years ago  tests: adjust commented out i18n test
Erik Auerswald [Sun, 17 Jul 2022 11:01:01 +0000 (13:01 +0200)]
tests: adjust commented out i18n test

* tests/datamash-i18n-de.pl: Use a unique ID to allow commenting
  in without further adjustments.  Align test definition fields.

2 years ago  maint: regression test consolidation
Erik Auerswald [Sun, 17 Jul 2022 10:36:49 +0000 (12:36 +0200)]
maint: regression test consolidation

* tests/datamash-tests-2.pl: Move regression tests to have them
  next to each other in one group.

2 years ago  datamash: tests: Add test for crosstab with header-in, no header-out
Timothy Rice [Sun, 10 Jul 2022 21:02:00 +0000 (07:02 +1000)]
datamash: tests: Add test for crosstab with header-in, no header-out

2 years ago  datamash: tests: Rename c3 header-in test for pcov consistent with scov
Timothy Rice [Sun, 10 Jul 2022 20:59:21 +0000 (06:59 +1000)]
datamash: tests: Rename c3 header-in test for pcov consistent with scov

2 years ago  datamash: tests: Add more paired-param tests with header in/out
Timothy Rice [Sun, 10 Jul 2022 20:56:41 +0000 (06:56 +1000)]
datamash: tests: Add more paired-param tests with header in/out

2 years ago  docs: Note non-standard --header-out with crosstab
Timothy Rice [Sun, 10 Jul 2022 20:41:20 +0000 (06:41 +1000)]
docs: Note non-standard --header-out with crosstab

2 years ago  docs: Remove sentence about groupby/crosstab not understanding header names.
Timothy Rice [Sun, 10 Jul 2022 20:31:39 +0000 (06:31 +1000)]
docs: Remove sentence about groupby/crosstab not understanding header names.

2 years ago  datamash: maint: Fix some indentation
Timothy Rice [Sat, 9 Jul 2022 04:53:47 +0000 (14:53 +1000)]
datamash: maint: Fix some indentation

2 years ago  datamash: bugfix: Ensure rmdup respects --output-delimiter
Timothy Rice [Sat, 9 Jul 2022 04:49:52 +0000 (14:49 +1000)]
datamash: bugfix: Ensure rmdup respects --output-delimiter

Fixes bug reported by Dima Kogan.

2 years ago  datamash: bugfix: Allow crosstab to be called by header field name.
Timothy Rice [Sat, 9 Jul 2022 03:21:11 +0000 (13:21 +1000)]
datamash: bugfix: Allow crosstab to be called by header field name.

Ensure `--header-in crosstab x,y` does not crash.
Fixes bug reported by Dima Kogan.

2 years ago  datamash: maint: fix long line
Timothy Rice [Sat, 9 Jul 2022 02:49:40 +0000 (12:49 +1000)]
datamash: maint: fix long line

2 years ago  maint: ignore side effects of make syntax-check
Timothy Rice [Sat, 9 Jul 2022 02:53:26 +0000 (12:53 +1000)]
maint: ignore side effects of make syntax-check

2 years ago  maint: Fix minor typo
Timothy Rice [Sat, 9 Jul 2022 02:21:37 +0000 (12:21 +1000)]
maint: Fix minor typo

2 years ago  datamash: maint: Add debug helper macro WHEREAMI()
Timothy Rice [Sat, 9 Jul 2022 01:50:17 +0000 (11:50 +1000)]
datamash: maint: Add debug helper macro WHEREAMI()

2 years ago  tests: Add the test for pcov header in/out
Timothy Rice [Sat, 9 Jul 2022 01:35:46 +0000 (11:35 +1000)]
tests: Add the test for pcov header in/out

2 years ago  datamash: Print all operation columns in output header
Timothy Rice [Sat, 9 Jul 2022 00:51:24 +0000 (10:51 +1000)]
datamash: Print all operation columns in output header

Ensure `--header-out pcov x:y` shows `pcov(x,y)` in header.
Fixes bug reported by Dima Kogan.

2 years ago  maint: Break long lines
Timothy Rice [Mon, 4 Jul 2022 21:32:24 +0000 (07:32 +1000)]
maint: Break long lines

2 years ago  maint: Update version notices.
Timothy Rice [Mon, 4 Jul 2022 21:27:03 +0000 (07:27 +1000)]
maint: Update version notices.

2 years ago  maint: align datamash binning tests
Erik Auerswald [Sun, 3 Jul 2022 16:09:20 +0000 (18:09 +0200)]
maint: align datamash binning tests

* tests/datamash-tests-2.pl: Align binning tests.

2 years ago  tests: more datamash binning tests
Erik Auerswald [Sun, 3 Jul 2022 16:06:28 +0000 (18:06 +0200)]
tests: more datamash binning tests

Floating point numbers as operation parameters can be specified
using scientific notation (e.g., 1.2e3 for 1200.0).

* tests/datamash-tests-2.pl: Add scientific notation bin sizes.

2 years ago  tests: test parser corner cases
Erik Auerswald [Sun, 3 Jul 2022 15:25:33 +0000 (17:25 +0200)]
tests: test parser corner cases

Add tests of corner cases regarding whitespace in the operation
parsing of GNU Datamash in order to avoid introducing unintended
changes of behavior.

* tests/datamash-parser.pl: Add tests with additional whitespace.

2 years ago  tests: add more datamash parser tests
Erik Auerswald [Sun, 3 Jul 2022 10:34:51 +0000 (12:34 +0200)]
tests: add more datamash parser tests

* tests/datamash-parser.pl: Additional testing of correct and
  incorrect use of optional operation parameters.

2 years ago  maint: more unique test identifiers for datamash
Erik Auerswald [Sun, 3 Jul 2022 10:11:09 +0000 (12:11 +0200)]
maint: more unique test identifiers for datamash

Having unique test identifiers helps in locating the problems
with test failures.  It seems as if the intention is to have
unique test identifiers, at least per tested binary.  Fix a
case of duplicated test identifiers in the two test files
datamash-tests.pl and datamash-parser.pl for binning related
test cases.

* tests/datamash-parser.pl: Rename test identifier 'b1' to 'b31'
  and 'b2' to 'b32'.

2 years ago  tests: add third datamash i18n test case
Erik Auerswald [Sat, 25 Jun 2022 18:21:37 +0000 (20:21 +0200)]
tests: add third datamash i18n test case

Depending on tokenizer changes to support comma as decimal
separator, three comma separated fields might trigger a
problem that is avoided with two comma separated fields.

* tests/datamash-i18n-de.pl: Add test case with three fields
  separated with commas.

2 years ago  tests: avoid Perl warning in datamash-i18n-de.pl
Erik Auerswald [Sat, 25 Jun 2022 14:15:30 +0000 (16:15 +0200)]
tests: avoid Perl warning in datamash-i18n-de.pl

The string comparison `$lc_de eq undef` results in up to two
warnings.  If `$lc_de` is defined, the single warning

    Use of uninitialized value in string eq at
     ./tests/datamash-i18n-de.pl line 39.

is emitted.  This is caused by comparing with `undef` in the
string equality test `eq`.

If the locale `de_DE.utf8` is not found, `$lc_de` is undefined.
If `$lc_de` is undefined, two warnings are emitted:

    Use of uninitialized value $lc_de in string eq at
     ./tests/datamash-i18n-de.pl line 39.

    Use of uninitialized value in string eq at
     ./tests/datamash-i18n-de.pl line 39.

Using `defined()` to test if `$lc_de` is defined avoids this.

* tests/datamash-i18n-de.pl: Use defined() to check if a
  variable is defined.

2 years ago  tests: add second datamash i18n test case
Erik Auerswald [Sat, 25 Jun 2022 14:03:40 +0000 (16:03 +0200)]
tests: add second datamash i18n test case

Supporting a comma as decimal separator for numeric arguments
to GNU Datamash operations risks confusing a comma separated
list of fields with a floating point number.

* tests/datamash-i18n-de.pl: Add test case with a comma
  separated list of field numbers.
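The ambiguity is easy to demonstrate. In the following Python sketch, both helper functions are hypothetical and exist only to show that the same argument string has two valid readings once a comma is accepted as decimal separator.

```python
def as_field_list(arg: str):
    """Read the argument as a comma-separated list of field numbers."""
    return [int(part) for part in arg.split(",")]


def as_decimal(arg: str, decimal_sep: str = ","):
    """Read the argument as one number with the given decimal separator."""
    return float(arg.replace(decimal_sep, ".", 1))


arg = "1,2"
print(as_field_list(arg))  # [1, 2]
print(as_decimal(arg))     # 1.2
```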

2 years ago  maint: fix a typo in a comment
Erik Auerswald [Sat, 25 Jun 2022 11:23:21 +0000 (13:23 +0200)]
maint: fix a typo in a comment

2 years ago  maint: Remove incorrect comment about LC_NUMERIC
Timothy Rice [Sat, 25 Jun 2022 03:05:42 +0000 (13:05 +1000)]
maint: Remove incorrect comment about LC_NUMERIC

2 years ago  maint: Skip German test if de_DE.utf8 locale not found
Timothy Rice [Sat, 25 Jun 2022 03:02:07 +0000 (13:02 +1000)]
maint: Skip German test if de_DE.utf8 locale not found

2 years ago  Test decimal separator in de_DE.UTF-8 locale
Timothy Rice [Fri, 24 Jun 2022 20:24:00 +0000 (06:24 +1000)]
Test decimal separator in de_DE.UTF-8 locale

2 years ago  src/decorate.c: Fix a NetBSD-specific seg fault.
Shawn Wagner [Fri, 24 Jun 2022 15:36:20 +0000 (08:36 -0700)]
src/decorate.c: Fix a NetBSD-specific seg fault.

2 years ago  datamash: re-write binning for negative numbers
Erik Auerswald [Thu, 23 Jun 2022 19:38:07 +0000 (21:38 +0200)]
datamash: re-write binning for negative numbers

Make the binning code more explicit regarding handling of
negative binning values:

 - changing a negative zero into a positive zero can only
   be needed for negative values;
 - if the fractional part of a negative value is zero:
   - the number is the lower bound of the bucket interval,
   - and the number could be a negative zero;
 - if the fractional part of a negative value is non-zero:
   - the number falls into the preceding bucket interval.

When testing with both negative and non-negative numbers,
the new code was not slower.  It even seemed to be a tiny
bit faster on average.

* src/field-ops.c (field_op_collect): re-write binning code
  for negative numbers.
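The case analysis above can be sketched as follows. This is a Python model of the described logic, not the C code in field_op_collect: the function name and the use of `math.modf` are assumptions, and the result is the lower bound of the bucket that contains the value.

```python
import math


def bin_lower_bound(value: float, bucket_size: float) -> float:
    """Return the lower bound of the bucket interval containing value."""
    scaled = value / bucket_size
    frac, whole = math.modf(scaled)  # both parts carry the sign of scaled
    if math.copysign(1.0, scaled) < 0:  # negative, including negative zero
        if frac == 0.0:
            # The number is itself a lower bound; normalize a negative zero.
            if whole == 0.0:
                whole = 0.0
        else:
            # The number falls into the preceding bucket interval.
            whole -= 1.0
    return whole * bucket_size


for v in (-4, -3, -0.5, 0, 7):
    print(v, bin_lower_bound(v, 3))
```

Non-negative values skip the extra branch entirely, which matches the observation that only negative values can need the negative-zero adjustment.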

2 years ago  datamash: Fix binning of negative numbers.
Timothy Rice [Wed, 22 Jun 2022 22:02:45 +0000 (08:02 +1000)]
datamash: Fix binning of negative numbers.

2 years ago  Add framework for installing hooks into cloned git repositories.
Shawn Wagner [Wed, 15 Jun 2022 09:39:00 +0000 (02:39 -0700)]
Add framework for installing hooks into cloned git repositories.

Includes a pre-commit hook that runs make syntax-check

2 years ago  maint: Convert sort+header tests from shell to perl
Timothy Rice [Sat, 18 Jun 2022 01:54:57 +0000 (11:54 +1000)]
maint: Convert sort+header tests from shell to perl

2 years ago  maint: fix long lines
Timothy Rice [Sat, 18 Jun 2022 00:03:26 +0000 (10:03 +1000)]
maint: fix long lines

2 years ago  maint: Make test indentation more consistent
Timothy Rice [Fri, 17 Jun 2022 23:56:01 +0000 (09:56 +1000)]
maint: Make test indentation more consistent

2 years ago  maint: fix long lines
Timothy Rice [Fri, 17 Jun 2022 23:35:40 +0000 (09:35 +1000)]
maint: fix long lines

2 years ago  Rename deprecated tests
Timothy Rice [Thu, 16 Jun 2022 20:59:50 +0000 (06:59 +1000)]
Rename deprecated tests

2 years ago  Deprecate -f/--full for non-linewise operations
Timothy Rice [Thu, 16 Jun 2022 19:43:43 +0000 (05:43 +1000)]
Deprecate -f/--full for non-linewise operations

2 years ago  tests: add cheap I/O error test
Erik Auerswald [Sat, 11 Jun 2022 18:28:45 +0000 (20:28 +0200)]
tests: add cheap I/O error test

The existing tests/datamash-io-errors.sh is marked as expensive
and requires two file system images prepared to provoke I/O
errors for GNU Datamash to detect and report.  As a result it
is executed less frequently than most other tests.

The new tests/datamash-io-errors-cheap.sh requires just the
availability of the special file "/dev/full" to immediately
provoke an I/O error on output.  This test requires minimal
input data and is cheap to run.

* Makefile.am (TESTS): Add new test file.
* tests/datamash-io-errors-cheap.sh: New file with one test.
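Why "/dev/full" makes such a test cheap can be shown with a short Python sketch. It is not the shell test added by this commit; the function name is hypothetical, and the sketch assumes a system (such as Linux) where writes to "/dev/full" fail with ENOSPC. It returns None where the device is missing or cannot be opened.

```python
import errno
import os


def write_to_full_device(data: bytes = b"1\n"):
    """Return the errno raised by writing to /dev/full, or None if unavailable."""
    if not os.path.exists("/dev/full"):
        return None
    try:
        fd = os.open("/dev/full", os.O_WRONLY)
    except OSError:
        return None
    try:
        os.write(fd, data)
    except OSError as exc:
        return exc.errno  # the write fails immediately
    finally:
        os.close(fd)
    return 0


result = write_to_full_device()
print("unavailable" if result is None else errno.errorcode.get(result, result))
```

No prepared file system image is needed: a single small write is enough to trigger the error path.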

2 years ago  datamash: fix segmentation fault
Erik Auerswald [Mon, 6 Jun 2022 17:02:22 +0000 (19:02 +0200)]
datamash: fix segmentation fault

As reported by Catalin Patulea on the bug-datamash mailing list in
<https://lists.gnu.org/archive/html/bug-datamash/2020-11/msg00001.html>,
GNU Datamash could crash with a segmentation fault if the
unique or countunique operations were used with input data
containing NUL bytes.

The problem was that the field_op_get_string_ptrs() function
could create more pointers than it allocated memory for if
the input data contained NUL bytes.  The solution is to add a
check to avoid writing past the end of the "ptrs" buffer.

* NEWS: Mention bug fix.
* src/field-ops.c (field_op_get_string_ptrs): Do not write past
  the end of the "ptrs" buffer.
* tests/datamash-tests-2.pl: Add tests to verify bug fix.
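The overflow and its fix can be modelled in Python. The sketch below is an illustration only: `string_offsets` is a hypothetical name, and the storage layout (values stored back to back, each followed by a NUL byte, with `count` values expected) is an assumption based on the description above. Python lists grow on demand, so the unbounded variant merely returns too many entries where the C code would write past the allocated array.

```python
def string_offsets(buf: bytes, count: int, bounded: bool = True):
    """Collect the start offset of each NUL-terminated string in buf."""
    offsets = []
    pos = 0
    while pos < len(buf):
        if bounded and len(offsets) >= count:
            break  # the fix: never produce more than `count` entries
        offsets.append(pos)
        end = buf.find(b"\0", pos)
        if end < 0:
            break
        pos = end + 1
    return offsets


# Two stored values, "a\0x" and "b"; the first contains a NUL byte.
buf = b"a\0x\0b\0"
print(string_offsets(buf, 2, bounded=False))  # three entries for two values
print(string_offsets(buf, 2))                 # two entries
```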

2 years ago  doc: mention ms and rms in --help and man page
Erik Auerswald [Sun, 5 Jun 2022 14:28:05 +0000 (16:28 +0200)]
doc: mention ms and rms in --help and man page

The recent operations ms (mean square) and rms (root mean
square) are listed only in the texinfo manual.  Add them
to both the 'datamash --help' output and the datamash(1)
man page.

* src/datamash.c (usage): Add ms and rms to Statistical
  Grouping operations.
* man/datamash.x: Likewise.

2 years ago  tests: more leading and trailing whitespace tests
Erik Auerswald [Sat, 4 Jun 2022 17:18:24 +0000 (19:18 +0200)]
tests: more leading and trailing whitespace tests

* tests/datamash-tests.pl: Add tests with leading whitespace,
  trailing whitespace, and both leading and trailing whitespace.

2 years ago  tests: leading and trailing whitespace behavior
Erik Auerswald [Sat, 4 Jun 2022 16:49:01 +0000 (18:49 +0200)]
tests: leading and trailing whitespace behavior

* tests/datamash-tests.pl: Add whitespace-only tests.

2 years ago  doc: texinfo manual adjustments
Erik Auerswald [Sat, 4 Jun 2022 16:48:13 +0000 (18:48 +0200)]
doc: texinfo manual adjustments

* doc/datamash.texi: Mention that "cut" operation uses given field
  ordering. Expand on leading and trailing whitespace description.
  Ask for unified instead of context diff.

2 years ago  maint: fix "make syntax-check" errors
Erik Auerswald [Sat, 4 Jun 2022 09:09:00 +0000 (11:09 +0200)]
maint: fix "make syntax-check" errors

* .gitignore: Use "file system" instead of "filesystem".
* tests/datamash-tests-2.pl: Wrap lines longer than 80 characters.

2 years ago  maint: fix typo in a comment
Erik Auerswald [Sat, 4 Jun 2022 08:30:27 +0000 (10:30 +0200)]
maint: fix typo in a comment

2 years ago  Remove commented-out code
Timothy Rice [Sat, 4 Jun 2022 03:15:54 +0000 (13:15 +1000)]
Remove commented-out code

2 years ago  Fix typo
Timothy Rice [Sat, 4 Jun 2022 00:57:26 +0000 (10:57 +1000)]
Fix typo

2 years ago  datamash: Align field_operations columns
Timothy Rice [Sat, 4 Jun 2022 00:40:13 +0000 (10:40 +1000)]
datamash: Align field_operations columns

It was difficult to visually group entries with the columns zig-zagging.

2 years ago  datamash: Alias echo -> cut and unique -> uniq
Timothy Rice [Fri, 3 Jun 2022 23:56:59 +0000 (09:56 +1000)]
datamash: Alias echo -> cut and unique -> uniq

2 years ago  tests: enable valgrind test to pass
Erik Auerswald [Thu, 2 Jun 2022 18:09:20 +0000 (20:09 +0200)]
tests: enable valgrind test to pass

The sub-test "custom-format" of the expensive test
datamash-valgrind.sh would always fail, because
datamash was supposed to sum all input numbers, but
some of those are too big for 80-bit extended floating
point numbers as used on x86.  Thus datamash would
emit an error message and return an exit code of 1.
That was interpreted as a test failure.

* tests/datamash-valgrind.sh: Use different valgrind error
  exit code to distinguish between valgrind detecting a
  memory leak and datamash reporting an error.

2 years ago  maint: Impose '-T small' for fullfs check
Timothy Rice [Wed, 1 Jun 2022 22:05:46 +0000 (08:05 +1000)]
maint: Impose '-T small' for fullfs check

2 years ago  maint: Ignore side-effects of make check-expensive
Timothy Rice [Wed, 1 Jun 2022 22:02:12 +0000 (08:02 +1000)]
maint: Ignore side-effects of make check-expensive

2 years ago  Merge branch 'master' of ssh://git.sv.gnu.org/srv/git/datamash
Shawn Wagner [Tue, 31 May 2022 22:58:23 +0000 (15:58 -0700)]
Merge branch 'master' of ssh://git.sv.gnu.org/srv/git/datamash

2 years ago  Fix memory leaks in decorate
Shawn Wagner [Tue, 31 May 2022 22:56:41 +0000 (15:56 -0700)]
Fix memory leaks in decorate

2 years ago  Fix memory leaks with custom numeric precisions and formats
Shawn Wagner [Tue, 31 May 2022 22:30:16 +0000 (15:30 -0700)]
Fix memory leaks with custom numeric precisions and formats