All of lore.kernel.org
 help / color / mirror / Atom feed
* Test failures with GNU grep 2.23
@ 2016-02-07 16:25 John Keeping
  2016-02-19 11:59 ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: John Keeping @ 2016-02-07 16:25 UTC (permalink / raw)
  To: git

It seems that binary file detection has changed in GNU grep 2.23 as a
result of commit 40ed879 (grep: fix bug with with invalid unibyte
sequence).

This causes a couple of test failures in t8005 and t9200 (the t9200 case
is less obvious so I'm only including t8005 here):

-- >8 --
$ ./t8005-blame-i18n.sh -v -i
[snip]
expecting success: 
        git blame --incremental file | \
                egrep "^(author|summary) " > actual &&
        test_cmp actual expected

--- actual      2016-02-07 16:14:55.372510307 +0000
+++ expected    2016-02-07 16:14:55.359510341 +0000
@@ -1 +1,6 @@
-Binary file (standard input) matches
+author �R�c ���Y
+summary �u���[���̃e�X�g�ł��B
+author �R�c ���Y
+summary �u���[���̃e�X�g�ł��B
+author �R�c ���Y
+summary �u���[���̃e�X�g�ł��B
not ok 2 - blame respects i18n.commitencoding
#
#               git blame --incremental file | \
#                       egrep "^(author|summary) " > actual &&
#               test_cmp actual expected
#
-- 8< --

The following patch fixes the tests for me, but I wonder if "-a" is
supported on all target platforms (it's not in POSIX, which specifies
that the "input files shall be text files") or whether we should do
something more comprehensive to provide sane_{e,f,}grep which guarantee
to treat input as text.

I also tried setting POSIXLY_CORRECT but that doesn't affect the
text/binary decision.

-- >8 --
diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index 847d098..3b6e697 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -36,7 +36,7 @@ EOF
 test_expect_success !MINGW \
 	'blame respects i18n.commitencoding' '
 	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
+		egrep -a "^(author|summary) " > actual &&
 	test_cmp actual expected
 '
 
@@ -53,7 +53,7 @@ test_expect_success !MINGW \
 	'blame respects i18n.logoutputencoding' '
 	git config i18n.logoutputencoding eucJP &&
 	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
+		egrep -a "^(author|summary) " > actual &&
 	test_cmp actual expected
 '
 
@@ -69,7 +69,7 @@ EOF
 test_expect_success !MINGW \
 	'blame respects --encoding=UTF-8' '
 	git blame --incremental --encoding=UTF-8 file | \
-		egrep "^(author|summary) " > actual &&
+		egrep -a "^(author|summary) " > actual &&
 	test_cmp actual expected
 '
 
@@ -85,7 +85,7 @@ EOF
 test_expect_success !MINGW \
 	'blame respects --encoding=none' '
 	git blame --incremental --encoding=none file | \
-		egrep "^(author|summary) " > actual &&
+		egrep -a "^(author|summary) " > actual &&
 	test_cmp actual expected
 '
 
diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
index 5cfb9cf..f05578a 100755
--- a/t/t9200-git-cvsexportcommit.sh
+++ b/t/t9200-git-cvsexportcommit.sh
@@ -35,7 +35,7 @@ exit 1
 
 check_entries () {
 	# $1 == directory, $2 == expected
-	grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
+	grep -a '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
 	if test -z "$2"
 	then
 		>expected

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Test failures with GNU grep 2.23
  2016-02-07 16:25 Test failures with GNU grep 2.23 John Keeping
@ 2016-02-19 11:59 ` Jeff King
  2016-02-19 17:27   ` Eric Sunshine
  2016-02-19 17:38   ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2016-02-19 11:59 UTC (permalink / raw)
  To: John Keeping; +Cc: git

On Sun, Feb 07, 2016 at 04:25:40PM +0000, John Keeping wrote:

> It seems that binary file detection has changed in GNU grep 2.23 as a
> result of commit 40ed879 (grep: fix bug with with invalid unibyte
> sequence).

I read this bug report a while ago when you posted it, but happily
ignored it until today, when my debian unstable system pulled in the new
version of grep. :)

> This causes a couple of test failures in t8005 and t9200 (the t9200 case
> is less obvious so I'm only including t8005 here):
> 
> -- >8 --
> $ ./t8005-blame-i18n.sh -v -i
> [snip]
> expecting success: 
>         git blame --incremental file | \
>                 egrep "^(author|summary) " > actual &&
>         test_cmp actual expected

Just a side note while we are touching these tests:

 - we probably should not pipe, so we check the exist code from
   git-blame

 - we usually flip the test_cmp file order, to show the difference from
   expectation when there is a failure

 - no space after ">" redirection :)

> The following patch fixes the tests for me, but I wonder if "-a" is
> supported on all target platforms (it's not in POSIX, which specifies
> that the "input files shall be text files") or whether we should do
> something more comprehensive to provide sane_{e,f,}grep which guarantee
> to treat input as text.
> 
> I also tried setting POSIXLY_CORRECT but that doesn't affect the
> text/binary decision.

Yeah, I'd worry that "-a" is not portable. OTOH, BSD grep seems to have
it, so between that and GNU, I think most systems are covered. We could
do:

  test_lazy_prereq GREP_A '
	echo foo | grep -a foo
  '

and mark these tests with it. I'd also be happy to skip that step and
just do it if and when somebody actually complains about a system
without it (I wouldn't be surprised if most people on antique systems
end up installing GNU grep anyway).

Another option might be using "sed -ne '/^author/p'" or similar. But
that may very well just be trading one portability problem for another.

I also wondered whether we could get away without grepping at all here.
But the blame output has a bunch of cruft we don't care about; I think
the readability of the tests would suffer if we tried to match the whole
thing in a test_cmp.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Test failures with GNU grep 2.23
  2016-02-19 11:59 ` Jeff King
@ 2016-02-19 17:27   ` Eric Sunshine
  2016-02-19 17:38   ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2016-02-19 17:27 UTC (permalink / raw)
  To: Jeff King; +Cc: John Keeping, Git List

On Fri, Feb 19, 2016 at 6:59 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Feb 07, 2016 at 04:25:40PM +0000, John Keeping wrote:
>> The following patch fixes the tests for me, but I wonder if "-a" is
>> supported on all target platforms (it's not in POSIX, which specifies
>> that the "input files shall be text files") or whether we should do
>> something more comprehensive to provide sane_{e,f,}grep which guarantee
>> to treat input as text.
>>
>> I also tried setting POSIXLY_CORRECT but that doesn't affect the
>> text/binary decision.
>
> Yeah, I'd worry that "-a" is not portable. OTOH, BSD grep seems to have
> it, so between that and GNU, I think most systems are covered.

Mac OS X grep seems to support -a and tests in t8005 still pass with
-a added to the egrep invocations.

> We could
> do:
>
>   test_lazy_prereq GREP_A '
>         echo foo | grep -a foo
>   '
>
> and mark these tests with it. I'd also be happy to skip that step and
> just do it if and when somebody actually complains about a system
> without it (I wouldn't be surprised if most people on antique systems
> end up installing GNU grep anyway).
>
> Another option might be using "sed -ne '/^author/p'" or similar. But
> that may very well just be trading one portability problem for another.
>
> I also wondered whether we could get away without grepping at all here.
> But the blame output has a bunch of cruft we don't care about; I think
> the readability of the tests would suffer if we tried to match the whole
> thing in a test_cmp.
>
> -Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Test failures with GNU grep 2.23
  2016-02-19 11:59 ` Jeff King
  2016-02-19 17:27   ` Eric Sunshine
@ 2016-02-19 17:38   ` Junio C Hamano
  2016-02-19 19:11     ` Jeff King
  2016-02-19 19:23     ` John Keeping
  1 sibling, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2016-02-19 17:38 UTC (permalink / raw)
  To: Jeff King; +Cc: John Keeping, git

Jeff King <peff@peff.net> writes:

> Yeah, I'd worry that "-a" is not portable. OTOH, BSD grep seems to have
> it, so between that and GNU, I think most systems are covered. We could
> do:
>
>   test_lazy_prereq GREP_A '
> 	echo foo | grep -a foo
>   '
>
> and mark these tests with it. I'd also be happy to skip that step and
> just do it if and when somebody actually complains about a system
> without it (I wouldn't be surprised if most people on antique systems
> end up installing GNU grep anyway).
>
> Another option might be using "sed -ne '/^author/p'" or similar. But
> that may very well just be trading one portability problem for another.

Would $PERL help, I wonder?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Test failures with GNU grep 2.23
  2016-02-19 17:38   ` Junio C Hamano
@ 2016-02-19 19:11     ` Jeff King
  2016-02-19 19:23     ` John Keeping
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff King @ 2016-02-19 19:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Keeping, git

On Fri, Feb 19, 2016 at 09:38:17AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Yeah, I'd worry that "-a" is not portable. OTOH, BSD grep seems to have
> > it, so between that and GNU, I think most systems are covered. We could
> > do:
> >
> >   test_lazy_prereq GREP_A '
> > 	echo foo | grep -a foo
> >   '
> >
> > and mark these tests with it. I'd also be happy to skip that step and
> > just do it if and when somebody actually complains about a system
> > without it (I wouldn't be surprised if most people on antique systems
> > end up installing GNU grep anyway).
> >
> > Another option might be using "sed -ne '/^author/p'" or similar. But
> > that may very well just be trading one portability problem for another.
> 
> Would $PERL help, I wonder?

It would, though I think you would need to call `binmode` to make it
reliable. I was hesitant to suggest it, because I seem to recall some
resistance to more perl dependencies in the test suite, but I think we
may be past the point of no return there, anyway.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Test failures with GNU grep 2.23
  2016-02-19 17:38   ` Junio C Hamano
  2016-02-19 19:11     ` Jeff King
@ 2016-02-19 19:23     ` John Keeping
  2016-02-19 19:33       ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: John Keeping @ 2016-02-19 19:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Fri, Feb 19, 2016 at 09:38:17AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > Yeah, I'd worry that "-a" is not portable. OTOH, BSD grep seems to have
> > it, so between that and GNU, I think most systems are covered. We could
> > do:
> >
> >   test_lazy_prereq GREP_A '
> > 	echo foo | grep -a foo
> >   '
> >
> > and mark these tests with it. I'd also be happy to skip that step and
> > just do it if and when somebody actually complains about a system
> > without it (I wouldn't be surprised if most people on antique systems
> > end up installing GNU grep anyway).
> >
> > Another option might be using "sed -ne '/^author/p'" or similar. But
> > that may very well just be trading one portability problem for another.
> 
> Would $PERL help, I wonder?

I suspect that any grep that lacks "-a" also lacks binary file handling
that will break these tests.  I found a Solaris grep that doesn't
support "-a" and it treats these files as text.

>From that perspective, it would be better to have a central place that
deals with figuring out how to get grep to work for us.  Perhaps we need
test_grep to get this right.  We already have test_cmp_bin() as a thin
wrapper around cmp so I don't think this is completely unprecedented.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Test failures with GNU grep 2.23
  2016-02-19 19:23     ` John Keeping
@ 2016-02-19 19:33       ` Jeff King
  2016-02-21 17:32         ` [PATCH 0/2] Fix test " John Keeping
                           ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jeff King @ 2016-02-19 19:33 UTC (permalink / raw)
  To: John Keeping; +Cc: Junio C Hamano, git

On Fri, Feb 19, 2016 at 07:23:11PM +0000, John Keeping wrote:

> I suspect that any grep that lacks "-a" also lacks binary file handling
> that will break these tests.  I found a Solaris grep that doesn't
> support "-a" and it treats these files as text.
> 
> From that perspective, it would be better to have a central place that
> deals with figuring out how to get grep to work for us.  Perhaps we need
> test_grep to get this right.  We already have test_cmp_bin() as a thin
> wrapper around cmp so I don't think this is completely unprecedented.

I think 99% of the time we are using grep for ascii text. As evidenced
by the number of test failures we see with the new grep, it is a small
minority that feed binary gibberish. I'd prefer if "-a" handling didn't
need to pollute anything outside of this narrow range of tests (and as
with my prereq suggestion, I am even find just skipping this narrow
range of tests on platforms with no "-a", though falling back to running
without "-a" is fine if it works).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 0/2] Fix test failures with GNU grep 2.23
  2016-02-19 19:33       ` Jeff King
@ 2016-02-21 17:32         ` John Keeping
  2016-02-21 17:32         ` [PATCH 1/2] t8005: avoid grep on non-ASCII data John Keeping
  2016-02-21 17:32         ` [PATCH 2/2] t9200: " John Keeping
  2 siblings, 0 replies; 27+ messages in thread
From: John Keeping @ 2016-02-21 17:32 UTC (permalink / raw)
  To: git; +Cc: John Keeping, Jeff King, Junio C Hamano, Eric Sunshine

On Fri, Feb 19, 2016 at 02:33:10PM -0500, Jeff King wrote:
> On Fri, Feb 19, 2016 at 07:23:11PM +0000, John Keeping wrote:
> 
> > I suspect that any grep that lacks "-a" also lacks binary file handling
> > that will break these tests.  I found a Solaris grep that doesn't
> > support "-a" and it treats these files as text.
> > 
> > From that perspective, it would be better to have a central place that
> > deals with figuring out how to get grep to work for us.  Perhaps we need
> > test_grep to get this right.  We already have test_cmp_bin() as a thin
> > wrapper around cmp so I don't think this is completely unprecedented.
> 
> I think 99% of the time we are using grep for ascii text. As evidenced
> by the number of test failures we see with the new grep, it is a small
> minority that feed binary gibberish. I'd prefer if "-a" handling didn't
> need to pollute anything outside of this narrow range of tests (and as
> with my prereq suggestion, I am even find just skipping this narrow
> range of tests on platforms with no "-a", though falling back to running
> without "-a" is fine if it works).

I went with using sed in this series because it seems to be the simplest
and most compatible way to extract lines from the input.  We don't need
any special casing to figure out if an implementation needs "-a" or if
it doesn't support that option and all the implementation I tested
support the constructs used here.

John Keeping (2):
  t8005: avoid grep on non-ASCII data
  t9200: avoid grep on non-ASCII data

 t/t8005-blame-i18n.sh          | 16 ++++++++--------
 t/t9200-git-cvsexportcommit.sh |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

-- 
2.7.1.503.g3cfa3ac

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-19 19:33       ` Jeff King
  2016-02-21 17:32         ` [PATCH 0/2] Fix test " John Keeping
@ 2016-02-21 17:32         ` John Keeping
  2016-02-21 21:01           ` Eric Sunshine
  2016-02-21 17:32         ` [PATCH 2/2] t9200: " John Keeping
  2 siblings, 1 reply; 27+ messages in thread
From: John Keeping @ 2016-02-21 17:32 UTC (permalink / raw)
  To: git; +Cc: John Keeping, Jeff King, Junio C Hamano, Eric Sunshine

GNU grep 2.23 detects the input used in this test as binary data so it
does not work for extracting lines from a file.  We could add the "-a"
option to force grep to treat the input as text, but not all
implementations support that.  Instead, use sed to extract the desired
lines since it will always treat its input as text.

While touching these lines, modernize the test style to avoid hiding the
exit status of "git blame" and remove a space following a redirection
operator.

Signed-off-by: John Keeping <john@keeping.me.uk>
---
 t/t8005-blame-i18n.sh | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index 847d098..0a86c72 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -35,8 +35,8 @@ EOF
 
 test_expect_success !MINGW \
 	'blame respects i18n.commitencoding' '
-	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
+	git blame --incremental file >output &&
+	sed -ne "/^\(author\|summary\) /p" output >actual &&
 	test_cmp actual expected
 '
 
@@ -52,8 +52,8 @@ EOF
 test_expect_success !MINGW \
 	'blame respects i18n.logoutputencoding' '
 	git config i18n.logoutputencoding eucJP &&
-	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
+	git blame --incremental file >output &&
+	sed -ne "/^\(author\|summary\) /p" output >actual &&
 	test_cmp actual expected
 '
 
@@ -68,8 +68,8 @@ EOF
 
 test_expect_success !MINGW \
 	'blame respects --encoding=UTF-8' '
-	git blame --incremental --encoding=UTF-8 file | \
-		egrep "^(author|summary) " > actual &&
+	git blame --incremental --encoding=UTF-8 file >output &&
+	sed -ne "/^\(author\|summary\) /p" output >actual &&
 	test_cmp actual expected
 '
 
@@ -84,8 +84,8 @@ EOF
 
 test_expect_success !MINGW \
 	'blame respects --encoding=none' '
-	git blame --incremental --encoding=none file | \
-		egrep "^(author|summary) " > actual &&
+	git blame --incremental --encoding=none file >output &&
+	sed -ne "/^\(author\|summary\) /p" output >actual &&
 	test_cmp actual expected
 '
 
-- 
2.7.1.503.g3cfa3ac

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] t9200: avoid grep on non-ASCII data
  2016-02-19 19:33       ` Jeff King
  2016-02-21 17:32         ` [PATCH 0/2] Fix test " John Keeping
  2016-02-21 17:32         ` [PATCH 1/2] t8005: avoid grep on non-ASCII data John Keeping
@ 2016-02-21 17:32         ` John Keeping
  2016-02-21 21:15           ` Eric Sunshine
  2 siblings, 1 reply; 27+ messages in thread
From: John Keeping @ 2016-02-21 17:32 UTC (permalink / raw)
  To: git; +Cc: John Keeping, Jeff King, Junio C Hamano, Eric Sunshine

GNU grep 2.23 detects the input used in this test as binary data so it
does not work for extracting lines from a file.  We could add the "-a"
option to force grep to treat the input as text, but not all
implementations support that.  Instead, use sed to extract the desired
lines since it will always treat its input as text.

Signed-off-by: John Keeping <john@keeping.me.uk>
---
 t/t9200-git-cvsexportcommit.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
index 812c9cd..0765d52 100755
--- a/t/t9200-git-cvsexportcommit.sh
+++ b/t/t9200-git-cvsexportcommit.sh
@@ -35,7 +35,7 @@ exit 1
 
 check_entries () {
 	# $1 == directory, $2 == expected
-	grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
+	sed -ne '\!^/!p' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
 	if test -z "$2"
 	then
 		>expected
-- 
2.7.1.503.g3cfa3ac

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 17:32         ` [PATCH 1/2] t8005: avoid grep on non-ASCII data John Keeping
@ 2016-02-21 21:01           ` Eric Sunshine
  2016-02-21 23:19             ` Jeff King
  2016-02-21 23:31             ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Sunshine @ 2016-02-21 21:01 UTC (permalink / raw)
  To: John Keeping; +Cc: Git List, Jeff King, Junio C Hamano

On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
> GNU grep 2.23 detects the input used in this test as binary data so it
> does not work for extracting lines from a file.  We could add the "-a"
> option to force grep to treat the input as text, but not all
> implementations support that.  Instead, use sed to extract the desired
> lines since it will always treat its input as text.
>
> While touching these lines, modernize the test style to avoid hiding the
> exit status of "git blame" and remove a space following a redirection
> operator.
>
> Signed-off-by: John Keeping <john@keeping.me.uk>
> ---
> diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
> @@ -35,8 +35,8 @@ EOF
>  test_expect_success !MINGW \
>         'blame respects i18n.commitencoding' '
> -       git blame --incremental file | \
> -               egrep "^(author|summary) " > actual &&
> +       git blame --incremental file >output &&
> +       sed -ne "/^\(author\|summary\) /p" output >actual &&

These tests all crash and burn with BSD sed (including Mac OS X) since
you're not restricting yourself to BRE (basic regular expressions).
You _could_ request extended regular expressions, which do work on
those platforms, as well as with GNU sed:

    sed -nEe "/^(author|summary) /p" ...

>         test_cmp actual expected
>  '
>
> @@ -52,8 +52,8 @@ EOF
>  test_expect_success !MINGW \
>         'blame respects i18n.logoutputencoding' '
>         git config i18n.logoutputencoding eucJP &&
> -       git blame --incremental file | \
> -               egrep "^(author|summary) " > actual &&
> +       git blame --incremental file >output &&
> +       sed -ne "/^\(author\|summary\) /p" output >actual &&
>         test_cmp actual expected
>  '
>
> @@ -68,8 +68,8 @@ EOF
>
>  test_expect_success !MINGW \
>         'blame respects --encoding=UTF-8' '
> -       git blame --incremental --encoding=UTF-8 file | \
> -               egrep "^(author|summary) " > actual &&
> +       git blame --incremental --encoding=UTF-8 file >output &&
> +       sed -ne "/^\(author\|summary\) /p" output >actual &&
>         test_cmp actual expected
>  '
>
> @@ -84,8 +84,8 @@ EOF
>
>  test_expect_success !MINGW \
>         'blame respects --encoding=none' '
> -       git blame --incremental --encoding=none file | \
> -               egrep "^(author|summary) " > actual &&
> +       git blame --incremental --encoding=none file >output &&
> +       sed -ne "/^\(author\|summary\) /p" output >actual &&
>         test_cmp actual expected
>  '

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] t9200: avoid grep on non-ASCII data
  2016-02-21 17:32         ` [PATCH 2/2] t9200: " John Keeping
@ 2016-02-21 21:15           ` Eric Sunshine
  2016-02-21 23:43             ` John Keeping
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2016-02-21 21:15 UTC (permalink / raw)
  To: John Keeping; +Cc: Git List, Jeff King, Junio C Hamano

On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
> GNU grep 2.23 detects the input used in this test as binary data so it
> does not work for extracting lines from a file.  We could add the "-a"
> option to force grep to treat the input as text, but not all
> implementations support that.  Instead, use sed to extract the desired
> lines since it will always treat its input as text.
>
> Signed-off-by: John Keeping <john@keeping.me.uk>
> ---
> diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
> @@ -35,7 +35,7 @@ exit 1
>  check_entries () {
>         # $1 == directory, $2 == expected
> -       grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
> +       sed -ne '\!^/!p' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual

This works with BSD sed, but double negatives are confusing. Have you
considered this instead?

    sed -ne '/^\//p' ...

>         if test -z "$2"
>         then
>                 >expected

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 21:01           ` Eric Sunshine
@ 2016-02-21 23:19             ` Jeff King
  2016-02-21 23:31               ` Eric Sunshine
  2016-02-21 23:41               ` John Keeping
  2016-02-21 23:31             ` Junio C Hamano
  1 sibling, 2 replies; 27+ messages in thread
From: Jeff King @ 2016-02-21 23:19 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: John Keeping, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 04:01:27PM -0500, Eric Sunshine wrote:

> On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
> > GNU grep 2.23 detects the input used in this test as binary data so it
> > does not work for extracting lines from a file.  We could add the "-a"
> > option to force grep to treat the input as text, but not all
> > implementations support that.  Instead, use sed to extract the desired
> > lines since it will always treat its input as text.
> >
> > While touching these lines, modernize the test style to avoid hiding the
> > exit status of "git blame" and remove a space following a redirection
> > operator.
> >
> > Signed-off-by: John Keeping <john@keeping.me.uk>
> > ---
> > diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
> > @@ -35,8 +35,8 @@ EOF
> >  test_expect_success !MINGW \
> >         'blame respects i18n.commitencoding' '
> > -       git blame --incremental file | \
> > -               egrep "^(author|summary) " > actual &&
> > +       git blame --incremental file >output &&
> > +       sed -ne "/^\(author\|summary\) /p" output >actual &&
> 
> These tests all crash and burn with BSD sed (including Mac OS X) since
> you're not restricting yourself to BRE (basic regular expressions).
> You _could_ request extended regular expressions, which do work on
> those platforms, as well as with GNU sed:
> 
>     sed -nEe "/^(author|summary) /p" ...

At that point, I think we may as well use grep, because obscure
platforms are probably broken either way.

I'm tempted to just go the perl route. We already depend on at least a
baisc version of perl5 being installed for many of the other tests, so
it's not really introducing a new dependency.

Something like the patch below works for me. I think we could make it
shorter by using $PERLIO to get the raw behavior, but using binmode will
work even on ancient versions of perl.

John, if you agree on the direction, feel free to combine it with your
patch.

diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index 847d098..f7a02d8 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -33,10 +33,20 @@ author $SJIS_NAME
 summary $SJIS_MSG
 EOF
 
+filter_blame () {
+	perl -e '
+		binmode STDIN;
+		binmode STDOUT;
+		while (<>) {
+			print if /^(author|summary) /;
+		}
+	'
+}
+
 test_expect_success !MINGW \
 	'blame respects i18n.commitencoding' '
 	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
+		filter_blame >actual &&
 	test_cmp actual expected
 '
 
@@ -53,7 +63,7 @@ test_expect_success !MINGW \
 	'blame respects i18n.logoutputencoding' '
 	git config i18n.logoutputencoding eucJP &&
 	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
+		filter_blame > actual &&
 	test_cmp actual expected
 '
 
@@ -69,7 +79,7 @@ EOF
 test_expect_success !MINGW \
 	'blame respects --encoding=UTF-8' '
 	git blame --incremental --encoding=UTF-8 file | \
-		egrep "^(author|summary) " > actual &&
+		filter_blame >actual &&
 	test_cmp actual expected
 '
 
@@ -85,7 +95,7 @@ EOF
 test_expect_success !MINGW \
 	'blame respects --encoding=none' '
 	git blame --incremental --encoding=none file | \
-		egrep "^(author|summary) " > actual &&
+		filter_blame >actual &&
 	test_cmp actual expected
 '
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:19             ` Jeff King
@ 2016-02-21 23:31               ` Eric Sunshine
  2016-02-21 23:35                 ` Jeff King
  2016-02-21 23:41               ` John Keeping
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2016-02-21 23:31 UTC (permalink / raw)
  To: Jeff King; +Cc: John Keeping, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 6:19 PM, Jeff King <peff@peff.net> wrote:
> On Sun, Feb 21, 2016 at 04:01:27PM -0500, Eric Sunshine wrote:
>> On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
>> > -       git blame --incremental file | \
>> > -               egrep "^(author|summary) " > actual &&
>> > +       git blame --incremental file >output &&
>> > +       sed -ne "/^\(author\|summary\) /p" output >actual &&
>>
>> These tests all crash and burn with BSD sed (including Mac OS X) since
>> you're not restricting yourself to BRE (basic regular expressions).
>> You _could_ request extended regular expressions, which do work on
>> those platforms, as well as with GNU sed:
>>
>>     sed -nEe "/^(author|summary) /p" ...
>
> At that point, I think we may as well use grep, because obscure
> platforms are probably broken either way.

I came to the same conclusion but forgot to say so at the end of my message.

> I'm tempted to just go the perl route. We already depend on at least a
> baisc version of perl5 being installed for many of the other tests, so
> it's not really introducing a new dependency.
>
> Something like the patch below works for me. I think we could make it
> shorter by using $PERLIO to get the raw behavior, but using binmode will
> work even on ancient versions of perl.
>
> +filter_blame () {
> +       perl -e '
> +               binmode STDIN;
> +               binmode STDOUT;

I was worried about binmode() due to some vague recollection from
years and years ago of it being problematic on Windows, but I see
these tests are all protected by !MINGW anyhow...

> +               while (<>) {
> +                       print if /^(author|summary) /;
> +               }
> +       '
> +}
> +
>  test_expect_success !MINGW \
>         'blame respects i18n.commitencoding' '
>         git blame --incremental file | \
> -               egrep "^(author|summary) " > actual &&
> +               filter_blame >actual &&
>         test_cmp actual expected
>  '
>
> @@ -53,7 +63,7 @@ test_expect_success !MINGW \
>         'blame respects i18n.logoutputencoding' '
>         git config i18n.logoutputencoding eucJP &&
>         git blame --incremental file | \
> -               egrep "^(author|summary) " > actual &&
> +               filter_blame > actual &&
>         test_cmp actual expected
>  '
>
> @@ -69,7 +79,7 @@ EOF
>  test_expect_success !MINGW \
>         'blame respects --encoding=UTF-8' '
>         git blame --incremental --encoding=UTF-8 file | \
> -               egrep "^(author|summary) " > actual &&
> +               filter_blame >actual &&
>         test_cmp actual expected
>  '
>
> @@ -85,7 +95,7 @@ EOF
>  test_expect_success !MINGW \
>         'blame respects --encoding=none' '
>         git blame --incremental --encoding=none file | \
> -               egrep "^(author|summary) " > actual &&
> +               filter_blame >actual &&
>         test_cmp actual expected
>  '

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 21:01           ` Eric Sunshine
  2016-02-21 23:19             ` Jeff King
@ 2016-02-21 23:31             ` Junio C Hamano
  2016-02-21 23:40               ` Eric Sunshine
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2016-02-21 23:31 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: John Keeping, Git List, Jeff King

Eric Sunshine <sunshine@sunshineco.com> writes:

> These tests all crash and burn with BSD sed (including Mac OS X) since
> you're not restricting yourself to BRE (basic regular expressions).
> You _could_ request extended regular expressions, which do work on
> those platforms, as well as with GNU sed:
>
>     sed -nEe "/^(author|summary) /p" ...

An obvious way to avoid any RE is to write it as two separate
statements.  As there are repeated invocations of this filtering
in this script, perhaps a helper function can hide this ugliness?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:31               ` Eric Sunshine
@ 2016-02-21 23:35                 ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2016-02-21 23:35 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: John Keeping, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 06:31:08PM -0500, Eric Sunshine wrote:

> > Something like the patch below works for me. I think we could make it
> > shorter by using $PERLIO to get the raw behavior, but using binmode will
> > work even on ancient versions of perl.
> >
> > +filter_blame () {
> > +       perl -e '
> > +               binmode STDIN;
> > +               binmode STDOUT;
> 
> I was worried about binmode() due to some vague recollection from
> years and years ago of it being problematic on Windows, but I see
> these tests are all protected by !MINGW anyhow...

Thanks for mentioning that. I meant to put a note on that at the end of
_my_ message, but forgot. :)

It does mean we won't do CRLF processing. We could get around that with
some explicit `chomp`-ing, I think. Or just leave it as-is and assume
these will lose the !MINGW prereq.

I see Junio just mentioned elsewhere that we can simply avoid the
extended regular expressions by using two sed commands. That would be
fine with me, too.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:31             ` Junio C Hamano
@ 2016-02-21 23:40               ` Eric Sunshine
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2016-02-21 23:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Keeping, Git List, Jeff King

On Sun, Feb 21, 2016 at 6:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> These tests all crash and burn with BSD sed (including Mac OS X) since
>> you're not restricting yourself to BRE (basic regular expressions).
>> You _could_ request extended regular expressions, which do work on
>> those platforms, as well as with GNU sed:
>>
>>     sed -nEe "/^(author|summary) /p" ...
>
> An obvious way to avoid any RE is to write it as two separate
> statements.

Yes, that's even better; now I feel stupid for not thinking of it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:19             ` Jeff King
  2016-02-21 23:31               ` Eric Sunshine
@ 2016-02-21 23:41               ` John Keeping
  2016-02-21 23:50                 ` Eric Sunshine
                                   ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: John Keeping @ 2016-02-21 23:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 06:19:14PM -0500, Jeff King wrote:
> On Sun, Feb 21, 2016 at 04:01:27PM -0500, Eric Sunshine wrote:
> 
> > On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
> > > GNU grep 2.23 detects the input used in this test as binary data so it
> > > does not work for extracting lines from a file.  We could add the "-a"
> > > option to force grep to treat the input as text, but not all
> > > implementations support that.  Instead, use sed to extract the desired
> > > lines since it will always treat its input as text.
> > >
> > > While touching these lines, modernize the test style to avoid hiding the
> > > exit status of "git blame" and remove a space following a redirection
> > > operator.
> > >
> > > Signed-off-by: John Keeping <john@keeping.me.uk>
> > > ---
> > > diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
> > > @@ -35,8 +35,8 @@ EOF
> > >  test_expect_success !MINGW \
> > >         'blame respects i18n.commitencoding' '
> > > -       git blame --incremental file | \
> > > -               egrep "^(author|summary) " > actual &&
> > > +       git blame --incremental file >output &&
> > > +       sed -ne "/^\(author\|summary\) /p" output >actual &&
> > 
> > These tests all crash and burn with BSD sed (including Mac OS X) since
> > you're not restricting yourself to BRE (basic regular expressions).
> > You _could_ request extended regular expressions, which do work on
> > those platforms, as well as with GNU sed:
> > 
> >     sed -nEe "/^(author|summary) /p" ...
> 
> At that point, I think we may as well use grep, because obscure
> platforms are probably broken either way.

Also GNU sed doesn't understand "-E", it uses "-r" for --regexp-extended.

> I'm tempted to just go the perl route. We already depend on at least a
> baisc version of perl5 being installed for many of the other tests, so
> it's not really introducing a new dependency.
> 
> Something like the patch below works for me. I think we could make it
> shorter by using $PERLIO to get the raw behavior, but using binmode will
> work even on ancient versions of perl.
> 
> John, if you agree on the direction, feel free to combine it with your
> patch.

My original sed version was:

	sed -ne "/^author /p" -e "/^summary /p"

which I think will work on all platforms (we already use it in
t0000-basic.sh) but then I decided to be too clever :-(

I still think sed is simpler than introducing a new function to wrap a
perl script.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] t9200: avoid grep on non-ASCII data
  2016-02-21 21:15           ` Eric Sunshine
@ 2016-02-21 23:43             ` John Keeping
  2016-02-22  0:04               ` Eric Sunshine
  2016-02-22 22:25               ` Jeff King
  0 siblings, 2 replies; 27+ messages in thread
From: John Keeping @ 2016-02-21 23:43 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Jeff King, Junio C Hamano

On Sun, Feb 21, 2016 at 04:15:31PM -0500, Eric Sunshine wrote:
> On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
> > GNU grep 2.23 detects the input used in this test as binary data so it
> > does not work for extracting lines from a file.  We could add the "-a"
> > option to force grep to treat the input as text, but not all
> > implementations support that.  Instead, use sed to extract the desired
> > lines since it will always treat its input as text.
> >
> > Signed-off-by: John Keeping <john@keeping.me.uk>
> > ---
> > diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
> > @@ -35,7 +35,7 @@ exit 1
> >  check_entries () {
> >         # $1 == directory, $2 == expected
> > -       grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
> > +       sed -ne '\!^/!p' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
> 
> This works with BSD sed, but double negatives are confusing. Have you
> considered this instead?
> 
>     sed -ne '/^\//p' ...

What do you mean double negatives?  Do you mean using "!" as an
alternative delimiter?  I find changing delimters is normally simpler
than following multiple levels of quoting for escaping slashes, although
in this case it's simple enough that it doesn't make much difference.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:41               ` John Keeping
@ 2016-02-21 23:50                 ` Eric Sunshine
  2016-02-22 22:18                 ` Jeff King
  2016-02-23 23:01                 ` Junio C Hamano
  2 siblings, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2016-02-21 23:50 UTC (permalink / raw)
  To: John Keeping; +Cc: Jeff King, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 6:41 PM, John Keeping <john@keeping.me.uk> wrote:
> On Sun, Feb 21, 2016 at 06:19:14PM -0500, Jeff King wrote:
>> On Sun, Feb 21, 2016 at 04:01:27PM -0500, Eric Sunshine wrote:
>> > These tests all crash and burn with BSD sed (including Mac OS X) since
>> > you're not restricting yourself to BRE (basic regular expressions).
>> > You _could_ request extended regular expressions, which do work on
>> > those platforms, as well as with GNU sed:
>> >
>> >     sed -nEe "/^(author|summary) /p" ...
>>
>> At that point, I think we may as well use grep, because obscure
>> platforms are probably broken either way.
>
> Also GNU sed doesn't understand "-E", it uses "-r" for --regexp-extended.

It actually does recognize -E in all the versions I've tested,
however, apparently it's undocumented (thus probably should be
avoided).

> My original sed version was:
>
>         sed -ne "/^author /p" -e "/^summary /p"
>
> which I think will work on all platforms (we already use it in
> t0000-basic.sh) but then I decided to be too clever :-(

The unclever version seems fine.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] t9200: avoid grep on non-ASCII data
  2016-02-21 23:43             ` John Keeping
@ 2016-02-22  0:04               ` Eric Sunshine
  2016-02-22 22:25               ` Jeff King
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2016-02-22  0:04 UTC (permalink / raw)
  To: John Keeping; +Cc: Git List, Jeff King, Junio C Hamano

On Sun, Feb 21, 2016 at 6:43 PM, John Keeping <john@keeping.me.uk> wrote:
> On Sun, Feb 21, 2016 at 04:15:31PM -0500, Eric Sunshine wrote:
>> On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
>> > GNU grep 2.23 detects the input used in this test as binary data so it
>> > does not work for extracting lines from a file.  We could add the "-a"
>> > option to force grep to treat the input as text, but not all
>> > implementations support that.  Instead, use sed to extract the desired
>> > lines since it will always treat its input as text.
>> >
>> > Signed-off-by: John Keeping <john@keeping.me.uk>
>> > ---
>> > diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
>> > @@ -35,7 +35,7 @@ exit 1
>> >  check_entries () {
>> >         # $1 == directory, $2 == expected
>> > -       grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
>> > +       sed -ne '\!^/!p' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
>>
>> This works with BSD sed, but double negatives are confusing. Have you
>> considered this instead?
>>
>>     sed -ne '/^\//p' ...
>
> What do you mean double negatives?  Do you mean using "!" as an
> alternative delimiter?  I find changing delimters is normally simpler
> than following multiple levels of quoting for escaping slashes, although
> in this case it's simple enough that it doesn't make much difference.

Nice, I learned something new today. If I recall correctly, historic
sed did not allow the delimiter to be changed (or it wasn't documented
or I simply forgot about the capability). So, feel free to ignore me.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:41               ` John Keeping
  2016-02-21 23:50                 ` Eric Sunshine
@ 2016-02-22 22:18                 ` Jeff King
  2016-02-22 22:25                   ` Junio C Hamano
  2016-02-23 23:01                 ` Junio C Hamano
  2 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2016-02-22 22:18 UTC (permalink / raw)
  To: John Keeping; +Cc: Eric Sunshine, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 11:41:35PM +0000, John Keeping wrote:

> My original sed version was:
> 
> 	sed -ne "/^author /p" -e "/^summary /p"
> 
> which I think will work on all platforms (we already use it in
> t0000-basic.sh) but then I decided to be too clever :-(
> 
> I still think sed is simpler than introducing a new function to wrap a
> perl script.

Yeah, I think that is good (personally I'd use a function anyway, but I
think it is short enough that we could go either way).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] t9200: avoid grep on non-ASCII data
  2016-02-21 23:43             ` John Keeping
  2016-02-22  0:04               ` Eric Sunshine
@ 2016-02-22 22:25               ` Jeff King
  2016-02-23 22:55                 ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2016-02-22 22:25 UTC (permalink / raw)
  To: John Keeping; +Cc: Eric Sunshine, Git List, Junio C Hamano

On Sun, Feb 21, 2016 at 11:43:45PM +0000, John Keeping wrote:

> On Sun, Feb 21, 2016 at 04:15:31PM -0500, Eric Sunshine wrote:
> > On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
> > > GNU grep 2.23 detects the input used in this test as binary data so it
> > > does not work for extracting lines from a file.  We could add the "-a"
> > > option to force grep to treat the input as text, but not all
> > > implementations support that.  Instead, use sed to extract the desired
> > > lines since it will always treat its input as text.
> > >
> > > Signed-off-by: John Keeping <john@keeping.me.uk>
> > > ---
> > > diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
> > > @@ -35,7 +35,7 @@ exit 1
> > >  check_entries () {
> > >         # $1 == directory, $2 == expected
> > > -       grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
> > > +       sed -ne '\!^/!p' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
> > 
> > This works with BSD sed, but double negatives are confusing. Have you
> > considered this instead?
> > 
> >     sed -ne '/^\//p' ...
> 
> What do you mean double negatives?  Do you mean using "!" as an
> alternative delimiter?  I find changing delimters is normally simpler
> than following multiple levels of quoting for escaping slashes, although
> in this case it's simple enough that it doesn't make much difference.

I agree that changing delimiters is much nicer than backslashes. But I
wonder if using "!" is more confusing than it needs to be, given its
other meanings.

I dunno. I admit that the backslash threw me off, too (since it needs
escaped in interactive shells, I first assumed that's what was going
on). Using backslash to select the delimiter was new to me. I've usually
seen:

  s!/foo/!/bar/!

which is arguably a little more clear. Too bad we cannot do:

  m!/foo!

which I think reads better. Oh well. Maybe:

  sed -ne '\#^/#p'

would be more readable, but I'm just bikeshedding at this point.  The
grep invocation really was the most clear. :-/

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-22 22:18                 ` Jeff King
@ 2016-02-22 22:25                   ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2016-02-22 22:25 UTC (permalink / raw)
  To: Jeff King; +Cc: John Keeping, Eric Sunshine, Git List

Jeff King <peff@peff.net> writes:

> On Sun, Feb 21, 2016 at 11:41:35PM +0000, John Keeping wrote:
>
>> My original sed version was:
>> 
>> 	sed -ne "/^author /p" -e "/^summary /p"
>> 
>> which I think will work on all platforms (we already use it in
>> t0000-basic.sh) but then I decided to be too clever :-(
>> 
>> I still think sed is simpler than introducing a new function to wrap a
>> perl script.
>
> Yeah, I think that is good (personally I'd use a function anyway, but I
> think it is short enough that we could go either way).

Agreed, and because there are repeated invocation of the same sed
script in this file, it would be sensible to hide it behind a helper
function.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] t9200: avoid grep on non-ASCII data
  2016-02-22 22:25               ` Jeff King
@ 2016-02-23 22:55                 ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2016-02-23 22:55 UTC (permalink / raw)
  To: Jeff King; +Cc: John Keeping, Eric Sunshine, Git List

Jeff King <peff@peff.net> writes:

> On Sun, Feb 21, 2016 at 11:43:45PM +0000, John Keeping wrote:
>
>> On Sun, Feb 21, 2016 at 04:15:31PM -0500, Eric Sunshine wrote:
>> > On Sun, Feb 21, 2016 at 12:32 PM, John Keeping <john@keeping.me.uk> wrote:
>> > > GNU grep 2.23 detects the input used in this test as binary data so it
>> > > does not work for extracting lines from a file.  We could add the "-a"
>> > > option to force grep to treat the input as text, but not all
>> > > implementations support that.  Instead, use sed to extract the desired
>> > > lines since it will always treat its input as text.
>> > >
>> > > Signed-off-by: John Keeping <john@keeping.me.uk>
>> > > ---
>> > > diff --git a/t/t9200-git-cvsexportcommit.sh b/t/t9200-git-cvsexportcommit.sh
>> > > @@ -35,7 +35,7 @@ exit 1
>> > >  check_entries () {
>> > >         # $1 == directory, $2 == expected
>> > > -       grep '^/' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
>> > > +       sed -ne '\!^/!p' "$1/CVS/Entries" | sort | cut -d/ -f2,3,5 >actual
>> > 
>> > This works with BSD sed, but double negatives are confusing. Have you
>> > considered this instead?
>> > 
>> >     sed -ne '/^\//p' ...
>> 
>> What do you mean double negatives?  Do you mean using "!" as an
>> alternative delimiter?  I find changing delimters is normally simpler
>> than following multiple levels of quoting for escaping slashes, although
>> in this case it's simple enough that it doesn't make much difference.
>
> I agree that changing delimiters is much nicer than backslashes. But I
> wonder if using "!" is more confusing than it needs to be, given its
> other meanings.
>
> I dunno. I admit that the backslash threw me off, too (since it needs
> escaped in interactive shells, I first assumed that's what was going
> on). Using backslash to select the delimiter was new to me. I've usually
> seen:
>
>   s!/foo/!/bar/!
>
> which is arguably a little more clear. Too bad we cannot do:
>
>   m!/foo!
>
> which I think reads better. Oh well. Maybe:
>
>   sed -ne '\#^/#p'
>
> would be more readable, but I'm just bikeshedding at this point.  The
> grep invocation really was the most clear. :-/

Eric's '/^\//' was the most straight-forward and easiest to see what
is going on, I would think.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-21 23:41               ` John Keeping
  2016-02-21 23:50                 ` Eric Sunshine
  2016-02-22 22:18                 ` Jeff King
@ 2016-02-23 23:01                 ` Junio C Hamano
  2016-02-24 10:24                   ` John Keeping
  2 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2016-02-23 23:01 UTC (permalink / raw)
  To: John Keeping; +Cc: Jeff King, Eric Sunshine, Git List

John Keeping <john@keeping.me.uk> writes:

> My original sed version was:
>
> 	sed -ne "/^author /p" -e "/^summary /p"
>
> which I think will work on all platforms (we already use it in
> t0000-basic.sh) but then I decided to be too clever :-(
>
> I still think sed is simpler than introducing a new function to wrap a
> perl script.

Let's do this, before everybody forgets what we discussed.

-- >8 --
From: John Keeping <john@keeping.me.uk>
Date: Sun, 21 Feb 2016 17:32:21 +0000
Subject: [PATCH] t8005: avoid grep on non-ASCII data

GNU grep 2.23 detects the input used in this test as binary data so it
does not work for extracting lines from a file.  We could add the "-a"
option to force grep to treat the input as text, but not all
implementations support that.  Instead, use sed to extract the desired
lines since it will always treat its input as text.

While touching these lines, modernize the test style to avoid hiding the
exit status of "git blame" and remove a space following a redirection
operator.  Also swap the order of the expected and actual output
files given to test_cmp; we compare expect and actual to show how
actual output differs from what is expected.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t8005-blame-i18n.sh | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index 847d098..75da219 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -33,11 +33,15 @@ author $SJIS_NAME
 summary $SJIS_MSG
 EOF
 
+filter_author_summary () {
+	sed -n -e '/^author /p' -e '/^summary /p' "$@"
+}
+
 test_expect_success !MINGW \
 	'blame respects i18n.commitencoding' '
-	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
-	test_cmp actual expected
+	git blame --incremental file >output &&
+	filter_author_summary output >actual &&
+	test_cmp expected actual
 '
 
 cat >expected <<EOF
@@ -52,9 +56,9 @@ EOF
 test_expect_success !MINGW \
 	'blame respects i18n.logoutputencoding' '
 	git config i18n.logoutputencoding eucJP &&
-	git blame --incremental file | \
-		egrep "^(author|summary) " > actual &&
-	test_cmp actual expected
+	git blame --incremental file >output &&
+	filter_author_summary output >actual &&
+	test_cmp expected actual
 '
 
 cat >expected <<EOF
@@ -68,9 +72,9 @@ EOF
 
 test_expect_success !MINGW \
 	'blame respects --encoding=UTF-8' '
-	git blame --incremental --encoding=UTF-8 file | \
-		egrep "^(author|summary) " > actual &&
-	test_cmp actual expected
+	git blame --incremental --encoding=UTF-8 file >output &&
+	filter_author_summary output >actual &&
+	test_cmp expected actual
 '
 
 cat >expected <<EOF
@@ -84,9 +88,9 @@ EOF
 
 test_expect_success !MINGW \
 	'blame respects --encoding=none' '
-	git blame --incremental --encoding=none file | \
-		egrep "^(author|summary) " > actual &&
-	test_cmp actual expected
+	git blame --incremental --encoding=none file >output &&
+	filter_author_summary output >actual &&
+	test_cmp expected actual
 '
 
 test_done
-- 
2.7.2-532-g79873b4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] t8005: avoid grep on non-ASCII data
  2016-02-23 23:01                 ` Junio C Hamano
@ 2016-02-24 10:24                   ` John Keeping
  0 siblings, 0 replies; 27+ messages in thread
From: John Keeping @ 2016-02-24 10:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Eric Sunshine, Git List

On Tue, Feb 23, 2016 at 03:01:43PM -0800, Junio C Hamano wrote:
> John Keeping <john@keeping.me.uk> writes:
> 
> > My original sed version was:
> >
> > 	sed -ne "/^author /p" -e "/^summary /p"
> >
> > which I think will work on all platforms (we already use it in
> > t0000-basic.sh) but then I decided to be too clever :-(
> >
> > I still think sed is simpler than introducing a new function to wrap a
> > perl script.
> 
> Let's do this, before everybody forgets what we discussed.

Thanks, this looks good to me.

> -- >8 --
> From: John Keeping <john@keeping.me.uk>
> Date: Sun, 21 Feb 2016 17:32:21 +0000
> Subject: [PATCH] t8005: avoid grep on non-ASCII data
> 
> GNU grep 2.23 detects the input used in this test as binary data so it
> does not work for extracting lines from a file.  We could add the "-a"
> option to force grep to treat the input as text, but not all
> implementations support that.  Instead, use sed to extract the desired
> lines since it will always treat its input as text.
> 
> While touching these lines, modernize the test style to avoid hiding the
> exit status of "git blame" and remove a space following a redirection
> operator.  Also swap the order of the expected and actual output
> files given to test_cmp; we compare expect and actual to show how
> actual output differs from what is expected.
> 
> Signed-off-by: John Keeping <john@keeping.me.uk>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  t/t8005-blame-i18n.sh | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
> index 847d098..75da219 100755
> --- a/t/t8005-blame-i18n.sh
> +++ b/t/t8005-blame-i18n.sh
> @@ -33,11 +33,15 @@ author $SJIS_NAME
>  summary $SJIS_MSG
>  EOF
>  
> +filter_author_summary () {
> +	sed -n -e '/^author /p' -e '/^summary /p' "$@"
> +}
> +
>  test_expect_success !MINGW \
>  	'blame respects i18n.commitencoding' '
> -	git blame --incremental file | \
> -		egrep "^(author|summary) " > actual &&
> -	test_cmp actual expected
> +	git blame --incremental file >output &&
> +	filter_author_summary output >actual &&
> +	test_cmp expected actual
>  '
>  
>  cat >expected <<EOF
> @@ -52,9 +56,9 @@ EOF
>  test_expect_success !MINGW \
>  	'blame respects i18n.logoutputencoding' '
>  	git config i18n.logoutputencoding eucJP &&
> -	git blame --incremental file | \
> -		egrep "^(author|summary) " > actual &&
> -	test_cmp actual expected
> +	git blame --incremental file >output &&
> +	filter_author_summary output >actual &&
> +	test_cmp expected actual
>  '
>  
>  cat >expected <<EOF
> @@ -68,9 +72,9 @@ EOF
>  
>  test_expect_success !MINGW \
>  	'blame respects --encoding=UTF-8' '
> -	git blame --incremental --encoding=UTF-8 file | \
> -		egrep "^(author|summary) " > actual &&
> -	test_cmp actual expected
> +	git blame --incremental --encoding=UTF-8 file >output &&
> +	filter_author_summary output >actual &&
> +	test_cmp expected actual
>  '
>  
>  cat >expected <<EOF
> @@ -84,9 +88,9 @@ EOF
>  
>  test_expect_success !MINGW \
>  	'blame respects --encoding=none' '
> -	git blame --incremental --encoding=none file | \
> -		egrep "^(author|summary) " > actual &&
> -	test_cmp actual expected
> +	git blame --incremental --encoding=none file >output &&
> +	filter_author_summary output >actual &&
> +	test_cmp expected actual
>  '
>  
>  test_done
> -- 
> 2.7.2-532-g79873b4

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-02-24 10:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-07 16:25 Test failures with GNU grep 2.23 John Keeping
2016-02-19 11:59 ` Jeff King
2016-02-19 17:27   ` Eric Sunshine
2016-02-19 17:38   ` Junio C Hamano
2016-02-19 19:11     ` Jeff King
2016-02-19 19:23     ` John Keeping
2016-02-19 19:33       ` Jeff King
2016-02-21 17:32         ` [PATCH 0/2] Fix test " John Keeping
2016-02-21 17:32         ` [PATCH 1/2] t8005: avoid grep on non-ASCII data John Keeping
2016-02-21 21:01           ` Eric Sunshine
2016-02-21 23:19             ` Jeff King
2016-02-21 23:31               ` Eric Sunshine
2016-02-21 23:35                 ` Jeff King
2016-02-21 23:41               ` John Keeping
2016-02-21 23:50                 ` Eric Sunshine
2016-02-22 22:18                 ` Jeff King
2016-02-22 22:25                   ` Junio C Hamano
2016-02-23 23:01                 ` Junio C Hamano
2016-02-24 10:24                   ` John Keeping
2016-02-21 23:31             ` Junio C Hamano
2016-02-21 23:40               ` Eric Sunshine
2016-02-21 17:32         ` [PATCH 2/2] t9200: " John Keeping
2016-02-21 21:15           ` Eric Sunshine
2016-02-21 23:43             ` John Keeping
2016-02-22  0:04               ` Eric Sunshine
2016-02-22 22:25               ` Jeff King
2016-02-23 22:55                 ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.