From: Todd Zullinger <tmz@pobox.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Andreas Schwab" <schwab@linux-m68k.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git@vger.kernel.org,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Beat Bolli" <dev+git@drbeat.li>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Jeff King" <peff@peff.net>
Subject: Re: [PATCH] t7812: expect failure for grep -i with invalid UTF-8 data
Date: Sun, 1 Dec 2019 13:32:03 -0500
Message-ID: <20191201183203.GC17681@pobox.com>
In-Reply-To: <xmqqo8wsypit.fsf@gitster-ct.c.googlers.com>

Hi,

Junio C Hamano wrote:
> Andreas Schwab <schwab@linux-m68k.org> writes:
> 
>> On Nov 29 2019, Todd Zullinger wrote:
>>
>>> When the 'grep with invalid UTF-8 data' tests were added/adjusted in
>>> 8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26)
>>> and 870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
>>> 2019-07-26) they lacked a redirect which caused them to falsely succeed
>>> on most architectures.  They failed on big-endian arches where the test
>>> never reached the portion which was missing the redirect.
>>
>> It's not about big vs little endian, it's only about JIT vs non-JIT.
> 
> So, which one of JIT / non-JIT sides did the test fail unexpectedly?

On s390x, the initial:

    test_might_fail git grep -hi "Æ" invalid-0x80 >actual

fails to produce any output in actual, but since we use
test_might_fail, the test happily continues to:

    test_cmp expected actual

which fails.

The test output from an s390x build:

    expecting success of 7812.11 'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i':
        test_might_fail git grep -hi "Æ" invalid-0x80 >actual &&
        test_cmp expected actual &&
        test_must_fail git grep -hi "(*NO_JIT)Æ" invalid-0x80 &&
        test_cmp expected actual
    ++ test_might_fail git grep -hi Æ invalid-0x80
    ++ test_must_fail ok=success git grep -hi Æ invalid-0x80
    ++ case "$1" in
    ++ _test_ok=success
    ++ shift
    ++ git grep -hi Æ invalid-0x80
    fatal: pcre2_match failed with error code -22: UTF-8 error: isolated byte with 0x80 bit set
    ++ exit_code=128
    ++ test 128 -eq 0
    ++ test_match_signal 13 128
    ++ test 128 = 141
    ++ test 128 = 269
    ++ return 1
    ++ test 128 -gt 129
    ++ test 128 -eq 127
    ++ test 128 -eq 126
    ++ return 0
    ++ test_cmp expected actual
    ++ diff -u expected actual
    --- expected    2019-10-19 21:56:08.634252012 +0000
    +++ actual      2019-10-19 21:56:08.714252012 +0000
    @@ -1 +0,0 @@
    -ævar
    error: last command exited with $?=1
    not ok 11 - PCRE v2: grep non-ASCII from invalid UTF-8 data with -i
    #
    #               test_might_fail git grep -hi "Æ" invalid-0x80 >actual &&
    #               test_cmp expected actual &&
    #               test_must_fail git grep -hi "(*NO_JIT)Æ" invalid-0x80 &&
    #               test_cmp expected actual
    #
    # failed 1 among 11 test(s)
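The exit-code checks in the trace can be summarized with a simplified sketch. The names `sketch_must_fail` and `sketch_might_fail` are hypothetical, and the sketch leaves out details of the real helpers in Git's test library (for example the SIGPIPE allowance that shows up as `test_match_signal 13`). It only shows why git's `fatal:` exit status of 128 counts as an acceptable failure, which is what lets the test carry on to `test_cmp` with an empty `actual`.

```sh
#!/bin/sh
# Hypothetical, simplified sketch of the exit-code checks shown in the
# trace; NOT the real test_must_fail from Git's test library.
sketch_must_fail () {
	ok_success=no
	if test "$1" = ok=success
	then
		# "might fail" mode: a zero exit status is also acceptable
		ok_success=yes
		shift
	fi
	"$@"
	exit_code=$?
	if test $exit_code -eq 0
	then
		# success only passes in "might fail" mode
		test "$ok_success" = yes
		return
	elif test $exit_code -gt 129 && test $exit_code -le 192
	then
		# killed by a signal (e.g. a crash): never an acceptable failure
		return 1
	elif test $exit_code -eq 127 || test $exit_code -eq 126
	then
		# command not found or not executable
		return 1
	fi
	# any other non-zero status, such as git's "fatal:" exit code 128,
	# counts as the expected, controlled failure
	return 0
}

# the "might fail" variant is the same check with success also allowed
sketch_might_fail () {
	sketch_must_fail ok=success "$@"
}
```

For example, `sketch_might_fail sh -c 'exit 128'` returns 0, mirroring how the s390x run got past its first line even though nothing was written to `actual`.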

After Andreas' missing redirect fix, that still fails on
s390x (not surprisingly).  But now systems with JIT enabled
fail at:

    test_must_fail git grep -hi "(*NO_JIT)Æ" invalid-0x80 >actual &&
    test_cmp expected actual

But we say that the command must fail, so we shouldn't be
surprised that 'expected' and 'actual' don't match.  It would
be more surprising if they did. :)
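The effect of the missing redirect can be reproduced without git or PCRE2. In this sketch, `grep_first` and `grep_no_jit` are hypothetical stand-ins: the first prints the expected match, the second fails without writing anything to stdout, like the `(*NO_JIT)` grep on invalid UTF-8. Plain `cmp -s` stands in for `test_cmp`.

```sh
#!/bin/sh
# Hypothetical stand-ins for the two grep invocations in the test;
# they are NOT git commands.
grep_first () { echo "ævar"; }
grep_no_jit () { echo "fatal: UTF-8 error" >&2; return 128; }

tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
echo "ævar" >expected

# Original test: the failing command has no ">actual", so 'actual'
# still holds the first command's output and the comparison passes
# by accident.
grep_first >actual
grep_no_jit 2>/dev/null
if cmp -s expected actual; then without_redirect=pass; else without_redirect=fail; fi

# Fixed test: the shell truncates 'actual' before the failing command
# runs, so 'actual' ends up empty and the comparison fails.
grep_first >actual
grep_no_jit >actual 2>/dev/null
if cmp -s expected actual; then with_redirect=pass; else with_redirect=fail; fi

echo "without redirect: $without_redirect"
echo "with redirect: $with_redirect"
cd / && rm -rf "$tmp"
```

Running it prints `without redirect: pass` followed by `with redirect: fail`, which is the false success and the newly exposed failure described above.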

> Should I do s/on big-endian arches/with PCRE with JIT disabled/
> while queuing the patch?

Here's how I changed the commit message locally.  I was
going to wait a day or so for any other feedback on the
actual test changes, since it's a holiday weekend in the US
(and more generally a weekend).

1:  d9aeaf0c98 ! 1:  d0c083db78 t7812: expect failure for grep -i with invalid UTF-8 data
    @@ Commit message
         8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26)
         and 870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
         2019-07-26) they lacked a redirect which caused them to falsely succeed
    -    on most architectures.  They failed on big-endian arches where the test
    -    never reached the portion which was missing the redirect.
    +    on most systems.  The 'grep -i' test failed on systems where JIT was
    +    disabled as it never reached the portion which was missing the redirect.
     
    -    A recent patch add the missing redirect and exposed the fact that the
    -    'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i' test fails on
    -    all architectures.
    +    A recent patch added the missing redirect and exposed the fact that the
    +    'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i' test fails
    +    regardless of whether JIT is enabled.
     
         Based on the final paragraph in in 870eea8166:

Thanks for pointing out the proper reasoning to use in the
commit message, Andreas.  I hadn't looked at the Fedora pcre2
package to see that it explicitly disables JIT on s390x.
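Since the behaviour difference comes down to whether PCRE2 was built with JIT, a quick local check can help. This is a sketch under assumptions: it relies on `pcre2test` being on PATH (some distributions ship it in a separate tools package) and on `pcre2test -C jit` printing 1 or 0. It reports on the libpcre2 that `pcre2test` is linked against, which is usually, but not necessarily, the one git uses. The function name `pcre2_jit_status` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report whether the installed PCRE2 library was
# built with JIT support, as seen by pcre2test.
pcre2_jit_status () {
	if ! command -v pcre2test >/dev/null 2>&1
	then
		# no pcre2test available, so we cannot tell
		echo unknown
		return
	fi
	# "pcre2test -C jit" prints 1 when JIT is available, 0 otherwise
	case "$(pcre2test -C jit 2>/dev/null)" in
	1) echo yes ;;
	0) echo no ;;
	*) echo unknown ;;
	esac
}

echo "PCRE2 JIT: $(pcre2_jit_status)"
```

On a PCRE2 build with JIT disabled, as described above for Fedora on s390x, this would be expected to print `PCRE2 JIT: no`.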

I'm not sure if s390x is supported upstream or not -- it
doesn't appear to have a specific entry in the sljit config
header¹, so it seems likely it's not well-tested at the
least.  (Not that any of that is our concern here.)

¹ https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitConfigInternal.h

Thanks for the follow-up, Junio.

-- 
Todd

Thread overview: 70+ messages
2019-07-21 19:40 [PATCH] grep: use custom JIT stack with pcre2 Carlo Marcelo Arenas Belón
2019-07-24 15:14 ` [PATCH 0/3] grep: PCRE JIT fixes Ævar Arnfjörð Bjarmason
2019-07-24 16:18   ` Junio C Hamano
2019-07-24 20:03     ` Ævar Arnfjörð Bjarmason
2019-07-26 15:08   ` [PATCH v2 0/8] grep: PCRE JIT fixes + ab/no-kwset fix Ævar Arnfjörð Bjarmason
2019-07-26 20:27     ` Junio C Hamano
2019-07-29  9:20       ` Ævar Arnfjörð Bjarmason
2019-07-29 16:12         ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 1/8] grep: remove overly paranoid BUG(...) code Ævar Arnfjörð Bjarmason
2019-07-26 15:08   ` [PATCH v2 2/8] grep: stop "using" a custom JIT stack with PCRE v2 Ævar Arnfjörð Bjarmason
2019-07-29  0:33     ` Carlo Arenas
2019-07-26 15:08   ` [PATCH v2 3/8] grep: stop using a custom JIT stack with PCRE v1 Ævar Arnfjörð Bjarmason
2019-07-29  1:26     ` Carlo Arenas
2019-07-26 15:08   ` [PATCH v2 4/8] grep: consistently use "p->fixed" in compile_regexp() Ævar Arnfjörð Bjarmason
2019-07-29  1:48     ` Carlo Arenas
2019-07-29  9:05       ` Ævar Arnfjörð Bjarmason
2019-07-29  9:13         ` Ævar Arnfjörð Bjarmason
2019-07-29 16:23           ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 5/8] grep: create a "is_fixed" member in "grep_pat" Ævar Arnfjörð Bjarmason
2019-07-26 15:08   ` [PATCH v2 6/8] grep: stess test PCRE v2 on invalid UTF-8 data Ævar Arnfjörð Bjarmason
2019-07-26 20:34     ` Junio C Hamano
2019-07-26 21:55       ` Ævar Arnfjörð Bjarmason
2019-07-29  3:06     ` Carlo Arenas
2019-11-26 21:50     ` [PATCH] t7812: add missing redirects Andreas Schwab
2019-11-26 22:27       ` Johannes Schindelin
2019-11-26 23:11         ` Andreas Schwab
2019-11-27 11:58           ` Jeff King
2019-11-30  0:46       ` [PATCH] t7812: expect failure for grep -i with invalid UTF-8 data Todd Zullinger
2019-11-30  8:00         ` Andreas Schwab
2019-12-01 16:33           ` Junio C Hamano
2019-12-01 17:09             ` Andreas Schwab
2019-12-01 18:32             ` Todd Zullinger [this message]
2019-12-02  6:13               ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 7/8] grep: do not enter PCRE2_UTF mode on fixed matching Ævar Arnfjörð Bjarmason
2019-07-26 20:36     ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 8/8] grep: optimistically use PCRE2_MATCH_INVALID_UTF Ævar Arnfjörð Bjarmason
2019-07-26 21:07     ` Junio C Hamano
2019-07-26 21:53       ` Ævar Arnfjörð Bjarmason
2019-07-26 21:57         ` Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 0/4] grep: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2021-01-24 11:48       ` [PATCH v4 0/2] " Ævar Arnfjörð Bjarmason
2021-01-24 17:28         ` [PATCH v5 " Ævar Arnfjörð Bjarmason
2021-01-24 17:28         ` [PATCH v5 1/2] grep/pcre2 tests: don't rely on invalid UTF-8 data test Ævar Arnfjörð Bjarmason
2021-01-24 17:28         ` [PATCH v5 2/2] grep/pcre2: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2021-01-24 11:48       ` [PATCH v4 1/2] grep/pcre2 tests: don't rely on invalid UTF-8 data test Ævar Arnfjörð Bjarmason
2021-01-24 11:48       ` [PATCH v4 2/2] grep/pcre2: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2021-01-24 13:53         ` Ramsay Jones
2021-01-24 14:24           ` Ramsay Jones
2021-01-24 14:49             ` Ævar Arnfjörð Bjarmason
2021-01-24 16:10               ` Ramsay Jones
2021-01-24 17:29                 ` Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 1/4] grep/pcre2 tests: don't rely on invalid UTF-8 data test Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 2/4] grep/pcre2: simplify boolean spaghetti Ævar Arnfjörð Bjarmason
2021-01-24  5:33       ` Junio C Hamano
2021-01-24 10:45         ` Johannes Sixt
2021-01-24  2:12     ` [PATCH v3 3/4] grep/pcre2: further " Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 4/4] grep/pcre2: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2019-07-24 15:14 ` [PATCH 1/3] grep: remove overly paranoid BUG(...) code Ævar Arnfjörð Bjarmason
2019-07-24 15:14 ` [PATCH 2/3] grep: stop "using" a custom JIT stack with PCRE v2 Ævar Arnfjörð Bjarmason
2019-07-24 16:24   ` Junio C Hamano
2019-07-24 20:06     ` Ævar Arnfjörð Bjarmason
2019-07-25  5:11       ` Carlo Arenas
2019-07-24 15:14 ` [PATCH 3/3] grep: stop using a custom JIT stack with PCRE v1 Ævar Arnfjörð Bjarmason
2019-07-26 13:15   ` Carlo Arenas
2019-07-26 13:50     ` Ævar Arnfjörð Bjarmason
2019-07-26 14:12       ` Carlo Arenas
2019-07-26 14:43         ` Ævar Arnfjörð Bjarmason
2019-07-26 20:26           ` [RFC PATCH 0/2] PCRE1 cleanup Carlo Marcelo Arenas Belón
2019-07-26 20:26             ` [RFC PATCH 1/2] grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1 Carlo Marcelo Arenas Belón
2019-07-26 20:26             ` [RFC PATCH 2/2] grep: refactor and simplify PCRE1 support Carlo Marcelo Arenas Belón