git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "Carlo Marcelo Arenas Belón" <carenas@gmail.com>
Cc: git@vger.kernel.org, emaste@freebsd.org, sunshine@sunshineco.com
Subject: Re: [PATCH v2] t4210: detect REG_ILLSEQ dynamically
Date: Fri, 15 May 2020 13:24:49 -0700
Message-ID: <xmqqmu69kktq.fsf@gitster.c.googlers.com>
In-Reply-To: <20200515195157.41217-1-carenas@gmail.com> ("Carlo Marcelo Arenas Belón"'s message of "Fri, 15 May 2020 12:51:56 -0700")

Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
> index 10284cc56f..7a8ddce45b 100644
> --- a/t/helper/test-regex.c
> +++ b/t/helper/test-regex.c
> @@ -41,16 +41,21 @@ int cmd__regex(int argc, const char **argv)
>  {
>  	const char *pat;
>  	const char *str;
> -	int flags = 0;
> +	int ret, silent = 0, flags = 0;
>  	regex_t r;
>  	regmatch_t m[1];
> +	char errbuf[64];
>  
>  	if (argc == 2 && !strcmp(argv[1], "--bug"))
>  		return test_regex_bug();
>  	else if (argc < 3)
>  		usage("test-tool regex --bug\n"
> -		      "test-tool regex <pattern> <string> [<options>]");
> +		      "test-tool regex [--silent] <pattern> <string> [<options>]");
>
> +	if (!strcmp(argv[1], "--silent")) {
> +		silent = 1;
> +		argv++;
> +	}

This looks fishy---if argc==3 and the first argument is "--silent",
only the <pattern> is left in argv.  Before taking <string> out of
argv, we need to ensure argc is still large enough, but I do not
think that is done below:

>  	argv++;
>  	pat = *argv++;
>  	str = *argv++;

So str here would be NULL and/or *argv++ would have given you an
out-of-bounds access already.

> @@ -67,8 +72,14 @@ int cmd__regex(int argc, const char **argv)
>  	}
>  	git_setup_gettext();
>  
> -	if (regcomp(&r, pat, flags))
> -		die("failed regcomp() for pattern '%s'", pat);
> +	ret = regcomp(&r, pat, flags);
> +	if (ret) {
> +		if (silent)
> +			return 1;
> +
> +		regerror(ret, &r, errbuf, sizeof(errbuf));
> +		die("failed regcomp() for pattern '%s' (%s)", pat, errbuf);
> +	}
>  	if (regexec(&r, str, 1, m, 0))
>  		return 1;

Not that it matters _too_ much as this is merely a test helper and
it would not hurt anybody as long as our callers are careful.

> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> index c3792081e6..a89f456817 100755
> --- a/t/t4210-log-i18n.sh
> +++ b/t/t4210-log-i18n.sh
> @@ -10,6 +10,12 @@ latin1_e=$(printf '\351')
>  # invalid UTF-8
>  invalid_e=$(printf '\303\50)') # ")" at end to close opening "("
>  
> +if test_have_prereq GETTEXT_LOCALE &&
> +	! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED
> +then
> +	have_reg_illseq=1
> +fi

OK.  Have we cleared the have_reg_illseq shell variable before we
reach this point?  If not, we should (think: an environment variable
the end user had set before starting the test).
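
A minimal sketch of the concern, runnable outside the test framework.
The function regcomp_rejects_latin1 is a hypothetical stand-in for the
"test-tool regex --silent" probe, and the "leaked" value simulates a
variable inherited from the user's environment; neither is part of the
patch.

```shell
# Hypothetical stand-in for the "! test-tool regex --silent ..." probe.
# Returning 1 here means "regcomp() accepted the pattern", i.e. the
# platform does not report REG_ILLSEQ.
regcomp_rejects_latin1 () {
	return 1
}

# Simulate a value that leaked in from the user's environment.
have_reg_illseq=leaked

# Clear it first, so that only the probe below can set it.
have_reg_illseq=
if regcomp_rejects_latin1
then
	have_reg_illseq=1
fi

if test -n "$have_reg_illseq"
then
	echo "REG_ILLSEQ detected"
else
	echo "REG_ILLSEQ not detected"
fi
```

Without the explicit clearing assignment, the final check would see
the leaked value and report a detection that the probe never made.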

> @@ -56,38 +62,68 @@ test_expect_success !MINGW 'log --grep does
>  	test_must_be_empty actual
>  '
>  
> +trigger_undefined_behaviour()
> +{

Style:

	triggers_undefined_behaviour () {

My first two readings of this patch mistakenly told me that the name
of the function was an instruction to the test to trigger an
undefined behaviour to see what happens, but this helper answers a
question "does the given engine trigger an undefined behaviour (with
the test data we are going to throw at it)?", right?  Perhaps renaming
the helper to "triggerS_undefined_behaviour" would reduce the risk
of inviting such a misinterpretation.

> +	local engine=$1
> +
> +	case $engine in
> +	fixed)
> +		if test -n "$have_reg_illseq" &&
> +			! test_have_prereq LIBPCRE2
> +		then
> +			return 0
> +		else
> +			return 1
> +		fi
> +		;;
> +	basic|extended)
> +		if test -n "$have_reg_illseq"
> +		then
> +			return 0
> +		else
> +			return 1
> +		fi
> +		;;
> +	perl)
> +		return 1
> +		;;
> +	esac
> +}

... and the return value is true for "yes it would trigger undefined
behaviour" and false for "no it would not".

>  for engine in fixed basic extended perl
>  do
>  	prereq=
>  	if test $engine = "perl"
>  	then
> +		prereq=PCRE
>  	fi
>  	force_regex=
>  	if test $engine != "fixed"
>  	then
> +		force_regex='.*'
>  	fi
>  
>  	test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
> +		LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&

Can we do something about these overlong lines, by the way?
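
One possible approach, sketched with printf instead of git so that it
runs anywhere: inside a double-quoted test body the shell removes a
backslash followed by a newline, so a long command can be folded in
the script while the evaluated body still sees a single command line.
The variable values and the name "body" are hypothetical.

```shell
# Hypothetical values standing in for what the test script computes.
engine=basic
pattern='.*x'

# Backslash-newline inside double quotes is removed by the shell, so
# the body below reaches eval as one command line.
body="
	printf '%s %s\n' \
		\"$engine\" \
		\"$pattern\"
"
eval "$body"
```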

>  		test_must_be_empty actual
>  	"
>  
> +	if ! trigger_undefined_behaviour $engine
> +	then

Much easier to read than the ILLSEQ prerequisite, I would think,
even though the overlong lines are annoying.

> +		test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep searches in log output encoding (latin1 + locale)" "
> +			cat >expect <<-\EOF &&
> +			latin1
> +			utf8
> +			EOF
> +			LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
> +			test_cmp expect actual
> +		"
> +
> +		test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
> +			LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
> +			test_must_be_empty actual
> +		"
> +	fi
>  done
>  
>  test_done
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 0ea1e5a05e..81473fea1d 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1454,12 +1454,6 @@ case $uname_s in
>  	test_set_prereq SED_STRIPS_CR
>  	test_set_prereq GREP_STRIPS_CR
>  	;;
> -FreeBSD)
> -	test_set_prereq REGEX_ILLSEQ
> -	test_set_prereq POSIXPERM
> -	test_set_prereq BSLASHPSPEC
> -	test_set_prereq EXECKEEPSPID
> -	;;
>  *)
>  	test_set_prereq POSIXPERM
>  	test_set_prereq BSLASHPSPEC

Nice to be able to drop one case arm from here.  Thanks.
