git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Victor Engmark <victor@engmark.name>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] userdiff: support Bash
Date: Tue, 20 Oct 2020 16:39:35 -0700	[thread overview]
Message-ID: <xmqqk0vk8o20.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name> (Victor Engmark's message of "Tue, 20 Oct 2020 20:10:49 +1300")

Victor Engmark <victor@engmark.name> writes:

It's a bit troubling not to see the most basic form, i.e.

	RIGHT () {
		: ChangeMe
	}

in the set of tests.

> diff --git a/t/t4018/bash-trailing-comment b/t/t4018/bash-trailing-comment
> new file mode 100644
> index 0000000000..f1edbeda31
> --- /dev/null
> +++ b/t/t4018/bash-trailing-comment
> @@ -0,0 +1,4 @@
> +RIGHT() { # Comment
> +
> +    ChangeMe
> +}

This is the closest, but it tests "# with comment" at the same time.

> diff --git a/userdiff.c b/userdiff.c
> index fde02f225b..9de0497007 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -23,6 +23,12 @@ IPATTERN("ada",
>  	 "[a-zA-Z][a-zA-Z0-9_]*"
>  	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
>  	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
> +PATTERNS("bash",
> +	 /* POSIX & bashism form; all four compound command types */
> +	 "^[ \t]*((function[ \t]*)?[a-zA-Z_][a-zA-Z0-9_]*(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[))",

We allow an optional leading indent, an optional noiseword
"function" plus a run of whitespaces, and then an identifier
followed by "line noise".

The pattern makes the run of whitespaces after "function" entirely
optional, which makes "functionabc" be taken as a single identifier
without noiseword "function", but it's not like we are parsing and
painting only the identifer in a color different from the keyword,
so it is OK.  Using [ \t]+ instead of [ \t]* after "function" would
probably make the result more clear, even though it does not make a
practical difference.

Then the "line noise" part.  I am not sure if I follow this part
correctly:

	"(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[)"

is what follows the identifier that is the function name.  We may
have 0 or 1 "()", followed by an optional run of whitespaces.  And
then one of '{', '(', '((', '[[' would come.

Did I read it correctly?

Even though it may be unusual to write open and close parentheses
not directly next to each other, or with a space after the name of
the function, i.e.

	RIGHT ( ) {

would also be a valid header, no?  The way I read the pattern in the
above, it should not hit, as the pattern does not allow anything but
"()" as the "the previous identifier is the name of the function we
are defining" sign, and it does not allow any whitespace between the
identifier and "()".

But it does, and the reason why this most basic form happens to work
is because it relies on the support for "bash-ism" forms.  Even if
the "()" after identifier is missing, you'll match as long as the
identifier is followed by an optional run of whitespace and then an
open paren, brace or bracket:

	"[ \t]*(\\{|\\(\\(?|\\[\\[)"

And I do not like code or pattern that happens to appear to work by
accident.  Can we tighten it a bit?

	"^[ \t]*"		     /* (op) leading indent */
	"("			     /* here comes the whole thing */
	"(function[ \t]+)?"	     /* (op) noiseword "function" */
	"[a-zA-Z_][a-zA-Z0-9_]*"     /* identifier - function name */
	"[ \t]*(\\([ \t]*\\))?"      /* (op) start of func () */
	"[ \t]*(\\{|\\(\\(?|\\[\\[)" /* various "opening" of body */
	")",

is my attempt to break it down to make it readable and more correct.

> +	 /* -- */
> +	 /* Characters not in the default $IFS value */
> +	 "[^ \t]+"),
>  PATTERNS("dts",
>  	 "!;\n"
>  	 "!=\n"

Thanks.

  reply	other threads:[~2020-10-20 23:39 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-20  7:10 [PATCH] userdiff: support Bash Victor Engmark
2020-10-20 23:39 ` Junio C Hamano [this message]
2020-10-21  3:00   ` [PATCH v2] " Victor Engmark
2020-10-21  7:07     ` Johannes Sixt
2020-10-21 18:39       ` Junio C Hamano
2020-10-21 22:48       ` Victor Engmark
2020-10-21 23:45       ` [PATCH v3] " Victor Engmark
2020-10-22  6:08         ` Johannes Sixt
2020-10-22 17:30           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqk0vk8o20.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=victor@engmark.name \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).