From: "Eric Sunshine via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>
Subject: [PATCH 2/4] chainlint: tighten accuracy when consuming input stream
Date: Tue, 08 Nov 2022 19:08:28 +0000
Message-ID: <31af383fd439c3c0a5003598961acfecfae4018c.1667934510.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1375.git.git.1667934510.gitgitgadget@gmail.com>

From: Eric Sunshine <sunshine@sunshineco.com>

To extract the next token in the input stream, Lexer::scan_token() finds
the start of the token by skipping whitespace, then consumes characters
belonging to the token until it encounters a non-token character, such
as an operator, punctuation, or whitespace. In the case of an operator
or punctuation which ends a token, before returning the just-scanned
token, it pushes that operator or punctuation character back onto the
input stream to ensure that it will be the first character consumed by
the next call to scan_token().

However, scan_token() is intentionally lax when whitespace ends a token;
it doesn't bother pushing the whitespace character back onto the input
stream since it knows that the next call to scan_token() will, as its
first step, skip over whitespace anyhow when looking for the start of
the next token.

Although such laxity is harmless for the proper functioning of the
lexical analyzer, it does make it difficult to precisely identify the
token's end position in the input stream. Accurate token position
information may be desirable, for instance, to annotate problems or
highlight other interesting facets of the input found during the parsing
phase. To accommodate such possibilities, tighten scan_token() by making
it push the token-ending whitespace character back onto the input
stream, just as it does for other token-ending characters.
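
As a rough illustration of the difference, the following standalone
sketch contrasts the lax and tightened behavior. Note that scan_word()
and its $push_back_ws flag are hypothetical simplifications invented
for this example; they are not the code in t/chainlint.pl, which also
handles operators, quoting, and other special characters.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for Lexer::scan_token(); for
# illustration only. Returns the scanned token and the buffer offset
# at which scanning stopped.
sub scan_word {
	my ($b, $push_back_ws) = @_;
	my $token = '';
	$$b =~ /\G[ \t]+/gc;              # skip leading whitespace
	while ($$b =~ /\G(.)/sgc) {       # consume one character at a time
		my $c = $1;
		if ($c =~ /^[ \t]$/) {        # whitespace ends token
			pos($$b)-- if $push_back_ws;  # tightened: un-consume it
			last;
		}
		$token .= $c;
	}
	return ($token, pos($$b));
}

my $lax = "foo  bar";
my $tight = "foo  bar";
my ($t1, $end_lax) = scan_word(\$lax, 0);
my ($t2, $end_tight) = scan_word(\$tight, 1);
printf "lax:   token '%s', stream position %d\n", $t1, $end_lax;
printf "tight: token '%s', stream position %d\n", $t2, $end_tight;

# Either way, the next call skips whitespace and finds the same token.
my ($n1) = scan_word(\$lax, 0);
my ($n2) = scan_word(\$tight, 1);
print "next token: '$n1' (lax), '$n2' (tight)\n";
```

Under these assumptions, the lax variant leaves the stream at offset 4,
one past the end of "foo", whereas the tightened variant leaves it at
offset 3, exactly where the token ends. Both variants return "bar" on
the following call, which is why the laxity is harmless to lexing.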

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 9908de6c758..1f66c03c593 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -179,7 +179,7 @@ RESTART:
 		# handle special characters
 		last unless $$b =~ /\G(.)/sgc;
 		my $c = $1;
-		last if $c =~ /^[ \t]$/; # whitespace ends token
+		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
 		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
 		$token .= $self->scan_sqstring(), next if $c eq "'";
 		$token .= $self->scan_dqstring(), next if $c eq '"';
-- 
gitgitgadget

