From: Johannes Sixt <j6t@kdbg.org>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org,
	Johannes Sixt via GitGitGadget <gitgitgadget@gmail.com>
Subject: Re: [PATCH v2 0/5] Fun with cpp word regex
Date: Sat, 9 Oct 2021 00:11:43 +0200
Message-ID: <25363715-dc39-1f18-a937-f715b106f529@kdbg.org>
In-Reply-To: <87r1cvmg0c.fsf@evledraar.gmail.com>

On 08.10.21 at 22:07, Ævar Arnfjörð Bjarmason wrote:
>  * I wonder if it isn't time to split up "cpp" into a "c" driver,
>    e.g. git.git's .gitattributes has "cpp" for *.[ch] files, but C++
>    keeps adding more syntax sugar.
> 
>    So e.g. if you use "<=>" after this series we'll tokenize it
>    differently in *.c files, but it's a C++-only operator, on the other
>    hand probably nobody cares that much...

Yes, exactly: <=> won't appear in a correct C file (outside of
comments), so no one will care. As far as tokenization is concerned, C
is a subset of C++. I don't think we need to separate the drivers.
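
To illustrate the point, here is a minimal sketch in Python. The two
patterns are hypothetical, heavily simplified stand-ins and not the
actual word regex from userdiff.c; the names BASE, FALLBACK and
tokenize are made up for this example. It only shows that adding a
"<=>" alternative changes how C++ input is split while leaving
ordinary C input alone:

```python
import re

# Hypothetical, simplified word regex; NOT the pattern used by git's userdiff.c.
# Alternatives: identifiers, decimal integers, a few two-character operators.
BASE = r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[-+*/<>%&^|=!]=|<<|>>|&&|\|\|"
# Assumed fallback: any other non-space character is a token by itself.
FALLBACK = r"|[^\s]"

without_spaceship = re.compile(BASE + FALLBACK)
# Same regex, but "<=>" is tried first so it is kept as one token.
with_spaceship = re.compile(r"<=>|" + BASE + FALLBACK)


def tokenize(pattern, text):
    """Return the list of tokens the pattern finds in text, left to right."""
    return pattern.findall(text)


cxx_line = "a <=> b"   # only valid in C++
c_line = "a <= b"      # valid in both C and C++

print(tokenize(without_spaceship, cxx_line))  # ['a', '<=', '>', 'b']
print(tokenize(with_spaceship, cxx_line))     # ['a', '<=>', 'b']
# The C line is tokenized identically by both patterns.
print(tokenize(without_spaceship, c_line) == tokenize(with_spaceship, c_line))  # True
```

Since the character sequence "<=>" does not occur in correct C code
outside of comments, the extra alternative never gets a chance to
match there.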

>  * I found myself back-porting some of your tests (manually mostly),
>    maybe you disagree, but in cases like 123'123, <=> etc. I'd find it
>    easier to follow if we first added the test data, and then the
>    changed behavior.
> 
>    Because after all, we're going to change how we highlight existing
>    data, so testing for that would be informative.

Good point. I'll work a bit more on that.

>  * This pre-dates your much improved tests, but these test files could
>    really use some test comments, as in:
> 
>    /* Now that we're going to understand the "'" character somehow, will any of this change? */
>    /* We haven't written code like this since the 1960's ... */
>    /* Run & free */
> 
>    I.e. we don't just highlight code the compiler likes to eat, but also
>    comments. So particularly for smaller tokens that also occur in
>    natural language like "'" and "&" are we getting expected results?

Comments are free text. Anything can happen. There is no such thing as
"correct tokenization" in comments. Not interested.

Thank you for the review.
-- Hannes

Thread overview: 24+ messages
2021-10-07  6:50 [PATCH 0/3] Fun with cpp word regex Johannes Sixt via GitGitGadget
2021-10-07  6:50 ` [PATCH 1/3] userdiff: tighten " Johannes Sixt via GitGitGadget
2021-10-07  6:50 ` [PATCH 2/3] userdiff: permit the digit-separating single-quote in numbers Johannes Sixt via GitGitGadget
2021-10-07  6:51 ` [PATCH 3/3] userdiff: learn the C++ spaceship operator Johannes Sixt via GitGitGadget
2021-10-07  9:14 ` [PATCH 0/3] Fun with cpp word regex Ævar Arnfjörð Bjarmason
2021-10-07 16:40   ` Johannes Sixt
2021-10-08 19:09 ` [PATCH v2 0/5] " Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 1/5] t4034/cpp: actually test that operator tokens are not split Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 2/5] t4034: add tests showing problematic cpp tokenizations Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 3/5] userdiff-cpp: tighten word regex Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 4/5] userdiff-cpp: permit the digit-separating single-quote in numbers Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 5/5] userdiff-cpp: learn the C++ spaceship operator Johannes Sixt via GitGitGadget
2021-10-08 20:07   ` [PATCH v2 0/5] Fun with cpp word regex Ævar Arnfjörð Bjarmason
2021-10-08 22:11     ` Johannes Sixt [this message]
2021-10-09  0:00       ` Ævar Arnfjörð Bjarmason
2021-10-10 20:15         ` Johannes Sixt
2021-10-10 17:02   ` [PATCH v3 0/6] " Johannes Sixt via GitGitGadget
2021-10-10 17:02     ` [PATCH v3 1/6] t4034/cpp: actually test that operator tokens are not split Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 2/6] t4034: add tests showing problematic cpp tokenizations Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 3/6] userdiff-cpp: tighten word regex Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 4/6] userdiff-cpp: prepare test cases with yet unsupported features Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 5/6] userdiff-cpp: permit the digit-separating single-quote in numbers Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 6/6] userdiff-cpp: learn the C++ spaceship operator Johannes Sixt via GitGitGadget
2021-10-24  9:56     ` [PATCH 7/6] userdiff-cpp: back out the digit-separators in numbers Johannes Sixt