All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>
Subject: Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes
Date: Thu, 29 Jun 2017 14:01:44 -0700	[thread overview]
Message-ID: <CAGZ79kaq2ym9aSAcHXRSd=onkqjoO-2_0B8PHed2j4q_9gwvCA@mail.gmail.com> (raw)
In-Reply-To: <xmqq37akya3j.fsf@gitster.mtv.corp.google.com>

On Tue, Jun 27, 2017 at 10:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Looking at the implementation of get_ws_cleaned_string() that is the
>> workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong
>> things for various "ignore whitespace" options (i.e. there is only
>> one implementation, while "git diff" family takes things like
>> "ignore space change", "ignore all whitespace", etc.), though.
>
> This probably deserves a bit more illustration of how I envision the
> code should evolve.
>
> In the longer term, I would prefer to see emitted_symbol_cmp_no_ws()
> to go and instead emitted_symbol_cmp() to take the diff options so
> that it can change the behaviour of the comparison function based on
> the -w/-b/--ignore-space-at-eol/etc. settings.  And compare two strings
> in place.
>
> For that, you may need a helper function that takes a pointer to a
> character pointer, picks the next byte that matters while advancing
> the pointer, and returns that byte.  The emitted_symbol_cmp(a, b)
> which is not used for real comparison (i.e. ordering to see if a
> sorts earlier than b) but for equivalence (i.e. considering various
> whitespace-ignoring settings, does a and b matfch?) may become
> something like:
>
>         int
>         emitted_symbol_eqv(struct emitted_diff_symbol *a,
>                            struct emitted_diff_symbol *b,
>                            const void *keydata) {
>                 struct diff_options *diffopt = keydata;
>                 const char *ap = a->line;
>                 const char *bp = b->line;
>
>                 while (1) {
>                         int ca, cb;
>                         ca = next_byte(&ap, diffopt);
>                         cb = next_byte(&bp, diffopt);
>                         if (ca != cb)
>                                 return 1; /* differs */
>                         if (!ca)
>                                 return 0;
>                 };
>         }
>
> where next_byte() may look like:
>
>         static int
>         next_byte(const char **cp, struct diff_options *diffopt)
>         {
>                 int retval;
>
>         again:
>                 retval = **cp;
>                 if (!retval)
>                         return retval; /* end of string */
>                 if (!isspace(retval)) {
>                         (*cp)++; /* advance */
>                         return retval;
>                 }
>
>                 switch (ignore_space_type(diffopt)) {
>                 case NOT_IGNORING:
>                         (*cp)++; /* advance */
>                         return retval;
>                 case IGNORE_SPACE_CHANGE:
>                         while (**cp && isspace(**cp))
>                                 (*cp)++; /* squash consecutive spaces */
>                         return ' '; /* normalize spaces with a SP */
>                 case IGNORE_ALL_SPACE:
>                         (*cp)++; /* advance */
>                         goto again;
>                 ... other cases here ...
>                 }
>         }
>
>

I just rebased the diff series on top of the hashmap series and now
I want to implement the compare function based on the new data
feature. So I thought I might start off this proposal, as you seem to
have good taste for how to approach problems.

It turns out that the code here is rather a very loose proposal,
not to be copied literally, because of these constraints:
* When dealing with user content, we do not have C-strings,
  but memory + length, such that next_byte also needs to be
  aware of the end pointer.
* The ignore_space_type() as well as these constants do not exist
  as-is, I think the cleanest for now is to parse diffopt->xdl_opts via
  DIFF_XDL_TST

Thanks,
Stefan

  parent reply	other threads:[~2017-06-29 21:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-28  0:56 [PATCH 0/6] Fixing up sb/diff-color-moved Stefan Beller
2017-06-28  0:56 ` [PATCH 1/6] diff.c: factor out shrinking of potential moved line blocks Stefan Beller
2017-06-28  0:56 ` [PATCH 2/6] diff.c: change the default for move coloring to zebra Stefan Beller
2017-06-28  3:14   ` Junio C Hamano
2017-06-28 19:54     ` Stefan Beller
2017-06-28  0:56 ` [PATCH 3/6] diff.c: better reporting on color.moved bogus configuration Stefan Beller
2017-06-28  0:56 ` [PATCH 4/6] Documentation/diff: reword color moved Stefan Beller
2017-06-28  3:25   ` Junio C Hamano
2017-06-28  0:56 ` [PATCH 5/6] diff.c: omit uninteresting moved lines Stefan Beller
2017-06-28  3:31   ` Junio C Hamano
2017-06-28 20:00     ` Stefan Beller
2017-06-28  0:56 ` [PATCH 6/6] diff.c: detect blocks despite whitespace changes Stefan Beller
2017-06-28  3:41   ` Junio C Hamano
2017-06-28  5:06     ` Junio C Hamano
2017-06-28 20:36       ` Stefan Beller
2017-06-28 21:16         ` Junio C Hamano
2017-06-29 21:01       ` Stefan Beller [this message]
2017-06-29 23:51         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kaq2ym9aSAcHXRSd=onkqjoO-2_0B8PHed2j4q_9gwvCA@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.