From: Phillip Wood <phillip.wood123@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Phillip Wood via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Phillip Wood <phillip.wood@dunelm.org.uk>
Subject: Re: [PATCH 1/2] patience diff: remove unnecessary string comparisons
Date: Wed, 5 May 2021 10:34:29 +0100
Message-ID: <87001425-8043-4c66-dbc2-637f05a7229f@gmail.com>
In-Reply-To: <xmqqpmy658e1.fsf@gitster.g>

On 05/05/2021 01:31, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> xdl_prepare_env() calls xdl_classify_record() which arranges for the
>> hashes of non-matching lines to be different so lines can be tested
>> for equality by comparing just their hashes.
> 
> Hmph, that is a bit different from what I read from the comment in
> the post context of the first hunk, though.
> 
> 	/*
> 	 * After xdl_prepare_env() (or more precisely, due to
> 	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
> 	 * is _not_ the hash anymore, but a linearized version of it.  In
> 	 * other words, the "ha" member is guaranteed to start with 0 and
> 	 * the second record's ha can only be 0 or 1, etc.
> 	 *
> 	 * So we multiply ha by 2 in the hope that the hashing was
> 	 * "unique enough".
> 	 */
> 
> The words "hope" and "enough" hint to me that the "ha" member is
> not a hash but a "linearized version of it" (whatever that means), which
> does not guarantee that two records with the same "ha" are identical, or
> does it?

By "hashes" I meant "the value of record->ha". That comment is a bit
confusing. I think "linearized version of it" refers to
xdl_classify_record() assigning an integer to each distinct input
line, starting from zero and increasing by one for each new distinct
line (the function is fairly easy to follow), so two records have the
same record->ha only if their lines match. I assume "unique enough"
refers to the line below the comment, which takes record->ha modulo
map->alloc: the values of record->ha are not evenly distributed over
the whole integer range but bunched at the lower end.
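
As a rough illustration of that numbering (a toy sketch, not the xdiff
code: toy_classifier, toy_classify(), toy_slot() and MAX_UNIQUE are
made-up names, and the linear search stands in for whatever lookup a
real implementation would use), each distinct line gets the next unused
integer and repeated lines get the ID handed out earlier. toy_slot()
copies the index expression from insert_record() to show how such small,
dense IDs map to table slots:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNIQUE 64	/* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for the classifier state; not xdiff's struct. */
struct toy_classifier {
	const char *seen[MAX_UNIQUE];	/* distinct lines seen so far */
	unsigned long count;		/* number of distinct lines */
};

/*
 * Return the ID of "line": the ID already handed out if an identical
 * line was seen before, otherwise the next unused integer.  A linear
 * search keeps the sketch short; a real implementation would want a
 * hash table here.
 */
static unsigned long toy_classify(struct toy_classifier *cf, const char *line)
{
	unsigned long i;

	for (i = 0; i < cf->count; i++)
		if (!strcmp(cf->seen[i], line))
			return i;
	assert(cf->count < MAX_UNIQUE);
	cf->seen[cf->count] = line;
	return cf->count++;
}

/* The slot computation from insert_record(), applied to an ID. */
static int toy_slot(unsigned long ha, int alloc)
{
	return (int)((ha << 1) % alloc);
}
```

Feeding the lines "a", "b", "a", "c" through toy_classify() yields the
IDs 0, 1, 0, 2. With alloc == 8, the IDs 0, 1, 2 and 3 land in slots
0, 2, 4 and 6, and ID 5 wraps around to slot 2.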

The Myers implementation calls xdl_classify_record() and then only ever
compares record->ha; it does not call xdl_recmatch() while computing the
diff.
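
To illustrate why comparing IDs is enough (again a hypothetical sketch:
toy_record and both functions are invented for this example, whitespace
flags are ignored, and the ha values are assumed to come from a
classifier that gives equal lines equal IDs and different lines
different IDs), the byte comparison and the ID comparison should always
agree:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down record; not xdiff's xrecord_t. */
struct toy_record {
	const char *ptr;	/* start of the line */
	long size;		/* length of the line in bytes */
	unsigned long ha;	/* ID assigned by the classifier */
};

/* Compare two records byte by byte. */
static int toy_match_by_bytes(const struct toy_record *a,
			      const struct toy_record *b)
{
	return a->size == b->size &&
		!memcmp(a->ptr, b->ptr, (size_t)a->size);
}

/*
 * Compare two records by ID only.  This is valid only under the
 * assumption stated above: equal IDs if and only if equal lines.
 */
static int toy_match_by_id(const struct toy_record *a,
			   const struct toy_record *b)
{
	return a->ha == b->ha;
}
```

The ID comparison is a single integer test regardless of line length,
whereas the byte comparison has to walk both lines.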

> Well, I should just go read xdl_classify_record() to see what it
> really does, but if it eliminates collisions, then the patch is a
> clear and obvious improvement.

Thanks

Phillip


> Thanks.
> 
>> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
>> index 20699a6f6054..db2d53e89cb0 100644
>> --- a/xdiff/xpatience.c
>> +++ b/xdiff/xpatience.c
>> @@ -90,7 +90,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>   {
>>   	xrecord_t **records = pass == 1 ?
>>   		map->env->xdf1.recs : map->env->xdf2.recs;
>> -	xrecord_t *record = records[line - 1], *other;
>> +	xrecord_t *record = records[line - 1];
>>   	/*
>>   	 * After xdl_prepare_env() (or more precisely, due to
>>   	 * xdl_classify_record()), the "ha" member of the records (AKA lines)
>> @@ -104,11 +104,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>   	int index = (int)((record->ha << 1) % map->alloc);
>>   
>>   	while (map->entries[index].line1) {
>> -		other = map->env->xdf1.recs[map->entries[index].line1 - 1];
>> -		if (map->entries[index].hash != record->ha ||
>> -				!xdl_recmatch(record->ptr, record->size,
>> -					other->ptr, other->size,
>> -					map->xpp->flags)) {
>> +		if (map->entries[index].hash != record->ha) {
>>   			if (++index >= map->alloc)
>>   				index = 0;
>>   			continue;
>> @@ -253,8 +249,7 @@ static int match(struct hashmap *map, int line1, int line2)
>>   {
>>   	xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
>>   	xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
>> -	return xdl_recmatch(record1->ptr, record1->size,
>> -		record2->ptr, record2->size, map->xpp->flags);
>> +	return record1->ha == record2->ha;
>>   }
>>   
>>   static int patience_diff(mmfile_t *file1, mmfile_t *file2,
