All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Git List <git@vger.kernel.org>, Thomas Rast <tr@thomasrast.ch>,
	Jens Lehmann <Jens.Lehmann@web.de>
Subject: Re: [PATCH v4 02/14] add a hashtable implementation that supports O(1) removal
Date: Fri, 08 Nov 2013 09:08:13 -0800	[thread overview]
Message-ID: <xmqqtxfmom82.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <527CBC8E.6050507@gmail.com> (Karsten Blees's message of "Fri, 08 Nov 2013 11:27:26 +0100")

Karsten Blees <karsten.blees@gmail.com> writes:

> What about this:
>
> #define HASHMAP_GROW_AT 80
> #define HASHMAP_SHRINK_AT 16

I am not too enthused for three reasons. The fact that these are
100-based numbers is not written down anywhere other than the place
they are used, the places that use these need to consistently divide
by 100, which invites unnecessary bugs, and compared to the
original, you now require 16/100 but you didn't even want the exact
16% in the first plae (i.e. a simple 1/6 was good enough, and it
still is).

>> Perhaps
>> 
>> #define HASHMAP_GROW_AT(current) ((current) + (current) >> 2)
>> #define HASHMAP_SHRINK_AT(current) ((current) * 6)
>> #define HASHMAP_GROW(current) ((current) << 2)
>> #define HASHMAP_SHRINK(current) ((current) >> 2)
>> 
>> may alleviate my worries; I dunno.

>>> +
>>> +void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)
>>> +{
>> 
>> Why is free_function not part of the constants defiend at
>> hashmap_init() time?  Your API allows the same hashmap, depending on
>> the way it has been used, to be cleaned up with different
>> free_function, but I am not sure if that "flexibility" is intended
>> (and in what application it would be useful).
>> 
>
> The free_function is a convenience so you don't have to loop over
> the entries yourself. ...
> ...a simple 'to free or not to free' boolean would suffice.

That is not the "flexibility" I was talking about. Your API allows
omne to write a single program that does this:

	struct hashmap map;

	hashmap_init(&map, compare_fn);
        add/put/remove on map;

	if (phase_of_moon())
        	hashmap_free(&map, free_them_in_one_way);
	else
        	hashmap_free(&map, free_them_in_another_way);

Just like your _init takes a comparison function to make it clear
that all entries will be compared using the same function throughout
the life of the map, if it takes a free function (and you can use
NULL to mean "do not free, I am managing elements myself"), I would
think that it will make it clear that the elements in that map will
be freed the same way.

And it will allow your _put to call that free function when you
replace an existing entry with a new one, if that becomes necessary.
The API in the posted version seems to make it responsibility of the
caller of _put to do whatever necessary clean-up to the returned
value (which is the entry that was replaced and no longer in the
hashmap), but within the context of a patch series whose later patch
changes the API to replace or remove an entry from the index in such
a way to shift the responsibility of freeing it from the caller to
the callee, such a change to this API to make _put and _remove
responsible for calling per-element free is a possiblity you may
want to consider, no?

>>> +	if (map->tablesize > HASHMAP_INITIAL_SIZE &&
>>> +	    map->size * HASHMAP_SHRINK_AT < map->tablesize)
>>> +		rehash(map, map->tablesize >> HASHMAP_GROW);
>> 
>> This "we shrink by the same amount" looks inconsistent with the use
>> of separate grow-at and shrink-at constants (see above for four
>> suggested #define's).
>> 
>
> These values account for a small hysteresis so that there is no size at which a sequence of add, remove, add, remove (or put, put, put, put) results in permanent resizes.

I was commenting on the two bottom lines of the above three line
quote from the patch.  You use SHIRNK_AT to decide if you want to
shrink, and you use >>GROW to do the actual shrinking.  Why isn't it
like this instead?

	if (map->tablesize > HASHMAP_INITIAL_SIZE &&
	    HASHMAP_SHIRNK_AT(map->size) < map->tablesize)
		rehash(map, map->tablesize >> HASHMAP_SHRINK);

The fact that constant used for shrinking was not called SHRINK but
GROW was what caught my attention.

  parent reply	other threads:[~2013-11-08 17:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-07 14:32 [PATCH v4 00/14] New hash table implementation Karsten Blees
2013-11-07 14:33 ` [PATCH v4 01/14] submodule: don't access the .gitmodules cache entry after removing it Karsten Blees
2013-11-07 22:27   ` Heiko Voigt
2013-11-07 14:34 ` [PATCH v4 02/14] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-11-07 21:40   ` Junio C Hamano
2013-11-08 10:27     ` Karsten Blees
2013-11-08 16:45       ` Philip Oakley
2013-11-08 17:08       ` Junio C Hamano [this message]
2013-11-13 16:37         ` Karsten Blees
2013-11-07 14:35 ` [PATCH v4 03/14] buitin/describe.c: use new hash map implementation Karsten Blees
2013-11-07 14:36 ` [PATCH v4 04/14] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-11-07 14:36 ` [PATCH v4 05/14] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-11-07 14:37 ` [PATCH v4 06/14] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-11-07 14:38 ` [PATCH v4 07/14] name-hash.c: use new hash map implementation for directories Karsten Blees
2013-11-07 14:38 ` [PATCH v4 08/14] name-hash.c: remove unreferenced directory entries Karsten Blees
2013-11-07 14:39 ` [PATCH v4 09/14] name-hash.c: use new hash map implementation for cache entries Karsten Blees
2013-11-07 14:39 ` [PATCH v4 10/14] name-hash.c: remove cache entries instead of marking them CE_UNHASHED Karsten Blees
2013-11-07 14:40 ` [PATCH v4 11/14] remove old hash.[ch] implementation Karsten Blees
2013-11-07 14:43 ` [PATCH v4 12/14] fix 'git update-index --verbose --again' output Karsten Blees
2013-11-07 22:12   ` Junio C Hamano
2013-11-08 10:27     ` Karsten Blees
2013-11-07 14:44 ` [PATCH v4 13/14] builtin/update-index.c: cleanup update_one Karsten Blees
2013-11-07 21:40   ` Junio C Hamano
2013-11-08 10:27     ` Karsten Blees
2013-11-07 14:45 ` [PATCH v4 14/14] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
2013-11-07 21:40   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtxfmom82.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=karsten.blees@gmail.com \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.