From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-11.5 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,USER_IN_DEF_DKIM_WL shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id BACEE1F453 for ; Wed, 26 Sep 2018 22:59:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726682AbeI0FOf (ORCPT ); Thu, 27 Sep 2018 01:14:35 -0400 Received: from mail-yw1-f67.google.com ([209.85.161.67]:42210 "EHLO mail-yw1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726107AbeI0FOf (ORCPT ); Thu, 27 Sep 2018 01:14:35 -0400 Received: by mail-yw1-f67.google.com with SMTP id h134-v6so287819ywc.9 for ; Wed, 26 Sep 2018 15:59:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gmwudEwfun33sJ9hUuOVXka7HFV4dzd2vygZGLsGfVk=; b=rJHBoX4KelcX+xbf4dRozpPb1VzHjddIn5zzqp2zvn8hAXly4tF0GEkFJ2g4F9RvN2 y9xL0b2JoB1iv9XL2WbEW+YefqKBZHPT1ffgp5cp3RhzSIuGP/jAtXe1pdOtrpx9nEvO zmtSu3aYMlUS6s45jwpzNh2yUa7MFGVPZ/5Ol0F8x5eyIuy03JqA3Nb2thm8a2OMcWr7 +33Cq15mDXhKUtgiib0AcdhTdBehr+Zy+Ngykt8qXAfLezkGbUV8J1FZ5fCY2qNGl3un K+BpBiwqUYi9VZKK6jPWSv3JbD4eKyBPdjaU5ZmO9U4gQab2jPhpfPICbu175rzvnNYA NkKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=gmwudEwfun33sJ9hUuOVXka7HFV4dzd2vygZGLsGfVk=; b=l/M3nuYjANkTlN32NSpUakn5mgk1p3+zXPRnoKZ1A1L+nBsOmfjuTYmxHtf8FQpYLO VkB8y1t2Z2zquQDjHkM4S7tzNQKDFc4f47AfcM8G6uuoI8unNc4Pb/4ImaX8KvaPg5CF y8XSgsUW3pWRDDgazx2iVnsdU4WCuKb67GppMCeUiwHPnJaBduJZ2POv9FjhhASTm+Y+ wiOTurDx1pK4qvnl6vth5EV7PwxBr0OyNQuwiC5K8TUWIPmuOUiJRv7RrvfTHoB/lN0r sDyh3lZpexTATB0uXCLV/dDjTTlHM0oOgU0ldI+Z4sAQIN7C2W9gBKO8n6zOdvjyYcdo /sSw== X-Gm-Message-State: ABuFfojBQ6RcDEV8gfrgaBxhlxgpuUu+NUOjcvFo0ffPh5YLQHqwhxc6 WcXukuSl0WUSoTjp5+f5E/p5tPe5kqIhzfF8SjTvpw== X-Google-Smtp-Source: ACcGV61HvZQcFL6h3qXu4PucEanDRmnCvDW9fGpN2BP3//vV8byeaCIa1lktJdGKqDDnVGJrtrwKcFO7UklCbQ5oXhc= X-Received: by 2002:a81:a115:: with SMTP id y21-v6mr4344013ywg.178.1538002759797; Wed, 26 Sep 2018 15:59:19 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Stefan Beller Date: Wed, 26 Sep 2018 15:59:08 -0700 Message-ID: Subject: Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap To: Junio C Hamano Cc: git Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Sep 26, 2018 at 2:28 PM Junio C Hamano wrote: > > +struct refname_hash_entry { > + struct hashmap_entry ent; /* must be the first member */ $ git grep "struct hashmap_entry" reveals that we have another convention that we follow but not document, which is to stress the importance of putting the hashmap_entry first. ;-) > + struct object_id oid; > + char refname[FLEX_ARRAY]; > +}; > + > +static int refname_hash_entry_cmp(const void *hashmap_cmp_fn_data, > + const struct refname_hash_entry *e1, > + const struct refname_hash_entry *e2, > + const void *keydata) > +{ > + return strcmp(e1->refname, keydata ? keydata : e2->refname); > +} and later hashmap_init(... (hashmap_cmp_fn)refname_hash_entry_cmp, ...); I wonder if we'd want to stick to this style, as that seems to be the easiest to do, and drop the efforts started in 55c965f3a2 (Merge branch 'sb/hashmap-cleanup', 2017-08-11), that avoids the cast in the init, but puts the (implicit) casts in the _cmp function as we'd take void pointers as the function signature and recast to a local variable. > + > +static struct refname_hash_entry *refname_hash_add(struct hashmap *map, > + const char *refname, > + const struct object_id *oid) > +{ > + struct refname_hash_entry *ent; > + size_t len = strlen(refname); > + > + FLEX_ALLOC_MEM(ent, refname, refname, len); > + hashmap_entry_init(ent, memhash(refname, len)); memhash is preferred over strhash as we already know the length. For a second I wondered why we use it instead of strhash > +static int refname_hash_exists(struct hashmap *map, const char *refname) > +{ > + struct hashmap_entry key; > + size_t len = strlen(refname); > + hashmap_entry_init(&key, memhash(refname, len)); Here we could use strhash, but as they could change to differing implementation, we keep using memhash. > /* > - * For all the tags in the remote_refs string list, > + * For all the tags in the remote_refs_list, > * add them to the list of refs to be fetched > */ > - for_each_string_list_item(item, &remote_refs) { > + for_each_string_list_item(remote_ref_item, &remote_refs_list) { > + const char *refname = remote_ref_item->string; > + struct hashmap_entry key; > + > + hashmap_entry_init(&key, memhash(refname, strlen(refname))); > + item = hashmap_get(&remote_refs, &key, refname); > + if (!item) > + continue; /* can this happen??? */ I thought this could happen when we have hash collisions, I convinced myself otherwise, as we pass the refname to hashmap_get as the 'keydata', so when there is a hash collision we keep comparing to the real value. And as whenever we have an insert to remote_refs_list, we also have a refname_hash_add to the remote_refs hashmap, I think the return of NULL is not possible, and we could switch it to BUG(...); > All this code predates more modern in-core lookup API like hashmap; > replace them with two hashmaps and one string list---the hashmaps > are used for look-ups and the string list is to keep the order of > items in the returned result stable (which is the only single thing > hashmap does worse than lookups on string-list). > > Signed-off-by: Junio C Hamano > --- I wonder if we want to add an API to the hashmap that allows for ordered iteration (as hashmap_iter_* relies on the hashing function for its output) I think it may be too early to do so, but there are 2 thoughts on how to do it: * Each element keeps (prev/next) pointers to the previously inserted element and updates the next pointer when the next element is inserted. When removing elements we'd have to update the next and prev pointer of the adjacent elements. Then iteration can be done in insertion-order. Probably this would go into its own file ordered-hashmap.{c, h} * provide an order function to hashmap_iter_init, which then orders according to that function before the first call of hashmap_iter_next returns. > * Just to remind ourselves that we talked about getting rid of > string-list used as look-up tables by replacing them with hashmap > but haven't seen much effort expended on it. I think this is my > first semi-serious use of hashmap, and the code is expected to be > full of newbie mistakes, inefficiencies and ignorance of idioms; > pointing out any of which is much appreciated. I could not find newbie mistakes, but gave my thoughts anyway. Stefan