git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, Tim Schumacher <timschumi@gmx.de>,
	gitster@pobox.com, pclouds@gmail.com
Subject: Re: ordered string-list considered harmful, was Re: [PATCH v3] Allow aliases that include other aliases
Date: Thu, 6 Sep 2018 23:24:02 -0400	[thread overview]
Message-ID: <20180907032401.GB31728@sigill.intra.peff.net> (raw)
In-Reply-To: <20180906235033.GA100309@aiede.svl.corp.google.com>

On Thu, Sep 06, 2018 at 04:50:33PM -0700, Jonathan Nieder wrote:

> Hi,
> 
> Jeff King wrote:
> 
> > But what I think is harmful is a _sorted_ list, because of the
> > "accidentally quadratic" nature, and because it's easy to call its
> > functions on an unsorted list.
> 
> I agree --- in general, it tends to be better to build an unsorted
> string list and then sort it.
> 
> Once I've done so, what is your advice about getting fast lookups
> in the result?  Should I build an auxiliary hashmap as well?  Or
> is this an argument for the 'sorted' flag + BUG approach you
> already mentioned?

I don't see any point in generating a sorted list and _then_ making an
auxiliary hashmap. My idea was that if you're using a sorted string-list
for lookup, then you can replace the whole thing with a hash (inserting
as you go, rather than sorting at the end).

I.e., imagine there are three use cases for string lists:

  1. Build a list via append(), sort at the end, then do a series of
     efficient queries.

  2. Build a list via insert(), which is always sorted. Possibly query
     while building, or after finished building.

  3. Build an unsorted list and iterate over it.

We know that (2) is potentially quadratic during the build-and-sort
step. It would be nice to turn it into (1), but it's not always possible
if we query it while still building. Turning these into a hashmap is an
easy fix with no real downsides.

Case (1) actually isn't a problem for run-time. We'd like to get rid of
it only because the string_list functions are a bit confusing in terms
of which ones expect us to be sorted or not (and if ever forget to sort
before querying, we'd see all kinds of subtle bugs).

Converting these cases to use a hashmap is one way we might get rid of
the confusing string-list functions. And it doesn't carry any big-O
downside, though it may be slower in practice (e.g., hashmaps tend to be
a bit malloc-heavy).

Case (3) is the one we'd like to preserve as the "right" use of
string-list, since it's hard to get wrong (after we remove the sorting
functions).

I think Stefan pointed out a "case 4" in the other part of the thread:
ones where we really care not just about fast lookup, but actual
iteration order.  These ones may need some special care, but I don't
think there are many of them.

So that's _one_ way to make the world better.

Another way is to try to make the functions harder to misuse. E.g.,
maybe putting "sorted" into the name of string_list_has_string(), so
it's on an equal footing with the unsorted variant. Or the sorted flag I
mentioned. Those can help the misuse problem, but they don't help with
the case (2) quadratic ones. It's probably less work, though.

I think I like the hashmap way, if the conversion isn't too painful. As
I said to Stefan in the other part of the thread, I'm mostly at the
"does this seem like a terrible idea" stage. I'd have to start
conversions to see how many of each case we have (and if there are ones
that don't fit into my taxonomy above).

-Peff

  reply	other threads:[~2018-09-07  3:24 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-06 19:12 ordered string-list considered harmful, was Re: [PATCH v3] Allow aliases that include other aliases Jeff King
2018-09-06 19:20 ` Jeff King
2018-09-06 23:50   ` Jonathan Nieder
2018-09-07  3:24     ` Jeff King [this message]
2018-09-07  6:32       ` Jonathan Nieder
2018-09-07  7:20         ` Ævar Arnfjörð Bjarmason
2018-09-07  7:23           ` Jonathan Nieder
2018-09-08 16:49             ` brian m. carlson
2018-09-07 14:48         ` Jeff King
2018-09-06 20:04 ` Stefan Beller
2018-09-06 20:49   ` Jeff King
2018-09-06 20:54     ` Stefan Beller
2018-09-07  3:12       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180907032401.GB31728@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=timschumi@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).