All of lore.kernel.org
 help / color / mirror / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Hariom verma" <hariom18599@gmail.com>,
	"Shourya Shukla" <periperidip@gmail.com>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Jiang Xin" <worldhello.net@gmail.com>,
	"Git List" <git@vger.kernel.org>
Subject: Re: GSoC Git Proposal Draft - ZheNing Hu
Date: Tue, 13 Apr 2021 22:51:30 +0800	[thread overview]
Message-ID: <CAOLTT8Q2eV-0TemhiCL21fRWT=P6E8=Qajs1EjekTNN7XEmwpw@mail.gmail.com> (raw)
In-Reply-To: <YHU86dPKODAPDXY0@coredump.intra.peff.net>

Jeff King <peff@peff.net> 于2021年4月13日周二 下午2:40写道:
>
> On Sun, Apr 11, 2021 at 11:34:35PM +0800, ZheNing Hu wrote:
>
> > > Why is Olga’s solution rejected?
> > > 1. Olga's solution is to let `git cat-file` use the `ref-filter` interface,
> > > the performance of `cat-file` appears to be degraded due "very eager to
> > > allocate lots of separate strings" in `ref-filter` and other reasons.
> >
> > I am thinking today whether we can directly append some object information
> > directly to `&state->stack->output`, Instead of assigning to `v->s` firstly.
>
> Yes, that's the direction I think we'd want to go.
>
> > But in `cmp_ref_sorting()` we will use `get_ref_atom_value()`, It is possible
> > to compare `v->s` of two different refs, I must goto fill object info in `v->s`.
> >
> > So I think this is one of the reasons why `ref-filter` desires to
> > allocate a large
> > number of strings, right?
>
> Yeah, I think sorting in general is a bit tricky, because it inherently
> requires collecting the value for each item. Just thinking about what
> properties an ideal solution would have (which we might not be able to
> get all of):
>
>   - if we're sorting by something numeric (e.g., an committer
>     timestamp), we should avoid forming it into a string at all
>
>   - if the sort item requires work to extract that overlaps with the
>     output format (e.g., sorting by authordate and showing author name
>     in the format, both of which require parsing the author ident line
>     of a commit), ideally we'd just do that work once per ref/object.
>

Yes i can understand.

>   - if we are sorting, obviously we have to hold some amount of data for
>     each item in memory all at once (since we have to get data on the
>     sort properties for each, and then sort the result). So we'd
>     probably need at least some allocation per ref anyway, and an extra
>     string isn't too bad. But if we're not sorting, then it would be
>     nice to consider one ref/object at a time, which lets us keep our
>     peak memory usage lower, reuse output buffers, etc.
>

Yes, storing these strings in memory is beneficial for sorting.

> I think some of those are in competition with each other. Minimizing
> work shared between the sorting and format steps means keeping more data
> in memory. So it might be sensible to just treat them totally
> independently, and not worry about sharing work (I haven't looked at how
> ref-filter does this now).  TBH, I care a lot less about making the
> "sorting" case fast than I do about making sure that if we _aren't_
> sorting, we go as fast as possible.
>

Okay, so we just focus on the "nosort" case.
I am thinking about finding those cases that git do not need sort and we can
make a flag like "nosort = 1", and then use this "nosort" flag in
`ref-filter` to do the
string copy optimization what we want.
But the problem now is that `git for-each-ref` itself does not support
`--un-sort`,
and it have a default sort order by `refname`. I suspect that there are no
unsorted situation here for us to improve (Any other command call
`ref_array_sort()`
will also have similar situation, and it seem cause a little memory leak, the
ref_sorting entries in sorting_tail aren't free, right?)

> -Peff

--
ZheNing Hu

      reply	other threads:[~2021-04-13 14:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02  9:03 GSoC Git Proposal Draft - ZheNing Hu ZheNing Hu
2021-04-02 14:57 ` Christian Couder
2021-04-03 13:23   ` ZheNing Hu
2021-04-02 15:39 ` Jeff King
2021-04-03 14:27   ` ZheNing Hu
2021-04-07 19:28     ` Jeff King
2021-04-08 13:29       ` ZheNing Hu
2021-04-11  6:11 ` ZheNing Hu
2021-04-11 15:34   ` ZheNing Hu
2021-04-13  6:40     ` Jeff King
2021-04-13 14:51       ` ZheNing Hu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOLTT8Q2eV-0TemhiCL21fRWT=P6E8=Qajs1EjekTNN7XEmwpw@mail.gmail.com' \
    --to=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=peff@peff.net \
    --cc=periperidip@gmail.com \
    --cc=sunshine@sunshineco.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.