All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: ZheNing Hu <adlternative@gmail.com>
Cc: Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom verma <hariom18599@gmail.com>,
	Shourya Shukla <periperidip@gmail.com>,
	olyatelezhnaya@gmail.com
Subject: Re: GSoC Git Proposal Draft - ZheNing Hu
Date: Fri, 2 Apr 2021 11:39:53 -0400	[thread overview]
Message-ID: <YGc6ybE1wD1ck0uB@coredump.intra.peff.net> (raw)
In-Reply-To: <CAOLTT8RfE4nn5NnjZh7xuF09-5=+K+_j_2kP0327HVdR4x_wAQ@mail.gmail.com>

On Fri, Apr 02, 2021 at 05:03:17PM +0800, ZheNing Hu wrote:

> * Because part of the feature of `git for-each-ref` is very similar to
> that of `git cat-file`, I think `git cat-file` can learn some feasible
> solutions from it.
> 
> #### My possible solutions:
> 
> 1. Same [solution](https://github.com/git/git/pull/568/commits/cc40c464e813fc7a6bd93a01661646114d694d76)
> as Olga, add member `struct ref_format format` in `struct
> batch_options`.
> 2. Use the function
> [`verify_ref_format()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L904)
> to replace the first `expand_format()` for parsing format strings.
> 3. Write a function like
> [`format_ref_array_item()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L2392),
> get information about objects, and use `get_object()` to grub the
> information which we prefer (or just use `grab_common_value()`).
> 4. The migration of `%(rest)` may require learning the handling of
> `%(if)` ,`%(else)`.

I think one thing to keep an eye on here is the performance of cat-file.
The formatting code used by for-each-ref is rather slow (it may load
more of the object details than is necessary, it is too eager to
allocate intermediate strings, and so on). That's usually not _too_ big
a problem for ref-filter, because the number of refs tends to be much
smaller than the number of total objects. But I'd expect that moving to
the ref-filter code would make something like:

 git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)'

measurably slower.

IMHO the right path forward is not to try porting cat-file to use the
ref-filter code, but to start first with writing a universal formatting
module that takes the best of both implementations (and the commit
pretty-printer):

  - separate the format-parsing stage from formatting actual items, as
    ref-filter does. This lets us have more complex formats without
    paying a per-item runtime cost while formatting. This should also
    allow us to handle multiple syntaxes for the same thing (e.g.,
    ref-filter %(authorname) vs pretty.c %an).

  - figure out which data will be needed for each item based on the
    parsed format, and then do the minimum amount of work to get that
    data (using "oid_object_info_extended()" helps here, because it
    likewise tries to do as little work as possible to satisfy the
    request, but there are many elements that it doesn't know about)

  - likewise avoid doing any intermediate work we can; as much as
    possible, format the result directly into a result strbuf, rather
    than allocating many sub-strings and assembling them (as cat-file
    does).

  - handle formats where the necessary item data may or may not be
    present. E.g., if we're given a refname, then "%(refname)" makes
    sense. But in cat-file we'd not have a refname, and just an object.
    We should still be able to use the same formatting code to handle
    "%(objecttype)", etc. Likewise for formats which require a specific
    type (say %(authorname) for a commit, but the object is a blob).
    Ref-filter does this to some degree for things like authorname, but
    we'd be extending it to the case that we don't even have a refname.

-Peff

  parent reply	other threads:[~2021-04-02 15:39 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02  9:03 GSoC Git Proposal Draft - ZheNing Hu ZheNing Hu
2021-04-02 14:57 ` Christian Couder
2021-04-03 13:23   ` ZheNing Hu
2021-04-02 15:39 ` Jeff King [this message]
2021-04-03 14:27   ` ZheNing Hu
2021-04-07 19:28     ` Jeff King
2021-04-08 13:29       ` ZheNing Hu
2021-04-11  6:11 ` ZheNing Hu
2021-04-11 15:34   ` ZheNing Hu
2021-04-13  6:40     ` Jeff King
2021-04-13 14:51       ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YGc6ybE1wD1ck0uB@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=periperidip@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.