All of lore.kernel.org
 help / color / mirror / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom verma <hariom18599@gmail.com>,
	Shourya Shukla <periperidip@gmail.com>,
	olyatelezhnaya@gmail.com
Subject: Re: GSoC Git Proposal Draft - ZheNing Hu
Date: Sat, 3 Apr 2021 22:27:39 +0800	[thread overview]
Message-ID: <CAOLTT8R_kmdNhJaMjj60H0SEzs6-KrzTQhBHzShQ82aoDa5vzw@mail.gmail.com> (raw)
In-Reply-To: <YGc6ybE1wD1ck0uB@coredump.intra.peff.net>

Hi, Peff,

Jeff King <peff@peff.net> 于2021年4月2日周五 下午11:39写道:
>
> On Fri, Apr 02, 2021 at 05:03:17PM +0800, ZheNing Hu wrote:
>
> > * Because part of the feature of `git for-each-ref` is very similar to
> > that of `git cat-file`, I think `git cat-file` can learn some feasible
> > solutions from it.
> >
> > #### My possible solutions:
> >
> > 1. Same [solution](https://github.com/git/git/pull/568/commits/cc40c464e813fc7a6bd93a01661646114d694d76)
> > as Olga, add member `struct ref_format format` in `struct
> > batch_options`.
> > 2. Use the function
> > [`verify_ref_format()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L904)
> > to replace the first `expand_format()` for parsing format strings.
> > 3. Write a function like
> > [`format_ref_array_item()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L2392),
> > get information about objects, and use `get_object()` to grub the
> > information which we prefer (or just use `grab_common_value()`).
> > 4. The migration of `%(rest)` may require learning the handling of
> > `%(if)` ,`%(else)`.
>
> I think one thing to keep an eye on here is the performance of cat-file.
> The formatting code used by for-each-ref is rather slow (it may load
> more of the object details than is necessary, it is too eager to
> allocate intermediate strings, and so on). That's usually not _too_ big
> a problem for ref-filter, because the number of refs tends to be much
> smaller than the number of total objects. But I'd expect that moving to
> the ref-filter code would make something like:
>
>  git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)'
>
> measurably slower.
>

Forgive me for thinking about the whole question too simple. It seems that
there are a lot of points to think about in this project.

> IMHO the right path forward is not to try porting cat-file to use the
> ref-filter code, but to start first with writing a universal formatting
> module that takes the best of both implementations (and the commit
> pretty-printer):
>
>   - separate the format-parsing stage from formatting actual items, as
>     ref-filter does. This lets us have more complex formats without
>     paying a per-item runtime cost while formatting. This should also
>     allow us to handle multiple syntaxes for the same thing (e.g.,
>     ref-filter %(authorname) vs pretty.c %an).
>

This is a good suggestion.
Olga seems to have wanted to remove `mark_query` in `struct expand_data`,
I think she also wanted to separate the two parts.

The ref-filter uses `used_atom` as the result of parsing `%(atom)`, It’s
really worth learning.

>   - figure out which data will be needed for each item based on the
>     parsed format, and then do the minimum amount of work to get that
>     data (using "oid_object_info_extended()" helps here, because it
>     likewise tries to do as little work as possible to satisfy the
>     request, but there are many elements that it doesn't know about)
>

I have indeed noticed that `oid_object_info_extended()`
can get information about the object which we actually want.
In `cat-file.c`, It has been used in `batch_object_write()`, and
`expanding_atom()` specify what data we need.
In `ref-filter.c`, It has been used in `get_object()`.
I am not sure what you mean about "many elements that it
doesn't know about", For the time being, `cat-file` can get 5
kind of objects info it need.

Maybe you think that `cat-file` can learn some features in
`ref-filter` to extend the function of `cat-file --batch`?
E.g. %(objectname:short)? I think I may have a better
understanding of the topic of this mini-project now.
We may not want to port the logic of cat-file,but to learn some
design in `ref-filter`, right?

>   - likewise avoid doing any intermediate work we can; as much as
>     possible, format the result directly into a result strbuf, rather
>     than allocating many sub-strings and assembling them (as cat-file
>     does).
>

I guess you mean `scratch` in `batch_object_write()`
every time new content is added after `strbuf_reset`,
but refilter just append messages to `final_buf`.

>   - handle formats where the necessary item data may or may not be
>     present. E.g., if we're given a refname, then "%(refname)" makes
>     sense. But in cat-file we'd not have a refname, and just an object.
>     We should still be able to use the same formatting code to handle
>     "%(objecttype)", etc. Likewise for formats which require a specific
>     type (say %(authorname) for a commit, but the object is a blob).
>     Ref-filter does this to some degree for things like authorname, but
>     we'd be extending it to the case that we don't even have a refname.
>

I may not have a very deep understanding of some details.
On this issue, I think we can use `info_source` to invalidate interfaces that
are not of the same type(only allow SOUCR_OTHER)

Let me summarize:
First part : Parsing any type of atoms, whether it is %an or %(authorname).
Second part : Find all functions that get objects information (which
`oid_object_info_extended()` can't get)
Third part : Optimize multiple small strings in cat-file into one `finnal_buf`.
Forth part : Error handling for unsuitable atoms.

Que:
These task sound like the logic of `ref-filter`, so if I can
participate in this project
later, should I start with optimizing the logic of `ref-filter`, right?

> -Peff

Thank you so much!

--
ZheNing Hu

  reply	other threads:[~2021-04-03 14:27 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02  9:03 GSoC Git Proposal Draft - ZheNing Hu ZheNing Hu
2021-04-02 14:57 ` Christian Couder
2021-04-03 13:23   ` ZheNing Hu
2021-04-02 15:39 ` Jeff King
2021-04-03 14:27   ` ZheNing Hu [this message]
2021-04-07 19:28     ` Jeff King
2021-04-08 13:29       ` ZheNing Hu
2021-04-11  6:11 ` ZheNing Hu
2021-04-11 15:34   ` ZheNing Hu
2021-04-13  6:40     ` Jeff King
2021-04-13 14:51       ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOLTT8R_kmdNhJaMjj60H0SEzs6-KrzTQhBHzShQ82aoDa5vzw@mail.gmail.com \
    --to=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=peff@peff.net \
    --cc=periperidip@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.