From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom Verma <hariom18599@gmail.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/8] [GSOC] ref-filter: add %(raw) atom
Date: Thu, 17 Jun 2021 16:37:34 +0200
Message-ID: <87r1h0wnwg.fsf@evledraar.gmail.com>
In-Reply-To: <CAOLTT8TMOs-FF+EcTZBbxfGnKQipe_nx_eZon=S=PWRTNT4CjA@mail.gmail.com>


On Thu, Jun 17 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Thu, Jun 17, 2021 at 3:16 PM:
>>
>> > The raw data of blob, tree objects may contain '\0', but most of
>> > the logic in `ref-filter` depends on the output of the atom being
>> > text (specifically, no embedded NULs in it).
>> >
>> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
>> > add the data to the buffer. The raw data of a tree object is
>> > `100644 one\0...`, only the `100644 one` will be added to the buffer,
>> > which is incorrect.
>> >
>> > Therefore, add a new member in `struct atom_value`: `s_size`, which
>> > can record raw object size, it can help us add raw object data to
>> > the buffer or compare two buffers which contain raw object data.
>>
>> Most of the functions that deal with this already use a strbuf in some
>> way, before we had a const char *, now there's a size_t to go along with
>> it, why not simply use a strbuf in the struct for the data? You'll then
>> get the size and \0 handling for free, and any functions to deal with
>> conversion can stick to the strbuf API, there seems to be a lot of back
>> and forth now.
>>
>
> Yes, strbuf is a suitable choice when using <str,len> pair.
> But if replace v->s with strbuf, the possible changes will be larger.

I for one would like to see it done that way; those changes are usually
easy to read. Also, it seems a large part of 2/8 is extra new code that
exists only because we didn't do that, e.g. getting the length differently
depending on whether something is a strbuf or not, passing char*/size_t
pairs to new functions, etc.

>> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
>> > `--tcl`, `--perl` because if our binary raw data is passed to a variable
>> > in the host language, the host language may not support arbitrary binary
>> > data in the variables of its string type.
>>
>> Perl at least deals with that just fine, and to the extent that it
>> doesn't any new problems here would have nothing to do with \0 being in
>> the data. Perl doesn't have a notion of "binary has \0 in it", it always
>> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
>> problems wouldn't be new. I'd think that the same would be true of
>> Python, but I'm not sure.
>>
>
> Not python safe. See [1].
> Regarding the perl language, I support Junio's point of view: it can be
> re-supported in the future.

Ah, I'd missed that. Anyway, it seems you discovered that Perl deals
with it correctly, so if it's easy we could just have it support this.
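
For the --shell case, the concern about NULs in host-language strings can
be seen directly. The following is only a rough sketch, not anything from
the patch: it pushes the five bytes "a\0b\0c" (the same content as blob1 in
the tests) through a shell variable and counts what survives. The variable
names are made up for illustration, and the exact number of surviving bytes
differs between shells.

```shell
# Five bytes go in: 'a', NUL, 'b', NUL, 'c'.
direct=$(printf 'a\0b\0c' | wc -c | tr -d ' ')

# Route the same bytes through a shell variable. Shells such as bash and
# dash cannot store NUL in a variable, so bytes are lost along the way.
# The braces silence the warning that some bash versions print here.
{ value=$(printf 'a\0b\0c'); } 2>/dev/null
via_var=$(printf '%s' "$value" | wc -c | tr -d ' ')

echo "piped directly:     $direct bytes"
echo "through a variable: $via_var bytes"
```

A pipe carries the NULs fine; it is only the string variable that loses
them, which is the situation the quoted commit message describes.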

>>
>> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
>> > +     git cat-file commit refs/tags/testtag^{} >expected &&
>> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
>> > +     sanitize_pgp <expected >expected.clean &&
>> > +     sanitize_pgp <actual >actual.clean &&
>> > +     echo "" >>expected.clean &&
>>
>> Just "echo" will do, ditto for the rest. Also odd to go back and forth
>> between populating expected.clean & actual.clean.
>>
>
> Are you saying that sanitize_pgp is not needed?

No, just that instead of:

    echo "" >x

You can do:

    echo >x

And also that going back and forth between populating different files is
confusing, i.e. this:


    echo a >x
    echo c >y
    echo b >>x

is better as:

    echo a >x
    echo b >>x
    echo c >y


>>
>> > +test_expect_success 'set up refs pointing to binary blob' '
>> > +     printf "a\0b\0c" >blob1 &&
>> > +     printf "a\0c\0b" >blob2 &&
>> > +     printf "\0a\0b\0c" >blob3 &&
>> > +     printf "abc" >blob4 &&
>> > +     printf "\0 \0 \0 " >blob5 &&
>> > +     printf "\0 \0a\0 " >blob6 &&
>> > +     printf "  " >blob7 &&
>> > +     >blob8 &&
>> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>>
>> Hrm, xargs just to avoid:
>>
>>     git update-ref ... $(git hash-object) ?
>>
>
> I didn’t think about it, just for convenience.

*nod*, Junio had a good suggestion.
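
For reference, one possible xargs-free shape is sketched below. It is not
necessarily what Junio proposed; the loop, the scratch repository and the
reduced set of blobs are assumptions made for illustration. Capturing the
object ID in a variable first keeps the exit status of git hash-object
visible to the && chain.

```shell
# Scratch repository so the sketch does not touch any real refs.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

printf 'a\0b\0c' >blob1
printf 'abc' >blob4

for name in blob1 blob4
do
	# An assignment from $(...) returns the command's exit status,
	# so a failing hash-object stops the chain before update-ref runs.
	oid=$(git hash-object -w "$name") &&
	git update-ref "refs/myblobs/$name" "$oid" || exit 1
done

# List the new refs together with the type of object they point to.
git for-each-ref --format='%(refname) %(objecttype)' refs/myblobs/
```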

>> > +test_expect_success '%(raw) with --python must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --python
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --tcl must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --tcl
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --perl must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --perl
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --shell must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --shell
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
>> > +'
>>
>> s/must failed/must fail/, but see question above about encoding in these
>> languages...
>
>
> [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/
>
> Thanks for a review.

