git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: phillip.wood@dunelm.org.uk
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Eli Schwartz" <eschwartz93@gmail.com>,
	"René Scharfe" <l.s.r@web.de>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	"Konstantin Ryabitsev" <konstantin@linuxfoundation.org>,
	"Michal Suchánek" <msuchanek@suse.de>,
	"Raymond E . Pasco" <ray@ameretat.dev>,
	demerphq <demerphq@gmail.com>, "Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [PATCH 0/9] git archive: use gzip again by default, document output stabilty
Date: Fri, 03 Feb 2023 14:49:37 +0100	[thread overview]
Message-ID: <230203.86fsbmbzwp.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <771a98ca-9540-ad4e-dfba-9d304e1dff09@dunelm.org.uk>


On Thu, Feb 02 2023, Phillip Wood wrote:

> On 02/02/2023 09:32, Ævar Arnfjörð Bjarmason wrote:
>> As reported in
>> https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/
>> changing the default "tgz" output method of from "gzip(1)" to our
>> internal "git archive gzip" (using zlib ) broke things for users in
>> the wild that assume that the "git archive" output is stable, most
>> notably GitHub: https://github.com/orgs/community/discussions/45830
>>
>> Leaving aside the larger question of whether we're going to promise
>> output stability for "git archive" in general, the motivation for that
>> change was to have a working compression method on systems that lacked
>> a gzip(1).
>
> As I recall the reduction in cpu time used to create a compressed
> archive was a factor in making it the default.

I read those references in 76d7602631a (archive-tar: add internal gzip
implementation, 2022-06-15) more of a "it's not [much] slower", the flip
to the default in 4f4be00d302 (archive-tar: use internal gzip by
default, 2022-06-15) didn't discuss it.

So I didn't think it was important enough to mention (even though we're
now back to the faster "gzip" method).

>> As the disruption of changing the default isn't worth it, let's use
>> gzip(1) again by default, and only fall back on the new "git archive
>> gzip" if it isn't available.
>
> Playing devil's advocate for a moment as we're not going to promise
> that the compressed output of "git archive" will be stable in the
> future perhaps we should use this breakage as an opportunity to
> highlight that to users and to advertize the config setting that
> allows them to use gzip for compressing archives.

If we were trying to intentionally break things for those users we could
do a lot better than "git archive gzip", whose output is mostly the same
as "gzip", we could tweak one of the headers to make it different all
the time.

But I think it's better to advocate for such intentional chaos-monkeying
as a follow-up to this more conservative "oops, we broke stuff, it's
easy not to break it, so let's not do it'.

> Reverting the change gives the misleading impression that we're making
> a commitment to keeping the output stable.

I don't see how you can conclude that from this series. It explicitly
states that we make no such promises, what it does is go back to
allowing the gzip(1) command to make its own promises.

> The focus of this thread seems to be the
> problems relating to github which they have already addressed.

Which they've addressed by reverting the change, but while they're a
major user of git they're not the only one. They just happened to use
"git archive".

I think it would be a mistake to conclude that everyone who's run into
this has already done so, or is aware of it.

> I think there is general agreement that it is not practical to promise
> that the compressed output of "git archive" is stable so maybe it is
> better[...]

...better than what? This seems to imply that this series is making new
promises about the output stability, which it isn't doing.

> [...]to make that clear now while users can work around it in the
> short term with a config setting rather than waiting until we're faced
> with some security or other issue that forces a change to the output
> which users cannot work around so easily.

I think it's always been clear that you can use that setting. For ages
we've been saying:

	The `tar.gz` and `tgz` formats are defined automatically and use the
	command `gzip -cn` by default.

Then v2.38.0 changed it to:

	[...]
        magic command `git archive gzip` by default

Which IMO was easily missed among other "Performance, Internal
Implementation, Development Support etc." items in the release notes,
which said:

   Teach "git archive" to (optionally and then by default) avoid
   spawning an external "gzip" process when creating ".tar.gz" (and
   ".tgz") archives.

But I agree that all of this is subjective. To me a 2% reduction in CPU
use (at the cost of ~20% increse in wallclock) & some unclear benefits
to teaching users that they can't rely on our "gzip" output seems
unclear or hypothetical.

Whereas the widespread breakage reported is very real, and we should
consider GitHub as a canary for that, not the the stand & end of its
potential impact.

As we didn't have a strong reason to change this in the first place (and
as my series shows, we can have our cake & eat it too if we don't have a
"gzip") I think the obvious choice is to go back to using "gzip".

  parent reply	other threads:[~2023-02-03 14:14 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31  0:06 Stability of git-archive, breaking (?) the Github universe, and a possible solution Eli Schwartz
2023-01-31  7:49 ` Ævar Arnfjörð Bjarmason
2023-01-31  9:11   ` Eli Schwartz
2023-02-02  9:32   ` [PATCH 0/9] git archive: use gzip again by default, document output stabilty Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 1/9] archive & tar config docs: de-duplicate configuration section Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 2/9] git config docs: document "tar.<format>.{command,remote}" Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 3/9] archiver API: make the "flags" in "struct archiver" an enum Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 4/9] archive: omit the shell for built-in "command" filters Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 5/9] archive-tar.c: move internal gzip implementation to a function Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 6/9] archive: use "gzip -cn" for stability, not "git archive gzip" Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 7/9] test-lib.sh: add a lazy GZIP prerequisite Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 8/9] archive tests: test for "gzip -cn" and "git archive gzip" stability Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 9/9] git archive docs: document output non-stability Ævar Arnfjörð Bjarmason
2023-02-02 10:25       ` brian m. carlson
2023-02-02 10:30         ` Ævar Arnfjörð Bjarmason
2023-02-02 16:34         ` Junio C Hamano
2023-02-04 17:46           ` brian m. carlson
2023-02-02 16:17     ` [PATCH 0/9] git archive: use gzip again by default, document output stabilty Phillip Wood
2023-02-02 16:40       ` Junio C Hamano
2023-02-03 13:49       ` Ævar Arnfjörð Bjarmason [this message]
2023-02-06 14:46         ` Phillip Wood
2023-02-03 15:47       ` Theodore Ts'o
2023-02-02 16:25     ` Junio C Hamano
2023-02-04 18:08       ` René Scharfe
2023-02-05 21:30         ` Ævar Arnfjörð Bjarmason
2023-02-12 17:41           ` René Scharfe
2023-02-02 19:23     ` Raymond E. Pasco
2023-02-03  8:06       ` [PATCH] archive: document output stability concerns Raymond E. Pasco
2023-01-31  9:54 ` Stability of git-archive, breaking (?) the Github universe, and a possible solution brian m. carlson
2023-01-31 11:31   ` Ævar Arnfjörð Bjarmason
2023-01-31 15:05   ` Konstantin Ryabitsev
2023-01-31 22:32     ` brian m. carlson
2023-02-01  9:40       ` Ævar Arnfjörð Bjarmason
2023-02-01 11:34         ` demerphq
2023-02-01 12:21           ` Michal Suchánek
2023-02-01 12:48             ` demerphq
2023-02-01 13:43               ` Ævar Arnfjörð Bjarmason
2023-02-01 15:21                 ` demerphq
2023-02-01 18:56                   ` Theodore Ts'o
2023-02-02 21:19                     ` Joey Hess
2023-02-03  4:02                       ` Theodore Ts'o
2023-02-03 13:32                         ` Ævar Arnfjörð Bjarmason
2023-02-01 23:16         ` brian m. carlson
2023-02-01 23:37           ` Junio C Hamano
2023-02-02 23:01             ` brian m. carlson
2023-02-02 23:47               ` rsbecker
2023-02-03 13:18                 ` Ævar Arnfjörð Bjarmason
2023-02-02  0:42           ` Ævar Arnfjörð Bjarmason
2023-02-01 12:17       ` Raymond E. Pasco
2023-01-31 15:56   ` Eli Schwartz
2023-01-31 16:20     ` Konstantin Ryabitsev
2023-01-31 16:34       ` Eli Schwartz
2023-01-31 20:34         ` Konstantin Ryabitsev
2023-01-31 20:45         ` Michal Suchánek
2023-02-01  1:33     ` brian m. carlson
2023-02-01 12:42   ` Ævar Arnfjörð Bjarmason
2023-02-01 23:18     ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=230203.86fsbmbzwp.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=demerphq@gmail.com \
    --cc=eschwartz93@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=konstantin@linuxfoundation.org \
    --cc=l.s.r@web.de \
    --cc=msuchanek@suse.de \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=ray@ameretat.dev \
    --cc=sandals@crustytoothpaste.net \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).