From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: rsbecker@nexbridge.com
Cc: "'brian m. carlson'" <sandals@crustytoothpaste.net>,
'Junio C Hamano' <gitster@pobox.com>,
'Konstantin Ryabitsev' <konstantin@linuxfoundation.org>,
'Eli Schwartz' <eschwartz93@gmail.com>,
'Git List' <git@vger.kernel.org>
Subject: Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
Date: Fri, 03 Feb 2023 14:18:58 +0100
Message-ID: <230203.86sffmc1tz.gmgdl@evledraar.gmail.com>
In-Reply-To: <01a901d93760$c690d970$53b28c50$@nexbridge.com>

On Thu, Feb 02 2023, rsbecker@nexbridge.com wrote:
> On February 2, 2023 6:02 PM, brian m. carlson wrote:
>>On 2023-02-01 at 23:37:19, Junio C Hamano wrote:
>>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>>
>>> > I don't think a blurb is necessary, but you're basically
>>> > underscoring the problem, which is that nobody is willing to promise
>>> > that compression is consistent, but yet people want to rely on that
>>> > fact. I'm willing to write and implement a consistent tar spec and
>>> > to guarantee compatibility with that, but the tension here is that
>>> > people also want gzip to never change its byte format ever, which
>>> > frankly seems unrealistic without explicit guarantees. Maybe the
>>> > authors will agree to promise that, but it seems unlikely.
>>>
>>> Just to step back a bit, where does the distinction between
>>> guaranteeing the tar format stability and gzip compressed bitstream
>>> stability come from? At both levels, the same thing can be expressed
>>> in multiple different ways, I think, but spelling out how exactly the
>>> compressor compresses is more involved than spelling out how entries
>>> in a tar archive are ordered and each entry is expressed, or something?
>>
>>Yes, at least with my understanding about how gzip and compression in general
>>work.
>>
>>The tar format (and the pax format which builds on it) can mostly be restricted by
>>explaining what data is to be included in the pax and tar headers and how it is to be
>>formatted. If we say, we will always write such and such information in the pax
>>header and sort the keys, and we write such and such information in the tar header,
>>then the format is completely deterministic, and we can make nice guarantees.
>>
>>My understanding about how Lempel-Ziv-based compression algorithms work is that
>>there's a lot more freedom to decide how best to compress things and that there
>>isn't always a logical obvious choice, but I will admit my understanding is relatively
>>limited. If someone thinks we can effectively succeed in supporting compression
>>more than just relying on gzip, I would be delighted to be shown to be wrong.
>
> The nice part about gzip is that it is generally available on
> virtually all platforms (or can be easily obtained). Other compression
> forms, like bz2, which sometimes produces more dense compression, are
> not necessarily available. Availability is something I would be
> worried about...

I agree with all of that; gzip is in such wide use for a reason.

>... (clone and checkout failures).

But how would a hypothetical obscure format for "git archive" contribute
to clone or checkout failures? Are you thinking of our use of zlib for
e.g. loose objects? That's unrelated to this discussion (and I don't
think anyone relies on the checksum of their compressed form).

> Tar formats are also to be used carefully. Not all platform
> implementations of tar support all variants. "ustar" is fairly common
> but there are others that are not. Interoperability needs to be the
> biggest factor in this decision, IMHO, rather than compression rates.

For "git archive", whether you care about interoperability depends on
the target audience of your archive. In any case I don't see why we
need to worry about it, except perhaps to note that some formats are
more portable than others if we e.g. had a built-in "tar.bz2" helper
method.

> The alternative is having git supply its own implementation, but that
> is a longer term migration problem, resembling the SHA-256 migration.

I've noted elsewhere in this thread that I don't see the point of
shipping a fallback "gzip" beyond the "git archive gzip" we already
have, but even if we did, that seems like a pretty simple job, and
*much* easier than the SHA-256 migration.
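To make brian's point about the tar/pax layer concrete, here is a
minimal Python sketch of serialising a pax extended header
deterministically. It is not git's archive-tar.c code, and the helper
names pax_record() and pax_header_block() are hypothetical. Each pax
record is "<length> <key>=<value>\n", where <length> counts the whole
record including its own digits, and emitting the keys in sorted order
means equal attribute sets always produce identical bytes. It only
covers the extended-header payload, not the surrounding 512-byte ustar
header blocks.

```python
def pax_record(key: str, value: str) -> bytes:
    """Encode one pax extended-header record: b"<len> <key>=<value>\\n".

    <len> is the decimal byte length of the entire record, including the
    length digits themselves, so iterate until the value stops changing.
    """
    payload = f" {key}={value}\n".encode("utf-8")
    length = len(payload)
    while True:
        candidate = len(str(length)) + len(payload)
        if candidate == length:
            break
        length = candidate
    return str(length).encode("ascii") + payload


def pax_header_block(attrs: dict) -> bytes:
    """Serialise pax attributes with sorted keys (hypothetical helper).

    Sorting removes any dependence on dict insertion order, so two equal
    attribute sets always serialise to byte-identical output.
    """
    return b"".join(pax_record(k, attrs[k]) for k in sorted(attrs))


if __name__ == "__main__":
    a = pax_header_block({"path": "src/main.c", "comment": "example"})
    b = pax_header_block({"comment": "example", "path": "src/main.c"})
    print(a == b)
```

Nothing this short pins down the gzip layer: as brian notes, a deflate
compressor has a lot of freedom in how it encodes the same input, so the
compressed bytes depend on the implementation rather than on a spec like
the one above.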