git.vger.kernel.org archive mirror
From: "Raymond E. Pasco" <ray@ameretat.dev>
To: ray@ameretat.dev
Cc: avarab@gmail.com, demerphq@gmail.com, eschwartz93@gmail.com,
	git@vger.kernel.org, gitster@pobox.com,
	konstantin@linuxfoundation.org, l.s.r@web.de, msuchanek@suse.de,
	phillip.wood@dunelm.org.uk, sandals@crustytoothpaste.net,
	tytso@mit.edu
Subject: [PATCH] archive: document output stability concerns
Date: Fri,  3 Feb 2023 03:06:29 -0500
Message-ID: <20230203080629.31492-1-ray@ameretat.dev>
In-Reply-To: <de8f1e338e6ee99cd3ee06b16f1edbce@ameretat.dev>

In 4f4be00d302 (archive-tar: use internal gzip by default), the 'git
archive' command switched to using an internal compression filter
implemented with zlib rather than invoking a 'gzip' binary, for the
'.tar.gz' / '.tgz' output formats.

This change brought to light a common misconception that the output of
'git archive' is intended to be byte-for-byte stable. While this is not
the case, stable archive output is desirable for many applications; we
discuss concerns related to output stability and suggest ways in which
the user can control the compression used with the
"tar.<format>.command" configuration option.

Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
---
I think that something along these lines should be included in the
docs, but that the behavior should be kept the same. If it is later
decided to stabilize the output, e.g. by vendoring a blessed zlib
version forever, the current state as of 2.38 is the best starting
point. Reverting a useful change because of external breakage that
already has a solution, while at the same time documenting that the
output is not stable, seems like a poor choice.
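
For illustration, here is a rough sketch of the workaround that the new
section describes, assuming a 'gzip' binary is available on PATH. The
scratch repository, file names, and committer identity are made up for
the example; "gzip -cn" is only one possible filter command, and it
gives consistent output only for as long as the same gzip binary is
used.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository and output directory for the demo.
repo=$(mktemp -d)
out=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello >file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
	commit -q -m initial

# Replace the internal zlib-based filter with an external gzip for
# both compressed tar formats.
git config tar.tar.gz.command "gzip -cn"
git config tar.tgz.command "gzip -cn"

# Two invocations on the same commit, compressed by the same binary.
git archive --format=tar.gz -o "$out/a.tar.gz" HEAD
git archive --format=tar.gz -o "$out/b.tar.gz" HEAD
cmp "$out/a.tar.gz" "$out/b.tar.gz" && echo "archives are identical"

# The uncompressed tar stream carries the commit ID in its pax header.
embedded=$(git archive --format=tar HEAD | git get-tar-commit-id)
echo "embedded commit ID: $embedded"
```

The comparison only shows that the two runs agree on one machine; it
says nothing about other gzip versions or other Git versions.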

 Documentation/git-archive.txt | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 60c040988b..77acdacdf8 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -178,6 +178,41 @@ appropriate export-ignore in its `.gitattributes`), adjust the checked out
 option.  Alternatively you can keep necessary attributes that should apply
 while archiving any tree in your `$GIT_DIR/info/attributes` file.
 
+[[STABILITY]]
+STABILITY
+---------
+
+'git archive' does not guarantee that precisely identical archive files
+will be produced for invocations on the same commit or tree.
+
+'git archive' uses an internal implementation of `tar` archiving
+for the `tar` format, which includes the commit ID in an extended
+pax header.  For the `tgz` and `tar.gz` formats, it is augmented with
+a compression filter applied to the output, which is implemented by
+'git archive' by linking to the system zlib.
+
+If the commit ID of the "same" commit is different, for instance in the
+case of an object format migration from SHA-1 to SHA-256, the `tar`
+archive will necessarily differ due to including a different ID.
+
+The output of the compression filter is less deterministic than
+the output of the `tar` implementation, because the versions
+of zlib used may differ. The internal compression filter can be
+replaced with a particular command specified by the user using the
+`tar.<format>.command` configuration option; for instance, a particular
+gzip binary provided by the user could be specified here for consistent
+output.
+
+The `tar` format used by 'git archive' is unlikely to change
+frequently, but is not guaranteed to be completely stable; its output
+will remain identical at least within the same Git version.
+
+The `zip` format has similar concerns to the `tar.gz` and `tgz`
+formats; ZIP archiving is implemented internally, but the Deflate
+compression used relies on the linked zlib. However, because archiving
+and compression are combined into a single operation, there is no
+user-specifiable filter command for the `zip` format.
+
 EXAMPLES
 --------
 `git archive --format=tar --prefix=junk/ HEAD | (cd /var/tmp/ && tar xf -)`::
-- 
2.39.1.561.g98d13ac3e7

