From: "Matt Cooper via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Carlo Arenas" <carenas@gmail.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Johannes Schindelin" <johannes.schindelin@gmail.com>,
	"Philip Oakley" <philipoakley@iee.email>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Matt Cooper" <vtbassmatt@gmail.com>
Subject: [PATCH v4 8/8] clean/smudge: allow clean filters to process extremely large files
Date: Tue, 02 Nov 2021 15:46:11 +0000
Message-ID: <41fda423982d99847d3879f5ea1eb3570ae9eab6.1635867971.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1068.v4.git.1635867971.gitgitgadget@gmail.com>

From: Matt Cooper <vtbassmatt@gmail.com>

The filter system allows for alterations to file contents when they're
moved between the database and the worktree. We already made sure that
it is possible for smudge filters to produce contents that are larger
than `unsigned long` can represent (which matters on systems where
`unsigned long` is narrower than `size_t`, most notably 64-bit Windows).
Now we make sure that clean filters can _consume_ contents that are
larger than that.

Note that this commit only allows clean filters' _input_ to be larger
than can be represented by `unsigned long`.

This change makes only a small dent in the much larger project
to teach Git to use `size_t` instead of `unsigned long` wherever
appropriate.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 convert.c                   |  2 +-
 t/t1051-large-conversion.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index fd9c84b0257..5ad6dfc08a0 100644
--- a/convert.c
+++ b/convert.c
@@ -613,7 +613,7 @@ static int crlf_to_worktree(const char *src, size_t len, struct strbuf *buf,
 
 struct filter_params {
 	const char *src;
-	unsigned long size;
+	size_t size;
 	int fd;
 	const char *cmd;
 	const char *path;
diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index e6d52f98b15..042b0e44292 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -98,4 +98,15 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
 	test "$size" -eq $((5 * 1024 * 1024 * 1024 + $small_size))
 '
 
+# This clean filter writes down the size of input it receives. By checking against
+# the actual size, we ensure that cleaning doesn't mangle large files on 64-bit Windows.
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB convert on input' '
+	test-tool genzeros $((5*1024*1024*1024)) >big &&
+	test_config filter.checklarge.clean "wc -c >big.size" &&
+	echo "big filter=checklarge" >.gitattributes &&
+	git add big &&
+	test $(test_file_size big) -eq $(cat big.size)
+'
+
 test_done
-- 
gitgitgadget

