* [PATCH] t1050-large: replace dd by test-genrandom
@ 2015-01-13 17:36 Johannes Sixt
  2015-01-13 18:56 ` Junio C Hamano
  2015-01-13 21:47 ` Jeff King
  0 siblings, 2 replies; 13+ messages in thread
From: Johannes Sixt @ 2015-01-13 17:36 UTC (permalink / raw)
  To: Git Mailing List

For some unknown reason, the dd on my Windows box segfaults every now
and then, but recently it has been doing so much more often than it used
to, which makes running the test suite burdensome.

Get rid of four invocations of dd and use test-genrandom instead.

The new code does change some properties of the generated files:

 - They are a bit smaller.
 - They are not sparse anymore.
 - They do not compress well anymore.
 - The smaller of the four files is now a prefix of the larger.

Fortunately, the tests do not depend on these properties, which would
have a big influence on the size of the generated pack files. There *is*
a test in t1050 that checks the size of pack files generated from large
blobs, but it runs in its own repository with its own set of files (that
are already generated with test-genrandom!).

To emphasize that three of the large blobs are exact copies, use cp to
allocate them.

While we are here, replace cmp with test_cmp_bin to document the
binary-ness of the comparison, which was hinted at by a comment, but not
stated explicitly.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 I won't mind if the first paragraph of the proposed commit message is
 removed, but without the motivation, this commit reads as if it were
 merely code churn.

 The existing test-genrandom invocations look like this:

   test-genrandom "a" $(( 66 * 1024 )) >mid1

 I chose not to mimic this style because being precise with the file
 size is not important for the files generated here.
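
 For readers without git's test library at hand: the two comparison
 helpers this patch touches boil down to roughly the following
 (paraphrased sketch; the authoritative definitions live in git's
 t/test-lib-functions.sh and may differ in detail):

```shell
# Paraphrased sketch of git's test-lib comparison helpers (assumption:
# simplified from t/test-lib-functions.sh, not the verbatim definitions).
test_cmp () {
	${GIT_TEST_CMP:-diff -u} "$@"   # textual compare, readable diff on failure
}
test_cmp_bin () {
	cmp "$@"                        # byte-wise compare, safe for binary files
}

printf 'abc' >one
printf 'abc' >two
test_cmp_bin one two && echo "identical"   # prints "identical"
```

 In other words, cmp and test_cmp_bin behave the same; the helper merely
 documents that a binary comparison is deliberate.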

 t/t1050-large.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f5a9119..f653121 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -9,10 +9,10 @@ test_expect_success setup '
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
-	echo X | dd of=large1 bs=1k seek=2000 &&
-	echo X | dd of=large2 bs=1k seek=2000 &&
-	echo X | dd of=large3 bs=1k seek=2000 &&
-	echo Y | dd of=huge bs=1k seek=2500 &&
+	test-genrandom seed1 2000000 >large1 &&
+	cp large1 large2 &&
+	cp large1 large3 &&
+	test-genrandom seed2 2500000 >huge &&
 	GIT_ALLOC_LIMIT=1500k &&
 	export GIT_ALLOC_LIMIT
 '
@@ -61,7 +61,7 @@ test_expect_success 'checkout a large file' '
 	large1=$(git rev-parse :large1) &&
 	git update-index --add --cacheinfo 100644 $large1 another &&
 	git checkout another &&
-	cmp large1 another ;# this must not be test_cmp
+	test_cmp_bin large1 another
 '
 
 test_expect_success 'packsize limit' '
@@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose object' '
 	test_create_repo packed &&
 	mv pack-* packed/.git/objects/pack &&
 	GIT_DIR=packed/.git git cat-file blob $SHA1 >actual &&
-	cmp huge actual
+	test_cmp_bin huge actual
 '
 
 test_expect_success 'tar achiving' '
-- 
2.0.0.12.gbcf935e

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 17:36 [PATCH] t1050-large: replace dd by test-genrandom Johannes Sixt
@ 2015-01-13 18:56 ` Junio C Hamano
  2015-01-13 19:55   ` Johannes Sixt
  2015-01-13 21:47 ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2015-01-13 18:56 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List

Johannes Sixt <j6t@kdbg.org> writes:

> For some unknown reason, the dd on my Windows box segfaults every now
> and then, but recently it has been doing so much more often than it used
> to, which makes running the test suite burdensome.
>
> Get rid of four invocations of dd and use test-genrandom instead.
>
> The new code does change some properties of the generated files:
>
>  - They are a bit smaller.
>  - They are not sparse anymore.
>  - They do not compress well anymore.
>  - The smaller of the four files is now a prefix of the larger.
>
> Fortunately, the tests do not depend on these properties, which would
> have a big influence on the size of the generated pack files. There *is*
> a test in t1050 that checks the size of pack files generated from large
> blobs, but it runs in its own repository with its own set of files (that
> are already generated with test-genrandom!).
>
> To emphasize that three of the large blobs are exact copies, use cp to
> allocate them.
>
> While we are here, replace cmp with test_cmp_bin to document the
> binary-ness of the comparison, which was hinted at by a comment, but not
> stated explicitly.
>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  I won't mind if the first paragraph of the proposed commit message is
>  removed, but without the motivation, this commit reads as if it were
>  merely code churn.

I agree that it is good to see the motivation.  Thanks.

>
>  The existing test-genrandom invocations look like this:
>
>    test-genrandom "a" $(( 66 * 1024 )) >mid1
>
>  I chose not to mimic this style because being precise with the file
>  size is not important for the files generated here.
>
>  t/t1050-large.sh | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index f5a9119..f653121 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -9,10 +9,10 @@ test_expect_success setup '
>  	# clone does not allow us to pass core.bigfilethreshold to
>  	# new repos, so set core.bigfilethreshold globally
>  	git config --global core.bigfilethreshold 200k &&
> -	echo X | dd of=large1 bs=1k seek=2000 &&
> -	echo X | dd of=large2 bs=1k seek=2000 &&
> -	echo X | dd of=large3 bs=1k seek=2000 &&
> -	echo Y | dd of=huge bs=1k seek=2500 &&
> +	test-genrandom seed1 2000000 >large1 &&
> +	cp large1 large2 &&
> +	cp large1 large3 &&
> +	test-genrandom seed2 2500000 >huge &&
>  	GIT_ALLOC_LIMIT=1500k &&
>  	export GIT_ALLOC_LIMIT
>  '
> @@ -61,7 +61,7 @@ test_expect_success 'checkout a large file' '
>  	large1=$(git rev-parse :large1) &&
>  	git update-index --add --cacheinfo 100644 $large1 another &&
>  	git checkout another &&
> -	cmp large1 another ;# this must not be test_cmp
> +	test_cmp_bin large1 another
>  '
>  
>  test_expect_success 'packsize limit' '
> @@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose object' '
>  	test_create_repo packed &&
>  	mv pack-* packed/.git/objects/pack &&
>  	GIT_DIR=packed/.git git cat-file blob $SHA1 >actual &&
> -	cmp huge actual
> +	test_cmp_bin huge actual
>  '
>  
>  test_expect_success 'tar achiving' '


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 18:56 ` Junio C Hamano
@ 2015-01-13 19:55   ` Johannes Sixt
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Sixt @ 2015-01-13 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 13.01.2015 at 19:56, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
>> The new code does change some properties of the generated files:
>>
>>  - They are a bit smaller.
>>  - They are not sparse anymore.
>>  - They do not compress well anymore.
>>  - The smaller of the four files is now a prefix of the larger.

Would you kindly strike the last bullet point when you apply the patch,
because it is not true, as I just noticed:

>> +	test-genrandom seed1 2000000 >large1 &&
...
>> +	test-genrandom seed2 2500000 >huge &&

The seeds are different so that the files are completely dissimilar.
This does not affect the rest of the analysis I gave in the commit message.
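
For context, test-genrandom is a small helper in git's own source tree;
conceptually it just streams a seeded PRNG to stdout, so its output is a
pure function of (seed, length). A rough shell stand-in (hypothetical,
not git's actual generator) illustrates the property relied on here:

```shell
# Hypothetical stand-in for test-genrandom: deterministic bytes from a seed.
genrandomish () {
	awk -v seed="$1" -v len="$2" 'BEGIN {
		srand(seed)
		for (i = 0; i < len; i++)
			printf "%c", int(rand() * 94) + 33   # printable pseudo-random byte
	}'
}

genrandomish 1 1000 >a
genrandomish 1 1000 >b   # same seed: byte-for-byte identical
genrandomish 2 1000 >c   # different seed: dissimilar content
cmp -s a b && echo "same seed, same bytes"
cmp -s a c || echo "different seed, different bytes"
```

The same holds for the real helper: only reusing seed1 for huge would
have made large1 a prefix of it, which is what the struck bullet point
wrongly assumed.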

Thanks,
-- Hannes


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 17:36 [PATCH] t1050-large: replace dd by test-genrandom Johannes Sixt
  2015-01-13 18:56 ` Junio C Hamano
@ 2015-01-13 21:47 ` Jeff King
  2015-01-13 22:33   ` Johannes Sixt
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff King @ 2015-01-13 21:47 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List

On Tue, Jan 13, 2015 at 06:36:27PM +0100, Johannes Sixt wrote:

> For some unknown reason, the dd on my Windows box segfaults every now
> and then, but recently it has been doing so much more often than it used
> to, which makes running the test suite burdensome.
> 
> Get rid of four invocations of dd and use test-genrandom instead.

There are a dozen other uses of dd in the test suite. Do they all need
to go?

> The new code does change some properties of the generated files:
> 
>  - They are a bit smaller.
>  - They are not sparse anymore.
>  - They do not compress well anymore.

This is unfortunate, as it means other platforms will be slower. I
measured a best-of-five on running t1050 going from 0.780s to 1.750s.
That's on an SSD. Doing it on a RAM disk the numbers are 0.600s and
1.394s. Better, but not great.

One second on the test suite probably isn't breaking the bank, but these
sorts of things do add up. I wonder if we can shrink the test size. We
use 2000k files with a 200k core.bigfilethreshold, and a 1500k
GIT_ALLOC_LIMIT.  Skimming through the history, the sizes seem fairly
arbitrary. We can't go _too_ low, or GIT_ALLOC_LIMIT will prevent us
from even allocating heap memory for non-objects.

I tried dropping it by a factor of 10, but sadly that hits several
cases. The commit-slab code wants 512k chunks (which seems like rather a
lot to me), and pack-objects starts at just over 150k for the set of
objects. It would be nice to have a finer-grained tool than
GIT_ALLOC_LIMIT that applied only to objects, but I guess then we would
not be as sure of catching stray code paths (each caller would have to
annotate "this is for an object").

-Peff


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 21:47 ` Jeff King
@ 2015-01-13 22:33   ` Johannes Sixt
  2015-01-13 22:38     ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Sixt @ 2015-01-13 22:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 13.01.2015 at 22:47, Jeff King wrote:
> On Tue, Jan 13, 2015 at 06:36:27PM +0100, Johannes Sixt wrote:
> 
>> For some unknown reason, the dd on my Windows box segfaults every now
>> and then, but recently it has been doing so much more often than it used
>> to, which makes running the test suite burdensome.
>>
>> Get rid of four invocations of dd and use test-genrandom instead.
> 
> There are a dozen other uses of dd in the test suite. Do they all need
> to go?

Ideally, yes.

>> The new code does change some properties of the generated files:
>>
>>  - They are a bit smaller.
>>  - They are not sparse anymore.
>>  - They do not compress well anymore.
> 
> This is unfortunate, as it means other platforms will be slower. I
> measured a best-of-five on running t1050 going from 0.780s to 1.750s.
> That's on an SSD. Doing it on a RAM disk the numbers are 0.600s and
> 1.394s. Better, but not great.

Certainly you run the test suite a *LOT* more often than I do, so in
theory your (and everybody else's) lost time adds up to more than the
5 minutes I need to take care of the failing test scripts until each
test case happens to succeed at least once. So...

BTW, is it the incompressibility where the time is lost or lack of
sparseness of the files? How does the timing change with this patch on
top?

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f653121..9cf4e0e 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -9,10 +9,10 @@ test_expect_success setup '
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
-	test-genrandom seed1 2000000 >large1 &&
+	printf "\0%2000000s" X >large1 &&
 	cp large1 large2 &&
 	cp large1 large3 &&
-	test-genrandom seed2 2500000 >huge &&
+	printf "\0%2500000s" Y >huge &&
 	GIT_ALLOC_LIMIT=1500k &&
 	export GIT_ALLOC_LIMIT
 '
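
The printf trick is easy to sanity-check at the prompt: the field width
pads with spaces, so the leading NUL plus the padded field yield a file
of width+1 bytes that is almost entirely spaces and therefore compresses
extremely well (smaller sizes used here to keep the demo quick):

```shell
printf "\0%99999s" X >padded        # 1 NUL + 99998 spaces + "X" = 100000 bytes
wc -c <padded                       # 100000
gzip -c padded | wc -c              # only a few hundred bytes after compression
```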


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 22:33   ` Johannes Sixt
@ 2015-01-13 22:38     ` Jeff King
  2015-01-13 23:40       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2015-01-13 22:38 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List

On Tue, Jan 13, 2015 at 11:33:08PM +0100, Johannes Sixt wrote:

> BTW, is it the incompressibility where the time is lost or lack of
> sparseness of the files? How does the timing change with this patch on
> top?

Oh, good call. It's the incompressibility. Which makes perfect sense.
Once we copy the file into the object database, that copy is not sparse.
But in the genrandom version, it _is_ a million times bigger. :)

With the patch below, my timings go back to ~0.7s (actually, they seem
slightly _better_ on average than what is in "master" now, but there is
quite a bit of run-to-run noise, so it may not be meaningful).

> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index f653121..9cf4e0e 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -9,10 +9,10 @@ test_expect_success setup '
>  	# clone does not allow us to pass core.bigfilethreshold to
>  	# new repos, so set core.bigfilethreshold globally
>  	git config --global core.bigfilethreshold 200k &&
> -	test-genrandom seed1 2000000 >large1 &&
> +	printf "\0%2000000s" X >large1 &&
>  	cp large1 large2 &&
>  	cp large1 large3 &&
> -	test-genrandom seed2 2500000 >huge &&
> +	printf "\0%2500000s" Y >huge &&
>  	GIT_ALLOC_LIMIT=1500k &&
>  	export GIT_ALLOC_LIMIT
>  '

I think with this squashed in, I have no complaints at all about your
patch.

-Peff


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 22:38     ` Jeff King
@ 2015-01-13 23:40       ` Junio C Hamano
  2015-01-14 11:27         ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2015-01-13 23:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Tue, Jan 13, 2015 at 11:33:08PM +0100, Johannes Sixt wrote:
>
>> BTW, is it the incompressibility where the time is lost or lack of
>> sparseness of the files? How does the timing change with this patch on
>> top?
>
> Oh, good call. It's the incompressibility. Which makes perfect sense.
>
> Once we copy the file into the object database, that copy is not sparse.
> But in the genrandom version, it _is_ a million times bigger. :)

Yeah, of course ;-)

> With the patch below, my timings go back to ~0.7s (actually, they seem
> slightly _better_ on average than what is in "master" now, but there is
> quite a bit of run-to-run noise, so it may not be meaningful).
>
>> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
>> index f653121..9cf4e0e 100755
>> --- a/t/t1050-large.sh
>> +++ b/t/t1050-large.sh
>> @@ -9,10 +9,10 @@ test_expect_success setup '
>>  	# clone does not allow us to pass core.bigfilethreshold to
>>  	# new repos, so set core.bigfilethreshold globally
>>  	git config --global core.bigfilethreshold 200k &&
>> -	test-genrandom seed1 2000000 >large1 &&
>> +	printf "\0%2000000s" X >large1 &&
>>  	cp large1 large2 &&
>>  	cp large1 large3 &&
>> -	test-genrandom seed2 2500000 >huge &&
>> +	printf "\0%2500000s" Y >huge &&
>>  	GIT_ALLOC_LIMIT=1500k &&
>>  	export GIT_ALLOC_LIMIT
>>  '
>
> I think with this squashed in, I have no complaints at all about your
> patch.

OK, perhaps that affects the log message, so I'll be lazy and wait
for a reroll.

Are we depending on the binary-ness of these test files by the way?
The leading NUL \0 looked a bit strange to me.


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-13 23:40       ` Junio C Hamano
@ 2015-01-14 11:27         ` Jeff King
  2015-01-14 17:31           ` Junio C Hamano
  2015-01-14 20:28           ` [PATCH v2] t1050-large: generate large files without dd Johannes Sixt
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2015-01-14 11:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List

On Tue, Jan 13, 2015 at 03:40:10PM -0800, Junio C Hamano wrote:

> >> -	test-genrandom seed2 2500000 >huge &&
> >> +	printf "\0%2500000s" Y >huge &&
> [...]
> Are we depending on the binary-ness of these test files by the way?
> The leading NUL \0 looked a bit strange to me.

I don't think so. We do not want to do a text diff, because that would
overflow our GIT_ALLOC_LIMIT. But the core.bigfilethreshold check is
what will make them binary, not the actual content. So a gigantic text
file is arguably a better test of the feature in question.

-Peff


* Re: [PATCH] t1050-large: replace dd by test-genrandom
  2015-01-14 11:27         ` Jeff King
@ 2015-01-14 17:31           ` Junio C Hamano
  2015-01-14 20:28           ` [PATCH v2] t1050-large: generate large files without dd Johannes Sixt
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2015-01-14 17:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Tue, Jan 13, 2015 at 03:40:10PM -0800, Junio C Hamano wrote:
>
>> >> -	test-genrandom seed2 2500000 >huge &&
>> >> +	printf "\0%2500000s" Y >huge &&
>> [...]
>> Are we depending on the binary-ness of these test files by the way?
>> The leading NUL \0 looked a bit strange to me.
>
> I don't think so. We do not want to do a text diff, because that would
> overflow our GIT_ALLOC_LIMIT. But the core.bigfilethreshold check is
> what will make them binary, not the actual content. So a gigantic text
> file is arguably a better test of the feature in question.

Perhaps.

The original used "dd seek" primarily so that we could logically have
a large file without wasting disk space, in addition to making sure
that the result would compress well.  The large printf will still use
the disk space, but disks are cheap enough to tolerate a handful of
files of a few megabytes, and you already showed that compressibility
is what matters more to the test latency, so I think all is good.
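
The sparseness point can be made concrete (du block counts are
filesystem-dependent, so the numbers are illustrative only):

```shell
# dd with seek= leaves a hole; only the trailing "X\n" is actually written.
echo X | dd of=sparse bs=1k seek=2000 2>/dev/null
printf "%2048000s" X >dense          # comparable size, fully written out
wc -c sparse dense                   # both ~2 MB apparent size
du -k sparse dense                   # sparse occupies far fewer blocks (fs-dependent)
```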

Thanks.


* [PATCH v2] t1050-large: generate large files without dd
  2015-01-14 11:27         ` Jeff King
  2015-01-14 17:31           ` Junio C Hamano
@ 2015-01-14 20:28           ` Johannes Sixt
  2015-01-14 21:00             ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Sixt @ 2015-01-14 20:28 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Git Mailing List

For some unknown reason, the dd on my Windows box segfaults randomly,
but recently it has been doing so much more often than it used to, which
makes running the test suite burdensome.

Use printf to write large files instead of dd. To emphasize that three
of the large blobs are exact copies, use cp to allocate them.

The new code makes the files a bit smaller, and they are not sparse
anymore, but the tests do not depend on these properties. We deliberately
avoid test-genrandom here (even though it is used to generate large files
elsewhere in t1050) so that the files compress well, which keeps the
run time short.

The files are now large text files, not binary files. But since they
are larger than core.bigfilethreshold, they are diagnosed as binary
by Git. For this reason, the 'git diff' tests that check the output
for "Binary files differ" still pass.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
V2:
 - use printf instead of test-genrandom
 - use test_cmp, not test_cmp_bin because the files are text now

On 14.01.2015 at 12:27, Jeff King wrote:
> On Tue, Jan 13, 2015 at 03:40:10PM -0800, Junio C Hamano wrote:
>> Are we depending on the binary-ness of these test files by the way?
>> The leading NUL \0 looked a bit strange to me.
> 
> I don't think so. We do not want to do a text diff, because that would
> overflow our GIT_ALLOC_LIMIT. But the core.bigfilethreshold check is
> what will make them binary, not the actual content. So a gigantic text
> file is arguably a better test of the feature in question.

Agreed. The files are text now.

 t/t1050-large.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f5a9119..f9f3d13 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -9,10 +9,10 @@ test_expect_success setup '
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
-	echo X | dd of=large1 bs=1k seek=2000 &&
-	echo X | dd of=large2 bs=1k seek=2000 &&
-	echo X | dd of=large3 bs=1k seek=2000 &&
-	echo Y | dd of=huge bs=1k seek=2500 &&
+	printf "%2000000s" X >large1 &&
+	cp large1 large2 &&
+	cp large1 large3 &&
+	printf "%2500000s" Y >huge &&
 	GIT_ALLOC_LIMIT=1500k &&
 	export GIT_ALLOC_LIMIT
 '
@@ -61,7 +61,7 @@ test_expect_success 'checkout a large file' '
 	large1=$(git rev-parse :large1) &&
 	git update-index --add --cacheinfo 100644 $large1 another &&
 	git checkout another &&
-	cmp large1 another ;# this must not be test_cmp
+	test_cmp large1 another
 '

 test_expect_success 'packsize limit' '
@@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose
object' '
 	test_create_repo packed &&
 	mv pack-* packed/.git/objects/pack &&
 	GIT_DIR=packed/.git git cat-file blob $SHA1 >actual &&
-	cmp huge actual
+	test_cmp huge actual
 '

 test_expect_success 'tar achiving' '
-- 
2.0.0.12.gbcf935e


* Re: [PATCH v2] t1050-large: generate large files without dd
  2015-01-14 20:28           ` [PATCH v2] t1050-large: generate large files without dd Johannes Sixt
@ 2015-01-14 21:00             ` Jeff King
  2015-01-14 21:17               ` Johannes Sixt
  2015-01-14 21:59               ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2015-01-14 21:00 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Git Mailing List

On Wed, Jan 14, 2015 at 09:28:56PM +0100, Johannes Sixt wrote:

> For some unknown reason, the dd on my Windows box segfaults randomly,
> but recently it has been doing so much more often than it used to, which
> makes running the test suite burdensome.
> 
> Use printf to write large files instead of dd. To emphasize that three
> of the large blobs are exact copies, use cp to allocate them.
> 
> The new code makes the files a bit smaller, and they are not sparse
> anymore, but the tests do not depend on these properties. We deliberately
> avoid test-genrandom here (even though it is used to generate large files
> elsewhere in t1050) so that the files compress well, which keeps the
> run time short.

Thanks, this version looks good to me.

> The files are now large text files, not binary files. But since they
> are larger than core.bigfilethreshold, they are diagnosed as binary
> by Git. For this reason, the 'git diff' tests that check the output
> for "Binary files differ" still pass.

I was less concerned with the tests not passing than with the tests
ending up testing nothing (which is very hard to check automatically, as
you would have to recreate the original bug!). But I think it is fine, as
text is more likely to get malloc'd than binary data (and these tests are
really about making sure we avoid huge mallocs).

> @@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose
> object' '

Funny wrapping here. I imagine Junio can manage to apply it anyway, but
you may want to check your MUA settings.

-Peff


* Re: [PATCH v2] t1050-large: generate large files without dd
  2015-01-14 21:00             ` Jeff King
@ 2015-01-14 21:17               ` Johannes Sixt
  2015-01-14 21:59               ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Johannes Sixt @ 2015-01-14 21:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

On 14.01.2015 at 22:00, Jeff King wrote:
>> @@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose
>> object' '
> 
> Funny wrapping here. I imagine Junio can manage to apply it anyway, but
> you may want to check your MUA settings.

Argh! Forgot to switch off line wrapping. Here is a hopefully
correct version.

--- 8< ---
[PATCH v2] t1050-large: generate large files without dd

For some unknown reason, the dd on my Windows box segfaults randomly,
but recently it has been doing so much more often than it used to, which
makes running the test suite burdensome.

Use printf to write large files instead of dd. To emphasize that three
of the large blobs are exact copies, use cp to allocate them.

The new code makes the files a bit smaller, and they are not sparse
anymore, but the tests do not depend on these properties. We deliberately
avoid test-genrandom here (even though it is used to generate large files
elsewhere in t1050) so that the files compress well, which keeps the
run time short.

The files are now large text files, not binary files. But since they
are larger than core.bigfilethreshold, they are diagnosed as binary
by Git. For this reason, the 'git diff' tests that check the output
for "Binary files differ" still pass.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---

 t/t1050-large.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f5a9119..f9f3d13 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -9,10 +9,10 @@ test_expect_success setup '
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
-	echo X | dd of=large1 bs=1k seek=2000 &&
-	echo X | dd of=large2 bs=1k seek=2000 &&
-	echo X | dd of=large3 bs=1k seek=2000 &&
-	echo Y | dd of=huge bs=1k seek=2500 &&
+	printf "%2000000s" X >large1 &&
+	cp large1 large2 &&
+	cp large1 large3 &&
+	printf "%2500000s" Y >huge &&
 	GIT_ALLOC_LIMIT=1500k &&
 	export GIT_ALLOC_LIMIT
 '
@@ -61,7 +61,7 @@ test_expect_success 'checkout a large file' '
 	large1=$(git rev-parse :large1) &&
 	git update-index --add --cacheinfo 100644 $large1 another &&
 	git checkout another &&
-	cmp large1 another ;# this must not be test_cmp
+	test_cmp large1 another
 '
 
 test_expect_success 'packsize limit' '
@@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose object' '
 	test_create_repo packed &&
 	mv pack-* packed/.git/objects/pack &&
 	GIT_DIR=packed/.git git cat-file blob $SHA1 >actual &&
-	cmp huge actual
+	test_cmp huge actual
 '
 
 test_expect_success 'tar achiving' '
-- 
2.0.0.12.gbcf935e


* Re: [PATCH v2] t1050-large: generate large files without dd
  2015-01-14 21:00             ` Jeff King
  2015-01-14 21:17               ` Johannes Sixt
@ 2015-01-14 21:59               ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2015-01-14 21:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Git Mailing List

Jeff King <peff@peff.net> writes:

>> @@ -162,7 +162,7 @@ test_expect_success 'pack-objects with large loose
>> object' '
>
> Funny wrapping here. I imagine Junio can manage to apply it anyway, but
> you may want to check your MUA settings.

Thanks for a warning; luckily this was the only breakage and the
result looks good.


end of thread, newest: 2015-01-14 21:59 UTC

Thread overview: 13+ messages
2015-01-13 17:36 [PATCH] t1050-large: replace dd by test-genrandom Johannes Sixt
2015-01-13 18:56 ` Junio C Hamano
2015-01-13 19:55   ` Johannes Sixt
2015-01-13 21:47 ` Jeff King
2015-01-13 22:33   ` Johannes Sixt
2015-01-13 22:38     ` Jeff King
2015-01-13 23:40       ` Junio C Hamano
2015-01-14 11:27         ` Jeff King
2015-01-14 17:31           ` Junio C Hamano
2015-01-14 20:28           ` [PATCH v2] t1050-large: generate large files without dd Johannes Sixt
2015-01-14 21:00             ` Jeff King
2015-01-14 21:17               ` Johannes Sixt
2015-01-14 21:59               ` Junio C Hamano
