git.vger.kernel.org archive mirror
* propagating repo corruption across clone
@ 2013-03-24 18:31 Jeff King
  2013-03-24 19:01 ` Ævar Arnfjörð Bjarmason
                   ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Jeff King @ 2013-03-24 18:31 UTC (permalink / raw)
  To: git

I saw this post-mortem on recent disk corruption seen on git.kde.org:

  http://jefferai.org/2013/03/24/too-perfect-a-mirror/

The interesting bit to me is that object corruption propagated across a
clone (and oddly, that --mirror made complaints about corruption go
away). I did a little testing and found some curious results (this ended
up long; skip to the bottom for my conclusions).

Here's a fairly straightforward corruption recipe:

-- >8 --
obj_to_file() {
  echo ".git/objects/$(echo $1 | sed 's,..,&/,')"
}

# corrupt a single byte inside the object
corrupt_object() {
  fn=$(obj_to_file "$1") &&
  chmod +w "$fn" &&
  printf '\0' | dd of="$fn" bs=1 conv=notrunc seek=10
}

git init repo &&
cd repo &&
echo content >file &&
git add file &&
git commit -m one &&
corrupt_object $(git rev-parse HEAD:file)
-- 8< --

and here is how clone reacts in a few situations:

  $ git clone --bare . local-bare && echo WORKED
  Cloning into bare repository 'local-bare'...
  done.
  WORKED

We don't notice the problem during the transport phase, which is to be
expected; we're using the fast "just hardlink it" code path. So that's
OK.
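
To see the hardlink path concretely, here's a quick sketch (not part of
the tests above; repo names are arbitrary) that checks the link count of
a loose object after a plain local clone:

```shell
# After a plain local clone, loose objects in the clone are hard links
# to the source's files, so the source object's link count exceeds 1.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src
git -C src -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
git clone -q src dst                       # local clone: hardlinks objects
obj=$(find src/.git/objects -type f | head -n 1)
links=$(ls -ld "$obj" | awk '{print $2}')  # field 2 is the link count
echo "link count: $links"
```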

  $ git clone . local-tree && echo WORKED
  Cloning into 'local-tree'...
  done.
  error: inflate: data stream error (invalid distance too far back)
  error: unable to unpack d95f3ad14dee633a758d2e331151e950dd13e4ed header
  WORKED

We _do_ see a problem during the checkout phase, but we don't propagate
a checkout failure to the exit code from clone.  That is bad in general,
and should probably be fixed. Though it would never find corruption of
older objects in the history, anyway, so checkout should not be relied
on for robustness.

  $ git clone --no-local . non-local && echo WORKED
  Cloning into 'non-local'...
  remote: Counting objects: 3, done.
  remote: error: inflate: data stream error (invalid distance too far back)
  remote: error: unable to unpack d95f3ad14dee633a758d2e331151e950dd13e4ed header
  remote: error: inflate: data stream error (invalid distance too far back)
  remote: fatal: loose object d95f3ad14dee633a758d2e331151e950dd13e4ed (stored in ./objects/d9/5f3ad14dee633a758d2e331151e950dd13e4ed) is corrupt
  error: git upload-pack: git-pack-objects died with error.
  fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
  remote: aborting due to possible repository corruption on the remote side.
  fatal: early EOF
  fatal: index-pack failed

Here we detect the error. It's noticed by pack-objects on the remote
side as it tries to put the bogus object into a pack. But what if we
already have a pack that's been corrupted, and pack-objects is just
pushing out entries without doing any recompression?

Let's change our corrupt_object to:

  corrupt_object() {
    git repack -ad &&
    pack=`echo .git/objects/pack/*.pack` &&
    chmod +w "$pack" &&
    printf '\0' | dd of="$pack" bs=1 conv=notrunc seek=175
  }

and try again:

  $ git clone --no-local . non-local && echo WORKED
  Cloning into 'non-local'...
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 3 (delta 0)
  error: inflate: data stream error (invalid distance too far back)
  fatal: pack has bad object at offset 169: inflate returned -3
  fatal: index-pack failed

Great, we still notice the problem in index-pack on the receiving
end. But what if there's a more subtle corruption, where filesystem
corruption points the directory entry for one object at the inode of
another? Like:

  corrupt_object() {
    corrupt=$(echo corrupted | git hash-object -w --stdin) &&
    mv -f $(obj_to_file $corrupt) $(obj_to_file $1)
  }

This is going to be more subtle, because the object in the packfile is
self-consistent but the object graph as a whole is broken.

  $ git clone --no-local . non-local && echo WORKED
  Cloning into 'non-local'...
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 0 (delta 0)
  Receiving objects: 100% (3/3), done.
  error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
  WORKED

Like the --local cases earlier, we notice the missing object during the
checkout phase, but do not correctly propagate the error.

We do not notice the sha1 mismatch on the sending side (which we could,
if we checked the sha1 as we were sending). We do not notice the broken
object graph during the receive process either. I would have expected
check_everything_connected to handle this, but we don't actually call it
during clone! If you do this:

  $ git init non-local && cd non-local && git fetch ..
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 3 (delta 0)
  Unpacking objects: 100% (3/3), done.
  fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
  error: .. did not send all necessary objects

we do notice.

And one final check:

  $ git -c transfer.fsckobjects=1 clone --no-local . fsck
  Cloning into 'fsck'...
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 3 (delta 0)
  Receiving objects: 100% (3/3), done.
  error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
  fatal: object of unexpected type
  fatal: index-pack failed

Fscking the incoming objects does work, but of course it comes at a cost
in the normal case (for linux-2.6, I measured an increase in CPU time
with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
it is probably overkill for finding corruption; index-pack already
recognizes bit corruption inside an object, and
check_everything_connected can detect object graph problems much more
cheaply.
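
For reference, here is a minimal sketch of turning that checking on;
the repo setup below is a throwaway stand-in, and only the
transfer.fsckObjects knob itself comes from the discussion above:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src
git -C src -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
# One-shot, for a single clone:
git -c transfer.fsckObjects=true clone -q --no-local src dst
# Permanently, for every future fetch into the new repository:
git -C dst config transfer.fsckObjects true
```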

One thing I didn't check is bit corruption inside a packed object that
still correctly zlib inflates. check_everything_connected will end up
reading all of the commits and trees (to walk them), but not the blobs.
And I don't think that we explicitly re-sha1 every incoming object (only
if we detect a possible collision). So it may be that
transfer.fsckObjects would save us there (it also introduces new
problems if there are ignorable warnings in the objects you receive,
like zero-padded trees).

So I think at the very least we should:

  1. Make sure clone propagates errors from checkout to the final exit
     code.

  2. Teach clone to run check_everything_connected.
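
Until then, a caller can defend itself. A hypothetical wrapper
("safe_clone" is our name, not a git command) that only trusts a clone
if a full fsck of the result also passes:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src
git -C src -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
safe_clone() {
  # succeed only if both the clone and a full fsck of the result do
  git clone -q --no-local "$1" "$2" &&
  git -C "$2" fsck --full
}
safe_clone src dst
```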

I don't have details on the KDE corruption, or why it wasn't detected
(if it was one of the cases I mentioned above, or a more subtle issue).

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: propagating repo corruption across clone
  2013-03-24 18:31 propagating repo corruption across clone Jeff King
@ 2013-03-24 19:01 ` Ævar Arnfjörð Bjarmason
  2013-03-24 19:23   ` Jeff King
  2013-03-24 19:16 ` Ilari Liusvaara
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 60+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2013-03-24 19:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Sun, Mar 24, 2013 at 7:31 PM, Jeff King <peff@peff.net> wrote:
>
> I don't have details on the KDE corruption, or why it wasn't detected
> (if it was one of the cases I mentioned above, or a more subtle issue).

One thing worth mentioning is this part of the article:

"Originally, mirrored clones were in fact not used, but non-mirrored
clones on the anongits come with their own set of issues, and are more
prone to getting stopped up by legitimate, authenticated force pushes,
ref deletions, and so on – and if we set the refspec such that those
are allowed through silently, we don’t gain much. "

So the only reason they were even using --mirror was because they were
running into those problems with fetching.

So aside from the problems with --mirror I think we should have
something that updates your local refs to be exactly like they are on
the other end, i.e. deletes some, non-fast-forwards others etc.
(obviously behind several --force options and so on). But such an
option *wouldn't* accept corrupted objects.

That would give KDE and other parties a safe way to do exact repo
mirroring like this; it wouldn't protect them from someone maliciously
deleting all the refs in all the repos, but it would prevent FS
corruption from propagating.


* Re: propagating repo corruption across clone
  2013-03-24 18:31 propagating repo corruption across clone Jeff King
  2013-03-24 19:01 ` Ævar Arnfjörð Bjarmason
@ 2013-03-24 19:16 ` Ilari Liusvaara
  2013-03-25 20:01 ` Junio C Hamano
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
  3 siblings, 0 replies; 60+ messages in thread
From: Ilari Liusvaara @ 2013-03-24 19:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Sun, Mar 24, 2013 at 02:31:33PM -0400, Jeff King wrote:
> 
> Fscking the incoming objects does work, but of course it comes at a cost
> in the normal case (for linux-2.6, I measured an increase in CPU time
> with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
> it is probably overkill for finding corruption; index-pack already
> recognizes bit corruption inside an object, and
> check_everything_connected can detect object graph problems much more
> cheaply.

AFAIK, standard checks index-pack has to do + checking that the object
graph has no broken links (and every ref points to something valid) will
catch everything except:

- SHA-1 collisions between corrupt objects and clean ones.
- Corrupted refs (that still point to something valid).
- "Born-corrupted" objects.

> One thing I didn't check is bit corruption inside a packed object that
> still correctly zlib inflates. check_everything_connected will end up
> reading all of the commits and trees (to walk them), but not the blobs.

Checking that everything is connected will (modulo SHA-1 collisions) save
you there, at least if packv3 is used as transport stream.

-Ilari


* Re: propagating repo corruption across clone
  2013-03-24 19:01 ` Ævar Arnfjörð Bjarmason
@ 2013-03-24 19:23   ` Jeff King
  2013-03-25 13:43     ` Jeff Mitchell
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-24 19:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King <peff@peff.net> wrote:
> >
> > I don't have details on the KDE corruption, or why it wasn't detected
> > (if it was one of the cases I mentioned above, or a more subtle issue).
> 
> One thing worth mentioning is this part of the article:
> 
> "Originally, mirrored clones were in fact not used, but non-mirrored
> clones on the anongits come with their own set of issues, and are more
> prone to getting stopped up by legitimate, authenticated force pushes,
> ref deletions, and so on – and if we set the refspec such that those
> are allowed through silently, we don’t gain much. "
> 
> So the only reason they were even using --mirror was because they were
> running into those problems with fetching.

I think the --mirror thing is a red herring. It should not be changing
the transport used, and that is the part of git that is expected to
catch such corruption.

But I haven't seen exactly what the corruption is, nor exactly what
commands they used to clone. I've invited the blog author to give more
details in this thread.

> So aside from the problems with --mirror I think we should have
> something that updates your local refs to be exactly like they are on
> the other end, i.e. deletes some, non-fast-forwards others etc.
> (obviously behind several --force options and so on). But such an
> option *wouldn't* accept corrupted objects.

That _should_ be how "git fetch --prune +refs/*:refs/*" behaves (and
that refspec is set up when you use "--mirror"; we should probably have
it turn on --prune, too, but I do not think you can do so via a config
option currently).
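
To make that concrete, a sketch of the mirror-plus-prune setup
(origin.git is a throwaway stand-in for a real upstream):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare origin.git
# --mirror sets up the +refs/*:refs/* refspec (and implies --bare) ...
git clone -q --mirror origin.git mirror.git
# ... but --prune still has to be given by hand on later updates:
git -C mirror.git fetch -q --prune origin
spec=$(git -C mirror.git config remote.origin.fetch)
echo "mirror refspec: $spec"
```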

-Peff


* Re: propagating repo corruption across clone
  2013-03-24 19:23   ` Jeff King
@ 2013-03-25 13:43     ` Jeff Mitchell
  2013-03-25 14:56       ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff Mitchell @ 2013-03-25 13:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð, git

On Sun, Mar 24, 2013 at 3:23 PM, Jeff King <peff@peff.net> wrote:
> On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King <peff@peff.net> wrote:
>> >
>> > I don't have details on the KDE corruption, or why it wasn't detected
>> > (if it was one of the cases I mentioned above, or a more subtle issue).
>>
>> One thing worth mentioning is this part of the article:
>>
>> "Originally, mirrored clones were in fact not used, but non-mirrored
>> clones on the anongits come with their own set of issues, and are more
>> prone to getting stopped up by legitimate, authenticated force pushes,
>> ref deletions, and so on – and if we set the refspec such that those
>> are allowed through silently, we don’t gain much. "
>>
>> So the only reason they were even using --mirror was because they were
>> running into those problems with fetching.

With a normal fetch. We actually *wanted* things like force updates
and ref deletions to propagate, because we have not just Gitolite's
checks but our own checks on the servers, and wanted that to be
considered the authenticated source. Besides just daily use and
preventing cruft, we wanted to ensure that such actions propagated so
that if a branch was removed because it contained personal
information, accidental commits, or a security issue (for instance),
it was removed on the anongits too, in a timely fashion.

> I think the --mirror thing is a red herring. It should not be changing
> the transport used, and that is the part of git that is expected to
> catch such corruption.
>
> But I haven't seen exactly what the corruption is, nor exactly what
> commands they used to clone. I've invited the blog author to give more
> details in this thread.

The syncing was performed via a clone with git clone --mirror (and a
git:// URL) and updates with git remote update.

So I should mention that my experiments after the fact were using
local paths, but with --no-hardlinks. If you're saying that the
transport is where corruption is supposed to be caught, then it's
possible that we shouldn't see corruption propagate on an initial
mirror clone across git://, and that something else was responsible
for the trouble we saw with the repositories that got cloned
after-the-fact. But then I'd argue that this is non-obvious. In
particular, when using --no-hardlinks, I wouldn't expect the behavior
to differ between a straight path and file://.

Something else: apparently one of my statements prompted joeyh to
think about potential issues with backing up live git repos
(http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
Looking at that post made me realize that, when we were doing our
initial thinking about the system three years ago, we made an
assumption that, in fact, taking a .tar.gz of a repo as it's in the
process of being written to or garbage collected or repacked could be
problematic. This isn't a totally baseless assumption, as I once had
a git repository that suffered corruption when a sudden power outage
hit while I was updating it. (It could totally have been the
filesystem, of course, although it was a journaled file system.)

So, we decided to use Git's built-in capabilities of consistency
checking to our advantage (with, as it turns out, a flaw in our
implementation). But the question remains: are we wrong about thinking
that rsyncing or tar.gz live repositories in the middle of being
pushed to/gc'd/repacked could result in a bogus backup?

Thanks,
Jeff


* Re: propagating repo corruption across clone
  2013-03-25 13:43     ` Jeff Mitchell
@ 2013-03-25 14:56       ` Jeff King
  2013-03-25 15:31         ` Duy Nguyen
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-25 14:56 UTC (permalink / raw)
  To: Jeff Mitchell; +Cc: Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 09:43:23AM -0400, Jeff Mitchell wrote:

> > But I haven't seen exactly what the corruption is, nor exactly what
> > commands they used to clone. I've invited the blog author to give more
> > details in this thread.
> 
> The syncing was performed via a clone with git clone --mirror (and a
> git:// URL) and updates with git remote update.

OK. That should be resilient to corruption, then[1].

> So I should mention that my experiments after the fact were using
> local paths, but with --no-hardlinks.

Yeah, we will do a direct copy in that case, and there is nothing to
prevent corruption propagating.

> If you're saying that the transport is where corruption is supposed to
> be caught, then it's possible that we shouldn't see corruption
> propagate on an initial mirror clone across git://, and that something
> else was responsible for the trouble we saw with the repositories that
> got cloned after-the-fact.

Right. Either it was something else, or there is a bug in git's
protections (but I haven't been able to reproduce anything likely).

> But then I'd argue that this is non-obvious. In particular, when using
> --no-hardlinks, I wouldn't expect that behavior to be different with a
> straight path and with file://.

There are basically three levels of transport that can be used on a
local machine:

  1. Hard-linking (very fast, no redundancy).

  2. Byte-for-byte copy (medium speed, makes a separate copy of the
     data, but does not check the integrity of the original).

  3. Regular git transport, creating a pack (slowest, but should include
     redundancy checks).

Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
think the documentation in "git clone" could use some improvement in
that area.
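
A sketch showing the three levels side by side (repo names are
arbitrary; note that a file:// URL also forces the pack transport):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src
git -C src -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
git clone -q src level1                   # (1) hardlinks, no checks
git clone -q --no-hardlinks src level2    # (2) byte-for-byte copy
git clone -q --no-local src level3        # (3) real pack transport
git clone -q "file://$PWD/src" level3b    # file:// also forces (3)
```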

> Something else: apparently one of my statements prompted joeyh to
> think about potential issues with backing up live git repos
> (http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
> Looking at that post made me realize that, when we were doing our
> initial thinking about the system three years ago, we made an
> assumption that, in fact, taking a .tar.gz of a repo as it's in the
> process of being written to or garbage collected or repacked could be
> problematic. This isn't a totally baseless assumption, as I once had a
> git repository that I was in the process of updating when I had a
> sudden power outage that suffered corruption. (It could totally have
> been the filesystem, of course, although it was a journaled file
> system.)

Yes, if you take a snapshot of a repository with rsync or tar, it may be
in an inconsistent state. Using the git protocol should always be
robust, but if you want to do it with other tools, you need to follow a
particular order:

  1. copy the refs (refs/ and packed-refs) first

  2. copy everything else (including object/)

That covers the case where somebody is pushing an update simultaneously
(you _may_ get extra objects in step 2 that they have not yet
referenced, but you will never end up with a case where you are
referencing objects that you did not yet transfer).
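
Sketched with plain cp for portability (rsync works the same way;
work/src.git/dst.git are placeholder names):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q work
git -C work -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
git clone -q --bare work src.git
mkdir dst.git
# 1. refs first (packed-refs may not exist, hence the fallback)
cp src.git/packed-refs dst.git/ 2>/dev/null || true
cp -R src.git/refs dst.git/refs
# 2. then everything else, objects included; anything extra picked up
#    here is at worst unreferenced, never referenced-but-missing
cp -R src.git/. dst.git/
git -C dst.git fsck --full
```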

If it's possible that the repository might be repacked during your
transfer, I think the issue is a bit trickier, as there's a moment where
the new packfile is renamed into place, and then the old ones are
deleted. Depending on the timing and how your readdir() implementation
behaves with respect to new and deleted entries, it might be possible to
miss both the new one appearing and the old ones disappearing. This is
quite a tight race to catch, I suspect, but if you were to rsync
objects/pack twice in a row, that would be sufficient.

For pruning, I think you could run into the opposite situation: you grab
the refs, somebody updates them with a history rewind (or branch
deletion), then somebody prunes and objects go away. Again, the timing
on this race is quite tight and it's unlikely in practice. I'm not sure
of a simple way to eliminate it completely.

Yet another option is to simply rsync the whole thing and then "git
fsck" the result. If it's not 100% good, just re-run the rsync. This is
simple and should be robust, but is more CPU intensive (you'll end up
re-checking all of the data on each update).
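
A sketch of that loop, with cp standing in for rsync:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q work
git -C work -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
git clone -q --bare work src.git
snapshot() {
  # re-copy until a full fsck of the snapshot passes
  until rm -rf "$2" && cp -R "$1" "$2" &&
        git -C "$2" fsck --full 2>/dev/null
  do :; done
}
snapshot src.git snap.git
```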

> So, we decided to use Git's built-in capabilities of consistency
> checking to our advantage (with, as it turns out, a flaw in our
> implementation). But the question remains: are we wrong about thinking
> that rsyncing or tar.gz live repositories in the middle of being
> pushed to/gc'd/repacked could result in a bogus backup?

No, I think you are right. If you do the refs-then-objects ordering,
that saves you from most of it, but I do think there are still some
races that exist during repacking or pruning.

-Peff

[1] I mentioned that clone-over-git:// is resilient to corruption. I
    think that is true for bit corruption, but my tests did show that we
    are not as careful about checking graph connectivity during clone as
    we are with fetch. The circumstances in which that would matter are
    quite unlikely, though.


* Re: propagating repo corruption across clone
  2013-03-25 14:56       ` Jeff King
@ 2013-03-25 15:31         ` Duy Nguyen
  2013-03-25 15:56           ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2013-03-25 15:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Jeff Mitchell, Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 9:56 PM, Jeff King <peff@peff.net> wrote:
> There are basically three levels of transport that can be used on a
> local machine:
>
>   1. Hard-linking (very fast, no redundancy).
>
>   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>      data, but does not check the integrity of the original).
>
>   3. Regular git transport, creating a pack (slowest, but should include
>      redundancy checks).
>
> Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
> think the documentation in "git clone" could use some improvement in
> that area.

Not only git-clone. How git-fetch and git-push verify the new pack
should also be documented. I don't think many people outside the
contributor circle know what is done (and maybe how) when data is
received from outside.
-- 
Duy


* Re: propagating repo corruption across clone
  2013-03-25 15:31         ` Duy Nguyen
@ 2013-03-25 15:56           ` Jeff King
  2013-03-25 16:32             ` Jeff Mitchell
  2013-03-26  1:06             ` Duy Nguyen
  0 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 15:56 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff Mitchell, Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King <peff@peff.net> wrote:
> > There are basically three levels of transport that can be used on a
> > local machine:
> >
> >   1. Hard-linking (very fast, no redundancy).
> >
> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
> >      data, but does not check the integrity of the original).
> >
> >   3. Regular git transport, creating a pack (slowest, but should include
> >      redundancy checks).
> >
> > Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
> > think the documentation in "git clone" could use some improvement in
> > that area.
> 
> Not only git-clone. How git-fetch and git-push verify the new pack
> should also be documented. I don't think many people outside the
> contributor circle know what is done (and maybe how) when data is
> received from outside.

I think it's less of a documentation issue there, though, because they
_only_ do (3). There is no option to do anything else, so there is
nothing to warn the user about in terms of tradeoffs.

I agree that in general git's handling of corruption could be documented
somewhere, but I'm not sure where.

-Peff


* Re: propagating repo corruption across clone
  2013-03-25 15:56           ` Jeff King
@ 2013-03-25 16:32             ` Jeff Mitchell
  2013-03-25 20:07               ` Jeff King
  2013-03-26  1:06             ` Duy Nguyen
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff Mitchell @ 2013-03-25 16:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 11:56 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King <peff@peff.net> wrote:
>> > There are basically three levels of transport that can be used on a
>> > local machine:
>> >
>> >   1. Hard-linking (very fast, no redundancy).
>> >
>> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>> >      data, but does not check the integrity of the original).
>> >
>> >   3. Regular git transport, creating a pack (slowest, but should include
>> >      redundancy checks).
>> >
>> > Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
>> > think the documentation in "git clone" could use some improvement in
>> > that area.
>>
>> Not only git-clone. How git-fetch and git-push verify the new pack
>> should also be documented. I don't think many people outside the
>> contributor circle know what is done (and maybe how) when data is
>> received from outside.
>
> I think it's less of a documentation issue there, though, because they
> _only_ do (3). There is no option to do anything else, so there is
> nothing to warn the user about in terms of tradeoffs.
>
> I agree that in general git's handling of corruption could be documented
> somewhere, but I'm not sure where.

Hi there,

First of all, thanks for the analysis, it's much appreciated. It's
good to know that we weren't totally off-base in thinking that a naive
copy may be out of sync, as small as the chances are (certainly we
wouldn't have known the right ordering).

I think what was confusing the issue in my testing is that --mirror
implies --bare, so no working tree is created and the objects never
get read at checkout time; hence --mirror won't show the error a
normal clone will. It's not a transport question, it's just a matter
of the normal clone doing more and so having more data run through
checks.

However, there are still problems. For blob corruptions, even in this
--no-hardlinks, non --mirror case where an error was found, the exit
code from the clone was 0. I can see this tripping up all sorts of
automated scripts or repository GUIs that ignore the output and only
check the error code, which is not an unreasonable thing to do.

For commit corruptions, the --no-hardlinks, non --mirror case refused
to create the new repository and exited with an error code of 128. The
--no-hardlinks, --mirror case spewed errors to the console, yet
*still* created the new clone *and* returned an error code of zero.

It seems that when there is an "error" as opposed to a "fatal" it
doesn't affect the status code on a clone; I'd argue that it ought to.
If Git knows that the source repository has problems, it ought to be
reflected in the status code so that scripts performing clones have a
normal way to detect this and alert a user/sysadmin/whoever. Even if a
particular cloning method doesn't perform all sanity checks, if it
finds something in the sanity checks it *does* perform, this should be
trumpeted, loudly, regardless of transport mechanism and regardless of
whether a user is watching the process or a script is.

Thanks,
Jeff


* Re: propagating repo corruption across clone
  2013-03-24 18:31 propagating repo corruption across clone Jeff King
  2013-03-24 19:01 ` Ævar Arnfjörð Bjarmason
  2013-03-24 19:16 ` Ilari Liusvaara
@ 2013-03-25 20:01 ` Junio C Hamano
  2013-03-25 20:05   ` Jeff King
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
  3 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2013-03-25 20:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> We _do_ see a problem during the checkout phase, but we don't propagate
> a checkout failure to the exit code from clone.  That is bad in general,
> and should probably be fixed. Though it would never find corruption of
> older objects in the history, anyway, so checkout should not be relied
> on for robustness.

It is obvious that we should exit with non-zero status when we see a
failure from the checkout, but do we want to nuke the resulting
repository as in the case of normal transport failure?  A checkout
failure might be due to being under quota for object store but
running out of quota upon populating the working tree, in which case
we probably do not want to.

> We do not notice the sha1 mis-match on the sending side (which we could,
> if we checked the sha1 as we were sending). We do not notice the broken
> object graph during the receive process either. I would have expected
> check_everything_connected to handle this, but we don't actually call it
> during clone! If you do this:
>
>   $ git init non-local && cd non-local && git fetch ..
>   remote: Counting objects: 3, done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Unpacking objects: 100% (3/3), done.
>   fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
>   error: .. did not send all necessary objects
>
> we do notice.

Yes, it is OK to add connectedness check to "git clone".

> And one final check:
>
>   $ git -c transfer.fsckobjects=1 clone --no-local . fsck
>   Cloning into 'fsck'...
>   remote: Counting objects: 3, done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Receiving objects: 100% (3/3), done.
>   error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
>   fatal: object of unexpected type
>   fatal: index-pack failed
>
> Fscking the incoming objects does work, but of course it comes at a cost
> in the normal case (for linux-2.6, I measured an increase in CPU time
> with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
> it is probably overkill for finding corruption; index-pack already
> recognizes bit corruption inside an object, and
> check_everything_connected can detect object graph problems much more
> cheaply.

> One thing I didn't check is bit corruption inside a packed object that
> still correctly zlib inflates. check_everything_connected will end up
> reading all of the commits and trees (to walk them), but not the blobs.

Correct.

> So I think at the very least we should:
>
>   1. Make sure clone propagates errors from checkout to the final exit
>      code.
>
>   2. Teach clone to run check_everything_connected.

I agree with both but with a slight reservation on the former one
(see above).

Thanks.


* Re: propagating repo corruption across clone
  2013-03-25 20:01 ` Junio C Hamano
@ 2013-03-25 20:05   ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Mar 25, 2013 at 01:01:59PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We _do_ see a problem during the checkout phase, but we don't propagate
> > a checkout failure to the exit code from clone.  That is bad in general,
> > and should probably be fixed. Though it would never find corruption of
> > older objects in the history, anyway, so checkout should not be relied
> > on for robustness.
> 
> It is obvious that we should exit with non-zero status when we see a
> failure from the checkout, but do we want to nuke the resulting
> repository as in the case of normal transport failure?  A checkout
> failure might be due to being under quota for object store but
> running out of quota upon populating the working tree, in which case
> we probably do not want to.

I'm just running through my final tests on a large-ish patch series
which deals with this (among other issues). I had the same thought,
though we do already die on a variety of checkout errors. I left it as a
die() for now, but I think we should potentially address it with a
further patch.

> >   $ git init non-local && cd non-local && git fetch ..
> >   remote: Counting objects: 3, done.
> >   remote: Total 3 (delta 0), reused 3 (delta 0)
> >   Unpacking objects: 100% (3/3), done.
> >   fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
> >   error: .. did not send all necessary objects
> >
> > we do notice.
> 
> Yes, it is OK to add a connectedness check to "git clone".

That's in my series, too. Unfortunately, in the local clone case, it
slows down the clone considerably (since we otherwise would not have to
traverse the objects at all).
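Roughly speaking, that connectivity check amounts to the "rev-list
--objects --all" walk mentioned later in the series. A self-contained
sketch of what it catches (assuming git is on PATH, and using a deleted
loose blob as a stand-in for corruption):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
echo content >file && git add file
git -c user.name=t -c user.email=t@example.com commit -qm one

# a healthy repo: the full object walk succeeds
git rev-list --objects --all >/dev/null && echo connected

# delete the blob to simulate a missing object; the walk now fails
blob=$(git rev-parse HEAD:file)
rm -f ".git/objects/$(echo $blob | sed 's,..,&/,')"
walk_failed=0
git rev-list --objects --all >/dev/null 2>&1 || walk_failed=1
echo "walk_failed=$walk_failed"
```

This is also why the check is cheap relative to a full fsck: the walk
proves every object is present and that commits and trees parse, but it
never inflates blob contents.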

-Peff


* Re: propagating repo corruption across clone
  2013-03-25 16:32             ` Jeff Mitchell
@ 2013-03-25 20:07               ` Jeff King
  2013-03-26 13:43                 ` Jeff Mitchell
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:07 UTC (permalink / raw)
  To: Jeff Mitchell; +Cc: Duy Nguyen, Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:

> I think what was conflating the issue in my testing is that --mirror
> implies --bare, so there would be no checking of the objects during
> working-tree creation; hence --mirror won't show the error a normal
> clone will. It's not a transport question, it's just that a normal
> clone does more and so has more data run through checks.

Exactly.

> However, there are still problems. For blob corruptions, even in this
> --no-hardlinks, non --mirror case where an error was found, the exit
> code from the clone was 0. I can see this tripping up all sorts of
> automated scripts or repository GUIs that ignore the output and only
> check the error code, which is not an unreasonable thing to do.

Yes, this is a bug. I'll post a series in a minute which fixes it.

> For commit corruptions, the --no-hardlinks, non --mirror case refused
> to create the new repository and exited with an error code of 128. The
> --no-hardlinks, --mirror case spewed errors to the console, yet
> *still* created the new clone *and* returned an error code of zero.

I wasn't able to reproduce this; can you post a succinct test case?

> It seems that when there is an "error" as opposed to a "fatal" it
> doesn't affect the status code on a clone; I'd argue that it ought to.

Agreed completely. The current behavior is buggy.

-Peff


* [PATCH 0/9] corrupt object potpourri
  2013-03-24 18:31 propagating repo corruption across clone Jeff King
                   ` (2 preceding siblings ...)
  2013-03-25 20:01 ` Junio C Hamano
@ 2013-03-25 20:14 ` Jeff King
  2013-03-25 20:16   ` [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream Jeff King
                     ` (8 more replies)
  3 siblings, 9 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

I started these patches with the intent of improving clone's behavior
on corrupt objects, but my testing uncovered some other nastiness,
including two infinite loops in the streaming code! Yikes.

I think 1-7 are good. We might want to tweak the die() behavior of patch
8, but I think it should come on top. Patch 9 has some pretty ugly
performance implications.

At the end of the series, all of the introduced tests pass except for
one, which is that "git clone" may silently write out a bogus working
tree entry. I haven't tracked that one down yet.

  [1/9]: stream_blob_to_fd: detect errors reading from stream
  [2/9]: check_sha1_signature: check return value from read_istream
  [3/9]: read_istream_filtered: propagate read error from upstream
  [4/9]: avoid infinite loop in read_istream_loose
  [5/9]: add test for streaming corrupt blobs
  [6/9]: streaming_write_entry: propagate streaming errors
  [7/9]: add tests for cloning corrupted repositories
  [8/9]: clone: die on errors from unpack_trees
  [9/9]: clone: run check_everything_connected

-Peff


* [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
@ 2013-03-25 20:16   ` Jeff King
  2013-03-26 21:27     ` Junio C Hamano
  2013-03-25 20:17   ` [PATCH 2/9] check_sha1_signature: check return value from read_istream Jeff King
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We call read_istream, but never check its return value for
errors. This can lead to us looping infinitely, as we just
keep trying to write "-1" bytes (and we do not notice the
error, as we simply check that write_in_full reports the
same number of bytes we fed it, which of course is also -1).

Signed-off-by: Jeff King <peff@peff.net>
---
No test yet, as my method for triggering this causes _another_ infinite
loop. So the test comes after the fixes, to avoid infinite loops when
bisecting the history later. :)

 streaming.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/streaming.c b/streaming.c
index 4d978e5..f4126a7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -514,6 +514,8 @@ int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *f
 		ssize_t wrote, holeto;
 		ssize_t readlen = read_istream(st, buf, sizeof(buf));
 
+		if (readlen < 0)
+			goto close_and_exit;
 		if (!readlen)
 			break;
 		if (can_seek && sizeof(buf) == readlen) {
-- 
1.8.2.13.g0f18d3c


* [PATCH 2/9] check_sha1_signature: check return value from read_istream
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
  2013-03-25 20:16   ` [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream Jeff King
@ 2013-03-25 20:17   ` Jeff King
  2013-03-25 20:18   ` [PATCH 3/9] read_istream_filtered: propagate read error from upstream Jeff King
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

It's possible for read_istream to return an error, in which
case we just end up in an infinite loop (aside from EOF, we
do not even look at the result, but just feed it straight
into our running hash).

Signed-off-by: Jeff King <peff@peff.net>
---
I didn't actually trigger this code path in any of my tests, but I
audited all of the callers of read_istream after the last patch, and
noticed this one (the rest looked fine to me).

 sha1_file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index 16967d3..0b99f33 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1266,6 +1266,10 @@ int check_sha1_signature(const unsigned char *sha1, void *map,
 		char buf[1024 * 16];
 		ssize_t readlen = read_istream(st, buf, sizeof(buf));
 
+		if (readlen < 0) {
+			close_istream(st);
+			return -1;
+		}
 		if (!readlen)
 			break;
 		git_SHA1_Update(&c, buf, readlen);
-- 
1.8.2.13.g0f18d3c


* [PATCH 3/9] read_istream_filtered: propagate read error from upstream
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
  2013-03-25 20:16   ` [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream Jeff King
  2013-03-25 20:17   ` [PATCH 2/9] check_sha1_signature: check return value from read_istream Jeff King
@ 2013-03-25 20:18   ` Jeff King
  2013-03-25 20:21   ` [PATCH 4/9] avoid infinite loop in read_istream_loose Jeff King
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:18 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The filter istream pulls data from an "upstream" stream,
running it through a filter function. However, we did not
properly notice when the upstream filter yielded an error,
and just returned what we had read. Instead, we should
propagate the error.

Signed-off-by: Jeff King <peff@peff.net>
---
I don't know if we should be preserving fs->i_end from getting a
negative value. I would think the internal state of the istream after an
error is undefined.

 streaming.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streaming.c b/streaming.c
index f4126a7..f4ab12b 100644
--- a/streaming.c
+++ b/streaming.c
@@ -237,7 +237,7 @@ static read_method_decl(filtered)
 		if (!fs->input_finished) {
 			fs->i_end = read_istream(fs->upstream, fs->ibuf, FILTER_BUFFER);
 			if (fs->i_end < 0)
-				break;
+				return -1;
 			if (fs->i_end)
 				continue;
 		}
-- 
1.8.2.13.g0f18d3c


* [PATCH 4/9] avoid infinite loop in read_istream_loose
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
                     ` (2 preceding siblings ...)
  2013-03-25 20:18   ` [PATCH 3/9] read_istream_filtered: propagate read error from upstream Jeff King
@ 2013-03-25 20:21   ` Jeff King
  2013-03-25 20:21   ` [PATCH 5/9] add test for streaming corrupt blobs Jeff King
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The read_istream_loose function loops on inflating a chunk of data
from an mmap'd loose object. We end the loop when we run out
of space in our output buffer, or if we see a zlib error.

We need to treat Z_BUF_ERROR specially, though, as it is not
fatal; it is just zlib's way of telling us that we need to
either feed it more input or give it more output space. It
is perfectly normal for us to hit this when we are at the
end of our buffer.

However, we may also get Z_BUF_ERROR because we have run out
of input. In a well-formed object, this should not happen,
because we have fed the whole mmap'd contents to zlib. But
if the object is truncated or corrupt, we will loop forever,
never giving zlib any more data, but continuing to ask it to
inflate.

We can fix this by considering it an error when zlib returns
Z_BUF_ERROR but we still have output space left (which means
it must want more input, which we know is a truncation
error). It would not be sufficient to just check whether
zlib had consumed all the input at the start of the loop, as
it might still want to generate output from what is in its
internal state.

Signed-off-by: Jeff King <peff@peff.net>
---
The read_istream_pack_non_delta function does not suffer from the same
issue, because it continually feeds more data via use_pack(), though it
may run into problems if it reads to the very end of a pack. I also
didn't audit the other zlib code paths for similar problems; we may want
to do that.

 streaming.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streaming.c b/streaming.c
index f4ab12b..cabcd9d 100644
--- a/streaming.c
+++ b/streaming.c
@@ -309,7 +309,7 @@ static read_method_decl(loose)
 			st->z_state = z_done;
 			break;
 		}
-		if (status != Z_OK && status != Z_BUF_ERROR) {
+		if (status != Z_OK && (status != Z_BUF_ERROR || total_read < sz)) {
 			git_inflate_end(&st->z);
 			st->z_state = z_error;
 			return -1;
-- 
1.8.2.13.g0f18d3c


* [PATCH 5/9] add test for streaming corrupt blobs
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
                     ` (3 preceding siblings ...)
  2013-03-25 20:21   ` [PATCH 4/9] avoid infinite loop in read_istream_loose Jeff King
@ 2013-03-25 20:21   ` Jeff King
  2013-03-25 21:10     ` Jonathan Nieder
  2013-03-27 20:27     ` Jeff King
  2013-03-25 20:22   ` [PATCH 6/9] streaming_write_entry: propagate streaming errors Jeff King
                     ` (3 subsequent siblings)
  8 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We do not have many tests for handling corrupt objects. This
new test at least checks that we detect a byte error in a
corrupt blob object while streaming it out with cat-file.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1060-object-corruption.sh | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100755 t/t1060-object-corruption.sh

diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
new file mode 100755
index 0000000..d36994a
--- /dev/null
+++ b/t/t1060-object-corruption.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+test_description='see how we handle various forms of corruption'
+. ./test-lib.sh
+
+# convert "1234abcd" to ".git/objects/12/34abcd"
+obj_to_file() {
+	echo "$(git rev-parse --git-dir)/objects/$(git rev-parse "$1" | sed 's,..,&/,')"
+}
+
+# Convert byte at offset "$2" of object "$1" into '\0'
+corrupt_byte() {
+	obj_file=$(obj_to_file "$1") &&
+	chmod +w "$obj_file" &&
+	printf '\0' | dd of="$obj_file" bs=1 seek="$2"
+}
+
+test_expect_success 'setup corrupt repo' '
+	git init bit-error &&
+	(
+		cd bit-error &&
+		test_commit content &&
+		corrupt_byte HEAD:content.t 10
+	)
+'
+
+test_expect_success 'streaming a corrupt blob fails' '
+	(
+		cd bit-error &&
+		test_must_fail git cat-file blob HEAD:content.t
+	)
+'
+
+test_done
-- 
1.8.2.13.g0f18d3c


* [PATCH 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
                     ` (4 preceding siblings ...)
  2013-03-25 20:21   ` [PATCH 5/9] add test for streaming corrupt blobs Jeff King
@ 2013-03-25 20:22   ` Jeff King
  2013-03-25 21:35     ` Eric Sunshine
  2013-03-25 21:39     ` Jonathan Nieder
  2013-03-25 20:22   ` [PATCH 7/9] add tests for cloning corrupted repositories Jeff King
                     ` (2 subsequent siblings)
  8 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.

Signed-off-by: Jeff King <peff@peff.net>
---
 entry.c                      |  6 ++++--
 t/t1060-object-corruption.sh | 25 +++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/entry.c b/entry.c
index 17a6bcc..002b2f2 100644
--- a/entry.c
+++ b/entry.c
@@ -126,8 +126,10 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
 	fd = open_output_fd(path, ce, to_tempfile);
 	if (0 <= fd) {
 		result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
-		*fstat_done = fstat_output(fd, state, statbuf);
-		result = close(fd);
+		if (!result) {
+			*fstat_done = fstat_output(fd, state, statbuf);
+			result = close(fd);
+		}
 	}
 	if (result && 0 <= fd)
 		unlink(path);
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index d36994a..0792132 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -24,6 +24,15 @@ test_expect_success 'setup corrupt repo' '
 	)
 '
 
+test_expect_success 'setup repo with missing object' '
+	git init missing &&
+	(
+		cd missing &&
+		test_commit content &&
+		rm -f "$(obj_to_file HEAD:content.t)"
+	)
+'
+
 test_expect_success 'streaming a corrupt blob fails' '
 	(
 		cd bit-error &&
@@ -31,4 +40,20 @@ test_expect_success 'streaming a corrupt blob fails' '
 	)
 '
 
+test_expect_success 'read-tree -u detects bit-errors in blobs' '
+	(
+		cd bit-error &&
+		rm content.t &&
+		test_must_fail git read-tree --reset -u FETCH_HEAD
+	)
+'
+
+test_expect_success 'read-tree -u detects missing objects' '
+	(
+		cd missing &&
+		rm content.t &&
+		test_must_fail git read-tree --reset -u FETCH_HEAD
+	)
+'
+
 test_done
-- 
1.8.2.13.g0f18d3c


* [PATCH 7/9] add tests for cloning corrupted repositories
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
                     ` (5 preceding siblings ...)
  2013-03-25 20:22   ` [PATCH 6/9] streaming_write_entry: propagate streaming errors Jeff King
@ 2013-03-25 20:22   ` Jeff King
  2013-03-25 20:23   ` [PATCH 8/9] clone: die on errors from unpack_trees Jeff King
  2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
  8 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We try not to let corruption pass unnoticed over fetches and
clones. For the most part, this works, but there are some
broken corner cases, including:

  1. We do not detect missing objects over git-aware
     transports. This is a little hard to test, because the
     sending side will actually complain about the missing
     object.

     To fool it, we corrupt a repository such that we have a
     "misnamed" object: it claims to be sha1 X, but is
     really Y. This lets the sender blindly transmit it, but
     it is the receiver's responsibility to verify that what
     it got is sane (and it does not).

  2. We do not detect missing or misnamed blobs during the
     checkout phase of clone.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1060-object-corruption.sh | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 0792132..eb285c0 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -33,6 +33,19 @@ test_expect_success 'setup repo with missing object' '
 	)
 '
 
+test_expect_success 'setup repo with misnamed object' '
+	git init misnamed &&
+	(
+		cd misnamed &&
+		test_commit content &&
+		good=$(obj_to_file HEAD:content.t) &&
+		blob=$(echo corrupt | git hash-object -w --stdin) &&
+		bad=$(obj_to_file $blob) &&
+		rm -f "$good" &&
+		mv "$bad" "$good"
+	)
+'
+
 test_expect_success 'streaming a corrupt blob fails' '
 	(
 		cd bit-error &&
@@ -56,4 +69,32 @@ test_expect_success 'read-tree -u detects missing objects' '
 	)
 '
 
+# We use --bare to make sure that the transport detects it, not the checkout
+# phase.
+test_expect_success 'clone --no-local --bare detects corruption' '
+	test_must_fail git clone --no-local --bare bit-error corrupt-transport
+'
+
+test_expect_success 'clone --no-local --bare detects missing object' '
+	test_must_fail git clone --no-local --bare missing missing-transport
+'
+
+test_expect_failure 'clone --no-local --bare detects misnamed object' '
+	test_must_fail git clone --no-local --bare misnamed misnamed-transport
+'
+
+# We do not expect --local to detect corruption at the transport layer,
+# so we are really checking the checkout() code path.
+test_expect_success 'clone --local detects corruption' '
+	test_must_fail git clone --local bit-error corrupt-checkout
+'
+
+test_expect_failure 'clone --local detects missing objects' '
+	test_must_fail git clone --local missing missing-checkout
+'
+
+test_expect_failure 'clone --local detects misnamed objects' '
+	test_must_fail git clone --local misnamed misnamed-checkout
+'
+
 test_done
-- 
1.8.2.13.g0f18d3c


* [PATCH 8/9] clone: die on errors from unpack_trees
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
                     ` (6 preceding siblings ...)
  2013-03-25 20:22   ` [PATCH 7/9] add tests for cloning corrupted repositories Jeff King
@ 2013-03-25 20:23   ` Jeff King
  2013-03-26 21:40     ` Junio C Hamano
  2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
  8 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

When clone is populating the working tree, it ignores the
return status from unpack_trees; this means we may report a
successful clone, even when the checkout fails.

When checkout fails, we may want to leave the $GIT_DIR in
place, as it might be possible to recover the data through
further use of "git checkout" (e.g., if the checkout failed
due to a transient error, disk full, etc). However, we
already die on a number of other checkout-related errors, so
this patch follows that pattern.

In addition to marking a now-passing test, we need to adjust
t5710, which blindly assumed it could make bogus clones of
very deep alternates hierarchies. By using "--bare", we can
avoid it actually touching any objects.

Signed-off-by: Jeff King <peff@peff.net>
---
I think the "leave the data behind" fix may be to just set "junk_pid =
0" a little sooner in cmd_clone (i.e., before checkout()). Then we
would still die, but at least leave the fetched objects intact.

 builtin/clone.c              | 3 ++-
 t/t1060-object-corruption.sh | 2 +-
 t/t5710-info-alternate.sh    | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index e0aaf13..7d48ef3 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -579,7 +579,8 @@ static int checkout(void)
 	tree = parse_tree_indirect(sha1);
 	parse_tree(tree);
 	init_tree_desc(&t, tree->buffer, tree->size);
-	unpack_trees(1, &t, &opts);
+	if (unpack_trees(1, &t, &opts) < 0)
+		die(_("unable to checkout working tree"));
 
 	if (write_cache(fd, active_cache, active_nr) ||
 	    commit_locked_index(lock_file))
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index eb285c0..05ba4e7 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -89,7 +89,7 @@ test_expect_success 'clone --local detects corruption' '
 	test_must_fail git clone --local bit-error corrupt-checkout
 '
 
-test_expect_failure 'clone --local detects missing objects' '
+test_expect_success 'clone --local detects missing objects' '
 	test_must_fail git clone --local missing missing-checkout
 '
 
diff --git a/t/t5710-info-alternate.sh b/t/t5710-info-alternate.sh
index aa04529..5a6e49d 100755
--- a/t/t5710-info-alternate.sh
+++ b/t/t5710-info-alternate.sh
@@ -58,7 +58,7 @@ git clone -l -s F G &&
 git clone -l -s D E &&
 git clone -l -s E F &&
 git clone -l -s F G &&
-git clone -l -s G H'
+git clone --bare -l -s G H'
 
 test_expect_success 'invalidity of deepest repository' \
 'cd H && {
-- 
1.8.2.13.g0f18d3c


* [PATCH 9/9] clone: run check_everything_connected
  2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
                     ` (7 preceding siblings ...)
  2013-03-25 20:23   ` [PATCH 8/9] clone: die on errors from unpack_trees Jeff King
@ 2013-03-25 20:26   ` Jeff King
  2013-03-26  0:53     ` Duy Nguyen
                       ` (2 more replies)
  8 siblings, 3 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 20:26 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

When we fetch from a remote, we do a revision walk to make
sure that what we received is connected to our existing
history. We do not do the same check for clone, which should
be able to check that we received an intact history graph.

The upside of this patch is that it will make clone more
resilient against propagating repository corruption. The
downside is that we will now traverse "rev-list --objects
--all" down to the roots, which may take some time (it is
especially noticeable for a "--local --bare" clone).

Note that we need to adjust t5710, which tries to make such
a bogus clone. Rather than checking after the fact that our
clone is bogus, we can simplify it to just make sure "git
clone" reports failure.

Signed-off-by: Jeff King <peff@peff.net>
---
The slowdown is really quite terrible if you try "git clone --bare
linux-2.6.git". Even with this, the local-clone case already misses blob
corruption. So it probably makes sense to restrict it to just the
non-local clone case, which already has to do more work.

Even still, it adds a non-trivial amount of work (linux-2.6 takes
something like a minute to check). I don't like the idea of declaring
"git clone" non-safe unless you turn on transfer.fsckObjects, though. It
should have the same safety as "git fetch".

 builtin/clone.c              | 26 ++++++++++++++++++++++++++
 t/t1060-object-corruption.sh |  2 +-
 t/t5710-info-alternate.sh    |  8 +-------
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 7d48ef3..eceaa74 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,7 @@
 #include "branch.h"
 #include "remote.h"
 #include "run-command.h"
+#include "connected.h"
 
 /*
  * Overall FIXMEs:
@@ -485,12 +486,37 @@ static void update_remote_refs(const struct ref *refs,
 	}
 }
 
+static int iterate_ref_map(void *cb_data, unsigned char sha1[20])
+{
+	struct ref **rm = cb_data;
+	struct ref *ref = *rm;
+
+	/*
+	 * Skip anything missing a peer_ref, which we are not
+	 * actually going to write a ref for.
+	 */
+	while (ref && !ref->peer_ref)
+		ref = ref->next;
+	/* Returning -1 notes "end of list" to the caller. */
+	if (!ref)
+		return -1;
+
+	hashcpy(sha1, ref->old_sha1);
+	*rm = ref->next;
+	return 0;
+}
+
 static void update_remote_refs(const struct ref *refs,
 			       const struct ref *mapped_refs,
 			       const struct ref *remote_head_points_at,
 			       const char *branch_top,
 			       const char *msg)
 {
+	const struct ref *rm = mapped_refs;
+
+	if (check_everything_connected(iterate_ref_map, 0, &rm))
+		die(_("remote did not send all necessary objects"));
+
 	if (refs) {
 		write_remote_refs(mapped_refs);
 		if (option_single_branch)
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 05ba4e7..fd314ef 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -79,7 +79,7 @@ test_expect_success 'clone --no-local --bare detects missing object' '
 	test_must_fail git clone --no-local --bare missing missing-transport
 '
 
-test_expect_failure 'clone --no-local --bare detects misnamed object' '
+test_expect_success 'clone --no-local --bare detects misnamed object' '
 	test_must_fail git clone --no-local --bare misnamed misnamed-transport
 '
 
diff --git a/t/t5710-info-alternate.sh b/t/t5710-info-alternate.sh
index 5a6e49d..8956c21 100755
--- a/t/t5710-info-alternate.sh
+++ b/t/t5710-info-alternate.sh
@@ -58,13 +58,7 @@ git clone -l -s F G &&
 git clone -l -s D E &&
 git clone -l -s E F &&
 git clone -l -s F G &&
-git clone --bare -l -s G H'
-
-test_expect_success 'invalidity of deepest repository' \
-'cd H && {
-	test_valid_repo
-	test $? -ne 0
-}'
+test_must_fail git clone --bare -l -s G H'
 
 cd "$base_dir"
 
-- 
1.8.2.13.g0f18d3c


* Re: [PATCH 5/9] add test for streaming corrupt blobs
  2013-03-25 20:21   ` [PATCH 5/9] add test for streaming corrupt blobs Jeff King
@ 2013-03-25 21:10     ` Jonathan Nieder
  2013-03-25 21:26       ` Jeff King
  2013-03-27 20:27     ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Jonathan Nieder @ 2013-03-25 21:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

Jeff King wrote:

> We do not have many tests for handling corrupt objects. This
> new test at least checks that we detect a byte error in a
> corrupt blob object while streaming it out with cat-file.

Thanks.

[...]
> +# convert "1234abcd" to ".git/objects/12/34abcd"
> +obj_to_file() {
> +	echo "$(git rev-parse --git-dir)/objects/$(git rev-parse "$1" | sed 's,..,&/,')"
> +}

Maybe this would be clearer in multiple lines?

	commit=$(git rev-parse --verify "$1") &&
	git_dir=$(git rev-parse --git-dir) &&
	tail=${commit#??} &&
	echo "$git_dir/objects/${commit%$tail}/$tail"

> +
> +# Convert byte at offset "$2" of object "$1" into '\0'
> +corrupt_byte() {
> +	obj_file=$(obj_to_file "$1") &&
> +	chmod +w "$obj_file" &&
> +	printf '\0' | dd of="$obj_file" bs=1 seek="$2"

Some other tests such as t4205 also rely on "printf" being
binary-safe.  Phew.
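(A quick way to confirm that printf '\0' really does emit a single NUL
byte, for anyone double-checking that assumption:)

```shell
# od shows the raw bytes printf produced; a single NUL is hex 00
nul=$(printf '\0' | od -An -tx1 | tr -d ' \n')
echo "byte=$nul"
```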

> +}
> +
> +test_expect_success 'setup corrupt repo' '
> +	git init bit-error &&
> +	(
> +		cd bit-error &&
> +		test_commit content &&
> +		corrupt_byte HEAD:content.t 10
> +	)
> +'
> +
> +test_expect_success 'streaming a corrupt blob fails' '

"fails gracefully", maybe, to be more precise.

With or without the two changes suggested above,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>


* Re: [PATCH 5/9] add test for streaming corrupt blobs
  2013-03-25 21:10     ` Jonathan Nieder
@ 2013-03-25 21:26       ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 21:26 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Junio C Hamano

On Mon, Mar 25, 2013 at 02:10:38PM -0700, Jonathan Nieder wrote:

> > +# convert "1234abcd" to ".git/objects/12/34abcd"
> > +obj_to_file() {
> > +	echo "$(git rev-parse --git-dir)/objects/$(git rev-parse "$1" | sed 's,..,&/,')"
> > +}
> 
> Maybe this would be clearer in multiple lines?
> 
> 	commit=$(git rev-parse --verify "$1") &&
> 	git_dir=$(git rev-parse --git-dir) &&
> 	tail=${commit#??} &&
> 	echo "$git_dir/objects/${commit%$tail}/$tail"

Yeah, it started as:

  echo "$1" | sed 's,..,&/,'

and kind of got out of hand as it grew features. I'd be fine with your
version (though $commit is not right, as it is any object, and in fact
the test uses blobs).

> > +
> > +# Convert byte at offset "$2" of object "$1" into '\0'
> > +corrupt_byte() {
> > +	obj_file=$(obj_to_file "$1") &&
> > +	chmod +w "$obj_file" &&
> > +	printf '\0' | dd of="$obj_file" bs=1 seek="$2"
> 
> Some other tests such as t4205 also rely on "printf" being
> binary-safe.  Phew.

Yeah, I think it should be fine, though the choice of character does not
actually matter, as long as it is different from what is currently at
that position (for the sake of simplicity, I just determined
experimentally that the given object is corrupted with the offset and
character I chose).

-Peff


* Re: [PATCH 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 20:22   ` [PATCH 6/9] streaming_write_entry: propagate streaming errors Jeff King
@ 2013-03-25 21:35     ` Eric Sunshine
  2013-03-25 21:37       ` Jeff King
  2013-03-25 21:39     ` Jonathan Nieder
  1 sibling, 1 reply; 60+ messages in thread
From: Eric Sunshine @ 2013-03-25 21:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano

On Mon, Mar 25, 2013 at 4:22 PM, Jeff King <peff@peff.net> wrote:
> diff --git a/entry.c b/entry.c
> index 17a6bcc..002b2f2 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -126,8 +126,10 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
>         fd = open_output_fd(path, ce, to_tempfile);
>         if (0 <= fd) {
>                 result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
> -               *fstat_done = fstat_output(fd, state, statbuf);
> -               result = close(fd);
> +               if (!result) {
> +                       *fstat_done = fstat_output(fd, state, statbuf);
> +                       result = close(fd);
> +               }

Is this intentionally leaking the opened 'fd' when stream_blob_to_fd()
returns an error?

>         }
>         if (result && 0 <= fd)
>                 unlink(path);

Won't the unlink() now fail on Windows since 'fd' is still open?

-- ES

* Re: [PATCH 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 21:35     ` Eric Sunshine
@ 2013-03-25 21:37       ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 21:37 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Junio C Hamano

On Mon, Mar 25, 2013 at 05:35:51PM -0400, Eric Sunshine wrote:

> On Mon, Mar 25, 2013 at 4:22 PM, Jeff King <peff@peff.net> wrote:
> > diff --git a/entry.c b/entry.c
> > index 17a6bcc..002b2f2 100644
> > --- a/entry.c
> > +++ b/entry.c
> > @@ -126,8 +126,10 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
> >         fd = open_output_fd(path, ce, to_tempfile);
> >         if (0 <= fd) {
> >                 result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
> > -               *fstat_done = fstat_output(fd, state, statbuf);
> > -               result = close(fd);
> > +               if (!result) {
> > +                       *fstat_done = fstat_output(fd, state, statbuf);
> > +                       result = close(fd);
> > +               }
> 
> Is this intentionally leaking the opened 'fd' when stream_blob_to_fd()
> returns an error?

Good catch. I was so focused on making sure we still called unlink that
I forgot about the cleanup side-effect of close.

I'll re-roll it.

-Peff

* Re: [PATCH 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 20:22   ` [PATCH 6/9] streaming_write_entry: propagate streaming errors Jeff King
  2013-03-25 21:35     ` Eric Sunshine
@ 2013-03-25 21:39     ` Jonathan Nieder
  2013-03-25 21:49       ` [PATCH v2 " Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Jonathan Nieder @ 2013-03-25 21:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

Jeff King wrote:

> When we are streaming an index blob to disk, we store the
> error from stream_blob_to_fd in the "result" variable, and
> then immediately overwrite that with the return value of
> "close".

Good catch.

[...]
> --- a/entry.c
> +++ b/entry.c
> @@ -126,8 +126,10 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
>  	fd = open_output_fd(path, ce, to_tempfile);
>  	if (0 <= fd) {
>  		result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
> -		*fstat_done = fstat_output(fd, state, statbuf);
> -		result = close(fd);
> +		if (!result) {
> +			*fstat_done = fstat_output(fd, state, statbuf);
> +			result = close(fd);
> +		}

Should this do something like


	{
		int fd, result = 0;

		fd = open_output_fd(path, ce, to_tempfile);
		if (fd < 0)
			return -1;

		result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
		if (result)
			goto close_fd;

		*fstat_done = fstat_output(fd, state, statbuf);
	close_fd:
		result |= close(fd);
		if (result)
			unlink(path);
		return result;
	}

to avoid leaking the file descriptor?

> @@ -31,4 +40,20 @@ test_expect_success 'streaming a corrupt blob fails' '
>  	)
>  '
>  
> +test_expect_success 'read-tree -u detects bit-errors in blobs' '
> +	(
> +		cd bit-error &&
> +		rm content.t &&
> +		test_must_fail git read-tree --reset -u FETCH_HEAD
> +	)

Makes sense.  Might make sense to use "rm -f" instead of "rm" to avoid
failures if content.t is removed already.

> +'
> +
> +test_expect_success 'read-tree -u detects missing objects' '
> +	(
> +		cd missing &&
> +		rm content.t &&

Especially here.

Thanks,
Jonathan

* [PATCH v2 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 21:39     ` Jonathan Nieder
@ 2013-03-25 21:49       ` Jeff King
  2013-03-25 23:29         ` Jonathan Nieder
  2013-03-26 21:38         ` Junio C Hamano
  0 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2013-03-25 21:49 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Eric Sunshine, git, Junio C Hamano

On Mon, Mar 25, 2013 at 02:39:34PM -0700, Jonathan Nieder wrote:

> > --- a/entry.c
> > +++ b/entry.c
> > @@ -126,8 +126,10 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
> >  	fd = open_output_fd(path, ce, to_tempfile);
> >  	if (0 <= fd) {
> >  		result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
> > -		*fstat_done = fstat_output(fd, state, statbuf);
> > -		result = close(fd);
> > +		if (!result) {
> > +			*fstat_done = fstat_output(fd, state, statbuf);
> > +			result = close(fd);
> > +		}
> 
> Should this do something like
> [...]
> to avoid leaking the file descriptor?

Yes, Eric Sunshine noticed this, too. Re-rolled patch is below, which I
think is even a little cleaner.

> > +test_expect_success 'read-tree -u detects bit-errors in blobs' '
> > +	(
> > +		cd bit-error &&
> > +		rm content.t &&
> > +		test_must_fail git read-tree --reset -u FETCH_HEAD
> > +	)
> 
> Makes sense.  Might make sense to use "rm -f" instead of "rm" to avoid
> failures if content.t is removed already.

Yeah, good point. My original test looked like:

  git init bit-error &&
  git fetch .. &&
  corrupt ...
  test_must_fail ...

but I ended up refactoring it to re-use the corrupted directories, and
added the "rm" after the fact. The use of FETCH_HEAD is also bogus
(read-tree is failing, but because we are giving it a bogus ref, not
because of the corruption, so we are not actually testing anything
anymore, even though it still passes).

Both fixed in my re-roll.

-- >8 --
Subject: [PATCH] streaming_write_entry: propagate streaming errors

When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.

We can fix this by using bitwise-OR to accumulate errors in
our result variable.

While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.

Signed-off-by: Jeff King <peff@peff.net>
---
 entry.c                      | 16 +++++++++-------
 t/t1060-object-corruption.sh | 25 +++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/entry.c b/entry.c
index 17a6bcc..a20bcbc 100644
--- a/entry.c
+++ b/entry.c
@@ -120,16 +120,18 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
 				 const struct checkout *state, int to_tempfile,
 				 int *fstat_done, struct stat *statbuf)
 {
-	int result = -1;
+	int result = 0;
 	int fd;
 
 	fd = open_output_fd(path, ce, to_tempfile);
-	if (0 <= fd) {
-		result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
-		*fstat_done = fstat_output(fd, state, statbuf);
-		result = close(fd);
-	}
-	if (result && 0 <= fd)
+	if (fd < 0)
+		return -1;
+
+	result |= stream_blob_to_fd(fd, ce->sha1, filter, 1);
+	*fstat_done = fstat_output(fd, state, statbuf);
+	result |= close(fd);
+
+	if (result)
 		unlink(path);
 	return result;
 }
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index d36994a..2945395 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -24,6 +24,15 @@ test_expect_success 'setup corrupt repo' '
 	)
 '
 
+test_expect_success 'setup repo with missing object' '
+	git init missing &&
+	(
+		cd missing &&
+		test_commit content &&
+		rm -f "$(obj_to_file HEAD:content.t)"
+	)
+'
+
 test_expect_success 'streaming a corrupt blob fails' '
 	(
 		cd bit-error &&
@@ -31,4 +40,20 @@ test_expect_success 'streaming a corrupt blob fails' '
 	)
 '
 
+test_expect_success 'read-tree -u detects bit-errors in blobs' '
+	(
+		cd bit-error &&
+		rm -f content.t &&
+		test_must_fail git read-tree --reset -u HEAD
+	)
+'
+
+test_expect_success 'read-tree -u detects missing objects' '
+	(
+		cd missing &&
+		rm -f content.t &&
+		test_must_fail git read-tree --reset -u HEAD
+	)
+'
+
 test_done
-- 
1.8.2.13.g0f18d3c

* Re: [PATCH v2 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 21:49       ` [PATCH v2 " Jeff King
@ 2013-03-25 23:29         ` Jonathan Nieder
  2013-03-26 21:38         ` Junio C Hamano
  1 sibling, 0 replies; 60+ messages in thread
From: Jonathan Nieder @ 2013-03-25 23:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, git, Junio C Hamano

Jeff King wrote:

> Both fixed in my re-roll.

Thanks!  This and the rest of the patches up to and including patch 8
look good to me.

I haven't decided what to think about patch 9 yet, but I suspect it
would be good, too.  In the long term I suspect "git clone
--worktree-only <repo>" (or some other standard interface for
git-new-workdir functionality) is a better way to provide a convenient
lightweight same-machine clone anyway.

Jonathan

* Re: [PATCH 9/9] clone: run check_everything_connected
  2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
@ 2013-03-26  0:53     ` Duy Nguyen
  2013-03-26 22:24       ` Jeff King
  2013-03-26 21:50     ` Junio C Hamano
  2013-03-28  0:40     ` Duy Nguyen
  2 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2013-03-26  0:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Tue, Mar 26, 2013 at 3:26 AM, Jeff King <peff@peff.net> wrote:
>  static void update_remote_refs(const struct ref *refs,
>                                const struct ref *mapped_refs,
>                                const struct ref *remote_head_points_at,
>                                const char *branch_top,
>                                const char *msg)
>  {
> +       const struct ref *rm = mapped_refs;
> +
> +       if (check_everything_connected(iterate_ref_map, 0, &rm))
> +               die(_("remote did not send all necessary objects"));
> +
>         if (refs) {
>                 write_remote_refs(mapped_refs);
>                 if (option_single_branch)

Maybe move this after checkout, so that I can switch terminals and
start working while it's verifying? And maybe add an option to skip
check_everything_connected and instead print a big fat warning telling
the user to run fsck later?
-- 
Duy

* Re: propagating repo corruption across clone
  2013-03-25 15:56           ` Jeff King
  2013-03-25 16:32             ` Jeff Mitchell
@ 2013-03-26  1:06             ` Duy Nguyen
  1 sibling, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2013-03-26  1:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Jeff Mitchell, Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 10:56 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King <peff@peff.net> wrote:
>> > There are basically three levels of transport that can be used on a
>> > local machine:
>> >
>> >   1. Hard-linking (very fast, no redundancy).
>> >
>> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>> >      data, but does not check the integrity of the original).
>> >
>> >   3. Regular git transport, creating a pack (slowest, but should include
>> >      redundancy checks).
>> >
>> > Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
>> > think the documentation in "git clone" could use some improvement in
>> > that area.
>>
>> Not only git-clone. How git-fetch and git-push verify the new pack
>> should also be documented. I don't think many people outside the
>> contributor circle know what is done (and maybe how) when data is
>> received from outside.
>
> I think it's less of a documentation issue there, though, because they
> _only_ do (3). There is no option to do anything else, so there is
> nothing to warn the user about in terms of tradeoffs.
>
> I agree that in general git's handling of corruption could be documented
> somewhere, but I'm not sure where.

I think either a section in git-fsck.txt or git.txt. Probably the
former as people who read it are probably more concerned about
corruption.
-- 
Duy

* Re: propagating repo corruption across clone
  2013-03-25 20:07               ` Jeff King
@ 2013-03-26 13:43                 ` Jeff Mitchell
  2013-03-26 16:55                   ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff Mitchell @ 2013-03-26 13:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Ævar Arnfjörð, git

On Mon, Mar 25, 2013 at 4:07 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
>> For commit corruptions, the --no-hardlinks, non --mirror case refused
>> to create the new repository and exited with an error code of 128. The
>> --no-hardlinks, --mirror case spewed errors to the console, yet
>> *still* created the new clone *and* returned an error code of zero.
>
> I wasn't able to reproduce this; can you post a succinct test case?

This actually seems hard to reproduce. I found this during testing
with an existing repository on-disk, but when I tried creating a new
repository with some commit objects and modifying one of them the same
way I modified an object in the previous repository, I was unable to
reproduce it.

I do have the original repository though, so I'll tar.gz it up so that
you can have exactly the same content as I do. It's about 40MB and you
can grab it here:
https://www.dropbox.com/s/e8dhedmpd1a1axs/tomahawk-corrupt.tar.gz (MD5
sum: cde8a43233db5d649932407891f8366b).

Once you extract that, you should be able to run a clone using paths
(not file://) with --no-hardlinks --mirror and replicate the behavior
I saw. FYI, I'm on Git 1.8.2.

Thanks,
Jeff

* Re: propagating repo corruption across clone
  2013-03-26 13:43                 ` Jeff Mitchell
@ 2013-03-26 16:55                   ` Jeff King
  2013-03-26 21:59                     ` Philip Oakley
  2013-03-26 23:20                     ` Rich Fromm
  0 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2013-03-26 16:55 UTC (permalink / raw)
  To: Jeff Mitchell; +Cc: Duy Nguyen, Ævar Arnfjörð, git

On Tue, Mar 26, 2013 at 09:43:01AM -0400, Jeff Mitchell wrote:

> On Mon, Mar 25, 2013 at 4:07 PM, Jeff King <peff@peff.net> wrote:
> > On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
> >> For commit corruptions, the --no-hardlinks, non --mirror case refused
> >> to create the new repository and exited with an error code of 128. The
> >> --no-hardlinks, --mirror case spewed errors to the console, yet
> >> *still* created the new clone *and* returned an error code of zero.
> >
> > I wasn't able to reproduce this; can you post a succinct test case?
>
> [...link to tar.gz...]
> Once you extract that, you should be able to run a clone using paths
> (not file://) with --no-hardlinks --mirror and replicate the behavior
> I saw. FYI, I'm on Git 1.8.2.

Thanks for providing an example.

The difference is the same "--mirror implies --bare" issue; the non-bare
case dies during the checkout (even before my patches, as the corruption
is not in a blob, but rather in the HEAD commit object itself). You can
replace --mirror with --bare and see the same behavior.

The troubling part is that we see errors in the bare case, but do not
die. Those errors all come from upload-pack, the "sending" side of a
clone/fetch. Even though we do not transfer the objects via the git
protocol, we still invoke upload-pack to get the ref list (and then copy
the objects themselves out-of-band).

What happens is that upload-pack sees the errors while trying to see if
the object is a tag that can be peeled (the server advertises both tags
and the objects they point to). It does not distinguish between "errors
did not let me peel this object" and "this object is not a tag, and
therefore there is nothing to peel".

We could change that, but I'm not sure whether it is a good idea. I
think the intent is that upload-pack's ref advertisement would remain
resilient to corruption in the repository (e.g., even if that commit is
corrupt, you can still fetch the rest of the data). We should not worry
about advertising broken objects, because we will encounter the same
error when we actually do try to send the objects. Dying at the
advertisement phase would be premature, since we do not yet know what
the client will request.

The problem, of course, is that the --local optimization _skips_ the
part where we actually ask upload-pack for data, and instead blindly
copies it. So this is the same issue as usual, which is that the local
transport is not thorough enough to catch corruption. It seems like a
failing in this case, because upload-pack does notice the problem, but
that is only luck; if the corruption were in a non-tip object, it would
not notice it at all. So trying to die on errors in the ref
advertisement would just be a band-aid. Fundamentally the problem is
that the --local transport is not safe from propagating corruption, and
should not be used if that's a requirement.
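For what it's worth, the asymmetry is easy to demonstrate with the recipe
from the start of the thread; here is a rough sketch (assumes git in PATH,
a loose object, and a dd that honors conv=notrunc; directory names are
arbitrary):

```shell
set -e
rm -rf victim copy

# Build a one-blob repository, then flip a byte inside the blob's
# loose object, as in the original recipe.
git init -q victim
echo content >victim/file
git -C victim add file
git -C victim -c user.name=T -c user.email=t@example.com commit -q -m one

blob=$(git -C victim rev-parse HEAD:file)
obj="victim/.git/objects/$(echo "$blob" | sed 's,..,&/,')"
chmod +w "$obj"
printf '\0' | dd of="$obj" bs=1 seek=10 conv=notrunc 2>/dev/null

# --no-local forces the pack-writing transport, which must inflate
# the blob on the sending side and therefore notices the corruption.
if git clone -q --no-local victim copy 2>/dev/null; then
	status=missed
else
	status=detected
fi
echo "$status"
```

A plain `--local` clone of the same repository would happily hardlink or
byte-copy the corrupt object instead.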

-Peff

* Re: [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream
  2013-03-25 20:16   ` [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream Jeff King
@ 2013-03-26 21:27     ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2013-03-26 21:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> We call read_istream, but never check its return value for
> errors. This can lead to us looping infinitely, as we just
> keep trying to write "-1" bytes (and we do not notice the
> error, as we simply check that write_in_full reports the
> same number of bytes we fed it, which of course is also -1).

Looks sane.  Thanks.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> No test yet, as my method for triggering this causes _another_ infinite
> loop. So the test comes after the fixes, to avoid infinite loops when
> bisecting the history later. :)
>
>  streaming.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/streaming.c b/streaming.c
> index 4d978e5..f4126a7 100644
> --- a/streaming.c
> +++ b/streaming.c
> @@ -514,6 +514,8 @@ int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *f
>  		ssize_t wrote, holeto;
>  		ssize_t readlen = read_istream(st, buf, sizeof(buf));
>  
> +		if (readlen < 0)
> +			goto close_and_exit;
>  		if (!readlen)
>  			break;
>  		if (can_seek && sizeof(buf) == readlen) {

* Re: [PATCH v2 6/9] streaming_write_entry: propagate streaming errors
  2013-03-25 21:49       ` [PATCH v2 " Jeff King
  2013-03-25 23:29         ` Jonathan Nieder
@ 2013-03-26 21:38         ` Junio C Hamano
  1 sibling, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2013-03-26 21:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Eric Sunshine, git

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] streaming_write_entry: propagate streaming errors
>
> When we are streaming an index blob to disk, we store the
> error from stream_blob_to_fd in the "result" variable, and
> then immediately overwrite that with the return value of
> "close". That means we catch errors on close (e.g., problems
> committing the file to disk), but miss anything which
> happened before then.
>
> We can fix this by using bitwise-OR to accumulate errors in
> our result variable.
>
> While we're here, we can also simplify the error handling
> with an early return, which makes it easier to see under
> which circumstances we need to clean up.
>
> Signed-off-by: Jeff King <peff@peff.net>

Very sensible.  Thanks.

* Re: [PATCH 8/9] clone: die on errors from unpack_trees
  2013-03-25 20:23   ` [PATCH 8/9] clone: die on errors from unpack_trees Jeff King
@ 2013-03-26 21:40     ` Junio C Hamano
  2013-03-26 22:22       ` [PATCH 10/9] clone: leave repo in place after checkout errors Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2013-03-26 21:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> When clone is populating the working tree, it ignores the
> return status from unpack_trees; this means we may report a
> successful clone, even when the checkout fails.
>
> When checkout fails, we may want to leave the $GIT_DIR in
> place, as it might be possible to recover the data through
> further use of "git checkout" (e.g., if the checkout failed
> due to a transient error, disk full, etc). However, we
> already die on a number of other checkout-related errors, so
> this patch follows that pattern.
>
> In addition to marking a now-passing test, we need to adjust
> t5710, which blindly assumed it could make bogus clones of
> very deep alternates hierarchies. By using "--bare", we can
> avoid it actually touching any objects.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Thanks.

> I think the "leave the data behind" fix may be to just set "junk_pid =
> 0" a little sooner in cmd_clone (i.e., before checkout()). Then we
> would still die, but at least leave the fetched objects intact.

Yeah, perhaps, but I agree that is a much lower priority change.

* Re: [PATCH 9/9] clone: run check_everything_connected
  2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
  2013-03-26  0:53     ` Duy Nguyen
@ 2013-03-26 21:50     ` Junio C Hamano
  2013-03-28  0:40     ` Duy Nguyen
  2 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2013-03-26 21:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> The slowdown is really quite terrible if you try "git clone --bare
> linux-2.6.git". Even with this, the local-clone case already misses blob
> corruption. So it probably makes sense to restrict it to just the
> non-local clone case, which already has to do more work.

Probably.  We may want to enable fsck even for local clones in the
longer term and also have this check.  Those who know their filesystem
is trustworthy can do the filesystem-level copy with "cp -R" themselves
after all.

> Even still, it adds a non-trivial amount of work (linux-2.6 takes
> something like a minute to check). I don't like the idea of declaring
> "git clone" non-safe unless you turn on transfer.fsckObjects, though. It
> should have the same safety as "git fetch".

True.

* Re: propagating repo corruption across clone
  2013-03-26 16:55                   ` Jeff King
@ 2013-03-26 21:59                     ` Philip Oakley
  2013-03-26 22:03                       ` Jeff King
  2013-03-26 23:20                     ` Rich Fromm
  1 sibling, 1 reply; 60+ messages in thread
From: Philip Oakley @ 2013-03-26 21:59 UTC (permalink / raw)
  To: Jeff King, Jeff Mitchell
  Cc: Duy Nguyen, Ævar Arnfjörð, Git List

From: "Jeff King" <peff@peff.net>
Sent: Tuesday, March 26, 2013 4:55 PM
> On Tue, Mar 26, 2013 at 09:43:01AM -0400, Jeff Mitchell wrote:
>
>> On Mon, Mar 25, 2013 at 4:07 PM, Jeff King <peff@peff.net> wrote:
>> > On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
>> >> For commit corruptions, the --no-hardlinks, non --mirror case
>> >> refused
>> >> to create the new repository and exited with an error code of 128.
>> >> The
>> >> --no-hardlinks, --mirror case spewed errors to the console, yet
>> >> *still* created the new clone *and* returned an error code of
>> >> zero.
>> >
>> > I wasn't able to reproduce this; can you post a succinct test case?
>>
>> [...link to tar.gz...]
>> Once you extract that, you should be able to run a clone using paths
>> (not file://) with --no-hardlinks --mirror and replicate the behavior
>> I saw. FYI, I'm on Git 1.8.2.
>
> Thanks for providing an example.
>
> The difference is the same "--mirror implies --bare" issue; the
> non-bare
> case dies during the checkout (even before my patches, as the
> corruption
> is not in a blob, but rather in the HEAD commit object itself). You
> can
> replace --mirror with --bare and see the same behavior.
>
> The troubling part is that we see errors in the bare case, but do not
> die. Those errors all come from upload-pack, the "sending" side of a
> clone/fetch. Even though we do not transfer the objects via the git
> protocol, we still invoke upload-pack to get the ref list (and then
> copy
> the objects themselves out-of-band).
>
> What happens is that upload-pack sees the errors while trying to see
> if
> the object is a tag that can be peeled (the server advertises both
> tags
> and the objects they point to). It does not distinguish between
> "errors
> did not let me peel this object" and "this object is not a tag, and
> therefore there is nothing to peel".
>
> We could change that, but I'm not sure whether it is a good idea. I
> think the intent is that upload-pack's ref advertisement would remain
> resilient to corruption in the repository (e.g., even if that commit
> is
> corrupt, you can still fetch the rest of the data). We should not
> worry
> about advertising broken objects, because we will encounter the same
> error when we actually do try to send the objects. Dying at the
> advertisement phase would be premature, since we do not yet know what
> the client will request.
>
> The problem, of course, is that the --local optimization _skips_ the
> part where we actually ask upload-pack for data, and instead blindly
> copies it. So this is the same issue as usual, which is that the local
> transport is not thorough enough to catch corruption. It seems like a
> failing in this case, because upload-pack does notice the problem, but
> that is only luck; if the corruption were in a non-tip object, it
> would
> not notice it at all. So trying to die on errors in the ref
> advertisement would just be a band-aid. Fundamentally the problem is
> that the --local transport is not safe from propagating corruption,
> and
> should not be used if that's a requirement.
>
> -Peff
> --

How does `git bundle create file.bundle --all` behave after these
changes, both with respect to the 'transport' checking and to being
reliable during updates?

Is it an option for creating an archivable file that can be used for a
later `clone`?

I wasn't sure if the bundle capability had been considered.

Philip

* Re: propagating repo corruption across clone
  2013-03-26 21:59                     ` Philip Oakley
@ 2013-03-26 22:03                       ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-26 22:03 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Jeff Mitchell, Duy Nguyen, Ævar Arnfjörð, Git List

On Tue, Mar 26, 2013 at 09:59:42PM -0000, Philip Oakley wrote:

> Which way does `git bundle file.bundl --all` perform after the changes
> for both the 'transport' checking and being reliable during updates.

Bundles are treated at a fairly low level the same as a remote who
provides us a particular set of refs and a packfile. So we should get
the same protections via index-pack, and still run
check_everything_connected on it, just as we would with a fetch over the
git protocol.

I didn't test it, though.
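A quick smoke test of the happy path, for the curious (a sketch; the
repository and bundle names are arbitrary):

```shell
set -e
rm -rf bsrc bdst repo.bundle

# Make a one-commit repository and bundle everything it has.
git init -q bsrc
git -C bsrc -c user.name=T -c user.email=t@example.com \
	commit -q --allow-empty -m one
git -C bsrc bundle create ../repo.bundle --all

# Cloning from the bundle feeds the pack through index-pack,
# the same code path as a fetch over the git protocol.
git clone -q repo.bundle bdst
git -C bdst rev-parse HEAD
```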

-Peff

* [PATCH 10/9] clone: leave repo in place after checkout errors
  2013-03-26 21:40     ` Junio C Hamano
@ 2013-03-26 22:22       ` Jeff King
  2013-03-26 22:32         ` Jonathan Nieder
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-26 22:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Mar 26, 2013 at 02:40:57PM -0700, Junio C Hamano wrote:

> > I think the "leave the data behind" fix may be to just set "junk_pid =
> > 0" a little sooner in cmd_clone (i.e., before checkout()). Then we
> > would still die, but at least leave the fetched objects intact.
> 
> Yeah, perhaps, but I agree that is a much lower priority change.

As it turns out, the checkout() error path sometimes _already_ leaves
the repository intact, but it's due to a bug. And it ends up deleting
something random instead. :)

I agree it's not a high priority, but I think it makes sense while we're
in the area. And while it's very unlikely that the deletion would be
disastrous (see below), it makes me nervous. Patch is below.

-- >8 --
Subject: [PATCH] clone: leave repo in place after checkout errors

If we manage to clone a remote repository but run into an
error in the checkout, it is probably sane to leave the repo
directory in place. That lets the user examine the situation
without spending time to re-clone from the remote (which may
be a lengthy process).

Rather than try to convert each die() from the checkout code
path into an error(), we simply set a flag that tells the
"remove_junk" atexit function to print a helpful message and
leave the repo in place.

Note that the test added in this patch actually passes
without the code change. The reason is that the cleanup code
is buggy; we chdir into the working tree for the checkout,
but still may use relative paths to remove the directories
(which means if you cloned into "foo", we would accidentally
remove "foo" from the working tree!).  There's no point in
fixing it now, since this patch means we will never try to
remove anything after the chdir, anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
I think the accidental deletion could also escape the repository if you
did something like:

  git clone $remote ../../foo

which would delete ../../foo/../../foo, or ../../../foo, which is not
related to what you just cloned. But I didn't test, and we don't have to
care anymore after this patch.

 builtin/clone.c              | 33 ++++++++++++++++++++++++++++++++-
 t/t1060-object-corruption.sh |  4 ++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index eceaa74..e145dfc 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -377,10 +377,40 @@ static void remove_junk(void)
 static const char *junk_work_tree;
 static const char *junk_git_dir;
 static pid_t junk_pid;
+enum {
+	JUNK_LEAVE_NONE,
+	JUNK_LEAVE_REPO,
+	JUNK_LEAVE_ALL
+} junk_mode = JUNK_LEAVE_NONE;
+
+static const char junk_leave_repo_msg[] =
+N_("The remote repository was cloned successfully, but there was\n"
+   "an error checking out the HEAD branch. The repository has been left in\n"
+   "place but the working tree may be in an inconsistent state. You can\n"
+   "inspect the contents with:\n"
+   "\n"
+   "    git status\n"
+   "\n"
+   "and retry the checkout with\n"
+   "\n"
+   "    git checkout -f HEAD\n"
+   "\n");
 
 static void remove_junk(void)
 {
 	struct strbuf sb = STRBUF_INIT;
+
+	switch (junk_mode) {
+	case JUNK_LEAVE_REPO:
+		warning("%s", _(junk_leave_repo_msg));
+		/* fall-through */
+	case JUNK_LEAVE_ALL:
+		return;
+	default:
+		/* proceed to removal */
+		break;
+	}
+
 	if (getpid() != junk_pid)
 		return;
 	if (junk_git_dir) {
@@ -925,12 +955,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	transport_unlock_pack(transport);
 	transport_disconnect(transport);
 
+	junk_mode = JUNK_LEAVE_REPO;
 	err = checkout();
 
 	strbuf_release(&reflog_msg);
 	strbuf_release(&branch_top);
 	strbuf_release(&key);
 	strbuf_release(&value);
-	junk_pid = 0;
+	junk_mode = JUNK_LEAVE_ALL;
 	return err;
 }
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index a405b70..a84deb1 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -89,6 +89,10 @@ test_expect_success 'clone --local detects corruption' '
 	test_must_fail git clone --local bit-error corrupt-checkout
 '
 
+test_expect_success 'error detected during checkout leaves repo intact' '
+	test_path_is_dir corrupt-checkout/.git
+'
+
 test_expect_success 'clone --local detects missing objects' '
 	test_must_fail git clone --local missing missing-checkout
 '
-- 
1.8.2.13.g0f18d3c

* Re: [PATCH 9/9] clone: run check_everything_connected
  2013-03-26  0:53     ` Duy Nguyen
@ 2013-03-26 22:24       ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-26 22:24 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git, Junio C Hamano

On Tue, Mar 26, 2013 at 07:53:42AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Tue, Mar 26, 2013 at 3:26 AM, Jeff King <peff@peff.net> wrote:
> >  static void update_remote_refs(const struct ref *refs,
> >                                const struct ref *mapped_refs,
> >                                const struct ref *remote_head_points_at,
> >                                const char *branch_top,
> >                                const char *msg)
> >  {
> > +       const struct ref *rm = mapped_refs;
> > +
> > +       if (check_everything_connected(iterate_ref_map, 0, &rm))
> > +               die(_("remote did not send all necessary objects"));
> > +
> >         if (refs) {
> >                 write_remote_refs(mapped_refs);
> >                 if (option_single_branch)
> 
> Maybe move this after checkout, so that I can switch terminal and
> start working while it's verifying? And maybe an option not to
> check_everything_connected, instead print a big fat warning telling
> the user to fsck later?

I tried to follow fetch's practice of not installing the refs until we
have verified that the objects are reasonable. It probably doesn't
matter that much for clone, though, since you would not have
simultaneous users expecting the repository to be in a reasonable state
until after the clone completes.

We also would have to tweak check_everything_connected, which does
something like "--not --all" to avoid rechecking objects we already
have. But that is not too hard to do.
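
Concretely, the connectivity test boils down to a rev-list walk along
these lines (a sketch of the idea rather than the exact command
connected.c builds; run inside any repository, with $tip standing in
for a newly fetched tip):

```shell
# Walk every object reachable from the just-received tip, excluding
# anything already reachable from our existing refs ("--not --all").
# rev-list exits non-zero if a needed object is missing or unreadable.
tip=$(git rev-parse HEAD)    # stand-in for a newly fetched tip
echo "$tip" | git rev-list --objects --stdin --not --all >/dev/null &&
echo connected
```

The "--not --all" is what keeps it from re-walking history we already
trust, which is exactly the part that needs tweaking for clone, where
there are no pre-existing refs to exclude.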

-Peff

* Re: [PATCH 10/9] clone: leave repo in place after checkout errors
  2013-03-26 22:22       ` [PATCH 10/9] clone: leave repo in place after checkout errors Jeff King
@ 2013-03-26 22:32         ` Jonathan Nieder
  2013-03-27  1:03           ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Jonathan Nieder @ 2013-03-26 22:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Jeff King wrote:

> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -377,10 +377,40 @@ static void remove_junk(void)
>  static const char *junk_work_tree;
>  static const char *junk_git_dir;
>  static pid_t junk_pid;
> +enum {
> +	JUNK_LEAVE_NONE,
> +	JUNK_LEAVE_REPO,
> +	JUNK_LEAVE_ALL
> +} junk_mode = JUNK_LEAVE_NONE;

Neat.

> +
> +static const char junk_leave_repo_msg[] =
> +N_("The remote repository was cloned successfully, but there was\n"
> +   "an error checking out the HEAD branch. The repository has been left in\n"
> +   "place but the working tree may be in an inconsistent state. You can\n"
> +   "can inspect the contents with:\n"
> +   "\n"
> +   "    git status\n"
> +   "\n"
> +   "and retry the checkout with\n"
> +   "\n"
> +   "    git checkout -f HEAD\n"
> +   "\n");

Can this be made more precise?  I don't know what it means for the
working tree to be in an inconsistent state: do you mean that some files
might be partially checked out or not have been checked out at all yet?

	error: Clone succeeded, but checkout failed.
	hint: You can inspect what was checked out with "git status".
	hint: To retry the checkout, run "git checkout -f HEAD".

Aside from that, this looks very nice.

Thanks,
Jonathan

* Re: propagating repo corruption across clone
  2013-03-26 16:55                   ` Jeff King
  2013-03-26 21:59                     ` Philip Oakley
@ 2013-03-26 23:20                     ` Rich Fromm
  2013-03-27  1:25                       ` Jonathan Nieder
  2013-03-27  3:47                       ` Junio C Hamano
  1 sibling, 2 replies; 60+ messages in thread
From: Rich Fromm @ 2013-03-26 23:20 UTC (permalink / raw)
  To: git

Jeff King wrote
> Fundamentally the problem is
> that the --local transport is not safe from propagating corruption, and
> should not be used if that's a requirement.

I've read Jeff Mitchell's blog post, his update, relevant parts of the
git-clone(1) man page, and a decent chunk of this thread, and I'm still not
clear on one thing.  Is the danger of `git clone --mirror` propagating
corruption only true when using the --local option ?

Specifically, in my case, I'm using `git clone --mirror`, but I'm *not*
using --local, nor am I using --no-hardlinks.  The host executing the clone
command is different than the the host on which the remote repository lives,
and I am using ssh as a transport protocol.  If there is corruption, can I
or can I not expect the clone operation to fail and return a non-zero exit
value?  If I can not expect this, is the workaround to run `git fsck` on the
resulting clone?




--
View this message in context: http://git.661346.n2.nabble.com/propagating-repo-corruption-across-clone-tp7580504p7580771.html
Sent from the git mailing list archive at Nabble.com.

* Re: [PATCH 10/9] clone: leave repo in place after checkout errors
  2013-03-26 22:32         ` Jonathan Nieder
@ 2013-03-27  1:03           ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-27  1:03 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git

On Tue, Mar 26, 2013 at 03:32:59PM -0700, Jonathan Nieder wrote:

> > +static const char junk_leave_repo_msg[] =
> > +N_("The remote repository was cloned successfully, but there was\n"
> > +   "an error checking out the HEAD branch. The repository has been left in\n"
> > +   "place but the working tree may be in an inconsistent state. You can\n"
> > +   "can inspect the contents with:\n"
> > +   "\n"
> > +   "    git status\n"
> > +   "\n"
> > +   "and retry the checkout with\n"
> > +   "\n"
> > +   "    git checkout -f HEAD\n"
> > +   "\n");
> 
> Can this be made more precise?  I don't know what it means for the
> working tree to be in an inconsistent state: do you mean that some files
> might be partially checked out or not have been checked out at all yet?

It means that we died during the checkout procedure, and we don't have
any idea what was left. Maybe something, maybe nothing. Maybe an index,
maybe not.

> 	error: Clone succeeded, but checkout failed.
> 	hint: You can inspect what was checked out with "git status".
> 	hint: To retry the checkout, run "git checkout -f HEAD".

That is certainly more succinct, if not more precise. I'd be fine with
it.

-Peff

* Re: propagating repo corruption across clone
  2013-03-26 23:20                     ` Rich Fromm
@ 2013-03-27  1:25                       ` Jonathan Nieder
  2013-03-27 18:23                         ` Rich Fromm
  2013-03-27  3:47                       ` Junio C Hamano
  1 sibling, 1 reply; 60+ messages in thread
From: Jonathan Nieder @ 2013-03-27  1:25 UTC (permalink / raw)
  To: Rich Fromm; +Cc: git, Jeff King

Hi,

Rich Fromm wrote:

>                                                The host executing the clone
> command is different than the host on which the remote repository lives,
> and I am using ssh as a transport protocol.  If there is corruption, can I
> or can I not expect the clone operation to fail and return a non-zero exit
> value?  If I can not expect this, is the workaround to run `git fsck` on the
> resulting clone?

Is the "[transfer] fsckObjects" configuration on the host executing the
clone set to true?

* Re: propagating repo corruption across clone
  2013-03-26 23:20                     ` Rich Fromm
  2013-03-27  1:25                       ` Jonathan Nieder
@ 2013-03-27  3:47                       ` Junio C Hamano
  2013-03-27  6:19                         ` Sitaram Chamarty
                                           ` (2 more replies)
  1 sibling, 3 replies; 60+ messages in thread
From: Junio C Hamano @ 2013-03-27  3:47 UTC (permalink / raw)
  To: Rich Fromm; +Cc: git

Rich Fromm <richard_fromm@yahoo.com> writes:

> Jeff King wrote
>> Fundamentally the problem is
>> that the --local transport is not safe from propagating corruption, and
>> should not be used if that's a requirement.
>
> I've read Jeff Mitchell's blog post, his update, relevant parts of the
> git-clone(1) man page, and a decent chunk of this thread, and I'm still not
> clear on one thing.  Is the danger of `git clone --mirror` propagating
> corruption only true when using the --local option ?

If you use --local, that is equivalent to "cp -R".  Your corruption
in the source will faithfully be byte-for-byte copied to the
destination.  If you do not (and in your case you have two different
machines), unless you are using the long deprecated rsync transport
(which again is the same as "cp -R"), the transport layer will notice
object corruption.  See Jeff's analysis earlier in the thread.
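
To make that concrete, here is a minimal sketch (assuming git and
standard POSIX tools; run it in an empty scratch directory):

```shell
# Create a repo and flip one byte inside the blob that HEAD references.
git init -q src
echo content >src/file
git -C src add file
git -C src -c user.name=t -c user.email=t@example.com commit -qm one
blob=$(git -C src rev-parse HEAD:file)
obj=src/.git/objects/$(echo "$blob" | sed 's,..,&/,')
chmod +w "$obj"
printf '\0' | dd of="$obj" bs=1 seek=10 conv=notrunc 2>/dev/null

# --local (the default for a path): files are linked/copied as-is, and
# a bare clone never inflates the object, so the corruption travels.
git clone -q --bare --local src copy.git

# git-aware transport: the sender must inflate the object to pack it,
# so the breakage is detected and the clone fails.
git clone -q --bare --no-local src checked.git || echo corruption caught
```

The corrupt blob ends up byte-for-byte identical in copy.git; only the
--no-local clone notices anything is wrong.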

If you are lucky (or unlucky, depending on how you look at it), the
corruption you have in your object store may affect objects that are
needed to check out the version at the tip of the history, and "git
checkout" that happens as the last step of cloning may loudly
complain, but that just means you can immediately notice the
breakage in that case.  You may be unlucky and the corruption may
not affect objects that are needed to check out the tip. The initial
checkout will succeed as if nothing is wrong, but the corruption in
your object store is still there nevertheless.  "git log -p --all"
or "git fsck" will certainly be unhappy.

The difference between --mirror and no --mirror is a red herring.
You may want to ask Jeff Mitchell to remove the mention of it; it
only adds to the confusion without helping users.  If you made a
byte-for-byte copy of a corrupt repository, it wouldn't make any
difference whether the first "checkout" notices it.

To be paranoid, you may want to set transfer.fsckObjects to true,
perhaps in your ~/.gitconfig.
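
For example, one way to turn it on everywhere (this writes to your
~/.gitconfig; adjust the scope to taste):

```shell
# transfer.fsckObjects is the umbrella setting: it makes both
# git-fetch-pack and git-receive-pack check every object they handle.
git config --global transfer.fsckObjects true
```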

* Re: propagating repo corruption across clone
  2013-03-27  3:47                       ` Junio C Hamano
@ 2013-03-27  6:19                         ` Sitaram Chamarty
  2013-03-27 15:03                           ` Junio C Hamano
  2013-03-27 18:51                         ` Rich Fromm
  2013-03-28 13:48                         ` Jeff Mitchell
  2 siblings, 1 reply; 60+ messages in thread
From: Sitaram Chamarty @ 2013-03-27  6:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rich Fromm, git

On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano <gitster@pobox.com> wrote:

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

do we have any numbers on the overhead of this?

Even a "guesstimate" will do...

* Re: propagating repo corruption across clone
  2013-03-27  6:19                         ` Sitaram Chamarty
@ 2013-03-27 15:03                           ` Junio C Hamano
  2013-03-27 15:47                             ` Sitaram Chamarty
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2013-03-27 15:03 UTC (permalink / raw)
  To: Sitaram Chamarty; +Cc: Rich Fromm, git

Sitaram Chamarty <sitaramc@gmail.com> writes:

> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
>> To be paranoid, you may want to set transfer.fsckObjects to true,
>> perhaps in your ~/.gitconfig.
>
> do we have any numbers on the overhead of this?
>
> Even a "guesstimate" will do...

On a reasonably slow machine:

$ cd /project/git/git.git && git repack -a -d
$ ls -hl .git/objects/pack/*.pack
-r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack

$ cd .. && git --bare init junk && cd junk
$ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
real    0m13.873s
user    0m21.345s
sys     0m2.248s

That's about 3.2 MB/s?

Compare that with the speed your other side feeds you (or your line
speed could be the limiting factor) and decide how much you value
your data.

* Re: propagating repo corruption across clone
  2013-03-27 15:03                           ` Junio C Hamano
@ 2013-03-27 15:47                             ` Sitaram Chamarty
  0 siblings, 0 replies; 60+ messages in thread
From: Sitaram Chamarty @ 2013-03-27 15:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rich Fromm, git

On Wed, Mar 27, 2013 at 8:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Sitaram Chamarty <sitaramc@gmail.com> writes:
>
>> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>> To be paranoid, you may want to set transfer.fsckObjects to true,
>>> perhaps in your ~/.gitconfig.
>>
>> do we have any numbers on the overhead of this?
>>
>> Even a "guesstimate" will do...
>
> On a reasonably slow machine:
>
> $ cd /project/git/git.git && git repack -a -d
> $ ls -hl .git/objects/pack/*.pack
> -r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack
>
> $ cd .. && git --bare init junk && cd junk
> $ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
> real    0m13.873s
> user    0m21.345s
> sys     0m2.248s
>
> That's about 3.2 MB/s?
>
> Compare that with the speed your other side feeds you (or your line
> speed could be the limiting factor) and decide how much you value
> your data.

Thanks.  I was also interested in overhead on the server just as a %-age.

I have no idea why but when I did some tests a long time ago I got
upwards of 40% or so, but now when I try these tests for git.git

    cd <some empty dir>
    git init --bare
    # git config transfer.fsckobjects true
    git fetch file:///full/path/to/git.git refs/*:refs/*

then the difference in elapsed time is 18s -> 22s, so about 22%, and CPU
time is 31s -> 37s, so about 20%.  I didn't measure disk access
increases, but I guess 20% is not too bad.

Is it likely to be linear in the size of the repo, by and large?

* Re: propagating repo corruption across clone
  2013-03-27  1:25                       ` Jonathan Nieder
@ 2013-03-27 18:23                         ` Rich Fromm
  2013-03-27 19:49                           ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Rich Fromm @ 2013-03-27 18:23 UTC (permalink / raw)
  To: git

Jonathan Nieder-2 wrote
> Is the "[transfer] fsckObjects" configuration on the host executing the
> clone set to true?

I hadn't been setting it at all, and according to git-config(1) it defaults
to false, so the answer is no.  It looks like setting it might be a good
idea.

But I'm still somewhat confused about what is and is not checked under what
conditions.  Consider the three statements:

# 1
git clone --mirror myuser@myhost:my_repo

# 2
git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo

# 3
git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git fsck

Are 2 and 3 equivalent?  Or is there an increasing level of checking that
occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
this could be clearer in the man pages.

git-config(1) says that transfer.fsckObjects essentially (if fetch... and
receive... are not explicitly set) "git-fetch-pack will check all fetched
objects" and "git-receive-pack will check all received objects."  While
git-fsck(1) says "git-fsck tests SHA1 and general object sanity, and it does
full tracking of the resulting reachability and everything else."  The
latter sounds like a stronger statement to me.  But if that's true, perhaps
the relevant section(s) of git-config(1) should explicitly note that this is
not equivalent to a full git-fsck?




* Re: propagating repo corruption across clone
  2013-03-27  3:47                       ` Junio C Hamano
  2013-03-27  6:19                         ` Sitaram Chamarty
@ 2013-03-27 18:51                         ` Rich Fromm
  2013-03-27 19:13                           ` Junio C Hamano
  2013-03-28 13:52                           ` Jeff Mitchell
  2013-03-28 13:48                         ` Jeff Mitchell
  2 siblings, 2 replies; 60+ messages in thread
From: Rich Fromm @ 2013-03-27 18:51 UTC (permalink / raw)
  To: git

Junio C Hamano wrote
> If you use --local, that is equivalent to "cp -R".  Your corruption
> in the source will faithfully be byte-for-byte copied to the
> destination.  If you do not
> ...
> transport layer will notice
> object corruption.
> ...
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users.

Just to clarify, I don't know Jeff Mitchell personally, and I'm not
affiliated with the KDE project.  I happened to have recently implemented a
backup strategy for a different codebase, that relies on `git clone
--mirror` to take the actual snapshots of the live repos, and I read about
Jeff's experiences, and that's why I started following this discussion. 
Apologies if my questions are considered slightly off topic -- I'm not
positive whether this is supposed to be a list for developers rather than users.

Nevertheless, I will try to contact Jeff and point him at this.  My initial
reading of his blog posts definitely gave me the impression that this was a
--mirror vs. not issue, but it really sounds like his main problem was using
--local.

However, I think there may be room for some additional clarity in the docs.
The --local option in git-clone(1) says "When the repository to clone from
is on a local machine, this flag bypasses the normal "git aware" transport
mechanism".  But there's no mention of the consequences of this transport
bypass.  There's also no mention of this in the "GIT URLS" section that
discusses transport protocols, and I also don't see anything noting it in
either of these sections of the git book:

http://git-scm.com/book/en/Git-on-the-Server-The-Protocols
http://git-scm.com/book/en/Git-Internals-Transfer-Protocols





* Re: propagating repo corruption across clone
  2013-03-27 18:51                         ` Rich Fromm
@ 2013-03-27 19:13                           ` Junio C Hamano
  2013-03-28 13:52                           ` Jeff Mitchell
  1 sibling, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2013-03-27 19:13 UTC (permalink / raw)
  To: Rich Fromm; +Cc: git

Rich Fromm <richard_fromm@yahoo.com> writes:

> Apologies if my questions are considered slightly off topic -- I'm not
> positive if this is supposed to be a list for developers, and not users.

The list is both for users and developers.

> However, I think there may be room for some additional clarity in the docs. 
> The --local option in git-config(1) says "When the repository to clone from
> is on a local machine, this flag bypasses the normal "git aware" transport
> mechanism".  But there's no mention of the consequences of this transport
> bypass.

Yeah, I would not mind a patch to update the documentation for
"clone --local" and rsync transport to say something about
byte-for-byte copying of broken repository.

Thanks.

* Re: propagating repo corruption across clone
  2013-03-27 18:23                         ` Rich Fromm
@ 2013-03-27 19:49                           ` Jeff King
  2013-03-27 20:04                             ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-27 19:49 UTC (permalink / raw)
  To: Rich Fromm; +Cc: git

On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:

> But I'm still somewhat confused about what is and is not checked under what
> conditions.  Consider the three statements:
> 
> # 1
> git clone --mirror myuser@myhost:my_repo
> 
> # 2
> git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
> 
> # 3
> git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git fsck
> 
> Are 2 and 3 equivalent?  Or is there an increasing level of checking that
> occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
> this could be clearer in the man pages.

2 and 3 are not exactly equivalent, in that they are implemented
slightly differently, but I do not know offhand of any case that would
pass 2 but not 3. We do check reachability with transfer.fsckObjects.

-Peff

* Re: propagating repo corruption across clone
  2013-03-27 19:49                           ` Jeff King
@ 2013-03-27 20:04                             ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2013-03-27 20:04 UTC (permalink / raw)
  To: Rich Fromm; +Cc: git

On Wed, Mar 27, 2013 at 03:49:38PM -0400, Jeff King wrote:

> On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:
> 
> > But I'm still somewhat confused about what is and is not checked under what
> > conditions.  Consider the three statements:
> > 
> > # 1
> > git clone --mirror myuser@myhost:my_repo
> > 
> > # 2
> > git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
> > 
> > # 3
> > git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git fsck
> > 
> > Are 2 and 3 equivalent?  Or is there an increasing level of checking that
> > occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
> > this could be clearer in the man pages.
> 
> 2 and 3 are not exactly equivalent, in that they are implemented
> slightly differently, but I do not know offhand of any case that would
> pass 2 but not 3. We do check reachability with transfer.fsckObjects.

Oh, and in the case of #1, I think we would already find corruption, in
that index-pack will expand and check the sha1 of each object we
receive. The transfer.fsckObjects check adds some semantic checks as
well (e.g., making sure author identities are well-formed).

Clone will not currently detect missing objects and reachability
without transfer.fsckObjects set, but that is IMHO a bug; fetch will
notice it, and clone should behave the same way.

-Peff

* Re: [PATCH 5/9] add test for streaming corrupt blobs
  2013-03-25 20:21   ` [PATCH 5/9] add test for streaming corrupt blobs Jeff King
  2013-03-25 21:10     ` Jonathan Nieder
@ 2013-03-27 20:27     ` Jeff King
  2013-03-27 20:35       ` Junio C Hamano
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff King @ 2013-03-27 20:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Mar 25, 2013 at 04:21:34PM -0400, Jeff King wrote:

> +# Convert byte at offset "$2" of object "$1" into '\0'
> +corrupt_byte() {
> +	obj_file=$(obj_to_file "$1") &&
> +	chmod +w "$obj_file" &&
> +	printf '\0' | dd of="$obj_file" bs=1 seek="$2"
> +}

Hmm, this last line should probably be:

diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index a84deb1..3f87051 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -12,7 +12,7 @@ corrupt_byte() {
 corrupt_byte() {
 	obj_file=$(obj_to_file "$1") &&
 	chmod +w "$obj_file" &&
-	printf '\0' | dd of="$obj_file" bs=1 seek="$2"
+	printf '\0' | dd of="$obj_file" bs=1 seek="$2" conv=notrunc
 }
 
 test_expect_success 'setup corrupt repo' '

The intent was to change a single byte, not truncate the file (though on
the plus side, that truncation is what found the other bugs).

-Peff

* Re: [PATCH 5/9] add test for streaming corrupt blobs
  2013-03-27 20:27     ` Jeff King
@ 2013-03-27 20:35       ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2013-03-27 20:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Mon, Mar 25, 2013 at 04:21:34PM -0400, Jeff King wrote:
>
>> +# Convert byte at offset "$2" of object "$1" into '\0'
>> +corrupt_byte() {
>> +	obj_file=$(obj_to_file "$1") &&
>> +	chmod +w "$obj_file" &&
>> +	printf '\0' | dd of="$obj_file" bs=1 seek="$2"
>> +}
>
> Hmm, this last line should probably be:
>
> diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
> index a84deb1..3f87051 100755
> --- a/t/t1060-object-corruption.sh
> +++ b/t/t1060-object-corruption.sh
> @@ -12,7 +12,7 @@ corrupt_byte() {
>  corrupt_byte() {
>  	obj_file=$(obj_to_file "$1") &&
>  	chmod +w "$obj_file" &&
> -	printf '\0' | dd of="$obj_file" bs=1 seek="$2"
> +	printf '\0' | dd of="$obj_file" bs=1 seek="$2" conv=notrunc
>  }
>  
>  test_expect_success 'setup corrupt repo' '
>
> The intent was to change a single byte, not truncate the file (though on
> the plus side, that truncation is what found the other bugs).

;-).  Thanks, I missed that.

* Re: [PATCH 9/9] clone: run check_everything_connected
  2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
  2013-03-26  0:53     ` Duy Nguyen
  2013-03-26 21:50     ` Junio C Hamano
@ 2013-03-28  0:40     ` Duy Nguyen
  2013-03-31  7:57       ` Duy Nguyen
  2 siblings, 1 reply; 60+ messages in thread
From: Duy Nguyen @ 2013-03-28  0:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Tue, Mar 26, 2013 at 3:26 AM, Jeff King <peff@peff.net> wrote:
> The slowdown is really quite terrible if you try "git clone --bare
> linux-2.6.git". Even with this, the local-clone case already misses blob
> corruption. So it probably makes sense to restrict it to just the
> non-local clone case, which already has to do more work.
>
> Even still, it adds a non-trivial amount of work (linux-2.6 takes
> something like a minute to check). I don't like the idea of declaring
> "git clone" non-safe unless you turn on transfer.fsckObjects, though. It
> should have the same safety as "git fetch".

Maybe we could do it in index-pack to save some (wall) time. I haven't
tried but I think it might work. The problem is to make sure the pack
contains objects for all sha1 references in the pack. By that
description, we don't need to do standard DAG traversal. We could
extract sha-1 references in index-pack as we uncompress objects and
put all "want" sha-1 in a hash table. At the end of index-pack, we
check if any sha-1 in the hash table still points to non-existing
object.

This way, at least we don't need to uncompress all objects again in
rev-list. We could parse+hash in both phases in index-pack. The first
phase (parse_pack_objects) is usually I/O bound, we could hide some
cost there. The second phase is multithreaded, all the better.
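
A rough approximation of that idea with existing plumbing (not
index-pack internals; assumes a repository in the current directory and
a git new enough for `cat-file --batch-all-objects`):

```shell
# Phase 1: collect every SHA-1 that commits, trees, and tags refer to.
# Phase 2: check each collected SHA-1 resolves to an object we have.
git cat-file --batch-all-objects \
	--batch-check='%(objectname) %(objecttype)' |
while read sha type
do
	case "$type" in
	commit) git cat-file commit "$sha" |
		sed '/^$/q' | awk '/^tree |^parent /{print $2}' ;;
	tree)   git ls-tree "$sha" |
		awk '$2 != "commit" {print $3}' ;;   # skip gitlinks
	tag)    git cat-file tag "$sha" |
		awk '/^object /{print $2; exit}' ;;
	esac
done | sort -u |
while read ref
do
	git cat-file -e "$ref" || echo "missing $ref"
done
```

On a connected repository this prints nothing; anything it does print
is a referenced-but-absent object, which is the condition index-pack
would flag at the end of the run.
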
-- 
Duy

* Re: propagating repo corruption across clone
  2013-03-27  3:47                       ` Junio C Hamano
  2013-03-27  6:19                         ` Sitaram Chamarty
  2013-03-27 18:51                         ` Rich Fromm
@ 2013-03-28 13:48                         ` Jeff Mitchell
  2 siblings, 0 replies; 60+ messages in thread
From: Jeff Mitchell @ 2013-03-28 13:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rich Fromm, git

On Tue, Mar 26, 2013 at 11:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users.  If you made
> byte-for-byte copy of corrupt repository, it wouldn't make any
> difference if the first "checkout" notices it.

Hi,

Several days ago I had actually already updated the post to indicate
that my testing methodology was incorrect as a result of mixing up
--no-hardlinks and --no-local, and pointed to this thread.

I will say that we did see corrupted repos on the downstream mirrors.
I don't have an explanation for it, but have not been able to
reproduce it either. My only guess (untested) is that perhaps, when the
corruption was detected, the clone aborted but left in place the
objects already transferred. I mention it only because it's my only
potential explanation :-)

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

Interesting; I'd known about receive.fsckObjects but not
transfer/fetch. Thanks for the pointer.

* Re: propagating repo corruption across clone
  2013-03-27 18:51                         ` Rich Fromm
  2013-03-27 19:13                           ` Junio C Hamano
@ 2013-03-28 13:52                           ` Jeff Mitchell
  1 sibling, 0 replies; 60+ messages in thread
From: Jeff Mitchell @ 2013-03-28 13:52 UTC (permalink / raw)
  To: Rich Fromm; +Cc: git

On Wed, Mar 27, 2013 at 2:51 PM, Rich Fromm <richard_fromm@yahoo.com> wrote:
> Nevertheless, I will try to contact Jeff and point him at this.  My initial
> reading of his blog posts definitely gave me the impression that this was a
> --mirror vs. not issue, but it really sounds like his main problem was using
> --local.

Actually, I wasn't using --local, I just wasn't using --no-local, as I
mixed up that and --no-hardlinks (which lies somewhere between, but is
still not the same as using file://).

It's entirely possible that you read the posts before I updated them.

Also, keep in mind, if you're evaluating what you're doing, that the
system we have was not set up to be a backup system. It was set up to
be a mirror system. With proper sanity checking, it would have acted
in a decent capacity as a backup system, but that wasn't why it was
set up, nor how it was set up.

* Re: [PATCH 9/9] clone: run check_everything_connected
  2013-03-28  0:40     ` Duy Nguyen
@ 2013-03-31  7:57       ` Duy Nguyen
  0 siblings, 0 replies; 60+ messages in thread
From: Duy Nguyen @ 2013-03-31  7:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Thu, Mar 28, 2013 at 07:40:51AM +0700, Duy Nguyen wrote:
> Maybe we could do it in index-pack to save some (wall) time. I haven't
> tried but I think it might work. The problem is to make sure the pack
> contains objects for all sha1 references in the pack. By that
> description, we don't need to do standard DAG traversal. We could
> extract sha-1 references in index-pack as we uncompress objects and
> put all "want" sha-1 in a hash table. At the end of index-pack, we
> check if any sha-1 in the hash table still points to non-existing
> object.
> 
> This way, at least we don't need to uncompress all objects again in
> rev-list. We could parse+hash in both phases in index-pack. The first
> phase (parse_pack_objects) is usually I/O bound, we could hide some
> cost there. The second phase is multithreaded, all the better.

It looks like what I described above is exactly what index-pack
--strict does, except that --strict holds the lock longer and goes
through more abstraction layers, which slows it down. On linux-2.6
with 3 threads:

$ rev-list --all --objects --quiet (aka check_everything_connected)
34.26user 0.22system 0:34.56elapsed 99%CPU (0avgtext+0avgdata 2550528maxresident)k
0inputs+0outputs (0major+208569minor)pagefaults 0swaps

$ index-pack --stdin
214.57user 8.38system 1:31.82elapsed 242%CPU (0avgtext+0avgdata 1357328maxresident)k
8inputs+1421016outputs (0major+1222537minor)pagefaults 0swaps

$ index-pack --stdin --strict
297.36user 13.77system 2:11.82elapsed 236%CPU (0avgtext+0avgdata 1875040maxresident)k
0inputs+1421016outputs (0major+1308718minor)pagefaults 0swaps

$ index-pack --stdin --connectivity
231.09user 7.42system 1:37.39elapsed 244%CPU (0avgtext+0avgdata 2080816maxresident)k
0inputs+1421016outputs (0major+540069minor)pagefaults 0swaps

The last one avoids holding locks by duplicating the object hash table
per thread. As you can see, its memory consumption is much higher than
plain --stdin; in return it adds only about 1/3 of the rev-list time.
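
The scheme quoted above — collect every sha-1 referenced by pack
objects into a hash table while indexing, then check at the end for
ids that never materialized — can be modeled with a toy shell sketch
(the object ids and sets here are invented stand-ins; the real check
happens in C inside index-pack over actual object ids):

```shell
# "seen" = objects contained in the incoming pack,
# "wanted" = every id referenced by some object in the pack,
# "local_store" = objects already present in the repository.
seen="c1 t1"
wanted="t1 c0 b1"
local_store="c0"
missing=
for oid in $wanted; do
  case " $seen $local_store " in
    *" $oid "*) ;;                     # satisfied by the pack or the store
    *) missing="$missing $oid" ;;      # referenced but nowhere to be found
  esac
done
# Anything left in $missing is a broken link; here b1 is dangling.
echo "missing:$missing"
```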

Maybe you should check which one is cheaper for the clone case:
check_everything_connected() or index-pack --strict.
--
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2013-03-31  7:57 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-24 18:31 propagating repo corruption across clone Jeff King
2013-03-24 19:01 ` Ævar Arnfjörð Bjarmason
2013-03-24 19:23   ` Jeff King
2013-03-25 13:43     ` Jeff Mitchell
2013-03-25 14:56       ` Jeff King
2013-03-25 15:31         ` Duy Nguyen
2013-03-25 15:56           ` Jeff King
2013-03-25 16:32             ` Jeff Mitchell
2013-03-25 20:07               ` Jeff King
2013-03-26 13:43                 ` Jeff Mitchell
2013-03-26 16:55                   ` Jeff King
2013-03-26 21:59                     ` Philip Oakley
2013-03-26 22:03                       ` Jeff King
2013-03-26 23:20                     ` Rich Fromm
2013-03-27  1:25                       ` Jonathan Nieder
2013-03-27 18:23                         ` Rich Fromm
2013-03-27 19:49                           ` Jeff King
2013-03-27 20:04                             ` Jeff King
2013-03-27  3:47                       ` Junio C Hamano
2013-03-27  6:19                         ` Sitaram Chamarty
2013-03-27 15:03                           ` Junio C Hamano
2013-03-27 15:47                             ` Sitaram Chamarty
2013-03-27 18:51                         ` Rich Fromm
2013-03-27 19:13                           ` Junio C Hamano
2013-03-28 13:52                           ` Jeff Mitchell
2013-03-28 13:48                         ` Jeff Mitchell
2013-03-26  1:06             ` Duy Nguyen
2013-03-24 19:16 ` Ilari Liusvaara
2013-03-25 20:01 ` Junio C Hamano
2013-03-25 20:05   ` Jeff King
2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
2013-03-25 20:16   ` [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream Jeff King
2013-03-26 21:27     ` Junio C Hamano
2013-03-25 20:17   ` [PATCH 2/9] check_sha1_signature: check return value from read_istream Jeff King
2013-03-25 20:18   ` [PATCH 3/9] read_istream_filtered: propagate read error from upstream Jeff King
2013-03-25 20:21   ` [PATCH 4/9] avoid infinite loop in read_istream_loose Jeff King
2013-03-25 20:21   ` [PATCH 5/9] add test for streaming corrupt blobs Jeff King
2013-03-25 21:10     ` Jonathan Nieder
2013-03-25 21:26       ` Jeff King
2013-03-27 20:27     ` Jeff King
2013-03-27 20:35       ` Junio C Hamano
2013-03-25 20:22   ` [PATCH 6/9] streaming_write_entry: propagate streaming errors Jeff King
2013-03-25 21:35     ` Eric Sunshine
2013-03-25 21:37       ` Jeff King
2013-03-25 21:39     ` Jonathan Nieder
2013-03-25 21:49       ` [PATCH v2 " Jeff King
2013-03-25 23:29         ` Jonathan Nieder
2013-03-26 21:38         ` Junio C Hamano
2013-03-25 20:22   ` [PATCH 7/9] add tests for cloning corrupted repositories Jeff King
2013-03-25 20:23   ` [PATCH 8/9] clone: die on errors from unpack_trees Jeff King
2013-03-26 21:40     ` Junio C Hamano
2013-03-26 22:22       ` [PATCH 10/9] clone: leave repo in place after checkout errors Jeff King
2013-03-26 22:32         ` Jonathan Nieder
2013-03-27  1:03           ` Jeff King
2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
2013-03-26  0:53     ` Duy Nguyen
2013-03-26 22:24       ` Jeff King
2013-03-26 21:50     ` Junio C Hamano
2013-03-28  0:40     ` Duy Nguyen
2013-03-31  7:57       ` Duy Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).