From: Jeff King <peff@peff.net>
To: Jeff Mitchell <jeffrey.mitchell@gmail.com>
Cc: "Ævar Arnfjörð" <avarab@gmail.com>, git@vger.kernel.org
Subject: Re: propagating repo corruption across clone
Date: Mon, 25 Mar 2013 10:56:44 -0400
Message-ID: <20130325145644.GA16576@sigill.intra.peff.net>
In-Reply-To: <CAOx6V3YtM-e8-S41v1KnC+uSymYwZw8QBwiCJRYw0MYJXRjj-w@mail.gmail.com>

On Mon, Mar 25, 2013 at 09:43:23AM -0400, Jeff Mitchell wrote:

> > But I haven't seen exactly what the corruption is, nor exactly what
> > commands they used to clone. I've invited the blog author to give more
> > details in this thread.
> 
> The syncing was performed via a clone with git clone --mirror (and a
> git:// URL) and updates with git remote update.

OK. That should be resilient to corruption, then[1].
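
For concreteness, the mirror sync you describe boils down to something
like this (the URL is just a placeholder):

  git clone --mirror git://example.com/repo.git repo.git
  cd repo.git
  git remote update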

> So I should mention that my experiments after the fact were using
> local paths, but with --no-hardlinks.

Yeah, we will do a direct copy in that case, and there is nothing to
prevent corruption from propagating.

> If you're saying that the transport is where corruption is supposed to
> be caught, then it's possible that we shouldn't see corruption
> propagate on an initial mirror clone across git://, and that something
> else was responsible for the trouble we saw with the repositories that
> got cloned after-the-fact.

Right. Either it was something else, or there is a bug in git's
protections (but I haven't been able to reproduce anything likely).

> But then I'd argue that this is non-obvious. In particular, when using
> --no-hardlinks, I wouldn't expect that behavior to be different with a
> straight path and with file://.

There are basically three levels of transport that can be used on a
local machine:

  1. Hard-linking (very fast, no redundancy).

  2. Byte-for-byte copy (medium speed, makes a separate copy of the
     data, but does not check the integrity of the original).

  3. Regular git transport, creating a pack (slowest, but should include
     redundancy checks).

Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
think the documentation in "git clone" could use some improvement in
that area.
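
To make that concrete, the three levels map onto clone invocations
roughly like this (paths are placeholders):

  # (1) default for a same-filesystem path: objects are hardlinked
  git clone /path/to/repo.git copy.git

  # (2) --no-hardlinks: byte-for-byte copy, with no integrity check
  #     of the original
  git clone --no-hardlinks /path/to/repo.git copy.git

  # (3) a file:// URL forces the regular transport, with its
  #     redundancy checks
  git clone file:///path/to/repo.git copy.git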

> Something else: apparently one of my statements prompted joeyh to
> think about potential issues with backing up live git repos
> (http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
> Looking at that post made me realize that, when we were doing our
> initial thinking about the system three years ago, we made an
> assumption that, in fact, taking a .tar.gz of a repo as it's in the
> process of being written to or garbage collected or repacked could be
> problematic. This isn't a totally baseless assumption: I once had a
> git repository suffer corruption when a sudden power outage hit while
> I was in the middle of updating it. (It could totally have been the
> filesystem, of course, although it was a journaled filesystem.)

Yes, if you take a snapshot of a repository with rsync or tar, it may be
in an inconsistent state. Using the git protocol should always be
robust, but if you want to do it with other tools, you need to follow a
particular order:

  1. copy the refs (refs/ and packed-refs) first

  2. copy everything else (including objects/)

That covers the case where somebody is pushing an update simultaneously
(you _may_ get extra objects in step 2 that they have not yet
referenced, but you will never end up with a case where you are
referencing objects that you did not yet transfer).
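
A minimal sketch of that ordering with rsync (src.git and dst.git are
placeholders; a real script would want error checking too):

  # 1. refs first: packed-refs, then the loose refs
  rsync -a src.git/packed-refs dst.git/ 2>/dev/null || true
  rsync -a src.git/refs/ dst.git/refs/

  # 2. then everything else, objects included
  rsync -a --exclude=/refs --exclude=/packed-refs src.git/ dst.git/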

If it's possible that the repository might be repacked during your
transfer, I think the issue is a bit trickier, as there's a moment where
the new packfile is renamed into place, and then the old ones are
deleted. Depending on the timing and how your readdir() implementation
behaves with respect to newly added and deleted entries, you might end
up seeing neither the new pack nor the old ones. This is
quite a tight race to catch, I suspect, but if you were to rsync
objects/pack twice in a row, that would be sufficient.
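
That double pass is just this (a sketch, with the same placeholder
names as above):

  # the second pass picks up a pack that was renamed into place while
  # the first pass was reading the directory
  rsync -a src.git/objects/pack/ dst.git/objects/pack/
  rsync -a src.git/objects/pack/ dst.git/objects/pack/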

For pruning, I think you could run into the opposite situation: you grab
the refs, somebody updates them with a history rewind (or branch
deletion), then somebody prunes and objects go away. Again, the timing
on this race is quite tight and it's unlikely in practice. I'm not sure
of a simple way to eliminate it completely.

Yet another option is to simply rsync the whole thing and then "git
fsck" the result. If it's not 100% good, just re-run the rsync. This is
simple and should be robust, but is more CPU intensive (you'll end up
re-checking all of the data on each update).
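
Something like this sketch (same placeholder names; fsck's exit code
tells you whether to retry):

  rsync -a --delete src.git/ dst.git/
  while ! git --git-dir=dst.git fsck --full; do
    rsync -a --delete src.git/ dst.git/
  done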

> So, we decided to use Git's built-in capabilities of consistency
> checking to our advantage (with, as it turns out, a flaw in our
> implementation). But the question remains: are we wrong about thinking
> that rsyncing or tar.gz live repositories in the middle of being
> pushed to/gc'd/repacked could result in a bogus backup?

No, I think you are right. If you do the refs-then-objects ordering,
that saves you from most of it, but I do think there are still some
races that exist during repacking or pruning.

-Peff

[1] I mentioned that clone-over-git:// is resilient to corruption. I
    think that is true for bit corruption, but my tests did show that we
    are not as careful about checking graph connectivity during clone as
    we are with fetch. The circumstances in which that would matter are
    quite unlikely, though.
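
    If you want to double-check a fresh clone yourself, a rough
    approximation of the fetch-time connectivity check is (run inside
    the new clone):

      git rev-list --objects --all >/dev/null
      git fsck --full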
