From: Jeff King <peff@peff.net>
To: Eric Chamberland <Eric.Chamberland@giref.ulaval.ca>
Cc: git@vger.kernel.org
Subject: Re: GIT get corrupted on lustre
Date: Wed, 26 Dec 2012 17:51:52 -0500
Message-ID: <20121226225152.GB11491@sigill.intra.peff.net>
In-Reply-To: <50D861EE.6020105@giref.ulaval.ca>

On Mon, Dec 24, 2012 at 09:08:46AM -0500, Eric Chamberland wrote:

> Doing a "git clone" always works fine, but when we "git pull" or "git
> gc" or "git fsck", often (1/5) the local repository gets corrupted.
> For example, I got this error two days ago while doing "git gc":
> 
> error: index file .git/objects/pack/pack-7b43b1c613a851392aaf4f66916dff2577931576.idx is too small
> error: refs/heads/mail_seekable does not point to a valid object!
> [...]
> We think it could be related to the fact that we are on a *Lustre*
> filesystem, which I think doesn't fully support file locking.

I don't think locking is a problem here. The problem is that you have a
corrupt .idx file (the second error is almost certainly an effect of the
first one; git cannot look in the packfile, and therefore cannot find
the object the ref points to). But we do not ever lock the .idx files.
They are generated in tmpfiles and then atomically moved into place
using a hard link.

So if anything, I would suspect that lustre has trouble with the
write/fsync/close/link sequence. Is it possible that it does not keep
the ordering, and readers might see a linked file that is missing some
data? If you wait (or do some synchronizing operation on the filesystem,
like "sync", or an unmount/mount), does the repo later work, or is it
broken forever?
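
One way to answer the "wait or sync" question by hand is roughly the
following sketch. It is not part of git; size_matches() and
check_with_sync() are hypothetical helper names, and the expected size
is assumed to be known from whatever wrote the file. It looks at the
file size, and if the file is short, calls sync() and looks again:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `path` is exactly `expected` bytes,
 * 0 if it has some other size, and -1 if it cannot be stat'd.
 */
int size_matches(const char *path, off_t expected)
{
  struct stat st;
  if (stat(path, &st) < 0) {
    perror("unable to stat");
    return -1;
  }
  return st.st_size == expected;
}

/*
 * Returns 0 if the size is right immediately, 1 if it only becomes
 * right after a sync(), and -1 if it is still wrong (or on error).
 */
int check_with_sync(const char *path, off_t expected)
{
  int r = size_matches(path, expected);
  if (r < 0)
    return -1;
  if (r == 1)
    return 0;
  sync(); /* flush filesystem buffers, then look again */
  r = size_matches(path, expected);
  return r == 1 ? 1 : -1;
}
```

A return of 1 would point at an ordering or caching problem in the
filesystem, whereas -1 would suggest the data is really gone.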

> #1) However, how can we *test* the filesystem (lustre) compatibility
> with git? (Is there a unit test we can run?)

Running "make test" in git.git would be a good start. You could also try
running the C program I'm including below. It repeatedly runs a
write/fsync/close/link sequence like the one that index-pack runs, and
then verifies the result. It is meant to loop forever; if it ever stops
with an error, that would be a sign of the possible ordering problem I
mentioned above.

> #2) Is there a way to compile GIT to be compatible with lustre? (ex:
> no threads?)

This isn't a known issue, so I don't know offhand what compile flags
might help. The complete list is at the top of Makefile. You might try
with NO_PTHREADS=Yes, but I kind of doubt that threads are at work here.

> #3) If you *know* your filesystem doesn't allow file locking, how
> would you configure/compile GIT to work on it?

I think locking is a red herring here, as it is not used to create the
.idx files at all (and we don't do flock locking anyway; everything
happens via O_EXCL creation).
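
To make the O_EXCL point concrete, here is a minimal sketch of locking
by exclusive creation. It is an illustration only, not git's actual
lockfile code; take_lock() and release_lock() are hypothetical names.
The only thing it relies on is that open() with O_CREAT|O_EXCL fails
with EEXIST when the file already exists, so no flock() or fcntl()
locks are involved:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git's actual code): take a lock by creating
 * `path` exclusively. Returns an open fd on success, or -1 if the file
 * already exists (someone else holds the lock) or on any other error.
 */
int take_lock(const char *path)
{
  int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
  if (fd < 0) {
    if (errno == EEXIST)
      fprintf(stderr, "%s already exists; lock is held\n", path);
    else
      perror("unable to create lock file");
    return -1;
  }
  return fd;
}

/* Release the lock: close our descriptor and remove the lock file. */
int release_lock(int fd, const char *path)
{
  if (close(fd) < 0)
    return -1;
  return unlink(path);
}
```

A second take_lock() on the same path fails until release_lock() has
removed the file. Whether a given filesystem honors O_EXCL correctly is
a separate question from whether it supports flock-style locking.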

-Peff

-- >8 --
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

static int randomize(unsigned char *buf, int len)
{
  int i;
  len = rand() % len;
  for (i = 0; i < len; i++)
    buf[i] = rand() & 0xff;
  return len;
}

static int check_eof(int fd)
{
  /* one more read should hit EOF; anything else means extra data */
  char ch;
  int r = read(fd, &ch, 1);
  if (r < 0) {
    perror("read error after expected EOF");
    return -1;
  }
  if (r > 0) {
    fprintf(stderr, "extra byte after expected EOF\n");
    return -1;
  }
  return 0;
}

static int verify(int fd, const unsigned char *buf, int len)
{
  /* read the file back in chunks and compare against what we wrote */
  while (len) {
    char to_check[4096];
    int want = len < (int)sizeof(to_check) ? len : (int)sizeof(to_check);
    int got = read(fd, to_check, want);

    if (got < 0) {
      perror("unable to read");
      return -1;
    }
    if (got == 0) {
      fprintf(stderr, "premature EOF (%d bytes remaining)\n", len);
      return -1;
    }
    if (memcmp(buf, to_check, got)) {
      fprintf(stderr, "bytes differ\n");
      return -1;
    }

    buf += got;
    len -= got;
  }

  return check_eof(fd);
}

int write_in_full(int fd, const unsigned char *buf, int len)
{
  while (len) {
    int r = write(fd, buf, len);
    if (r < 0)
      return -1;
    buf += r;
    len -= r;
  }
  return 0;
}

int move_into_place(const char *old, const char *new)
{
  if (link(old, new) < 0) {
    perror("unable to create hard link");
    return -1; /* negative, so the "< 0" check in main() catches it */
  }
  unlink(old);
  return 0;
}

int main(void)
{
  while (1) {
    static unsigned char junk[1024*1024];
    int len = randomize(junk, sizeof(junk));
    int fd;

    /* clean up from any previous round */
    unlink("tmpfile");
    unlink("final.idx");

    fd = open("tmpfile", O_WRONLY|O_CREAT, 0666);
    if (fd < 0) {
      perror("unable to open tmpfile");
      return 1;
    }
    if (write_in_full(fd, junk, len) < 0 ||
        fsync(fd) < 0 ||
        close(fd) < 0) {
      perror("unable to write");
      return 1;
    }

    if (move_into_place("tmpfile", "final.idx") < 0)
      return 1;

    fd = open("final.idx", O_RDONLY);
    if (fd < 0) {
      perror("unable to open index file");
      return 1;
    }
    if (verify(fd, junk, len) < 0)
      return 1;
    close(fd);
  }
}
