From: Thomas Rast <trast@student.ethz.ch>
To: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
Zygo Blaxell <zblaxell@gibbs.hungrycats.org>,
<git@vger.kernel.org>
Subject: Re: 'git add' corrupts repository if the working directory is modified as it runs
Date: Sat, 13 Feb 2010 15:39:53 +0100
Message-ID: <201002131539.54142.trast@student.ethz.ch>
In-Reply-To: <20100213133951.GA14352@Knoppix>
On Saturday 13 February 2010 14:39:52 Ilari Liusvaara wrote:
> On Sat, Feb 13, 2010 at 06:12:38AM -0600, Jonathan Nieder wrote:
> >
> > With the current code, write_sha1_file() will hash the file, notice
> > that the object is already in .git/objects, and return. With a
> > read-hash-copy loop, git would have to store a (compressed or
> > uncompressed) copy of the file somewhere in the meantime.
>
> It could be done by first reading the file and computing the hash;
> if the hash matches an existing object, return that hash. Otherwise,
> read the file again for the object write, hashing it again, and use
> that value as the object ID.
That is still racy. The real problem is that the file is mmap()ed,
and git first computes the SHA1 of that buffer and only then
compresses it.[*]
The crux is the last sentence of the following excerpt from mmap(2):
    MAP_PRIVATE
           Create a private copy-on-write mapping. Updates to the
           mapping are not visible to other processes mapping the same
           file, and are not carried through to the underlying file.
           It is unspecified whether changes made to the file after the
           mmap() call are visible in the mapped region.
This is racy despite the use of MAP_PRIVATE: the contents of the
mapping can change at any time, including between the hashing and the
compression.
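The race does not depend on mmap() internals: any buffer that another writer can change between the hashing pass and the compression pass shows it. What follows is a minimal Python sketch, not git's code. It assumes a bytearray stands in for the shared mapping and a hypothetical `concurrent_write` callback stands in for the user editing the file; `naive_store()` and `object_is_consistent()` are illustrative names.

```python
import hashlib
import zlib


def blob_id(data: bytes) -> str:
    # git names a blob by the SHA-1 of b"blob <size>\0" followed by the content.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def naive_store(mapped: bytearray, concurrent_write) -> tuple:
    """Hash, then compress, the same shared buffer in two separate passes (racy)."""
    oid = blob_id(bytes(mapped))          # pass 1: name the object
    concurrent_write(mapped)              # the file changes under the "mapping"
    raw = b"blob %d\x00" % len(mapped) + bytes(mapped)
    stored = zlib.compress(raw)           # pass 2: compress what is there *now*
    return oid, stored


def object_is_consistent(oid: str, stored: bytes) -> bool:
    # A loose object is consistent only if its inflated bytes hash to its name.
    return hashlib.sha1(zlib.decompress(stored)).hexdigest() == oid


if __name__ == "__main__":
    def edit(buf: bytearray) -> None:
        buf[0:5] = b"HELLO"  # same length, so the size in the header still matches

    oid, stored = naive_store(bytearray(b"hello world\n"), edit)
    print(object_is_consistent(oid, stored))  # prints False
```

The edit keeps the length unchanged on purpose: the header size still looks right, so only re-hashing the stored bytes reveals that the object's content no longer matches its name.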
AFAICS there are only two possible solutions:
* Copy the file (possibly block-by-block) as we go, to make sure that
  the data we SHA1 is the same data we compress.
* Unpack and re-hash the compressed data to verify that the SHA1 is
  correct. On failure, either retry (but you might have to do this
  indefinitely if the user just hates you!) or abort.
(Of course, in neither case does the user get any guarantee about
which data ended up in the repository, but he never had that; we only
try to ensure repository consistency.)
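Both options can be sketched in Python. The sketch assumes git's blob format and zlib for compression. `store_by_copying()`, `verify_stored()` and `store_verified()` are hypothetical names, the 8 KiB block size is arbitrary, and this is not how git implements either fix.

- The first function follows the "copy block-by-block" idea: each block is read into a private buffer once, and that same buffer feeds both SHA-1 and zlib.
- The second keeps a racy store but inflates and re-hashes its output, retrying a bounded number of times before aborting rather than looping forever.

```python
import hashlib
import io
import zlib

CHUNK = 8192  # block size for the copy; an arbitrary choice


def store_by_copying(f, size: int):
    """Option 1: every block is copied out of the file once, and that same
    copy is fed to both SHA-1 and zlib, so hashed bytes == compressed bytes.

    `size` is the length observed beforehand (as from stat()); the header is
    built from it, so a file that grows or shrinks meanwhile is rejected.
    """
    header = b"blob %d\x00" % size
    sha = hashlib.sha1(header)
    z = zlib.compressobj()
    out = [z.compress(header)]
    seen = 0
    while True:
        block = f.read(CHUNK)  # private copy; later writes cannot alter it
        if not block:
            break
        seen += len(block)
        sha.update(block)
        out.append(z.compress(block))
    if seen != size:
        raise ValueError("file size changed while it was being read")
    out.append(z.flush())
    return sha.hexdigest(), b"".join(out)


def verify_stored(oid: str, stored: bytes) -> bool:
    """Option 2, the check: inflate what was produced and re-hash it."""
    return hashlib.sha1(zlib.decompress(stored)).hexdigest() == oid


def store_verified(racy_store, max_tries: int = 3):
    """Option 2, the policy: retry a racy store a bounded number of times,
    then abort instead of retrying forever."""
    for _ in range(max_tries):
        oid, stored = racy_store()
        if verify_stored(oid, stored):
            return oid, stored
    raise RuntimeError("file kept changing while being added; giving up")


if __name__ == "__main__":
    data = b"test content\n"
    oid, stored = store_by_copying(io.BytesIO(data), len(data))
    print(oid, verify_stored(oid, stored))
```

Because the header records the size up front, the copying variant must also reject a file whose length changes mid-read. A real implementation would then face the same retry-or-abort choice as the second option.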
[*] The "do we have this" check actually happens before the
compression, so that code path is race-free.
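The "do we have this" check needs only the object name. When it succeeds, nothing is compressed or written. The following is a minimal Python sketch of that check. It assumes git's blob naming rule (SHA-1 over a "blob <size>\0" header plus the content) and the loose-object layout under .git/objects. `blob_id()`, `loose_object_path()` and `have_object()` are hypothetical helpers, not git's C functions such as write_sha1_file().

```python
import hashlib
import os


def blob_id(data: bytes) -> str:
    """SHA-1 object ID of `data` as a git blob: hash of b"blob <size>\\0" + data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(objects_dir: str, oid: str) -> str:
    """Loose objects live at objects/<first 2 hex digits>/<remaining 38>."""
    return os.path.join(objects_dir, oid[:2], oid[2:])


def have_object(objects_dir: str, data: bytes) -> bool:
    """Hypothetical "do we have this" check, run on the hash before any
    compression. Only loose objects are examined; git also searches packs."""
    return os.path.exists(loose_object_path(objects_dir, blob_id(data)))


if __name__ == "__main__":
    print(blob_id(b""))  # prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```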
--
Thomas Rast
trast@{inf,student}.ethz.ch