From: Zachary Turner <zturner@chromium.org>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Stefan Zager <szager@google.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Make the git codebase thread-safe
Date: Fri, 14 Feb 2014 11:16:50 -0800
Message-ID: <CAAErz9j=_FpWLSyUk43pp8A6e7Ej0crT8ghW5-yxBEbGkd6O+A@mail.gmail.com>
In-Reply-To: <CAAErz9g7ND1htfk=yxRJJLbSEgBi4EV_AHC9uDRptugGWFWcXw@mail.gmail.com>

(Gah, sorry if you're receiving multiple emails to your personal
addresses, I need to get used to manually setting Plain-text mode
every time I send a message).

For the mixed read, we wouldn't be looking for another caller of
pread() (since it doesn't care what the file pointer is), but instead
a caller of read() or lseek() (since those do depend on the current
file pointer).  In index-pack.c, I see two possible culprits:

1) A call to xread() from inside fill()
2) A call to lseek in parse_pack_objects()

Do you think these could be related?  If so, maybe that opens up some
other solutions?
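
To make the distinction concrete, here is a minimal POSIX sketch (not
the Windows code path, and not code from index-pack.c).  The helper
and function names are made up for illustration, and it assumes a
writable /tmp.  It shows that pread() leaves the file pointer alone,
while read() returns whatever sits at the position the last lseek()
on the same fd left behind:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: an unlinked temp file containing "0123456789". */
static int open_demo_file(void)
{
	char path[] = "/tmp/pread_demo_XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, "0123456789", 10) != 10) {
		close(fd);
		return -1;
	}
	return fd;
}

/* pread() ignores the file pointer and leaves it where it was. */
off_t offset_after_pread(void)
{
	char c = 0;
	ssize_t n;
	off_t pos;
	int fd = open_demo_file();

	if (fd < 0)
		return -1;
	lseek(fd, 2, SEEK_SET);          /* file pointer is now 2 */
	n = pread(fd, &c, 1, 5);         /* reads from offset 5 */
	assert(n == 1 && c == '5');
	(void)n;
	pos = lseek(fd, 0, SEEK_CUR);    /* query the pointer again */
	close(fd);
	return pos;
}

/* read() starts wherever the last lseek() on the same fd left the pointer. */
int byte_seen_by_read(off_t moved_to)
{
	char c = 0;
	int result = -1;
	int fd = open_demo_file();

	if (fd < 0)
		return -1;
	lseek(fd, moved_to, SEEK_SET);   /* "some other caller" moves the pointer */
	if (read(fd, &c, 1) == 1)
		result = c;
	close(fd);
	return result;
}
```

So a stray lseek() or read() elsewhere on the same fd is the kind of
caller that could interact badly with a pread() emulation that
touches the file pointer.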

BTW, the version you posted isn't thread-safe.  Suppose threads A and
B execute this function at the same time.  A executes through the
ReadFile() but has not yet reached the second lseek64(), the one that
restores the position.  B then executes the first lseek64(), saving
the already-modified file pointer.  Then A finishes, then B finishes.
At the end, the file pointer is still modified.
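
The interleaving can be replayed deterministically in a single thread.
This is only a sketch under assumptions: it uses POSIX lseek()/read()
to stand in for lseek64() and a ReadFile() that moves the file pointer
(the behaviour reported earlier in this thread), the function names
are hypothetical, and it assumes a writable /tmp.  Each "thread" is
just a sequence of calls issued in the problematic order:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1 of the emulated pread: remember the current file pointer. */
static off_t save_pos(int fd)
{
	return lseek(fd, 0, SEEK_CUR);
}

/* Step 2: positioned 10-byte read that leaves the pointer at offset + 10. */
static void read_at(int fd, off_t offset)
{
	char buf[10];
	ssize_t n;

	lseek(fd, offset, SEEK_SET);
	n = read(fd, buf, sizeof(buf));
	assert(n == (ssize_t)sizeof(buf));
	(void)n;
}

/* Step 3: put the file pointer back to the saved value. */
static void restore_pos(int fd, off_t saved)
{
	lseek(fd, saved, SEEK_SET);
}

/* Replays the A/B interleaving and returns the final file pointer. */
off_t final_pos_after_interleaving(void)
{
	char path[] = "/tmp/pread_race_XXXXXX";
	char data[256] = {0};
	off_t a_saved, b_saved, final;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data)) {
		close(fd);
		return -1;
	}
	lseek(fd, 0, SEEK_SET);      /* original position: 0 */

	a_saved = save_pos(fd);      /* A: saves 0 */
	read_at(fd, 100);            /* A: pointer is now 110 */
	b_saved = save_pos(fd);      /* B: saves 110, the modified pointer */
	restore_pos(fd, a_saved);    /* A finishes: pointer back at 0 */
	read_at(fd, 200);            /* B: pointer is now 210 */
	restore_pos(fd, b_saved);    /* B finishes: pointer "restored" to 110 */

	final = lseek(fd, 0, SEEK_CUR);
	close(fd);
	return final;
}
```

With this ordering the function returns 110 rather than the original
0, which is the leftover modification described above.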

> On Fri, Feb 14, 2014 at 11:04 AM, Karsten Blees <karsten.blees@gmail.com>
> wrote:
>>
>> On 14.02.2014 00:09, Zachary Turner wrote:
>> > To elaborate a little bit more, you can verify with a sample program
>> > that ReadFile with OVERLAPPED does in fact modify the HANDLE's file
>> > position.  The documentation doesn't actually state one way or
>> > another.   My original attempt at a patch didn't have the ReOpenFile,
>> > and we experienced regular read corruption.  We scratched our heads
>> > over it for a bit, and then hypothesized that someone must be mixing
>> > read styles, which led to this ReOpenFile workaround, which
>> > incidentally also solved the corruption problems.  We wrote a similar
>> > sample program to verify that, when using ReOpenFile and changing
>> > the file pointer of the reopened handle, the file pointer of
>> > the original handle is not modified.
>> >
>> > We did not actually try to identify the source of the mixed read
>> > styles, but it seems like the only possible explanation.
>> >
>> > On Thu, Feb 13, 2014 at 2:53 PM, Stefan Zager <szager@google.com> wrote:
>> >> On Thu, Feb 13, 2014 at 2:51 PM, Karsten Blees
>> >> <karsten.blees@gmail.com> wrote:
>> >>> On 13.02.2014 19:38, Zachary Turner wrote:
>> >>>
>> >>>> The only reason ReOpenFile is necessary at
>> >>>> all is because some code somewhere is mixing read-styles against the
>> >>>> same fd.
>> >>>>
>> >>>
>> >>> I don't understand...ReadFile with OVERLAPPED parameter doesn't modify
>> >>> the HANDLE's file position, so you should be able to mix read()/pread()
>> >>> however you like (as long as read() is only called from one thread).
>> >>
>> >> That is, apparently, a bald-faced lie in the ReadFile API doc.  First
>> >> implementation didn't use ReOpenFile, and it crashed all over the
>> >> place.  ReOpenFile fixed it.
>> >>
>> >> Stefan
>>
>> Damn...you're right, multi-threaded git-index-pack works fine, but some
>> tests fail badly. Mixed reads would have to be from git_mmap, which is the
>> only other caller of pread().
>>
>> A simple alternative to ReOpenFile is to reset the file pointer to its
>> original position, as in compat/pread.c::git_pread. Thus single-threaded code
>> can mix read()/pread() at will, but multi-threaded code has to use pread()
>> exclusively (which is usually the case anyway). A main thread using read()
>> and background threads using pread() (which is technically allowed by POSIX)
>> will fail with this solution.
>>
>> This version passes the test suite on msysgit:
>>
>> ----8<----
>> ssize_t mingw_pread(int fd, void *buf, size_t count, off64_t offset)
>> {
>>         DWORD bytes_read;
>>         OVERLAPPED overlapped;
>>         off64_t current;
>>         memset(&overlapped, 0, sizeof(overlapped));
>>         overlapped.Offset = (DWORD) offset;
>>         overlapped.OffsetHigh = (DWORD) (offset >> 32);
>>
>>         current = lseek64(fd, 0, SEEK_CUR);
>>
>>         if (!ReadFile((HANDLE)_get_osfhandle(fd), buf, count,
>>                       &bytes_read, &overlapped)) {
>>                 errno = err_win_to_posix(GetLastError());
>>                 return -1;
>>         }
>>
>>         lseek64(fd, current, SEEK_SET);
>>
>>         return (ssize_t) bytes_read;
>> }
>>
>
