From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Myers <bpm@sgi.com>, Peter Zijlstra <peterz@infradead.org>,
Oleg Nesterov <oleg@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Dave Jones <davej@redhat.com>,
xfs@oss.sgi.com
Subject: Re: splice vs execve lockdep trace.
Date: Wed, 17 Jul 2013 06:43:35 +1000 [thread overview]
Message-ID: <20130716204335.GH11674@dastard> (raw)
In-Reply-To: <CA+55aFzTBUKStdZu1GhKoiYc2knybhiaUFr2By98QYew_STE=A@mail.gmail.com>
On Tue, Jul 16, 2013 at 01:18:06PM -0700, Linus Torvalds wrote:
> On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@sgi.com> wrote:
> >> >
> >> > And looking more at that, I'm actually starting to think this is an
> >> > XFS locking problem. XFS really should not call back to splice while
> >> > holding the inode lock.
>
> .. that was misleading, normally "inode lock" would be i_lock, but
> here I meant the XFS-specific i_iolock.
>
> But you clearly picked up on it:
>
> > CPU0 CPU1 CPU2
> > ---- ---- ----
> > lock(&sig->cred_guard_mutex);
> > lock(&pipe->mutex/1);
> > lock(&(&ip->io_lock)->mr_lock);
> > lock(&(&ip->io_lock)->mr_lock);
> > lock(&sig->cred_guard_mutex);
> > lock(&pipe->mutex/1);
>
> Yup.
>
> > I agree that fixing this in XFS seems to be the most reasonable plan,
> > and Dave's approach looks ok to me. Seems like patch 1 should go
> > through Al's tree, but we'll also need to commit it to the xfs tree
> > prerequisite to patch 2.
>
> Btw, is there some reason why XFS couldn't just use
> generic_file_splice_read() directly?
Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We
don't use i_mutex for many things IO related, and so internal
locking is needed to serialise against stuff like truncate, hole
punching, etc, that are run through non-vfs interfaces.
> And splice has mmap semantics - the whole point of splice is about
> moving pages around, after all - so I *think* your current
> "xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization.
No, that's just taking the ip->i_iolock in shared mode - that's less
serialisation than holding i_mutex as it allow parallel read
operations but still locks out concurrent buffered writes to the
file (i.e. posix atomic write vs read requirements)
> The pages will be shared by the pipe buffers anyway, so splicing by
> definition doesn't imply full data serialization (because the reading
> of the data itself will happen much later).
>
> So the per-page serialization done by readpage() should already be
> sufficient, no?
>
> I dunno. Maybe there's something I'm missing. XFS does seem to wrap
> all the other generic functions in that lock too, but since mmap() etc
> clearly have to be able to get things one page at a time (without any
> wrapping at higher layers), I'm just wondering whether splice_read
> might not be able to avoid it.
Read isn't the problem - it's write that's the deadlock issue...
Cheers,
Dave.
>
> Linus
>
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2013-07-16 20:43 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-16 1:53 splice vs execve lockdep trace Dave Jones
2013-07-16 2:32 ` Linus Torvalds
2013-07-16 2:38 ` Dave Jones
2013-07-16 3:25 ` Linus Torvalds
2013-07-16 3:28 ` Dave Jones
2013-07-16 5:31 ` Al Viro
2013-07-16 6:03 ` Dave Chinner
2013-07-16 6:16 ` Al Viro
2013-07-16 6:41 ` Dave Chinner
2013-07-16 6:50 ` Dave Chinner
2013-07-16 19:33 ` Ben Myers
2013-07-16 20:18 ` Linus Torvalds
2013-07-16 20:43 ` Dave Chinner [this message]
2013-07-16 21:02 ` Linus Torvalds
2013-07-17 4:06 ` Dave Chinner
2013-07-17 4:54 ` Linus Torvalds
2013-07-17 5:51 ` Dave Chinner
2013-07-17 16:03 ` Linus Torvalds
2013-07-17 23:40 ` Ben Myers
2013-07-18 0:17 ` Linus Torvalds
2013-07-18 3:42 ` Dave Chinner
2013-07-18 21:16 ` Ben Myers
2013-07-18 22:21 ` Ben Myers
2013-07-18 22:49 ` Dave Chinner
2013-07-18 3:17 ` Dave Chinner
2013-07-16 13:59 ` Vince Weaver
2013-07-16 15:02 ` Dave Jones
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130716204335.GH11674@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=davej@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).