From: Kirill Smelkov <kirr@nexedi.com>
To: David Laight <David.Laight@ACULAB.COM>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Yongzhi Pan <panyongzhi@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	David Vrabel <david.vrabel@citrix.com>,
	Juergen Gross <jgross@suse.com>,
	Miklos Szeredi <miklos@szeredi.hu>, Tejun Heo <tj@kernel.org>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Arnd Bergmann <arnd@arndb.de>, Christoph Hellwig <hch@lst.de>,
	Julia Lawall <Julia.Lawall@lip6.fr>,
	Nikolaus Rath <Nikolaus@rath.org>,
	Han-Wen Nienhuys <hanwen@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH AUTOSEL 5.0 59/66] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
Date: Fri, 26 Apr 2019 07:45:33 +0000
Message-ID: <20190426074522.GA16247@deco.navytux.spb.ru>
In-Reply-To: <4d366f81f90442cb9da7ad393680d004@AcuMS.aculab.com>

On Thu, Apr 25, 2019 at 10:04:34AM +0000, David Laight wrote:
> From: Kirill Smelkov
> > Sent: 24 April 2019 19:30
> > 
> > On Wed, Apr 24, 2019 at 10:26:55AM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 24, 2019 at 10:19 AM Sasha Levin <sashal@kernel.org> wrote:
> > > >
> > > > Hm, I might be confusing something here but I see a bunch of patches
> > > > that convert existing callers mentioned in this patch to use
> > > > stream_open() which was introduced here.
> > >
> > > The only use of stream_open() upstream right now is the xenbus
> > > conversion, and that isn't actually a bugfix, because xenbus used to
> > > manually do that
> > >
> > >         filp->f_mode &= ~FMODE_ATOMIC_POS; /* cdev-style semantics */
> > >
> > > that stream_open() does.
> > >
> > > So no, there isn't "a bunch of patches" anywhere.
> > >
> > > There are *future* cleanups for 5.2 that will happen, and that might
> > > have hit linux-next. And there is at least one FUSE patch (again -
> > > pending, not upstream) that may get marked for stable.
> > >
> > > But I see nothing right now that makes it stable material yet.
> > 
> > Linus, thanks for explaining. Sasha, Greg, there is a FUSE patch that
> > should be stable material that will need this stream_open() thing. That
> > patch has just entered fuse.git#for-next today:
> > 
> > 	https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/commit/?id=bbd84f33652f
> > 
> > and will hopefully enter 5.2 when merge window opens. I agree we should
> > not blindly backport bulk stream_open conversions as performed by
> > stream_open.cocci, at least unless there is a bug report indicating that
> > it is actually required for a particular driver. On the other hand both
> > Xen and FUSE deadlocks were hit for real which justifies stable
> > propagation for their fixes.
> > 
> > You can read about the deadlock regression and the plan to fix it in
> > original "fs: stream_open - opener for stream-like files so that read
> > and write can run simultaneously without deadlock" patch (the 59/66
> > patch that was send in this thread), or here:
> > 
> > 	https://git.kernel.org/linus/10dce8af3422
> > 
> > 
> > Hope it clarifies things a bit,
> 
> I can also imagine drivers that expect accesses to be done using
> pread() and pwrite() - maybe only if the fd is shared.
> Provided accesses get the correct offset they can be concurrent.
> In fact they only need to update the offset in the file structure
> when they complete - they may do this already.
> 
> I know (I think) uclibc implementing pread() as lseek() + read()
> caused me grief - but that might just have been the extra system
> call overhead rather than any problems with the offset.

I'm not sure I fully understand your comment, but we convert to
stream_open only those drivers that do _not_ use the file position at
all and that were already using nonseekable_open. For them pread and
pwrite were already returning -ESPIPE (nonseekable_open clears
FMODE_{PREAD,PWRITE}, and ksys_{pread,pwrite}64 check for those flags).
We also convert only drivers that use no_llseek for .llseek, so lseek
on those files is/was always returning -ESPIPE as well.
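
For illustration only, the -ESPIPE behaviour can be observed from
userspace. The sketch below uses a pipe as a stand-in for a
nonseekable, stream-like driver file (an assumption made for the
example; it is not one of the converted drivers), and the helper names
are made up:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose pread() under strict -std= modes */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * errno reported by pread() on the read end of a pipe.
 * Returns 0 if pread() unexpectedly succeeds, -1 if setup fails.
 */
int pread_errno_on_pipe(void)
{
	int fds[2];
	char byte;
	int err = 0;

	if (pipe(fds) != 0)
		return -1;
	/* Queue one byte so that a plain read() would succeed. */
	if (write(fds[1], "x", 1) != 1)
		err = -1;
	else if (pread(fds[0], &byte, 1, 0) < 0)
		err = errno;
	close(fds[0]);
	close(fds[1]);
	return err;
}

/* Same check for lseek(): errno on failure, 0 on unexpected success. */
int lseek_errno_on_pipe(void)
{
	int fds[2];
	int err = 0;

	if (pipe(fds) != 0)
		return -1;
	if (lseek(fds[0], 0, SEEK_SET) == (off_t)-1)
		err = errno;
	close(fds[0]);
	close(fds[1]);
	return err;
}
```

On Linux both helpers are expected to return ESPIPE, which is also
what userspace sees on a file opened via nonseekable_open with
no_llseek.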

If a driver uses the file position in its read and write and supports
pread/pwrite (FMODE_PREAD and FMODE_PWRITE), pread and pwrite already
work _without_ file->f_pos locking: those system calls semantically do
not update file->f_pos at all and thus do not take file->f_pos_lock,
i.e. pread/pwrite can already run simultaneously.
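
A small userspace check of the "pread does not update the position"
part, as a sketch: it assumes a regular temporary file under /tmp
instead of a driver file, and the helper name is hypothetical. It only
shows the visible semantics, not the locking itself:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose pread() and mkstemp() under strict -std= modes */
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if pread() fetched the requested bytes and left the file
 * offset where it was, 0 if not, -1 if setup fails.
 */
int pread_keeps_offset(void)
{
	char path[] = "/tmp/pread-demo-XXXXXX"; /* assumes /tmp is writable */
	char buf[4];
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path); /* the open fd keeps the file alive */

	if (write(fd, "0123456789", 10) == 10 &&
	    lseek(fd, 2, SEEK_SET) == 2 &&          /* park the offset at 2 */
	    pread(fd, buf, sizeof(buf), 5) == 4) {  /* read bytes 5..8 */
		off_t after = lseek(fd, 0, SEEK_CUR);

		ret = (after == 2 && memcmp(buf, "5678", 4) == 0);
	}
	close(fd);
	return ret;
}
```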

If libc implements pread as lseek+read, it will work in the single-user
case (a single thread, or an fd not shared between processes), but it
will break because of lseek+read non-atomicity if several threads use
pread simultaneously. With such emulation there is also a chance of a
pread vs pwrite deadlock in the multi-user case, since the emulated
calls go through read and write, and read and write take
file->f_pos_lock.
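
To make the non-atomicity concrete, here is a sketch. emulated_pread
is a hypothetical emulation written for this example (I have not
checked what uclibc actually does), and the second helper replays one
unlucky interleaving of two callers by hand in a single thread, so the
outcome is deterministic. It assumes a temporary file under /tmp:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose mkstemp() under strict -std= modes */
#endif
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pread emulation: two separate steps on the shared offset. */
ssize_t emulated_pread(int fd, void *buf, size_t len, off_t off)
{
	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		return -1;
	return read(fd, buf, len);
}

/*
 * Replays by hand what happens when caller 2 runs its lseek between
 * the lseek and the read of caller 1 on a shared fd.
 * Returns the byte caller 1 ends up with, or -1 on error.
 */
int byte_seen_by_caller1(void)
{
	char path[] = "/tmp/pread-race-XXXXXX"; /* assumes /tmp is writable */
	char byte;
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path);

	if (write(fd, "AAAAABBBBB", 10) == 10 &&
	    lseek(fd, 0, SEEK_SET) == 0 && /* caller 1: lseek half of pread(off=0) */
	    lseek(fd, 5, SEEK_SET) == 5 && /* caller 2 cuts in: lseek half of pread(off=5) */
	    read(fd, &byte, 1) == 1)       /* caller 1: read half, now at offset 5 */
		ret = byte;
	close(fd);
	return ret;
}
```

Caller 1 asked for offset 0 (an 'A') but gets 'B'. A real pread takes
the offset as an argument and never goes through the shared file
position, so it cannot be disturbed this way.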

Kirill

