From: Volker Lendecke <Volker.Lendecke@SerNet.DE>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Milosz Tanski <milosz@adfin.com>, linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>, linux-fsdevel@vger.kernel.org,
	linux-aio@kvack.org, Mel Gorman <mgorman@suse.de>,
	Tejun Heo <tj@kernel.org>, Jeff Moyer <jmoyer@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>, Al Viro <viro@zeniv.linux.org.uk>,
	linux-api@vger.kernel.org, Michael Kerrisk <mtk.manpages@gmail.com>,
	linux-arch@vger.kernel.org, Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)
Date: Fri, 27 Mar 2015 09:02:51 +0100
Message-ID: <E1YbPEF-0088HN-S8@intern.SerNet.DE>
In-Reply-To: <20150326230833.4ccfaebb.akpm@linux-foundation.org>

On Thu, Mar 26, 2015 at 11:08:33PM -0700, Andrew Morton wrote:
> On Fri, 27 Mar 2015 06:41:25 +0100 Volker Lendecke <Volker.Lendecke@sernet.de> wrote:
>
> > On Thu, Mar 26, 2015 at 08:28:24PM -0700, Andrew Morton wrote:
> > > A thing which bugs me about pread2() is that it is specifically
> > > tailored to applications which are able to use a partial read result.
> > > ie, by sending it over the network.
> >
> > Can you explain what you mean by this? Samba gets a pread
> > request from a client for some bytes. The client will be
> > confused when we send less than requested although the file
> > is long enough to satisfy all.
>
> Well it was my assumption that samba would be able to do something
> useful with a partial read - pread() is allowed to return less than requested.

No, this is not the case. Maybe my whole understanding of
pread is wrong: I always thought that it won't return short
if the file spans the pread range, EINTR notwithstanding.

> 	if (it's all in cache)

I know I'm repeating myself: We have a race condition here.
A small one, but it is racy. I've seen loaded systems where
we spend seconds between becoming re-scheduled. In these
systems, it will be the norm to block in later reads. And we
don't have a good way to detect this situation afterwards
and turn to threads as a precaution next time.

> 		read it all now
> 	else
> 		ask a worker thread to read it all
>
> Bear in mind that these operations involve physical IO and large
> memcpy's. Yes, a fincore() approach will consume more CPU but the
> additional overhead will be relatively small.

We have to pay this price for every single chunk. Without
oplocks we get 10-byte read requests. This is hard to
swallow for many vendors with small CPUs.

With best regards,

Volker Lendecke

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt@sernet.de
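
The short-read question raised above can be made concrete. POSIX permits pread() to return fewer bytes than requested, so a caller that must deliver the full range to a client typically loops until the range is filled, end of file is reached, or a real error occurs. The following is a minimal sketch of such a loop, not Samba's actual code; the helper name pread_full is hypothetical and used only for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * pread_full (hypothetical helper): keep calling pread() until `count`
 * bytes have been read, end of file is reached, or an error other than
 * EINTR occurs. Returns the number of bytes read (less than `count`
 * only at end of file), or -1 with errno set on error.
 */
ssize_t pread_full(int fd, void *buf, size_t count, off_t offset)
{
	char *p = buf;
	size_t done = 0;

	while (done < count) {
		ssize_t n = pread(fd, p + done, count - done,
				  offset + (off_t)done);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;		/* real error */
		}
		if (n == 0)
			break;			/* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Each retry is still an ordinary blocking pread(), so this loop only addresses short returns. It does nothing about the race described in the message, where data found in the page cache at check time is evicted before the read runs.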