* [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
@ 2011-04-15 13:40 Stefan Hajnoczi
2011-04-15 15:05 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-04-15 13:40 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Khoa Huynh,
Badari Pulavarty, Christoph Hellwig
The Linux NFS client issues separate NFS requests for vectored direct
I/O writes. For example, a pwritev() with 8 elements results in 8 write
requests to the server. This is very inefficient and a kernel-side fix
is not trivial or likely to be available soon.
This patch detects files on NFS and uses the QEMU_AIO_MISALIGNED flag to
force requests to bounce through a linear buffer.
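The effect of bouncing through a linear buffer can be sketched as below. This is an illustrative fragment, not the patch itself (the patch only sets QEMU_AIO_MISALIGNED and lets the existing bounce-buffer path do the copy), and `linearized_pwritev` is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Illustrative sketch only: coalesce a scatter-gather write into one
 * bounce buffer so the kernel sees a single contiguous request instead
 * of one request per iovec element.  A real O_DIRECT bounce buffer
 * would additionally need sector alignment (e.g. posix_memalign()),
 * omitted here for brevity.
 */
static ssize_t linearized_pwritev(int fd, const struct iovec *iov,
                                  int iovcnt, off_t offset)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }

    char *bounce = malloc(total);
    if (!bounce) {
        return -1;
    }

    /* Copy each element into the linear buffer */
    size_t pos = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(bounce + pos, iov[i].iov_base, iov[i].iov_len);
        pos += iov[i].iov_len;
    }

    ssize_t ret = pwrite(fd, bounce, total, offset); /* one request */
    free(bounce);
    return ret;
}
```

On the Linux NFS client this turns one server round-trip per iovec element into a single WRITE, at the cost of an extra in-memory copy.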
Khoa Huynh <khoa@us.ibm.com> reports the following ffsb benchmark
results over 1 Gbit Ethernet:
Test (threads=8)                 unpatched  patched
                                    (MB/s)   (MB/s)
Large File Creates (bs=256 KB)        20.5    112.0
Sequential Reads (bs=256 KB)          58.7    112.0
Large File Creates (bs=8 KB)           5.2      5.8
Sequential Reads (bs=8 KB)            46.7     80.9
Random Reads (bs=8 KB)                 8.7     23.4
Random Writes (bs=8 KB)               39.6     44.0
Mail Server (bs=8 KB)                 10.2     23.6

Test (threads=1)                 unpatched  patched
                                    (MB/s)   (MB/s)
Large File Creates (bs=256 KB)        14.5     49.8
Sequential Reads (bs=256 KB)          87.9     83.9
Large File Creates (bs=8 KB)           4.8      4.8
Sequential Reads (bs=8 KB)            23.2     23.1
Random Reads (bs=8 KB)                 4.8      4.7
Random Writes (bs=8 KB)                9.4     12.8
Mail Server (bs=8 KB)                  5.4      7.3
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block/raw-posix.c | 55 ++++++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 46 insertions(+), 9 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..40b7c61 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -49,8 +49,10 @@
 #ifdef __linux__
 #include <sys/ioctl.h>
 #include <sys/param.h>
+#include <sys/vfs.h>
 #include <linux/cdrom.h>
 #include <linux/fd.h>
+#include <linux/magic.h>
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include <signal.h>
@@ -124,6 +126,7 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+    bool force_linearize;
 #ifdef CONFIG_XFS
     bool is_xfs : 1;
 #endif
@@ -136,6 +139,32 @@ static int64_t raw_getlength(BlockDriverState *bs);
 static int cdrom_reopen(BlockDriverState *bs);
 #endif
 
+#if defined(__linux__)
+static bool is_vectored_io_slow(int fd, int open_flags)
+{
+    struct statfs stfs;
+    int ret;
+
+    do {
+        ret = fstatfs(fd, &stfs);
+    } while (ret != 0 && errno == EINTR);
+
+    /*
+     * Linux NFS client splits vectored direct I/O requests into separate NFS
+     * requests so it is faster to submit a single buffer instead.
+     */
+    if (!ret && stfs.f_type == NFS_SUPER_MAGIC && (open_flags & O_DIRECT)) {
+        return true;
+    }
+    return false;
+}
+#else /* !defined(__linux__) */
+static bool is_vectored_io_slow(int fd, int open_flags)
+{
+    return false;
+}
+#endif
+
 static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
@@ -167,6 +196,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     }
     s->fd = fd;
     s->aligned_buf = NULL;
+    s->force_linearize = is_vectored_io_slow(fd, s->open_flags);
 
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         /*
@@ -536,20 +566,27 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
         return NULL;
 
     /*
+     * Check if buffers need to be copied into a single linear buffer.
+     */
+    if (s->force_linearize && qiov->niov > 1) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
+
+    /*
      * If O_DIRECT is used the buffer needs to be aligned on a sector
-     * boundary. Check if this is the case or telll the low-level
+     * boundary. Check if this is the case or tell the low-level
      * driver that it needs to copy the buffer.
      */
-    if (s->aligned_buf) {
-        if (!qiov_is_aligned(bs, qiov)) {
-            type |= QEMU_AIO_MISALIGNED;
+    if (s->aligned_buf && !qiov_is_aligned(bs, qiov)) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
+
 #ifdef CONFIG_LINUX_AIO
-        } else if (s->use_aio) {
-            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
-                               nb_sectors, cb, opaque, type);
-#endif
-        }
+    if (s->use_aio && (type & QEMU_AIO_MISALIGNED) == 0) {
+        return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
+                           nb_sectors, cb, opaque, type);
     }
+#endif
 
     return paio_submit(bs, s->fd, sector_num, qiov, nb_sectors,
                        cb, opaque, type);
--
1.7.4.1
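For readers who want to check the behaviour on their own mounts, the detection at the heart of the patch can be compiled standalone. The sketch below mirrors the patch's fstatfs()/NFS_SUPER_MAGIC test (Linux-only); it is an illustration of the check, not qemu code as such:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/vfs.h>
#include <unistd.h>
#include <linux/magic.h>

/*
 * Mirror of the patch's check: vectored direct I/O is only known to be
 * slow on the Linux NFS client, and only matters when the file was
 * opened with O_DIRECT.
 */
static bool is_vectored_io_slow(int fd, int open_flags)
{
    struct statfs stfs;
    int ret;

    do {
        ret = fstatfs(fd, &stfs);   /* retry if interrupted by a signal */
    } while (ret != 0 && errno == EINTR);

    return ret == 0 && stfs.f_type == NFS_SUPER_MAGIC &&
           (open_flags & O_DIRECT) != 0;
}
```

A file on an NFS mount combined with O_DIRECT in open_flags is the only case that returns true; local filesystems, and NFS without O_DIRECT, are left alone.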
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 13:40 [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS Stefan Hajnoczi
@ 2011-04-15 15:05 ` Christoph Hellwig
2011-04-15 15:26 ` Stefan Hajnoczi
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2011-04-15 15:05 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, qemu-devel, Khoa Huynh,
Badari Pulavarty, Christoph Hellwig
NAK. Just wait for the bloody NFS client fix to get in instead of
adding crap like that.
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 15:05 ` Christoph Hellwig
@ 2011-04-15 15:26 ` Stefan Hajnoczi
2011-04-15 15:34 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-04-15 15:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel,
Khoa Huynh, Badari Pulavarty
On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig <hch@lst.de> wrote:
> NAK. Just wait for the bloody NFS client fix to get in instead of
> adding crap like that.
That's totally fine if the NFS client will be fixed in the near future,
but that doesn't seem likely:
http://www.spinics.net/lists/linux-nfs/msg20462.html
Stefan
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 15:26 ` Stefan Hajnoczi
@ 2011-04-15 15:34 ` Christoph Hellwig
2011-04-15 16:10 ` Anthony Liguori
2011-04-15 16:23 ` Badari Pulavarty
0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-04-15 15:34 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel,
Khoa Huynh, Badari Pulavarty, Christoph Hellwig
On Fri, Apr 15, 2011 at 04:26:41PM +0100, Stefan Hajnoczi wrote:
> On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig <hch@lst.de> wrote:
> NAK. Just wait for the bloody NFS client fix to get in instead of
> adding crap like that.
>
> That's totally fine if NFS client will be fixed in the near future but
> this doesn't seem likely:
>
> http://www.spinics.net/lists/linux-nfs/msg20462.html
The code to use preadv/pwritev has been in qemu for over 2 years,
and it took until now for people to notice the NFS slowdown, so don't
expect it to be fixed three days later.
I can't even see you in the relevant threads arguing for getting it
fixed, so don't complain.
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 15:34 ` Christoph Hellwig
@ 2011-04-15 16:10 ` Anthony Liguori
2011-04-15 16:17 ` Stefan Hajnoczi
2011-04-15 17:27 ` Christoph Hellwig
2011-04-15 16:23 ` Badari Pulavarty
1 sibling, 2 replies; 17+ messages in thread
From: Anthony Liguori @ 2011-04-15 16:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel,
Khoa Huynh, pbadari
On 04/15/2011 10:34 AM, Christoph Hellwig wrote:
> On Fri, Apr 15, 2011 at 04:26:41PM +0100, Stefan Hajnoczi wrote:
>> On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig<hch@lst.de> wrote:
>>> NAK. Just wait for the bloody NFS client fix to get in instead of
>>> adding crap like that.
>> That's totally fine if NFS client will be fixed in the near future but
>> this doesn't seem likely:
>>
>> http://www.spinics.net/lists/linux-nfs/msg20462.html
> The code to use preadv/pwritev has been in qemu for over 2 years,
> and it took until now for people to notice the NFS slowdown, so don't
> expect it to be fixed three days later.
>
> I can't even see you in the relevant threads arguing for getting it
> fixed, so don't complain.
In general, since we are userspace, we should try to run well on
whatever kernel we're on.
What I don't like about this patch is the likelihood of false
positives. We check for Linux and for NFS, but that means an old
userspace does suboptimal things on newer kernels. Even if we had a
kernel version check, if the fix gets backported to an older kernel,
we'll still get a false positive.
Ideally, we'd be able to query the kernel to see whether we should
bounce or not. But AFAIK there is nothing even close to an interface to
do this today.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 16:10 ` Anthony Liguori
@ 2011-04-15 16:17 ` Stefan Hajnoczi
2011-04-15 17:27 ` Christoph Hellwig
1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-04-15 16:17 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Khoa Huynh, pbadari,
Christoph Hellwig
On Fri, Apr 15, 2011 at 5:10 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
> On 04/15/2011 10:34 AM, Christoph Hellwig wrote:
>>
>> On Fri, Apr 15, 2011 at 04:26:41PM +0100, Stefan Hajnoczi wrote:
>>>
>>> On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig<hch@lst.de> wrote:
>>>>
>>>> NAK. Just wait for the bloody NFS client fix to get in instead of
>>>> adding crap like that.
>>>
>>> That's totally fine if NFS client will be fixed in the near future but
>>> this doesn't seem likely:
>>>
>>> http://www.spinics.net/lists/linux-nfs/msg20462.html
>>
>> The code to use preadv/pwritev has been in qemu for over 2 years,
>> and it took until now for people to notice the NFS slowdown, so don't
>> expect it to be fixed three days later.
>>
>> I can't even see you in the relevant threads arguing for getting it
>> fixed, so don't complain.
>
> In general, since we are userspace, we should try to run well on whatever
> kernel we're on.
>
> What I don't like about this patch is the likelihood of false positives.
> We check for Linux and for NFS, but that means an old userspace does
> suboptimal things on newer kernels. Even if we had a kernel version check,
> if the fix gets backported to an older kernel, we'll still get a false
> positive.
>
> Ideally, we'd be able to query the kernel to see whether we should bounce or
> not. But AFAIK there is nothing even close to an interface to do this
> today.
Bah, good point. I was planning to sneak in a uname() later but that
really doesn't cut it due to backports/downstream.
Stefan
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 15:34 ` Christoph Hellwig
2011-04-15 16:10 ` Anthony Liguori
@ 2011-04-15 16:23 ` Badari Pulavarty
2011-04-15 17:29 ` Christoph Hellwig
2011-04-15 18:09 ` Anthony Liguori
1 sibling, 2 replies; 17+ messages in thread
From: Badari Pulavarty @ 2011-04-15 16:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh
On Fri, 2011-04-15 at 17:34 +0200, Christoph Hellwig wrote:
> On Fri, Apr 15, 2011 at 04:26:41PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig <hch@lst.de> wrote:
> > > NAK. Just wait for the bloody NFS client fix to get in instead of
> > > adding crap like that.
> >
> > That's totally fine if NFS client will be fixed in the near future but
> > this doesn't seem likely:
> >
> > http://www.spinics.net/lists/linux-nfs/msg20462.html
>
> The code to use preadv/pwritev has been in qemu for over 2 years,
> and it took until now for people to notice the NFS slowdown, so don't
> expect it to be fixed three days later.
True. That brings up a different question - whether we are doing
enough testing on mainline QEMU :(
> I can't even see you in the relevant threads arguing for getting it
> fixed, so don't complain.
I asked Stefan to work on a QEMU patch - since my QEMU skills are
limited and Stefan is the more appropriate person to deal with QEMU :)
As mentioned earlier, none of the temporary solutions suggested so far
(RPC table slot + ASYNC) come anywhere close in performance to the
readv/writev support patch or linearized QEMU.
Thanks,
Badari
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 16:10 ` Anthony Liguori
2011-04-15 16:17 ` Stefan Hajnoczi
@ 2011-04-15 17:27 ` Christoph Hellwig
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-04-15 17:27 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel,
Khoa Huynh, pbadari, Christoph Hellwig
On Fri, Apr 15, 2011 at 11:10:37AM -0500, Anthony Liguori wrote:
> In general, since we are userspace, we should try to run well on whatever
> kernel we're on.
It should run with the interfaces given, but hacking around performance
bugs in a gross way is not something qemu should do.
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 16:23 ` Badari Pulavarty
@ 2011-04-15 17:29 ` Christoph Hellwig
2011-04-15 22:21 ` Badari Pulavarty
2011-04-15 18:09 ` Anthony Liguori
1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2011-04-15 17:29 UTC (permalink / raw)
To: Badari Pulavarty
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh, Christoph Hellwig
On Fri, Apr 15, 2011 at 09:23:54AM -0700, Badari Pulavarty wrote:
> True. That brings up a different question - whether we are doing
> enough testing on mainline QEMU :(
It seems you're clearly not doing enough testing on any qemu. Even
the RHEL6 qemu has had preadv/pwritev since the first beta.
> I asked Stefan to work on a QEMU patch - since my QEMU skills are
> limited and Stefan is more appropriate person to deal with QEMU :)
>
> As mentioned earlier, we don't see performance with any temporary
> solutions suggested so far (RPC table slot + ASYNC) comes anywhere
> close to readv/writev support patch or linearized QEMU.
So push harder for the real patch. I haven't seen a clear NAK from
Trond, just that he'd like to avoid it. For extra brownie points, get
him to release his read/write rework and integrate it with that.
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 16:23 ` Badari Pulavarty
2011-04-15 17:29 ` Christoph Hellwig
@ 2011-04-15 18:09 ` Anthony Liguori
2011-04-15 18:25 ` Badari Pulavarty
1 sibling, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2011-04-15 18:09 UTC (permalink / raw)
To: Badari Pulavarty
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh, Christoph Hellwig
On 04/15/2011 11:23 AM, Badari Pulavarty wrote:
> On Fri, 2011-04-15 at 17:34 +0200, Christoph Hellwig wrote:
>> On Fri, Apr 15, 2011 at 04:26:41PM +0100, Stefan Hajnoczi wrote:
>>> On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig<hch@lst.de> wrote:
>>>> NAK. Just wait for the bloody NFS client fix to get in instead of
>>>> adding crap like that.
>>> That's totally fine if NFS client will be fixed in the near future but
>>> this doesn't seem likely:
>>>
>>> http://www.spinics.net/lists/linux-nfs/msg20462.html
>> The code to use preadv/pwritev has been in qemu for over 2 years,
>> and it took until now for people to notice the NFS slowdown, so don't
>> expect it to be fixed three days later.
> True. That brings up a different question - whether we are doing
> enough testing on mainline QEMU :(
The issue here is NFS, not QEMU. Moreover, the real problem is that
we're using O_DIRECT. O_DIRECT seems to result in nothing but problems
and it never seems to be tested well on any file system.
I think the fundamental problem we keep running into really boils down
to O_DIRECT being a second class interface within Linux.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 18:09 ` Anthony Liguori
@ 2011-04-15 18:25 ` Badari Pulavarty
0 siblings, 0 replies; 17+ messages in thread
From: Badari Pulavarty @ 2011-04-15 18:25 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh, Christoph Hellwig
On Fri, 2011-04-15 at 13:09 -0500, Anthony Liguori wrote:
> On 04/15/2011 11:23 AM, Badari Pulavarty wrote:
> > On Fri, 2011-04-15 at 17:34 +0200, Christoph Hellwig wrote:
> >> On Fri, Apr 15, 2011 at 04:26:41PM +0100, Stefan Hajnoczi wrote:
> >>> On Fri, Apr 15, 2011 at 4:05 PM, Christoph Hellwig<hch@lst.de> wrote:
> >>>> NAK. Just wait for the bloody NFS client fix to get in instead of
> >>>> adding crap like that.
> >>> That's totally fine if NFS client will be fixed in the near future but
> >>> this doesn't seem likely:
> >>>
> >>> http://www.spinics.net/lists/linux-nfs/msg20462.html
> >> The code to use preadv/pwritev has been in qemu for over 2 years,
> >> and it took until now for people to notice the NFS slowdown, so don't
> >> expect it to be fixed three days later.
> > True. That brings up a different question - whether we are doing
> > enough testing on mainline QEMU :(
>
> The issue here is NFS, not QEMU.
Sure. But we should have caught the NFS regression when the
preadv/pwritev change went into QEMU, or before it went in -- shouldn't
we? Since that was 2 years ago (as hch indicated), we could have fixed
NFS long ago :)
> Moreover, the real problem is that
> we're using O_DIRECT. O_DIRECT seems to result in nothing but problems
> and it never seems to be tested well on any file system.
O_DIRECT was added with a specific use-case in mind - and it's supposed
to handle only that case well (pre-allocated files with database usage,
where the DB has its own caching layer). That case is well tested
by various DB vendors on most (important) local filesystems.
You know very well why we are on the O_DIRECT path :)
>
> I think the fundamental problem we keep running into really boils down
> to O_DIRECT being a second class interface within Linux.
It's by design. It's a special case for a specific use.
Thanks,
Badari
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 17:29 ` Christoph Hellwig
@ 2011-04-15 22:21 ` Badari Pulavarty
2011-04-15 23:00 ` Anthony Liguori
2011-04-16 2:03 ` Christoph Hellwig
0 siblings, 2 replies; 17+ messages in thread
From: Badari Pulavarty @ 2011-04-15 22:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh
On 4/15/2011 10:29 AM, Christoph Hellwig wrote:
> On Fri, Apr 15, 2011 at 09:23:54AM -0700, Badari Pulavarty wrote:
>
>> True. That brings up a different question - whether we are doing
>> enough testing on mainline QEMU :(
>>
> It seems you're clearly not doing enough testing on any qemu. Even
> the RHEL6 qemu has had preadv/pwritev since the first beta.
>
Christoph,
When you say "you're" - you really mean RH, right? RH should have
caught this in their regression testing a year ago as part of their
first beta. Correct?
Unfortunately, you are picking on the person who spent time finding &
analyzing the regression, narrowing down the problem area, and
suggesting approaches to address the issue :(
BTW, I am not sure an RH discussion is appropriate in this forum.
Thanks,
Badari
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 22:21 ` Badari Pulavarty
@ 2011-04-15 23:00 ` Anthony Liguori
2011-04-15 23:33 ` Badari Pulavarty
2011-04-16 8:46 ` Stefan Hajnoczi
2011-04-16 2:03 ` Christoph Hellwig
1 sibling, 2 replies; 17+ messages in thread
From: Anthony Liguori @ 2011-04-15 23:00 UTC (permalink / raw)
To: pbadari
Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel,
Khoa Huynh, Christoph Hellwig
On 04/15/2011 05:21 PM, pbadari@linux.vnet.ibm.com wrote:
> On 4/15/2011 10:29 AM, Christoph Hellwig wrote:
>> On Fri, Apr 15, 2011 at 09:23:54AM -0700, Badari Pulavarty wrote:
>>> True. That brings up a different question - whether we are doing
>>> enough testing on mainline QEMU :(
>> It seems you're clearly not doing enough testing on any qemu. Even
>> the RHEL6 qemu has had preadv/pwritev since the first beta.
>
> Christoph,
>
> When you say "you're" - you really mean RH, right? RH should have
> caught this in their regression testing a year ago as part of their
> first beta. Correct?
>
> Unfortunately, you are picking on the person who spent time finding &
> analyzing the regression, narrowing down the problem area, and
> suggesting approaches to address the issue :(
This is a pretty silly discussion to be having.
The facts are:
1) NFS sucks with preadv/pwritev and O_DIRECT -- is anyone really surprised?
2) We could work around this in QEMU by doing something ugly
3) We have no way to detect when we no longer need the workaround,
which makes (2) really unappealing.
4) That leaves us with:
a) waiting for NFS to get fixed properly and just living with worse
performance on older kernels
b) having a user-tunable switch to enable bouncing
I really dislike the idea of (b) because we're stuck with it forever and
it's yet another switch for people to mistakenly depend on.
I'm still waiting to see performance data without O_DIRECT. I suspect
that using cache=writethrough will make most of this problem go away in
which case, we can just recommend that as a work around until NFS is
properly fixed.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 23:00 ` Anthony Liguori
@ 2011-04-15 23:33 ` Badari Pulavarty
2011-04-16 2:05 ` Christoph Hellwig
2011-04-16 8:46 ` Stefan Hajnoczi
1 sibling, 1 reply; 17+ messages in thread
From: Badari Pulavarty @ 2011-04-15 23:33 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel,
Khoa Huynh, pbadari, Christoph Hellwig
On 4/15/2011 4:00 PM, Anthony Liguori wrote:
> On 04/15/2011 05:21 PM, pbadari@linux.vnet.ibm.com wrote:
>> On 4/15/2011 10:29 AM, Christoph Hellwig wrote:
>>> On Fri, Apr 15, 2011 at 09:23:54AM -0700, Badari Pulavarty wrote:
>>>> True. That brings up a different question - whether we are doing
>>>> enough testing on mainline QEMU :(
>>> It seems you're clearly not doing enough testing on any qemu. Even
>>> the RHEL6 qemu has had preadv/pwritev since the first beta.
>>
>> Christoph,
>>
>> When you say "you're" - you really mean RH, right? RH should have
>> caught this in their regression testing a year ago as part of their
>> first beta. Correct?
>>
>> Unfortunately, you are picking on the person who spent time finding &
>> analyzing the regression, narrowing down the problem area, and
>> suggesting approaches to address the issue :(
>
> This is a pretty silly discussion to be having.
>
> The facts are:
>
> 1) NFS sucks with preadv/pwritev and O_DIRECT -- is anyone really
> surprised?
>
> 2) We could work around this in QEMU by doing something ugly
>
> 3) We have no way to detect when we no longer need a work around which
> makes (2) really unappealing.
>
> 4) That leaves us with:
> a) waiting for NFS to get fixed properly and just living with
> worse performance on older kernels
>
> b) having a user-tunable switch to enable bouncing
>
> I really dislike the idea of (b) because we're stuck with it forever
> and it's yet another switch for people to mistakenly depend on.
>
> I'm still waiting to see performance data without O_DIRECT. I suspect
> that using cache=writethrough will make most of this problem go away
> in which case, we can just recommend that as a work around until NFS
> is properly fixed.
We need to run through all the cases and analyze the performance of
cache=writethrough. Our initial (smaller setup) analysis indicates
that it's better than unpatched O_DIRECT - but ~5% slower for
sequential writes, and 30%+ slower for random read/writes and mixed
I/O workloads. (In the past, NFS O_SYNC performance was extremely poor
compared to O_DIRECT, with no scaling - on older kernels, due to
congestion control issues.) Khoa will collect the data over the next
few days.
To be honest with you, we should kill cache=none, optimize only one
case, and live with it (like other commercial hypervisors). :(
Thanks,
Badari
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 22:21 ` Badari Pulavarty
2011-04-15 23:00 ` Anthony Liguori
@ 2011-04-16 2:03 ` Christoph Hellwig
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-04-16 2:03 UTC (permalink / raw)
To: Badari Pulavarty
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh, Christoph Hellwig
On Fri, Apr 15, 2011 at 03:21:36PM -0700, Badari Pulavarty wrote:
> When you say "you're" - you really mean RH, right? RH should have
> caught this in their regression testing a year ago as part of their
> first beta. Correct?
By "you" I mean whoever cares. Which apparently is no one but IBM.
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 23:33 ` Badari Pulavarty
@ 2011-04-16 2:05 ` Christoph Hellwig
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-04-16 2:05 UTC (permalink / raw)
To: Badari Pulavarty
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Stefan Hajnoczi,
qemu-devel, Khoa Huynh, pbadari, Christoph Hellwig
On Fri, Apr 15, 2011 at 04:33:51PM -0700, Badari Pulavarty wrote:
> To be honest with you, we should kill cache=none and just optimize only one
> case and live with it (like other commerical
> hypervisor). :(
cache=none is the only sane mode for qemu, modulo bugs in nfs or similar weird
protocols that no sane person should use for VM storage.
* Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
2011-04-15 23:00 ` Anthony Liguori
2011-04-15 23:33 ` Badari Pulavarty
@ 2011-04-16 8:46 ` Stefan Hajnoczi
1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-04-16 8:46 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Khoa Huynh, pbadari,
Christoph Hellwig
On Sat, Apr 16, 2011 at 12:00 AM, Anthony Liguori <aliguori@us.ibm.com> wrote:
> 3) We have no way to detect when we no longer need a work around which makes
> (2) really unappealing.
I agree.
> 4) That leaves us with:
> a) waiting for NFS to get fixed properly and just living with worse
> performance on older kernels
>
> b) having a user-tunable switch to enable bouncing
>
> I really dislike the idea of (b) because we're stuck with it forever and
> it's yet another switch for people to mistakenly depend on.
The user-tunable switch is potentially interesting for performance
troubleshooting. We have seen another file system which has issues
with vectored direct I/O. It would have been much easier to identify
the problem by telling the user "Try running it with linearize=on and
see if it makes a difference".
But let's try harder on linux-nfs.
Stefan