* help? looking for limits on in-flight write operations for virtio-blk
@ 2014-08-25 19:42 Chris Friesen
  2014-08-26 10:34 ` Stefan Hajnoczi
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Friesen @ 2014-08-25 19:42 UTC (permalink / raw)
  To: rusty, mst, virtualization

Hi,

I'm trying to figure out what controls the number of in-flight virtio 
block operations when running linux in qemu on top of a linux host.

The problem is that we're trying to run as many VMs as possible, using 
ceph/rbd for the rootfs.  We've tripped over the fact that the memory 
consumption of qemu can spike noticeably when doing I/O (something as 
simple as "dd" from /dev/zero to a file can cause the memory consumption 
to go up by 200MB).  With dozens of VMs this can add up to enough to 
trigger the OOM killer.

It looks like the rbd driver in qemu allocates a number of buffers for 
each request, one of which is the full amount of data to read/write. 
Monitoring the "inflight" numbers in the guest I've seen it go as high 
as 184.
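
As far as I can tell the per-request pattern is roughly the following 
(my own paraphrase of the idea, not the actual block/rbd.c code): the 
guest's scatter-gather list gets linearized into a single bounce buffer 
the size of the whole transfer before being handed to librbd.

#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Sketch of the bounce-buffer pattern: copy a scatter-gather list into
 * one contiguous allocation.  For a 1MB write this costs (at least)
 * another 1MB of heap for as long as the request is in flight. */
static char *linearize_iov(const struct iovec *iov, int iovcnt, size_t *total)
{
    size_t len = 0;
    for (int i = 0; i < iovcnt; i++) {
        len += iov[i].iov_len;
    }

    char *bounce = malloc(len);              /* full transfer size */
    if (!bounce) {
        return NULL;
    }

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(bounce + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    *total = len;
    return bounce;                           /* freed on completion */
}

With 184 of those in flight at once, a jump on the order of what we're 
seeing doesn't look surprising.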

I'm trying to figure out if there are any limits on how high the 
inflight numbers can go, but I'm not having much luck.

I was hopeful when I saw qemu calling virtio_add_queue() with a queue 
size, but the queue size was 128 which didn't match the inflight numbers 
I was seeing, and after changing the queue size down to 16 I still saw 
the number of inflight requests go up to 184 and then the guest took a 
kernel panic in virtqueue_add_buf().

Can someone with more knowledge of how virtio block works point me in 
the right direction?

Thanks,
Chris


* Re: help? looking for limits on in-flight write operations for virtio-blk
  2014-08-25 19:42 help? looking for limits on in-flight write operations for virtio-blk Chris Friesen
@ 2014-08-26 10:34 ` Stefan Hajnoczi
  2014-08-26 14:58   ` Chris Friesen
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Hajnoczi @ 2014-08-26 10:34 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Josh Durgin, Jeff Cody, Linux Virtualization, Michael S. Tsirkin

On Mon, Aug 25, 2014 at 8:42 PM, Chris Friesen
<chris.friesen@windriver.com> wrote:
> I'm trying to figure out what controls the number of in-flight virtio block
> operations when running linux in qemu on top of a linux host.
>
> The problem is that we're trying to run as many VMs as possible, using
> ceph/rbd for the rootfs.  We've tripped over the fact that the memory
> consumption of qemu can spike noticeably when doing I/O (something as simple
> as "dd" from /dev/zero to a file can cause the memory consumption to go up
> by 200MB).  With dozens of VMs this can add up to enough to trigger the OOM
> killer.
>
> It looks like the rbd driver in qemu allocates a number of buffers for each
> request, one of which is the full amount of data to read/write. Monitoring
> the "inflight" numbers in the guest I've seen it go as high as 184.
>
> I'm trying to figure out if there are any limits on how high the inflight
> numbers can go, but I'm not having much luck.
>
> I was hopeful when I saw qemu calling virtio_add_queue() with a queue size,
> but the queue size was 128 which didn't match the inflight numbers I was
> seeing, and after changing the queue size down to 16 I still saw the number
> of inflight requests go up to 184 and then the guest took a kernel panic in
> virtqueue_add_buf().
>
> Can someone with more knowledge of how virtio block works point me in the
> right direction?

You can use QEMU's I/O throttling as a workaround:
qemu -drive ...,iops=64

libvirt has XML syntax for specifying iops limits.  Please see
<iotune> at http://libvirt.org/formatdomain.html.
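
The XML looks something like this inside the guest's <disk> element 
(just a sketch, pick a value that suits your storage):

  <iotune>
    <total_iops_sec>64</total_iops_sec>
  </iotune>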

I have CCed Josh Durgin and Jeff Cody for ideas on reducing
block/rbd.c memory consumption.  Is it possible to pass a
scatter-gather list so I/O can be performed directly on guest memory?
This would also improve performance slightly.

Stefan


* Re: help? looking for limits on in-flight write operations for virtio-blk
  2014-08-26 10:34 ` Stefan Hajnoczi
@ 2014-08-26 14:58   ` Chris Friesen
  2014-08-27  5:43     ` Chris Friesen
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Friesen @ 2014-08-26 14:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Josh Durgin, Jeff Cody, Linux Virtualization, Michael S. Tsirkin

On 08/26/2014 04:34 AM, Stefan Hajnoczi wrote:
> On Mon, Aug 25, 2014 at 8:42 PM, Chris Friesen
> <chris.friesen@windriver.com> wrote:

>> I'm trying to figure out if there are any limits on how high the inflight
>> numbers can go, but I'm not having much luck.
>>
>> I was hopeful when I saw qemu calling virtio_add_queue() with a queue size,
>> but the queue size was 128 which didn't match the inflight numbers I was
>> seeing, and after changing the queue size down to 16 I still saw the number
>> of inflight requests go up to 184 and then the guest took a kernel panic in
>> virtqueue_add_buf().
>>
>> Can someone with more knowledge of how virtio block works point me in the
>> right direction?
>
> You can use QEMU's I/O throttling as a workaround:
> qemu -drive ...,iops=64
>
> libvirt has XML syntax for specifying iops limits.  Please see
> <iotune> at http://libvirt.org/formatdomain.html.

IOPS limits are better than nothing, but not an actual solution.  There 
are two problems that come to mind:

1) If you specify a burst value, then a single burst can allocate a 
bunch of memory, and that memory consumption rarely drops back down 
afterwards (due to the usual malloc()/brk() interactions).

2) If the aggregate I/O load is higher than what the server can provide, 
the number of inflight requests can increase without bounds while still 
abiding by the configured IOPS value.

What I'd like to see (and may take a stab at implementing) is a cap on 
either inflight bytes or inflight IOPS.  One complication is that this 
requires hooking into the completion path to update the stats (and 
possibly unblock the I/O code) when an operation is done.
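
To make that concrete, the shape I'm imagining is something like the 
following (plain pthreads just to illustrate the idea, not real qemu 
code; qemu would presumably need to yield the coroutine rather than 
block the thread):

#include <pthread.h>
#include <stddef.h>

/* Illustrative inflight-bytes cap: submission waits once the budget is
 * used up, and the completion path gives it back and wakes submitters. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  room = PTHREAD_COND_INITIALIZER;
static size_t inflight_bytes;
static const size_t inflight_limit = 16 * 1024 * 1024;  /* e.g. 16MB */

/* Call on the submission path before issuing the request. */
void inflight_acquire(size_t len)
{
    pthread_mutex_lock(&lock);
    /* Let an oversized request through when nothing else is in flight. */
    while (inflight_bytes > 0 && inflight_bytes + len > inflight_limit) {
        pthread_cond_wait(&room, &lock);     /* wait for completions */
    }
    inflight_bytes += len;
    pthread_mutex_unlock(&lock);
}

/* Call from the completion callback once the request is done. */
void inflight_release(size_t len)
{
    pthread_mutex_lock(&lock);
    inflight_bytes -= len;
    pthread_cond_broadcast(&room);           /* unblock waiting submitters */
    pthread_mutex_unlock(&lock);
}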

> I have CCed Josh Durgin and Jeff Cody for ideas on reducing
> block/rbd.c memory consumption.  Is it possible to pass a
> scatter-gather list so I/O can be performed directly on guest memory?
> This would also improve performance slightly.

It's not just rbd.  I've seen qemu RSS jump by 110MB when accessing 
qcow2 images on an NFS-mounted filesystem.  When the guest is configured 
with 512MB of RAM, that's fairly significant.

Chris


* Re: help? looking for limits on in-flight write operations for virtio-blk
  2014-08-26 14:58   ` Chris Friesen
@ 2014-08-27  5:43     ` Chris Friesen
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Friesen @ 2014-08-27  5:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Josh Durgin, Jeff Cody, Linux Virtualization, Michael S. Tsirkin

On 08/26/2014 08:58 AM, Chris Friesen wrote:

> What I'd like to see (and may take a stab at implementing) is a cap on
> either inflight bytes or inflight IOPS.  One complication is that this
> requires hooking into the completion path to update the stats (and
> possibly unblock the I/O code) when an operation is done.

Well, it looks like I won't be taking a stab at this after all.

It seems that modifying qemu to call mallopt() to set the trim/mmap 
thresholds to 128K is enough to minimize the increase in RSS and also 
drop it back down after an I/O burst.  For now this looks like it should 
be sufficient for our purposes.
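
For reference, the change is essentially just the two mallopt() calls 
below, made early during startup (glibc-specific knobs; the function 
wrapper is only for illustration, and 128K is simply the value that 
worked for us):

#include <malloc.h>

static void tune_malloc(void)
{
    /* Return free()d heap to the kernel sooner, and push larger
     * allocations to mmap() so they are actually unmapped on free(). */
    mallopt(M_TRIM_THRESHOLD, 128 * 1024);
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);
}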

I'm actually a bit surprised I didn't have to go lower, but it seems to 
work for both "dd" and dbench test cases, so we'll give it a try.

Chris

