* [Qemu-devel] coroutines and block I/O considerations
From: Frediano Ziglio @ 2011-07-19  8:06 UTC
  To: qemu-devel

Hi,
  I'm getting some exercise in the block I/O layer and decided to try
the coroutine branch, because I find it easier to use than the normal
callbacks. Looking at the callback-based code, a lot of source lines
go into saving/restoring state and declaring callbacks, and it is not
that easy to follow the normal flow. In the end I would like to
create a new image format to get rid of some performance problems I
encounter using writethrough and snapshots. I have some questions
regarding block I/O and also coroutines.

1- Threading model. I don't understand it. I can see that the aio
pool routines do not contain locking code, so I think the aio layer
is mostly executed in a single thread. I saw the introduction of some
locking with coroutines, so I think coroutines are now called from
different threads and need locks (the current implementation
serializes all device operations).

2- Memory considerations for coroutines. Besides allowing more
readable code, I wonder if anybody has considered memory usage. A
separate stack has to be allocated for every coroutine; for instance,
the ucontext and win32 implementations use 4 MB. Assuming 128
concurrent AIO requests, this requires about 512 MB of RAM (mostly
just committed but not used, and coroutines are reused).

About snapshots and block I/O, I think that using "external
snapshots" would make some things easier. By "external snapshot" I
mean creating a new image whose backing file is the current image
file and using this new image for future operations. This would
allow, for instance:
- supporting snapshots with every format (even raw)
- making snapshot backups with external programs (even from different
hosts using a clustered file system, and without many locking issues,
as the original image is now read-only)
- converting images live (just snapshot, qemu-img convert, remove the
snapshot)

Regards
  Frediano

* Re: [Qemu-devel] coroutines and block I/O considerations
From: Kevin Wolf @ 2011-07-19 10:10 UTC
  To: Frediano Ziglio; +Cc: qemu-devel

On 19.07.2011 10:06, Frediano Ziglio wrote:
>   I'm getting some exercise in the block I/O layer and decided to
> try the coroutine branch, because I find it easier to use than the
> normal callbacks. Looking at the callback-based code, a lot of
> source lines go into saving/restoring state and declaring callbacks,
> and it is not that easy to follow the normal flow.

Yes. This is one of the reasons why we're trying to switch to
coroutines. QED is a prototype for a fully asynchronous callback-based
image format, and sometimes it's really hard to follow its code paths.
That the real functionality gets lost in the noise of transferring state
doesn't really help with readability either.

> In the end I would like to create a new image format to get rid of
> some performance problems I encounter using writethrough and
> snapshots. I have some questions regarding block I/O and also
> coroutines.

No. A new image format is the wrong answer, whatever the question may
be. :-)

If writethrough doesn't perform well with the existing format drivers,
fix the existing format drivers. You need very good reasons to convince
me that qcow2 can't do what your new format could do.

The solution for slow writethrough mode in qcow2 is probably to make
requests parallel, even if they touch metadata. This is a change that
becomes possible relatively easily once we have switched to coroutines.

What exactly is the problem with snapshots? Saving/loading internal
snapshots is too slow, or general performance with an image that has
snapshots? I think Luiz reported the first one a while ago, and it
should be easy enough to fix (use Qcow2Cache in writeback mode during
the refcount update).

> 1- Threading model. I don't understand it. I can see that the aio
> pool routines do not contain locking code, so I think the aio layer
> is mostly executed in a single thread. I saw the introduction of
> some locking with coroutines, so I think coroutines are now called
> from different threads and need locks (the current implementation
> serializes all device operations).

You can view coroutines as threads with cooperative scheduling. That
is, unlike a thread, a coroutine is never interrupted by a scheduler;
it only gives up control by calling qemu_coroutine_yield(), which
transfers control to a different coroutine. Compared to threads this
simplifies locking a bit, because you know exactly at which points
other code may run.

But of course, even though you know where it happens, you still have
other code running in the middle of your function, so there can be a
need to lock things, which is why there are primitives like CoMutex.

They are still all running in the same thread.
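
A minimal sketch of what that looks like in a block driver (the
BDRVFooState type, foo_co_write() and update_l2_table() below are
made up for illustration; qemu_co_mutex_*(), qemu_coroutine_yield()
and coroutine_fn are from the coroutine branch, whose header name and
exact signatures may still change):

#include "qemu-coroutine.h"

typedef struct BDRVFooState {
    CoMutex lock;   /* initialized once with qemu_co_mutex_init() */
    /* ... format-specific metadata ... */
} BDRVFooState;

/* Hypothetical helper that updates metadata; anything it calls may
 * yield internally while waiting for I/O. */
static void coroutine_fn update_l2_table(BDRVFooState *s)
{
    /* bdrv_co_-style reads/writes of the L2 table would go here; any
     * of them may call qemu_coroutine_yield() while waiting. */
}

static void coroutine_fn foo_co_write(void *opaque)
{
    BDRVFooState *s = opaque;

    qemu_co_mutex_lock(&s->lock);
    /*
     * If update_l2_table() yields while waiting for I/O, other
     * coroutines get to run in the meantime -- but they cannot enter
     * this critical section until the CoMutex is released.
     */
    update_l2_table(s);
    qemu_co_mutex_unlock(&s->lock);
}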

> 2- Memory considerations for coroutines. Besides allowing more
> readable code, I wonder if anybody has considered memory usage. A
> separate stack has to be allocated for every coroutine; for
> instance, the ucontext and win32 implementations use 4 MB. Assuming
> 128 concurrent AIO requests, this requires about 512 MB of RAM
> (mostly just committed but not used, and coroutines are reused).

128 concurrent requests is a lot. And even then, it's only virtual
memory. I doubt that we're actually using much more than we do in the
old code with the AIOCBs (which will disappear and become local
variables when we complete the conversion).

> About snapshots and block I/O, I think that using "external
> snapshots" would make some things easier. By "external snapshot" I
> mean creating a new image whose backing file is the current image
> file and using this new image for future operations. This would
> allow, for instance:
> - supporting snapshots with every format (even raw)
> - making snapshot backups with external programs (even from
> different hosts using a clustered file system, and without many
> locking issues, as the original image is now read-only)
> - converting images live (just snapshot, qemu-img convert, remove
> the snapshot)

These are things that are being actively worked on. snapshot_blkdev
is a monitor command that already exists and does exactly what you
describe. For the rest, live block copy and image streaming are the
keywords you should be looking for. We've had quite a few discussions
about these in the past few weeks. You may also be interested in this
wiki page:
http://wiki.qemu.org/Features/LiveBlockMigration

Kevin

* Re: [Qemu-devel] coroutines and block I/O considerations
From: Stefan Hajnoczi @ 2011-07-19 10:57 UTC
  To: Kevin Wolf; +Cc: Paolo Bonzini, Frediano Ziglio, qemu-devel

On Tue, Jul 19, 2011 at 11:10 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 19.07.2011 10:06, Frediano Ziglio wrote:
>> 2- Memory considerations for coroutines. Besides allowing more
>> readable code, I wonder if anybody has considered memory usage. A
>> separate stack has to be allocated for every coroutine; for
>> instance, the ucontext and win32 implementations use 4 MB. Assuming
>> 128 concurrent AIO requests, this requires about 512 MB of RAM
>> (mostly just committed but not used, and coroutines are reused).
>
> 128 concurrent requests is a lot. And even then, it's only virtual
> memory. I doubt that we're actually using much more than we do in the
> old code with the AIOCBs (which will disappear and become local
> variables when we complete the conversion).

From what I understand "committed" on Windows means that physical
pages have been allocated and pagefile space has been set aside:
http://msdn.microsoft.com/en-us/library/ms810627.aspx

On Linux, memory is overcommitted and will not require swap space or
any actual pages.  This behavior can be configured differently, IIRC,
but the default is to be lazy about claiming memory resources, so
even 4 MB thread/coroutine stacks are not an issue.
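
As a rough standalone illustration (plain POSIX, nothing
QEMU-specific), mapping a 4 MB "stack" costs only address space until
the pages are actually touched:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_SIZE (4 * 1024 * 1024)

int main(void)
{
    /* MAP_NORESERVE additionally asks the kernel not to reserve
     * swap space for the mapping. */
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                       -1, 0);
    if (stack == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Physical pages are faulted in only now, and only for the part
     * that is actually used (here the top 64 KB of the mapping). */
    memset((char *)stack + STACK_SIZE - 64 * 1024, 0, 64 * 1024);

    munmap(stack, STACK_SIZE);
    return 0;
}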

The question is how we can get the same effect on Windows, and
whether the current Fibers implementation doesn't already work.

Stefan

* Re: [Qemu-devel] coroutines and block I/O considerations
From: Anthony Liguori @ 2011-07-19 13:15 UTC
  To: Kevin Wolf; +Cc: Frediano Ziglio, qemu-devel

On 07/19/2011 05:10 AM, Kevin Wolf wrote:
> On 19.07.2011 10:06, Frediano Ziglio wrote:
> They are still all running in the same thread.
>
>> 2- Memory considerations for coroutines. Besides allowing more
>> readable code, I wonder if anybody has considered memory usage. A
>> separate stack has to be allocated for every coroutine; for
>> instance, the ucontext and win32 implementations use 4 MB. Assuming
>> 128 concurrent AIO requests, this requires about 512 MB of RAM
>> (mostly just committed but not used, and coroutines are reused).
>
> 128 concurrent requests is a lot. And even then, it's only virtual
> memory. I doubt that we're actually using much more than we do in the
> old code with the AIOCBs (which will disappear and become local
> variables when we complete the conversion).

A 4 MB stack is probably overkill anyway.  It's easiest to just start
with a large stack and then, once all of the functionality is worked
out, optimize to a smaller stack.

The same problem exists with threads, FWIW, since the default thread
stack is usually quite large.
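
For comparison, a generic pthreads sketch (not QEMU code) of what
tuning that knob looks like for threads -- the same kind of change a
thread-based design would eventually need:

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    /* ... per-request work would go here ... */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Ask for a 1 MB stack instead of the platform default
     * (often 8 MB with glibc on Linux). */
    pthread_attr_setstacksize(&attr, 1024 * 1024);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}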

Regards,

Anthony Liguori

* Re: [Qemu-devel] coroutines and block I/O considerations
From: Paolo Bonzini @ 2011-07-25  8:56 UTC
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Frediano Ziglio, qemu-devel

On 07/19/2011 12:57 PM, Stefan Hajnoczi wrote:
> From what I understand "committed" on Windows means that physical
> pages have been allocated and pagefile space has been set aside:
> http://msdn.microsoft.com/en-us/library/ms810627.aspx

Yes, memory that is "reserved" on Windows is just a contiguous part of 
the address space that is set aside, like MAP_NORESERVE under Linux. 
Memory that is "committed" is really allocated.

> The question is how can we get the same effect on Windows and does the
> current Fibers implementation not already work?

Windows thread and fiber stacks have both a reserved and a committed
part.  The dwStackSize argument to CreateFiber indeed represents the
_committed_ stack size, so we're now committing 4 MB of stack per
fiber.  The maximum size that the stack can grow to is set to the
(per-executable) default.

If you want to specify both the reserved and committed stack sizes, you 
can do that with CreateFiberEx.

http://msdn.microsoft.com/en-us/library/ms682406%28v=vs.85%29.aspx

4 MB is quite a lot of address space to waste for a thread anyway.  A
coroutine should not need that much, even on Linux.  I think for
Windows 64 KB of initial stack size and 1 MB of maximum size should
do (for Linux it would be 1 MB overall).
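
Concretely, something like this (just a sketch using the sizes
suggested above; co_entry() is a hypothetical coroutine trampoline,
and the calling thread must have converted itself to a fiber with
ConvertThreadToFiber() before it can switch to the new one):

#include <windows.h>

static VOID CALLBACK co_entry(LPVOID param)
{
    /* ... coroutine body; must eventually SwitchToFiber() back to
     * the caller instead of returning ... */
}

LPVOID create_coroutine_fiber(LPVOID opaque)
{
    return CreateFiberEx(64 * 1024,     /* dwStackCommitSize  */
                         1024 * 1024,   /* dwStackReserveSize */
                         0,             /* dwFlags            */
                         co_entry, opaque);
}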

Paolo

* Re: [Qemu-devel] coroutines and block I/O considerations
From: Stefan Hajnoczi @ 2011-07-25 10:00 UTC
  To: Paolo Bonzini; +Cc: Kevin Wolf, Frediano Ziglio, qemu-devel

On Mon, Jul 25, 2011 at 9:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 07/19/2011 12:57 PM, Stefan Hajnoczi wrote:
>>
>>  From what I understand "committed" on Windows means that physical
>> pages have been allocated and pagefile space has been set aside:
>> http://msdn.microsoft.com/en-us/library/ms810627.aspx
>
> Yes, memory that is "reserved" on Windows is just a contiguous part of the
> address space that is set aside, like MAP_NORESERVE under Linux. Memory that
> is "committed" is really allocated.
>
>> The question is how we can get the same effect on Windows, and
>> whether the current Fibers implementation doesn't already work.
>
> Windows thread and fiber stacks have both a reserved and a committed part.
>  The dwStackSize argument to CreateFiber indeed represents _committed_ stack
> size, so we're now committing 4 MB of stack per fiber.  The maximum size
> that the stack can grow to is set to the (per-executable) default.
>
> If you want to specify both the reserved and committed stack sizes, you can
> do that with CreateFiberEx.
>
> http://msdn.microsoft.com/en-us/library/ms682406%28v=vs.85%29.aspx
>
> 4 MB is quite a lot of address space to waste for a thread anyway.
> A coroutine should not need that much, even on Linux.  I think for
> Windows 64 KB of initial stack size and 1 MB of maximum size should
> do (for Linux it would be 1 MB overall).

I agree, let's make sure not to commit all this memory upfront.

Stefan
