* KVM call minutes for Sept 14
From: Chris Wright @ 2010-09-14 14:47 UTC
  To: kvm; +Cc: qemu-devel

0.13
- if all goes well...tomorrow

stable tree
- please look at -stable to see what is missing (bugfixes)
  - esp. regressions from 0.12
- looking for dedicated stable maintainer/release manager
  - pick this discussion up next week

qed/qcow2
- increase concurrency, performance
- threading vs state machine
- avi doesn't like qed's reliance on fsck
  - undermines value of error checking (errors become normal)
  - prefer preallocation and fsck just checks for leaked blocks
- just load and validate metadata
- options for correctness are
  - fsync at every data allocation
  - leak data blocks
  - scan
- qed is a pure state machine
  - state on stack, control flow vs function call
- common need: handle requests concurrently, issue async I/O
- most disk formats have similar metadata and methods
  - lookup cluster, read/write data
  - qed could be a path to cleaning up other formats (reusing)
- need an incremental way to improve qcow2 performance
  - threading doesn't seem to be the way to achieve this (incrementally)
- coroutines vs. traditional threads discussion
  - parallel (and locking) vs few well-defined preemption points
- plan for qed...attempt to merge in 0.14
  - online fsck support is all that's missing
  - add bdrv check callback, look for new patch series over the next week
- back to list with discussion...

* Re: KVM call minutes for Sept 14
From: Anthony Liguori @ 2010-09-14 15:11 UTC
  To: Chris Wright; +Cc: kvm, qemu-devel

On 09/14/2010 09:47 AM, Chris Wright wrote:
> 0.13
> - if all goes well...tomorrow
>    

That's for the tag; the announcement may be Thursday.  I need to do a 
regression run tonight.

> qed/qcow2
> - increase concurrency, performance
>    

To achieve good performance, a block driver must (1) support concurrent 
request handling and (2) not hold the qemu_mutex for prolonged periods 
of time.

QED never violates (2), and today supports (1) in all circumstances 
except cluster allocation.

qcow2 can do (1) for the data read/write portions of an I/O request, 
but all metadata reads and writes are serialized.  It also violates (2) 
for all metadata operations and for CoW operations.
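
Schematically, the two paths differ like this (a simplified sketch; the
offsets and call sites are invented, not the real qcow2 code):

/* Data path: asynchronous; the qemu_mutex is dropped while the I/O
 * is in flight, so other requests can make progress. */
bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors, cb, opaque);

/* Metadata path: synchronous; the qemu_mutex stays held until the
 * read returns. */
bdrv_pread(bs->file, l2_table_offset, l2_table, l2_table_size);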

These are implementation details, though.  The real claim of QED is 
that by requiring fewer I/O ops to satisfy a request, it achieves 
better performance, especially since it needs zero syncs in the cluster 
allocation path.  qcow2 has two syncs in the cluster allocation path 
today: one due to the refcount table, and another because qcow2 doesn't 
require fsck support.
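
To make the two syncs concrete, the allocating write path looks roughly
like this (helper names invented for illustration; see also Kevin's
correction below on the actual sync count):

/* Schematic allocating write in qcow2; not the real code. */
static int qcow2_allocating_write(BlockDriverState *bs)
{
    update_refcount_table(bs);    /* mark the new cluster as in use */
    bdrv_flush(bs->file);         /* sync #1: refcounts reach the disk
                                   * before anything references them */

    update_l2_entry(bs);          /* point the L2 table at the cluster */
    bdrv_flush(bs->file);         /* sync #2: paid so that qcow2 never
                                   * needs an fsck after a crash */

    return write_guest_data(bs);  /* QED does this with zero syncs */
}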

We could sync() on cluster allocations in QED and we'd still have better 
performance than qcow2 on paper, because we have fewer I/O ops and fewer 
sync()s.  That would eliminate fsck.

However, since the design target is to have no sync()s in the fast path, 
we're starting with fsck.

> - threading vs state machine
> - avi doesn't like qed reliance on fsck
>    - undermines value of error checking (errors become normal)
>    - prefer preallocation and fsck just checks for leaked blocks
>    

We will provide performance data on fsck.  That's the next step.
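
For reference, the bdrv check callback mentioned in the minutes could
look roughly like this (the signature is an assumption on my part; the
actual patch series may differ):

/* Hypothetical shape of the per-format check hook. */
typedef struct BdrvCheckResult {
    int corruptions;    /* metadata inconsistencies found */
    int leaks;          /* allocated but unreferenced clusters */
} BdrvCheckResult;

struct BlockDriver {
    /* ... existing callbacks ... */
    int (*bdrv_check)(BlockDriverState *bs, BdrvCheckResult *result);
};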

> - just load and validate metadata
> - options for correctness are
>    - fsync at every data allocation
>    - leak data blocks
>    

I contend that leaking data blocks is incorrect and potentially 
guest-exploitable, so it's not an option IMHO.

>    - scan
> - qed is pure statemachine
>    - state on stack, control flow vs function call
> - common need to separate handle requests concurrently, issue async i/o
> - most disk formats have similar metadata and methods
>    - lookup cluster, read/write data
>    - qed could be a path to cleaning up other formats (reusing)
> - need an incremental way to improve qcow2 performance
>    - threading doesn't seem to be the way to achieve this (incrementally)
>    

Because qcow2 already implements a state machine and the qemu 
infrastructure is based on events, we can incrementally split states in 
qcow2.  Once you've got explicit states, it's trivial to compact those 
states into control flow using coroutines.
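
As a sketch of what that compaction looks like (all names invented;
this is not qcow2 code), compare an explicit state machine, where each
async completion re-enters a dispatcher, with the same flow written as
a coroutine, where the state lives on the stack and every blocking
point is a well-defined yield:

enum alloc_state { WRITE_REFCOUNT, WRITE_L2, DONE };

struct AllocRequest {
    enum alloc_state state;
    /* ... per-request context ... */
};

/* State-machine style: completions drive an explicit state field. */
static void alloc_completion(void *opaque, int ret)
{
    struct AllocRequest *req = opaque;

    switch (req->state) {
    case WRITE_REFCOUNT:
        req->state = WRITE_L2;
        issue_l2_write(req, alloc_completion);  /* async, re-enters here */
        break;
    case WRITE_L2:
        req->state = DONE;
        complete_request(req, ret);
        break;
    }
}

/* Coroutine style: the same control flow, written sequentially. */
static void alloc_coroutine(struct AllocRequest *req)
{
    write_refcount_and_yield(req);  /* yields until the write completes */
    write_l2_and_yield(req);
    complete_request(req, 0);
}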

OTOH, threading would probably require a full rewrite of qcow2 and a lot 
of the block layer.

Regards,

Anthony Liguori

> - coroutines vs. traditional threads discussion
>    - parallel (and locking) vs few well-defined preemption points
> - plan for qed...attempt to merge in 0.14
>    - online fsck support is all that's missing
>    - add bdrv check callback, look for new patch series over the next week
> - back to list with discussion...

* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Kevin Wolf @ 2010-09-15  8:30 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 14.09.2010 17:11, Anthony Liguori wrote:
> These are implementation details, though.  The real claim of QED is 
> that by requiring fewer I/O ops to satisfy a request, it achieves 
> better performance, especially since it needs zero syncs in the 
> cluster allocation path.  qcow2 has two syncs in the cluster 
> allocation path today: one due to the refcount table, and another 
> because qcow2 doesn't require fsck support.

The refcount table sync is the one that allows skipping the fsck.  For
a simple cluster allocation (no L2 allocation, no COW), we only have one
sync (which is still one sync too many in this path, so we must move it).

Kevin


* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Anthony Liguori @ 2010-09-15 12:26 UTC
  To: Kevin Wolf; +Cc: Chris Wright, qemu-devel, kvm

On 09/15/2010 03:30 AM, Kevin Wolf wrote:
> The refcount table sync is the one that allows skipping the fsck.  For
> a simple cluster allocation (no L2 allocation, no COW), we only have
> one sync (which is still one sync too many in this path, so we must
> move it).
>    

Don't you have to both write a reference count entry and update the L2 
entry?  Both calls would be bdrv_pwrite_sync, no?

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Kevin Wolf @ 2010-09-15 12:38 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 15.09.2010 14:26, Anthony Liguori wrote:
> Don't you have to both write a reference count entry and update the L2 
> entry?  Both calls would be bdrv_pwrite_sync, no?

No, we don't really care if the L2 entry is on disk.  If the guest wants
its data to be safe, it needs to issue an explicit flush anyway.  The
only thing we want to achieve with bdrv_pwrite_sync is to maintain the
right order between metadata updates, so we survive a crash without
corruption.
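
In other words (offsets and variables invented for illustration), only
the refcount update is synced; the L2 update is a plain write:

/* The refcount must be on disk before anything references the new
 * cluster, so this write is synced: */
bdrv_pwrite_sync(bs->file, refcount_block_offset,
                 &refcount, sizeof(refcount));

/* The L2 entry needs no sync of its own; a guest that wants its data
 * durable issues an explicit flush, which covers this write too: */
bdrv_pwrite(bs->file, l2_entry_offset, &l2_entry, sizeof(l2_entry));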

Kevin


* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Anthony Liguori @ 2010-09-15 13:21 UTC
  To: Kevin Wolf; +Cc: Chris Wright, qemu-devel, kvm

On 09/15/2010 07:38 AM, Kevin Wolf wrote:
> No, we don't really care if the L2 entry is on disk.  If the guest
> wants its data to be safe, it needs to issue an explicit flush anyway.
> The only thing we want to achieve with bdrv_pwrite_sync is to maintain
> the right order between metadata updates, so we survive a crash
> without corruption.
>    

Ah, yes, this is brand new :-)

I was looking at my QED branch, which is a few weeks old.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Kevin Wolf @ 2010-09-15 13:30 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 15.09.2010 15:21, Anthony Liguori wrote:
> Ah, yes, this is brand new :-)
> 
> I was looking at my QED branch, which is a few weeks old.

Well, the whole bdrv_pwrite_sync thing is new - with your benchmarking
you probably caught qcow2 at its worst performance in years. Initially I
just blindly converted everything to be on the safe side, and now we
need to optimize to get the performance back. There are probably some
more syncs that can be removed in less common paths.

Kevin


* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Anthony Liguori @ 2010-09-15 13:52 UTC
  To: Kevin Wolf; +Cc: Chris Wright, qemu-devel, kvm

On 09/15/2010 08:30 AM, Kevin Wolf wrote:
> Well, the whole bdrv_pwrite_sync thing is new - with your benchmarking
> you probably caught qcow2 at its worst performance in years.

FWIW, we queued a run reverting the sync() stuff entirely, as we were 
aware of that.  Should have results this morning.

>   Initially I
> just blindly converted everything to be on the safe side, and now we
> need to optimize to get the performance back. There are probably some
> more syncs that can be removed in less common paths.
>    

Most likely.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: KVM call minutes for Sept 14
From: Kevin Wolf @ 2010-09-15 13:57 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 15.09.2010 15:52, Anthony Liguori wrote:
> FWIW, we queued a run reverting the sync() stuff entirely, as we were 
> aware of that.  Should have results this morning.

Okay, I think that will be helpful, even outside the context of QED. I'd
be interested in how much of a difference it really makes in your tests.

Kevin

