* [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Chris Friesen @ 2016-01-26 16:41 UTC
  To: libvir-list, qemu-devel

Hi,

I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack.

If I live-migrate a guest with virtio network interfaces, I see a ~1200 ms
delay in processing the network packets, and several hundred of them get
dropped.  I can understand the dropped packets, but I'm not sure why the delay
is there.

I instrumented qemu and libvirt, and the strange thing is that this delay seems 
to happen before qemu actually starts doing any migration-related work.  (i.e. 
before qmp_migrate() is called)

Looking at my timestamps, the start of the glitch seems to coincide with 
libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of the 
glitch occurs when the migration is complete and we're up and running on the 
destination.

My question is, why doesn't qemu continue processing virtio packets while the 
dirty page scanning and memory transfer over the network is proceeding?

Thanks,
Chris

(Please CC me on responses, I'm not subscribed to the lists.)


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Daniel P. Berrange @ 2016-01-26 16:45 UTC
  To: Chris Friesen; +Cc: libvir-list, qemu-devel

On Tue, Jan 26, 2016 at 10:41:12AM -0600, Chris Friesen wrote:
> Hi,
> 
> I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack.
> 
> If I live-migrate a guest with virtio network interfaces, I see a ~1200msec
> delay in processing the network packets, and several hundred of them get
> dropped.  I get the dropped packets, but I'm not sure why the delay is
> there.
> 
> I instrumented qemu and libvirt, and the strange thing is that this delay
> seems to happen before qemu actually starts doing any migration-related
> work.  (i.e. before qmp_migrate() is called)
> 
> Looking at my timestamps, the start of the glitch seems to coincide with
> libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of the
> glitch occurs when the migration is complete and we're up and running on the
> destination.
> 
> My question is, why doesn't qemu continue processing virtio packets while
> the dirty page scanning and memory transfer over the network is proceeding?

The qemuDomainMigratePrepareTunnel3Params() method is responsible for
starting the QEMU process on the target host. This should not normally
have any impact on host networking connectivity, since the CPUs on that
target QEMU wouldn't be running at that point. Perhaps the mere act of
starting QEMU and plugging the TAP device into the network on the target
host causes some issue, though? For example, are you using a bridge that
is running STP or something like that?


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
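
One quick way to follow up on the STP question is to look at the bridge state that Linux exposes in sysfs. The sketch below is only an illustration, not something from the thread: it assumes Linux bridges that publish `bridge/stp_state` under `/sys/class/net/<dev>/`, and the helper name `bridges_with_stp` is made up for this example. A non-zero state means STP is enabled on that bridge.

```python
from pathlib import Path


def bridges_with_stp(sysfs_net="/sys/class/net"):
    """Return {bridge_name: stp_state} for bridges whose STP state is non-zero.

    Hypothetical helper: devices without a bridge/stp_state file are
    treated as non-bridges and skipped.
    """
    result = {}
    for dev in sorted(Path(sysfs_net).iterdir()):
        state_file = dev / "bridge" / "stp_state"
        if not state_file.is_file():
            continue  # not a bridge device
        state = int(state_file.read_text().strip())
        if state != 0:
            result[dev.name] = state
    return result


if __name__ == "__main__":
    # Run on the target host; prints nothing if no bridge has STP enabled.
    for name, state in bridges_with_stp().items():
        print(f"{name}: stp_state={state}")
```

If a bridge shows up here, a newly plugged TAP port may sit in the listening/learning states for a while before it forwards traffic, which is the kind of effect Daniel is asking about.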


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Paolo Bonzini @ 2016-01-26 16:50 UTC
  To: Chris Friesen, libvir-list, qemu-devel



On 26/01/2016 17:41, Chris Friesen wrote:
> I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack.
> 
> If I live-migrate a guest with virtio network interfaces, I see a
> ~1200msec delay in processing the network packets, and several hundred
> of them get dropped.  I get the dropped packets, but I'm not sure why
> the delay is there.
> 
> I instrumented qemu and libvirt, and the strange thing is that this
> delay seems to happen before qemu actually starts doing any
> migration-related work.  (i.e. before qmp_migrate() is called)
> 
> Looking at my timestamps, the start of the glitch seems to coincide with
> libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of
> the glitch occurs when the migration is complete and we're up and
> running on the destination.
> 
> My question is, why doesn't qemu continue processing virtio packets
> while the dirty page scanning and memory transfer over the network is
> proceeding?

QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
have no delay---only dropped packets.  Or am I missing something?

Paolo


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Chris Friesen @ 2016-01-26 17:21 UTC
  To: Paolo Bonzini, libvir-list, qemu-devel

On 01/26/2016 10:50 AM, Paolo Bonzini wrote:
>
>
> On 26/01/2016 17:41, Chris Friesen wrote:
>> I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack.
>>
>> If I live-migrate a guest with virtio network interfaces, I see a
>> ~1200msec delay in processing the network packets, and several hundred
>> of them get dropped.  I get the dropped packets, but I'm not sure why
>> the delay is there.
>>
>> I instrumented qemu and libvirt, and the strange thing is that this
>> delay seems to happen before qemu actually starts doing any
>> migration-related work.  (i.e. before qmp_migrate() is called)
>>
>> Looking at my timestamps, the start of the glitch seems to coincide with
>> libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of
>> the glitch occurs when the migration is complete and we're up and
>> running on the destination.
>>
>> My question is, why doesn't qemu continue processing virtio packets
>> while the dirty page scanning and memory transfer over the network is
>> proceeding?
>
> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
> have no delay---only dropped packets.  Or am I missing something?

I have separate timestamps embedded in the packet for when it was sent and when 
it was echoed back by the target (which is the one being migrated).  What I'm 
seeing is that packets to the guest are being sent every msec, but they get 
delayed somewhere for over a second on the way to the destination VM while the 
migration is in progress.  Once the migration is over, a bunch of packets get 
delivered to the app in the guest and are then processed all at once and echoed 
back to the sender in a big burst (and a bunch of packets are dropped, 
presumably due to a buffer overflowing somewhere).

For comparison, we have a DPDK-based fastpath NIC type that we added (sort of 
like vhost-net), and it continues to process packets while the dirty page 
scanning is going on.  Only the actual cutover affects it.

Chris
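
The measurement described above (a probe every millisecond, stamped by the sender and again by the echoing guest) could be analysed roughly as follows. This is a minimal sketch under stated assumptions, not the tool actually used in the thread: the payload layout `PROBE` and the helpers `make_probe`, `stamp_echo` and `summarize` are invented for illustration. Because sender and guest clocks are not synchronised, the sketch only compares sender times with sender times and echo times with echo times.

```python
import struct
import time

# Hypothetical probe payload: sequence number, sender timestamp (ns),
# echo timestamp (ns, filled in by the guest being migrated).
PROBE = struct.Struct("!IQQ")


def make_probe(seq, now_ns=None):
    """Build a probe on the sender; the echo timestamp starts out as 0."""
    send_ns = time.monotonic_ns() if now_ns is None else now_ns
    return PROBE.pack(seq, send_ns, 0)


def stamp_echo(payload, now_ns=None):
    """On the echoing guest: record the echo time, keep seq and send time."""
    seq, send_ns, _ = PROBE.unpack(payload)
    echo_ns = time.monotonic_ns() if now_ns is None else now_ns
    return PROBE.pack(seq, send_ns, echo_ns)


def summarize(replies, sent_count):
    """Summarise echoed probes.

    replies is a list of (payload, receive_time_ns) pairs collected on the
    sender. Returns the number of lost probes, the worst round-trip time
    (sender clock only) and the longest pause between consecutive echoes
    (guest clock only), both in milliseconds.
    """
    records = [PROBE.unpack(payload) + (rx_ns,) for payload, rx_ns in replies]
    seqs = {rec[0] for rec in records}
    rtts_ms = [(rx_ns - send_ns) / 1e6 for _, send_ns, _, rx_ns in records]
    echoes = sorted(rec[2] for rec in records)
    gaps_ms = [(b - a) / 1e6 for a, b in zip(echoes, echoes[1:])]
    return {
        "lost": sent_count - len(seqs),
        "max_rtt_ms": max(rtts_ms, default=0.0),
        "max_echo_gap_ms": max(gaps_ms, default=0.0),
    }
```

A large `max_rtt_ms` together with a large `max_echo_gap_ms` would match the pattern reported here: probes are held somewhere on the way to the guest and then delivered and echoed in a burst.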


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Paolo Bonzini @ 2016-01-26 17:31 UTC
  To: Chris Friesen, libvir-list, qemu-devel



On 26/01/2016 18:21, Chris Friesen wrote:
>>>
>>> My question is, why doesn't qemu continue processing virtio packets
>>> while the dirty page scanning and memory transfer over the network is
>>> proceeding?
>>
>> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
>> have no delay---only dropped packets.  Or am I missing something?
> 
> I have separate timestamps embedded in the packet for when it was sent
> and when it was echoed back by the target (which is the one being
> migrated).  What I'm seeing is that packets to the guest are being sent
> every msec, but they get delayed somewhere for over a second on the way
> to the destination VM while the migration is in progress.  Once the
> migration is over, a bunch of packets get delivered to the app in the
> guest and are then processed all at once and echoed back to the sender
> in a big burst (and a bunch of packets are dropped, presumably due to a
> buffer overflowing somewhere).

That doesn't exclude a bug somewhere in net/ code.  It doesn't pinpoint
it to QEMU or vhost-net.

In any case, what I would do is to use tracing at all levels (guest
kernel, QEMU, host kernel) for packet rx and tx, and find out at which
layer the hiccup appears.

Paolo

> For comparison, we have a DPDK-based fastpath NIC type that we added
> (sort of like vhost-net), and it continues to process packets while the
> dirty page scanning is going on.  Only the actual cutover affects it.


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Chris Friesen @ 2016-01-26 17:49 UTC
  To: Paolo Bonzini, libvir-list, qemu-devel

On 01/26/2016 11:31 AM, Paolo Bonzini wrote:
>
>
> On 26/01/2016 18:21, Chris Friesen wrote:
>>>>
>>>> My question is, why doesn't qemu continue processing virtio packets
>>>> while the dirty page scanning and memory transfer over the network is
>>>> proceeding?
>>>
>>> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
>>> have no delay---only dropped packets.  Or am I missing something?
>>
>> I have separate timestamps embedded in the packet for when it was sent
>> and when it was echoed back by the target (which is the one being
>> migrated).  What I'm seeing is that packets to the guest are being sent
>> every msec, but they get delayed somewhere for over a second on the way
>> to the destination VM while the migration is in progress.  Once the
>> migration is over, a bunch of packets get delivered to the app in the
>> guest and are then processed all at once and echoed back to the sender
>> in a big burst (and a bunch of packets are dropped, presumably due to a
>> buffer overflowing somewhere).
>
> That doesn't exclude a bug somewhere in net/ code.  It doesn't pinpoint
> it to QEMU or vhost-net.
>
> In any case, what I would do is to use tracing at all levels (guest
> kernel, QEMU, host kernel) for packet rx and tx, and find out at which
> layer the hiccup appears.

Is there a straightforward way to trace packet processing in qemu (preferably 
with millisecond-accurate timestamps)?

Chris


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Paolo Bonzini @ 2016-01-26 18:07 UTC
  To: Chris Friesen, libvir-list, qemu-devel



On 26/01/2016 18:49, Chris Friesen wrote:
>>
>> That doesn't exclude a bug somewhere in net/ code.  It doesn't pinpoint
>> it to QEMU or vhost-net.
>>
>> In any case, what I would do is to use tracing at all levels (guest
>> kernel, QEMU, host kernel) for packet rx and tx, and find out at which
>> layer the hiccup appears.
> 
> Is there a straightforward way to trace packet processing in qemu
> (preferably with millisecond-accurate timestamps)?

You can use tracing (docs/tracing.txt).  There are two possibilities:

1) use existing low-level virtio tracepoints: virtqueue_fill (end of tx
and rx operation) and virtqueue_pop (beginning of tx operation).

2) add tracepoints to hw/net/virtio-net.c (virtio_net_flush_tx,
virtio_net_tx_complete, virtio_net_receive) or net/tap.c (tap_receive,
tap_receive_iov, tap_send).

Paolo
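
Once tracepoints such as `virtqueue_pop` and `virtqueue_fill` are enabled, the resulting log can be scanned for the longest pause per event, which helps show at which layer the hiccup appears. The sketch below is a hedged illustration only. It assumes timestamped text lines of the form `<pid>@<sec>.<usec>:<event> <args>`; the actual output depends on the QEMU version and trace backend, so the regular expression may need adjusting. The helper name `largest_gaps` is hypothetical.

```python
import re

# Assumed line format (an assumption, check your trace backend's output):
#   12345@1453831272.123456:virtqueue_pop vq 0x7f... elem 0x7f...
LINE_RE = re.compile(r"^(?:\d+@)?(\d+)\.(\d{6}):(\w+)")


def largest_gaps(lines):
    """Return {event: (gap_us, start_us)} with the longest pause per event.

    gap_us is the largest time between two consecutive occurrences of the
    event; start_us is the timestamp of the occurrence before that pause.
    Lines that do not match the assumed format are ignored.
    """
    last_seen = {}
    worst = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        t_us = int(match.group(1)) * 1_000_000 + int(match.group(2))
        name = match.group(3)
        if name in last_seen:
            gap = t_us - last_seen[name]
            if name not in worst or gap > worst[name][0]:
                worst[name] = (gap, last_seen[name])
        last_seen[name] = t_us
    return worst


if __name__ == "__main__":
    import sys

    # Usage: python gaps.py < qemu-trace.log
    for event, (gap_us, start_us) in sorted(largest_gaps(sys.stdin).items()):
        print(f"{event}: {gap_us / 1000:.3f} ms pause after t={start_us} us")
```

Running the same kind of gap search on guest-kernel and host-kernel traces and comparing the results would show which layer stalls first.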


* Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
From: Chris Friesen @ 2016-01-26 20:31 UTC
  To: Daniel P. Berrange; +Cc: libvir-list, qemu-devel

On 01/26/2016 10:45 AM, Daniel P. Berrange wrote:
> On Tue, Jan 26, 2016 at 10:41:12AM -0600, Chris Friesen wrote:

>> My question is, why doesn't qemu continue processing virtio packets while
>> the dirty page scanning and memory transfer over the network is proceeding?
>
> The qemuDomainMigratePrepareTunnel3Params() method is responsible for
> starting the QEMU process on the target host. This should not normally
> have any impact on host networking connectivity, since the CPUs on that
> target QEMU wouldn't be running at that point. Perhaps the mere act of
> starting QEMU and plugging the TAP dev into the network on the target
> host causes some issue though ? eg are you using a bridge that is doing
> STP or something like that.

Well, looks like your suspicions were correct.  Our fast-path backend was
mistakenly sending out a GARP (gratuitous ARP) when the backend was
initialized as part of creating the qemu process on the target host.  Oops.

Thanks for your help.

Chris
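
For readers who want to spot this kind of problem in a packet capture: a gratuitous ARP is an ARP packet whose sender and target protocol addresses are the same, and it announces an IP/MAC binding to the rest of the network. Presumably, sending one from the target host too early made the network start delivering the guest's traffic there before the guest was running, although the thread does not spell that out. The sketch below is illustrative only. It handles untagged Ethernet frames carrying IPv4 ARP, and the helper names `build_garp` and `is_gratuitous_arp` are made up for this example.

```python
import struct

# Ethernet header (dst, src, ethertype) followed by an IPv4-over-Ethernet
# ARP body (htype, ptype, hlen, plen, oper, sha, spa, tha, tpa).
ETH_ARP = struct.Struct("!6s6sHHHBBH6s4s6s4s")
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800


def build_garp(mac, ip):
    """Build a broadcast gratuitous ARP request announcing ip at mac."""
    return ETH_ARP.pack(
        b"\xff" * 6, mac, ETHERTYPE_ARP,   # Ethernet: broadcast destination
        1, ETHERTYPE_IPV4, 6, 4, 1,        # ARP: Ethernet/IPv4, request
        mac, ip,                           # sender MAC and IP
        b"\x00" * 6, ip,                   # target MAC unset, target IP == sender IP
    )


def is_gratuitous_arp(frame):
    """Return True if frame is an untagged IPv4 ARP with sender IP == target IP."""
    if len(frame) < ETH_ARP.size:
        return False
    (_dst, _src, ethertype, htype, ptype, hlen, plen,
     _oper, _sha, spa, _tha, tpa) = ETH_ARP.unpack(frame[:ETH_ARP.size])
    if ethertype != ETHERTYPE_ARP or htype != 1 or ptype != ETHERTYPE_IPV4:
        return False
    if (hlen, plen) != (6, 4):
        return False
    return spa == tpa
```

Filtering a capture on the target host's uplink with a check like this, and comparing the GARP timestamp with the start of the outage, would confirm whether the announcement lines up with the glitch.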

