* Guest OS migration and lost IPIs
@ 2020-08-05  0:07 Paul E. McKenney
  2020-08-07 12:36 ` Paolo Bonzini
From: Paul E. McKenney @ 2020-08-05  0:07 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm

Hello, Paolo!

We are seeing occasional odd hangs, but only in cases where guest OSes
are being migrated.  Migrating more often makes the hangs happen more
frequently.

Added debug showed that the hung CPU is stuck trying to send an IPI (e.g.,
smp_call_function_single()).  The hung CPU thinks that it has sent the
IPI, but the destination CPU has interrupts enabled (-not- disabled,
enabled, as in ready, willing, and able to take interrupts).  In fact,
the destination CPU usually is going about its business as if nothing
was wrong, which makes me suspect that the IPI got lost somewhere along
the way.
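
The reasoning above can be written down as a toy model, purely for
illustration.  Everything here (the toy_csd and toy_cpu structures and the
toy_* helpers) is hypothetical and only loosely mirrors how a synchronous
cross-CPU call waits for its target; it is not the kernel's implementation.
The point is that a sender can spin forever even though the target has
interrupts enabled, provided the IPI never arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a synchronous cross-CPU call; not kernel code. */
struct toy_csd {
    bool locked;        /* set by the sender, cleared by the target's handler */
};

struct toy_cpu {
    bool irqs_enabled;  /* target is able to take interrupts */
    bool ipi_pending;   /* an IPI has actually arrived at this CPU */
};

/*
 * Sender: mark the request as in flight and "send" the IPI.
 * The `lost` flag simulates the IPI disappearing in transit.
 */
static void toy_send_ipi(struct toy_csd *csd, struct toy_cpu *target, bool lost)
{
    assert(csd != NULL && target != NULL);
    csd->locked = true;
    if (!lost)
        target->ipi_pending = true;
}

/* Target: handle the IPI only if one arrived and interrupts are enabled. */
static void toy_target_step(struct toy_csd *csd, struct toy_cpu *target)
{
    if (target->irqs_enabled && target->ipi_pending) {
        target->ipi_pending = false;
        csd->locked = false;   /* completion: releases the waiting sender */
    }
}

/* Sender: wait for completion; bounded so that the model terminates. */
static bool toy_wait(struct toy_csd *csd, struct toy_cpu *target, int max_spins)
{
    for (int i = 0; i < max_spins && csd->locked; i++)
        toy_target_step(csd, target);
    return !csd->locked;
}
```

With lost == false, toy_wait() reports completion on the first spin.  With
lost == true, it never completes, even though irqs_enabled is true the
whole time, which matches the symptom described above.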

I bumbled a bit through the qemu and KVM source, and didn't find anything
synchronizing IPIs and migrations, though given that I know pretty much
nothing about either qemu or KVM, this doesn't count for much.

The guest OS is running v5.2, so reasonably recent.  It is using
QEMU Guest Agent 2.12.0.  The host is also running v5.2 and providing
qemu-system-x86_64 version 2.11.0.

Is this a known problem?  Are there any debugging options I should enable?
Is there any other patch I should apply?

							Thanx, Paul


* Re: Guest OS migration and lost IPIs
  2020-08-05  0:07 Guest OS migration and lost IPIs Paul E. McKenney
@ 2020-08-07 12:36 ` Paolo Bonzini
  2020-08-08  2:02   ` Paul E. McKenney
From: Paolo Bonzini @ 2020-08-07 12:36 UTC (permalink / raw)
  To: paulmck; +Cc: kvm

On 05/08/20 02:07, Paul E. McKenney wrote:
> 
> We are seeing occasional odd hangs, but only in cases where guest OSes
> are being migrated.  Migrating more often makes the hangs happen more
> frequently.
> 
> Added debug showed that the hung CPU is stuck trying to send an IPI (e.g.,
> smp_call_function_single()).  The hung CPU thinks that it has sent the
> IPI, but the destination CPU has interrupts enabled (-not- disabled,
> enabled, as in ready, willing, and able to take interrupts).  In fact,
> the destination CPU usually is going about its business as if nothing
> was wrong, which makes me suspect that the IPI got lost somewhere along
> the way.
> 
> I bumbled a bit through the qemu and KVM source, and didn't find anything
> synchronizing IPIs and migrations, though given that I know pretty much
> nothing about either qemu or KVM, this doesn't count for much.

The code migrating the interrupt controller is in
kvm_x86_ops.sync_pir_to_irr (which calls vmx_sync_pir_to_irr) and
kvm_apic_get_state.  kvm_apic_get_state is called after CPUs are stopped.

It's possible that we're missing a kvm_x86_ops.sync_pir_to_irr call
somewhere.  It would be surprising but it would explain the symptoms
very well.
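
A minimal sketch of why a missing sync would lose an IPI, under stated
assumptions: the toy_vcpu structure and the toy_* functions are hypothetical
stand-ins, not the KVM data structures or call paths.  The model assumes
that posted interrupts sit in a separate request bitmap (PIR) until they
are merged into the APIC's interrupt request register (IRR), and that only
the IRR is captured when the APIC state is saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS 256
#define TOY_WORDS      (TOY_NR_VECTORS / 32)

/* Toy model only; field names are illustrative. */
struct toy_vcpu {
    uint32_t pir[TOY_WORDS];  /* posted requests, not yet visible in the APIC */
    uint32_t irr[TOY_WORDS];  /* APIC interrupt request register */
};

/* Sender side: post a vector into the target's PIR. */
static void toy_post_interrupt(struct toy_vcpu *v, unsigned vector)
{
    assert(vector < TOY_NR_VECTORS);
    v->pir[vector / 32] |= UINT32_C(1) << (vector % 32);
}

/* Merge all pending PIR bits into the IRR, then clear the PIR. */
static void toy_sync_pir_to_irr(struct toy_vcpu *v)
{
    for (int i = 0; i < TOY_WORDS; i++) {
        v->irr[i] |= v->pir[i];
        v->pir[i] = 0;
    }
}

/* Save step: in this model, only the IRR travels to the destination. */
static void toy_save_apic(const struct toy_vcpu *v, uint32_t out[TOY_WORDS])
{
    for (int i = 0; i < TOY_WORDS; i++)
        out[i] = v->irr[i];
}

/* Check whether a vector is pending in a saved IRR image. */
static bool toy_vector_pending(const uint32_t irr[TOY_WORDS], unsigned vector)
{
    assert(vector < TOY_NR_VECTORS);
    return (irr[vector / 32] >> (vector % 32)) & 1u;
}
```

In this model, calling toy_post_interrupt() followed directly by
toy_save_apic() yields a saved image without the vector, so the destination
never sees the IPI.  Calling toy_sync_pir_to_irr() before the save preserves
it.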

Paolo



* Re: Guest OS migration and lost IPIs
  2020-08-07 12:36 ` Paolo Bonzini
@ 2020-08-08  2:02   ` Paul E. McKenney
From: Paul E. McKenney @ 2020-08-08  2:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

On Fri, Aug 07, 2020 at 02:36:17PM +0200, Paolo Bonzini wrote:
> On 05/08/20 02:07, Paul E. McKenney wrote:
> > 
> > We are seeing occasional odd hangs, but only in cases where guest OSes
> > are being migrated.  Migrating more often makes the hangs happen more
> > frequently.
> > 
> > Added debug showed that the hung CPU is stuck trying to send an IPI (e.g.,
> > smp_call_function_single()).  The hung CPU thinks that it has sent the
> > IPI, but the destination CPU has interrupts enabled (-not- disabled,
> > enabled, as in ready, willing, and able to take interrupts).  In fact,
> > the destination CPU usually is going about its business as if nothing
> > was wrong, which makes me suspect that the IPI got lost somewhere along
> > the way.
> > 
> > I bumbled a bit through the qemu and KVM source, and didn't find anything
> > synchronizing IPIs and migrations, though given that I know pretty much
> > nothing about either qemu or KVM, this doesn't count for much.
> 
> The code migrating the interrupt controller is in
> kvm_x86_ops.sync_pir_to_irr (which calls vmx_sync_pir_to_irr) and
> kvm_apic_get_state.  kvm_apic_get_state is called after CPUs are stopped.
> 
> It's possible that we're missing a kvm_x86_ops.sync_pir_to_irr call
> somewhere.  It would be surprising but it would explain the symptoms
> very well.

Thank you for the info, Paolo!  I will see what I can find.  ;-)

							Thanx, Paul
