issues with PLE and/or scheduler.

From: Konrad Rzeszutek Wilk @ 2011-12-20 20:41 UTC (permalink / raw)
  To: xen-devel, konrad.wilk, George.Dunlap, keir

Hey folks,

I am sending this on behalf of Andrew since our internal email system
is dropping all mail from the xen-devel mailing list :-(

Anyhow:

This is with xen-4.1-testing cs 23201:1c89f7d29fbb
and using the default "credit" scheduler.

I've run into an interesting issue with HVM guests which
make use of Pause Loop Exiting (i.e. on Westmere systems,
and also on Romley systems): after yielding the CPU, guests
don't seem to receive timer interrupts correctly.

Some background: for historical reasons (i.e. old templates) we boot
OL/RHEL guests with the following settings:

kernel parameters: clock=pit nohpet nopmtimer
vm.cfg: timer_mode = 2

With PLE enabled, 2.6.32 guests will crash early on with:
 ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 # a few lines omitted..
 Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug

2.6.18-238 (i.e. OL/RHEL5u6), by contrast, fails to find the timer but
continues, and then locks up in the serial line initialization:

 ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 # continues until lock up here:
 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled

Instrumenting the 2.6.32 code (i.e. timer_irq_works()) shows that jiffies
isn't advancing (or only 1 or 2 ticks are being received, which is insufficient
for "working"). This is on a "quiet" system with no other activity.
So, even though the guest has voluntarily yielded the CPU (through PLE),
I would still expect it to receive every clock tick (even with timer_mode=2),
as there is no other work to do on the system.
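
To make the "insufficient for working" remark concrete, here is a rough
model of the kind of check timer_irq_works() performs. It is a sketch and
not the kernel source: it assumes the guest busy-waits for about ten tick
periods and then requires jiffies to have advanced by more than four ticks.
The names tick_after(), window_ms(), timer_irq_would_work() and
REQUIRED_TICKS are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model of the guest's boot-time timer check; not kernel code. */

/* Assumed threshold: jiffies must advance by MORE than this many ticks. */
#define REQUIRED_TICKS 4UL

/* Wraparound-safe "a is later than b", in the style of the kernel's time_after(). */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Assumed length of the busy-wait window in ms: ten tick periods at the given HZ. */
static unsigned long window_ms(unsigned long hz)
{
    assert(hz > 0);
    return (10UL * 1000UL) / hz;
}

/* Returns 1 if the timer would be judged "working", 0 otherwise. */
static int timer_irq_would_work(unsigned long jiffies_before,
                                unsigned long jiffies_after)
{
    return tick_after(jiffies_after, jiffies_before + REQUIRED_TICKS);
}
```

Under these assumptions, a guest that sees only 1 or 2 ticks during the
window fails the check, which would be consistent with the
"8254 timer not connected to IO-APIC" message above.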

Disabling PLE allows both 2.6.18 and 2.6.32 guests to boot. [As an
aside, so does setting ple_gap to 41 (i.e. the value prior to 21355:727ccaaa6cce) --
the perf counters show no exits happening, so this is equivalent to disabling PLE.]

I'm hoping someone who knows the scheduler well will be able to quickly
decide whether this is a bug or a feature...

Andrew
