* dump with xen-unstable & linux 3.2.17
@ 2012-05-17 21:20 Ben Guthro
  2012-05-21 14:55 ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-17 21:20 UTC (permalink / raw)
  To: Xen Devel, Konrad Rzeszutek Wilk

I'm attempting to update the Xen I'm using, and am having an issue with S3.
Konrad - I know you've been down this path before - so I'm hoping you
might have some insight, given the stack below

The linux tree also has an older version of Konrad's acpi-s3
development branches (it was v7)

Upon resuming from S3 - Xen panics with the following stack. I'm
guessing it has something to do with some of the pcpu work going on
recently - but haven't been tracking this too closely, so am not sure.
The same kernel with Xen 4.0.x works OK.


(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu,
&cpus)' failed at sched_credit.c:477
(XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82c48011a793>] _csched_cpu_pick+0x135/0x552
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: 0000000000000010   rcx: 0000000000000010
(XEN) rdx: 000000ffff   rsi: 0000000000000010   rdi: 0000000000000000
(XEN) rbp: ffff83013e38fdd8   rsp: ffff83013e38fd08   r8:  0000000000000000
(XEN) r9:  000000000000003e   r10: ffff82c480232480   r11: 0000000000000246
(XEN) r12: ffff82c480262820   r13: 0000000000000001   r14: ffff82c4803022e0
(XEN) r15: ffff83014c6f0068   cr0: 000000008005003b   cr4: 00000000001026f0
(XEN) cr3: 0000000141a05000   cr2: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83013e38fd08:
(XEN)    0100000141a05000 ffff83013e38fd40 0000000000000297 ffff83013e38fd38
(XEN)    ffff82c480125929 ffff830148ac2000 ffff83013e38fd78 5400000000000002
(XEN)    0000000000000282 ffff83013e38fd68 ffff82c480125929 ffff830148ac2000
(XEN)    ffff83013e38fd98 ffff82c48012cd57 0000000000000000 0000000000000008
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000286 ffff83014c6f0068 ffff83014c6f0068 0000000000000001
(XEN)    ffff82c4803022e0 ffff83014c6f0068 ffff83013e38fde8 ffff82c48011abbe
(XEN)    ffff83013e38fe58 ffff82c48012399d ffff82c4803022e0 ffff82c4803022e0
(XEN)    ffff82c4803022e0 ffff8300aa0fd000 00000000000fd060 0000000000000246
(XEN)    ffff82c4801280c1 ffff8300aa0fd000 ffff82c4803022e0 ffff82c4802ec5c0
(XEN)    ffff83014c6f0068002dc2e820 ffff83013e38fe88 ffff82c480123c57
(XEN)    fffffffffffffffe ffff830148a36000 ffff8300aa0fd000 0000000000000000
(XEN)    ffff83013e38fef8 ffff82c4801063f5 ffff83013e38ff18 ffffffff810030d2
(XEN)    ffff8300aa0fd000 0000000000000000 ffff83013e38ff08 ffff82c480185ff0
(XEN)    ffffffff81ab0020 ffff8300aa0fd000 0000000000000001 000000000000
(XEN)    0000000000000000 ffff88002dc2e820 00007cfec1c700c7 ffff82c480228f78
(XEN)    ffffffff8100130a 0000000000000018 ffff88002dc2e820 0000000000000000
(XEN)    0000000000000000 0000000000000001 ffff88002786dda0 ffff88002dc2bdc0
(XEN)    0000000000000246 00000000ffffffff 0000000000000040 0000000000000000
(XEN)    0000000000000018 ffffffff8100130a 0000000000000000 0000000000000001
(XEN) Xen call trace:
(XEN)    [<ffff82c48011a793>] _csched_cpu_pick+0x135/0x552
(XEN)    [<ffff82c48011abbe>] csched_cpu_pick+0xe/0x10
(XEN)    [<ffff82c48012399d>] vcpu_migrate+0x19f/0x346
(XEN)    [<ffff82c480123c57>] vcpu_force_reschedule+0xa4/0xb6
(XEN)    [<ffff82c48f5>] do_vcpu_op+0x2c9/0x452
(XEN)    [<ffff82c480228f78>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu,
&cpus)' failed at sched_credit.c:477
(XEN) ****************************************


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-17 21:20 dump with xen-unstable & linux 3.2.17 Ben Guthro
@ 2012-05-21 14:55 ` Jan Beulich
  2012-05-21 19:12   ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2012-05-21 14:55 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Konrad Rzeszutek Wilk, xen-devel

>>> On 17.05.12 at 23:20, Ben Guthro <ben@guthro.net> wrote:
> I'm attempting to update the Xen I'm using, and am having an issue with S3.
> Konrad - I know you've been down this path before - so I'm hoping you
> might have some insight, given the stack below
> 
> The linux tree also has an older version of Konrad's acpi-s3
> development branches (it was v7)
> 
> Upon resuming from S3 - Xen panics with the following stack. I'm
> guessing it has something to do with some of the pcpu work going on
> recently - but haven't been tracking this too closely, so am not sure.

I don't think that's related, the more that if anything that work
was on the Dom0 kernel side, not the hypervisor (or not
recently).

> The same kernel with Xen 4.0.x works OK.

Anything known regarding 4.1.x?

> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu,
> &cpus)' failed at sched_credit.c:477

Assuming this is reproducible, could you try to get your serial
console problems sorted out so that you can obtain a full log
(there are quite a few dropped characters here).

At least I can't tell anything from the dumped data alone (other
than the first part of the ASSERT() expression being redundant
with the second part), so I'd also like to ask that you attach (or
make accessible another way) the xen-syms binary corresponding
to the xen.gz one used, so one can locate and analyze individual
variables in the registers and on stack.
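
The redundancy mentioned above can be checked with a minimal sketch. It
assumes a plain 64-bit bitmask as a stand-in for Xen's cpumask_t; the names
toy_cpumask, mask_empty, mask_test_cpu and assert_forms_agree are
hypothetical and are not Xen APIs. If the mask is empty, no bit is set, so
the test for any particular cpu already fails and the emptiness check adds
nothing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: bit n set means CPU n is in the mask. */
typedef uint64_t toy_cpumask;

static bool mask_empty(toy_cpumask m)
{
    return m == 0;
}

static bool mask_test_cpu(unsigned int cpu, toy_cpumask m)
{
    return cpu < 64 && ((m >> cpu) & 1u);
}

/*
 * Returns true if, for every mask over the first `ncpus` CPUs and every
 * cpu index below `ncpus`, the two-part condition and its second part
 * alone evaluate to the same value.
 */
static bool assert_forms_agree(unsigned int ncpus)
{
    assert(ncpus > 0 && ncpus <= 16);   /* keep the exhaustive loop small */

    for (toy_cpumask m = 0; m < ((toy_cpumask)1 << ncpus); m++) {
        for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
            bool full = !mask_empty(m) && mask_test_cpu(cpu, m);
            bool second_only = mask_test_cpu(cpu, m);

            if (full != second_only)
                return false;
        }
    }
    return true;
}
```

Calling assert_forms_agree(8) walks all 256 masks over 8 CPUs. Under these
assumptions the failed assertion only says that the chosen cpu is not in
"cpus"; it does not say whether "cpus" is empty.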

Jan


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-21 14:55 ` Jan Beulich
@ 2012-05-21 19:12   ` Ben Guthro
  2012-05-21 19:49     ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-21 19:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel

On Mon, May 21, 2012 at 10:55 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 17.05.12 at 23:20, Ben Guthro <ben@guthro.net> wrote:
>> I'm attempting to update the Xen I'm using, and am having an issue with S3.
>> Konrad - I know you've been down this path before - so I'm hoping you
>> might have some insight, given the stack below
>>
>> The linux tree also has an older version of Konrad's acpi-s3
>> development branches (it was v7)
>>
>> Upon resuming from S3 - Xen panics with the following stack. I'm
>> guessing it has something to do with some of the pcpu work going on
>> recently - but haven't been tracking this too closely, so am not sure.
>
> I don't think that's related, the more that if anything that work
> was on the Dom0 kernel side, not the hypervisor (or not
> recently).

OK

>
>> The same kernel with Xen 4.0.x works OK.
>
> Anything known regarding 4.1.x?
>

Not at this time.

I'm attempting to update the hypervisor (and patches) used in
XenClient Enterprise (was NxTop)

Since we have a stack of patches that are not necessarily appropriate
for upstream (and some that just haven't been submitted yet) - there
is some porting involved in getting things running to the point that I
can test it.

If we can't figure this out with the logs - I can try to bisect, and
figure out where things broke...but that could be rather time
consuming, if I have to re-port those patches each time.


>> (XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu,
>> &cpus)' failed at sched_credit.c:477
>
> Assuming this is reproducible, could you try to get your serial
> console problems sorted out so that you can obtain a full log
> (there are quite a few dropped characters here).
>
> At least I can't tell anything from the dumped data alone (other
> than the first part of the ASSERT() expression being redundant
> with the second part), so I'd also like to ask that you attach (or
> make accessible another way) the xen-syms binary corresponding
> to the xen.gz one used, so one can locate and analyze individual
> variables in the registers and on stack.

Please find a new log, xen.gz & syms here:

https://citrix.sharefile.com/d/s4ba432974874efd8

(I didn't want to attach large files to the mailing list)



FWIW, this is on an Intel Ivy Bridge SDP - but it also seems to happen on
older platforms.


Any insight is greatly appreciated.

Ben


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-21 19:12   ` Ben Guthro
@ 2012-05-21 19:49     ` Ben Guthro
  2012-05-22 15:38       ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-21 19:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel

> On Mon, May 21, 2012 at 10:55 AM, Jan Beulich <JBeulich@suse.com> wrote:
> Anything known regarding 4.1.x?

Well, it looks like this happens without any of our patches - so that
may make bisecting much easier.
I'll try to get some more information.


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-21 19:49     ` Ben Guthro
@ 2012-05-22 15:38       ` Ben Guthro
  2012-05-22 16:08         ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-22 15:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel

hmm...maintaining the serial console connection during a suspend /
resume seems to have been added sometime in the 4.2 development
timeframe.
Older changesets seem to just reboot on me, without much output.
This makes determining if it is the same crash rather problematic.


The crash against the unstable tip appears to be happening in the
VCPUOP_stop_periodic_timer case of the
do_vcpu_op() function

by the time it gets down to _csched_cpu_pick() - it asserts on the
first part of the ASSERT below:

    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
    cpumask_and(&cpus, online, vc->cpu_affinity);
    cpu = cpumask_test_cpu(vc->processor, &cpus)
            ? vc->processor
            : cpumask_cycle(vc->processor, &cpus);
    ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );


...but I'm not terribly familiar with the proper operations of these code paths
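
A rough model of the excerpt above, as a sketch only: it uses a 32-bit
bitmask instead of cpumask_t and assumes that cycling over an empty mask
yields an out-of-range cpu number. The toy_* names and TOY_NR_CPUS are
hypothetical. It shows that the assertion condition cannot hold whenever the
intersection of the online mask and the affinity mask is empty; which of the
two inputs is wrong after resume is not known at this point.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 8u

/* Toy stand-in for cpumask_t: bit n set means CPU n is in the mask. */
typedef uint32_t toy_cpumask;

static bool toy_test_cpu(unsigned int cpu, toy_cpumask m)
{
    return cpu < TOY_NR_CPUS && ((m >> cpu) & 1u);
}

/*
 * Next set bit after n, wrapping around (and ending at n itself).
 * Returns TOY_NR_CPUS if the mask is empty.
 */
static unsigned int toy_cycle(unsigned int n, toy_cpumask m)
{
    for (unsigned int i = 1; i <= TOY_NR_CPUS; i++) {
        unsigned int cpu = (n + i) % TOY_NR_CPUS;

        if (toy_test_cpu(cpu, m))
            return cpu;
    }
    return TOY_NR_CPUS;
}

/*
 * Mirrors the shape of the excerpt: intersect the masks, prefer the
 * current processor, otherwise cycle to another candidate. *assert_ok
 * reports whether the ASSERT() condition would hold.
 */
static unsigned int toy_pick(toy_cpumask online, toy_cpumask affinity,
                             unsigned int processor, bool *assert_ok)
{
    toy_cpumask cpus = online & affinity;
    unsigned int cpu = toy_test_cpu(processor, cpus)
                       ? processor
                       : toy_cycle(processor, cpus);

    *assert_ok = (cpus != 0) && toy_test_cpu(cpu, cpus);
    return cpu;
}
```

For example, toy_pick(0x0, 0xF, 1, &ok) leaves ok false, as does any pair of
masks that share no CPU, whereas toy_pick(0x1, 0xF, 1, &ok) falls back to
CPU 0 and leaves ok true.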



I still suspect this is something my dom0 kernel is doing incorrectly,
but I don't have much to back that up other than a gut feeling.


Any advice / pointers are appreciated.

/btg

* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 15:38       ` Ben Guthro
@ 2012-05-22 16:08         ` Jan Beulich
  2012-05-22 16:21           ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2012-05-22 16:08 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Konrad Rzeszutek Wilk, xen-devel

>>> On 22.05.12 at 17:38, Ben Guthro <ben@guthro.net> wrote:
> I still suspect this is something my dom0 kernel is doing incorrectly,
> but I don't have much to back that up other than a gut feeling.

If the Dom0 kernel can affect the scheduler in this way, then
quite likely a DomU kernel can too. And even if not, it's still a
problem - the scheduler just shouldn't be susceptible.

Jan


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 16:08         ` Jan Beulich
@ 2012-05-22 16:21           ` Ben Guthro
  2012-05-22 17:34             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-22 16:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel

good point.

I'll continue to try to figure this out - but if you have anything
specific I can try (particular boot options, or a patch with extra
debugging in interesting areas) please let me know, as it is 100%
reproducible in my environment right now.

-Ben


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 16:21           ` Ben Guthro
@ 2012-05-22 17:34             ` Konrad Rzeszutek Wilk
  2012-05-22 17:55               ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-22 17:34 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Jan Beulich, xen-devel

On Tue, May 22, 2012 at 12:21:20PM -0400, Ben Guthro wrote:
> good point.
> 
> I'll continue to try to figure this out - but if you have anything
> specific I can try (particular boot options, or a patch with extra
> debugging in interesting areas) please let me know, as it is 100%
> reproducible in my environment right now.

So.. you mentioned that you are using the serial console and that
there were some changes in the hypervisor to deal with the serial console.

I sent to xen-devel some of the patches that you guys had (with
proper authorship) - especially the one dealing with PCI serial cards.
Jan found that one of them had a simple mistake of trying to use
the ports before they were initialized and fixed that in xen-unstable.
Perhaps that is missing?


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 17:34             ` Konrad Rzeszutek Wilk
@ 2012-05-22 17:55               ` Ben Guthro
  2012-05-22 18:00                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-22 17:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Jan Beulich, xen-devel

Are you referring to kernel, or Xen patches?

I'm currently seeing the issue in question with the xen-unstable tip,
with none of our patches pushed...
...or are you suggesting that the patch, as committed is incorrect?


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 17:55               ` Ben Guthro
@ 2012-05-22 18:00                 ` Konrad Rzeszutek Wilk
  2012-05-22 18:26                   ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-22 18:00 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Jan Beulich, xen-devel

On Tue, May 22, 2012 at 01:55:25PM -0400, Ben Guthro wrote:
> Are you referring to kernel, or Xen patches?

Xen.
> 
> I'm currently seeing the issue in question with the xen-unstable tip
> none of our patches pushed...

I pushed the serial ones.

> ...or are you suggesting that the patch, as committed is incorrect?

I just tried to suspend and resume Xen 4.1 with and without serial console
and it only works when there is no serial console.

I think something is buggy. Not sure what (this was with a normal
built-in motherboard serial port).


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 18:00                 ` Konrad Rzeszutek Wilk
@ 2012-05-22 18:26                   ` Ben Guthro
  2012-05-22 21:00                     ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-22 18:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Jan Beulich, xen-devel

OK, I'll compare what got committed with our old patches against 4.0 &
see if there are any obvious differences.

Tom Goetz wrote those, originally - and he's on vacation - so
specifics about them may have to wait until he returns.




* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 18:26                   ` Ben Guthro
@ 2012-05-22 21:00                     ` Ben Guthro
  2012-05-23  9:39                       ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-22 21:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: keir, Jan Beulich, xen-devel

I've bisected this to the following commit in the xen-unstable git tree.

I'll be able to dive in a little deeper tomorrow.
If you see anything here that looks suspicious to the crash
referenced... let me know.

-Ben

e4a31205766d518276d0fbc1780bcd63db1a502d is the first bad commit
commit e4a31205766d518276d0fbc1780bcd63db1a502d
Author: Keir Fraser <keir@xen.org>
Date:   Thu Mar 22 12:20:13 2012 +0000

    Introduce system_state variable.

    Use it to replace x86-specific early_boot boolean variable.

    Also use it to detect suspend/resume case during cpu offline/online
    to avoid unnecessarily breaking vcpu and cpupool affinities.

    Signed-off-by: Keir Fraser <keir@xen.org>
    Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>


http://xenbits.xen.org/gitweb/?p=xen-unstable.git;a=commit;h=e4a31205766d518276d0fbc1780bcd63db1a502d

or, if you prefer mercurial:
http://xenbits.xen.org/hg/staging/xen-unstable.hg/rev/d5ccb2d1dbd1




* Re: dump with xen-unstable & linux 3.2.17
  2012-05-22 21:00                     ` Ben Guthro
@ 2012-05-23  9:39                       ` Jan Beulich
  2012-05-23 11:00                         ` Juergen Gross
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2012-05-23  9:39 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Juergen Gross, Konrad Rzeszutek Wilk, keir, xen-devel

>>> On 22.05.12 at 23:00, Ben Guthro <ben@guthro.net> wrote:
> I've bisected this to the following commit in the xen-unstable git tree.
> 
> I'll be able to dive in a little deeper tomorrow.
> If you see anything here that looks suspicious to the crash
> referenced... let me know.

As the change was really a re-write of a submission by Jürgen,
I'm adding him to Cc.

Unless he has an immediate idea, we definitely want to
understand why "cpus" is empty - hence we'd want to see
*online, *vc->cpu_affinity, vc->cpu_id, and maybe
vc->processor. (Printing them is probably not a good idea
here, so I'd instead suggest just copying them to [additional]
on-stack variables, making sure the compiler doesn't optimize
them away.)
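
The on-stack copies suggested above might look roughly like the following
sketch. It is not a patch against sched_credit.c: struct toy_vcpu, the
64-bit masks and the function name are hypothetical stand-ins, and it relies
on the assumption that the compiler keeps volatile locals in stack slots
(which GCC does in practice), so that the values can be found in the stack
dump printed with the assertion failure.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields of interest. */
struct toy_vcpu {
    unsigned int cpu_id;
    unsigned int processor;
    uint64_t     cpu_affinity;   /* bit n set: vCPU may run on CPU n */
};

/*
 * Computes the intersection the scheduler would work with, keeping
 * copies of every input in volatile locals. Stores to volatile objects
 * may not be elided, so the copies survive optimization.
 */
static uint64_t toy_intersect_with_debug_copies(const struct toy_vcpu *vc,
                                                uint64_t online)
{
    volatile uint64_t     dbg_online    = online;
    volatile uint64_t     dbg_affinity  = vc->cpu_affinity;
    volatile unsigned int dbg_cpu_id    = vc->cpu_id;
    volatile unsigned int dbg_processor = vc->processor;

    uint64_t cpus = online & vc->cpu_affinity;

    /* Read the copies once so no "unused variable" warning is raised. */
    (void)dbg_online;
    (void)dbg_affinity;
    (void)dbg_cpu_id;
    (void)dbg_processor;

    return cpus;
}
```

With the xen-syms binary at hand, the stack offsets of the dbg_* variables
can then be looked up and matched against the dumped stack words.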

Probably it would be good to also know what each active
vCPU's ->cpu_affinity was right before suspend and/or right
after resume (perhaps in freeze_domains() and/or
thaw_domains()). That way we'd at least know whether the
affinity - despite the offending changeset's inverse intention -
did get changed during the resume process.

Jan




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-23  9:39                       ` Jan Beulich
@ 2012-05-23 11:00                         ` Juergen Gross
  2012-05-25 13:20                           ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Juergen Gross @ 2012-05-23 11:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, Ben Guthro, Konrad Rzeszutek Wilk

On 05/23/2012 11:39 AM, Jan Beulich wrote:
>>>> On 22.05.12 at 23:00, Ben Guthro<ben@guthro.net>  wrote:
>> I've bisected this to the following commit in the xen-unstable git tree.
>>
>> I'll be able to dive in a little deeper tomorrow.
>> If you see anything here that looks suspicious to the crash
>> referenced... let me know.
> As the change was really a re-write of a submission by Jürgen,
> I'm adding him to Cc.
>
> Unless he has an immediate idea, we definitely want to
> understand why "cpus" is empty - hence we'd want to see
> *online, *vc->cpu_affinity, vc->cpu_id, and maybe
> vc->processor. (Printing them is probably not a good idea
> here, so I'd instead suggest just copying them to [additional]
> on-stack variables, making sure the compiler doesn't optimize
> them away.)
>
> Probably it would be good to also know what each active
> vCPU's ->cpu_affinity was right before suspend and/or right
> after resume (perhaps in freeze_domains() and/or
> thaw_domains(). That way we'd at least know whether the
> affinity - despite the offending changeset's inverse intention -
> did get changed during the resume process.

No idea, sorry.
I tested the patch only against a problem with power_off, so I never hit the
resume path.


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html



* Re: dump with xen-unstable & linux 3.2.17
  2012-05-23 11:00                         ` Juergen Gross
@ 2012-05-25 13:20                           ` Ben Guthro
  2012-05-31 15:52                             ` Ben Guthro
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-25 13:20 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, keir, Jan Beulich, Konrad Rzeszutek Wilk

It seems to be related to the hunk in xen/common/schedule.c

If I remove the part below - I get further in the resume process, in
that the machine seems to wake up, but is not responsive.
Eventually - the watchdog fires, and reboots the machine.


Any thoughts?

/btg

Changed parts:

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 0854f55..95cb2b4 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -543,7 +543,7 @@ int cpu_disable_scheduler(unsigned int cpu)
     int    ret = 0;

     c = per_cpu(cpupool, cpu);
-    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
+    if ( (c == NULL) )
         return ret;

     for_each_domain_in_cpupool ( d, c )






On Wed, May 23, 2012 at 7:00 AM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> On 05/23/2012 11:39 AM, Jan Beulich wrote:
>>>>>
>>>>> On 22.05.12 at 23:00, Ben Guthro<ben@guthro.net>  wrote:
>>>
>>> I've bisected this to the following commit in the xen-unstable git tree.
>>>
>>> I'll be able to dive in a little deeper tomorrow.
>>> If you see anything here that looks suspicious to the crash
>>> referenced... let me know.
>>
>> As the change was really a re-write of a submission by Jürgen,
>> I'm adding him to Cc.
>>
>> Unless he has an immediate idea, we definitely want to
>> understand why "cpus" is empty - hence we'd want to see
>> *online, *vc->cpu_affinity, vc->cpu_id, and maybe
>> vc->processor. (Printing them is probably not a good idea
>> here, so I'd instead suggest just copying them to [additional]
>> on-stack variables, making sure the compiler doesn't optimize
>> them away.)
>>
>> Probably it would be good to also know what each active
>> vCPU's ->cpu_affinity was right before suspend and/or right
>> after resume (perhaps in freeze_domains() and/or
>> thaw_domains()). That way we'd at least know whether the
>> affinity - despite the offending changeset's inverse intention -
>> did get changed during the resume process.
>
>
> No idea, sorry.
> I tested the patch only against a problem with power_off, so I never hit the
> resume path.
>
>
> Juergen
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
> Fujitsu Technology Solutions              e-mail:
> juergen.gross@ts.fujitsu.com
> Domagkstr. 28                           Internet: ts.fujitsu.com
> D-80807 Muenchen                 Company details:
> ts.fujitsu.com/imprint.html
>
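
Jan's quoted suggestion above, copying the interesting values into on-stack variables that the compiler cannot optimize away, might look roughly like the sketch below. Everything here is a hypothetical model: struct vcpu_model, debug_pick_mask, and the use of plain 64-bit masks are assumptions for illustration, and deriving "cpus" as the intersection of the online mask and the vCPU's affinity is inferred from the discussion, not copied from sched_credit.c.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the vCPU fields named in the thread. */
struct vcpu_model {
    unsigned int cpu_id;
    unsigned int processor;
    uint64_t     cpu_affinity;   /* one bit per physical CPU (assumption) */
};

/*
 * Computes the candidate mask and keeps copies of its inputs in
 * volatile locals. Stores to volatile objects must be emitted by the
 * compiler, so the values stay in the stack frame and can be read
 * from a stack dump if an assertion fires afterwards.
 */
uint64_t debug_pick_mask(const struct vcpu_model *vc, uint64_t online)
{
    volatile uint64_t     dbg_online    = online;
    volatile uint64_t     dbg_affinity  = vc->cpu_affinity;
    volatile unsigned int dbg_cpu_id    = vc->cpu_id;
    volatile unsigned int dbg_processor = vc->processor;

    /* Assumed derivation of "cpus": online CPUs allowed by affinity. */
    uint64_t cpus = online & vc->cpu_affinity;

    /* Reference the copies so no unused-variable warning is raised. */
    (void)dbg_online;
    (void)dbg_affinity;
    (void)dbg_cpu_id;
    (void)dbg_processor;

    return cpus;
}
```

An empty result (0) corresponds to the "cpus is empty" condition under discussion; the volatile copies show which input made it so.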


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-25 13:20                           ` Ben Guthro
@ 2012-05-31 15:52                             ` Ben Guthro
  2012-05-31 16:06                               ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Guthro @ 2012-05-31 15:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Juergen Gross, keir, Jan Beulich, xen-devel

Just to follow up on this - it appears I was running into two issues
with all of this.

1. The changeset mentioned below needed to be reverted, as it was
removing the CPUs at suspend time.

2. The Linux Xen watchdog driver (drivers/watchdog/xen_wdt.c) seems to
be enabling itself on resume, even if you tell it not to.
I worked around this by turning off watchdogs in my kernel
config, because I wasn't using them anyhow.

After making these 2 changes - S3 works again.



On Fri, May 25, 2012 at 9:20 AM, Ben Guthro <ben@guthro.net> wrote:
> It seems to be related to the hunk in xen/common/schedule.c
>
> If I remove the part below - I get further in the resume process, in
> that the machine seems to wake up, but not be responsive.
> Eventually - the watchdog fires, and reboots the machine.
>
>
> Any thoughts?
>
> /btg
>
> Changed parts:
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 0854f55..95cb2b4 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -543,7 +543,7 @@ int cpu_disable_scheduler(unsigned int cpu)
>     int    ret = 0;
>
>     c = per_cpu(cpupool, cpu);
> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
> +    if ( (c == NULL) )
>         return ret;
>
>     for_each_domain_in_cpupool ( d, c )
>
>
>
>
>
>
> On Wed, May 23, 2012 at 7:00 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com> wrote:
>> On 05/23/2012 11:39 AM, Jan Beulich wrote:
>>>>>>
>>>>>> On 22.05.12 at 23:00, Ben Guthro<ben@guthro.net>  wrote:
>>>>
>>>> I've bisected this to the following commit in the xen-unstable git tree.
>>>>
>>>> I'll be able to dive in a little deeper tomorrow.
>>>> If you see anything here that looks suspicious to the crash
>>>> referenced... let me know.
>>>
>>> As the change was really a re-write of a submission by Jürgen,
>>> I'm adding him to Cc.
>>>
>>> Unless he has an immediate idea, we definitely want to
>>> understand why "cpus" is empty - hence we'd want to see
>>> *online, *vc->cpu_affinity, vc->cpu_id, and maybe
>>> vc->processor. (Printing them is probably not a good idea
>>> here, so I'd instead suggest just copying them to [additional]
>>> on-stack variables, making sure the compiler doesn't optimize
>>> them away.)
>>>
>>> Probably it would be good to also know what each active
>>> vCPU's ->cpu_affinity was right before suspend and/or right
>>> after resume (perhaps in freeze_domains() and/or
>>> thaw_domains()). That way we'd at least know whether the
>>> affinity - despite the offending changeset's inverse intention -
>>> did get changed during the resume process.
>>
>>
>> No idea, sorry.
>> I tested the patch only against a problem with power_off, so I never hit the
>> resume path.
>>
>>
>> Juergen
>>
>> --
>> Juergen Gross                 Principal Developer Operating Systems
>> PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
>> Fujitsu Technology Solutions              e-mail:
>> juergen.gross@ts.fujitsu.com
>> Domagkstr. 28                           Internet: ts.fujitsu.com
>> D-80807 Muenchen                 Company details:
>> ts.fujitsu.com/imprint.html
>>


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-31 15:52                             ` Ben Guthro
@ 2012-05-31 16:06                               ` Jan Beulich
  2012-05-31 16:18                                 ` Ben Guthro
  2012-06-01  6:54                                 ` Dietmar Hahn
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2012-05-31 16:06 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Juergen Gross, Konrad Rzeszutek Wilk, keir, xen-devel

>>> On 31.05.12 at 17:52, Ben Guthro <ben@guthro.net> wrote:
> 1. The changeset mentioned below needed to be reverted, as it was
> removing the CPUs at suspend time.

I assume you refer to the one line change, not the full c/s?
Juergen would have to tell us whether reverting that would
break something else.

> 2. The linux xen watchdog driver (drivers/watchdog/xen_wdt.c) seems to
> be enabling itself on resume, even if you tell it not to.
> I worked around this by just turning off watchdogs in my kernel
> config...because I wasn't using them anyhow.

That was a problem up to 3.3, but was fixed in 3.4 afaict. What
kernel version did you see this with?

Jan

> On Fri, May 25, 2012 at 9:20 AM, Ben Guthro <ben@guthro.net> wrote:
>> Changed parts:
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 0854f55..95cb2b4 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -543,7 +543,7 @@ int cpu_disable_scheduler(unsigned int cpu)
>>     int    ret = 0;
>>
>>     c = per_cpu(cpupool, cpu);
>> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
>> +    if ( (c == NULL) )
>>         return ret;
>>
>>     for_each_domain_in_cpupool ( d, c )


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-31 16:06                               ` Jan Beulich
@ 2012-05-31 16:18                                 ` Ben Guthro
  2012-06-01  6:54                                 ` Dietmar Hahn
  1 sibling, 0 replies; 19+ messages in thread
From: Ben Guthro @ 2012-05-31 16:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Juergen Gross, Konrad Rzeszutek Wilk, keir, xen-devel

On Thu, May 31, 2012 at 12:06 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 31.05.12 at 17:52, Ben Guthro <ben@guthro.net> wrote:
>> 1. The changeset mentioned below needed to be reverted, as it was
>> removing the CPUs at suspend time.
>
> I assume you refer to the one line change, not the full c/s?
> Juergen would have to tell us whether reverting that would
> break something else.

I reverted the whole c/s - but I think the one line change would be sufficient.

>
>> 2. The linux xen watchdog driver (drivers/watchdog/xen_wdt.c) seems to
>> be enabling itself on resume, even if you tell it not to.
>> I worked around this by just turning off watchdogs in my kernel
>> config...because I wasn't using them anyhow.
>
> That was a problem up to 3.3, but was fixed in 3.4 afaict. What
> kernel version did you see this with?
>

3.2.17 plus some of Konrad's branches,
so that makes sense.


> Jan
>
>> On Fri, May 25, 2012 at 9:20 AM, Ben Guthro <ben@guthro.net> wrote:
>>> Changed parts:
>>>
>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>>> index 0854f55..95cb2b4 100644
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -543,7 +543,7 @@ int cpu_disable_scheduler(unsigned int cpu)
>>>     int    ret = 0;
>>>
>>>     c = per_cpu(cpupool, cpu);
>>> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
>>> +    if ( (c == NULL) )
>>>         return ret;
>>>
>>>     for_each_domain_in_cpupool ( d, c )
>
>


* Re: dump with xen-unstable & linux 3.2.17
  2012-05-31 16:06                               ` Jan Beulich
  2012-05-31 16:18                                 ` Ben Guthro
@ 2012-06-01  6:54                                 ` Dietmar Hahn
  1 sibling, 0 replies; 19+ messages in thread
From: Dietmar Hahn @ 2012-06-01  6:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, keir, Ben Guthro, Jan Beulich, Konrad Rzeszutek Wilk

On Thursday, 31 May 2012, 17:06:39, Jan Beulich wrote:
> >>> On 31.05.12 at 17:52, Ben Guthro <ben@guthro.net> wrote:
> > 1. The changeset mentioned below needed to be reverted, as it was
> > removing the CPUs at suspend time.
> 
> I assume you refer to the one line change, not the full c/s?
> Juergen would have to tell us whether reverting that would
> break something else.

This may take some time, as Juergen is away for a week.

Dietmar.

> 
> > 2. The linux xen watchdog driver (drivers/watchdog/xen_wdt.c) seems to
> > be enabling itself on resume, even if you tell it not to.
> > I worked around this by just turning off watchdogs in my kernel
> > config...because I wasn't using them anyhow.
> 
> That was a problem up to 3.3, but was fixed in 3.4 afaict. What
> kernel version did you see this with?
> 
> Jan
> 
> > On Fri, May 25, 2012 at 9:20 AM, Ben Guthro <ben@guthro.net> wrote:
> >> Changed parts:
> >>
> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> >> index 0854f55..95cb2b4 100644
> >> --- a/xen/common/schedule.c
> >> +++ b/xen/common/schedule.c
> >> @@ -543,7 +543,7 @@ int cpu_disable_scheduler(unsigned int cpu)
> >>     int    ret = 0;
> >>
> >>     c = per_cpu(cpupool, cpu);
> >> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
> >> +    if ( (c == NULL) )
> >>         return ret;
> >>
> >>     for_each_domain_in_cpupool ( d, c )

-- 
Company details: http://ts.fujitsu.com/imprint.html

