* Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
@ 2015-02-23  9:27 Sander Eikelenboom
  2015-02-23 10:06 ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Sander Eikelenboom @ 2015-02-23  9:27 UTC (permalink / raw)
  To: xen-devel

Hi,

While shutting down all guests ahead of a host reboot, I encountered the splat below.
This was running on Xen with:
xen_changeset: Fri Feb 20 16:21:10 2015 +0100 git:24b2b8d-dirty

--
Sander

(XEN) [2015-02-23 09:16:26.292] Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
(XEN) [2015-02-23 09:16:26.292] ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [2015-02-23 09:16:26.292] CPU:    1
(XEN) [2015-02-23 09:16:26.292] RIP:    e008:[<ffff82d08012c018>] cpu_raise_softirq+0xd7/0xeb
(XEN) [2015-02-23 09:16:26.292] RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) [2015-02-23 09:16:26.292] rax: ffff82d080328e60   rbx: 0000000000000005   rcx: ffff82d0802fff80
(XEN) [2015-02-23 09:16:26.292] rdx: ffff83054eb680e0   rsi: 0000000000000007   rdi: 0000000000000000
(XEN) [2015-02-23 09:16:26.292] rbp: ffff83054eb67328   rsp: ffff83054eb67308   r8:  0000000000000001
(XEN) [2015-02-23 09:16:26.292] r9:  ffff83054eb1a240   r10: 0000000000000000   r11: 0000000000000000
(XEN) [2015-02-23 09:16:26.292] r12: 0000000000000007   r13: 0000000000000001   r14: 000000000000000e
(XEN) [2015-02-23 09:16:26.292] r15: 00000000000008f8   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) [2015-02-23 09:16:26.292] cr3: 0000000476850000   cr2: 00007ffd07e91f20
(XEN) [2015-02-23 09:16:26.292] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [2015-02-23 09:16:26.292] Xen stack trace from rsp=ffff83054eb67308:
(XEN) [2015-02-23 09:16:26.292]    000000000000000e ffff83009fd4b000 0000000000000001 ffff83009fd4b000
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb67348 ffff82d080167c90 0000000000000000 ffff83009fd4b418
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb67378 ffff82d0801cf411 ffff83054eb673a8 ffff83009fd4b000
(XEN) [2015-02-23 09:16:26.292]    ffff83009fd41418 0000000000000000 ffff83054eb67398 ffff82d0801cf51f
(XEN) [2015-02-23 09:16:26.292]    ffff83009fd4b000 ffff83009fd41418 ffff83054eb673e8 ffff82d0801cfd50
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb67418 00000001011ef7e2 ffff830300000000 ffff83009fd41000
(XEN) [2015-02-23 09:16:26.292]    0000000000000300 00000000000008f8 00000000000008f8 ffff83009fd41000
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb67448 ffff82d0801d0657 ffff83054eb67458 ffff82d0801f5d11
(XEN) [2015-02-23 09:16:26.292]    000000004eb67594 0000000000000000 0000000000000000 0000000000000000
(XEN) [2015-02-23 09:16:26.292]    ffff8303d7bd6000 0000000000000300 0000000000000004 ffff83009fd41000
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb674a8 ffff82d0801d0b58 ffff83054eb674a8 ffff82d080178a6c
(XEN) [2015-02-23 09:16:26.292]    0000000002211067 ffff83054eb674e4 80000000fee0017b ffff82d080289a68
(XEN) [2015-02-23 09:16:26.292]    ffff82d0801cf175 ffff82d0801ceb87 ffff83054eb67568 ffff83009fd41000
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb67518 ffff82d0801c6747 ffff830500000000 ffff82d080289fa0
(XEN) [2015-02-23 09:16:26.292]    ffff82d0801cf175 ffff82d0801d09d7 ffff83054eb674e8 0000000080227575
(XEN) [2015-02-23 09:16:26.292]    ffff83054eb67548 ffff83009fd41000 ffff8305356de000 0000000000000004
(XEN) [2015-02-23 09:16:26.292]    ffff82e008f4e3c0 0000000000000000 ffff83054eb675b8 ffff82d0801b7023
(XEN) [2015-02-23 09:16:26.292]    0000000000000004 0000000000000300 ffff83054eb67618 000000014eb67618
(XEN) [2015-02-23 09:16:26.292]    00000000fee00300 010082d000000000 ffff83054eb67578 0000000000000004
(XEN) [2015-02-23 09:16:26.292]    00000000fee00300 00000000000008f8 0000000400000001 0100000000000000
(XEN) [2015-02-23 09:16:26.292] Xen call trace:
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d08012c018>] cpu_raise_softirq+0xd7/0xeb
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d080167c90>] vcpu_kick+0x65/0x6f
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801cf411>] vlapic_set_irq+0xb6/0xc4
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801cf51f>] vlapic_accept_irq+0x91/0x1ca
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801cfd50>] vlapic_ipi+0x28b/0x2ae
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801d0657>] vlapic_reg_write+0x215/0x595
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801d0b58>] vlapic_write+0x181/0x1f7
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801c6747>] hvm_mmio_intercept+0x14d/0x36a
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801b7023>] hvmemul_do_io+0x440/0x66b
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801b727b>] hvmemul_do_mmio+0x2d/0x2f
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801b88de>] hvmemul_write+0x1d8/0x24c
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801a1c9f>] x86_emulate+0xc97b/0x1010c
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801b76da>] _hvm_emulate_one+0x197/0x2bb
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801b78bf>] hvm_emulate_one+0x10/0x12
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801c6eb4>] handle_mmio+0x54/0xd4
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801c6f78>] handle_mmio_with_translation+0x44/0x46
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801c5264>] hvm_hap_nested_page_fault+0x163/0x541
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801db3f7>] svm_vmexit_handler+0x16bf/0x19bc
(XEN) [2015-02-23 09:16:26.292]    [<ffff82d0801dd352>] svm_stgi_label+0x8/0x46
(XEN) [2015-02-23 09:16:26.292] 
(XEN) [2015-02-23 09:16:27.613] 
(XEN) [2015-02-23 09:16:27.622] ****************************************
(XEN) [2015-02-23 09:16:27.642] Panic on CPU 1:
(XEN) [2015-02-23 09:16:27.654] Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
(XEN) [2015-02-23 09:16:27.687] ****************************************
(XEN) [2015-02-23 09:16:27.706] 
(XEN) [2015-02-23 09:16:27.715] Manual reset required ('noreboot' specified)

* Re: Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
  2015-02-23  9:27 Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97 Sander Eikelenboom
@ 2015-02-23 10:06 ` Jan Beulich
  2015-02-23 10:38   ` Andrew Cooper
  2015-02-23 10:45   ` Sander Eikelenboom
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2015-02-23 10:06 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

>>> On 23.02.15 at 10:27, <linux@eikelenboom.it> wrote:
> While shutting down all guests to go for a host reboot i encountered the 
> splat below.
> This was running on Xen with:
> xen_changeset: Fri Feb 20 16:21:10 2015 +0100 git:24b2b8d-dirty

"-dirty" meaning what?

> (XEN) [2015-02-23 09:16:26.292] Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97

Since with debug=y the callstack entries should be reliable, I can't
see how this matches up with ...

> (XEN) [2015-02-23 09:16:26.292] Xen call trace:
> (XEN) [2015-02-23 09:16:26.292]    [<ffff82d08012c018>] cpu_raise_softirq+0xd7/0xeb

... this, since

void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
{
    unsigned int this_cpu = smp_processor_id();

    if ( test_and_set_bit(nr, &softirq_pending(cpu))
         || (cpu == this_cpu)
         || arch_skip_send_event_check(cpu) )
        return;

    if ( !per_cpu(batching, this_cpu) || in_irq() )
        smp_send_event_check_cpu(cpu);
    else
        set_bit(nr, &per_cpu(batch_mask, this_cpu));
}

doesn't indicate any use of cpumask functions. If, however,
arch_skip_send_event_check()'s call to cpumask_test_cpu()
didn't get inlined, that might be the cause. Albeit that would mean
smp_processor_id() returned an out-of-range value... In any
event, we'll need to know exactly which point inside the function
the above code location refers to.

Jan

* Re: Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
  2015-02-23 10:06 ` Jan Beulich
@ 2015-02-23 10:38   ` Andrew Cooper
  2015-02-23 10:56     ` Jan Beulich
  2015-02-23 10:45   ` Sander Eikelenboom
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2015-02-23 10:38 UTC (permalink / raw)
  To: Jan Beulich, Sander Eikelenboom; +Cc: xen-devel

On 23/02/15 10:06, Jan Beulich wrote:
>>>> On 23.02.15 at 10:27, <linux@eikelenboom.it> wrote:
>> While shutting down all guests to go for a host reboot i encountered the 
>> splat below.
>> This was running on Xen with:
>> xen_changeset: Fri Feb 20 16:21:10 2015 +0100 git:24b2b8d-dirty
> "-dirty" meaning what?
>
>> (XEN) [2015-02-23 09:16:26.292] Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
> Since with debug=y the callstack entries should be reliable, I can't
> see how this matches up with ...
>
>> (XEN) [2015-02-23 09:16:26.292] Xen call trace:
>> (XEN) [2015-02-23 09:16:26.292]    [<ffff82d08012c018>] cpu_raise_softirq+0xd7/0xeb
> ... this, since
>
> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
> {
>     unsigned int this_cpu = smp_processor_id();
>
>     if ( test_and_set_bit(nr, &softirq_pending(cpu))
>          || (cpu == this_cpu)
>          || arch_skip_send_event_check(cpu) )
>         return;
>
>     if ( !per_cpu(batching, this_cpu) || in_irq() )
>         smp_send_event_check_cpu(cpu);
>     else
>         set_bit(nr, &per_cpu(batch_mask, this_cpu));
> }
>
> doesn't indicate any use of cpumask functions. If, however,
> arch_skip_send_event_check()'s call to cpumask_test_cpu()
> didn't get inlined, that might be the cause. Albeit that would mean
> smp_processor_id() returned an out-of-range value... In any
> event we'll need to know what exactly above code location refers
> to inside the entire function.

Are you sure your code is up to date?

Current staging has

void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
{
    unsigned int this_cpu = smp_processor_id();

    if ( test_and_set_bit(nr, &softirq_pending(cpu))
         || (cpu == this_cpu)
         || arch_skip_send_event_check(cpu) )
        return;

    if ( !per_cpu(batching, this_cpu) || in_irq() )
        smp_send_event_check_cpu(cpu);
    else
        __cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
}


Furthermore, the final __cpumask_set_cpu(...) call looks wrong: the
first parameter should be 'cpu' rather than 'nr'.  I am not surprised
that the ASSERT() is firing.
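
To illustrate why an index that is not a CPU number can trip a
'cpu < nr_cpu_ids' check, here is a minimal sketch. It is a toy model,
not Xen's code: toy_cpu_in_range(), toy_cpumask_set_cpu(), the
single-word mask and the example values are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when 'idx' would pass a
 * 'cpu < nr_cpu_ids' style range check. */
static bool toy_cpu_in_range(unsigned int idx, unsigned int nr_cpu_ids)
{
    return idx < nr_cpu_ids;
}

/* Hypothetical range-checked setter: it asserts before touching the
 * mask, the way a debug build would. The mask is a single word here,
 * so this toy assumes nr_cpu_ids <= 32. */
static void toy_cpumask_set_cpu(unsigned int cpu, unsigned long *mask,
                                unsigned int nr_cpu_ids)
{
    assert(toy_cpu_in_range(cpu, nr_cpu_ids));
    *mask |= 1UL << cpu;
}
```

With a made-up nr_cpu_ids of 4, toy_cpumask_set_cpu(1, &mask, 4) sets
bit 1, while passing an unrelated index such as 7 would abort on the
assert, whatever that value actually means to the caller.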

~Andrew

* Re: Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
  2015-02-23 10:06 ` Jan Beulich
  2015-02-23 10:38   ` Andrew Cooper
@ 2015-02-23 10:45   ` Sander Eikelenboom
  2015-02-23 10:57     ` Jan Beulich
  1 sibling, 1 reply; 6+ messages in thread
From: Sander Eikelenboom @ 2015-02-23 10:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


Monday, February 23, 2015, 11:06:25 AM, you wrote:

>>>> On 23.02.15 at 10:27, <linux@eikelenboom.it> wrote:
>> While shutting down all guests to go for a host reboot i encountered the 
>> splat below.
>> This was running on Xen with:
>> xen_changeset: Fri Feb 20 16:21:10 2015 +0100 git:24b2b8d-dirty

> "-dirty" meaning what?
A patch for re-enabling the HPET, which doesn't get enabled due to a BIOS
glitch but actually works fine (and has for over a year or so now).
(If it's not enabled, cpuidle breaks badly.)

diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
index c1b76fb..43435bc 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -608,7 +608,7 @@ int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
     {
         AMD_IOMMU_DEBUG("Failed to setup HPET MSI remapping."
                         " Wrong HPET.\n");
-        return -ENODEV;
+       /* return -ENODEV; */
     }

     lock = get_intremap_lock(hpet_sbdf.seg, hpet_sbdf.bdf);



And the other one is Konrad's temp fix for the dpci softirq problem:

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ae050df..ed3cfa1 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -804,7 +804,19 @@ static void dpci_softirq(void)
         d = pirq_dpci->dom;
         smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
         if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
-            BUG();
+        {
+           unsigned long flags;
+
+            /* Put back on the list and retry. */
+            local_irq_save(flags);
+           list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
+            local_irq_restore(flags);
+
+            raise_softirq(HVM_DPCI_SOFTIRQ);
+            continue;
+       }
+
+
         /*
          * The one who clears STATE_SCHED MUST refcount the domain.
          */


>> (XEN) [2015-02-23 09:16:26.292] Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97

> Since with debug=y the callstack entries should be reliable, I can't
> see how this matches up with ...

>> (XEN) [2015-02-23 09:16:26.292] Xen call trace:
>> (XEN) [2015-02-23 09:16:26.292]    [<ffff82d08012c018>] cpu_raise_softirq+0xd7/0xeb

> ... this, since

> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
> {
>     unsigned int this_cpu = smp_processor_id();

>     if ( test_and_set_bit(nr, &softirq_pending(cpu))
>          || (cpu == this_cpu)
>          || arch_skip_send_event_check(cpu) )
>         return;

>     if ( !per_cpu(batching, this_cpu) || in_irq() )
>         smp_send_event_check_cpu(cpu);
>     else
>         set_bit(nr, &per_cpu(batch_mask, this_cpu));
> }

> doesn't indicate any use of cpumask functions. If, however,
> arch_skip_send_event_check()'s call to cpumask_test_cpu()
> didn't get inlined, that might be the cause. Albeit that would mean
> smp_processor_id() returned an out-of-range value... In any
> event we'll need to know what exactly above code location refers
> to inside the entire function.

Any instructions on how to figure that out?

--
Sander
> Jan

* Re: Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
  2015-02-23 10:38   ` Andrew Cooper
@ 2015-02-23 10:56     ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2015-02-23 10:56 UTC (permalink / raw)
  To: Andrew Cooper, Sander Eikelenboom; +Cc: xen-devel

>>> On 23.02.15 at 11:38, <andrew.cooper3@citrix.com> wrote:
> On 23/02/15 10:06, Jan Beulich wrote:
>>>>> On 23.02.15 at 10:27, <linux@eikelenboom.it> wrote:
>>> While shutting down all guests to go for a host reboot i encountered the 
>>> splat below.
>>> This was running on Xen with:
>>> xen_changeset: Fri Feb 20 16:21:10 2015 +0100 git:24b2b8d-dirty
>> "-dirty" meaning what?
>>
>>> (XEN) [2015-02-23 09:16:26.292] Assertion 'cpu < nr_cpu_ids' failed at 
> .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
>> Since with debug=y the callstack entries should be reliable, I can't
>> see how this matches up with ...
>>
>>> (XEN) [2015-02-23 09:16:26.292] Xen call trace:
>>> (XEN) [2015-02-23 09:16:26.292]    [<ffff82d08012c018>] 
> cpu_raise_softirq+0xd7/0xeb
>> ... this, since
>>
>> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
>> {
>>     unsigned int this_cpu = smp_processor_id();
>>
>>     if ( test_and_set_bit(nr, &softirq_pending(cpu))
>>          || (cpu == this_cpu)
>>          || arch_skip_send_event_check(cpu) )
>>         return;
>>
>>     if ( !per_cpu(batching, this_cpu) || in_irq() )
>>         smp_send_event_check_cpu(cpu);
>>     else
>>         set_bit(nr, &per_cpu(batch_mask, this_cpu));
>> }
>>
>> doesn't indicate any use of cpumask functions. If, however,
>> arch_skip_send_event_check()'s call to cpumask_test_cpu()
>> didn't get inlined, that might be the cause. Albeit that would mean
>> smp_processor_id() returned an out-of-range value... In any
>> event we'll need to know what exactly above code location refers
>> to inside the entire function.
> 
> Are you sure your code is up to date?
> 
> Current staging has

Ah, I looked at master, not staging.

> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
> {
>     unsigned int this_cpu = smp_processor_id();
> 
>     if ( test_and_set_bit(nr, &softirq_pending(cpu))
>          || (cpu == this_cpu)
>          || arch_skip_send_event_check(cpu) )
>         return;
> 
>     if ( !per_cpu(batching, this_cpu) || in_irq() )
>         smp_send_event_check_cpu(cpu);
>     else
>         __cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
> }
> 
> 
> And furthermore, I think the final __cpumask_set_cpu(...) appears
> wrong.  The first parameter should be 'cpu' rather than 'nr'.  I am not
> surprised that the ASSERT() is firing.

No, the conversion to __cpumask_set_cpu() was wrong here in
the first place - this ought to be __set_bit(). Will submit a fix in a
minute.

Jan

* Re: Assertion 'cpu < nr_cpu_ids' failed at .../src/new/xen-unstable/xen/include/xen/cpumask.h:97
  2015-02-23 10:45   ` Sander Eikelenboom
@ 2015-02-23 10:57     ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2015-02-23 10:57 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

>>> On 23.02.15 at 11:45, <linux@eikelenboom.it> wrote:
> Any instructions on how to figure that out ?

No need anymore - with Andrew's help it's now already clear what's
wrong.

Jan
