xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: Roger Pau Monne <roger.pau@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>, WeiLiu <wl@xen.org>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH] x86/SMP: don't try to stop already stopped CPUs
Date: Mon, 03 Jun 2019 09:40:19 -0600	[thread overview]
Message-ID: <5CF53F630200007800234B85@prv1-mh.provo.novell.com> (raw)
In-Reply-To: <20190603141254.neovzuf2rdxywhss@Air-de-Roger>

>>> On 03.06.19 at 16:12, <roger.pau@citrix.com> wrote:
> On Wed, May 29, 2019 at 04:17:49AM -0600, Jan Beulich wrote:
>> In particular with an enabled IOMMU (but not really limited to this
>> case), trying to invoke fixup_irqs() after having already done
>> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
>> 
>>  RIP:    e008:[<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113
>>  RFLAGS: 0000000000010006   CONTEXT: hypervisor (d0v0)
>>  rax: ffff8320291de00c   rbx: 0000000000000003   rcx: ffff832035000000
>>  rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82d0805ca840
>>  rbp: ffff83009e8a79c8   rsp: ffff83009e8a79a8   r8:  0000000000000000
>>  r9:  0000000000000004   r10: 000000000008b9f9   r11: 0000000000000006
>>  r12: 0000000000010000   r13: 0000000000000003   r14: 0000000000000000
>>  r15: 00000000fffeffff   cr0: 0000000080050033   cr4: 00000000003406e0
>>  cr3: 0000002035d59000   cr2: ffff88824ccb4ee0
>>  fsb: 00007f2143f08840   gsb: ffff888256a00000   gss: 0000000000000000
>>  ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>  Xen code around <ffff82d08026a036> 
> (amd_iommu_read_ioapic_from_ire+0xde/0x113):
>>   ff 07 00 00 39 d3 74 02 <0f> 0b 41 81 e4 00 f8 ff ff 8b 10 89 d0 25 00 00
>>  Xen stack trace from rsp=ffff83009e8a79a8:
>>  ...
>>  Xen call trace:
>>     [<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113
>>     [<ffff82d08026bf7b>] iommu_read_apic_from_ire+0x10/0x12
>>     [<ffff82d08027f718>] io_apic.c#modify_IO_APIC_irq+0x5e/0x126
>>     [<ffff82d08027f9c5>] io_apic.c#unmask_IO_APIC_irq+0x2d/0x41
>>     [<ffff82d080289bc7>] fixup_irqs+0x320/0x40b
>>     [<ffff82d0802a82c4>] smp_send_stop+0x4b/0xa8
>>     [<ffff82d0802a7b2f>] machine_restart+0x98/0x288
>>     [<ffff82d080252242>] console_suspend+0/0x28
>>     [<ffff82d0802b01da>] do_general_protection+0x204/0x24e
>>     [<ffff82d080385a3d>] x86_64/entry.S#handle_exception_saved+0x68/0x94
>>     [<00000000aa5b526b>] 00000000aa5b526b
>>     [<ffff82d0802a7c7d>] machine_restart+0x1e6/0x288
>>     [<ffff82d080240f75>] hwdom_shutdown+0xa2/0x11d
>>     [<ffff82d08020baa2>] domain_shutdown+0x4f/0xd8
>>     [<ffff82d08023fe98>] do_sched_op+0x12f/0x42a
>>     [<ffff82d08037e404>] pv_hypercall+0x1e4/0x564
>>     [<ffff82d080385432>] lstar_enter+0x112/0x120
>> 
>> Don't call fixup_irqs() and don't send any IPI if there's only one
>> online CPU anyway, and don't call __stop_this_cpu() at all when the CPU
>> we're on was already marked offline (by a prior invocation of
>> __stop_this_cpu()).
>> 
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> 
>> --- a/xen/arch/x86/smp.c
>> +++ b/xen/arch/x86/smp.c
>> @@ -302,23 +302,31 @@ static void stop_this_cpu(void *dummy)
>>   */
>>  void smp_send_stop(void)
>>  {
>> -    int timeout = 10;
>> +    unsigned int cpu = smp_processor_id();
>>  
>> -    local_irq_disable();
>> -    fixup_irqs(cpumask_of(smp_processor_id()), 0);
>> -    local_irq_enable();
>> -
>> -    smp_call_function(stop_this_cpu, NULL, 0);
>> -
>> -    /* Wait 10ms for all other CPUs to go offline. */
>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> -        mdelay(1);
>> -
>> -    local_irq_disable();
>> -    disable_IO_APIC();
>> -    hpet_disable();
>> -    __stop_this_cpu();
>> -    local_irq_enable();
>> +    if ( num_online_cpus() > 1 )
>> +    {
>> +        int timeout = 10;
>> +
>> +        local_irq_disable();
>> +        fixup_irqs(cpumask_of(cpu), 0);
>> +        local_irq_enable();
>> +
>> +        smp_call_function(stop_this_cpu, NULL, 0);
>> +
>> +        /* Wait 10ms for all other CPUs to go offline. */
>> +        while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> +            mdelay(1);
>> +    }
>> +
>> +    if ( cpu_online(cpu) )
> 
> Won't this be better placed inside the previous if? Is it valid to
> have a single CPU and try to offline it?

No to the first question, and I'm not sure I see how you came to
the 2nd one. If there's just a single online CPU, then there's no
need to fixup_irqs(), and there's no-one to IPI. Yet the local CPU
still needs to do everything that should happen once in UP mode,
unless this CPU has been offlined already before (as was the
case in Andrew's report, at least as far as I was able to deduce).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

WARNING: multiple messages have this Message-ID (diff)
From: "Jan Beulich" <JBeulich@suse.com>
To: "Roger Pau Monne" <roger.pau@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>, WeiLiu <wl@xen.org>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [Xen-devel] [PATCH] x86/SMP: don't try to stop already stopped CPUs
Date: Mon, 03 Jun 2019 09:40:19 -0600	[thread overview]
Message-ID: <5CF53F630200007800234B85@prv1-mh.provo.novell.com> (raw)
Message-ID: <20190603154019.Y0Jwa1X-N9X73W0LNsmJlP16PO94Rd_rlBr91oQQgog@z> (raw)
In-Reply-To: <20190603141254.neovzuf2rdxywhss@Air-de-Roger>

>>> On 03.06.19 at 16:12, <roger.pau@citrix.com> wrote:
> On Wed, May 29, 2019 at 04:17:49AM -0600, Jan Beulich wrote:
>> In particular with an enabled IOMMU (but not really limited to this
>> case), trying to invoke fixup_irqs() after having already done
>> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
>> 
>>  RIP:    e008:[<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113
>>  RFLAGS: 0000000000010006   CONTEXT: hypervisor (d0v0)
>>  rax: ffff8320291de00c   rbx: 0000000000000003   rcx: ffff832035000000
>>  rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82d0805ca840
>>  rbp: ffff83009e8a79c8   rsp: ffff83009e8a79a8   r8:  0000000000000000
>>  r9:  0000000000000004   r10: 000000000008b9f9   r11: 0000000000000006
>>  r12: 0000000000010000   r13: 0000000000000003   r14: 0000000000000000
>>  r15: 00000000fffeffff   cr0: 0000000080050033   cr4: 00000000003406e0
>>  cr3: 0000002035d59000   cr2: ffff88824ccb4ee0
>>  fsb: 00007f2143f08840   gsb: ffff888256a00000   gss: 0000000000000000
>>  ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>  Xen code around <ffff82d08026a036> 
> (amd_iommu_read_ioapic_from_ire+0xde/0x113):
>>   ff 07 00 00 39 d3 74 02 <0f> 0b 41 81 e4 00 f8 ff ff 8b 10 89 d0 25 00 00
>>  Xen stack trace from rsp=ffff83009e8a79a8:
>>  ...
>>  Xen call trace:
>>     [<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113
>>     [<ffff82d08026bf7b>] iommu_read_apic_from_ire+0x10/0x12
>>     [<ffff82d08027f718>] io_apic.c#modify_IO_APIC_irq+0x5e/0x126
>>     [<ffff82d08027f9c5>] io_apic.c#unmask_IO_APIC_irq+0x2d/0x41
>>     [<ffff82d080289bc7>] fixup_irqs+0x320/0x40b
>>     [<ffff82d0802a82c4>] smp_send_stop+0x4b/0xa8
>>     [<ffff82d0802a7b2f>] machine_restart+0x98/0x288
>>     [<ffff82d080252242>] console_suspend+0/0x28
>>     [<ffff82d0802b01da>] do_general_protection+0x204/0x24e
>>     [<ffff82d080385a3d>] x86_64/entry.S#handle_exception_saved+0x68/0x94
>>     [<00000000aa5b526b>] 00000000aa5b526b
>>     [<ffff82d0802a7c7d>] machine_restart+0x1e6/0x288
>>     [<ffff82d080240f75>] hwdom_shutdown+0xa2/0x11d
>>     [<ffff82d08020baa2>] domain_shutdown+0x4f/0xd8
>>     [<ffff82d08023fe98>] do_sched_op+0x12f/0x42a
>>     [<ffff82d08037e404>] pv_hypercall+0x1e4/0x564
>>     [<ffff82d080385432>] lstar_enter+0x112/0x120
>> 
>> Don't call fixup_irqs() and don't send any IPI if there's only one
>> online CPU anyway, and don't call __stop_this_cpu() at all when the CPU
>> we're on was already marked offline (by a prior invocation of
>> __stop_this_cpu()).
>> 
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> 
>> --- a/xen/arch/x86/smp.c
>> +++ b/xen/arch/x86/smp.c
>> @@ -302,23 +302,31 @@ static void stop_this_cpu(void *dummy)
>>   */
>>  void smp_send_stop(void)
>>  {
>> -    int timeout = 10;
>> +    unsigned int cpu = smp_processor_id();
>>  
>> -    local_irq_disable();
>> -    fixup_irqs(cpumask_of(smp_processor_id()), 0);
>> -    local_irq_enable();
>> -
>> -    smp_call_function(stop_this_cpu, NULL, 0);
>> -
>> -    /* Wait 10ms for all other CPUs to go offline. */
>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> -        mdelay(1);
>> -
>> -    local_irq_disable();
>> -    disable_IO_APIC();
>> -    hpet_disable();
>> -    __stop_this_cpu();
>> -    local_irq_enable();
>> +    if ( num_online_cpus() > 1 )
>> +    {
>> +        int timeout = 10;
>> +
>> +        local_irq_disable();
>> +        fixup_irqs(cpumask_of(cpu), 0);
>> +        local_irq_enable();
>> +
>> +        smp_call_function(stop_this_cpu, NULL, 0);
>> +
>> +        /* Wait 10ms for all other CPUs to go offline. */
>> +        while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> +            mdelay(1);
>> +    }
>> +
>> +    if ( cpu_online(cpu) )
> 
> Won't this be better placed inside the previous if? Is it valid to
> have a single CPU and try to offline it?

No to the first question, and I'm not sure I see how you came to
the 2nd one. If there's just a single online CPU, then there's no
need to fixup_irqs(), and there's no-one to IPI. Yet the local CPU
still needs to do everything that should happen once in UP mode,
unless this CPU has been offlined already before (as was the
case in Andrew's report, at least as far as I was able to deduce).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  parent reply	other threads:[~2019-06-03 15:40 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-29 10:17 [PATCH] x86/SMP: don't try to stop already stopped CPUs Jan Beulich
2019-05-29 10:17 ` [Xen-devel] " Jan Beulich
2019-06-03 14:12 ` Roger Pau Monné
2019-06-03 14:12   ` [Xen-devel] " Roger Pau Monné
2019-06-03 15:40   ` Jan Beulich [this message]
2019-06-03 15:40     ` Jan Beulich
2019-06-03 16:35     ` Roger Pau Monné
2019-06-03 16:35       ` [Xen-devel] " Roger Pau Monné
2019-06-17 17:39 ` Andrew Cooper
2019-06-17 17:55   ` Andrew Cooper
2019-06-17 18:27     ` Andrew Cooper
2019-06-18 10:26       ` Jan Beulich
2019-06-18  9:23   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5CF53F630200007800234B85@prv1-mh.provo.novell.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=roger.pau@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).