All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Gonglei (Arei)" <arei.gonglei@huawei.com>
To: "Kevin O'Connor" <kevin@koconnor.net>,
	"Xulei (Stone)" <stone.xulei@huawei.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: "seabios@seabios.org" <seabios@seabios.org>,
	"Huangweidong \(C\)" <weidong.huang@huawei.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Date: Sat, 19 Dec 2015 01:08:05 +0000	[thread overview]
Message-ID: <33183CC9F5247A488A2544077AF19020B02B79EE@SZXEMA503-MBS.china.huawei.com> (raw)
In-Reply-To: <20151119134039.GA27717@morn.lan>

Hi Kevin & Paolo,

Luckily, I reproduced this problem last night. And I got the below log when SeaBIOS is stuck.

[BTW, the whole SeaBIOS log attached]

[2015-12-18 10:38:10] >>>>>gonglei: enter smp_setup()...
[2015-12-18 10:38:10] >>>>>gonglei: begine to enable local APIC...
[2015-12-18 10:38:10] >>>>>gonglei: finish enable local APIC...
[2015-12-18 10:38:10] >>>gonglei: cmos_smp_count=8
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=5
[2015-12-18 10:38:10] ===: CountCPUs=2, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=7
[2015-12-18 10:38:10] ===: CountCPUs=3, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=1
[2015-12-18 10:38:10] ===: CountCPUs=4, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=2
[2015-12-18 10:38:10] ===: CountCPUs=5, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=4
[2015-12-18 10:38:10] ===: CountCPUs=6, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=3
[2015-12-18 10:38:10] ===: CountCPUs=7, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=6
[2015-12-18 10:38:10] ===: CountCPUs=8, SMPStack=0x6d84
[2015-12-18 10:38:10]  gonglei: finish while   

[pid 31509 is a vcpu thread used 100% cpu overhead]

# cat /proc/31509/stack  
[<ffffffffa05c337c>] vmx_vcpu_run+0x35c/0x580 [kvm_intel]
[<ffffffffa06b0f10>] em_push+0x0/0x20 [kvm]
[<ffffffffa06a30dc>] x86_emulate_instruction+0x20c/0x440 [kvm]
[<ffffffffa05c9224>] handle_exception+0xe4/0x1b58 [kvm_intel]
[<ffffffffa06a24c5>] vcpu_enter_guest+0x565/0x790 [kvm]
[<ffffffffa05bf990>] vmx_get_segment_base+0x0/0xb0 [kvm_intel]
[<ffffffffa06a2888>] __vcpu_run+0x198/0x260 [kvm]
[<ffffffffa06a3508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[<ffffffffa068f92e>] vcpu_load+0x4e/0x80 [kvm]
[<ffffffffa068fcee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[<ffffffff8109618d>] futex_wake+0xfd/0x110
[<ffffffff811ed57c>] security_file_permission+0x1c/0xa0
[<ffffffff8116bedb>] do_vfs_ioctl+0x8b/0x3b0
[<ffffffff8116c2a1>] sys_ioctl+0xa1/0xb0
[<ffffffff81469272>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

And kvm tracing information:

<...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180078: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180078: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180079: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180079: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180079: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180080: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180080: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180080: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180081: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180081: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180081: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180081: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180082: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180083: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180083: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180083: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180084: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180084: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180084: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180084: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180085: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180085: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180085: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180085: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180086: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306

Now, it's very clear that the guest stuck in yiled(), and then kvm encounter the exception #UD.

Do you have any thoughts? Thanks!


The Seabios patch below:

diff --git a/roms/seabios/src/boot.c b/roms/seabios/src/boot.c
index f23e9e1..552914a 100644
--- a/roms/seabios/src/boot.c
+++ b/roms/seabios/src/boot.c
@@ -93,7 +93,7 @@ glob_prefix(const char *glob, const char *str)
 static int
 find_prio(const char *glob)
 {
-    dprintf(1, "Searching bootorder for: %s\n", glob);
+    //dprintf(1, "Searching bootorder for: %s\n", glob);
     int i;
     for (i = 0; i < BootorderCount; i++)
         if (glob_prefix(glob, Bootorder[i]))
diff --git a/roms/seabios/src/fw/smp.c b/roms/seabios/src/fw/smp.c
index a466ea6..46ec607 100644
--- a/roms/seabios/src/fw/smp.c
+++ b/roms/seabios/src/fw/smp.c
@@ -46,12 +46,16 @@ int apic_id_is_present(u8 apic_id)
     return !!(FoundAPICIDs[apic_id/32] & (1ul << (apic_id % 32)));
 }
 
+// Atomic lock for shared stack across processors.
+u32 SMPLock __VISIBLE;
+u32 SMPStack __VISIBLE;
+
 void VISIBLE32FLAT
 handle_smp(void)
 {
     if (!CONFIG_QEMU)
         return;
-
+    dprintf(1, ">>> enter handle_smp...\n");
     // Enable CPU caching
     setcr0(getcr0() & ~(CR0_CD|CR0_NW));
 
@@ -70,19 +74,16 @@ handle_smp(void)
     FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
 
     CountCPUs++;
+    dprintf(1, "===: CountCPUs=%d, SMPStack=0x%x\n", CountCPUs, SMPStack);
 }
 
-// Atomic lock for shared stack across processors.
-u32 SMPLock __VISIBLE;
-u32 SMPStack __VISIBLE;
-
 // find and initialize the CPUs by launching a SIPI to them
 void
 smp_setup(void)
 {
     if (!CONFIG_QEMU)
         return;
-
+    dprintf(1, ">>>>>gonglei: enter smp_setup()...\n");
     ASSERT32FLAT();
     u32 eax, ebx, ecx, cpuid_features;
     cpuid(1, &eax, &ebx, &ecx, &cpuid_features);
@@ -106,7 +107,7 @@ smp_setup(void)
     u64 new = (0xea | ((u64)SEG_BIOS<<24)
                | (((u32)entry_smp - BUILD_BIOS_ADDR) << 8));
     *(u64*)BUILD_AP_BOOT_ADDR = new;
-
+    dprintf(1, ">>>>>gonglei: begine to enable local APIC...\n");
     // enable local APIC
     u32 val = readl(APIC_SVR);
     writel(APIC_SVR, val | APIC_ENABLED);
@@ -125,10 +126,21 @@ smp_setup(void)
     writel(APIC_ICR_LOW, 0x000C4500);
     u32 sipi_vector = BUILD_AP_BOOT_ADDR >> 12;
     writel(APIC_ICR_LOW, 0x000C4600 | sipi_vector);
-
+    dprintf(1, ">>>>>gonglei: finish enable local APIC...\n");
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
-    while (cmos_smp_count != CountCPUs)
+    dprintf(1, ">>>gonglei: cmos_smp_count=%d\n", cmos_smp_count);
+    while (cmos_smp_count != CountCPUs) {
         asm volatile(
             // Release lock and allow other processors to use the stack.
             "  movl %%esp, %1\n"
@@ -139,8 +151,11 @@ smp_setup(void)
             "  jc 1b\n"
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
+       //yield();
+    }
+    dprintf(1, " gonglei: finish while\n");
     yield();
-
+    dprintf(1, " gonglei: finish yield\n");
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;
 
diff --git a/roms/seabios/src/misc.c b/roms/seabios/src/misc.c
index 8caaf31..77f6be3 100644
--- a/roms/seabios/src/misc.c
+++ b/roms/seabios/src/misc.c
@@ -64,6 +64,10 @@ void VISIBLE16
 handle_02(void)
 {
     debug_isr(DEBUG_ISR_02);
+    dprintf(1, "gonglei hand nmi inject, write rtc \n");
 }
 
 void
diff --git a/roms/seabios/src/stacks.c b/roms/seabios/src/stacks.c
index 1dbdfe9..c1b5203 100644
--- a/roms/seabios/src/stacks.c
+++ b/roms/seabios/src/stacks.c
@@ -174,6 +174,7 @@ call16_smm(u32 eax, u32 edx, void *func)
 static void
 call32_sloppy_prep(void)
 {
+    dprintf(1, ">>> enter call32_sloppy_prep...\n");
     // Backup cmos index register and disable nmi
     u8 cmosindex = inb(PORT_CMOS_INDEX);
     outb(cmosindex | NMI_DISABLE_BIT, PORT_CMOS_INDEX);


Regards,
-Gonglei


> -----Original Message-----
> From: Kevin O'Connor [mailto:kevin@koconnor.net]
> Sent: Thursday, November 19, 2015 9:41 PM
> To: Xulei (Stone)
> Cc: Gonglei (Arei); qemu-devel; seabios@seabios.org; Huangweidong (C)
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:
> > Kevin,
> >
> > After deeply analyzing, i think there may be 3 possible reasons:
> > 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no
> > lock to protect.  So, sometimes, 2 or more vcpu may get the same
> > current value of CountCPUs. Then we'll get a single incrementation
> > instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will
> > loop forever;
> 
> The handle_smp() code is called from romlayout.S:entry_smp() which does take
> a lock.  So, all of handle_smp() should run synchronous.
> 
> > 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?
> 
> Not sure - the last time there were problems in this area of the code others
> used kvmtrace to try and track this down.  Since you are getting dprintf
> statements, you could also try outputting cmos_smp_count prior to the loop
> (see patch below).
> 
> > 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield?
> > I've tested, when yield() is running, SeaBIOS seems has not created
> > some other threads except the main thread. So I don't know what's the
> > function of yield() here.?
> 
> The yield() allows hardware interrupts to occur.  But note that
> yield() isn't called in the loop - is is only called after the loop completes.
> 
> If you are only getting this on massive repetitive reboot requests, there are
> some other possible explanations:
> 
> - perhaps the SIPI is getting lost because one of the CPUs is still
>   resetting or still processing a SIPI from the last reboot?
> 
> - the seabios code itself may have been corrupted if the memcpy() in
>   qemu_prep_reset() got far enough along to clear HaveRunPost, but did
>   not get far enough along to fully complete the memcpy().
> 
> If the failure is reproducible, the patch below could help narrow the
> possibilities.
> 
> -Kevin
> 
> 
> --- a/src/fw/smp.c
> +++ b/src/fw/smp.c
> @@ -125,6 +125,7 @@ smp_setup(void)
> 
>      // Wait for other CPUs to process the SIPI.
>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
> +    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
>      while (cmos_smp_count != CountCPUs)
>          asm volatile(
>              // Release lock and allow other processors to use the stack.
> @@ -136,6 +137,7 @@ smp_setup(void)
>              "  jc 1b\n"
>              : "+m" (SMPLock), "+m" (SMPStack)
>              : : "cc", "memory");
> +    dprintf(1, "finish smp\n");
>      yield();
> 
>      // Restore memory.

WARNING: multiple messages have this Message-ID (diff)
From: "Gonglei (Arei)" <arei.gonglei@huawei.com>
To: Kevin O'Connor <kevin@koconnor.net>,
	"Xulei (Stone)" <stone.xulei@huawei.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: "seabios@seabios.org" <seabios@seabios.org>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Date: Sat, 19 Dec 2015 01:08:05 +0000	[thread overview]
Message-ID: <33183CC9F5247A488A2544077AF19020B02B79EE@SZXEMA503-MBS.china.huawei.com> (raw)
In-Reply-To: <20151119134039.GA27717@morn.lan>

Hi Kevin & Paolo,

Luckily, I reproduced this problem last night. And I got the below log when SeaBIOS is stuck.

[BTW, the whole SeaBIOS log attached]

[2015-12-18 10:38:10] >>>>>gonglei: enter smp_setup()...
[2015-12-18 10:38:10] >>>>>gonglei: begine to enable local APIC...
[2015-12-18 10:38:10] >>>>>gonglei: finish enable local APIC...
[2015-12-18 10:38:10] >>>gonglei: cmos_smp_count=8
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=5
[2015-12-18 10:38:10] ===: CountCPUs=2, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=7
[2015-12-18 10:38:10] ===: CountCPUs=3, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=1
[2015-12-18 10:38:10] ===: CountCPUs=4, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=2
[2015-12-18 10:38:10] ===: CountCPUs=5, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=4
[2015-12-18 10:38:10] ===: CountCPUs=6, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=3
[2015-12-18 10:38:10] ===: CountCPUs=7, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=6
[2015-12-18 10:38:10] ===: CountCPUs=8, SMPStack=0x6d84
[2015-12-18 10:38:10]  gonglei: finish while   

[pid 31509 is a vcpu thread used 100% cpu overhead]

# cat /proc/31509/stack  
[<ffffffffa05c337c>] vmx_vcpu_run+0x35c/0x580 [kvm_intel]
[<ffffffffa06b0f10>] em_push+0x0/0x20 [kvm]
[<ffffffffa06a30dc>] x86_emulate_instruction+0x20c/0x440 [kvm]
[<ffffffffa05c9224>] handle_exception+0xe4/0x1b58 [kvm_intel]
[<ffffffffa06a24c5>] vcpu_enter_guest+0x565/0x790 [kvm]
[<ffffffffa05bf990>] vmx_get_segment_base+0x0/0xb0 [kvm_intel]
[<ffffffffa06a2888>] __vcpu_run+0x198/0x260 [kvm]
[<ffffffffa06a3508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[<ffffffffa068f92e>] vcpu_load+0x4e/0x80 [kvm]
[<ffffffffa068fcee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[<ffffffff8109618d>] futex_wake+0xfd/0x110
[<ffffffff811ed57c>] security_file_permission+0x1c/0xa0
[<ffffffff8116bedb>] do_vfs_ioctl+0x8b/0x3b0
[<ffffffff8116c2a1>] sys_ioctl+0xa1/0xb0
[<ffffffff81469272>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

And kvm tracing information:

<...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180078: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180078: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180079: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180079: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180079: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180080: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180080: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180080: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180081: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180081: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180081: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180081: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180082: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180083: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180083: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180083: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180084: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180084: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180084: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180084: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180085: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
<...>-31509 [035] 154753.180085: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180085: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180085: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180086: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306

Now, it's very clear that the guest stuck in yiled(), and then kvm encounter the exception #UD.

Do you have any thoughts? Thanks!


The Seabios patch below:

diff --git a/roms/seabios/src/boot.c b/roms/seabios/src/boot.c
index f23e9e1..552914a 100644
--- a/roms/seabios/src/boot.c
+++ b/roms/seabios/src/boot.c
@@ -93,7 +93,7 @@ glob_prefix(const char *glob, const char *str)
 static int
 find_prio(const char *glob)
 {
-    dprintf(1, "Searching bootorder for: %s\n", glob);
+    //dprintf(1, "Searching bootorder for: %s\n", glob);
     int i;
     for (i = 0; i < BootorderCount; i++)
         if (glob_prefix(glob, Bootorder[i]))
diff --git a/roms/seabios/src/fw/smp.c b/roms/seabios/src/fw/smp.c
index a466ea6..46ec607 100644
--- a/roms/seabios/src/fw/smp.c
+++ b/roms/seabios/src/fw/smp.c
@@ -46,12 +46,16 @@ int apic_id_is_present(u8 apic_id)
     return !!(FoundAPICIDs[apic_id/32] & (1ul << (apic_id % 32)));
 }
 
+// Atomic lock for shared stack across processors.
+u32 SMPLock __VISIBLE;
+u32 SMPStack __VISIBLE;
+
 void VISIBLE32FLAT
 handle_smp(void)
 {
     if (!CONFIG_QEMU)
         return;
-
+    dprintf(1, ">>> enter handle_smp...\n");
     // Enable CPU caching
     setcr0(getcr0() & ~(CR0_CD|CR0_NW));
 
@@ -70,19 +74,16 @@ handle_smp(void)
     FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
 
     CountCPUs++;
+    dprintf(1, "===: CountCPUs=%d, SMPStack=0x%x\n", CountCPUs, SMPStack);
 }
 
-// Atomic lock for shared stack across processors.
-u32 SMPLock __VISIBLE;
-u32 SMPStack __VISIBLE;
-
 // find and initialize the CPUs by launching a SIPI to them
 void
 smp_setup(void)
 {
     if (!CONFIG_QEMU)
         return;
-
+    dprintf(1, ">>>>>gonglei: enter smp_setup()...\n");
     ASSERT32FLAT();
     u32 eax, ebx, ecx, cpuid_features;
     cpuid(1, &eax, &ebx, &ecx, &cpuid_features);
@@ -106,7 +107,7 @@ smp_setup(void)
     u64 new = (0xea | ((u64)SEG_BIOS<<24)
                | (((u32)entry_smp - BUILD_BIOS_ADDR) << 8));
     *(u64*)BUILD_AP_BOOT_ADDR = new;
-
+    dprintf(1, ">>>>>gonglei: begine to enable local APIC...\n");
     // enable local APIC
     u32 val = readl(APIC_SVR);
     writel(APIC_SVR, val | APIC_ENABLED);
@@ -125,10 +126,21 @@ smp_setup(void)
     writel(APIC_ICR_LOW, 0x000C4500);
     u32 sipi_vector = BUILD_AP_BOOT_ADDR >> 12;
     writel(APIC_ICR_LOW, 0x000C4600 | sipi_vector);
-
+    dprintf(1, ">>>>>gonglei: finish enable local APIC...\n");
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
-    while (cmos_smp_count != CountCPUs)
+    dprintf(1, ">>>gonglei: cmos_smp_count=%d\n", cmos_smp_count);
+    while (cmos_smp_count != CountCPUs) {
         asm volatile(
             // Release lock and allow other processors to use the stack.
             "  movl %%esp, %1\n"
@@ -139,8 +151,11 @@ smp_setup(void)
             "  jc 1b\n"
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
+       //yield();
+    }
+    dprintf(1, " gonglei: finish while\n");
     yield();
-
+    dprintf(1, " gonglei: finish yield\n");
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;
 
diff --git a/roms/seabios/src/misc.c b/roms/seabios/src/misc.c
index 8caaf31..77f6be3 100644
--- a/roms/seabios/src/misc.c
+++ b/roms/seabios/src/misc.c
@@ -64,6 +64,10 @@ void VISIBLE16
 handle_02(void)
 {
     debug_isr(DEBUG_ISR_02);
+    dprintf(1, "gonglei hand nmi inject, write rtc \n");
 }
 
 void
diff --git a/roms/seabios/src/stacks.c b/roms/seabios/src/stacks.c
index 1dbdfe9..c1b5203 100644
--- a/roms/seabios/src/stacks.c
+++ b/roms/seabios/src/stacks.c
@@ -174,6 +174,7 @@ call16_smm(u32 eax, u32 edx, void *func)
 static void
 call32_sloppy_prep(void)
 {
+    dprintf(1, ">>> enter call32_sloppy_prep...\n");
     // Backup cmos index register and disable nmi
     u8 cmosindex = inb(PORT_CMOS_INDEX);
     outb(cmosindex | NMI_DISABLE_BIT, PORT_CMOS_INDEX);


Regards,
-Gonglei


> -----Original Message-----
> From: Kevin O'Connor [mailto:kevin@koconnor.net]
> Sent: Thursday, November 19, 2015 9:41 PM
> To: Xulei (Stone)
> Cc: Gonglei (Arei); qemu-devel; seabios@seabios.org; Huangweidong (C)
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:
> > Kevin,
> >
> > After deeply analyzing, i think there may be 3 possible reasons:
> > 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no
> > lock to protect.  So, sometimes, 2 or more vcpu may get the same
> > current value of CountCPUs. Then we'll get a single incrementation
> > instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will
> > loop forever;
> 
> The handle_smp() code is called from romlayout.S:entry_smp() which does take
> a lock.  So, all of handle_smp() should run synchronous.
> 
> > 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?
> 
> Not sure - the last time there were problems in this area of the code others
> used kvmtrace to try and track this down.  Since you are getting dprintf
> statements, you could also try outputting cmos_smp_count prior to the loop
> (see patch below).
> 
> > 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield?
> > I've tested, when yield() is running, SeaBIOS seems has not created
> > some other threads except the main thread. So I don't know what's the
> > function of yield() here.?
> 
> The yield() allows hardware interrupts to occur.  But note that
> yield() isn't called in the loop - is is only called after the loop completes.
> 
> If you are only getting this on massive repetitive reboot requests, there are
> some other possible explanations:
> 
> - perhaps the SIPI is getting lost because one of the CPUs is still
>   resetting or still processing a SIPI from the last reboot?
> 
> - the seabios code itself may have been corrupted if the memcpy() in
>   qemu_prep_reset() got far enough along to clear HaveRunPost, but did
>   not get far enough along to fully complete the memcpy().
> 
> If the failure is reproducible, the patch below could help narrow the
> possibilities.
> 
> -Kevin
> 
> 
> --- a/src/fw/smp.c
> +++ b/src/fw/smp.c
> @@ -125,6 +125,7 @@ smp_setup(void)
> 
>      // Wait for other CPUs to process the SIPI.
>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
> +    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
>      while (cmos_smp_count != CountCPUs)
>          asm volatile(
>              // Release lock and allow other processors to use the stack.
> @@ -136,6 +137,7 @@ smp_setup(void)
>              "  jc 1b\n"
>              : "+m" (SMPLock), "+m" (SMPStack)
>              : : "cc", "memory");
> +    dprintf(1, "finish smp\n");
>      yield();
> 
>      // Restore memory.

  parent reply	other threads:[~2015-12-19  1:08 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-03  6:58 [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform Xulei (Stone, Euler)
2015-11-04  0:48 ` Gonglei
2015-11-04 17:42   ` Kevin O'Connor
2015-11-06  9:12     ` Xulei (Stone)
2015-11-09 13:32       ` Kevin O'Connor
2015-11-09 20:06         ` Kevin O'Connor
2015-11-09 20:27           ` Kevin O'Connor
2015-11-19  1:04             ` Xulei (Stone)
2015-11-19 12:42               ` Xulei (Stone)
2015-11-19 13:40                 ` Kevin O'Connor
2015-11-20  2:05                   ` Xulei (Stone)
     [not found]                   ` <33183CC9F5247A488A2544077AF19020B02B72BA@SZXEMA503-MBS.china.huawei.com>
2015-12-18 23:13                     ` Kevin O'Connor
2015-12-18 23:13                       ` Kevin O'Connor
2015-12-19  6:28                       ` Gonglei (Arei)
2015-12-19  6:28                         ` [Qemu-devel] " Gonglei (Arei)
2015-12-19 12:03                       ` Gonglei (Arei)
2015-12-19 12:03                         ` [Qemu-devel] " Gonglei (Arei)
2015-12-19 15:11                         ` Kevin O'Connor
2015-12-19 15:11                           ` Kevin O'Connor
2015-12-20  9:49                           ` Gonglei (Arei)
2015-12-20  9:49                             ` [Qemu-devel] " Gonglei (Arei)
2015-12-20 14:33                             ` Kevin O'Connor
2015-12-20 14:33                               ` Kevin O'Connor
2015-12-21  9:41                               ` Gonglei (Arei)
2015-12-21  9:41                                 ` [Qemu-devel] " Gonglei (Arei)
2015-12-21 18:47                                 ` Kevin O'Connor
2015-12-21 18:47                                   ` [Qemu-devel] " Kevin O'Connor
2015-12-22  2:14                                   ` Gonglei (Arei)
2015-12-22  2:14                                     ` Gonglei (Arei)
2015-12-22  3:15                                     ` Xulei (Stone)
2015-12-22  3:15                                       ` [Qemu-devel] " Xulei (Stone)
2015-12-22 15:38                                       ` Kevin O'Connor
2015-12-22 15:38                                         ` [Qemu-devel] " Kevin O'Connor
2015-12-22 15:51                                     ` Kevin O'Connor
2015-12-22 15:51                                       ` Kevin O'Connor
2015-12-23  6:40                                       ` Gonglei (Arei)
2015-12-23  6:40                                         ` [Qemu-devel] " Gonglei (Arei)
2015-12-23 18:06                                         ` Kevin O'Connor
2015-12-23 18:06                                           ` Kevin O'Connor
2015-12-19  1:08                   ` Gonglei (Arei) [this message]
2015-12-19  1:08                     ` Gonglei (Arei)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=33183CC9F5247A488A2544077AF19020B02B79EE@SZXEMA503-MBS.china.huawei.com \
    --to=arei.gonglei@huawei.com \
    --cc=kevin@koconnor.net \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=seabios@seabios.org \
    --cc=stone.xulei@huawei.com \
    --cc=weidong.huang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.