From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin O'Connor
Subject: Re: [Qemu-devel] E5-2620v2 - emulation stop error
Date: Wed, 11 Mar 2015 14:40:39 -0400
Message-ID: <20150311184039.GA7341@morn.localdomain>
References: <54FF4541.9080608@redhat.com>
 <20150310202958.GR2338@work-vm>
 <20150311134556.GH2334@work-vm>
 <20150311154220.GA26463@morn.localdomain>
 <20150311155306.GK2334@work-vm>
 <20150311163739.GA29522@morn.localdomain>
 <20150311165203.GL2334@work-vm>
 <20150311173738.GD29522@morn.localdomain>
 <20150311175904.GN2334@work-vm>
In-Reply-To: <20150311175904.GN2334@work-vm>
To: "Dr. David Alan Gilbert"
Cc: Bandan Das, Paolo Bonzini, kraxel@redhat.com, Andrey Korolyov,
 "qemu-devel@nongnu.org", "kvm@vger.kernel.org"

On Wed, Mar 11, 2015 at 05:59:04PM +0000, Dr. David Alan Gilbert wrote:
> * Kevin O'Connor (kevin@koconnor.net) wrote:
> > On Wed, Mar 11, 2015 at 04:52:03PM +0000, Dr. David Alan Gilbert wrote:
> > > * Kevin O'Connor (kevin@koconnor.net) wrote:
> > > > So, I couldn't get this to fail on my older AMD machine at all with
> > > > the default SeaBIOS code. But, when I change the code with the patch
> > > > below, it failed right away.
> > [...]
> > > > And the failed debug output looks like:
> > > >
> > > > SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
> > > > [...]
> > > > cmos_smp_count0=20
> > > > [...]
> > > > cmos_smp_count=1
> > > > cmos_smp_count2=1/20
> > > > Found 1 cpu(s) max supported 20 cpu(s)
> > > >
> > > > I'm going to check the assembly for a compiler error, but is it
> > > > possible QEMU is returning incorrect data in cmos index 0x5f?
> >
> > I checked the SeaBIOS assembler and it looks sane. So, I think the
> > question is, why is QEMU sometimes returning a 0 instead of 127 from
> > cmos 0x5f.
>
> My reading of the logs I've just created is that qemu doesn't think
> it's ever being asked to read 5f in the failed case:
>
> good:
>
> pc_cmos_init 5f setting smp_cpus=20
> cmos: read index=0x0f val=0x00
> cmos: read index=0x34 val=0x00
> cmos: read index=0x35 val=0x3f
> cmos: read index=0x38 val=0x30
> cmos: read index=0x3d val=0x12
> cmos: read index=0x38 val=0x30
> cmos: read index=0x0b val=0x02
> cmos: read index=0x0d val=0x80
> cmos: read index=0x5f val=0x13   Yeh!
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00
>
> bad:
> pc_cmos_init 5f setting smp_cpus=20
> cmos: read index=0x0f val=0x00
> cmos: read index=0x34 val=0x00
> cmos: read index=0x35 val=0x3f
> cmos: read index=0x38 val=0x30
> cmos: read index=0x3d val=0x12
> cmos: read index=0x38 val=0x30
> cmos: read index=0x0b val=0x02
> cmos: read index=0x0d val=0x80   Oh!
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00

For what it's worth, I can't seem to trigger the problem if I move the
cmos read above the SIPI/LAPIC code (see patch below).

I used this command line:

while true; do (sleep 5; echo -e '\001cq\n')| ../qemu/qemu-git/x86_64-softmmu/qemu-system-x86_64 -chardev file,path=foo.`date +%s`,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios -machine pc-i440fx-2.0,accel=kvm -m 1024 -smp 128 -nographic -device sga -L test 2>&1 | tee /tmp/qemu.op; grep "internal error" /tmp/qemu.op -q && break; done

This is on an "AMD Phenom(tm) II X6 1090T Processor" machine.

-Kevin


--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -107,6 +107,8 @@ smp_setup(void)
         | (((u32)entry_smp - BUILD_BIOS_ADDR) << 8));
     *(u64*)BUILD_AP_BOOT_ADDR = new;
 
+    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+
     // enable local APIC
     u32 val = readl(APIC_SVR);
     writel(APIC_SVR, val | APIC_ENABLED);
@@ -127,7 +129,7 @@ smp_setup(void)
     writel(APIC_ICR_LOW, 0x000C4600 | sipi_vector);
 
     // Wait for other CPUs to process the SIPI.
-    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.
@@ -140,6 +142,8 @@ smp_setup(void)
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
     yield();
+    dprintf(1, "cmos_smp_count2=%d/%d\n", cmos_smp_count
+            , rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
 
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;
diff --git a/src/post.c b/src/post.c
index 9ea5620..dc11c72 100644
--- a/src/post.c
+++ b/src/post.c
@@ -170,6 +170,7 @@ platform_hardware_setup(void)
     clock_setup();
 
     // Platform specific setup
+    dprintf(1, "cmos_smp_count0=%d\n", rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
     qemu_platform_setup();
     coreboot_platform_setup();
 }