From: "Xulei (Stone)" <stone.xulei@huawei.com>
To: Kevin O'Connor <kevin@koconnor.net>
Cc: "Huangweidong (C)" <weidong.huang@huawei.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"seabios@seabios.org" <seabios@seabios.org>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Date: Fri, 20 Nov 2015 02:05:55 +0000
Message-ID: <8E78D212B8C25246BE4CE7EA0E645FE52B88A9@SZXEMI504-MBS.china.huawei.com>
In-Reply-To: 20151119134039.GA27717@morn.lan



>On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:
>> Kevin,
>> 
>> After deeper analysis, I think there are 3 possible causes:
>> 1) Wrong CountCPUs value. CountCPUs++ in handle_smp() does not appear
>> to be protected by a lock, so two or more vCPUs may sometimes read the
>> same current value of CountCPUs. We would then get a single increment
>> instead of two or more, and "while (cmos_smp_count != CountCPUs)" would
>> loop forever;
>
>The handle_smp() code is called from romlayout.S:entry_smp() which
>does take a lock.  So, all of handle_smp() should run serialized.
>

Ok, this possibility is ruled out!
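
For reference, the lost-update scenario raised in point 1, and the effect of a lock such as the one taken in entry_smp(), can be modelled deterministically in plain C. This is only a sketch: the function names are hypothetical, nothing here is SeaBIOS code, and the interleavings are written out by hand rather than produced by real CPUs.

```c
#include <assert.h>

/* Hypothetical model of two CPUs each running "CountCPUs++".
 * Not SeaBIOS code; the interleavings are spelled out by hand. */

/* No lock: both CPUs load the counter before either one stores. */
static unsigned unlocked_two_increments(unsigned count)
{
    unsigned cpu_a = count;   /* CPU A loads */
    unsigned cpu_b = count;   /* CPU B loads the same value */
    count = cpu_a + 1;        /* CPU A stores */
    count = cpu_b + 1;        /* CPU B overwrites A's update */
    return count;
}

/* With a lock: each load/store pair completes before the next begins. */
static unsigned locked_two_increments(unsigned count)
{
    unsigned cpu_a = count;
    count = cpu_a + 1;
    unsigned cpu_b = count;
    count = cpu_b + 1;
    return count;
}
```

Starting from a count of 1, the unlocked interleaving ends at 2 while the serialized one ends at 3, which is why the lock in entry_smp() makes the first theory unlikely.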

>> 2) Wrong cmos_smp_count value. Does the SeaBIOS rtc read return an incorrect number?
>
>Not sure - the last time there were problems in this area of the code
>others used kvmtrace to try and track this down.  Since you are
>getting dprintf statements, you could also try outputting
>cmos_smp_count prior to the loop (see patch below).
>
I'll test again with this patch and observe the output.
Frankly, though, I don't think SeaBIOS reads an incorrect number.
QEMU sets the smp_cpus value in pc_cmos_init (CMOS 0x5f) once, at the
first boot (QEMU does not execute pc_cmos_init again during repeated reboots).
SeaBIOS has worked correctly across many reboots, so I see no reason why
it would suddenly read an incorrect number on one of them.
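
To make the data flow concrete: the patch quoted at the end of this mail shows smp_setup() computing rtc_read(CMOS_BIOS_SMP_COUNT) + 1 into a u8. The sketch below models that with a plain array standing in for the CMOS. The array, the helper names, and the assumption that the byte at 0x5f holds the CPU count minus one are mine, for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define CMOS_BIOS_SMP_COUNT 0x5f   /* index named in the quoted patch */

/* Hypothetical stand-in for the CMOS; real code goes through rtc_read(). */
static uint8_t fake_cmos[128];

/* Assumption: the value stored at 0x5f is the CPU count minus one,
 * written once and left alone across reboots. */
static void fake_cmos_init(unsigned smp_cpus)
{
    fake_cmos[CMOS_BIOS_SMP_COUNT] = (uint8_t)(smp_cpus - 1);
}

/* Mirrors "u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1". */
static uint8_t expected_cpu_count(void)
{
    return (uint8_t)(fake_cmos[CMOS_BIOS_SMP_COUNT] + 1);
}
```

Because nothing rewrites the byte between reboots in this model, repeated reads return the same count, which is the argument made above.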

>> 3) yield() stuck. Is it possible that SeaBIOS gets stuck in yield()?
>> In my tests, while yield() is running, SeaBIOS does not seem to have
>> created any threads other than the main thread. So I don't know what
>> purpose yield() serves here.
>
>The yield() allows hardware interrupts to occur.  But note that
>yield() isn't called in the loop - it is only called after the loop
>completes.
>
>If you are only getting this on massive repetitive reboot requests,
>there are some other possible explanations:
>
>- perhaps the SIPI is getting lost because one of the CPUs is still
>  resetting or still processing a SIPI from the last reboot?
>
That seems impossible. Judging from the QEMU log and the SeaBIOS log, the
VM had booted successfully and was able to run our "regular rebooting"
daemon process before the last reboot.

>- the seabios code itself may have been corrupted if the memcpy() in
>  qemu_prep_reset() got far enough along to clear HaveRunPost, but did
>  not get far enough along to fully complete the memcpy().
>
BTW, my VM reboots every 220 seconds (reboot interval = 220s). I think
SeaBIOS has enough time to finish all of this work, such as SIPI handling
and memcpy().

Kevin, is it possible that if my VM is stuck in QEMU (at some point in the
PCI device reset procedure, or elsewhere), SeaBIOS would hang in
handle_smp() and never print "Found %d cpu(s) max supported %d cpu(s)"?

================== bad QEMU log=======
[2015-11-13 18:45:57] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:57] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650381}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 796285}, "event": "RESET"}
[2015-11-13 18:45:58] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:59] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411560, "microseconds": 212196}, "event": "RESET"}
[2015-11-13 18:46:00] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650558}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}

================ good QEMU log=========
[2015-11-13 18:42:12] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:12] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718617}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 848236}, "event": "RESET"}
[2015-11-13 18:42:14] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:14] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411335, "microseconds": 280198}, "event": "RESET"}
[2015-11-13 18:42:15] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718794}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 3 that means DRIVER
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '-1' hanle control message event = 0, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 4 send control message event = 1, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '1' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '2' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '3' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '4' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '2' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 214826}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel1"}}
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '1' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '3' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '4' hanle control message event = 6, value = 1
[2015-11-13 18:42:31] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 220665}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel3"}}

>If the failure is reproducible, the patch below could help narrow the
>possibilities.
>
>-Kevin
>
>
>--- a/src/fw/smp.c
>+++ b/src/fw/smp.c
>@@ -125,6 +125,7 @@ smp_setup(void)
> 
>     // Wait for other CPUs to process the SIPI.
>     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
>     while (cmos_smp_count != CountCPUs)
>         asm volatile(
>             // Release lock and allow other processors to use the stack.
>@@ -136,6 +137,7 @@ smp_setup(void)
>             "  jc 1b\n"
>             : "+m" (SMPLock), "+m" (SMPStack)
>             : : "cc", "memory");
>+    dprintf(1, "finish smp\n");
>     yield();
> 
>     // Restore memory.
