All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-06 10:19 ` Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-06 10:19 UTC (permalink / raw)
  To: Gonglei (Arei), Paolo Bonzini, jan.kiszka, amit.shah, quintela
  Cc: Huangweidong (C), qemu-devel, kvm

Hi all,

I might catch the bug. Now, we rely on a vm_clock timer to assure that the vcpus are started
before we invoke send_control_event(),which queue a message in the virtio ring and trigger a irq.

In actually, we can't attach our initial intention under a special situation:

1. the process of migration destination is in a coroutine
 Commit 82a4da79fd6
2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
 if the Qemu encounter a EAGIN while reading QEMUFile.
 commit 595ab64169b

static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
                                 size_t size)
{
    QEMUFileSocket *s = opaque;
    ssize_t len;

    for (;;) {
        len = qemu_recv(s->fd, buf, size, 0);
        if (len != -1) {
            break;
        }
        if (socket_error() == EAGAIN) {
            yield_until_fd_readable(s->fd);
        } else if (socket_error() != EINTR) {
            break;
        }
    }

    if (len == -1) {
        len = -socket_error();
    }
    return len;
}

3. The main thread can get the cpu and timer will run, asumes that we get EAGIN
  after invoking fetch_active_ports_list().
4. reproduce the problem by fault injection.

The debug logs:

[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
[2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co()    // Enter corountine
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
[2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
[2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug **********    // fault injection begin, migration coroutine yiled cpu
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592    //vm_clock timer boom
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb   //calling the timer callback
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line.
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state   //Enter migration coroutine again, and the restore the lapic register, the previous lapic's info will be covered :(
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
[2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started 
[2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start            //enter vm_start here
[2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
[2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961


The below is my debugging diff (Not based on the newest master branch)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 6a45af9..a00b02f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
     if (!s->post_load) {
         return;
     }
+    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
     for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
         port = s->post_load->connected[i].port;
         host_connected = s->post_load->connected[i].host_connected;
@@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
              */
             send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
                                port->host_connected);
+            QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
         }
         vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
         if (vsc->set_guest_connected) {
@@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
         }
     }
     timer_mod(s->post_load->timer, 1);
+    QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
+    yield_until_fd_readable(qemu_get_fd(f));
     return 0;
 }
 
@@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id > 3) {
         return -EINVAL;
     }
-
+    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
     /* The virtio device */
     return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
 }
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 5b47056..2d988cb 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
 {
     APICCommonState *s = APIC_COMMON(dev);
     int i;
-
+    QEMU_LOG("gonglei: kvm_put_apic_state\n");
     memset(kapic, 0, sizeof(*kapic));
     kvm_apic_set_reg(kapic, 0x2, s->id << 24);
     kvm_apic_set_reg(kapic, 0x8, s->tpr);
diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index d2a6c4c..07699ec 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
 
     chip.chip_id = KVM_IRQCHIP_IOAPIC;
     kioapic = &chip.chip.ioapic;
-
+    QEMU_LOG("gonglei: kvm_ioapic_put\n");
     kioapic->id = s->id;
     kioapic->ioregsel = s->ioregsel;
     kioapic->base_address = s->busdev.mmio[0].addr;
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 042e960..110be20 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
     if (info->post_load) {
         info->post_load(s);
     }
+    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
     return 0;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 92c1427..d074c0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
     const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
     const char *portName = "charchannel1";
     size_t uRet;
-
+    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
diff --git a/qemu-timer.c b/qemu-timer.c
index 5741f0d..dd36bc9 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
     QEMUTimerList *tl;
     bool old = clock->enabled;
     clock->enabled = enabled;
+    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
     if (enabled && !old) {
         qemu_clock_notify(type);
     } else if (!enabled && old) {
@@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
     bool progress = false;
     QEMUTimerCB *cb;
     void *opaque;
+    static int count = 10;
 
     qemu_event_reset(&timer_list->timers_done_ev);
     if (!timer_list->clock->enabled) {
@@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
             qemu_mutex_unlock(&timer_list->active_timers_lock);
             break;
         }
-
+	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
+	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
+             ts->expire_time, current_time);
+	}
         /* remove timer from the list before calling the callback */
         timer_list->active_timers = ts->next;
         ts->next = NULL;
diff --git a/vl.c b/vl.c
index 06f34fe..824b246 100644
--- a/vl.c
+++ b/vl.c
@@ -978,7 +978,7 @@ void vm_start(void)
 {
     RunState requested;
     int64_t start_time, end_time;
-
+    QEMU_LOG("gonglei: enter vm_start\n");
     qemu_vmstop_requested(&requested);
     if (runstate_is_running() && requested == RUN_STATE_MAX) {
         return;


Can we wait looply the migration process finished when EAGAIN?

Regards,
-Gonglei


> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Friday, August 05, 2016 5:14 PM
> To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> Subject: [Question] virtio-serial misses irq delivery on migration?
> 
> Hi Paolo , Jan, Amit
> 
> Recently we encountered a problem that the virtio-serial can't work after
> Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> 
> It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> because
> the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> interrupt
> can't be injected to VM because the irq line bit had been set to 1, but the
> frontend
> driver never handle it or never know it.
> 
>  Bug 867366 - virtio-serial misses irq delivery on migration
> https://bugzilla.redhat.com/show_bug.cgi?id=867366
> 
> But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> post_load
> send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> propagate
> guest_connected to the port on post_load" are applied.
> 
> I noticed that Paolo posted another problem maybe have a pertential problem
> about
> apic in Comment 23. But this patch
>  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> haven't merged into qemu master.
> 
> Would you give me some clues please? Thanks!
> 
> Regards,
> -Gonglei
> 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-06 10:19 ` Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-06 10:19 UTC (permalink / raw)
  To: Gonglei (Arei), Paolo Bonzini, jan.kiszka, amit.shah, quintela
  Cc: qemu-devel, kvm, Huangweidong (C)

Hi all,

I might catch the bug. Now, we rely on a vm_clock timer to assure that the vcpus are started
before we invoke send_control_event(),which queue a message in the virtio ring and trigger a irq.

In actually, we can't attach our initial intention under a special situation:

1. the process of migration destination is in a coroutine
 Commit 82a4da79fd6
2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
 if the Qemu encounter a EAGIN while reading QEMUFile.
 commit 595ab64169b

static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
                                 size_t size)
{
    QEMUFileSocket *s = opaque;
    ssize_t len;

    for (;;) {
        len = qemu_recv(s->fd, buf, size, 0);
        if (len != -1) {
            break;
        }
        if (socket_error() == EAGAIN) {
            yield_until_fd_readable(s->fd);
        } else if (socket_error() != EINTR) {
            break;
        }
    }

    if (len == -1) {
        len = -socket_error();
    }
    return len;
}

3. The main thread can get the cpu and timer will run, asumes that we get EAGIN
  after invoking fetch_active_ports_list().
4. reproduce the problem by fault injection.

The debug logs:

[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
[2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co()    // Enter corountine
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
[2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
[2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug **********    // fault injection begin, migration coroutine yiled cpu
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592    //vm_clock timer boom
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb   //calling the timer callback
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line.
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state   //Enter migration coroutine again, and the restore the lapic register, the previous lapic's info will be covered :(
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
[2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started 
[2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start            //enter vm_start here
[2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
[2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961


The below is my debugging diff (Not based on the newest master branch)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 6a45af9..a00b02f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
     if (!s->post_load) {
         return;
     }
+    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
     for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
         port = s->post_load->connected[i].port;
         host_connected = s->post_load->connected[i].host_connected;
@@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
              */
             send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
                                port->host_connected);
+            QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
         }
         vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
         if (vsc->set_guest_connected) {
@@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
         }
     }
     timer_mod(s->post_load->timer, 1);
+    QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
+    yield_until_fd_readable(qemu_get_fd(f));
     return 0;
 }
 
@@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id > 3) {
         return -EINVAL;
     }
-
+    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
     /* The virtio device */
     return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
 }
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 5b47056..2d988cb 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
 {
     APICCommonState *s = APIC_COMMON(dev);
     int i;
-
+    QEMU_LOG("gonglei: kvm_put_apic_state\n");
     memset(kapic, 0, sizeof(*kapic));
     kvm_apic_set_reg(kapic, 0x2, s->id << 24);
     kvm_apic_set_reg(kapic, 0x8, s->tpr);
diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index d2a6c4c..07699ec 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
 
     chip.chip_id = KVM_IRQCHIP_IOAPIC;
     kioapic = &chip.chip.ioapic;
-
+    QEMU_LOG("gonglei: kvm_ioapic_put\n");
     kioapic->id = s->id;
     kioapic->ioregsel = s->ioregsel;
     kioapic->base_address = s->busdev.mmio[0].addr;
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 042e960..110be20 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
     if (info->post_load) {
         info->post_load(s);
     }
+    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
     return 0;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 92c1427..d074c0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
     const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
     const char *portName = "charchannel1";
     size_t uRet;
-
+    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
diff --git a/qemu-timer.c b/qemu-timer.c
index 5741f0d..dd36bc9 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
     QEMUTimerList *tl;
     bool old = clock->enabled;
     clock->enabled = enabled;
+    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
     if (enabled && !old) {
         qemu_clock_notify(type);
     } else if (!enabled && old) {
@@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
     bool progress = false;
     QEMUTimerCB *cb;
     void *opaque;
+    static int count = 10;
 
     qemu_event_reset(&timer_list->timers_done_ev);
     if (!timer_list->clock->enabled) {
@@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
             qemu_mutex_unlock(&timer_list->active_timers_lock);
             break;
         }
-
+	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
+	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
+             ts->expire_time, current_time);
+	}
         /* remove timer from the list before calling the callback */
         timer_list->active_timers = ts->next;
         ts->next = NULL;
diff --git a/vl.c b/vl.c
index 06f34fe..824b246 100644
--- a/vl.c
+++ b/vl.c
@@ -978,7 +978,7 @@ void vm_start(void)
 {
     RunState requested;
     int64_t start_time, end_time;
-
+    QEMU_LOG("gonglei: enter vm_start\n");
     qemu_vmstop_requested(&requested);
     if (runstate_is_running() && requested == RUN_STATE_MAX) {
         return;


Can we wait looply the migration process finished when EAGAIN?

Regards,
-Gonglei


> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Friday, August 05, 2016 5:14 PM
> To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> Subject: [Question] virtio-serial misses irq delivery on migration?
> 
> Hi Paolo , Jan, Amit
> 
> Recently we encountered a problem that the virtio-serial can't work after
> Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> 
> It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> because
> the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> interrupt
> can't be injected to VM because the irq line bit had been set to 1, but the
> frontend
> driver never handle it or never know it.
> 
>  Bug 867366 - virtio-serial misses irq delivery on migration
> https://bugzilla.redhat.com/show_bug.cgi?id=867366
> 
> But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> post_load
> send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> propagate
> guest_connected to the port on post_load" are applied.
> 
> I noticed that Paolo posted another problem maybe have a pertential problem
> about
> apic in Comment 23. But this patch
>  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> haven't merged into qemu master.
> 
> Would you give me some clues please? Thanks!
> 
> Regards,
> -Gonglei
> 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
  2016-08-06 10:19 ` [Qemu-devel] " Gonglei (Arei)
  (?)
@ 2016-08-08 12:13 ` Dr. David Alan Gilbert
  2016-08-08 13:05     ` Gonglei (Arei)
  -1 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2016-08-08 12:13 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: Paolo Bonzini, jan.kiszka, amit.shah, quintela, Huangweidong (C),
	qemu-devel, kvm

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi all,
> 
> I might catch the bug. Now, we rely on a vm_clock timer to assure that the vcpus are started
> before we invoke send_control_event(),which queue a message in the virtio ring and trigger a irq.
> 
> In actually, we can't attach our initial intention under a special situation:
> 
> 1. the process of migration destination is in a coroutine
>  Commit 82a4da79fd6
> 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
>  if the Qemu encounter a EAGIN while reading QEMUFile.
>  commit 595ab64169b
> 
> static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
>                                  size_t size)
> {
>     QEMUFileSocket *s = opaque;
>     ssize_t len;
> 
>     for (;;) {
>         len = qemu_recv(s->fd, buf, size, 0);
>         if (len != -1) {
>             break;
>         }
>         if (socket_error() == EAGAIN) {
>             yield_until_fd_readable(s->fd);
>         } else if (socket_error() != EINTR) {
>             break;
>         }
>     }
> 
>     if (len == -1) {
>         len = -socket_error();
>     }
>     return len;
> }
> 
> 3. The main thread can get the cpu and timer will run, asumes that we get EAGIN
>   after invoking fetch_active_ports_list().

I don't understand the details of this interrupt injection, or why the timer
will run if we're still not finished migration; it doesn't sound right that a QEMU_CLOCK_VIRTUAL
timer should trigger while we're still receiving the VM and the guest is paused.

However, would it be fixed by using a vm_change_state_handler like ui/spice-core.c does?

Dave


> 4. reproduce the problem by fault injection.
> 
> The debug logs:
> 
> [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
> [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
> [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
> [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
> [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co()    // Enter corountine
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
> [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug **********    // fault injection begin, migration coroutine yiled cpu
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592    //vm_clock timer boom
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb   //calling the timer callback
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line.
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
> [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state   //Enter migration coroutine again, and the restore the lapic register, the previous lapic's info will be covered :(
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started 
> [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start            //enter vm_start here
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
> [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
> [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961
> 
> 
> The below is my debugging diff (Not based on the newest master branch)
> 
> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> index 6a45af9..a00b02f 100644
> --- a/hw/char/virtio-serial-bus.c
> +++ b/hw/char/virtio-serial-bus.c
> @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
>      if (!s->post_load) {
>          return;
>      }
> +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
>      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
>          port = s->post_load->connected[i].port;
>          host_connected = s->post_load->connected[i].host_connected;
> @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
>               */
>              send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
>                                 port->host_connected);
> +            QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
>          }
>          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
>          if (vsc->set_guest_connected) {
> @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
>          }
>      }
>      timer_mod(s->post_load->timer, 1);
> +    QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
> +    yield_until_fd_readable(qemu_get_fd(f));
>      return 0;
>  }
>  
> @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>      if (version_id > 3) {
>          return -EINVAL;
>      }
> -
> +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
>      /* The virtio device */
>      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
>  }
> diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> index 5b47056..2d988cb 100644
> --- a/hw/i386/kvm/apic.c
> +++ b/hw/i386/kvm/apic.c
> @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
>  {
>      APICCommonState *s = APIC_COMMON(dev);
>      int i;
> -
> +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
>      memset(kapic, 0, sizeof(*kapic));
>      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
>      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> index d2a6c4c..07699ec 100644
> --- a/hw/i386/kvm/ioapic.c
> +++ b/hw/i386/kvm/ioapic.c
> @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
>  
>      chip.chip_id = KVM_IRQCHIP_IOAPIC;
>      kioapic = &chip.chip.ioapic;
> -
> +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
>      kioapic->id = s->id;
>      kioapic->ioregsel = s->ioregsel;
>      kioapic->base_address = s->busdev.mmio[0].addr;
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index 042e960..110be20 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>      if (info->post_load) {
>          info->post_load(s);
>      }
> +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
>      return 0;
>  }
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index 92c1427..d074c0a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
>      const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
>      const char *portName = "charchannel1";
>      size_t uRet;
> -
> +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
>      ret = qemu_loadvm_state(f);
>      qemu_fclose(f);
>      free_xbzrle_decoded_buf();
> diff --git a/qemu-timer.c b/qemu-timer.c
> index 5741f0d..dd36bc9 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
>      QEMUTimerList *tl;
>      bool old = clock->enabled;
>      clock->enabled = enabled;
> +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
>      if (enabled && !old) {
>          qemu_clock_notify(type);
>      } else if (!enabled && old) {
> @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>      bool progress = false;
>      QEMUTimerCB *cb;
>      void *opaque;
> +    static int count = 10;
>  
>      qemu_event_reset(&timer_list->timers_done_ev);
>      if (!timer_list->clock->enabled) {
> @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>              qemu_mutex_unlock(&timer_list->active_timers_lock);
>              break;
>          }
> -
> +	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> +	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> +             ts->expire_time, current_time);
> +	}
>          /* remove timer from the list before calling the callback */
>          timer_list->active_timers = ts->next;
>          ts->next = NULL;
> diff --git a/vl.c b/vl.c
> index 06f34fe..824b246 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -978,7 +978,7 @@ void vm_start(void)
>  {
>      RunState requested;
>      int64_t start_time, end_time;
> -
> +    QEMU_LOG("gonglei: enter vm_start\n");
>      qemu_vmstop_requested(&requested);
>      if (runstate_is_running() && requested == RUN_STATE_MAX) {
>          return;
> 
> 
> Can we wait looply the migration process finished when EAGAIN?
> 
> Regards,
> -Gonglei
> 
> 
> > -----Original Message-----
> > From: Gonglei (Arei)
> > Sent: Friday, August 05, 2016 5:14 PM
> > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > Subject: [Question] virtio-serial misses irq delivery on migration?
> > 
> > Hi Paolo , Jan, Amit
> > 
> > Recently we encountered a problem that the virtio-serial can't work after
> > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > 
> > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > because
> > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > interrupt
> > can't be injected to VM because the irq line bit had been set to 1, but the
> > frontend
> > driver never handle it or never know it.
> > 
> >  Bug 867366 - virtio-serial misses irq delivery on migration
> > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > 
> > But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> > post_load
> > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > propagate
> > guest_connected to the port on post_load" are applied.
> > 
> > I noticed that Paolo posted another problem maybe have a pertential problem
> > about
> > apic in Comment 23. But this patch
> >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > haven't merged into qemu master.
> > 
> > Would you give me some clues please? Thanks!
> > 
> > Regards,
> > -Gonglei
> > 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
  2016-08-08 12:13 ` Dr. David Alan Gilbert
@ 2016-08-08 13:05     ` Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-08 13:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, jan.kiszka, amit.shah, quintela, Huangweidong (C),
	qemu-devel, kvm

Hi Dave,


> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Monday, August 08, 2016 8:13 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> migration?
> 
> * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > Hi all,
> >
> > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> vcpus are started
> > before we invoke send_control_event(),which queue a message in the virtio
> ring and trigger a irq.
> >
> > In actually, we can't attach our initial intention under a special situation:
> >
> > 1. the process of migration destination is in a coroutine
> >  Commit 82a4da79fd6
> > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> >  if the Qemu encounter a EAGIN while reading QEMUFile.
> >  commit 595ab64169b
> >
> > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> >                                  size_t size)
> > {
> >     QEMUFileSocket *s = opaque;
> >     ssize_t len;
> >
> >     for (;;) {
> >         len = qemu_recv(s->fd, buf, size, 0);
> >         if (len != -1) {
> >             break;
> >         }
> >         if (socket_error() == EAGAIN) {
> >             yield_until_fd_readable(s->fd);
> >         } else if (socket_error() != EINTR) {
> >             break;
> >         }
> >     }
> >
> >     if (len == -1) {
> >         len = -socket_error();
> >     }
> >     return len;
> > }
> >
> > 3. The main thread can get the cpu and timer will run, asumes that we get
> EAGIN
> >   after invoking fetch_active_ports_list().
> 
> I don't understand the details of this interrupt injection, or why the timer
> will run if we're still not finished migration; it doesn't sound right that a
> QEMU_CLOCK_VIRTUAL
> timer should trigger while we're still receiving the VM and the guest is paused.
> 

Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
You can see the above step 2)

For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()

static ssize_t channel_get_buffer(void *opaque,
                                  uint8_t *buf,
                                  int64_t pos,
                                  size_t size)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);
    ssize_t ret;

    do {
        ret = qio_channel_read(ioc, (char *)buf, size, NULL);
        if (ret < 0) {
            if (ret == QIO_CHANNEL_ERR_BLOCK) {
                qio_channel_yield(ioc, G_IO_IN);   // yiled cpu
            } else {
                /* XXX handle Error * object */
                return -EIO;
            }
        }
    } while (ret == QIO_CHANNEL_ERR_BLOCK);

    return ret;
}

> However, would it be fixed by using a vm_change_state_handler like
> ui/spice-core.c does?
> 

I'll check this method, thanks!

Regards,
-Gonglei

> Dave
> 
> 
> > 4. reproduce the problem by fault injection.
> >
> > The debug logs:
> >
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "xbzrle"}]}
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "rdma-pin-all"}]}
> > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> process_incoming_migration_co()    // Enter corountine
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> init virtqueue
> > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> begin to inject debug **********    // fault injection begin, migration
> coroutine yiled cpu
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> current_time: 9750771061592    //vm_clock timer boom
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> virtio_serial_post_load_timer_cb   //calling the timer callback
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 0 // send a message to the guest, raise the
> irq line.
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 64056}, "event":
> "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 0
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750770788478, current_time: 9750771061592
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> //Enter migration coroutine again, and the restore the lapic register, the
> previous lapic's info will be covered :(
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> //enter vm_start here
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> that means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> means DRIVER OK
> > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> old: 1
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750772061592, current_time: 9750794269805
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750795284148, current_time: 9750795312685
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750796329656, current_time: 9750796369458
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750797396208, current_time: 9750797426752
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750798445707, current_time: 9750798477520
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750799496162, current_time: 9750799532183
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750800556817, current_time: 9750800588741
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750801606596, current_time: 9750801637961
> >
> >
> > The below is my debugging diff (Not based on the newest master branch)
> >
> > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > index 6a45af9..a00b02f 100644
> > --- a/hw/char/virtio-serial-bus.c
> > +++ b/hw/char/virtio-serial-bus.c
> > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> >      if (!s->post_load) {
> >          return;
> >      }
> > +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> >      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> >          port = s->post_load->connected[i].port;
> >          host_connected = s->post_load->connected[i].host_connected;
> > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> >               */
> >              send_control_event(s, port->id,
> VIRTIO_CONSOLE_PORT_OPEN,
> >                                 port->host_connected);
> > +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> port->host_connected);
> >          }
> >          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> >          if (vsc->set_guest_connected) {
> > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> version_id,
> >          }
> >      }
> >      timer_mod(s->post_load->timer, 1);
> > +    QEMU_LOG("gonglei: ************** begin to inject debug
> **********\n");
> > +    yield_until_fd_readable(qemu_get_fd(f));
> >      return 0;
> >  }
> >
> > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> *opaque, int version_id)
> >      if (version_id > 3) {
> >          return -EINVAL;
> >      }
> > -
> > +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> >      /* The virtio device */
> >      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> >  }
> > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > index 5b47056..2d988cb 100644
> > --- a/hw/i386/kvm/apic.c
> > +++ b/hw/i386/kvm/apic.c
> > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> kvm_lapic_state *kapic)
> >  {
> >      APICCommonState *s = APIC_COMMON(dev);
> >      int i;
> > -
> > +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
> >      memset(kapic, 0, sizeof(*kapic));
> >      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> >      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > index d2a6c4c..07699ec 100644
> > --- a/hw/i386/kvm/ioapic.c
> > +++ b/hw/i386/kvm/ioapic.c
> > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> >
> >      chip.chip_id = KVM_IRQCHIP_IOAPIC;
> >      kioapic = &chip.chip.ioapic;
> > -
> > +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
> >      kioapic->id = s->id;
> >      kioapic->ioregsel = s->ioregsel;
> >      kioapic->base_address = s->busdev.mmio[0].addr;
> > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > index 042e960..110be20 100644
> > --- a/hw/intc/apic_common.c
> > +++ b/hw/intc/apic_common.c
> > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> version_id)
> >      if (info->post_load) {
> >          info->post_load(s);
> >      }
> > +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> >      return 0;
> >  }
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 92c1427..d074c0a 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> *opaque)
> >      const char *arpGuestCMD =
> "{\"execute\":\"guest-write-flag-arp\"}\n";
> >      const char *portName = "charchannel1";
> >      size_t uRet;
> > -
> > +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> >      ret = qemu_loadvm_state(f);
> >      qemu_fclose(f);
> >      free_xbzrle_decoded_buf();
> > diff --git a/qemu-timer.c b/qemu-timer.c
> > index 5741f0d..dd36bc9 100644
> > --- a/qemu-timer.c
> > +++ b/qemu-timer.c
> > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> enabled)
> >      QEMUTimerList *tl;
> >      bool old = clock->enabled;
> >      clock->enabled = enabled;
> > +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> old);
> >      if (enabled && !old) {
> >          qemu_clock_notify(type);
> >      } else if (!enabled && old) {
> > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> >      bool progress = false;
> >      QEMUTimerCB *cb;
> >      void *opaque;
> > +    static int count = 10;
> >
> >      qemu_event_reset(&timer_list->timers_done_ev);
> >      if (!timer_list->clock->enabled) {
> > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> >              qemu_mutex_unlock(&timer_list->active_timers_lock);
> >              break;
> >          }
> > -
> > +	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > +	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > +             ts->expire_time, current_time);
> > +	}
> >          /* remove timer from the list before calling the callback */
> >          timer_list->active_timers = ts->next;
> >          ts->next = NULL;
> > diff --git a/vl.c b/vl.c
> > index 06f34fe..824b246 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -978,7 +978,7 @@ void vm_start(void)
> >  {
> >      RunState requested;
> >      int64_t start_time, end_time;
> > -
> > +    QEMU_LOG("gonglei: enter vm_start\n");
> >      qemu_vmstop_requested(&requested);
> >      if (runstate_is_running() && requested == RUN_STATE_MAX) {
> >          return;
> >
> >
> > Can we wait looply the migration process finished when EAGAIN?
> >
> > Regards,
> > -Gonglei
> >
> >
> > > -----Original Message-----
> > > From: Gonglei (Arei)
> > > Sent: Friday, August 05, 2016 5:14 PM
> > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > >
> > > Hi Paolo , Jan, Amit
> > >
> > > Recently we encountered a problem that the virtio-serial can't work after
> > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > >
> > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > because
> > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > interrupt
> > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > frontend
> > > driver never handle it or never know it.
> > >
> > >  Bug 867366 - virtio-serial misses irq delivery on migration
> > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > >
> > > But my qemu is the newest qemu, Both commit
> 80dcfb8532"virtio-serial-bus:
> > > post_load
> > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > propagate
> > > guest_connected to the port on post_load" are applied.
> > >
> > > I noticed that Paolo posted another problem maybe have a pertential
> problem
> > > about
> > > apic in Comment 23. But this patch
> > >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > haven't merged into qemu master.
> > >
> > > Would you give me some clues please? Thanks!
> > >
> > > Regards,
> > > -Gonglei
> > >
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-08 13:05     ` Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-08 13:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, jan.kiszka, amit.shah, quintela, Huangweidong (C),
	qemu-devel, kvm

Hi Dave,


> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Monday, August 08, 2016 8:13 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> migration?
> 
> * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > Hi all,
> >
> > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> vcpus are started
> > before we invoke send_control_event(),which queue a message in the virtio
> ring and trigger a irq.
> >
> > In actually, we can't attach our initial intention under a special situation:
> >
> > 1. the process of migration destination is in a coroutine
> >  Commit 82a4da79fd6
> > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> >  if the Qemu encounter a EAGIN while reading QEMUFile.
> >  commit 595ab64169b
> >
> > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> >                                  size_t size)
> > {
> >     QEMUFileSocket *s = opaque;
> >     ssize_t len;
> >
> >     for (;;) {
> >         len = qemu_recv(s->fd, buf, size, 0);
> >         if (len != -1) {
> >             break;
> >         }
> >         if (socket_error() == EAGAIN) {
> >             yield_until_fd_readable(s->fd);
> >         } else if (socket_error() != EINTR) {
> >             break;
> >         }
> >     }
> >
> >     if (len == -1) {
> >         len = -socket_error();
> >     }
> >     return len;
> > }
> >
> > 3. The main thread can get the cpu and timer will run, asumes that we get
> EAGIN
> >   after invoking fetch_active_ports_list().
> 
> I don't understand the details of this interrupt injection, or why the timer
> will run if we're still not finished migration; it doesn't sound right that a
> QEMU_CLOCK_VIRTUAL
> timer should trigger while we're still receiving the VM and the guest is paused.
> 

Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
You can see the above step 2)

For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()

static ssize_t channel_get_buffer(void *opaque,
                                  uint8_t *buf,
                                  int64_t pos,
                                  size_t size)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);
    ssize_t ret;

    do {
        ret = qio_channel_read(ioc, (char *)buf, size, NULL);
        if (ret < 0) {
            if (ret == QIO_CHANNEL_ERR_BLOCK) {
                qio_channel_yield(ioc, G_IO_IN);   // yiled cpu
            } else {
                /* XXX handle Error * object */
                return -EIO;
            }
        }
    } while (ret == QIO_CHANNEL_ERR_BLOCK);

    return ret;
}

> However, would it be fixed by using a vm_change_state_handler like
> ui/spice-core.c does?
> 

I'll check this method, thanks!

Regards,
-Gonglei

> Dave
> 
> 
> > 4. reproduce the problem by fault injection.
> >
> > The debug logs:
> >
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "xbzrle"}]}
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "rdma-pin-all"}]}
> > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> process_incoming_migration_co()    // Enter corountine
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> init virtqueue
> > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> begin to inject debug **********    // fault injection begin, migration
> coroutine yiled cpu
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> current_time: 9750771061592    //vm_clock timer boom
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> virtio_serial_post_load_timer_cb   //calling the timer callback
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 0 // send a message to the guest, raise the
> irq line.
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 64056}, "event":
> "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 0
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750770788478, current_time: 9750771061592
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> //Enter migration coroutine again, and the restore the lapic register, the
> previous lapic's info will be covered :(
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> //enter vm_start here
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> that means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> means DRIVER OK
> > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> old: 1
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750772061592, current_time: 9750794269805
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750795284148, current_time: 9750795312685
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750796329656, current_time: 9750796369458
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750797396208, current_time: 9750797426752
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750798445707, current_time: 9750798477520
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750799496162, current_time: 9750799532183
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750800556817, current_time: 9750800588741
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750801606596, current_time: 9750801637961
> >
> >
> > The below is my debugging diff (Not based on the newest master branch)
> >
> > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > index 6a45af9..a00b02f 100644
> > --- a/hw/char/virtio-serial-bus.c
> > +++ b/hw/char/virtio-serial-bus.c
> > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> >      if (!s->post_load) {
> >          return;
> >      }
> > +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> >      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> >          port = s->post_load->connected[i].port;
> >          host_connected = s->post_load->connected[i].host_connected;
> > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> >               */
> >              send_control_event(s, port->id,
> VIRTIO_CONSOLE_PORT_OPEN,
> >                                 port->host_connected);
> > +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> port->host_connected);
> >          }
> >          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> >          if (vsc->set_guest_connected) {
> > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> version_id,
> >          }
> >      }
> >      timer_mod(s->post_load->timer, 1);
> > +    QEMU_LOG("gonglei: ************** begin to inject debug
> **********\n");
> > +    yield_until_fd_readable(qemu_get_fd(f));
> >      return 0;
> >  }
> >
> > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> *opaque, int version_id)
> >      if (version_id > 3) {
> >          return -EINVAL;
> >      }
> > -
> > +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> >      /* The virtio device */
> >      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> >  }
> > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > index 5b47056..2d988cb 100644
> > --- a/hw/i386/kvm/apic.c
> > +++ b/hw/i386/kvm/apic.c
> > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> kvm_lapic_state *kapic)
> >  {
> >      APICCommonState *s = APIC_COMMON(dev);
> >      int i;
> > -
> > +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
> >      memset(kapic, 0, sizeof(*kapic));
> >      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> >      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > index d2a6c4c..07699ec 100644
> > --- a/hw/i386/kvm/ioapic.c
> > +++ b/hw/i386/kvm/ioapic.c
> > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> >
> >      chip.chip_id = KVM_IRQCHIP_IOAPIC;
> >      kioapic = &chip.chip.ioapic;
> > -
> > +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
> >      kioapic->id = s->id;
> >      kioapic->ioregsel = s->ioregsel;
> >      kioapic->base_address = s->busdev.mmio[0].addr;
> > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > index 042e960..110be20 100644
> > --- a/hw/intc/apic_common.c
> > +++ b/hw/intc/apic_common.c
> > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> version_id)
> >      if (info->post_load) {
> >          info->post_load(s);
> >      }
> > +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> >      return 0;
> >  }
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 92c1427..d074c0a 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> *opaque)
> >      const char *arpGuestCMD =
> "{\"execute\":\"guest-write-flag-arp\"}\n";
> >      const char *portName = "charchannel1";
> >      size_t uRet;
> > -
> > +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> >      ret = qemu_loadvm_state(f);
> >      qemu_fclose(f);
> >      free_xbzrle_decoded_buf();
> > diff --git a/qemu-timer.c b/qemu-timer.c
> > index 5741f0d..dd36bc9 100644
> > --- a/qemu-timer.c
> > +++ b/qemu-timer.c
> > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> enabled)
> >      QEMUTimerList *tl;
> >      bool old = clock->enabled;
> >      clock->enabled = enabled;
> > +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> old);
> >      if (enabled && !old) {
> >          qemu_clock_notify(type);
> >      } else if (!enabled && old) {
> > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> >      bool progress = false;
> >      QEMUTimerCB *cb;
> >      void *opaque;
> > +    static int count = 10;
> >
> >      qemu_event_reset(&timer_list->timers_done_ev);
> >      if (!timer_list->clock->enabled) {
> > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> >              qemu_mutex_unlock(&timer_list->active_timers_lock);
> >              break;
> >          }
> > -
> > +	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > +	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > +             ts->expire_time, current_time);
> > +	}
> >          /* remove timer from the list before calling the callback */
> >          timer_list->active_timers = ts->next;
> >          ts->next = NULL;
> > diff --git a/vl.c b/vl.c
> > index 06f34fe..824b246 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -978,7 +978,7 @@ void vm_start(void)
> >  {
> >      RunState requested;
> >      int64_t start_time, end_time;
> > -
> > +    QEMU_LOG("gonglei: enter vm_start\n");
> >      qemu_vmstop_requested(&requested);
> >      if (runstate_is_running() && requested == RUN_STATE_MAX) {
> >          return;
> >
> >
> > Can we wait looply the migration process finished when EAGAIN?
> >
> > Regards,
> > -Gonglei
> >
> >
> > > -----Original Message-----
> > > From: Gonglei (Arei)
> > > Sent: Friday, August 05, 2016 5:14 PM
> > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > >
> > > Hi Paolo , Jan, Amit
> > >
> > > Recently we encountered a problem that the virtio-serial can't work after
> > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > >
> > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > because
> > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > interrupt
> > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > frontend
> > > driver never handle it or never know it.
> > >
> > >  Bug 867366 - virtio-serial misses irq delivery on migration
> > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > >
> > > But my qemu is the newest qemu, Both commit
> 80dcfb8532"virtio-serial-bus:
> > > post_load
> > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > propagate
> > > guest_connected to the port on post_load" are applied.
> > >
> > > I noticed that Paolo posted another problem maybe have a pertential
> problem
> > > about
> > > apic in Comment 23. But this patch
> > >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > haven't merged into qemu master.
> > >
> > > Would you give me some clues please? Thanks!
> > >
> > > Regards,
> > > -Gonglei
> > >
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
  2016-08-08 13:05     ` Gonglei (Arei)
  (?)
@ 2016-08-08 14:18     ` Dr. David Alan Gilbert
  2016-08-08 14:30         ` [Qemu-devel] " Gonglei (Arei)
  -1 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2016-08-08 14:18 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: Paolo Bonzini, jan.kiszka, amit.shah, quintela, Huangweidong (C),
	qemu-devel, kvm

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi Dave,
> 
> 
> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> > Sent: Monday, August 08, 2016 8:13 PM
> > To: Gonglei (Arei)
> > Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> > quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> > kvm@vger.kernel.org
> > Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> > migration?
> > 
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi all,
> > >
> > > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> > vcpus are started
> > > before we invoke send_control_event(),which queue a message in the virtio
> > ring and trigger a irq.
> > >
> > > In actually, we can't attach our initial intention under a special situation:
> > >
> > > 1. the process of migration destination is in a coroutine
> > >  Commit 82a4da79fd6
> > > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> > >  if the Qemu encounter a EAGIN while reading QEMUFile.
> > >  commit 595ab64169b
> > >
> > > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> > >                                  size_t size)
> > > {
> > >     QEMUFileSocket *s = opaque;
> > >     ssize_t len;
> > >
> > >     for (;;) {
> > >         len = qemu_recv(s->fd, buf, size, 0);
> > >         if (len != -1) {
> > >             break;
> > >         }
> > >         if (socket_error() == EAGAIN) {
> > >             yield_until_fd_readable(s->fd);
> > >         } else if (socket_error() != EINTR) {
> > >             break;
> > >         }
> > >     }
> > >
> > >     if (len == -1) {
> > >         len = -socket_error();
> > >     }
> > >     return len;
> > > }
> > >
> > > 3. The main thread can get the cpu and timer will run, asumes that we get
> > EAGIN
> > >   after invoking fetch_active_ports_list().
> > 
> > I don't understand the details of this interrupt injection, or why the timer
> > will run if we're still not finished migration; it doesn't sound right that a
> > QEMU_CLOCK_VIRTUAL
> > timer should trigger while we're still receiving the VM and the guest is paused.
> > 
> 
> Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
> You can see the above step 2)
> 
> For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()
> 
> static ssize_t channel_get_buffer(void *opaque,
>                                   uint8_t *buf,
>                                   int64_t pos,
>                                   size_t size)
> {
>     QIOChannel *ioc = QIO_CHANNEL(opaque);
>     ssize_t ret;
> 
>     do {
>         ret = qio_channel_read(ioc, (char *)buf, size, NULL);
>         if (ret < 0) {
>             if (ret == QIO_CHANNEL_ERR_BLOCK) {
>                 qio_channel_yield(ioc, G_IO_IN);   // yiled cpu
>             } else {
>                 /* XXX handle Error * object */
>                 return -EIO;
>             }
>         }
>     } while (ret == QIO_CHANNEL_ERR_BLOCK);
> 
>     return ret;
> }

Yes, I understand it yields; but I would have expected the timer code
not to fire virtual timers while the VM is in paused state.

> > However, would it be fixed by using a vm_change_state_handler like
> > ui/spice-core.c does?
> > 
> 
> I'll check this method, thanks!
> 

Dave

> Regards,
> -Gonglei
> 
> > Dave
> > 
> > 
> > > 4. reproduce the problem by fault injection.
> > >
> > > The debug logs:
> > >
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "xbzrle"}]}
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "rdma-pin-all"}]}
> > > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> > process_incoming_migration_co()    // Enter corountine
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> > init virtqueue
> > > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> > begin to inject debug **********    // fault injection begin, migration
> > coroutine yiled cpu
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> > current_time: 9750771061592    //vm_clock timer boom
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> > virtio_serial_post_load_timer_cb   //calling the timer callback
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 0 // send a message to the guest, raise the
> > irq line.
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 64056}, "event":
> > "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 0
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750770788478, current_time: 9750771061592
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > //Enter migration coroutine again, and the restore the lapic register, the
> > previous lapic's info will be covered :(
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> > //enter vm_start here
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> > that means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> > old: 1
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750772061592, current_time: 9750794269805
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750795284148, current_time: 9750795312685
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750796329656, current_time: 9750796369458
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750797396208, current_time: 9750797426752
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750798445707, current_time: 9750798477520
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750799496162, current_time: 9750799532183
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750800556817, current_time: 9750800588741
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750801606596, current_time: 9750801637961
> > >
> > >
> > > The below is my debugging diff (Not based on the newest master branch)
> > >
> > > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > > index 6a45af9..a00b02f 100644
> > > --- a/hw/char/virtio-serial-bus.c
> > > +++ b/hw/char/virtio-serial-bus.c
> > > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > >      if (!s->post_load) {
> > >          return;
> > >      }
> > > +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> > >      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> > >          port = s->post_load->connected[i].port;
> > >          host_connected = s->post_load->connected[i].host_connected;
> > > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > >               */
> > >              send_control_event(s, port->id,
> > VIRTIO_CONSOLE_PORT_OPEN,
> > >                                 port->host_connected);
> > > +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> > port->host_connected);
> > >          }
> > >          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> > >          if (vsc->set_guest_connected) {
> > > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> > version_id,
> > >          }
> > >      }
> > >      timer_mod(s->post_load->timer, 1);
> > > +    QEMU_LOG("gonglei: ************** begin to inject debug
> > **********\n");
> > > +    yield_until_fd_readable(qemu_get_fd(f));
> > >      return 0;
> > >  }
> > >
> > > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> > *opaque, int version_id)
> > >      if (version_id > 3) {
> > >          return -EINVAL;
> > >      }
> > > -
> > > +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> > >      /* The virtio device */
> > >      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> > >  }
> > > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > > index 5b47056..2d988cb 100644
> > > --- a/hw/i386/kvm/apic.c
> > > +++ b/hw/i386/kvm/apic.c
> > > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> > kvm_lapic_state *kapic)
> > >  {
> > >      APICCommonState *s = APIC_COMMON(dev);
> > >      int i;
> > > -
> > > +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
> > >      memset(kapic, 0, sizeof(*kapic));
> > >      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> > >      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > > index d2a6c4c..07699ec 100644
> > > --- a/hw/i386/kvm/ioapic.c
> > > +++ b/hw/i386/kvm/ioapic.c
> > > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> > >
> > >      chip.chip_id = KVM_IRQCHIP_IOAPIC;
> > >      kioapic = &chip.chip.ioapic;
> > > -
> > > +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
> > >      kioapic->id = s->id;
> > >      kioapic->ioregsel = s->ioregsel;
> > >      kioapic->base_address = s->busdev.mmio[0].addr;
> > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > > index 042e960..110be20 100644
> > > --- a/hw/intc/apic_common.c
> > > +++ b/hw/intc/apic_common.c
> > > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> > version_id)
> > >      if (info->post_load) {
> > >          info->post_load(s);
> > >      }
> > > +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> > >      return 0;
> > >  }
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 92c1427..d074c0a 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> > *opaque)
> > >      const char *arpGuestCMD =
> > "{\"execute\":\"guest-write-flag-arp\"}\n";
> > >      const char *portName = "charchannel1";
> > >      size_t uRet;
> > > -
> > > +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> > >      ret = qemu_loadvm_state(f);
> > >      qemu_fclose(f);
> > >      free_xbzrle_decoded_buf();
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 5741f0d..dd36bc9 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> > enabled)
> > >      QEMUTimerList *tl;
> > >      bool old = clock->enabled;
> > >      clock->enabled = enabled;
> > > +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> > old);
> > >      if (enabled && !old) {
> > >          qemu_clock_notify(type);
> > >      } else if (!enabled && old) {
> > > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > >      bool progress = false;
> > >      QEMUTimerCB *cb;
> > >      void *opaque;
> > > +    static int count = 10;
> > >
> > >      qemu_event_reset(&timer_list->timers_done_ev);
> > >      if (!timer_list->clock->enabled) {
> > > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > >              qemu_mutex_unlock(&timer_list->active_timers_lock);
> > >              break;
> > >          }
> > > -
> > > +	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > > +	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > > +             ts->expire_time, current_time);
> > > +	}
> > >          /* remove timer from the list before calling the callback */
> > >          timer_list->active_timers = ts->next;
> > >          ts->next = NULL;
> > > diff --git a/vl.c b/vl.c
> > > index 06f34fe..824b246 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -978,7 +978,7 @@ void vm_start(void)
> > >  {
> > >      RunState requested;
> > >      int64_t start_time, end_time;
> > > -
> > > +    QEMU_LOG("gonglei: enter vm_start\n");
> > >      qemu_vmstop_requested(&requested);
> > >      if (runstate_is_running() && requested == RUN_STATE_MAX) {
> > >          return;
> > >
> > >
> > > Can we wait looply the migration process finished when EAGAIN?
> > >
> > > Regards,
> > > -Gonglei
> > >
> > >
> > > > -----Original Message-----
> > > > From: Gonglei (Arei)
> > > > Sent: Friday, August 05, 2016 5:14 PM
> > > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > > >
> > > > Hi Paolo , Jan, Amit
> > > >
> > > > Recently we encountered a problem that the virtio-serial can't work after
> > > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > > >
> > > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > > because
> > > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > > interrupt
> > > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > > frontend
> > > > driver never handle it or never know it.
> > > >
> > > >  Bug 867366 - virtio-serial misses irq delivery on migration
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > > >
> > > > But my qemu is the newest qemu, Both commit
> > 80dcfb8532"virtio-serial-bus:
> > > > post_load
> > > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > > propagate
> > > > guest_connected to the port on post_load" are applied.
> > > >
> > > > I noticed that Paolo posted another problem maybe have a pertential
> > problem
> > > > about
> > > > apic in Comment 23. But this patch
> > > >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > > haven't merged into qemu master.
> > > >
> > > > Would you give me some clues please? Thanks!
> > > >
> > > > Regards,
> > > > -Gonglei
> > > >
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] virtio-serial misses irq delivery on migration?
  2016-08-08 14:18     ` Dr. David Alan Gilbert
@ 2016-08-08 14:30         ` Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-08 14:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Huangweidong (C),
	kvm, Juan Quintela, Jan Kiszka, qemu-devel, Amit Shah,
	Paolo Bonzini

That's because the virtual vm clock is enabled at initialization. It doesn't need to wait for invoking vm_start() to fire.
发件人:Dr. David Alan Gilbert
收件人:龚磊,
抄送:Paolo Bonzini,Jan Kiszka,Amit Shah,Juan Quintela,黄伟栋,qemu-devel@nongnu.org,kvm@vger.kernel.org,
时间:2016-08-08 22:18:15
主题:Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi Dave,
>
>
> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> > Sent: Monday, August 08, 2016 8:13 PM
> > To: Gonglei (Arei)
> > Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> > quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> > kvm@vger.kernel.org
> > Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> > migration?
> >
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi all,
> > >
> > > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> > vcpus are started
> > > before we invoke send_control_event(),which queue a message in the virtio
> > ring and trigger a irq.
> > >
> > > In actually, we can't attach our initial intention under a special situation:
> > >
> > > 1. the process of migration destination is in a coroutine
> > >  Commit 82a4da79fd6
> > > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> > >  if the Qemu encounter a EAGIN while reading QEMUFile.
> > >  commit 595ab64169b
> > >
> > > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> > >                                  size_t size)
> > > {
> > >     QEMUFileSocket *s = opaque;
> > >     ssize_t len;
> > >
> > >     for (;;) {
> > >         len = qemu_recv(s->fd, buf, size, 0);
> > >         if (len != -1) {
> > >             break;
> > >         }
> > >         if (socket_error() == EAGAIN) {
> > >             yield_until_fd_readable(s->fd);
> > >         } else if (socket_error() != EINTR) {
> > >             break;
> > >         }
> > >     }
> > >
> > >     if (len == -1) {
> > >         len = -socket_error();
> > >     }
> > >     return len;
> > > }
> > >
> > > 3. The main thread can get the cpu and timer will run, asumes that we get
> > EAGIN
> > >   after invoking fetch_active_ports_list().
> >
> > I don't understand the details of this interrupt injection, or why the timer
> > will run if we're still not finished migration; it doesn't sound right that a
> > QEMU_CLOCK_VIRTUAL
> > timer should trigger while we're still receiving the VM and the guest is paused.
> >
>
> Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
> You can see the above step 2)
>
> For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()
>
> static ssize_t channel_get_buffer(void *opaque,
>                                   uint8_t *buf,
>                                   int64_t pos,
>                                   size_t size)
> {
>     QIOChannel *ioc = QIO_CHANNEL(opaque);
>     ssize_t ret;
>
>     do {
>         ret = qio_channel_read(ioc, (char *)buf, size, NULL);
>         if (ret < 0) {
>             if (ret == QIO_CHANNEL_ERR_BLOCK) {
>                 qio_channel_yield(ioc, G_IO_IN);   // yiled cpu
>             } else {
>                 /* XXX handle Error * object */
>                 return -EIO;
>             }
>         }
>     } while (ret == QIO_CHANNEL_ERR_BLOCK);
>
>     return ret;
> }

Yes, I understand it yields; but I would have expected the timer code
not to fire virtual timers while the VM is in paused state.

> > However, would it be fixed by using a vm_change_state_handler like
> > ui/spice-core.c does?
> >
>
> I'll check this method, thanks!
>

Dave

> Regards,
> -Gonglei
>
> > Dave
> >
> >
> > > 4. reproduce the problem by fault injection.
> > >
> > > The debug logs:
> > >
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "xbzrle"}]}
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "rdma-pin-all"}]}
> > > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> > process_incoming_migration_co()    // Enter corountine
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> > init virtqueue
> > > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> > begin to inject debug **********    // fault injection begin, migration
> > coroutine yiled cpu
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> > current_time: 9750771061592    //vm_clock timer boom
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> > virtio_serial_post_load_timer_cb   //calling the timer callback
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 0 // send a message to the guest, raise the
> > irq line.
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 64056}, "event":
> > "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 0
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750770788478, current_time: 9750771061592
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > //Enter migration coroutine again, and the restore the lapic register, the
> > previous lapic's info will be covered :(
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> > //enter vm_start here
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> > that means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> > old: 1
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750772061592, current_time: 9750794269805
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750795284148, current_time: 9750795312685
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750796329656, current_time: 9750796369458
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750797396208, current_time: 9750797426752
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750798445707, current_time: 9750798477520
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750799496162, current_time: 9750799532183
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750800556817, current_time: 9750800588741
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750801606596, current_time: 9750801637961
> > >
> > >
> > > The below is my debugging diff (Not based on the newest master branch)
> > >
> > > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > > index 6a45af9..a00b02f 100644
> > > --- a/hw/char/virtio-serial-bus.c
> > > +++ b/hw/char/virtio-serial-bus.c
> > > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > >      if (!s->post_load) {
> > >          return;
> > >      }
> > > +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> > >      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> > >          port = s->post_load->connected[i].port;
> > >          host_connected = s->post_load->connected[i].host_connected;
> > > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > >               */
> > >              send_control_event(s, port->id,
> > VIRTIO_CONSOLE_PORT_OPEN,
> > >                                 port->host_connected);
> > > +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> > port->host_connected);
> > >          }
> > >          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> > >          if (vsc->set_guest_connected) {
> > > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> > version_id,
> > >          }
> > >      }
> > >      timer_mod(s->post_load->timer, 1);
> > > +    QEMU_LOG("gonglei: ************** begin to inject debug
> > **********\n");
> > > +    yield_until_fd_readable(qemu_get_fd(f));
> > >      return 0;
> > >  }
> > >
> > > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> > *opaque, int version_id)
> > >      if (version_id > 3) {
> > >          return -EINVAL;
> > >      }
> > > -
> > > +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> > >      /* The virtio device */
> > >      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> > >  }
> > > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > > index 5b47056..2d988cb 100644
> > > --- a/hw/i386/kvm/apic.c
> > > +++ b/hw/i386/kvm/apic.c
> > > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> > kvm_lapic_state *kapic)
> > >  {
> > >      APICCommonState *s = APIC_COMMON(dev);
> > >      int i;
> > > -
> > > +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
> > >      memset(kapic, 0, sizeof(*kapic));
> > >      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> > >      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > > index d2a6c4c..07699ec 100644
> > > --- a/hw/i386/kvm/ioapic.c
> > > +++ b/hw/i386/kvm/ioapic.c
> > > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> > >
> > >      chip.chip_id = KVM_IRQCHIP_IOAPIC;
> > >      kioapic = &chip.chip.ioapic;
> > > -
> > > +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
> > >      kioapic->id = s->id;
> > >      kioapic->ioregsel = s->ioregsel;
> > >      kioapic->base_address = s->busdev.mmio[0].addr;
> > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > > index 042e960..110be20 100644
> > > --- a/hw/intc/apic_common.c
> > > +++ b/hw/intc/apic_common.c
> > > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> > version_id)
> > >      if (info->post_load) {
> > >          info->post_load(s);
> > >      }
> > > +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> > >      return 0;
> > >  }
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 92c1427..d074c0a 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> > *opaque)
> > >      const char *arpGuestCMD =
> > "{\"execute\":\"guest-write-flag-arp\"}\n";
> > >      const char *portName = "charchannel1";
> > >      size_t uRet;
> > > -
> > > +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> > >      ret = qemu_loadvm_state(f);
> > >      qemu_fclose(f);
> > >      free_xbzrle_decoded_buf();
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 5741f0d..dd36bc9 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> > enabled)
> > >      QEMUTimerList *tl;
> > >      bool old = clock->enabled;
> > >      clock->enabled = enabled;
> > > +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> > old);
> > >      if (enabled && !old) {
> > >          qemu_clock_notify(type);
> > >      } else if (!enabled && old) {
> > > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > >      bool progress = false;
> > >      QEMUTimerCB *cb;
> > >      void *opaque;
> > > +    static int count = 10;
> > >
> > >      qemu_event_reset(&timer_list->timers_done_ev);
> > >      if (!timer_list->clock->enabled) {
> > > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > >              qemu_mutex_unlock(&timer_list->active_timers_lock);
> > >              break;
> > >          }
> > > -
> > > + if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > > +     QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > > +             ts->expire_time, current_time);
> > > + }
> > >          /* remove timer from the list before calling the callback */
> > >          timer_list->active_timers = ts->next;
> > >          ts->next = NULL;
> > > diff --git a/vl.c b/vl.c
> > > index 06f34fe..824b246 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -978,7 +978,7 @@ void vm_start(void)
> > >  {
> > >      RunState requested;
> > >      int64_t start_time, end_time;
> > > -
> > > +    QEMU_LOG("gonglei: enter vm_start\n");
> > >      qemu_vmstop_requested(&requested);
> > >      if (runstate_is_running() && requested == RUN_STATE_MAX) {
> > >          return;
> > >
> > >
> > > Can we wait looply the migration process finished when EAGAIN?
> > >
> > > Regards,
> > > -Gonglei
> > >
> > >
> > > > -----Original Message-----
> > > > From: Gonglei (Arei)
> > > > Sent: Friday, August 05, 2016 5:14 PM
> > > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > > >
> > > > Hi Paolo , Jan, Amit
> > > >
> > > > Recently we encountered a problem that the virtio-serial can't work after
> > > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > > >
> > > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > > because
> > > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > > interrupt
> > > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > > frontend
> > > > driver never handle it or never know it.
> > > >
> > > >  Bug 867366 - virtio-serial misses irq delivery on migration
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > > >
> > > > But my qemu is the newest qemu, Both commit
> > 80dcfb8532"virtio-serial-bus:
> > > > post_load
> > > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > > propagate
> > > > guest_connected to the port on post_load" are applied.
> > > >
> > > > I noticed that Paolo posted another problem maybe have a pertential
> > problem
> > > > about
> > > > apic in Comment 23. But this patch
> > > >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > > haven't merged into qemu master.
> > > >
> > > > Would you give me some clues please? Thanks!
> > > >
> > > > Regards,
> > > > -Gonglei
> > > >
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-08 14:30         ` Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-08 14:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, Jan Kiszka, Amit Shah, Juan Quintela,
	Huangweidong (C),
	qemu-devel, kvm

That's because the virtual vm clock is enabled at initialization. It doesn't need to wait for invoking vm_start() to fire.
发件人:Dr. David Alan Gilbert
收件人:龚磊,
抄送:Paolo Bonzini,Jan Kiszka,Amit Shah,Juan Quintela,黄伟栋,qemu-devel@nongnu.org,kvm@vger.kernel.org,
时间:2016-08-08 22:18:15
主题:Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi Dave,
>
>
> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> > Sent: Monday, August 08, 2016 8:13 PM
> > To: Gonglei (Arei)
> > Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> > quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> > kvm@vger.kernel.org
> > Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> > migration?
> >
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi all,
> > >
> > > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> > vcpus are started
> > > before we invoke send_control_event(),which queue a message in the virtio
> > ring and trigger a irq.
> > >
> > > In actually, we can't attach our initial intention under a special situation:
> > >
> > > 1. the process of migration destination is in a coroutine
> > >  Commit 82a4da79fd6
> > > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> > >  if the Qemu encounter a EAGIN while reading QEMUFile.
> > >  commit 595ab64169b
> > >
> > > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> > >                                  size_t size)
> > > {
> > >     QEMUFileSocket *s = opaque;
> > >     ssize_t len;
> > >
> > >     for (;;) {
> > >         len = qemu_recv(s->fd, buf, size, 0);
> > >         if (len != -1) {
> > >             break;
> > >         }
> > >         if (socket_error() == EAGAIN) {
> > >             yield_until_fd_readable(s->fd);
> > >         } else if (socket_error() != EINTR) {
> > >             break;
> > >         }
> > >     }
> > >
> > >     if (len == -1) {
> > >         len = -socket_error();
> > >     }
> > >     return len;
> > > }
> > >
> > > 3. The main thread can get the cpu and timer will run, asumes that we get
> > EAGIN
> > >   after invoking fetch_active_ports_list().
> >
> > I don't understand the details of this interrupt injection, or why the timer
> > will run if we're still not finished migration; it doesn't sound right that a
> > QEMU_CLOCK_VIRTUAL
> > timer should trigger while we're still receiving the VM and the guest is paused.
> >
>
> Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
> You can see the above step 2)
>
> For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()
>
> static ssize_t channel_get_buffer(void *opaque,
>                                   uint8_t *buf,
>                                   int64_t pos,
>                                   size_t size)
> {
>     QIOChannel *ioc = QIO_CHANNEL(opaque);
>     ssize_t ret;
>
>     do {
>         ret = qio_channel_read(ioc, (char *)buf, size, NULL);
>         if (ret < 0) {
>             if (ret == QIO_CHANNEL_ERR_BLOCK) {
>                 qio_channel_yield(ioc, G_IO_IN);   // yiled cpu
>             } else {
>                 /* XXX handle Error * object */
>                 return -EIO;
>             }
>         }
>     } while (ret == QIO_CHANNEL_ERR_BLOCK);
>
>     return ret;
> }

Yes, I understand it yields; but I would have expected the timer code
not to fire virtual timers while the VM is in paused state.

> > However, would it be fixed by using a vm_change_state_handler like
> > ui/spice-core.c does?
> >
>
> I'll check this method, thanks!
>

Dave

> Regards,
> -Gonglei
>
> > Dave
> >
> >
> > > 4. reproduce the problem by fault injection.
> > >
> > > The debug logs:
> > >
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "xbzrle"}]}
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "rdma-pin-all"}]}
> > > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> > process_incoming_migration_co()    // Enter corountine
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> > init virtqueue
> > > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> > begin to inject debug **********    // fault injection begin, migration
> > coroutine yiled cpu
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> > current_time: 9750771061592    //vm_clock timer boom
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> > virtio_serial_post_load_timer_cb   //calling the timer callback
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 0 // send a message to the guest, raise the
> > irq line.
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 64056}, "event":
> > "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 0
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750770788478, current_time: 9750771061592
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > //Enter migration coroutine again, and the restore the lapic register, the
> > previous lapic's info will be covered :(
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> > //enter vm_start here
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> > that means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> > old: 1
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750772061592, current_time: 9750794269805
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750795284148, current_time: 9750795312685
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750796329656, current_time: 9750796369458
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750797396208, current_time: 9750797426752
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750798445707, current_time: 9750798477520
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750799496162, current_time: 9750799532183
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750800556817, current_time: 9750800588741
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750801606596, current_time: 9750801637961
> > >
> > >
> > > The below is my debugging diff (Not based on the newest master branch)
> > >
> > > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > > index 6a45af9..a00b02f 100644
> > > --- a/hw/char/virtio-serial-bus.c
> > > +++ b/hw/char/virtio-serial-bus.c
> > > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > >      if (!s->post_load) {
> > >          return;
> > >      }
> > > +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> > >      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> > >          port = s->post_load->connected[i].port;
> > >          host_connected = s->post_load->connected[i].host_connected;
> > > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > >               */
> > >              send_control_event(s, port->id,
> > VIRTIO_CONSOLE_PORT_OPEN,
> > >                                 port->host_connected);
> > > +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> > port->host_connected);
> > >          }
> > >          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> > >          if (vsc->set_guest_connected) {
> > > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> > version_id,
> > >          }
> > >      }
> > >      timer_mod(s->post_load->timer, 1);
> > > +    QEMU_LOG("gonglei: ************** begin to inject debug
> > **********\n");
> > > +    yield_until_fd_readable(qemu_get_fd(f));
> > >      return 0;
> > >  }
> > >
> > > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> > *opaque, int version_id)
> > >      if (version_id > 3) {
> > >          return -EINVAL;
> > >      }
> > > -
> > > +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> > >      /* The virtio device */
> > >      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> > >  }
> > > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > > index 5b47056..2d988cb 100644
> > > --- a/hw/i386/kvm/apic.c
> > > +++ b/hw/i386/kvm/apic.c
> > > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> > kvm_lapic_state *kapic)
> > >  {
> > >      APICCommonState *s = APIC_COMMON(dev);
> > >      int i;
> > > -
> > > +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
> > >      memset(kapic, 0, sizeof(*kapic));
> > >      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> > >      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > > index d2a6c4c..07699ec 100644
> > > --- a/hw/i386/kvm/ioapic.c
> > > +++ b/hw/i386/kvm/ioapic.c
> > > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> > >
> > >      chip.chip_id = KVM_IRQCHIP_IOAPIC;
> > >      kioapic = &chip.chip.ioapic;
> > > -
> > > +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
> > >      kioapic->id = s->id;
> > >      kioapic->ioregsel = s->ioregsel;
> > >      kioapic->base_address = s->busdev.mmio[0].addr;
> > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > > index 042e960..110be20 100644
> > > --- a/hw/intc/apic_common.c
> > > +++ b/hw/intc/apic_common.c
> > > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> > version_id)
> > >      if (info->post_load) {
> > >          info->post_load(s);
> > >      }
> > > +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> > >      return 0;
> > >  }
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 92c1427..d074c0a 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> > *opaque)
> > >      const char *arpGuestCMD =
> > "{\"execute\":\"guest-write-flag-arp\"}\n";
> > >      const char *portName = "charchannel1";
> > >      size_t uRet;
> > > -
> > > +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> > >      ret = qemu_loadvm_state(f);
> > >      qemu_fclose(f);
> > >      free_xbzrle_decoded_buf();
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 5741f0d..dd36bc9 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> > enabled)
> > >      QEMUTimerList *tl;
> > >      bool old = clock->enabled;
> > >      clock->enabled = enabled;
> > > +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> > old);
> > >      if (enabled && !old) {
> > >          qemu_clock_notify(type);
> > >      } else if (!enabled && old) {
> > > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > >      bool progress = false;
> > >      QEMUTimerCB *cb;
> > >      void *opaque;
> > > +    static int count = 10;
> > >
> > >      qemu_event_reset(&timer_list->timers_done_ev);
> > >      if (!timer_list->clock->enabled) {
> > > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > >              qemu_mutex_unlock(&timer_list->active_timers_lock);
> > >              break;
> > >          }
> > > -
> > > + if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > > +     QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > > +             ts->expire_time, current_time);
> > > + }
> > >          /* remove timer from the list before calling the callback */
> > >          timer_list->active_timers = ts->next;
> > >          ts->next = NULL;
> > > diff --git a/vl.c b/vl.c
> > > index 06f34fe..824b246 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -978,7 +978,7 @@ void vm_start(void)
> > >  {
> > >      RunState requested;
> > >      int64_t start_time, end_time;
> > > -
> > > +    QEMU_LOG("gonglei: enter vm_start\n");
> > >      qemu_vmstop_requested(&requested);
> > >      if (runstate_is_running() && requested == RUN_STATE_MAX) {
> > >          return;
> > >
> > >
> > > Can we wait looply the migration process finished when EAGAIN?
> > >
> > > Regards,
> > > -Gonglei
> > >
> > >
> > > > -----Original Message-----
> > > > From: Gonglei (Arei)
> > > > Sent: Friday, August 05, 2016 5:14 PM
> > > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > > >
> > > > Hi Paolo , Jan, Amit
> > > >
> > > > Recently we encountered a problem that the virtio-serial can't work after
> > > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > > >
> > > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > > because
> > > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > > interrupt
> > > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > > frontend
> > > > driver never handle it or never know it.
> > > >
> > > >  Bug 867366 - virtio-serial misses irq delivery on migration
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > > >
> > > > But my qemu is the newest qemu, Both commit
> > 80dcfb8532"virtio-serial-bus:
> > > > post_load
> > > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > > propagate
> > > > guest_connected to the port on post_load" are applied.
> > > >
> > > > I noticed that Paolo posted another problem maybe have a pertential
> > problem
> > > > about
> > > > apic in Comment 23. But this patch
> > > >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > > haven't merged into qemu master.
> > > >
> > > > Would you give me some clues please? Thanks!
> > > >
> > > > Regards,
> > > > -Gonglei
> > > >
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-08  9:23 Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-08  9:23 UTC (permalink / raw)
  To: Gonglei (Arei), Paolo Bonzini, jan.kiszka, amit.shah, quintela
  Cc: qemu-devel, Huangweidong (C)

Hello Paolo,

Any ideas? Thanks!


Regards,
-Gonglei


> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Saturday, August 06, 2016 6:20 PM
> To: Gonglei (Arei); 'Paolo Bonzini'; 'jan.kiszka@siemens.com';
> 'amit.shah@redhat.com'; 'quintela@redhat.com'
> Cc: 'qemu-devel@nongnu.org'; 'kvm@vger.kernel.org'; Huangweidong (C)
> Subject: RE: [Question] virtio-serial misses irq delivery on migration?
> 
> Hi all,
> 
> I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> vcpus are started
> before we invoke send_control_event(),which queue a message in the virtio
> ring and trigger a irq.
> 
> In actually, we can't attach our initial intention under a special situation:
> 
> 1. the process of migration destination is in a coroutine
>  Commit 82a4da79fd6
> 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
>  if the Qemu encounter a EAGIN while reading QEMUFile.
>  commit 595ab64169b
> 
> static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
>                                  size_t size)
> {
>     QEMUFileSocket *s = opaque;
>     ssize_t len;
> 
>     for (;;) {
>         len = qemu_recv(s->fd, buf, size, 0);
>         if (len != -1) {
>             break;
>         }
>         if (socket_error() == EAGAIN) {
>             yield_until_fd_readable(s->fd);
>         } else if (socket_error() != EINTR) {
>             break;
>         }
>     }
> 
>     if (len == -1) {
>         len = -socket_error();
>     }
>     return len;
> }
> 
> 3. The main thread can get the cpu and timer will run, asumes that we get
> EAGIN
>   after invoking fetch_active_ports_list().
> 4. reproduce the problem by fault injection.
> 
> The debug logs:
> 
> [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "xbzrle"}]}
> [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "rdma-pin-all"}]}
> [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> process_incoming_migration_co()    // Enter corountine
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init
> virtqueue
> [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> begin to inject debug **********    // fault injection begin, migration
> coroutine yiled cpu
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> current_time: 9750771061592    //vm_clock timer boom
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> virtio_serial_post_load_timer_cb   //calling the timer callback
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control
> message event = 6, value = 0 // send a message to the guest, raise the irq line.
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 64056}, "event":
> "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control
> message event = 6, value = 0
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750770788478, current_time: 9750771061592
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> //Enter migration coroutine again, and the restore the lapic register, the
> previous lapic's info will be covered :(
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> //enter vm_start here
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> means DRIVER OK
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that
> means DRIVER OK
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> means DRIVER OK
> [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old:
> 1
> [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750772061592, current_time: 9750794269805
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control
> message event = 6, value = 1
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750795284148, current_time: 9750795312685
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control
> message event = 6, value = 1
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750796329656, current_time: 9750796369458
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750797396208, current_time: 9750797426752
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750798445707, current_time: 9750798477520
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750799496162, current_time: 9750799532183
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750800556817, current_time: 9750800588741
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750801606596, current_time: 9750801637961
> 
> 
> The below is my debugging diff (Not based on the newest master branch)
> 
> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> index 6a45af9..a00b02f 100644
> --- a/hw/char/virtio-serial-bus.c
> +++ b/hw/char/virtio-serial-bus.c
> @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
>      if (!s->post_load) {
>          return;
>      }
> +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
>      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
>          port = s->post_load->connected[i].port;
>          host_connected = s->post_load->connected[i].host_connected;
> @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
>               */
>              send_control_event(s, port->id,
> VIRTIO_CONSOLE_PORT_OPEN,
>                                 port->host_connected);
> +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> port->host_connected);
>          }
>          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
>          if (vsc->set_guest_connected) {
> @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> version_id,
>          }
>      }
>      timer_mod(s->post_load->timer, 1);
> +    QEMU_LOG("gonglei: ************** begin to inject debug
> **********\n");
> +    yield_until_fd_readable(qemu_get_fd(f));
>      return 0;
>  }
> 
> @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque,
> int version_id)
>      if (version_id > 3) {
>          return -EINVAL;
>      }
> -
> +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
>      /* The virtio device */
>      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
>  }
> diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> index 5b47056..2d988cb 100644
> --- a/hw/i386/kvm/apic.c
> +++ b/hw/i386/kvm/apic.c
> @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> kvm_lapic_state *kapic)
>  {
>      APICCommonState *s = APIC_COMMON(dev);
>      int i;
> -
> +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
>      memset(kapic, 0, sizeof(*kapic));
>      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
>      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> index d2a6c4c..07699ec 100644
> --- a/hw/i386/kvm/ioapic.c
> +++ b/hw/i386/kvm/ioapic.c
> @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> 
>      chip.chip_id = KVM_IRQCHIP_IOAPIC;
>      kioapic = &chip.chip.ioapic;
> -
> +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
>      kioapic->id = s->id;
>      kioapic->ioregsel = s->ioregsel;
>      kioapic->base_address = s->busdev.mmio[0].addr;
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index 042e960..110be20 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> version_id)
>      if (info->post_load) {
>          info->post_load(s);
>      }
> +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
>      return 0;
>  }
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 92c1427..d074c0a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> *opaque)
>      const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
>      const char *portName = "charchannel1";
>      size_t uRet;
> -
> +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
>      ret = qemu_loadvm_state(f);
>      qemu_fclose(f);
>      free_xbzrle_decoded_buf();
> diff --git a/qemu-timer.c b/qemu-timer.c
> index 5741f0d..dd36bc9 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> enabled)
>      QEMUTimerList *tl;
>      bool old = clock->enabled;
>      clock->enabled = enabled;
> +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> old);
>      if (enabled && !old) {
>          qemu_clock_notify(type);
>      } else if (!enabled && old) {
> @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>      bool progress = false;
>      QEMUTimerCB *cb;
>      void *opaque;
> +    static int count = 10;
> 
>      qemu_event_reset(&timer_list->timers_done_ev);
>      if (!timer_list->clock->enabled) {
> @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
>              qemu_mutex_unlock(&timer_list->active_timers_lock);
>              break;
>          }
> -
> +	if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> +	    QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> +             ts->expire_time, current_time);
> +	}
>          /* remove timer from the list before calling the callback */
>          timer_list->active_timers = ts->next;
>          ts->next = NULL;
> diff --git a/vl.c b/vl.c
> index 06f34fe..824b246 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -978,7 +978,7 @@ void vm_start(void)
>  {
>      RunState requested;
>      int64_t start_time, end_time;
> -
> +    QEMU_LOG("gonglei: enter vm_start\n");
>      qemu_vmstop_requested(&requested);
>      if (runstate_is_running() && requested == RUN_STATE_MAX) {
>          return;
> 
> 
> Can we wait looply the migration process finished when EAGAIN?
> 
> Regards,
> -Gonglei
> 
> 
> > -----Original Message-----
> > From: Gonglei (Arei)
> > Sent: Friday, August 05, 2016 5:14 PM
> > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > Subject: [Question] virtio-serial misses irq delivery on migration?
> >
> > Hi Paolo , Jan, Amit
> >
> > Recently we encountered a problem that the virtio-serial can't work after
> > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> >
> > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > because
> > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > interrupt
> > can't be injected to VM because the irq line bit had been set to 1, but the
> > frontend
> > driver never handle it or never know it.
> >
> >  Bug 867366 - virtio-serial misses irq delivery on migration
> > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> >
> > But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> > post_load
> > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > propagate
> > guest_connected to the port on post_load" are applied.
> >
> > I noticed that Paolo posted another problem maybe have a pertential
> problem
> > about
> > apic in Comment 23. But this patch
> >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > haven't merged into qemu master.
> >
> > Would you give me some clues please? Thanks!
> >
> > Regards,
> > -Gonglei
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-05  9:14 Gonglei (Arei)
  0 siblings, 0 replies; 10+ messages in thread
From: Gonglei (Arei) @ 2016-08-05  9:14 UTC (permalink / raw)
  To: Paolo Bonzini, jan.kiszka, amit.shah; +Cc: qemu-devel, kvm, Huangweidong (C)

Hi Paolo , Jan, Amit

Recently we encountered a problem that the virtio-serial can't work after
Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(

It's phenomenon was much like BZ 867366, the usb-table mouse didn't work because
the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new interrupt
can't be injected to VM because the irq line bit had been set to 1, but the frontend
driver never handle it or never know it.

 Bug 867366 - virtio-serial misses irq delivery on migration
https://bugzilla.redhat.com/show_bug.cgi?id=867366

But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus: post_load
send_event when vm is running" and commit bc6b815d9e " virtio-serial: propagate 
guest_connected to the port on post_load" are applied.

I noticed that Paolo posted another problem maybe have a pertential problem about
apic in Comment 23. But this patch
 https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
haven't merged into qemu master.

Would you give me some clues please? Thanks!

Regards,
-Gonglei

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-08-08 14:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-06 10:19 [Question] virtio-serial misses irq delivery on migration? Gonglei (Arei)
2016-08-06 10:19 ` [Qemu-devel] " Gonglei (Arei)
2016-08-08 12:13 ` Dr. David Alan Gilbert
2016-08-08 13:05   ` Gonglei (Arei)
2016-08-08 13:05     ` Gonglei (Arei)
2016-08-08 14:18     ` Dr. David Alan Gilbert
2016-08-08 14:30       ` Gonglei (Arei)
2016-08-08 14:30         ` [Qemu-devel] " Gonglei (Arei)
  -- strict thread matches above, loose matches on Subject: below --
2016-08-08  9:23 Gonglei (Arei)
2016-08-05  9:14 Gonglei (Arei)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.