From: "Denis V. Lunev" <den-lists@parallels.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Jim Minter <jminter@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
Hannes Reinecke <hare@suse.de>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
Date: Thu, 4 Feb 2016 14:00:24 +0300 [thread overview]
Message-ID: <56B32F48.5070804@parallels.com> (raw)
In-Reply-To: <56B326B4.1020407@redhat.com>
On 02/04/2016 01:23 PM, Paolo Bonzini wrote:
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)
My $0.02, for the record: this could be related, or it could be unrelated.

Speaking of NMIs, we do observe similar problems on older AMD hosts when the
guest has a large enough number of VCPUs. On a simple boot we see something
like the trace below (the probability of hitting the problem is around 1/10).
RHEV 2.3 QEMU is used, and the same kernel is running on the host node:
[ 72.189005] Kernel panic - not syncing: softlockup: hung tasks
[ 72.189005] CPU: 5 PID: 593 Comm: systemd-udevd Tainted: G L ------------ 3.10.0-327.4.4.el7.x86_64 #1
[ 72.189005] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.vz7.2 04/01/2014
[ 72.189005] ffffffff81871a03 0000000050291887 ffff88007fd43e18 ffffffff8163515c
[ 72.189005] ffff88007fd43e98 ffffffff8162e9d7 0000000000000008 ffff88007fd43ea8
[ 72.189005] ffff88007fd43e48 0000000050291887 0000000000002710 0000000000000000
[ 72.189005] Call Trace:
[ 72.189005] <IRQ> [<ffffffff8163515c>] dump_stack+0x19/0x1b
[ 72.189005] [<ffffffff8162e9d7>] panic+0xd8/0x1e7
[ 72.189005] [<ffffffff8111b376>] watchdog_timer_fn+0x1b6/0x1c0
[ 72.189005] [<ffffffff8111b1c0>] ? watchdog_enable+0xc0/0xc0
[ 72.189005] [<ffffffff810a9d42>] __hrtimer_run_queues+0xd2/0x260
[ 72.189005] [<ffffffff810aa2e0>] hrtimer_interrupt+0xb0/0x1e0
[ 72.189005] [<ffffffff816471dc>] ? call_softirq+0x1c/0x30
[ 72.189005] [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[ 72.189005] [<ffffffff81049537>] local_apic_timer_interrupt+0x37/0x60
[ 72.189005] [<ffffffff81647e4f>] smp_apic_timer_interrupt+0x3f/0x60
[ 72.189005] [<ffffffff8164651d>] apic_timer_interrupt+0x6d/0x80
[ 72.189005] <EOI> [<ffffffff812f27e9>] ? free_cpumask_var+0x9/0x10
[ 72.189005] [<ffffffff810e6de2>] ? smp_call_function_many+0x202/0x260
[ 72.189005] [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[ 72.189005] [<ffffffff810e6e9d>] on_each_cpu+0x2d/0x60
[ 72.189005] [<ffffffff81066119>] flush_tlb_kernel_range+0x59/0xa0
[ 72.189005] [<ffffffff811a4730>] __purge_vmap_area_lazy+0x1a0/0x210
[ 72.189005] [<ffffffff811a4927>] vm_unmap_aliases+0x187/0x1b0
[ 72.189005] [<ffffffff810620e8>] change_page_attr_set_clr+0xe8/0x4d0
[ 72.189005] [<ffffffff81131127>] ? ring_buffer_time_stamp+0x7/0x10
[ 72.189005] [<ffffffff8106299f>] set_memory_ro+0x2f/0x40
[ 72.189005] [<ffffffff8162f7c4>] set_section_ro_nx+0x3a/0x71
[ 72.189005] [<ffffffff810ed19c>] load_module+0x103c/0x1b50
[ 72.189005] [<ffffffff810e9743>] ? copy_module_from_fd.isra.42+0x53/0x150
[ 72.189005] [<ffffffff810ede66>] SyS_finit_module+0xa6/0xd0
[ 72.189005] [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
[ 72.189005] Shutting down cpus with NMI
Changing the timesource does not help. There is also some strange behaviour:
having configured the serial port to point at a unix socket, and timestamping
the messages on the host side, we observe
guest timestamp | host timestamp
       0.000000 | 20:09:26.805461
       2.587056 | 20:09:30.000993
       7.607329 | 20:09:35.062367
      12.645539 | 20:09:40.057634
      22.608054 | 20:09:50.028727
      32.395499 | 20:10:00.041215
      42.571265 | 20:10:10.041960
      47.606661 | 20:10:15.028973
      48.627059 | 20:10:20.022359
      49.029059 | 20:10:25.047857
      49.399065 | 20:10:30.066884
      49.809077 | 20:10:35.036467
      58.159132 | 20:10:40.013387
      68.043371 | 20:10:40.266714
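As an aside, a capture setup like the one described above can be sketched
roughly as follows. The chardev id, socket path, and exact QEMU flags here
are illustrative assumptions, not copied from the report:

```shell
# Guest serial console -> unix socket (illustrative QEMU flags):
#   -chardev socket,id=ser0,path=/tmp/console.sock,server=on,wait=off \
#   -serial chardev:ser0
#
# Host side: prefix every console line with the host wall-clock time.
stamp() {
    while IFS= read -r line; do
        # GNU date: %6N gives microsecond resolution.
        printf '%s | %s\n' "$(date +%H:%M:%S.%6N)" "$line"
    done
}

# A live capture would be:  socat -u UNIX-CONNECT:/tmp/console.sock - | stamp
# Demo on canned input instead of a live socket:
printf '[    0.000000] Booting Linux\n' | stamp
```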
Note the anomaly around 47 seconds from boot.
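The drift is easy to quantify (this analysis is ours, not part of the original
report): a small Python sketch that replays the table above and prints, for
each interval, how much time passed in the guest versus on the host.

```python
from datetime import datetime

# (guest uptime in seconds, host wall-clock time) pairs copied from the
# serial-console table above.
samples = [
    (0.000000, "20:09:26.805461"),
    (2.587056, "20:09:30.000993"),
    (7.607329, "20:09:35.062367"),
    (12.645539, "20:09:40.057634"),
    (22.608054, "20:09:50.028727"),
    (32.395499, "20:10:00.041215"),
    (42.571265, "20:10:10.041960"),
    (47.606661, "20:10:15.028973"),
    (48.627059, "20:10:20.022359"),
    (49.029059, "20:10:25.047857"),
    (49.399065, "20:10:30.066884"),
    (49.809077, "20:10:35.036467"),
    (58.159132, "20:10:40.013387"),
    (68.043371, "20:10:40.266714"),
]

def host_seconds(ts: str) -> float:
    """Convert an HH:MM:SS.ffffff string to seconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

# For each interval, compare elapsed guest time with elapsed host time.
ratios = []
for (g0, h0), (g1, h1) in zip(samples, samples[1:]):
    guest_dt = g1 - g0
    host_dt = host_seconds(h1) - host_seconds(h0)
    ratios.append(guest_dt / host_dt)
    print(f"guest +{guest_dt:7.3f}s  host +{host_dt:7.3f}s"
          f"  ratio {guest_dt / host_dt:6.3f}")
```

The ratio collapses to roughly 0.08 in the intervals around the 47-50 s guest
marks and then spikes to about 39 in the last one, i.e. the guest clock first
nearly stops and then jumps forward to catch up with real time.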
Anyway, this story is far from over, and we are unable to provide more
details at the moment. Roma is digging into it. Den
Thread overview: 12+ messages
2016-02-03 21:46 [Qemu-devel] sda abort with virtio-scsi Jim Minter
2016-02-03 23:19 ` Paolo Bonzini
2016-02-03 23:34 ` Jim Minter
2016-02-04 10:23 ` Paolo Bonzini
2016-02-04 11:00 ` Denis V. Lunev [this message]
2016-02-04 13:41 ` Jim Minter
2016-02-04 13:54 ` Hannes Reinecke
2016-02-04 15:03 ` Paolo Bonzini
2016-02-04 15:11 ` Hannes Reinecke
2016-02-08 20:02 ` Jim Minter
2016-02-04 6:59 ` Hannes Reinecke
2016-02-04 11:27 ` Paolo Bonzini