* [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1
@ 2014-09-02 13:12 Mark Cave-Ayland
  2014-09-03 10:41 ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Cave-Ayland @ 2014-09-02 13:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Waldemar Brodkorb

Hi all,

I've had a couple of reports of virtio hangs with qemu-system-sparc64 
from QEMU 2.1 and I've recently been given access to one of the systems 
in question for testing since I've been unable to reproduce it locally 
myself.

After some initial investigations, it seems that the following factors 
are involved:


1) qemu-system-sparc64 is started with a virtio-enabled kernel and a 
virtio block device with the following command line:

qemu-system-sparc64 -M sun4u -nographic -net nic,model=virtio -net user \
    -drive file=qemu-sparc64.img,if=virtio,index=0 \
    -kernel qemu-sparc64-archive-kernel

I've been unable to reproduce the hang with the standard cmd646 IDE 
interface.

2) The host system is generally under high load at the time (the 
particular system I'm looking at seems to run periodic build scripts 
which make the system fairly unresponsive at times)

The test case involves extracting a large .tar.gz file on the virtio 
device repeatedly until the point at which it hangs.

3) The hang appears to be random in that whilst extracting the large 
.tar.gz file, the console output stops at different positions each time.
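
For anyone trying to reproduce this, the test case above can be sketched as
a small loop run inside the guest. This is only an illustration: the function
name, the arguments and the use of timeout(1) from GNU coreutils are
assumptions, not the script used on the affected system. Leaving out tar's -v
flag also keeps console output to a minimum.

```shell
#!/bin/sh
# Hypothetical helper: extract ARCHIVE into DEST up to MAX times, treating any
# single run that exceeds LIMIT seconds as a suspected hang.
# Usage: stress_extract ARCHIVE DEST MAX LIMIT
stress_extract() {
    archive=$1 dest=$2 max=$3 limit=$4
    i=1
    while [ "$i" -le "$max" ]; do
        # Start every iteration from an empty target directory.
        rm -rf "$dest" && mkdir -p "$dest" || return 2
        # timeout(1) from GNU coreutils exits with 124 when the limit is hit.
        timeout "$limit" tar -xzf "$archive" -C "$dest"
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "iteration $i: no completion within ${limit}s (suspected hang)"
            return 1
        elif [ "$status" -ne 0 ]; then
            echo "iteration $i: tar failed with status $status"
            return 2
        fi
        echo "iteration $i: ok"
        i=$((i + 1))
    done
    return 0
}
```

If the guest freezes completely, or tar is stuck in uninterruptible sleep,
the loop itself will stall; the last "iteration N: ok" line on the console
then shows how far it got.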


I can immediately think of two possibilities. The first is that something
is going wrong with the -nographic console, since extracting a large
.tar.gz file generates a lot of output; however, the second report I've
had of this was just a freeze during the Debian installer, which makes me
think this is less likely.

The second possibility is something related to virtio, which seems more 
likely since if I restart qemu-system-sparc64 with the same image after 
a hang then I see reports of bad/missing inodes on the console which 
implies that something has gone wrong writing to the disk.

Fortunately I can reproduce the issue with a debug-enabled build of 
qemu-system-sparc64, and I've posted a backtrace obtained during the 
hung state at http://www.ilande.co.uk/tmp/sparc64-gdb-bt.txt. I can't 
see anything too obvious, other than that the ppoll() could possibly be 
waiting for a bad file descriptor?


Many thanks,

Mark.

* Re: [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1
  2014-09-02 13:12 [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1 Mark Cave-Ayland
@ 2014-09-03 10:41 ` Stefan Hajnoczi
  2014-09-03 17:38   ` Aneesh Kumar K.V
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-09-03 10:41 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: Waldemar Brodkorb, qemu-devel

On Tue, Sep 02, 2014 at 02:12:45PM +0100, Mark Cave-Ayland wrote:
> Fortunately I can reproduce the issue with a debug-enabled build of
> qemu-system-sparc64, and I've posted a backtrace obtained during the hung
> state at http://www.ilande.co.uk/tmp/sparc64-gdb-bt.txt. I can't see
> anything too obvious, other than that the ppoll() could possibly be waiting
> for a bad file descriptor?

The backtrace looks like a normal QEMU run.  Nothing obvious there.

This suggests the QEMU monitor is still operational and the guest is
still executing code.

Does the I/O time out inside the guest?  Normally messages are printed
in dmesg if I/O requests are pending for too long.
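
A rough way to check for this from a guest shell, assuming one is still
responsive, is to filter the kernel log for messages that commonly accompany
stalled I/O. The function name and the pattern list are illustrative
assumptions; the "blocked for more than N seconds" text is what the kernel's
hung-task detector prints when CONFIG_DETECT_HUNG_TASK is enabled.

```shell
#!/bin/sh
# Hypothetical filter: print kernel log lines that commonly accompany stalled
# I/O. Reads log text on stdin, so it works on `dmesg` output or on a saved
# console log. Exits non-zero (grep's status) when nothing matches.
filter_io_stalls() {
    grep -E -i 'blocked for more than [0-9]+ seconds|I/O error|timed out|timeout'
}
```

Typical use would be `dmesg | filter_io_stalls`; an empty result does not
prove that no request is stuck, only that the kernel has not reported one.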

Stefan

* Re: [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1
  2014-09-03 10:41 ` Stefan Hajnoczi
@ 2014-09-03 17:38   ` Aneesh Kumar K.V
  2014-09-03 21:21     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 6+ messages in thread
From: Aneesh Kumar K.V @ 2014-09-03 17:38 UTC (permalink / raw)
  To: Stefan Hajnoczi, Mark Cave-Ayland, Alexey Kardashevskiy, Alexander Graf
  Cc: Waldemar Brodkorb, qemu-devel

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Tue, Sep 02, 2014 at 02:12:45PM +0100, Mark Cave-Ayland wrote:
>> Fortunately I can reproduce the issue with a debug-enabled build of
>> qemu-system-sparc64, and I've posted a backtrace obtained during the hung
>> state at http://www.ilande.co.uk/tmp/sparc64-gdb-bt.txt. I can't see
>> anything too obvious, other than that the ppoll() could possibly be waiting
>> for a bad file descriptor?
>
> The backtrace looks like a normal QEMU run.  Nothing obvious there.
>
> This suggests the QEMU monitor is still operational and the guest is
> still executing code.
>
> Does the I/O time out inside the guest?  Normally messages are printed
> in dmesg if I/O requests are pending for too long.

I am also finding this on ppc64. The system hangs even with virtio-9p, so
it may not be virtio-blk related. git bisect is complicated because a few
commits in between cause other boot failures with ppc64.
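
One way to cope with such commits is to let `git bisect run` skip them:
an exit status of 125 from the test script tells git that the commit cannot
be tested. The wrapper below is a minimal sketch of that idea; the function
name and its three command arguments are hypothetical and are not the
procedure actually used for this bisect.

```shell
#!/bin/sh
# Hypothetical wrapper for `git bisect run`. The three arguments are commands
# supplied by the user; none of them come from this thread.
#   $1 build command, $2 "guest boots" check, $3 reproducer for the hang
bisect_step() {
    sh -c "$1" || return 125   # cannot build: ask git bisect to skip
    sh -c "$2" || return 125   # unrelated boot failure: skip as well
    sh -c "$3" || return 1     # hang reproduced: mark this commit bad
    return 0                   # everything passed: mark this commit good
}
```

A script passed to `git bisect run` would call bisect_step with its own three
commands and then `exit $?`, so that the status reaches git.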

-aneesh

* Re: [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1
  2014-09-03 17:38   ` Aneesh Kumar K.V
@ 2014-09-03 21:21     ` Aneesh Kumar K.V
  2014-09-03 22:20       ` Alexander Graf
  0 siblings, 1 reply; 6+ messages in thread
From: Aneesh Kumar K.V @ 2014-09-03 21:21 UTC (permalink / raw)
  To: Stefan Hajnoczi, Mark Cave-Ayland, Alexey Kardashevskiy, Alexander Graf
  Cc: Waldemar Brodkorb, qemu-devel

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> On Tue, Sep 02, 2014 at 02:12:45PM +0100, Mark Cave-Ayland wrote:
>>> Fortunately I can reproduce the issue with a debug-enabled build of
>>> qemu-system-sparc64, and I've posted a backtrace obtained during the hung
>>> state at http://www.ilande.co.uk/tmp/sparc64-gdb-bt.txt. I can't see
>>> anything too obvious, other than that the ppoll() could possibly be waiting
>>> for a bad file descriptor?
>>
>> The backtrace looks like a normal QEMU run.  Nothing obvious there.
>>
>> This suggests the QEMU monitor is still operational and the guest is
>> still executing code.
>>
>> Does the I/O time out inside the guest?  Normally messages are printed
>> in dmesg if I/O requests are pending for too long.
>
> I am also finding this on ppc64. The system hangs even with virtio-9p, so
> it may not be virtio-blk related. git bisect is complicated because a few
> commits in between cause other boot failures with ppc64.

The bad commit is:

git show cc943c36faa192cd4b32af8fe5edb31894017d35
commit cc943c36faa192cd4b32af8fe5edb31894017d35
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Sun Jul 27 09:08:29 2014 +0200

    pci: Use bus master address space for delivering MSI/MSI-X messages


git bisect log
git bisect start
# good: [5149e557d786ab83748588c9b1b13b43c81af6ab] Open 2.1 development tree
git bisect good 5149e557d786ab83748588c9b1b13b43c81af6ab
# bad: [246ad8e69be03d2774401adf63d7e1da8df166ac] hw/9pfs: Don't return type from host in readdir on local 9p filesystem
git bisect bad 246ad8e69be03d2774401adf63d7e1da8df166ac
# good: [d1a721ab816d1b954c0988aafdec4e109b953a9f] target-ppc: Add POWER8's TIR SPR
git bisect good d1a721ab816d1b954c0988aafdec4e109b953a9f
# good: [85d1277e668106294d134a101729c6f36289da1a] virtio: move common virtio properties to bus class device
git bisect good 85d1277e668106294d134a101729c6f36289da1a
# good: [86298845e127365e8a5b7419a5ee9039bbd1837f] qtest: Adding qtest_memset and qmemset.
git bisect good 86298845e127365e8a5b7419a5ee9039bbd1837f
# bad: [33886ebeec0c0ff6253a49253fae0db44c9ed0f3] Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
git bisect bad 33886ebeec0c0ff6253a49253fae0db44c9ed0f3
# bad: [8e6e2c2ae7a81f625cf1cb320891d5270e277548] Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
git bisect bad 8e6e2c2ae7a81f625cf1cb320891d5270e277548
# good: [39ba3bf69c4ef4d8a8b683ee7282efd25b3f01ff] qcow2: fix new_blocks double-free in alloc_refcount_block()
git bisect good 39ba3bf69c4ef4d8a8b683ee7282efd25b3f01ff
# good: [5edbdbcdf882e4220adc7dbf433351077cd1fbbc] ivshmem: check the value returned by fstat()
git bisect good 5edbdbcdf882e4220adc7dbf433351077cd1fbbc
# bad: [f2c85a2f36f57f155cda7bc9f7c42b44f1a2439e] Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
git bisect bad f2c85a2f36f57f155cda7bc9f7c42b44f1a2439e
# bad: [988eba0f681bd4f82e9e02998da8106f165ed82c] pc-dimm: fix up error message
git bisect bad 988eba0f681bd4f82e9e02998da8106f165ed82c
# bad: [d209c7440a642ba08bbb0f13e22390460d3661ed] hw/audio/intel-hda: Fix MSI capability address
git bisect bad d209c7440a642ba08bbb0f13e22390460d3661ed
# bad: [cc943c36faa192cd4b32af8fe5edb31894017d35] pci: Use bus master address space for delivering MSI/MSI-X messages
git bisect bad cc943c36faa192cd4b32af8fe5edb31894017d35
# good: [2d591ce2aeebf9620ff527c7946844a3122afeec] Merge remote-tracking branch 'remotes/mdroth/qga-pull-2014-08-08' into staging
git bisect good 2d591ce2aeebf9620ff527c7946844a3122afeec
# first bad commit: [cc943c36faa192cd4b32af8fe5edb31894017d35] pci: Use bus master address space for delivering MSI/MSI-X messages
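
A log like the one above can be saved to a file and fed back to git with
`git bisect replay <file>` to repeat the same sequence of good/bad marks.
As a small convenience, the result line can also be pulled out of a saved
log; the helper name below is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the hash recorded on the "# first bad commit:"
# line of a bisect log read from stdin. Prints nothing if the log does not
# contain such a line, i.e. the bisect had not converged yet.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p'
}
```

For example, `git bisect log | first_bad_commit` prints only the hash of the
offending commit once the bisect has finished.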

* Re: [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1
  2014-09-03 21:21     ` Aneesh Kumar K.V
@ 2014-09-03 22:20       ` Alexander Graf
  2014-09-03 23:13         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2014-09-03 22:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Stefan Hajnoczi, Mark Cave-Ayland,
	Alexey Kardashevskiy
  Cc: Waldemar Brodkorb, qemu-devel



On 03.09.14 23:21, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>
>>> On Tue, Sep 02, 2014 at 02:12:45PM +0100, Mark Cave-Ayland wrote:
>>>> Fortunately I can reproduce the issue with a debug-enabled build of
>>>> qemu-system-sparc64, and I've posted a backtrace obtained during the hung
>>>> state at http://www.ilande.co.uk/tmp/sparc64-gdb-bt.txt. I can't see
>>>> anything too obvious, other than that the ppoll() could possibly be waiting
>>>> for a bad file descriptor?
>>>
>>> The backtrace looks like a normal QEMU run.  Nothing obvious there.
>>>
>>> This suggests the QEMU monitor is still operational and the guest is
>>> still executing code.
>>>
>>> Does the I/O time out inside the guest?  Normally messages are printed
>>> in dmesg if I/O requests are pending for too long.
>>
>> I am also finding this on ppc64. The system hangs even with virtio-9p, so
>> it may not be virtio-blk related. git bisect is complicated because a few
>> commits in between cause other boot failures with ppc64.
> 
> The bad commit is:
> 
> git show cc943c36faa192cd4b32af8fe5edb31894017d35
> commit cc943c36faa192cd4b32af8fe5edb31894017d35
> Author: Jan Kiszka <jan.kiszka@siemens.com>
> Date:   Sun Jul 27 09:08:29 2014 +0200
> 
>     pci: Use bus master address space for delivering MSI/MSI-X messages

Does "spapr_pci: map the MSI window in each PHB" from Greg fix this for
you on ppc64?


Alex

* Re: [Qemu-devel] qemu-system-sparc64 hang (possibly virtio related?) with 2.1
  2014-09-03 22:20       ` Alexander Graf
@ 2014-09-03 23:13         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2014-09-03 23:13 UTC (permalink / raw)
  To: Alexander Graf, Stefan Hajnoczi, Mark Cave-Ayland, Alexey Kardashevskiy
  Cc: Waldemar Brodkorb, qemu-devel

Alexander Graf <agraf@suse.de> writes:

> On 03.09.14 23:21, Aneesh Kumar K.V wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>> 
>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>>
>>>> On Tue, Sep 02, 2014 at 02:12:45PM +0100, Mark Cave-Ayland wrote:
>>>>> Fortunately I can reproduce the issue with a debug-enabled build of
>>>>> qemu-system-sparc64, and I've posted a backtrace obtained during the hung
>>>>> state at http://www.ilande.co.uk/tmp/sparc64-gdb-bt.txt. I can't see
>>>>> anything too obvious, other than that the ppoll() could possibly be waiting
>>>>> for a bad file descriptor?
>>>>
>>>> The backtrace looks like a normal QEMU run.  Nothing obvious there.
>>>>
>>>> This suggests the QEMU monitor is still operational and the guest is
>>>> still executing code.
>>>>
>>>> Does the I/O time out inside the guest?  Normally messages are printed
>>>> in dmesg if I/O requests are pending for too long.
>>>
>>> I am also finding this on ppc64. The system hangs even with virtio-9p, so
>>> it may not be virtio-blk related. git bisect is complicated because a few
>>> commits in between cause other boot failures with ppc64.
>> 
>> The bad commit is:
>> 
>> git show cc943c36faa192cd4b32af8fe5edb31894017d35
>> commit cc943c36faa192cd4b32af8fe5edb31894017d35
>> Author: Jan Kiszka <jan.kiszka@siemens.com>
>> Date:   Sun Jul 27 09:08:29 2014 +0200
>> 
>>     pci: Use bus master address space for delivering MSI/MSI-X messages
>
> Does "spapr_pci: map the MSI window in each PHB" from Greg fix this for
> you on ppc64?
>

Yes that patch fixed the issue for me.

Thanks,
-aneesh
