* [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
@ 2020-07-24 13:27 Cornelia Huck
  2020-07-24 13:30 ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-07-24 13:27 UTC
  To: Jason Wang, Michael S. Tsirkin, Cindy Lu; +Cc: qemu-s390x, qemu-devel

When I start qemu with a second virtio-net-ccw device (i.e. adding
-device virtio-net-ccw in addition to the autogenerated device), I get
a segfault. gdb points to

#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
    config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {

(backtrace doesn't go further)

Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
the autogenerated virtio-net-ccw device is present) works. Specifying
several "-device virtio-net-pci" works as well.

Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
works (in-between state does not compile).

This is reproducible with tcg as well. Same problem both with
--enable-vhost-vdpa and --disable-vhost-vdpa.

Have not yet tried to figure out what might be special with
virtio-ccw... anyone have an idea?

[This should probably be considered a blocker?]




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-24 13:27 [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device Cornelia Huck
@ 2020-07-24 13:30 ` Michael S. Tsirkin
  2020-07-24 14:56   ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2020-07-24 13:30 UTC
  To: Cornelia Huck; +Cc: qemu-s390x, Jason Wang, qemu-devel, Cindy Lu

On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> When I start qemu with a second virtio-net-ccw device (i.e. adding
> -device virtio-net-ccw in addition to the autogenerated device), I get
> a segfault. gdb points to
> 
> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
>     config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> 
> (backtrace doesn't go further)
> 
> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
> the autogenerated virtio-net-ccw device is present) works. Specifying
> several "-device virtio-net-pci" works as well.
> 
> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
> works (in-between state does not compile).

Ouch. I didn't test all in-between states :(
But I wish we had a 0-day infrastructure like the kernel has,
that catches things like that.

> This is reproducible with tcg as well. Same problem both with
> --enable-vhost-vdpa and --disable-vhost-vdpa.
> 
> Have not yet tried to figure out what might be special with
> virtio-ccw... anyone have an idea?
> 
> [This should probably be considered a blocker?]




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-24 13:30 ` Michael S. Tsirkin
@ 2020-07-24 14:56   ` Cornelia Huck
  2020-07-24 15:17     ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-07-24 14:56 UTC
  To: Michael S. Tsirkin; +Cc: qemu-s390x, Jason Wang, qemu-devel, Cindy Lu

On Fri, 24 Jul 2020 09:30:58 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> > When I start qemu with a second virtio-net-ccw device (i.e. adding
> > -device virtio-net-ccw in addition to the autogenerated device), I get
> > a segfault. gdb points to
> > 
> > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
> >     config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> > 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > 
> > (backtrace doesn't go further)

The core was incomplete, but running under gdb directly shows that it
is just a bog-standard config space access (first for that device).

The cause of the crash is that nc->peer is not set... no idea how that
can happen, not that familiar with that part of QEMU. (Should the code
check, or is that really something that should not happen?)
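
For readers less familiar with this code path, the failing expression can be
modelled outside QEMU. The sketch below is not QEMU code: struct net_client,
struct client_info and peer_is_vdpa() are hypothetical stand-ins that keep only
the peer and info->type fields visible in the backtrace line. It illustrates
why a peer pointer left at NULL turns nc->peer->info into a NULL dereference,
and how testing peer first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU types; only the fields used here. */
enum client_type { CLIENT_TAP, CLIENT_VHOST_VDPA };

struct client_info {
    enum client_type type;
};

struct net_client {
    const struct client_info *info;
    struct net_client *peer;    /* NULL when no backend is attached */
};

/*
 * Guarded check: && short-circuits, so peer->info is only read
 * when peer is non-NULL. Dropping the first operand reproduces
 * the unconditional dereference seen at virtio-net.c:146.
 */
static bool peer_is_vdpa(const struct net_client *nc)
{
    assert(nc != NULL);
    return nc->peer && nc->peer->info->type == CLIENT_VHOST_VDPA;
}
```

With a NULL peer the guarded form simply reports "not vDPA" instead of
faulting.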

What I don't understand is why it is set correctly for the first,
autogenerated virtio-net-ccw device, but not for the second one, and
why virtio-net-pci doesn't show these problems. The only difference
between -ccw and -pci that comes to my mind here is that config space
accesses for ccw are done via an asynchronous operation, so timing
might be different.

> > 
> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
> > the autogenerated virtio-net-ccw device is present) works. Specifying
> > several "-device virtio-net-pci" works as well.
> > 
> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
> > works (in-between state does not compile).  
> 
> Ouch. I didn't test all in-between states :(
> But I wish we had a 0-day infrastructure like the kernel has,
> that catches things like that.

Yep, that would be useful... so patchew only builds the complete series?

> 
> > This is reproducible with tcg as well. Same problem both with
> > --enable-vhost-vdpa and --disable-vhost-vdpa.
> > 
> > Have not yet tried to figure out what might be special with
> > virtio-ccw... anyone have an idea?
> > 
> > [This should probably be considered a blocker?]  

I think so, as it makes s390x unusable with more than one
virtio-net-ccw device, and I don't even see a workaround.




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-24 14:56   ` Cornelia Huck
@ 2020-07-24 15:17     ` Michael S. Tsirkin
  2020-07-24 15:34       ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2020-07-24 15:17 UTC
  To: Cornelia Huck; +Cc: qemu-s390x, Jason Wang, qemu-devel, Cindy Lu

On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
> On Fri, 24 Jul 2020 09:30:58 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> > > When I start qemu with a second virtio-net-ccw device (i.e. adding
> > > -device virtio-net-ccw in addition to the autogenerated device), I get
> > > a segfault. gdb points to
> > > 
> > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
> > >     config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> > > 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > 
> > > (backtrace doesn't go further)
> 
> The core was incomplete, but running under gdb directly shows that it
> is just a bog-standard config space access (first for that device).
> 
> The cause of the crash is that nc->peer is not set... no idea how that
> can happen, not that familiar with that part of QEMU. (Should the code
> check, or is that really something that should not happen?)
> 
> What I don't understand is why it is set correctly for the first,
> autogenerated virtio-net-ccw device, but not for the second one, and
> why virtio-net-pci doesn't show these problems. The only difference
> between -ccw and -pci that comes to my mind here is that config space
> accesses for ccw are done via an asynchronous operation, so timing
> might be different.

Hopefully Jason has an idea. Could you post a full command line
please? Do you need a working guest to trigger this? Does this trigger
on an x86 host?

> > > 
> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
> > > the autogenerated virtio-net-ccw device is present) works. Specifying
> > > several "-device virtio-net-pci" works as well.
> > > 
> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
> > > works (in-between state does not compile).  
> > 
> > Ouch. I didn't test all in-between states :(
> > But I wish we had a 0-day infrastructure like the kernel has,
> > that catches things like that.
> 
> Yep, that would be useful... so patchew only builds the complete series?
> 
> > 
> > > This is reproducible with tcg as well. Same problem both with
> > > --enable-vhost-vdpa and --disable-vhost-vdpa.
> > > 
> > > Have not yet tried to figure out what might be special with
> > > virtio-ccw... anyone have an idea?
> > > 
> > > [This should probably be considered a blocker?]  
> 
> I think so, as it makes s390x unusable with more than one
> virtio-net-ccw device, and I don't even see a workaround.




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-24 15:17     ` Michael S. Tsirkin
@ 2020-07-24 15:34       ` Cornelia Huck
  2020-07-25  0:40         ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-07-24 15:34 UTC
  To: Michael S. Tsirkin; +Cc: qemu-s390x, Jason Wang, qemu-devel, Cindy Lu

On Fri, 24 Jul 2020 11:17:57 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
> > On Fri, 24 Jul 2020 09:30:58 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:  
> > > > When I start qemu with a second virtio-net-ccw device (i.e. adding
> > > > -device virtio-net-ccw in addition to the autogenerated device), I get
> > > > a segfault. gdb points to
> > > > 
> > > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
> > > >     config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> > > > 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > 
> > > > (backtrace doesn't go further)  
> > 
> > The core was incomplete, but running under gdb directly shows that it
> > is just a bog-standard config space access (first for that device).
> > 
> > The cause of the crash is that nc->peer is not set... no idea how that
> > can happen, not that familiar with that part of QEMU. (Should the code
> > check, or is that really something that should not happen?)
> > 
> > What I don't understand is why it is set correctly for the first,
> > autogenerated virtio-net-ccw device, but not for the second one, and
> > why virtio-net-pci doesn't show these problems. The only difference
> > between -ccw and -pci that comes to my mind here is that config space
> > accesses for ccw are done via an asynchronous operation, so timing
> > might be different.  
> 
> Hopefully Jason has an idea. Could you post a full command line
> please? Do you need a working guest to trigger this? Does this trigger
> on an x86 host?

Yes, it does trigger with tcg-on-x86 as well. I've been using

s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on 
-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 
-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 
-device virtio-net-ccw

It seems it needs the guest actually doing something with the nics; I
cannot reproduce the crash if I use the old advent calendar moon buggy
image and just add a virtio-net-ccw device.

(I don't think it's a problem with my local build, as I see the problem
both on my laptop and on an LPAR.)

> 
> > > > 
> > > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
> > > > the autogenerated virtio-net-ccw device is present) works. Specifying
> > > > several "-device virtio-net-pci" works as well.
> > > > 
> > > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
> > > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
> > > > works (in-between state does not compile).    
> > > 
> > > Ouch. I didn't test all in-between states :(
> > > But I wish we had a 0-day infrastructure like the kernel has,
> > > that catches things like that.  
> > 
> > Yep, that would be useful... so patchew only builds the complete series?
> >   
> > >   
> > > > This is reproducible with tcg as well. Same problem both with
> > > > --enable-vhost-vdpa and --disable-vhost-vdpa.
> > > > 
> > > > Have not yet tried to figure out what might be special with
> > > > virtio-ccw... anyone have an idea?
> > > > 
> > > > [This should probably be considered a blocker?]    
> > 
> > I think so, as it makes s390x unusable with more than one
> > virtio-net-ccw device, and I don't even see a workaround.  
> 




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-24 15:34       ` Cornelia Huck
@ 2020-07-25  0:40         ` Jason Wang
  2020-07-27  6:43           ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2020-07-25  0:40 UTC
  To: Cornelia Huck, Michael S. Tsirkin; +Cc: qemu-s390x, qemu-devel, Cindy Lu

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]


On 2020/7/24 11:34 PM, Cornelia Huck wrote:
> On Fri, 24 Jul 2020 11:17:57 -0400
> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>
>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
>>> On Fri, 24 Jul 2020 09:30:58 -0400
>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>>    
>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
>>>>> a segfault. gdb points to
>>>>>
>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
>>>>>      config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
>>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>>>>>
>>>>> (backtrace doesn't go further)
>>> The core was incomplete, but running under gdb directly shows that it
>>> is just a bog-standard config space access (first for that device).
>>>
>>> The cause of the crash is that nc->peer is not set... no idea how that
>>> can happen, not that familiar with that part of QEMU. (Should the code
>>> check, or is that really something that should not happen?)
>>>
>>> What I don't understand is why it is set correctly for the first,
>>> autogenerated virtio-net-ccw device, but not for the second one, and
>>> why virtio-net-pci doesn't show these problems. The only difference
>>> between -ccw and -pci that comes to my mind here is that config space
>>> accesses for ccw are done via an asynchronous operation, so timing
>>> might be different.
>> Hopefully Jason has an idea. Could you post a full command line
>> please? Do you need a working guest to trigger this? Does this trigger
>> on an x86 host?
> Yes, it does trigger with tcg-on-x86 as well. I've been using
>
> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
> -device virtio-net-ccw
>
> It seems it needs the guest actually doing something with the nics; I
> cannot reproduce the crash if I use the old advent calendar moon buggy
> image and just add a virtio-net-ccw device.
>
> (I don't think it's a problem with my local build, as I see the problem
> both on my laptop and on an LPAR.)


It looks to me like we forgot to check the existence of the peer.

Please try the attached patch to see if it works.

Thanks


[-- Attachment #2: 0001-virtio-net-check-the-existence-of-peer-before-accesi.patch --]
[-- Type: text/x-patch, Size: 2822 bytes --]

From f6959056dcc65cbdc256c4af2b1a0eaee784c15f Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Sat, 25 Jul 2020 08:13:17 +0800
Subject: [PATCH] virtio-net: check the existence of peer before accesing its
 config

We try to get config from peer unconditionally, which may lead to a NULL
pointer dereference. Add a check before trying to access the config.

Fixes: 108a64818e69b ("vhost-vdpa: introduce vhost-vdpa backend")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4895af1cbe..935b9ef5c7 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -125,6 +125,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_config netcfg;
+    NetClientState *nc = qemu_get_queue(n->nic);
 
     int ret = 0;
     memset(&netcfg, 0 , sizeof(struct virtio_net_config));
@@ -142,13 +143,12 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
                  VIRTIO_NET_RSS_SUPPORTED_HASHES);
     memcpy(config, &netcfg, n->config_size);
 
-    NetClientState *nc = qemu_get_queue(n->nic);
-    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
         ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t *)&netcfg,
-                             n->config_size);
-    if (ret != -1) {
-        memcpy(config, &netcfg, n->config_size);
-    }
+                                   n->config_size);
+        if (ret != -1) {
+            memcpy(config, &netcfg, n->config_size);
+        }
     }
 }
 
@@ -156,6 +156,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_config netcfg = {};
+    NetClientState *nc = qemu_get_queue(n->nic);
 
     memcpy(&netcfg, config, n->config_size);
 
@@ -166,11 +167,10 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
         qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
     }
 
-    NetClientState *nc = qemu_get_queue(n->nic);
-    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-        vhost_net_set_config(get_vhost_net(nc->peer), (uint8_t *)&netcfg,
-                               0, n->config_size,
-                        VHOST_SET_CONFIG_TYPE_MASTER);
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+        vhost_net_set_config(get_vhost_net(nc->peer),
+                             (uint8_t *)&netcfg, 0, n->config_size,
+                             VHOST_SET_CONFIG_TYPE_MASTER);
       }
 }
 
-- 
2.20.1
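
The control flow this patch gives virtio_net_get_config() can be sketched in
isolation. The code below is a simplified model, not the QEMU implementation:
struct nic, struct backend, nic_get_config(), backend_ok() and backend_fail()
are hypothetical names, and the 8-byte config is an arbitrary size chosen for
illustration. It shows the three outcomes the patched function distinguishes:
no peer (the locally built config is returned), a vDPA peer whose query
succeeds (the backend config replaces it), and a vDPA peer whose query returns
-1 (the local config is kept).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CFG_SIZE = 8 };  /* arbitrary size for this sketch */

/* Hypothetical backend hook: fills buf and returns 0, or returns -1. */
typedef int (*backend_get_config_fn)(unsigned char *buf, size_t len);

struct backend {
    int is_vdpa;
    backend_get_config_fn get_config;
};

struct nic {
    unsigned char local_cfg[CFG_SIZE];
    const struct backend *peer;     /* may be NULL */
};

/* Sample backends used to exercise both result paths. */
static int backend_ok(unsigned char *buf, size_t len)
{
    memset(buf, 0xAB, len);
    return 0;
}

static int backend_fail(unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -1;
}

static void nic_get_config(const struct nic *n, unsigned char *config)
{
    unsigned char tmp[CFG_SIZE];

    assert(n != NULL && config != NULL);

    /* Always hand out the device's own view first. */
    memcpy(config, n->local_cfg, CFG_SIZE);

    /* Consult the backend only if a vDPA peer actually exists. */
    if (n->peer && n->peer->is_vdpa) {
        if (n->peer->get_config(tmp, CFG_SIZE) != -1) {
            memcpy(config, tmp, CFG_SIZE);
        }
    }
}
```

Copying the local config before the peer check means a missing or failing
backend never leaves the caller's buffer unfilled.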



* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-25  0:40         ` Jason Wang
@ 2020-07-27  6:43           ` Cornelia Huck
  2020-07-27  7:38             ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-07-27  6:43 UTC
  To: Jason Wang; +Cc: qemu-s390x, qemu-devel, Cindy Lu, Michael S. Tsirkin

On Sat, 25 Jul 2020 08:40:07 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
> > On Fri, 24 Jul 2020 11:17:57 -0400
> > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> >  
> >> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:  
> >>> On Fri, 24 Jul 2020 09:30:58 -0400
> >>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> >>>      
> >>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:  
> >>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
> >>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
> >>>>> a segfault. gdb points to
> >>>>>
> >>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
> >>>>>      config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> >>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> >>>>>
> >>>>> (backtrace doesn't go further)  
> >>> The core was incomplete, but running under gdb directly shows that it
> >>> is just a bog-standard config space access (first for that device).
> >>>
> >>> The cause of the crash is that nc->peer is not set... no idea how that
> >>> can happen, not that familiar with that part of QEMU. (Should the code
> >>> check, or is that really something that should not happen?)
> >>>
> >>> What I don't understand is why it is set correctly for the first,
> >>> autogenerated virtio-net-ccw device, but not for the second one, and
> >>> why virtio-net-pci doesn't show these problems. The only difference
> >>> between -ccw and -pci that comes to my mind here is that config space
> >>> accesses for ccw are done via an asynchronous operation, so timing
> >>> might be different.  
> >> Hopefully Jason has an idea. Could you post a full command line
> >> please? Do you need a working guest to trigger this? Does this trigger
> >> on an x86 host?  
> > Yes, it does trigger with tcg-on-x86 as well. I've been using
> >
> > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
> > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
> > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
> > -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
> > -device virtio-net-ccw
> >
> > It seems it needs the guest actually doing something with the nics; I
> > cannot reproduce the crash if I use the old advent calendar moon buggy
> > image and just add a virtio-net-ccw device.
> >
> > (I don't think it's a problem with my local build, as I see the problem
> > both on my laptop and on an LPAR.)  
> 
> 
> It looks to me like we forgot to check the existence of the peer.
> 
> Please try the attached patch to see if it works.

Thanks, that patch gets my guest up and running again. So, FWIW,

Tested-by: Cornelia Huck <cohuck@redhat.com>

Any idea why this did not hit with virtio-net-pci (or the autogenerated
virtio-net-ccw device)?




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27  6:43           ` Cornelia Huck
@ 2020-07-27  7:38             ` Jason Wang
  2020-07-27  8:41               ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2020-07-27  7:38 UTC
  To: Cornelia Huck; +Cc: qemu-s390x, qemu-devel, Cindy Lu, Michael S. Tsirkin


On 2020/7/27 2:43 PM, Cornelia Huck wrote:
> On Sat, 25 Jul 2020 08:40:07 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
>>> On Fri, 24 Jul 2020 11:17:57 -0400
>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>>   
>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
>>>>> On Fri, 24 Jul 2020 09:30:58 -0400
>>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>>>>       
>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
>>>>>>> a segfault. gdb points to
>>>>>>>
>>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
>>>>>>>       config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
>>>>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>>>>>>>
>>>>>>> (backtrace doesn't go further)
>>>>> The core was incomplete, but running under gdb directly shows that it
>>>>> is just a bog-standard config space access (first for that device).
>>>>>
>>>>> The cause of the crash is that nc->peer is not set... no idea how that
>>>>> can happen, not that familiar with that part of QEMU. (Should the code
>>>>> check, or is that really something that should not happen?)
>>>>>
>>>>> What I don't understand is why it is set correctly for the first,
>>>>> autogenerated virtio-net-ccw device, but not for the second one, and
>>>>> why virtio-net-pci doesn't show these problems. The only difference
>>>>> between -ccw and -pci that comes to my mind here is that config space
>>>>> accesses for ccw are done via an asynchronous operation, so timing
>>>>> might be different.
>>>> Hopefully Jason has an idea. Could you post a full command line
>>>> please? Do you need a working guest to trigger this? Does this trigger
>>>> on an x86 host?
>>> Yes, it does trigger with tcg-on-x86 as well. I've been using
>>>
>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
>>> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>>> -device virtio-net-ccw
>>>
>>> It seems it needs the guest actually doing something with the nics; I
>>> cannot reproduce the crash if I use the old advent calendar moon buggy
>>> image and just add a virtio-net-ccw device.
>>>
>>> (I don't think it's a problem with my local build, as I see the problem
>>> both on my laptop and on an LPAR.)
>>
>> It looks to me like we forgot to check the existence of the peer.
>>
>> Please try the attached patch to see if it works.
> Thanks, that patch gets my guest up and running again. So, FWIW,
>
> Tested-by: Cornelia Huck <cohuck@redhat.com>
>
> Any idea why this did not hit with virtio-net-pci (or the autogenerated
> virtio-net-ccw device)?


It can be hit with virtio-net-pci as well (just start without peer).

For the autogenerated virtio-net-ccw, I think the reason is that it
already has a peer set.
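
The difference between the two devices can be pictured with a small model. It
assumes, as suggested above, that the autogenerated device is linked to a
backend while the extra "-device virtio-net-ccw" (given without a backend) is
not. struct endpoint, endpoint_connect() and endpoint_has_peer() are
hypothetical names for this sketch only and do not correspond to QEMU
functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one end of a NIC/backend pair. */
struct endpoint {
    const char *name;
    struct endpoint *peer;  /* stays NULL until something links it */
};

/* Link a device-side endpoint to a backend; both ends point at each other. */
static void endpoint_connect(struct endpoint *nic, struct endpoint *backend)
{
    assert(nic != NULL && backend != NULL);
    assert(nic->peer == NULL && backend->peer == NULL);
    nic->peer = backend;
    backend->peer = nic;
}

/* The question the patched code asks before touching peer->info. */
static int endpoint_has_peer(const struct endpoint *e)
{
    return e != NULL && e->peer != NULL;
}
```

In this model only the endpoint that was explicitly connected has a non-NULL
peer; an endpoint that was merely created, like the second device here, keeps
peer == NULL, which is the state the unpatched code did not expect.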

Thanks


>
>




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27  7:38             ` Jason Wang
@ 2020-07-27  8:41               ` Cornelia Huck
  2020-07-27  8:51                 ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-07-27  8:41 UTC
  To: Jason Wang; +Cc: qemu-s390x, qemu-devel, Cindy Lu, Michael S. Tsirkin

On Mon, 27 Jul 2020 15:38:12 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/7/27 2:43 PM, Cornelia Huck wrote:
> > On Sat, 25 Jul 2020 08:40:07 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
> >>> On Fri, 24 Jul 2020 11:17:57 -0400
> >>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> >>>     
> >>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:  
> >>>>> On Fri, 24 Jul 2020 09:30:58 -0400
> >>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> >>>>>         
> >>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:  
> >>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
> >>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
> >>>>>>> a segfault. gdb points to
> >>>>>>>
> >>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
> >>>>>>>       config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> >>>>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> >>>>>>>
> >>>>>>> (backtrace doesn't go further)  
> >>>>> The core was incomplete, but running under gdb directly shows that it
> >>>>> is just a bog-standard config space access (first for that device).
> >>>>>
> >>>>> The cause of the crash is that nc->peer is not set... no idea how that
> >>>>> can happen, not that familiar with that part of QEMU. (Should the code
> >>>>> check, or is that really something that should not happen?)
> >>>>>
> >>>>> What I don't understand is why it is set correctly for the first,
> >>>>> autogenerated virtio-net-ccw device, but not for the second one, and
> >>>>> why virtio-net-pci doesn't show these problems. The only difference
> >>>>> between -ccw and -pci that comes to my mind here is that config space
> >>>>> accesses for ccw are done via an asynchronous operation, so timing
> >>>>> might be different.  
> >>>> Hopefully Jason has an idea. Could you post a full command line
> >>>> please? Do you need a working guest to trigger this? Does this trigger
> >>>> on an x86 host?  
> >>> Yes, it does trigger with tcg-on-x86 as well. I've been using
> >>>
> >>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
> >>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
> >>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
> >>> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
> >>> -device virtio-net-ccw
> >>>
> >>> It seems it needs the guest actually doing something with the nics; I
> >>> cannot reproduce the crash if I use the old advent calendar moon buggy
> >>> image and just add a virtio-net-ccw device.
> >>>
> >>> (I don't think it's a problem with my local build, as I see the problem
> >>> both on my laptop and on an LPAR.)  
> >>
> >> It looks to me like we forgot to check the existence of the peer.
> >>
> >> Please try the attached patch to see if it works.  
> > Thanks, that patch gets my guest up and running again. So, FWIW,
> >
> > Tested-by: Cornelia Huck <cohuck@redhat.com>
> >
> > Any idea why this did not hit with virtio-net-pci (or the autogenerated
> > virtio-net-ccw device)?  
> 
> 
> It can be hit with virtio-net-pci as well (just start without peer).

Hm, I had not been able to reproduce the crash with a 'naked' -device
virtio-net-pci. But checking seems to be the right idea anyway.

> 
> For the autogenerated virtio-net-ccw, I think the reason is that it
> already has a peer set.

Ok, that might well be.




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27  8:41               ` Cornelia Huck
@ 2020-07-27  8:51                 ` Jason Wang
  2020-07-27 11:43                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2020-07-27  8:51 UTC
  To: Cornelia Huck; +Cc: qemu-s390x, qemu-devel, Cindy Lu, Michael S. Tsirkin


On 2020/7/27 4:41 PM, Cornelia Huck wrote:
> On Mon, 27 Jul 2020 15:38:12 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/7/27 2:43 PM, Cornelia Huck wrote:
>>> On Sat, 25 Jul 2020 08:40:07 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
>>>>> On Fri, 24 Jul 2020 11:17:57 -0400
>>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>>>>      
>>>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
>>>>>>> On Fri, 24 Jul 2020 09:30:58 -0400
>>>>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>>>>>>          
>>>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
>>>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
>>>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
>>>>>>>>> a segfault. gdb points to
>>>>>>>>>
>>>>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
>>>>>>>>>        config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
>>>>>>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>>>>>>>>>
>>>>>>>>> (backtrace doesn't go further)
>>>>>>> The core was incomplete, but running under gdb directly shows that it
>>>>>>> is just a bog-standard config space access (first for that device).
>>>>>>>
>>>>>>> The cause of the crash is that nc->peer is not set... no idea how that
>>>>>>> can happen, not that familiar with that part of QEMU. (Should the code
>>>>>>> check, or is that really something that should not happen?)
>>>>>>>
>>>>>>> What I don't understand is why it is set correctly for the first,
>>>>>>> autogenerated virtio-net-ccw device, but not for the second one, and
>>>>>>> why virtio-net-pci doesn't show these problems. The only difference
>>>>>>> between -ccw and -pci that comes to my mind here is that config space
>>>>>>> accesses for ccw are done via an asynchronous operation, so timing
>>>>>>> might be different.
>>>>>> Hopefully Jason has an idea. Could you post a full command line
>>>>>> please? Do you need a working guest to trigger this? Does this trigger
>>>>>> on an x86 host?
>>>>> Yes, it does trigger with tcg-on-x86 as well. I've been using
>>>>>
>>>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
>>>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
>>>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
>>>>> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>>>>> -device virtio-net-ccw
>>>>>
>>>>> It seems it needs the guest actually doing something with the nics; I
>>>>> cannot reproduce the crash if I use the old advent calendar moon buggy
>>>>> image and just add a virtio-net-ccw device.
>>>>>
>>>>> (I don't think it's a problem with my local build, as I see the problem
>>>>> both on my laptop and on an LPAR.)
>>>> It looks to me like we forgot to check for the existence of the peer.
>>>>
>>>> Please try the attached patch to see if it works.
>>> Thanks, that patch gets my guest up and running again. So, FWIW,
>>>
>>> Tested-by: Cornelia Huck <cohuck@redhat.com>
>>>
>>> Any idea why this did not hit with virtio-net-pci (or the autogenerated
>>> virtio-net-ccw device)?
>>
>> It can be hit with virtio-net-pci as well (just start without peer).
> Hm, I had not been able to reproduce the crash with a 'naked' -device
> virtio-net-pci. But checking seems to be the right idea anyway.


Sorry for being unclear. I meant that, for the networking part, you just need
to start without a peer, and you need a real guest (any Linux) that tries
to access the config space of virtio-net.
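
For readers following the thread, the kind of guard under discussion can be sketched as below. This is only an illustration and not the attached patch: the struct layouts are stripped-down stand-ins for QEMU's NetClientState and NetClientInfo, and the helper name peer_is_vhost_vdpa is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types: only the fields that the crashing expression touches. */
typedef enum {
    NET_CLIENT_DRIVER_NONE,
    NET_CLIENT_DRIVER_VHOST_VDPA,
} NetClientDriver;

typedef struct NetClientInfo {
    NetClientDriver type;
} NetClientInfo;

typedef struct NetClientState {
    NetClientInfo *info;
    struct NetClientState *peer;   /* NULL when the NIC has no backend */
} NetClientState;

/* Hypothetical helper: true only if a peer exists and is vhost-vdpa. */
static bool peer_is_vhost_vdpa(const NetClientState *nc)
{
    assert(nc != NULL);
    /* Test nc->peer first; dereferencing a NULL peer is the reported crash. */
    return nc->peer != NULL &&
           nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA;
}
```

With a check of this shape, a NIC created by a bare -device with no backend (peer == NULL) makes the helper return false instead of faulting on the first config space access.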

Thanks


>
>> For autogenerated virtio-net-ccw, I think the reason is that it
>> already has a peer set.
> Ok, that might well be.
>
>




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27  8:51                 ` Jason Wang
@ 2020-07-27 11:43                   ` Michael S. Tsirkin
  2020-07-27 12:44                     ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2020-07-27 11:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-s390x, Cornelia Huck, qemu-devel, Cindy Lu

On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
> 
> On 2020/7/27 4:41 PM, Cornelia Huck wrote:
> > On Mon, 27 Jul 2020 15:38:12 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> > 
> > > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
> > > > On Sat, 25 Jul 2020 08:40:07 +0800
> > > > Jason Wang <jasowang@redhat.com> wrote:
> > > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
> > > > > > On Fri, 24 Jul 2020 11:17:57 -0400
> > > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
> > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
> > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> > > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e. adding
> > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated device), I get
> > > > > > > > > > a segfault. gdb points to
> > > > > > > > > > 
> > > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
> > > > > > > > > >        config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> > > > > > > > > > 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > > > > > > > 
> > > > > > > > > > (backtrace doesn't go further)
> > > > > > > > The core was incomplete, but running under gdb directly shows that it
> > > > > > > > is just a bog-standard config space access (first for that device).
> > > > > > > > 
> > > > > > > > The cause of the crash is that nc->peer is not set... no idea how that
> > > > > > > > can happen, not that familiar with that part of QEMU. (Should the code
> > > > > > > > check, or is that really something that should not happen?)
> > > > > > > > 
> > > > > > > > What I don't understand is why it is set correctly for the first,
> > > > > > > > autogenerated virtio-net-ccw device, but not for the second one, and
> > > > > > > > why virtio-net-pci doesn't show these problems. The only difference
> > > > > > > > between -ccw and -pci that comes to my mind here is that config space
> > > > > > > > accesses for ccw are done via an asynchronous operation, so timing
> > > > > > > > might be different.
> > > > > > > Hopefully Jason has an idea. Could you post a full command line
> > > > > > > please? Do you need a working guest to trigger this? Does this trigger
> > > > > > > on an x86 host?
> > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
> > > > > > 
> > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
> > > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
> > > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
> > > > > > -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
> > > > > > -device virtio-net-ccw
> > > > > > 
> > > > > > It seems it needs the guest actually doing something with the nics; I
> > > > > > cannot reproduce the crash if I use the old advent calendar moon buggy
> > > > > > image and just add a virtio-net-ccw device.
> > > > > > 
> > > > > > (I don't think it's a problem with my local build, as I see the problem
> > > > > > both on my laptop and on an LPAR.)
> > > > > It looks to me like we forgot to check for the existence of the peer.
> > > > > 
> > > > > Please try the attached patch to see if it works.
> > > > Thanks, that patch gets my guest up and running again. So, FWIW,
> > > > 
> > > > Tested-by: Cornelia Huck <cohuck@redhat.com>
> > > > 
> > > > Any idea why this did not hit with virtio-net-pci (or the autogenerated
> > > > virtio-net-ccw device)?
> > > 
> > > It can be hit with virtio-net-pci as well (just start without peer).
> > Hm, I had not been able to reproduce the crash with a 'naked' -device
> > virtio-net-pci. But checking seems to be the right idea anyway.
> 
> 
> Sorry for being unclear. I meant that, for the networking part, you just need to
> start without a peer, and you need a real guest (any Linux) that tries to access
> the config space of virtio-net.
> 
> Thanks

A pxe guest will do it, but that doesn't support ccw, right?

I'm still unclear why this triggers with ccw but not pci -
any idea?

> 
> > 
> > > For autogenerated virtio-net-ccw, I think the reason is that it
> > > already has a peer set.
> > Ok, that might well be.
> > 
> > 




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27 11:43                   ` Michael S. Tsirkin
@ 2020-07-27 12:44                     ` Jason Wang
  2020-07-27 13:16                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2020-07-27 12:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-s390x, Cornelia Huck, qemu-devel, Cindy Lu


On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
>> On 2020/7/27 4:41 PM, Cornelia Huck wrote:
>>> On Mon, 27 Jul 2020 15:38:12 +0800
>>> Jason Wang<jasowang@redhat.com>  wrote:
>>>
>>>> On 2020/7/27 2:43 PM, Cornelia Huck wrote:
>>>>> On Sat, 25 Jul 2020 08:40:07 +0800
>>>>> Jason Wang<jasowang@redhat.com>  wrote:
>>>>>> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
>>>>>>> On Fri, 24 Jul 2020 11:17:57 -0400
>>>>>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>>>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
>>>>>>>>> On Fri, 24 Jul 2020 09:30:58 -0400
>>>>>>>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>>>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
>>>>>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
>>>>>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
>>>>>>>>>>> a segfault. gdb points to
>>>>>>>>>>>
>>>>>>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
>>>>>>>>>>>         config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
>>>>>>>>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>>>>>>>>>>>
>>>>>>>>>>> (backtrace doesn't go further)
>>>>>>>>> The core was incomplete, but running under gdb directly shows that it
>>>>>>>>> is just a bog-standard config space access (first for that device).
>>>>>>>>>
>>>>>>>>> The cause of the crash is that nc->peer is not set... no idea how that
>>>>>>>>> can happen, not that familiar with that part of QEMU. (Should the code
>>>>>>>>> check, or is that really something that should not happen?)
>>>>>>>>>
>>>>>>>>> What I don't understand is why it is set correctly for the first,
>>>>>>>>> autogenerated virtio-net-ccw device, but not for the second one, and
>>>>>>>>> why virtio-net-pci doesn't show these problems. The only difference
>>>>>>>>> between -ccw and -pci that comes to my mind here is that config space
>>>>>>>>> accesses for ccw are done via an asynchronous operation, so timing
>>>>>>>>> might be different.
>>>>>>>> Hopefully Jason has an idea. Could you post a full command line
>>>>>>>> please? Do you need a working guest to trigger this? Does this trigger
>>>>>>>> on an x86 host?
>>>>>>> Yes, it does trigger with tcg-on-x86 as well. I've been using
>>>>>>>
>>>>>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
>>>>>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
>>>>>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
>>>>>>> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>>>>>>> -device virtio-net-ccw
>>>>>>>
>>>>>>> It seems it needs the guest actually doing something with the nics; I
>>>>>>> cannot reproduce the crash if I use the old advent calendar moon buggy
>>>>>>> image and just add a virtio-net-ccw device.
>>>>>>>
>>>>>>> (I don't think it's a problem with my local build, as I see the problem
>>>>>>> both on my laptop and on an LPAR.)
>>>>>> It looks to me like we forgot to check for the existence of the peer.
>>>>>>
>>>>>> Please try the attached patch to see if it works.
>>>>> Thanks, that patch gets my guest up and running again. So, FWIW,
>>>>>
>>>>> Tested-by: Cornelia Huck<cohuck@redhat.com>
>>>>>
>>>>> Any idea why this did not hit with virtio-net-pci (or the autogenerated
>>>>> virtio-net-ccw device)?
>>>> It can be hit with virtio-net-pci as well (just start without peer).
>>> Hm, I had not been able to reproduce the crash with a 'naked' -device
>>> virtio-net-pci. But checking seems to be the right idea anyway.
>> Sorry for being unclear. I meant that, for the networking part, you just need to
>> start without a peer, and you need a real guest (any Linux) that tries to access
>> the config space of virtio-net.
>>
>> Thanks
> A pxe guest will do it, but that doesn't support ccw, right?


Yes, it depends on the cli actually.


>
> I'm still unclear why this triggers with ccw but not pci -
> any idea?


I haven't tested PXE, but I can reproduce this with PCI (just start a Linux
guest without a peer).

Thanks





* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27 12:44                     ` Jason Wang
@ 2020-07-27 13:16                       ` Michael S. Tsirkin
  2020-07-28  4:10                         ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2020-07-27 13:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-s390x, Cornelia Huck, qemu-devel, Cindy Lu

On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
> 
> On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
> > On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
> > > On 2020/7/27 4:41 PM, Cornelia Huck wrote:
> > > > On Mon, 27 Jul 2020 15:38:12 +0800
> > > > Jason Wang<jasowang@redhat.com>  wrote:
> > > > 
> > > > > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
> > > > > > On Sat, 25 Jul 2020 08:40:07 +0800
> > > > > > Jason Wang<jasowang@redhat.com>  wrote:
> > > > > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
> > > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400
> > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
> > > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
> > > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
> > > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
> > > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> > > > > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e. adding
> > > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated device), I get
> > > > > > > > > > > > a segfault. gdb points to
> > > > > > > > > > > > 
> > > > > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
> > > > > > > > > > > >         config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> > > > > > > > > > > > 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > > > > > > > > > 
> > > > > > > > > > > > (backtrace doesn't go further)
> > > > > > > > > > The core was incomplete, but running under gdb directly shows that it
> > > > > > > > > > is just a bog-standard config space access (first for that device).
> > > > > > > > > > 
> > > > > > > > > > The cause of the crash is that nc->peer is not set... no idea how that
> > > > > > > > > > can happen, not that familiar with that part of QEMU. (Should the code
> > > > > > > > > > check, or is that really something that should not happen?)
> > > > > > > > > > 
> > > > > > > > > > What I don't understand is why it is set correctly for the first,
> > > > > > > > > > autogenerated virtio-net-ccw device, but not for the second one, and
> > > > > > > > > > why virtio-net-pci doesn't show these problems. The only difference
> > > > > > > > > > between -ccw and -pci that comes to my mind here is that config space
> > > > > > > > > > accesses for ccw are done via an asynchronous operation, so timing
> > > > > > > > > > might be different.
> > > > > > > > > Hopefully Jason has an idea. Could you post a full command line
> > > > > > > > > please? Do you need a working guest to trigger this? Does this trigger
> > > > > > > > > on an x86 host?
> > > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
> > > > > > > > 
> > > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
> > > > > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
> > > > > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
> > > > > > > > -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
> > > > > > > > -device virtio-net-ccw
> > > > > > > > 
> > > > > > > > It seems it needs the guest actually doing something with the nics; I
> > > > > > > > cannot reproduce the crash if I use the old advent calendar moon buggy
> > > > > > > > image and just add a virtio-net-ccw device.
> > > > > > > > 
> > > > > > > > (I don't think it's a problem with my local build, as I see the problem
> > > > > > > > both on my laptop and on an LPAR.)
> > > > > > > It looks to me like we forgot to check for the existence of the peer.
> > > > > > > 
> > > > > > > Please try the attached patch to see if it works.
> > > > > > Thanks, that patch gets my guest up and running again. So, FWIW,
> > > > > > 
> > > > > > Tested-by: Cornelia Huck<cohuck@redhat.com>
> > > > > > 
> > > > > > Any idea why this did not hit with virtio-net-pci (or the autogenerated
> > > > > > virtio-net-ccw device)?
> > > > > It can be hit with virtio-net-pci as well (just start without peer).
> > > > Hm, I had not been able to reproduce the crash with a 'naked' -device
> > > > virtio-net-pci. But checking seems to be the right idea anyway.
> > > Sorry for being unclear. I meant that, for the networking part, you just need to
> > > start without a peer, and you need a real guest (any Linux) that tries to access
> > > the config space of virtio-net.
> > > 
> > > Thanks
> > A pxe guest will do it, but that doesn't support ccw, right?
> 
> 
> Yes, it depends on the cli actually.
> 
> 
> > 
> > I'm still unclear why this triggers with ccw but not pci -
> > any idea?
> 
> 
> I haven't tested PXE, but I can reproduce this with PCI (just start a Linux guest
> without a peer).
> 
> Thanks
> 

Might be a good addition to a unit test. Not sure what the test
would do exactly: just make sure the guest runs? Looks like a lot of work
for an empty test ... maybe we can poke at the guest config with
qtest commands at least.

-- 
MST




* Re: [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
  2020-07-27 13:16                       ` Michael S. Tsirkin
@ 2020-07-28  4:10                         ` Jason Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2020-07-28  4:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-s390x, Cornelia Huck, qemu-devel, Cindy Lu


On 2020/7/27 9:16 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
>> On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
>>> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
>>>> On 2020/7/27 4:41 PM, Cornelia Huck wrote:
>>>>> On Mon, 27 Jul 2020 15:38:12 +0800
>>>>> Jason Wang<jasowang@redhat.com>  wrote:
>>>>>
>>>>>> On 2020/7/27 2:43 PM, Cornelia Huck wrote:
>>>>>>> On Sat, 25 Jul 2020 08:40:07 +0800
>>>>>>> Jason Wang<jasowang@redhat.com>  wrote:
>>>>>>>> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
>>>>>>>>> On Fri, 24 Jul 2020 11:17:57 -0400
>>>>>>>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>>>>>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
>>>>>>>>>>> On Fri, 24 Jul 2020 09:30:58 -0400
>>>>>>>>>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>>>>>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
>>>>>>>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
>>>>>>>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
>>>>>>>>>>>>> a segfault. gdb points to
>>>>>>>>>>>>>
>>>>>>>>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
>>>>>>>>>>>>>          config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
>>>>>>>>>>>>> 146	    if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> (backtrace doesn't go further)
>>>>>>>>>>> The core was incomplete, but running under gdb directly shows that it
>>>>>>>>>>> is just a bog-standard config space access (first for that device).
>>>>>>>>>>>
>>>>>>>>>>> The cause of the crash is that nc->peer is not set... no idea how that
>>>>>>>>>>> can happen, not that familiar with that part of QEMU. (Should the code
>>>>>>>>>>> check, or is that really something that should not happen?)
>>>>>>>>>>>
>>>>>>>>>>> What I don't understand is why it is set correctly for the first,
>>>>>>>>>>> autogenerated virtio-net-ccw device, but not for the second one, and
>>>>>>>>>>> why virtio-net-pci doesn't show these problems. The only difference
>>>>>>>>>>> between -ccw and -pci that comes to my mind here is that config space
>>>>>>>>>>> accesses for ccw are done via an asynchronous operation, so timing
>>>>>>>>>>> might be different.
>>>>>>>>>> Hopefully Jason has an idea. Could you post a full command line
>>>>>>>>>> please? Do you need a working guest to trigger this? Does this trigger
>>>>>>>>>> on an x86 host?
>>>>>>>>> Yes, it does trigger with tcg-on-x86 as well. I've been using
>>>>>>>>>
>>>>>>>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
>>>>>>>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
>>>>>>>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
>>>>>>>>> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>>>>>>>>> -device virtio-net-ccw
>>>>>>>>>
>>>>>>>>> It seems it needs the guest actually doing something with the nics; I
>>>>>>>>> cannot reproduce the crash if I use the old advent calendar moon buggy
>>>>>>>>> image and just add a virtio-net-ccw device.
>>>>>>>>>
>>>>>>>>> (I don't think it's a problem with my local build, as I see the problem
>>>>>>>>> both on my laptop and on an LPAR.)
>>>>>>>> It looks to me like we forgot to check for the existence of the peer.
>>>>>>>>
>>>>>>>> Please try the attached patch to see if it works.
>>>>>>> Thanks, that patch gets my guest up and running again. So, FWIW,
>>>>>>>
>>>>>>> Tested-by: Cornelia Huck<cohuck@redhat.com>
>>>>>>>
>>>>>>> Any idea why this did not hit with virtio-net-pci (or the autogenerated
>>>>>>> virtio-net-ccw device)?
>>>>>> It can be hit with virtio-net-pci as well (just start without peer).
>>>>> Hm, I had not been able to reproduce the crash with a 'naked' -device
>>>>> virtio-net-pci. But checking seems to be the right idea anyway.
>>>> Sorry for being unclear. I meant that, for the networking part, you just need to
>>>> start without a peer, and you need a real guest (any Linux) that tries to access
>>>> the config space of virtio-net.
>>>>
>>>> Thanks
>>> A pxe guest will do it, but that doesn't support ccw, right?
>>
>> Yes, it depends on the cli actually.
>>
>>
>>> I'm still unclear why this triggers with ccw but not pci -
>>> any idea?
>>
>> I haven't tested PXE, but I can reproduce this with PCI (just start a Linux guest
>> without a peer).
>>
>> Thanks
>>
> Might be a good addition to a unit test. Not sure what the test
> would do exactly: just make sure the guest runs? Looks like a lot of work
> for an empty test ... maybe we can poke at the guest config with
> qtest commands at least.


That should work, or we can simply extend the existing virtio-net qtest to
do that.

Thanks


>




end of thread, other threads:[~2020-07-28  4:11 UTC | newest]

Thread overview: 14+ messages
2020-07-24 13:27 [BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device Cornelia Huck
2020-07-24 13:30 ` Michael S. Tsirkin
2020-07-24 14:56   ` Cornelia Huck
2020-07-24 15:17     ` Michael S. Tsirkin
2020-07-24 15:34       ` Cornelia Huck
2020-07-25  0:40         ` Jason Wang
2020-07-27  6:43           ` Cornelia Huck
2020-07-27  7:38             ` Jason Wang
2020-07-27  8:41               ` Cornelia Huck
2020-07-27  8:51                 ` Jason Wang
2020-07-27 11:43                   ` Michael S. Tsirkin
2020-07-27 12:44                     ` Jason Wang
2020-07-27 13:16                       ` Michael S. Tsirkin
2020-07-28  4:10                         ` Jason Wang
