From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Michael S. Tsirkin" Subject: Re: [BUG/RFC] vhost: net: big endian viring access despite virtio 1 Date: Thu, 26 Jan 2017 21:20:03 +0200 Message-ID: <20170126211511-mutt-send-email-mst@kernel.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jason Wang , kvm@vger.kernel.org, Greg Kurz , "virtualization@lists.linux-foundation.org" , netdev@vger.kernel.org, Cornelia Huck To: Halil Pasic Return-path: Received: from mx1.redhat.com ([209.132.183.28]:49990 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752938AbdAZTUI (ORCPT ); Thu, 26 Jan 2017 14:20:08 -0500 Content-Disposition: inline In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote: > > Hi! > > Recently I have been investigating some strange migration problems on > s390x. > > It turned out under certain circumstances vhost_net corrupts avail.idx by > using wrong endianness. > > I managed to track the problem down (I'm pretty sure). It boils down to > the following. > > When stopping vhost userspace (QEMU) calls vhost_net_set_backend with > the fd argument set to -1, this leads to is_le being reset in > vhost_vq_init_access. On a BE system resetting to legacy means resetting > to BE. Usually this is not a problem, but in the case when oldubufs is not > zero (which is not likely if no network stress applied) it is a problem. > That code path needs to write avail.idx, and ends up using wrong > endianness when doing that (but only on a BE system). > > That is the story in prose, now let's see the corresponding code annotated > with some comments. > > from drivers/vhost/net.c: > static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) > { > /* [..] some not too interesting stuff */ > sock = get_socket(fd); > /* fd == -1 --> sock == NULL */ > if (IS_ERR(sock)) { > r = PTR_ERR(sock); > goto err_vq; > } > > /* start polling new socket */ > oldsock = vq->private_data; > if (sock != oldsock) { > ubufs = vhost_net_ubuf_alloc(vq, > sock && vhost_sock_zcopy(sock)); > if (IS_ERR(ubufs)) { > r = PTR_ERR(ubufs); > goto err_ubufs; > } > > vhost_net_disable_vq(n, vq); > ==> vq->private_data = sock; > /* now vq->private_data is NULL */ > ==> r = vhost_vq_init_access(vq); > if (r) > goto err_used; > /* vq endianness has been reset to BE on s390 */ > r = vhost_net_enable_vq(n, vq); > if (r) > goto err_used; > > ==> oldubufs = nvq->ubufs; > /* here oldubufs might become != 0 */ > nvq->ubufs = ubufs; > > n->tx_packets = 0; > n->tx_zcopy_err = 0; > n->tx_flush = false; > } > mutex_unlock(&vq->mutex); > > if (oldubufs) { > vhost_net_ubuf_put_wait_and_free(oldubufs); > mutex_lock(&vq->mutex); > ==> vhost_zerocopy_signal_used(n, vq); > /* tries to update virtqueue structures; endianness is BE on s390 > * now (but should be LE for virtio-1) */ > mutex_unlock(&vq->mutex); > } > /*[..] rest of the function */ > } > > from drivers/vhost/vhost.c: > > int vhost_vq_init_access(struct vhost_virtqueue *vq) > { > __virtio16 last_used_idx; > int r; > bool is_le = vq->is_le; > > if (!vq->private_data) { > ==> vhost_reset_is_le(vq); > /* resets to native endianness and returns */ > return 0; > } > > ==> vhost_init_is_le(vq); > /* here we init is_le */ > > r = vhost_update_used_flags(vq); > if (r) > goto err; > vq->signalled_used_valid = false; > if (!vq->iotlb && > !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) { > r = -EFAULT; > goto err; > } > r = vhost_get_user(vq, last_used_idx, &vq->used->idx); > if (r) { > vq_err(vq, "Can't access used idx at %p\n", > &vq->used->idx); > goto err; > } > vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); > return 0; > > err: > vq->is_le = is_le; > return r; > } > > AFAIU this can be fixed very simply by omitting the reset. Below the > patch. I'm not sure though, the endianness handling ain't simple in > vhost. Am I going in the right direction? > > We have a test (on s390x only) running for a couple of hours now and so > far so good but it's a bit early to announce it is tested for s390x. > If the patch is reasonable, I'm intend to do a version with proper > attribution and stuff. > > By the way the reset was first introduced by > https://lkml.org/lkml/2015/4/10/226 (dug it up in the hope that reasons > for the reset were discussed -- but no enlightenment for me). > > Finally I would like to credit Dave Gilbert for hinting that the > unreasonable avail.idx may be due to an endianness problem and Michael A. > Tebolt for reporting the bug and testing. > > -------------------------8<-------------- > >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 > From: Halil Pasic > Date: Thu, 26 Jan 2017 00:06:15 +0100 > Subject: [PATCH] vhost: remove useless/dangerous reset of is_le > > The reset of is_le does no good, but it contributes its fair share to a > bug in vhost_net, which occurs if we have some oldubufs when stopping and > setting a fd = -1 as a backend. Instead of doing something convoluted in > vhost_net, let's just get rid of the reset. > > Signed-off-by: Halil Pasic > Fixes: commit 2751c9882b94 > --- > drivers/vhost/vhost.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index d643260..08072a2 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) > int r; > bool is_le = vq->is_le; > > - if (!vq->private_data) { > - vhost_reset_is_le(vq); > + if (!vq->private_data) > return 0; > - } > > vhost_init_is_le(vq); I think you do need to reset it, just maybe within vhost_init_is_le. if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) vq->is_le = true; else vhost_reset_is_le(vq); > -- > 2.8.4