From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Jintack Lim <jintack@cs.columbia.edu>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: Re: [PATCH net V2 4/4] vhost: log dirty page correctly
Date: Wed, 12 Dec 2018 09:32:45 -0500
Message-ID: <20181212092435-mutt-send-email-mst__46964.7169964903$1544625046$gmane$org@kernel.org>
In-Reply-To: <20181212100819.21295-5-jasowang@redhat.com>

On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> Vhost dirty page logging API is designed to sync through GPA. But we
> try to log GIOVA when device IOTLB is enabled. This is wrong and may
> lead to missing data after migration.
> 
> To solve this issue, when logging with device IOTLB enabled, we will:
> 
> 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
>    get HVA, for writable descriptor, get HVA through iovec. For used
>    ring update, translate its GIOVA to HVA
> 2) traverse the GPA->HVA mapping to get the possible GPA and log
>    through GPA. Pay attention this reverse mapping is not guaranteed
>    to be unique, so we should log each possible GPA in this case.
> 
> This fix the failure of scp to guest during migration. In -next, we
> will probably support passing GIOVA->GPA instead of GIOVA->HVA.
> 
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Jintack Lim <jintack@cs.columbia.edu>
> Cc: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

It's a nasty bug for sure, but it's been like this for a long
time, so I'm inclined to say let's put it in 4.21
and queue it for stable.

So please split this out from this series.

Also, I'd like to see a feature bit that allows GPA in IOTLBs.

> ---
>  drivers/vhost/net.c   |  3 +-
>  drivers/vhost/vhost.c | 79 +++++++++++++++++++++++++++++++++++--------
>  drivers/vhost/vhost.h |  3 +-
>  3 files changed, 69 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ad7a6f475a44..784df2b49628 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1192,7 +1192,8 @@ static void handle_rx(struct vhost_net *net)
>  		if (nvq->done_idx > VHOST_NET_BATCH)
>  			vhost_net_signal_used(nvq);
>  		if (unlikely(vq_log))
> -			vhost_log_write(vq, vq_log, log, vhost_len);
> +			vhost_log_write(vq, vq_log, log, vhost_len,
> +					vq->iov, in);
>  		total_len += vhost_len;
>  		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
>  			vhost_poll_queue(&vq->poll);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 55e5aa662ad5..3660310604fd 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1733,11 +1733,67 @@ static int log_write(void __user *log_base,
>  	return r;
>  }
>  
> +static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
> +{
> +	struct vhost_umem *umem = vq->umem;
> +	struct vhost_umem_node *u;
> +	u64 gpa;
> +	int r;
> +	bool hit = false;
> +
> +	list_for_each_entry(u, &umem->umem_list, link) {
> +		if (u->userspace_addr < hva &&
> +		    u->userspace_addr + u->size >=
> +		    hva + len) {
> +			gpa = u->start + hva - u->userspace_addr;
> +			r = log_write(vq->log_base, gpa, len);
> +			if (r < 0)
> +				return r;
> +			hit = true;
> +		}
> +	}
> +
> +	/* No reverse mapping, should be a bug */
> +	WARN_ON(!hit);

Maybe it should be, but I think userspace can trigger this easily.
We need to stop the device, not warn in the kernel log.

Also, there's an error fd (VHOST_SET_VRING_ERR); we need to wake it up.
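
To make the point concrete, here is a minimal userspace sketch of a reverse
HVA->GPA lookup that reports a miss to its caller instead of warning. It is
an illustration under stated assumptions, not the vhost implementation:
struct region, struct dirty_log, log_gpa() and log_hva() are hypothetical
stand-ins for the umem list, the dirty log and log_write_hva(). The sketch
also treats a range that starts exactly at a region's first byte as covered,
and only handles ranges that fit inside a single region.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOGGED 8

/* Hypothetical stand-in for one GPA->HVA region (not struct vhost_umem_node). */
struct region {
	uint64_t gpa_start;      /* guest physical start address */
	uint64_t size;           /* length in bytes */
	uint64_t userspace_addr; /* HVA the region is mapped at */
};

/* Hypothetical log sink: records each GPA range that would be marked dirty. */
struct dirty_log {
	uint64_t gpa[MAX_LOGGED];
	uint64_t len[MAX_LOGGED];
	size_t n;
};

static int log_gpa(struct dirty_log *sink, uint64_t gpa, uint64_t len)
{
	assert(sink != NULL);
	if (sink->n >= MAX_LOGGED)
		return -ENOSPC;
	sink->gpa[sink->n] = gpa;
	sink->len[sink->n] = len;
	sink->n++;
	return 0;
}

/*
 * Log every GPA alias of [hva, hva + len). Returns 0 on success and
 * -EFAULT when no region covers the range, so the caller can fail the
 * request rather than warn.
 */
static int log_hva(const struct region *regions, size_t nr,
		   struct dirty_log *sink, uint64_t hva, uint64_t len)
{
	int hit = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct region *r = &regions[i];
		uint64_t off;
		int ret;

		if (r->userspace_addr > hva)
			continue;
		off = hva - r->userspace_addr;
		/* Overflow-safe check that the whole range fits in the region. */
		if (off > r->size || len > r->size - off)
			continue;

		ret = log_gpa(sink, r->gpa_start + off, len);
		if (ret < 0)
			return ret;
		hit = 1;
	}

	return hit ? 0 : -EFAULT;
}
```

What the caller does with the error (stopping the device, signalling the
error fd) is deliberately left out of the sketch.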


> +	return 0;
> +}
> +
> +static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
> +{
> +	struct iovec iov[64];
> +	int i, ret;
> +
> +	if (!vq->iotlb) {
> +		log_write(vq->log_base, vq->log_addr + used_offset, len);
> +		return;
> +	}

This change seems questionable. Used ring writes
use their own machinery; they do not go through the IOTLB.
The same should apply to the log, I think.

> +
> +	ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset,
> +			     len, iov, 64, VHOST_ACCESS_WO);
> +	WARN_ON(ret < 0);


Same thing here: translation failures can be triggered from the guest.
WARN_ON is not a good error-handling strategy ...
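
As a rough illustration of propagating the failure instead of warning, the
userspace sketch below translates a range into segments and logs each one,
returning the first error to the caller. Everything in it is hypothetical:
struct seg, the translate_fn and log_fn callback types, log_used_range() and
the three small stubs exist only for this example and merely stand in for
translate_desc() and log_write_hva(). The segment count is kept in its own
variable, separate from the per-segment result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 64

/* Hypothetical translated segment: a host virtual address and a length. */
struct seg {
	uint64_t hva;
	uint64_t len;
};

/* Hypothetical callbacks standing in for translate_desc()/log_write_hva(). */
typedef int (*translate_fn)(uint64_t addr, uint64_t len,
			    struct seg *out, int max);
typedef int (*log_fn)(uint64_t hva, uint64_t len);

/*
 * Translate [addr, addr + len) and log every resulting segment.
 * Returns 0 on success or the first negative error encountered.
 */
static int log_used_range(translate_fn translate, log_fn do_log,
			  uint64_t addr, uint64_t len)
{
	struct seg segs[MAX_SEGS];
	int nseg, i, r;

	nseg = translate(addr, len, segs, MAX_SEGS);
	if (nseg < 0)
		return nseg; /* guest-triggerable: report it, do not warn */

	for (i = 0; i < nseg; i++) {
		r = do_log(segs[i].hva, segs[i].len);
		if (r < 0)
			return r;
	}
	return 0;
}

/* Example stubs, used only to exercise the sketch. */
static int segments_logged;

static int split_in_two(uint64_t addr, uint64_t len, struct seg *out, int max)
{
	if (max < 2)
		return -ENOBUFS;
	out[0].hva = addr;
	out[0].len = len / 2;
	out[1].hva = addr + len / 2;
	out[1].len = len - len / 2;
	return 2;
}

static int always_fault(uint64_t addr, uint64_t len, struct seg *out, int max)
{
	(void)addr; (void)len; (void)out; (void)max;
	return -EFAULT;
}

static int count_log(uint64_t hva, uint64_t len)
{
	(void)hva; (void)len;
	segments_logged++;
	return 0;
}
```

With these stubs, log_used_range(split_in_two, count_log, ...) logs both
segments, while log_used_range(always_fault, count_log, ...) returns -EFAULT
without logging anything.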

> +
> +	for (i = 0; i < ret; i++) {
> +		ret = log_write_hva(vq,	(u64)(uintptr_t)iov[i].iov_base,
> +				    iov[i].iov_len);
> +		WARN_ON(ret);
> +	}
> +}
> +
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> -		    unsigned int log_num, u64 len)
> +		    unsigned int log_num, u64 len, struct iovec *iov, int count)
>  {
>  	int i, r;
>  
> +	if (vq->iotlb) {
> +		for (i = 0; i < count; i++) {
> +			r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
> +					  iov[i].iov_len);
> +			if (r < 0)
> +				return r;
> +		}
> +		return 0;
> +	}
> +
>  	/* Make sure data written is seen before log. */
>  	smp_wmb();
>  	for (i = 0; i < log_num; ++i) {
> @@ -1769,9 +1825,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  		smp_wmb();
>  		/* Log used flag write. */
>  		used = &vq->used->flags;
> -		log_write(vq->log_base, vq->log_addr +
> -			  (used - (void __user *)vq->used),
> -			  sizeof vq->used->flags);
> +		log_used(vq, (used - (void __user *)vq->used),
> +			 sizeof vq->used->flags);
>  		if (vq->log_ctx)
>  			eventfd_signal(vq->log_ctx, 1);
>  	}
> @@ -1789,9 +1844,8 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  		smp_wmb();
>  		/* Log avail event write */
>  		used = vhost_avail_event(vq);
> -		log_write(vq->log_base, vq->log_addr +
> -			  (used - (void __user *)vq->used),
> -			  sizeof *vhost_avail_event(vq));
> +		log_used(vq, (used - (void __user *)vq->used),
> +			 sizeof *vhost_avail_event(vq));
>  		if (vq->log_ctx)
>  			eventfd_signal(vq->log_ctx, 1);
>  	}
> @@ -2191,10 +2245,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>  		/* Make sure data is seen before log. */
>  		smp_wmb();
>  		/* Log used ring entry write. */
> -		log_write(vq->log_base,
> -			  vq->log_addr +
> -			   ((void __user *)used - (void __user *)vq->used),
> -			  count * sizeof *used);
> +		log_used(vq, ((void __user *)used - (void __user *)vq->used),
> +			 count * sizeof *used);
>  	}
>  	old = vq->last_used_idx;
>  	new = (vq->last_used_idx += count);
> @@ -2236,9 +2288,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
>  		/* Make sure used idx is seen before log. */
>  		smp_wmb();
>  		/* Log used index update. */
> -		log_write(vq->log_base,
> -			  vq->log_addr + offsetof(struct vring_used, idx),
> -			  sizeof vq->used->idx);
> +		log_used(vq, offsetof(struct vring_used, idx),
> +			 sizeof vq->used->idx);
>  		if (vq->log_ctx)
>  			eventfd_signal(vq->log_ctx, 1);
>  	}
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 466ef7542291..1b675dad5e05 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -205,7 +205,8 @@ bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
>  bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
>  
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> -		    unsigned int log_num, u64 len);
> +		    unsigned int log_num, u64 len,
> +		    struct iovec *iov, int count);
>  int vq_iotlb_prefetch(struct vhost_virtqueue *vq);
>  
>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);
> -- 
> 2.17.1
