From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH net-next V4 5/5] vhost: access vq metadata through kernel virtual address
Date: Wed, 23 Jan 2019 23:53:47 -0500 [thread overview]
Message-ID: <20190123235219-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <4521d3d8-561e-53f5-98e1-bf7ace003701@redhat.com>
On Thu, Jan 24, 2019 at 12:11:28PM +0800, Jason Wang wrote:
>
> On 2019/1/24 12:07 PM, Jason Wang wrote:
> >
> > On 2019/1/23 10:08 PM, Michael S. Tsirkin wrote:
> > > On Wed, Jan 23, 2019 at 05:55:57PM +0800, Jason Wang wrote:
> > > > It was noticed that the copy_user() friends used to access
> > > > virtqueue metadata tend to be very expensive for dataplane
> > > > implementations like vhost, since they involve lots of software
> > > > checks, speculation barriers and hardware feature toggling (e.g.
> > > > SMAP). The extra cost is more obvious when transferring small
> > > > packets, since the time spent on metadata access becomes more
> > > > significant.
> > > >
> > > > This patch tries to eliminate those overheads by accessing the
> > > > metadata through kernel virtual addresses obtained via vmap(). To
> > > > keep the pages migratable, instead of pinning them through GUP we
> > > > use MMU notifiers to invalidate the vmaps and re-establish them
> > > > during each round of metadata prefetching if necessary. For
> > > > devices that don't use metadata prefetching, the memory accessors
> > > > gracefully fall back to the normal copy_user() implementation.
> > > > The invalidation is synchronized with the datapath through the vq
> > > > mutex, and in order to avoid holding the vq mutex during range
> > > > checking, the MMU notifier is torn down when trying to modify vq
> > > > metadata.
> > > >
> > > > Another issue is that the kernel lacks an efficient solution for
> > > > tracking pages dirtied through vmap(); this would cause problems
> > > > if vhost were using file-backed memory that needs writeback care.
> > > > This patch solves the issue by simply skipping file-backed VMAs
> > > > and falling back to the normal copy_user() friends. This might
> > > > introduce some overhead for file-backed users, but considering
> > > > that this use case is rare we can optimize it on top.
> > > >
> > > > Note that this is only done when the device IOTLB is not enabled.
> > > > We could use a similar method to optimize that case in the
> > > > future.
> > > >
> > > > Tests show at most about a 22% improvement in TX PPS when using
> > > > virtio-user + vhost_net + xdp1 + TAP on a 2.6GHz Broadwell:
> > > >
> > > >         SMAP on | SMAP off
> > > > Before: 5.0Mpps | 6.6Mpps
> > > > After:  6.1Mpps | 7.4Mpps
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > > So this is the bulk of the change.
> > > Three things that I need to look into:
> > > - Are there any security issues with bypassing the speculation barrier
> > > that is normally present after access_ok?
> >
> >
> > If we can make sure the bypass is only used in a kthread (vhost), it
> > should be fine, I think.
> >
> >
> > > - How hard does the special handling for
> > > file backed storage make testing?
> >
> >
> > It's as simple as un-commenting vhost_can_vmap()? Or I can try to
> > hack QEMU or DPDK to test this.
> >
> >
> > > On the one hand we could add a module parameter to
> > > force copy to/from user. on the other that's
> > > another configuration we need to support.
> >
> >
> > That sounds sub-optimal, since it leaves the choice to users.
> >
> >
> > > But iotlb is not using vmap, so maybe that's enough
> > > for testing.
> > > - How hard is it to figure out which mode uses which code.
>
>
> It's as simple as tracing __get_user() usage in the vhost process?
>
> Thanks
Well there are now MMU notifiers etc. It's hardly as well
contained as that.
>
> > >
> > >
> > >
> > > Meanwhile, could you pls post data comparing this last patch with the
> > > below? This removes the speculation barrier replacing it with a
> > > (useless but at least more lightweight) data dependency.
> >
> >
> > SMAP off
> >
> > Your patch: 7.2Mpps
> >
> > vmap: 7.4Mpps
> >
> > I didn't test with SMAP on, since it will be much slower for sure.
> >
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bac939af8dbb..352ee7e14476 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -739,7 +739,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
> > > int ret;
> > > if (!vq->iotlb)
> > > - return __copy_to_user(to, from, size);
> > > + return copy_to_user(to, from, size);
> > > else {
> > > /* This function should be called after iotlb
> > > * prefetch, which means we're sure that all vq
> > > @@ -752,7 +752,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
> > > VHOST_ADDR_USED);
> > > if (uaddr)
> > > - return __copy_to_user(uaddr, from, size);
> > > + return copy_to_user(uaddr, from, size);
> > > ret = translate_desc(vq, (u64)(uintptr_t)to, size,
> > > vq->iotlb_iov,
> > > ARRAY_SIZE(vq->iotlb_iov),
> > > @@ -774,7 +774,7 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
> > > int ret;
> > > if (!vq->iotlb)
> > > - return __copy_from_user(to, from, size);
> > > + return copy_from_user(to, from, size);
> > > else {
> > > /* This function should be called after iotlb
> > > * prefetch, which means we're sure that vq
> > > @@ -787,7 +787,7 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
> > > struct iov_iter f;
> > > if (uaddr)
> > > - return __copy_from_user(to, uaddr, size);
> > > + return copy_from_user(to, uaddr, size);
> > > ret = translate_desc(vq, (u64)(uintptr_t)from, size,
> > > vq->iotlb_iov,
> > > ARRAY_SIZE(vq->iotlb_iov),
> > > @@ -855,13 +855,13 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
> > > ({ \
> > > int ret = -EFAULT; \
> > > if (!vq->iotlb) { \
> > > - ret = __put_user(x, ptr); \
> > > + ret = put_user(x, ptr); \
> > > } else { \
> > > __typeof__(ptr) to = \
> > > (__typeof__(ptr)) __vhost_get_user(vq, ptr, \
> > > sizeof(*ptr), VHOST_ADDR_USED); \
> > > if (to != NULL) \
> > > - ret = __put_user(x, to); \
> > > + ret = put_user(x, to); \
> > > else \
> > > ret = -EFAULT; \
> > > } \
> > > @@ -872,14 +872,14 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
> > > ({ \
> > > int ret; \
> > > if (!vq->iotlb) { \
> > > - ret = __get_user(x, ptr); \
> > > + ret = get_user(x, ptr); \
> > > } else { \
> > > __typeof__(ptr) from = \
> > > (__typeof__(ptr)) __vhost_get_user(vq, ptr, \
> > > sizeof(*ptr), \
> > > type); \
> > > if (from != NULL) \
> > > - ret = __get_user(x, from); \
> > > + ret = get_user(x, from); \
> > > else \
> > > ret = -EFAULT; \
> > > } \
Thread overview: 22+ messages
2019-01-23 9:55 [PATCH net-next V4 0/5] vhost: accelerate metadata access through vmap() Jason Wang
2019-01-23 9:55 ` [PATCH net-next V4 1/5] vhost: generalize adding used elem Jason Wang
2019-01-23 9:55 ` [PATCH net-next V4 2/5] vhost: fine grain userspace memory accessors Jason Wang
2019-01-23 9:55 ` [PATCH net-next V4 3/5] vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch() Jason Wang
2019-01-23 9:55 ` [PATCH net-next V4 4/5] vhost: introduce helpers to get the size of metadata area Jason Wang
2019-01-23 9:55 ` [PATCH net-next V4 5/5] vhost: access vq metadata through kernel virtual address Jason Wang
2019-01-23 14:08 ` Michael S. Tsirkin
2019-01-24 4:07 ` Jason Wang
2019-01-24 4:11 ` Jason Wang
2019-01-24 4:53 ` Michael S. Tsirkin [this message]
2019-01-25 2:33 ` Jason Wang
2019-01-24 4:51 ` Michael S. Tsirkin
2019-01-25 3:00 ` Michael S. Tsirkin
2019-01-25 9:16 ` Jason Wang
2019-01-25 3:03 ` Michael S. Tsirkin
2019-01-25 9:21 ` Jason Wang
2019-01-25 9:24 ` Jason Wang
2019-01-23 13:58 ` [PATCH net-next V4 0/5] vhost: accelerate metadata access through vmap() Michael S. Tsirkin
2019-01-23 17:24 ` David Miller
2019-01-26 22:37 ` David Miller
2019-01-27 0:31 ` Michael S. Tsirkin
2019-01-29 2:34 ` Jason Wang