[RFC,V3,1/5] vhost: generalize adding used elem

Message ID 20181229124656.3900-2-jasowang@redhat.com
State New, archived

Commit Message

Jason Wang Dec. 29, 2018, 12:46 p.m. UTC
Use one generic vhost_copy_to_user() instead of two dedicated
accessors. This will simplify the conversion to fine-grained
accessors. About a 2% improvement in PPS was seen during a
virtio-user txonly test.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Comments

Michael S. Tsirkin Jan. 4, 2019, 9:29 p.m. UTC | #1
On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote:
> Use one generic vhost_copy_to_user() instead of two dedicated
> accessor. This will simplify the conversion to fine grain
> accessors. About 2% improvement of PPS were seen during vitio-user
> txonly test.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I don't have a problem with this patch, but do you have
any idea how come removing what's supposed to be
an optimization speeds things up?

> ---
>  drivers/vhost/vhost.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 55e5aa662ad5..f179b5ee14c4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2174,16 +2174,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>  
>  	start = vq->last_used_idx & (vq->num - 1);
>  	used = vq->used->ring + start;
> -	if (count == 1) {
> -		if (vhost_put_user(vq, heads[0].id, &used->id)) {
> -			vq_err(vq, "Failed to write used id");
> -			return -EFAULT;
> -		}
> -		if (vhost_put_user(vq, heads[0].len, &used->len)) {
> -			vq_err(vq, "Failed to write used len");
> -			return -EFAULT;
> -		}
> -	} else if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
> +	if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
>  		vq_err(vq, "Failed to write used");
>  		return -EFAULT;
>  	}
> -- 
> 2.17.1
Sean Christopherson Jan. 5, 2019, 12:33 a.m. UTC | #2
On Fri, Jan 04, 2019 at 04:29:34PM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote:
> > Use one generic vhost_copy_to_user() instead of two dedicated
> > accessor. This will simplify the conversion to fine grain
> > accessors. About 2% improvement of PPS were seen during vitio-user
> > txonly test.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> I don't hve a problem with this patch but do you have
> any idea how come removing what's supposed to be
> an optimization speeds things up?

With SMAP, the 2x vhost_put_user() will also mean an extra STAC/CLAC pair,
which is probably slower than the overhead of CALL+RET to whatever flavor
of copy_user_generic() gets used.  CALL+RET is really the only overhead
since all variants of copy_user_generic() unroll accesses smaller than
64 bytes, e.g. on a 64-bit system, __copy_to_user() will write all 8
bytes in a single MOV.

Removing the special casing also eliminates a few hundred bytes of code
as well as the need for hardware to predict count==1 vs. count>1.

> 
> > ---
> >  drivers/vhost/vhost.c | 11 +----------
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 55e5aa662ad5..f179b5ee14c4 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -2174,16 +2174,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
> >  
> >  	start = vq->last_used_idx & (vq->num - 1);
> >  	used = vq->used->ring + start;
> > -	if (count == 1) {
> > -		if (vhost_put_user(vq, heads[0].id, &used->id)) {
> > -			vq_err(vq, "Failed to write used id");
> > -			return -EFAULT;
> > -		}
> > -		if (vhost_put_user(vq, heads[0].len, &used->len)) {
> > -			vq_err(vq, "Failed to write used len");
> > -			return -EFAULT;
> > -		}
> > -	} else if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
> > +	if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
> >  		vq_err(vq, "Failed to write used");
> >  		return -EFAULT;
> >  	}
> > -- 
> > 2.17.1
Jason Wang Jan. 7, 2019, 7 a.m. UTC | #3
On 2019/1/5 8:33 AM, Sean Christopherson wrote:
> On Fri, Jan 04, 2019 at 04:29:34PM -0500, Michael S. Tsirkin wrote:
>> On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote:
>>> Use one generic vhost_copy_to_user() instead of two dedicated
>>> accessor. This will simplify the conversion to fine grain
>>> accessors. About 2% improvement of PPS were seen during vitio-user
>>> txonly test.
>>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> I don't hve a problem with this patch but do you have
>> any idea how come removing what's supposed to be
>> an optimization speeds things up?
> With SMAP, the 2x vhost_put_user() will also mean an extra STAC/CLAC pair,
> which is probably slower than the overhead of CALL+RET to whatever flavor
> of copy_user_generic() gets used.  CALL+RET is really the only overhead
> since all variants of copy_user_generic() unroll accesses smaller than
> 64 bytes, e.g. on a 64-bit system, __copy_to_user() will write all 8
> bytes in a single MOV.
>
> Removing the special casing also eliminates a few hundred bytes of code
> as well as the need for hardware to predict count==1 vs. count>1.
>

Yes, I didn't measure it, but judging from the nosmap PPS results,
STAC/CLAC is pretty expensive when we do very small copies.

Thanks
Michael S. Tsirkin Jan. 7, 2019, 2:50 p.m. UTC | #4
On Mon, Jan 07, 2019 at 03:00:17PM +0800, Jason Wang wrote:
> 
> On 2019/1/5 8:33 AM, Sean Christopherson wrote:
> > On Fri, Jan 04, 2019 at 04:29:34PM -0500, Michael S. Tsirkin wrote:
> > > On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote:
> > > > Use one generic vhost_copy_to_user() instead of two dedicated
> > > > accessor. This will simplify the conversion to fine grain
> > > > accessors. About 2% improvement of PPS were seen during vitio-user
> > > > txonly test.
> > > > 
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > I don't hve a problem with this patch but do you have
> > > any idea how come removing what's supposed to be
> > > an optimization speeds things up?
> > With SMAP, the 2x vhost_put_user() will also mean an extra STAC/CLAC pair,
> > which is probably slower than the overhead of CALL+RET to whatever flavor
> > of copy_user_generic() gets used.  CALL+RET is really the only overhead
> > since all variants of copy_user_generic() unroll accesses smaller than
> > 64 bytes, e.g. on a 64-bit system, __copy_to_user() will write all 8
> > bytes in a single MOV.
> > 
> > Removing the special casing also eliminates a few hundred bytes of code
> > as well as the need for hardware to predict count==1 vs. count>1.
> > 
> 
> Yes, I don't measure, but STAC/CALC is pretty expensive when we are do very
> small copies based on the result of nosmap PPS.
> 
> Thanks

Yes, all this really looks like a poster child for uaccess_begin/end
plus unsafe accesses. And if these APIs don't do the job for us,
then maybe better ones are needed ...

Patch

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 55e5aa662ad5..f179b5ee14c4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2174,16 +2174,7 @@  static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 
 	start = vq->last_used_idx & (vq->num - 1);
 	used = vq->used->ring + start;
-	if (count == 1) {
-		if (vhost_put_user(vq, heads[0].id, &used->id)) {
-			vq_err(vq, "Failed to write used id");
-			return -EFAULT;
-		}
-		if (vhost_put_user(vq, heads[0].len, &used->len)) {
-			vq_err(vq, "Failed to write used len");
-			return -EFAULT;
-		}
-	} else if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
+	if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
 		vq_err(vq, "Failed to write used");
 		return -EFAULT;
 	}