netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] uaccess: user_access_begin_after_access_ok()
@ 2020-06-02  8:45 Michael S. Tsirkin
  2020-06-02 10:15 ` Jason Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-02  8:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jason Wang, netdev

So vhost needs to poke at userspace *a lot* in a quick succession.  It
is thus benefitial to enable userspace access, do our thing, then
disable. Except access_ok has already been pre-validated with all the
relevant nospec checks, so we don't need that.  Add an API to allow
userspace access after access_ok and barrier_nospec are done.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Jason, so I've been thinking using something along these lines,
then switching vhost to use unsafe_copy_to_user and friends would
solve lots of problems you observed with SMAP.

What do you think? Do we need any other APIs to make it practical?

 arch/x86/include/asm/uaccess.h | 1 +
 include/linux/uaccess.h        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index d8f283b9a569..fa5afb3a54fe 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -483,6 +483,7 @@ static __must_check __always_inline bool user_access_begin(const void __user *pt
 	return 1;
 }
 #define user_access_begin(a,b)	user_access_begin(a,b)
+#define user_access_begin_after_access_ok()	__uaccess_begin()
 #define user_access_end()	__uaccess_end()
 
 #define user_access_save()	smap_save()
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad..4c0a959ad639 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -370,6 +370,7 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 
 #ifndef user_access_begin
 #define user_access_begin(ptr,len) access_ok(ptr, len)
+#define user_access_begin_after_access_ok() do { } while (0)
 #define user_access_end() do { } while (0)
 #define unsafe_op_wrap(op, err) do { if (unlikely(op)) goto err; } while (0)
 #define unsafe_get_user(x,p,e) unsafe_op_wrap(__get_user(x,p),e)
-- 
MST


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02  8:45 [PATCH RFC] uaccess: user_access_begin_after_access_ok() Michael S. Tsirkin
@ 2020-06-02 10:15 ` Jason Wang
  2020-06-02 16:33   ` Al Viro
  2020-06-02 16:30 ` Al Viro
  2020-06-03  1:48 ` Al Viro
  2 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-02 10:15 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: netdev


On 2020/6/2 下午4:45, Michael S. Tsirkin wrote:
> So vhost needs to poke at userspace *a lot* in a quick succession.  It
> is thus benefitial to enable userspace access, do our thing, then
> disable. Except access_ok has already been pre-validated with all the
> relevant nospec checks, so we don't need that.  Add an API to allow
> userspace access after access_ok and barrier_nospec are done.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Jason, so I've been thinking using something along these lines,
> then switching vhost to use unsafe_copy_to_user and friends would
> solve lots of problems you observed with SMAP.
>
> What do you think?


I'm fine with this approach.


>   Do we need any other APIs to make it practical?


It's not clear whether we need a new API, I think __uaccess_being() has 
the assumption that the address has been validated by access_ok().

Thanks


>
>   arch/x86/include/asm/uaccess.h | 1 +
>   include/linux/uaccess.h        | 1 +
>   2 files changed, 2 insertions(+)
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index d8f283b9a569..fa5afb3a54fe 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -483,6 +483,7 @@ static __must_check __always_inline bool user_access_begin(const void __user *pt
>   	return 1;
>   }
>   #define user_access_begin(a,b)	user_access_begin(a,b)
> +#define user_access_begin_after_access_ok()	__uaccess_begin()
>   #define user_access_end()	__uaccess_end()
>   
>   #define user_access_save()	smap_save()
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 67f016010aad..4c0a959ad639 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -370,6 +370,7 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
>   
>   #ifndef user_access_begin
>   #define user_access_begin(ptr,len) access_ok(ptr, len)
> +#define user_access_begin_after_access_ok() do { } while (0)
>   #define user_access_end() do { } while (0)
>   #define unsafe_op_wrap(op, err) do { if (unlikely(op)) goto err; } while (0)
>   #define unsafe_get_user(x,p,e) unsafe_op_wrap(__get_user(x,p),e)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02  8:45 [PATCH RFC] uaccess: user_access_begin_after_access_ok() Michael S. Tsirkin
  2020-06-02 10:15 ` Jason Wang
@ 2020-06-02 16:30 ` Al Viro
  2020-06-02 20:42   ` Michael S. Tsirkin
  2020-06-03  1:48 ` Al Viro
  2 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-02 16:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, Jason Wang, netdev

On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> So vhost needs to poke at userspace *a lot* in a quick succession.  It
> is thus benefitial to enable userspace access, do our thing, then
> disable. Except access_ok has already been pre-validated with all the
> relevant nospec checks, so we don't need that.  Add an API to allow
> userspace access after access_ok and barrier_nospec are done.

This is the wrong way to do it, and this API is certain to be abused
elsewhere.  NAK - we need to sort out vhost-related problems, but
this is not an acceptable solution.  Sorry.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 10:15 ` Jason Wang
@ 2020-06-02 16:33   ` Al Viro
  2020-06-02 17:18     ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-02 16:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, linux-kernel, netdev, Linus Torvalds

On Tue, Jun 02, 2020 at 06:15:57PM +0800, Jason Wang wrote:
> 
> On 2020/6/2 下午4:45, Michael S. Tsirkin wrote:
> > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > is thus benefitial to enable userspace access, do our thing, then
> > disable. Except access_ok has already been pre-validated with all the
> > relevant nospec checks, so we don't need that.  Add an API to allow
> > userspace access after access_ok and barrier_nospec are done.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Jason, so I've been thinking using something along these lines,
> > then switching vhost to use unsafe_copy_to_user and friends would
> > solve lots of problems you observed with SMAP.
> > 
> > What do you think?
> 
> 
> I'm fine with this approach.

I am not.

> >   Do we need any other APIs to make it practical?
> 
> 
> It's not clear whether we need a new API, I think __uaccess_being() has the
> assumption that the address has been validated by access_ok().

__uaccess_begin() is a stopgap, not a public API.

The problem is real, but "let's add a public API that would do user_access_begin()
with access_ok() already done" is no-go.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 16:33   ` Al Viro
@ 2020-06-02 17:18     ` Linus Torvalds
  2020-06-02 17:44       ` Al Viro
  2020-06-02 20:32       ` Michael S. Tsirkin
  0 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2020-06-02 17:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Jason Wang, Michael S. Tsirkin, Linux Kernel Mailing List, Netdev

On Tue, Jun 2, 2020 at 9:33 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> >
> > It's not clear whether we need a new API, I think __uaccess_being() has the
> > assumption that the address has been validated by access_ok().
>
> __uaccess_begin() is a stopgap, not a public API.

Correct. It's just an x86 implementation detail.

> The problem is real, but "let's add a public API that would do user_access_begin()
> with access_ok() already done" is no-go.

Yeah, it's completely pointless.

The solution to this is easy: remove the incorrect and useless early
"access_ok()". Boom, done.

Then use user_access_begin() and the appropriate unsage_get/put_user()
sequence, and user_access_end().

The range test that user-access-begin does is not just part of the
ABI, it's just required in general. We have almost thirty years of
history of trying to avoid it, AND IT WAS ALL BOGUS.

The fact is, the range check is pretty damn cheap, and not doing the
range check has always been a complete and utter disaster.

You have exactly two cases:

 (a) the access_ok() would be right above the code and can't be missed

 (b) not

and in (a) the solution is: remove the access_ok() and let
user_access_begin() do the range check.

In (b), the solution is literally "DON'T DO THAT!"

Because EVERY SINGLE TIME people have eventually noticed (possibly
after code movement) that "oops, we never did the access_ok at all,
and now we can be fooled into kernel corruption and a security issue".

And even if that didn't happen, the worry was there.

End result: use user_access_begin() and stop trying to remove the two
cycles or whatever of the limit checking cost. The "upside" of
removing that limit check just isn't. It's a downside.

                 Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 17:18     ` Linus Torvalds
@ 2020-06-02 17:44       ` Al Viro
  2020-06-02 17:46         ` Al Viro
  2020-06-02 20:32       ` Michael S. Tsirkin
  1 sibling, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-02 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Wang, Michael S. Tsirkin, Linux Kernel Mailing List, Netdev

On Tue, Jun 02, 2020 at 10:18:09AM -0700, Linus Torvalds wrote:


> You have exactly two cases:
> 
>  (a) the access_ok() would be right above the code and can't be missed
> 
>  (b) not

   (c) what you really want is not quite access_ok().

Again, that "not quite access_ok()" should be right next to STAC, and
come from the same primitive - I'm not saying the current model is
anywhere near sane.  We need a range-checking primitive right next
to memory access; it's just that for KVM and vhost we might want
a different check and, for things like s390 and sparc (mips as well,
in some configs), potentially different part that would do the memory
access itself as well.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 17:44       ` Al Viro
@ 2020-06-02 17:46         ` Al Viro
  0 siblings, 0 replies; 37+ messages in thread
From: Al Viro @ 2020-06-02 17:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Wang, Michael S. Tsirkin, Linux Kernel Mailing List, Netdev

On Tue, Jun 02, 2020 at 06:44:30PM +0100, Al Viro wrote:
> On Tue, Jun 02, 2020 at 10:18:09AM -0700, Linus Torvalds wrote:
> 
> 
> > You have exactly two cases:
> > 
> >  (a) the access_ok() would be right above the code and can't be missed
> > 
> >  (b) not
> 
>    (c) what you really want is not quite access_ok().
> 
> Again, that "not quite access_ok()" should be right next to STAC, and
> come from the same primitive - I'm not saying the current model is
> anywhere near sane.  We need a range-checking primitive right next
> to memory access; it's just that for KVM and vhost we might want
> a different check and, for things like s390 and sparc (mips as well,

things like vhost on s390 and sparc, that is.

> in some configs), potentially different part that would do the memory
> access itself as well.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 17:18     ` Linus Torvalds
  2020-06-02 17:44       ` Al Viro
@ 2020-06-02 20:32       ` Michael S. Tsirkin
  2020-06-02 20:41         ` David Laight
  2020-06-02 20:43         ` Linus Torvalds
  1 sibling, 2 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-02 20:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Jason Wang, Linux Kernel Mailing List, Netdev

On Tue, Jun 02, 2020 at 10:18:09AM -0700, Linus Torvalds wrote:
> On Tue, Jun 2, 2020 at 9:33 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > >
> > > It's not clear whether we need a new API, I think __uaccess_being() has the
> > > assumption that the address has been validated by access_ok().
> >
> > __uaccess_begin() is a stopgap, not a public API.
> 
> Correct. It's just an x86 implementation detail.
> 
> > The problem is real, but "let's add a public API that would do user_access_begin()
> > with access_ok() already done" is no-go.
> 
> Yeah, it's completely pointless.
> 
> The solution to this is easy: remove the incorrect and useless early
> "access_ok()". Boom, done.

Hmm are you sure we can drop it? access_ok is done in the context
of the process. Access itself in the context of a kernel thread
that borrows the same mm. IIUC if the process can be 32 bit
while the kernel is 64 bit, access_ok in the context of the
kernel thread will not DTRT.


> Then use user_access_begin() and the appropriate unsage_get/put_user()
> sequence, and user_access_end().
> 
> The range test that user-access-begin does is not just part of the
> ABI, it's just required in general. We have almost thirty years of
> history of trying to avoid it, AND IT WAS ALL BOGUS.
> 
> The fact is, the range check is pretty damn cheap, and not doing the
> range check has always been a complete and utter disaster.
> 
> You have exactly two cases:
> 
>  (a) the access_ok() would be right above the code and can't be missed
> 
>  (b) not
> 
> and in (a) the solution is: remove the access_ok() and let
> user_access_begin() do the range check.
> 
> In (b), the solution is literally "DON'T DO THAT!"
> 
> Because EVERY SINGLE TIME people have eventually noticed (possibly
> after code movement) that "oops, we never did the access_ok at all,
> and now we can be fooled into kernel corruption and a security issue".
> 
> And even if that didn't happen, the worry was there.
> 
> End result: use user_access_begin() and stop trying to remove the two
> cycles or whatever of the limit checking cost. The "upside" of
> removing that limit check just isn't. It's a downside.
> 
>                  Linus

That's true. Limit check cost is measureable but very small.
It's the speculation barrier that's costly.

-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 20:32       ` Michael S. Tsirkin
@ 2020-06-02 20:41         ` David Laight
  2020-06-02 21:58           ` Al Viro
  2020-06-02 20:43         ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: David Laight @ 2020-06-02 20:41 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', Linus Torvalds
  Cc: Al Viro, Jason Wang, Linux Kernel Mailing List, Netdev

From: Michael S. Tsirkin
> Sent: 02 June 2020 21:33
> On Tue, Jun 02, 2020 at 10:18:09AM -0700, Linus Torvalds wrote:
> > On Tue, Jun 2, 2020 at 9:33 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > >
> > > > It's not clear whether we need a new API, I think __uaccess_being() has the
> > > > assumption that the address has been validated by access_ok().
> > >
> > > __uaccess_begin() is a stopgap, not a public API.
> >
> > Correct. It's just an x86 implementation detail.
> >
> > > The problem is real, but "let's add a public API that would do user_access_begin()
> > > with access_ok() already done" is no-go.
> >
> > Yeah, it's completely pointless.
> >
> > The solution to this is easy: remove the incorrect and useless early
> > "access_ok()". Boom, done.
> 
> Hmm are you sure we can drop it? access_ok is done in the context
> of the process. Access itself in the context of a kernel thread
> that borrows the same mm. IIUC if the process can be 32 bit
> while the kernel is 64 bit, access_ok in the context of the
> kernel thread will not DTRT.

In which case you need a 'user_access_begin' that takes the mm
as an additional parameter.

I found an 'interesting' acccess_ok() call in the code that copies
iov[] into kernel (eg for readv()).

a) It is a long way from any copies.
b) It can be conditionally ignored - and is so for one call.
   The oddball is code that reads from a different process.
   I didn't spot an equivalent check, but it all worked by
   mapping in the required page - so I'm not sure what happens.

Are there really just 2 limits for access_ok().
One for 64bit programs and one for 32bit?
With the limit being just below the 'dso' page??
So checking the current processes limit is never going
to restrict access.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 16:30 ` Al Viro
@ 2020-06-02 20:42   ` Michael S. Tsirkin
  2020-06-02 22:10     ` Al Viro
  0 siblings, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-02 20:42 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Jason Wang, netdev

On Tue, Jun 02, 2020 at 05:30:48PM +0100, Al Viro wrote:
> On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > is thus benefitial to enable userspace access, do our thing, then
> > disable. Except access_ok has already been pre-validated with all the
> > relevant nospec checks, so we don't need that.  Add an API to allow
> > userspace access after access_ok and barrier_nospec are done.
> 
> This is the wrong way to do it, and this API is certain to be abused
> elsewhere.  NAK - we need to sort out vhost-related problems, but
> this is not an acceptable solution.  Sorry.

OK so summarizing what you and Linus both said, we need at
least a way to make sure access_ok (and preferably the barrier too)
is not missed.

Another comment is about actually checking that performance impact
is significant and worth the complexity and risk.

Is that a fair summary?

I'm actually thinking it's doable with a new __unsafe_user type of
pointer, sparse will then catch errors for us.


-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 20:32       ` Michael S. Tsirkin
  2020-06-02 20:41         ` David Laight
@ 2020-06-02 20:43         ` Linus Torvalds
  2020-06-03  6:01           ` Michael S. Tsirkin
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2020-06-02 20:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Al Viro, Jason Wang, Linux Kernel Mailing List, Netdev

On Tue, Jun 2, 2020 at 1:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Hmm are you sure we can drop it? access_ok is done in the context
> of the process. Access itself in the context of a kernel thread
> that borrows the same mm. IIUC if the process can be 32 bit
> while the kernel is 64 bit, access_ok in the context of the
> kernel thread will not DTRT.

You're historically expected to just "set_fs()" when you do use_mm().

Then we fixed it in commit...

Oh, when I look for it, I notice that it still hasn't gotten merged.
It's still pending, see

  https://lore.kernel.org/lkml/20200416053158.586887-4-hch@lst.de/

for the current thing.

              Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 20:41         ` David Laight
@ 2020-06-02 21:58           ` Al Viro
  2020-06-03  8:08             ` David Laight
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-02 21:58 UTC (permalink / raw)
  To: David Laight
  Cc: 'Michael S. Tsirkin',
	Linus Torvalds, Jason Wang, Linux Kernel Mailing List, Netdev

On Tue, Jun 02, 2020 at 08:41:38PM +0000, David Laight wrote:

> In which case you need a 'user_access_begin' that takes the mm
> as an additional parameter.

	What does any of that have to do with mm?  Details, please.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 20:42   ` Michael S. Tsirkin
@ 2020-06-02 22:10     ` Al Viro
  2020-06-03  5:17       ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-02 22:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, Jason Wang, netdev

On Tue, Jun 02, 2020 at 04:42:03PM -0400, Michael S. Tsirkin wrote:
> On Tue, Jun 02, 2020 at 05:30:48PM +0100, Al Viro wrote:
> > On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> > > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > > is thus benefitial to enable userspace access, do our thing, then
> > > disable. Except access_ok has already been pre-validated with all the
> > > relevant nospec checks, so we don't need that.  Add an API to allow
> > > userspace access after access_ok and barrier_nospec are done.
> > 
> > This is the wrong way to do it, and this API is certain to be abused
> > elsewhere.  NAK - we need to sort out vhost-related problems, but
> > this is not an acceptable solution.  Sorry.
> 
> OK so summarizing what you and Linus both said, we need at
> least a way to make sure access_ok (and preferably the barrier too)
> is not missed.
> 
> Another comment is about actually checking that performance impact
> is significant and worth the complexity and risk.
> 
> Is that a fair summary?
> 
> I'm actually thinking it's doable with a new __unsafe_user type of
> pointer, sparse will then catch errors for us.

Er... how would sparse keep track of the range?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02  8:45 [PATCH RFC] uaccess: user_access_begin_after_access_ok() Michael S. Tsirkin
  2020-06-02 10:15 ` Jason Wang
  2020-06-02 16:30 ` Al Viro
@ 2020-06-03  1:48 ` Al Viro
  2020-06-03  3:57   ` Jason Wang
  2020-06-03  5:29   ` Michael S. Tsirkin
  2 siblings, 2 replies; 37+ messages in thread
From: Al Viro @ 2020-06-03  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, Jason Wang, netdev

On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> So vhost needs to poke at userspace *a lot* in a quick succession.  It
> is thus benefitial to enable userspace access, do our thing, then
> disable. Except access_ok has already been pre-validated with all the
> relevant nospec checks, so we don't need that.  Add an API to allow
> userspace access after access_ok and barrier_nospec are done.

BTW, what are you going to do about vq->iotlb != NULL case?  Because
you sure as hell do *NOT* want e.g. translate_desc() under STAC.
Disable it around the calls of translate_desc()?

How widely do you hope to stretch the user_access areas, anyway?

BTW, speaking of possible annotations: looks like there's a large
subset of call graph that can be reached only from vhost_worker()
or from several ioctls, with all uaccess limited to that subgraph
(thankfully).  Having that explicitly marked might be a good idea...

Unrelated question, while we are at it: is there any point having
vhost_get_user() a polymorphic macro?  In all callers the third
argument is __virtio16 __user * and the second one is an explicit
*<something> where <something> is __virtio16 *.  Similar for
vhost_put_user(): in all callers the third arugment is
__virtio16 __user * and the second - cpu_to_vhost16(vq, something).

Incidentally, who had come up with the name __vhost_get_user?
Makes for lovey WTF moment for readers - esp. in vhost_put_user()...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  1:48 ` Al Viro
@ 2020-06-03  3:57   ` Jason Wang
  2020-06-03  4:18     ` Al Viro
  2020-06-03  5:29   ` Michael S. Tsirkin
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-03  3:57 UTC (permalink / raw)
  To: Al Viro, Michael S. Tsirkin; +Cc: linux-kernel, netdev


On 2020/6/3 上午9:48, Al Viro wrote:
> On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
>> So vhost needs to poke at userspace *a lot* in a quick succession.  It
>> is thus benefitial to enable userspace access, do our thing, then
>> disable. Except access_ok has already been pre-validated with all the
>> relevant nospec checks, so we don't need that.  Add an API to allow
>> userspace access after access_ok and barrier_nospec are done.
> BTW, what are you going to do about vq->iotlb != NULL case?  Because
> you sure as hell do *NOT* want e.g. translate_desc() under STAC.
> Disable it around the calls of translate_desc()?


translate_desc() itself doesn't do userspace access, so the idea is 
probably disabling STAC around a batch of vhost_get_uesr()/vhost_put_user().


>
> How widely do you hope to stretch the user_access areas, anyway?


To have best performance for small packets like 64B, if possible, we 
want to disable STAC not only for the metadata access done by vhost 
accessors but also the data access via iov iterator.


>
> BTW, speaking of possible annotations: looks like there's a large
> subset of call graph that can be reached only from vhost_worker()
> or from several ioctls, with all uaccess limited to that subgraph
> (thankfully).  Having that explicitly marked might be a good idea...
>
> Unrelated question, while we are at it: is there any point having
> vhost_get_user() a polymorphic macro?  In all callers the third
> argument is __virtio16 __user * and the second one is an explicit
> *<something> where <something> is __virtio16 *.  Similar for
> vhost_put_user(): in all callers the third arugment is
> __virtio16 __user * and the second - cpu_to_vhost16(vq, something).


This is because all virtqueue metadata that needs to be accessed 
atomically is __virtio16 now, but this may not be true for future virtio 
extension.


>
> Incidentally, who had come up with the name __vhost_get_user?
> Makes for lovey WTF moment for readers - esp. in vhost_put_user()...


I think the confusion comes since it does not accept userspace pointer 
(when IOTLB is enabled).

How about renaming it as vhost_read()/vhost_write() ?

Thanks

>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  3:57   ` Jason Wang
@ 2020-06-03  4:18     ` Al Viro
  2020-06-03  5:18       ` Jason Wang
  2020-06-03  6:25       ` Michael S. Tsirkin
  0 siblings, 2 replies; 37+ messages in thread
From: Al Viro @ 2020-06-03  4:18 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, linux-kernel, netdev

On Wed, Jun 03, 2020 at 11:57:11AM +0800, Jason Wang wrote:

> > How widely do you hope to stretch the user_access areas, anyway?
> 
> 
> To have best performance for small packets like 64B, if possible, we want to
> disable STAC not only for the metadata access done by vhost accessors but
> also the data access via iov iterator.

If you want to try and convince Linus to go for that, make sure to Cc
me on that thread.  Always liked quality flame...

The same goes for interval tree lookups with uaccess allowed.  IOW, I _really_
doubt that it's a good idea.

> > Incidentally, who had come up with the name __vhost_get_user?
> > Makes for lovey WTF moment for readers - esp. in vhost_put_user()...
> 
> 
> I think the confusion comes since it does not accept userspace pointer (when
> IOTLB is enabled).
> 
> How about renaming it as vhost_read()/vhost_write() ?

Huh?

__vhost_get_user() is IOTLB remapping of userland pointer.  It does not access
userland memory.  Neither for read, nor for write.  It is used by vhost_get_user()
and vhost_put_user().

Why would you want to rename it into vhost_read _or_ vhost_write, and in any case,
how do you give one function two names?  IDGI...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 22:10     ` Al Viro
@ 2020-06-03  5:17       ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-03  5:17 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Jason Wang, netdev

On Tue, Jun 02, 2020 at 11:10:57PM +0100, Al Viro wrote:
> On Tue, Jun 02, 2020 at 04:42:03PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jun 02, 2020 at 05:30:48PM +0100, Al Viro wrote:
> > > On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> > > > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > > > is thus benefitial to enable userspace access, do our thing, then
> > > > disable. Except access_ok has already been pre-validated with all the
> > > > relevant nospec checks, so we don't need that.  Add an API to allow
> > > > userspace access after access_ok and barrier_nospec are done.
> > > 
> > > This is the wrong way to do it, and this API is certain to be abused
> > > elsewhere.  NAK - we need to sort out vhost-related problems, but
> > > this is not an acceptable solution.  Sorry.
> > 
> > OK so summarizing what you and Linus both said, we need at
> > least a way to make sure access_ok (and preferably the barrier too)
> > is not missed.
> > 
> > Another comment is about actually checking that performance impact
> > is significant and worth the complexity and risk.
> > 
> > Is that a fair summary?
> > 
> > I'm actually thinking it's doable with a new __unsafe_user type of
> > pointer, sparse will then catch errors for us.
> 
> Er... how would sparse keep track of the range?

Using types. So you start with a user pointer:

struct foo __user *up;

Now you validate it, including a speculation barrier:

struct foo __valdated_user *p = user_access_validate(up, sizeof *up);

and you can save it and use it with something like unsafe_get_user and unsafe_put_user
that gets __valdated_user pointers:

user_access_begin_validated(p, sizeof *p)
valiated_get_user(bar, foo->bar, err_fault)
valiated_put_user(baz, foo->baz, err_fault)
user_access_end()




-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  4:18     ` Al Viro
@ 2020-06-03  5:18       ` Jason Wang
  2020-06-03  5:46         ` Michael S. Tsirkin
  2020-06-03  6:25       ` Michael S. Tsirkin
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-03  5:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Michael S. Tsirkin, linux-kernel, netdev


On 2020/6/3 下午12:18, Al Viro wrote:
> On Wed, Jun 03, 2020 at 11:57:11AM +0800, Jason Wang wrote:
>
>>> How widely do you hope to stretch the user_access areas, anyway?
>>
>> To have best performance for small packets like 64B, if possible, we want to
>> disable STAC not only for the metadata access done by vhost accessors but
>> also the data access via iov iterator.
> If you want to try and convince Linus to go for that, make sure to Cc
> me on that thread.  Always liked quality flame...
>
> The same goes for interval tree lookups with uaccess allowed.  IOW, I _really_
> doubt that it's a good idea.


I see. We are just seeking an approach to perform better in order to 
compete with userspace dpdk backends.

I tried another approach of using direct mapping + mmu notifier [1] but 
the synchronization with MMU notifier is not easy to perform well.

[1] https://patchwork.kernel.org/patch/11133009/


>
>>> Incidentally, who had come up with the name __vhost_get_user?
>>> Makes for lovey WTF moment for readers - esp. in vhost_put_user()...
>>
>> I think the confusion comes since it does not accept userspace pointer (when
>> IOTLB is enabled).
>>
>> How about renaming it as vhost_read()/vhost_write() ?
> Huh?
>
> __vhost_get_user() is IOTLB remapping of userland pointer.  It does not access
> userland memory.  Neither for read, nor for write.  It is used by vhost_get_user()
> and vhost_put_user().
>
> Why would you want to rename it into vhost_read _or_ vhost_write, and in any case,
> how do you give one function two names?  IDGI...


I get you know, I thought you're concerning the names of 
vhost_get_user()/vhost_put_user() but actually __vhost_get_user().

Maybe something like __vhost_fetch_uaddr() is better.

Thanks


>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  1:48 ` Al Viro
  2020-06-03  3:57   ` Jason Wang
@ 2020-06-03  5:29   ` Michael S. Tsirkin
  2020-06-03 16:52     ` Al Viro
  1 sibling, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-03  5:29 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Jason Wang, netdev

On Wed, Jun 03, 2020 at 02:48:15AM +0100, Al Viro wrote:
> On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > is thus benefitial to enable userspace access, do our thing, then
> > disable. Except access_ok has already been pre-validated with all the
> > relevant nospec checks, so we don't need that.  Add an API to allow
> > userspace access after access_ok and barrier_nospec are done.
> 
> BTW, what are you going to do about vq->iotlb != NULL case?  Because
> you sure as hell do *NOT* want e.g. translate_desc() under STAC.
> Disable it around the calls of translate_desc()?
> 
> How widely do you hope to stretch the user_access areas, anyway?

So ATM I'm looking at adding support for the packed ring format.
That does something like:


get_user(flags, desc->flags)
smp_rmb()
if (flags & VALID)
copy_from_user(&adesc, desc, sizeof adesc);


this would be a good candidate I think.






> BTW, speaking of possible annotations: looks like there's a large
> subset of call graph that can be reached only from vhost_worker()
> or from several ioctls, with all uaccess limited to that subgraph
> (thankfully).  Having that explicitly marked might be a good idea...

Sure. What's a good way to do that though? Any examples to follow?
Or do you mean code comments?


> Unrelated question, while we are at it: is there any point having
> vhost_get_user() a polymorphic macro?  In all callers the third
> argument is __virtio16 __user * and the second one is an explicit
> *<something> where <something> is __virtio16 *.  Similar for
> vhost_put_user(): in all callers the third arugment is
> __virtio16 __user * and the second - cpu_to_vhost16(vq, something).
> 
> Incidentally, who had come up with the name __vhost_get_user?
> Makes for lovey WTF moment for readers - esp. in vhost_put_user()...


Good points, I'll fix these.

-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  5:18       ` Jason Wang
@ 2020-06-03  5:46         ` Michael S. Tsirkin
  2020-06-03  6:23           ` Jason Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-03  5:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: Al Viro, linux-kernel, netdev

On Wed, Jun 03, 2020 at 01:18:54PM +0800, Jason Wang wrote:
> 
> On 2020/6/3 下午12:18, Al Viro wrote:
> > On Wed, Jun 03, 2020 at 11:57:11AM +0800, Jason Wang wrote:
> > 
> > > > How widely do you hope to stretch the user_access areas, anyway?
> > > 
> > > To have best performance for small packets like 64B, if possible, we want to
> > > disable STAC not only for the metadata access done by vhost accessors but
> > > also the data access via iov iterator.
> > If you want to try and convince Linus to go for that, make sure to Cc
> > me on that thread.  Always liked quality flame...
> > 
> > The same goes for interval tree lookups with uaccess allowed.  IOW, I _really_
> > doubt that it's a good idea.
> 
> 
> I see. We are just seeking an approach to perform better in order to compete
> with userspace dpdk backends.
> 
> I tried another approach of using direct mapping + mmu notifier [1] but the
> synchronization with MMU notifier is not easy to perform well.
> 
> [1] https://patchwork.kernel.org/patch/11133009/
> 
> 
> > 
> > > > Incidentally, who had come up with the name __vhost_get_user?
> > > > Makes for lovey WTF moment for readers - esp. in vhost_put_user()...
> > > 
> > > I think the confusion comes since it does not accept userspace pointer (when
> > > IOTLB is enabled).
> > > 
> > > How about renaming it as vhost_read()/vhost_write() ?
> > Huh?
> > 
> > __vhost_get_user() is IOTLB remapping of userland pointer.  It does not access
> > userland memory.  Neither for read, nor for write.  It is used by vhost_get_user()
> > and vhost_put_user().
> > 
> > Why would you want to rename it into vhost_read _or_ vhost_write, and in any case,
> > how do you give one function two names?  IDGI...
> 
> 
> I get you know, I thought you're concerning the names of
> vhost_get_user()/vhost_put_user() but actually __vhost_get_user().
> 
> Maybe something like __vhost_fetch_uaddr() is better.
> 
> Thanks


It's basically vhost_translate_uaddr isn't it?

BTW now I re-read it I don't understand __vhost_get_user_slow:


static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
                                          void __user *addr, unsigned int size,
                                          int type)
{
        int ret;

        ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
                             ARRAY_SIZE(vq->iotlb_iov),
                             VHOST_ACCESS_RO);

..
}

how does this work? how can we cast a pointer to guest address without
adding any offsets?



> 
> > 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 20:43         ` Linus Torvalds
@ 2020-06-03  6:01           ` Michael S. Tsirkin
       [not found]             ` <CAHk-=wi3=QuD30fRq8fYYTj9WmkgeZ0VR_Sh3DQHU+nmwj-jMg@mail.gmail.com>
  0 siblings, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-03  6:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Jason Wang, Linux Kernel Mailing List, Netdev

On Tue, Jun 02, 2020 at 01:43:20PM -0700, Linus Torvalds wrote:
> On Tue, Jun 2, 2020 at 1:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Hmm are you sure we can drop it? access_ok is done in the context
> > of the process. Access itself in the context of a kernel thread
> > that borrows the same mm. IIUC if the process can be 32 bit
> > while the kernel is 64 bit, access_ok in the context of the
> > kernel thread will not DTRT.
> 
> You're historically expected to just "set_fs()" when you do use_mm().

Right and we do that, but that still sets the segment according to the
current thread's flags, right?

E.g. I see:

#define USER_DS         MAKE_MM_SEG(TASK_SIZE_MAX)

and

#define TASK_SIZE               (test_thread_flag(TIF_ADDR32) ? \
                                        IA32_PAGE_OFFSET : TASK_SIZE_MAX)


so if this is run from a kernel thread on a 64 bit kernel, we get
TASK_SIZE_MAX even if we got the pointer from a 32 bit userspace
address.



> Then we fixed it in commit...
> 
> Oh, when I look for it, I notice that it still hasn't gotten merged.
> It's still pending, see
> 
>   https://lore.kernel.org/lkml/20200416053158.586887-4-hch@lst.de/
> 
> for the current thing.
> 
>               Linus


Maybe kthread_use_mm should also get the fs, not just mm.
Then we can just use access_ok directly before the access.


-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  5:46         ` Michael S. Tsirkin
@ 2020-06-03  6:23           ` Jason Wang
  2020-06-03  6:30             ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-03  6:23 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Al Viro, linux-kernel, netdev


On 2020/6/3 下午1:46, Michael S. Tsirkin wrote:
> On Wed, Jun 03, 2020 at 01:18:54PM +0800, Jason Wang wrote:
>> On 2020/6/3 下午12:18, Al Viro wrote:
>>> On Wed, Jun 03, 2020 at 11:57:11AM +0800, Jason Wang wrote:
>>>
>>>>> How widely do you hope to stretch the user_access areas, anyway?
>>>> To have best performance for small packets like 64B, if possible, we want to
>>>> disable STAC not only for the metadata access done by vhost accessors but
>>>> also the data access via iov iterator.
>>> If you want to try and convince Linus to go for that, make sure to Cc
>>> me on that thread.  Always liked quality flame...
>>>
>>> The same goes for interval tree lookups with uaccess allowed.  IOW, I _really_
>>> doubt that it's a good idea.
>>
>> I see. We are just seeking an approach to perform better in order to compete
>> with userspace dpdk backends.
>>
>> I tried another approach of using direct mapping + mmu notifier [1] but the
>> synchronization with MMU notifier is not easy to perform well.
>>
>> [1] https://patchwork.kernel.org/patch/11133009/
>>
>>
>>>>> Incidentally, who had come up with the name __vhost_get_user?
>>>>> Makes for lovey WTF moment for readers - esp. in vhost_put_user()...
>>>> I think the confusion comes since it does not accept userspace pointer (when
>>>> IOTLB is enabled).
>>>>
>>>> How about renaming it as vhost_read()/vhost_write() ?
>>> Huh?
>>>
>>> __vhost_get_user() is IOTLB remapping of userland pointer.  It does not access
>>> userland memory.  Neither for read, nor for write.  It is used by vhost_get_user()
>>> and vhost_put_user().
>>>
>>> Why would you want to rename it into vhost_read _or_ vhost_write, and in any case,
>>> how do you give one function two names?  IDGI...
>>
>> I get you know, I thought you're concerning the names of
>> vhost_get_user()/vhost_put_user() but actually __vhost_get_user().
>>
>> Maybe something like __vhost_fetch_uaddr() is better.
>>
>> Thanks
>
> It's basically vhost_translate_uaddr isn't it?


Yes.


>
> BTW now I re-read it I don't understand __vhost_get_user_slow:
>
>
> static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
>                                            void __user *addr, unsigned int size,
>                                            int type)
> {
>          int ret;
>
>          ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
>                               ARRAY_SIZE(vq->iotlb_iov),
>                               VHOST_ACCESS_RO);
>
> ..
> }
>
> how does this work? how can we cast a pointer to guest address without
> adding any offsets?


I'm not sure I get you here. What kind of offset did you mean?

Thanks


>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  4:18     ` Al Viro
  2020-06-03  5:18       ` Jason Wang
@ 2020-06-03  6:25       ` Michael S. Tsirkin
  1 sibling, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-03  6:25 UTC (permalink / raw)
  To: Al Viro; +Cc: Jason Wang, linux-kernel, netdev

On Wed, Jun 03, 2020 at 05:18:49AM +0100, Al Viro wrote:
> On Wed, Jun 03, 2020 at 11:57:11AM +0800, Jason Wang wrote:
> 
> > > How widely do you hope to stretch the user_access areas, anyway?
> > 
> > 
> > To have best performance for small packets like 64B, if possible, we want to
> > disable STAC not only for the metadata access done by vhost accessors but
> > also the data access via iov iterator.
> 
> If you want to try and convince Linus to go for that, make sure to Cc
> me on that thread.  Always liked quality flame...

It's not about iov the way I see it.

It's a more fine-grained version of "nosmap".

Right now you basically want nosmap if you are running
a workload that does 100 byte reads/writes of userspace memory
all the time.

Which isn't so bad, I'm not sure how much security does smap add at all.
Were there any exploits that it blocked ever since it was introduced?

But in any case, it would be nice to also have an option to make it
possible to disable smap e.g. just for vhost. Not because vhost is
so secure, but simply because user wants this tradeoff.

-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  6:23           ` Jason Wang
@ 2020-06-03  6:30             ` Michael S. Tsirkin
  2020-06-03  6:36               ` Jason Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-03  6:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: Al Viro, linux-kernel, netdev

On Wed, Jun 03, 2020 at 02:23:08PM +0800, Jason Wang wrote:
> > 
> > BTW now I re-read it I don't understand __vhost_get_user_slow:
> > 
> > 
> > static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
> >                                            void __user *addr, unsigned int size,
> >                                            int type)
> > {
> >          int ret;
> > 
> >          ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
> >                               ARRAY_SIZE(vq->iotlb_iov),
> >                               VHOST_ACCESS_RO);
> > 
> > ..
> > }
> > 
> > how does this work? how can we cast a pointer to guest address without
> > adding any offsets?
> 
> 
> I'm not sure I get you here. What kind of offset did you mean?
> 
> Thanks

OK so points:

1. type argument seems unused. Right?
2. Second argument to translate_desc is a GPA, isn't it?
   Here we cast a userspace address to this type. What if it
   matches a valid GPA by mistake?

-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  6:30             ` Michael S. Tsirkin
@ 2020-06-03  6:36               ` Jason Wang
  2020-06-04 16:49                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-03  6:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Al Viro, linux-kernel, netdev


On 2020/6/3 下午2:30, Michael S. Tsirkin wrote:
> On Wed, Jun 03, 2020 at 02:23:08PM +0800, Jason Wang wrote:
>>> BTW now I re-read it I don't understand __vhost_get_user_slow:
>>>
>>>
>>> static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
>>>                                             void __user *addr, unsigned int size,
>>>                                             int type)
>>> {
>>>           int ret;
>>>
>>>           ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
>>>                                ARRAY_SIZE(vq->iotlb_iov),
>>>                                VHOST_ACCESS_RO);
>>>
>>> ..
>>> }
>>>
>>> how does this work? how can we cast a pointer to guest address without
>>> adding any offsets?
>>
>> I'm not sure I get you here. What kind of offset did you mean?
>>
>> Thanks
> OK so points:
>
> 1. type argument seems unused. Right?


Yes, we can remove that.


> 2. Second argument to translate_desc is a GPA, isn't it?


No, it's IOVA, this function will be called only when IOTLB is enabled.

Thanks


>     Here we cast a userspace address to this type. What if it
>     matches a valid GPA by mistake?
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-02 21:58           ` Al Viro
@ 2020-06-03  8:08             ` David Laight
  0 siblings, 0 replies; 37+ messages in thread
From: David Laight @ 2020-06-03  8:08 UTC (permalink / raw)
  To: 'Al Viro'
  Cc: 'Michael S. Tsirkin',
	Linus Torvalds, Jason Wang, Linux Kernel Mailing List, Netdev

From: Al Viro <viro@ftp.linux.org.uk> On Behalf Of Al Viro
> Sent: 02 June 2020 22:58
> On Tue, Jun 02, 2020 at 08:41:38PM +0000, David Laight wrote:
> 
> > In which case you need a 'user_access_begin' that takes the mm
> > as an additional parameter.
> 
> 	What does any of that have to do with mm?  Details, please.

Actually probably nothing.

I was sort of thinking that maybe the user process's memory
map (mm?) would be temporarily 'attached' to the kernel thread
so that it used the normal copy_to/from_user() fault
handling to access the 'other' process.

In which case you'd want to do the bound check against the
limit of the user addresses in the mm rather than those of
the current process.

But later posts probably imply that it is all done differently.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  5:29   ` Michael S. Tsirkin
@ 2020-06-03 16:52     ` Al Viro
  2020-06-04  6:10       ` Jason Wang
  2020-06-04 10:10       ` Michael S. Tsirkin
  0 siblings, 2 replies; 37+ messages in thread
From: Al Viro @ 2020-06-03 16:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, Jason Wang, netdev

On Wed, Jun 03, 2020 at 01:29:00AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jun 03, 2020 at 02:48:15AM +0100, Al Viro wrote:
> > On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> > > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > > is thus benefitial to enable userspace access, do our thing, then
> > > disable. Except access_ok has already been pre-validated with all the
> > > relevant nospec checks, so we don't need that.  Add an API to allow
> > > userspace access after access_ok and barrier_nospec are done.
> > 
> > BTW, what are you going to do about vq->iotlb != NULL case?  Because
> > you sure as hell do *NOT* want e.g. translate_desc() under STAC.
> > Disable it around the calls of translate_desc()?
> > 
> > How widely do you hope to stretch the user_access areas, anyway?
> 
> So ATM I'm looking at adding support for the packed ring format.
> That does something like:
> 
> get_user(flags, desc->flags)
> smp_rmb()
> if (flags & VALID)
> copy_from_user(&adesc, desc, sizeof adesc);
> 
> this would be a good candidate I think.

Perhaps, once we get stac/clac out of raw_copy_from_user() (coming cycle,
probably).  BTW, how large is the structure and how is it aligned?

> > BTW, speaking of possible annotations: looks like there's a large
> > subset of call graph that can be reached only from vhost_worker()
> > or from several ioctls, with all uaccess limited to that subgraph
> > (thankfully).  Having that explicitly marked might be a good idea...
> 
> Sure. What's a good way to do that though? Any examples to follow?
> Or do you mean code comments?

Not sure...  FWIW, the part of call graph from "known to be only
used by vhost_worker" (->handle_kick/vhost_work_init callback/
vhost_poll_init callback) and "part of ->ioctl()" to actual uaccess
primitives is fairly large - the longest chain is
handle_tx_net ->
  handle_tx ->
    handle_tx_zerocopy ->
      get_tx_bufs ->
	vhost_net_tx_get_vq_desc ->
	  vhost_tx_batch ->
	    vhost_net_signal_used ->
	      vhost_add_used_and_signal_n ->
		vhost_signal ->
		  vhost_notify ->
		    vhost_get_avail_flags ->
		      vhost_get_avail ->
			vhost_get_user ->
			  __get_user()
i.e. 14 levels deep and the graph doesn't factorize well...

Something along the lines of "all callers of thus annotated function
must be annotated the same way themselves, any implicit conversion
of pointers to such functions to anything other than boolean yields
a warning, explicit cast is allowed only with __force", perhaps?
Then slap such annotations on vhost_{get,put,copy_to,copy_from}_user(),
on ->handle_kick(), a force-cast in the only caller of ->handle_kick()
and force-casts in the 3 callers in ->ioctl().

And propagate the annotations until the warnings stop, basically...

Shouldn't be terribly hard to teach sparse that kind of stuff and it
might be useful elsewhere.  It would act as a qualifier on function
pointers, with syntax ultimately expanding to __attribute__((something)).
I'll need to refresh my memories of the parser, but IIRC that shouldn't
require serious modifications.  Most of the work would be in
evaluate_call(), just before calling evaluate_symbol_call()...
I'll look into that; not right now, though.

BTW, __vhost_get_user() might be better off expanded in both callers -
that would get their structure similar to vhost_copy_{to,from}_user(),
especially if you expand __vhost_get_user_slow() as well.

Not sure I understand what's going with ->meta_iotlb[] - what are the
lifetime rules for struct vhost_iotlb_map and what prevents the pointers
from going stale?



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
       [not found]             ` <CAHk-=wi3=QuD30fRq8fYYTj9WmkgeZ0VR_Sh3DQHU+nmwj-jMg@mail.gmail.com>
@ 2020-06-03 16:59               ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2020-06-03 16:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Al Viro, Jason Wang, Linux Kernel Mailing List, Netdev

[ Just a re-send without html and a few fixes for mobile editing,
since that email got eaten by the mailing list Gods ]

On Tue, Jun 2, 2020, 23:02 Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Right and we do that, but that still sets the segment according to the
> current thread's flags, right?

But that shouldn't matter.

Sure, the limit might be for a 64-bit task, but it's not like anybody
really cares for the access. The "good address but I should limit user
space mappings to 32-bit" only matters for creating new mappings, not
for normal accesses to user space.

In fact, your very quotes show this effect:

> #define USER_DS         MAKE_MM_SEG(TASK_SIZE_MAX)

Look, to above is what set_fs(USER_DS) will use: always the max address.

Because access_ok() doesn't care. It just checks that it's any user
address at all, and the "is this mapped" is then encoded in the
existing page tables and vma lists.

So no, the current threads flags shouldn't matter for
usrr_access_begin() and unsafe_get/put_user() at all.

Where do you see them mattering?

In contrast, some things then do take the "I'm a 32-bit app" into
account, and they may use TASK_SIZE for limit checking, but on the
whole they should be very very rare. Things like "mmap()" etc, but
that's irrelevant for this discussion. But that's why you then have:

> #define TASK_SIZE               (test_thread_flag(TIF_ADDR32) ? \
>                                         IA32_PAGE_OFFSET : TASK_SIZE_MAX)

Which actually makes that choice.

(Admittedly that's a horrible horrible hack, and we should long since
have stopped doing that hiding inside the #define, but nobody has had
the energy to make it explicit in the mmap paths, I think)

> so if this is run from a kernel thread on a 64 bit kernel, we get
> TASK_SIZE_MAX even if we got the pointer from a 32 bit userspace
> address.

But that's what *normal* access_ok() does too. TASK_SIZE_MAX is fine.
All it needs to check is that it isn't a kernel address.

> Maybe kthread_use_mm should also get the fs, not just mm.
> Then we can just use access_ok directly before the access.

I'm entirely missing why you think we should care about the fs side.

Again, an access shouldn't care, either at access_ok() time, at
user_access_begin() time, or at actual user access itself. We've got
everything set up in the page tables and the vma information.

Can you point to what I'm missing in this discussion?

          Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03 16:52     ` Al Viro
@ 2020-06-04  6:10       ` Jason Wang
  2020-06-04 14:59         ` Al Viro
  2020-06-04 10:10       ` Michael S. Tsirkin
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-04  6:10 UTC (permalink / raw)
  To: Al Viro, Michael S. Tsirkin; +Cc: linux-kernel, netdev


On 2020/6/4 上午12:52, Al Viro wrote:
> On Wed, Jun 03, 2020 at 01:29:00AM -0400, Michael S. Tsirkin wrote:
>> On Wed, Jun 03, 2020 at 02:48:15AM +0100, Al Viro wrote:
>>> On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
>>>> So vhost needs to poke at userspace *a lot* in a quick succession.  It
>>>> is thus benefitial to enable userspace access, do our thing, then
>>>> disable. Except access_ok has already been pre-validated with all the
>>>> relevant nospec checks, so we don't need that.  Add an API to allow
>>>> userspace access after access_ok and barrier_nospec are done.
>>> BTW, what are you going to do about vq->iotlb != NULL case?  Because
>>> you sure as hell do *NOT* want e.g. translate_desc() under STAC.
>>> Disable it around the calls of translate_desc()?
>>>
>>> How widely do you hope to stretch the user_access areas, anyway?
>> So ATM I'm looking at adding support for the packed ring format.
>> That does something like:
>>
>> get_user(flags, desc->flags)
>> smp_rmb()
>> if (flags & VALID)
>> copy_from_user(&adesc, desc, sizeof adesc);
>>
>> this would be a good candidate I think.
> Perhaps, once we get stac/clac out of raw_copy_from_user() (coming cycle,
> probably).  BTW, how large is the structure and how is it aligned?


Each descriptor is 16 bytes, and 16 bytes aligned.


>
>>> BTW, speaking of possible annotations: looks like there's a large
>>> subset of call graph that can be reached only from vhost_worker()
>>> or from several ioctls, with all uaccess limited to that subgraph
>>> (thankfully).  Having that explicitly marked might be a good idea...
>> Sure. What's a good way to do that though? Any examples to follow?
>> Or do you mean code comments?
> Not sure...  FWIW, the part of call graph from "known to be only
> used by vhost_worker" (->handle_kick/vhost_work_init callback/
> vhost_poll_init callback) and "part of ->ioctl()" to actual uaccess
> primitives is fairly large - the longest chain is
> handle_tx_net ->
>    handle_tx ->
>      handle_tx_zerocopy ->
>        get_tx_bufs ->
> 	vhost_net_tx_get_vq_desc ->
> 	  vhost_tx_batch ->
> 	    vhost_net_signal_used ->
> 	      vhost_add_used_and_signal_n ->
> 		vhost_signal ->
> 		  vhost_notify ->
> 		    vhost_get_avail_flags ->
> 		      vhost_get_avail ->
> 			vhost_get_user ->
> 			  __get_user()
> i.e. 14 levels deep and the graph doesn't factorize well...
>
> Something along the lines of "all callers of thus annotated function
> must be annotated the same way themselves, any implicit conversion
> of pointers to such functions to anything other than boolean yields
> a warning, explicit cast is allowed only with __force", perhaps?
> Then slap such annotations on vhost_{get,put,copy_to,copy_from}_user(),
> on ->handle_kick(), a force-cast in the only caller of ->handle_kick()
> and force-casts in the 3 callers in ->ioctl().
>
> And propagate the annotations until the warnings stop, basically...
>
> Shouldn't be terribly hard to teach sparse that kind of stuff and it
> might be useful elsewhere.  It would act as a qualifier on function
> pointers, with syntax ultimately expanding to __attribute__((something)).
> I'll need to refresh my memories of the parser, but IIRC that shouldn't
> require serious modifications.  Most of the work would be in
> evaluate_call(), just before calling evaluate_symbol_call()...
> I'll look into that; not right now, though.
>
> BTW, __vhost_get_user() might be better off expanded in both callers -
> that would get their structure similar to vhost_copy_{to,from}_user(),
> especially if you expand __vhost_get_user_slow() as well.
>
> Not sure I understand what's going with ->meta_iotlb[] - what are the
> lifetime rules for struct vhost_iotlb_map


It used to cache the translation for virtqueue address which. Vhost will 
try to get those addresses from IOTLB and store them in meta_iotlb, and 
it will be invalidated when userspace update or invalidate a new mapping.


> and what prevents the pointers
> from going stale?
>
>

The vq->mutex is used to synchronize between the invalidation and vhost 
workers.

Thanks



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03 16:52     ` Al Viro
  2020-06-04  6:10       ` Jason Wang
@ 2020-06-04 10:10       ` Michael S. Tsirkin
  2020-06-04 15:03         ` Al Viro
  1 sibling, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-04 10:10 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Jason Wang, netdev

On Wed, Jun 03, 2020 at 05:52:05PM +0100, Al Viro wrote:
> On Wed, Jun 03, 2020 at 01:29:00AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jun 03, 2020 at 02:48:15AM +0100, Al Viro wrote:
> > > On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> > > > So vhost needs to poke at userspace *a lot* in a quick succession.  It
> > > > is thus benefitial to enable userspace access, do our thing, then
> > > > disable. Except access_ok has already been pre-validated with all the
> > > > relevant nospec checks, so we don't need that.  Add an API to allow
> > > > userspace access after access_ok and barrier_nospec are done.
> > > 
> > > BTW, what are you going to do about vq->iotlb != NULL case?  Because
> > > you sure as hell do *NOT* want e.g. translate_desc() under STAC.
> > > Disable it around the calls of translate_desc()?
> > > 
> > > How widely do you hope to stretch the user_access areas, anyway?
> > 
> > So ATM I'm looking at adding support for the packed ring format.
> > That does something like:
> > 
> > get_user(flags, desc->flags)
> > smp_rmb()
> > if (flags & VALID)
> > copy_from_user(&adesc, desc, sizeof adesc);
> > 
> > this would be a good candidate I think.
> 
> Perhaps, once we get stac/clac out of raw_copy_from_user() (coming cycle,
> probably).

That sounds good. Presumably raw_copy_from_user will be smart enough
to optimize aligned 2 byte accesses for flags above?

>  BTW, how large is the structure and how is it aligned?

It's a batch of 16 byte structures, aligned to 16 bytes.
At the moment I used batch size of 64 which seems enough.
Ideally we'd actually read the whole batch like this
without stac/clac back and forth. E.g.

	struct vring_desc adesc[64] = {};

	stac()
	for (i = 0; i < 64; ++i) {
	 get_user(flags, desc[i].flags)
	 smp_rmb()
	 if (!(flags & VALID))
		break;
	 copy_from_user(&adesc[i], desc + i, sizeof adesc[i]);
	}
	clac()




> > > BTW, speaking of possible annotations: looks like there's a large
> > > subset of call graph that can be reached only from vhost_worker()
> > > or from several ioctls, with all uaccess limited to that subgraph
> > > (thankfully).  Having that explicitly marked might be a good idea...
> > 
> > Sure. What's a good way to do that though? Any examples to follow?
> > Or do you mean code comments?
> 
> Not sure...  FWIW, the part of call graph from "known to be only
> used by vhost_worker" (->handle_kick/vhost_work_init callback/
> vhost_poll_init callback) and "part of ->ioctl()" to actual uaccess
> primitives is fairly large - the longest chain is
> handle_tx_net ->
>   handle_tx ->
>     handle_tx_zerocopy ->
>       get_tx_bufs ->
> 	vhost_net_tx_get_vq_desc ->
> 	  vhost_tx_batch ->
> 	    vhost_net_signal_used ->
> 	      vhost_add_used_and_signal_n ->
> 		vhost_signal ->
> 		  vhost_notify ->
> 		    vhost_get_avail_flags ->
> 		      vhost_get_avail ->
> 			vhost_get_user ->
> 			  __get_user()
> i.e. 14 levels deep and the graph doesn't factorize well...
> 
> Something along the lines of "all callers of thus annotated function
> must be annotated the same way themselves, any implicit conversion
> of pointers to such functions to anything other than boolean yields
> a warning, explicit cast is allowed only with __force", perhaps?
> Then slap such annotations on vhost_{get,put,copy_to,copy_from}_user(),
> on ->handle_kick(), a force-cast in the only caller of ->handle_kick()
> and force-casts in the 3 callers in ->ioctl().
> 
> And propagate the annotations until the warnings stop, basically...
> 
> Shouldn't be terribly hard to teach sparse that kind of stuff and it
> might be useful elsewhere.  It would act as a qualifier on function
> pointers, with syntax ultimately expanding to __attribute__((something)).
> I'll need to refresh my memories of the parser, but IIRC that shouldn't
> require serious modifications.  Most of the work would be in
> evaluate_call(), just before calling evaluate_symbol_call()...
> I'll look into that; not right now, though.


Thanks, that does sound useful!

> BTW, __vhost_get_user() might be better off expanded in both callers -
> that would get their structure similar to vhost_copy_{to,from}_user(),
> especially if you expand __vhost_get_user_slow() as well.

I agree, that does sound like a good cleanup.

> Not sure I understand what's going with ->meta_iotlb[] - what are the
> lifetime rules for struct vhost_iotlb_map and what prevents the pointers
> from going stale?

It can be zeroed at any point.
We just try to call __vhost_vq_meta_reset whenever anything can go
stale.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-04  6:10       ` Jason Wang
@ 2020-06-04 14:59         ` Al Viro
  2020-06-04 16:46           ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-04 14:59 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, linux-kernel, netdev

On Thu, Jun 04, 2020 at 02:10:27PM +0800, Jason Wang wrote:

> > > get_user(flags, desc->flags)
> > > smp_rmb()
> > > if (flags & VALID)
> > > copy_from_user(&adesc, desc, sizeof adesc);
> > > 
> > > this would be a good candidate I think.
> > Perhaps, once we get stac/clac out of raw_copy_from_user() (coming cycle,
> > probably).  BTW, how large is the structure and how is it aligned?
> 
> 
> Each descriptor is 16 bytes, and 16 bytes aligned.

Won't it be cheaper to grap the entire thing unconditionally?  And what does
that rmb order, while we are at it - won't all coherency work in terms of
entire cachelines anyway?

Confused...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-04 10:10       ` Michael S. Tsirkin
@ 2020-06-04 15:03         ` Al Viro
  2020-06-04 16:47           ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2020-06-04 15:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, Jason Wang, netdev

On Thu, Jun 04, 2020 at 06:10:23AM -0400, Michael S. Tsirkin wrote:

> 	stac()
> 	for (i = 0; i < 64; ++i) {
> 	 get_user(flags, desc[i].flags)
unsafe_get_user(), please.
> 	 smp_rmb()
> 	 if (!(flags & VALID))
> 		break;
> 	 copy_from_user(&adesc[i], desc + i, sizeof adesc[i]);
... and that would raw_copy_from_user() (or unsafe_copy_from_user(),
for wrapper that would take a label to bugger off to)
> 	}
> 	clac()

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-04 14:59         ` Al Viro
@ 2020-06-04 16:46           ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-04 16:46 UTC (permalink / raw)
  To: Al Viro; +Cc: Jason Wang, linux-kernel, netdev

On Thu, Jun 04, 2020 at 03:59:24PM +0100, Al Viro wrote:
> On Thu, Jun 04, 2020 at 02:10:27PM +0800, Jason Wang wrote:
> 
> > > > get_user(flags, desc->flags)
> > > > smp_rmb()
> > > > if (flags & VALID)
> > > > copy_from_user(&adesc, desc, sizeof adesc);
> > > > 
> > > > this would be a good candidate I think.
> > > Perhaps, once we get stac/clac out of raw_copy_from_user() (coming cycle,
> > > probably).  BTW, how large is the structure and how is it aligned?
> > 
> > 
> > Each descriptor is 16 bytes, and 16 bytes aligned.
> 
> Won't it be cheaper to grap the entire thing unconditionally?

Yes but we must read the rest of descriptor after the flags are valid.
If it's read before then the value we get might be the invalid one -
the one it had before another thread gave up control.

>  And what does
> that rmb order, while we are at it - won't all coherency work in terms of
> entire cachelines anyway?

Would be great to know that, but it's hardly guaranteed on all architectures, is it?

> Confused...



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-04 15:03         ` Al Viro
@ 2020-06-04 16:47           ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-04 16:47 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Jason Wang, netdev

On Thu, Jun 04, 2020 at 04:03:35PM +0100, Al Viro wrote:
> On Thu, Jun 04, 2020 at 06:10:23AM -0400, Michael S. Tsirkin wrote:
> 
> > 	stac()
> > 	for (i = 0; i < 64; ++i) {
> > 	 get_user(flags, desc[i].flags)
> unsafe_get_user(), please.
> > 	 smp_rmb()
> > 	 if (!(flags & VALID))
> > 		break;
> > 	 copy_from_user(&adesc[i], desc + i, sizeof adesc[i]);
> ... and that would raw_copy_from_user() (or unsafe_copy_from_user(),
> for wrapper that would take a label to bugger off to)
> > 	}
> > 	clac()

Absolutely, that's all just pseudo-code.

-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-03  6:36               ` Jason Wang
@ 2020-06-04 16:49                 ` Michael S. Tsirkin
  2020-06-05 10:03                   ` Jason Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-04 16:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: Al Viro, linux-kernel, netdev

On Wed, Jun 03, 2020 at 02:36:46PM +0800, Jason Wang wrote:
> 
> On 2020/6/3 下午2:30, Michael S. Tsirkin wrote:
> > On Wed, Jun 03, 2020 at 02:23:08PM +0800, Jason Wang wrote:
> > > > BTW now I re-read it I don't understand __vhost_get_user_slow:
> > > > 
> > > > 
> > > > static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
> > > >                                             void __user *addr, unsigned int size,
> > > >                                             int type)
> > > > {
> > > >           int ret;
> > > > 
> > > >           ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
> > > >                                ARRAY_SIZE(vq->iotlb_iov),
> > > >                                VHOST_ACCESS_RO);
> > > > 
> > > > ..
> > > > }
> > > > 
> > > > how does this work? how can we cast a pointer to guest address without
> > > > adding any offsets?
> > > 
> > > I'm not sure I get you here. What kind of offset did you mean?
> > > 
> > > Thanks
> > OK so points:
> > 
> > 1. type argument seems unused. Right?
> 
> 
> Yes, we can remove that.
> 
> 
> > 2. Second argument to translate_desc is a GPA, isn't it?
> 
> 
> No, it's IOVA, this function will be called only when IOTLB is enabled.
> 
> Thanks

Right IOVA. Point stands how does it make sense to cast
a userspace pointer to an IOVA? I guess it's just
because it's talking to qemu actually, so it's abusing
the notation a bit ...

> 
> >     Here we cast a userspace address to this type. What if it
> >     matches a valid GPA by mistake?
> > 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-04 16:49                 ` Michael S. Tsirkin
@ 2020-06-05 10:03                   ` Jason Wang
  2020-06-06 20:08                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Wang @ 2020-06-05 10:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Al Viro, linux-kernel, netdev


On 2020/6/5 上午12:49, Michael S. Tsirkin wrote:
>>> 2. Second argument to translate_desc is a GPA, isn't it?
>> No, it's IOVA, this function will be called only when IOTLB is enabled.
>>
>> Thanks
> Right IOVA. Point stands how does it make sense to cast
> a userspace pointer to an IOVA? I guess it's just
> because it's talking to qemu actually, so it's abusing
> the notation a bit ...
>

Yes, but the issues are:

1) VHOST_SET_VRING_ADDR is used for iotlb and !iotlb
2) so did the memory accessors

Unless we differ separate IOTLB datapath out, there's probably not easy 
to have another notation.

Thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()
  2020-06-05 10:03                   ` Jason Wang
@ 2020-06-06 20:08                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2020-06-06 20:08 UTC (permalink / raw)
  To: Jason Wang; +Cc: Al Viro, linux-kernel, netdev

On Fri, Jun 05, 2020 at 06:03:38PM +0800, Jason Wang wrote:
> 
> On 2020/6/5 上午12:49, Michael S. Tsirkin wrote:
> > > > 2. Second argument to translate_desc is a GPA, isn't it?
> > > No, it's IOVA, this function will be called only when IOTLB is enabled.
> > > 
> > > Thanks
> > Right IOVA. Point stands how does it make sense to cast
> > a userspace pointer to an IOVA? I guess it's just
> > because it's talking to qemu actually, so it's abusing
> > the notation a bit ...
> > 
> 
> Yes, but the issues are:
> 
> 1) VHOST_SET_VRING_ADDR is used for iotlb and !iotlb
> 2) so did the memory accessors
> 
> Unless we differ separate IOTLB datapath out, there's probably not easy to
> have another notation.
> 
> Thanks

With the batching/format independence rework, it might be possible to
separate that down the road. We'll see.

-- 
MST


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-06-06 20:08 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-02  8:45 [PATCH RFC] uaccess: user_access_begin_after_access_ok() Michael S. Tsirkin
2020-06-02 10:15 ` Jason Wang
2020-06-02 16:33   ` Al Viro
2020-06-02 17:18     ` Linus Torvalds
2020-06-02 17:44       ` Al Viro
2020-06-02 17:46         ` Al Viro
2020-06-02 20:32       ` Michael S. Tsirkin
2020-06-02 20:41         ` David Laight
2020-06-02 21:58           ` Al Viro
2020-06-03  8:08             ` David Laight
2020-06-02 20:43         ` Linus Torvalds
2020-06-03  6:01           ` Michael S. Tsirkin
     [not found]             ` <CAHk-=wi3=QuD30fRq8fYYTj9WmkgeZ0VR_Sh3DQHU+nmwj-jMg@mail.gmail.com>
2020-06-03 16:59               ` Linus Torvalds
2020-06-02 16:30 ` Al Viro
2020-06-02 20:42   ` Michael S. Tsirkin
2020-06-02 22:10     ` Al Viro
2020-06-03  5:17       ` Michael S. Tsirkin
2020-06-03  1:48 ` Al Viro
2020-06-03  3:57   ` Jason Wang
2020-06-03  4:18     ` Al Viro
2020-06-03  5:18       ` Jason Wang
2020-06-03  5:46         ` Michael S. Tsirkin
2020-06-03  6:23           ` Jason Wang
2020-06-03  6:30             ` Michael S. Tsirkin
2020-06-03  6:36               ` Jason Wang
2020-06-04 16:49                 ` Michael S. Tsirkin
2020-06-05 10:03                   ` Jason Wang
2020-06-06 20:08                     ` Michael S. Tsirkin
2020-06-03  6:25       ` Michael S. Tsirkin
2020-06-03  5:29   ` Michael S. Tsirkin
2020-06-03 16:52     ` Al Viro
2020-06-04  6:10       ` Jason Wang
2020-06-04 14:59         ` Al Viro
2020-06-04 16:46           ` Michael S. Tsirkin
2020-06-04 10:10       ` Michael S. Tsirkin
2020-06-04 15:03         ` Al Viro
2020-06-04 16:47           ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).