linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
       [not found]                   ` <20200303132150.GD2596@hirez.programming.kicks-ass.net>
@ 2020-03-03 13:47                     ` Florian Weimer
  2020-03-03 15:01                       ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2020-03-03 13:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pierre-Loup A. Griffais, Thomas Gleixner, André Almeida,
	linux-kernel, kernel, krisman, shuah, linux-kselftest, rostedt,
	ryao, dvhart, mingo, z.figura12, steven, steven, malteskarupke,
	carlos, adhemerval.zanella, libc-alpha, linux-api

(added missing Cc: for linux-api, better late than never I guess)

* Peter Zijlstra:

>> What's the actual type of *uaddr?  Does it vary by size (which I assume
>> is in bits?)?  Are there alignment constraints?
>
> Yeah, u8, u16, u32, u64 depending on the size specified in flags.
> Naturally aligned.

So 4-byte alignment for u32 and 8-byte alignment for u64 on all
architectures?

(I really want to nail this down, sorry.)

>> These system calls seemed to be type-polymorphic still, which is
>> problematic for defining a really nice C interface.  I would really like
>> to have a strongly typed interface for this, with a nice struct futex
>> wrapper type (even if it means that we need four of them).
>
> You mean like: futex_wait1(u8 *,...) futex_wait2(u16 *,...)
> futex_wait4(u32 *,...) etc.. ?
>
> I suppose making it 16 or so syscalls (more if we want WAKE_OP or
> requeue across size) is a bit daft, so yeah, sucks.

We could abstract this in the userspace wrapper.  It would help to have
an explicit size argument, or at least an extension-safe way to pass
this information to the kernel.  I guess if everything else fails, we
could use the flags bits for that, as long as it is clear that the
interface will only support these six types (four without NUMA, two with
NUMA).

>> Will all architectures support all sizes?  If not, how do we probe which
>> size/flags combinations are supported?
>
> Up to the native word size (long), IOW ILP32 will not support u64.

Many ILP32 targets could support atomic accesses on 8-byte storage
units, as long as there is 8-byte alignment.  But given how common
4-byte-align u64 is on 32-bit, maybe that's not such a good idea.

> Overlapping futexes are expressly forbidden, that is:
>
> {
> 	u32 var;
> 	void *addr = &var;
> }
>
> P0()
> {
> 	futex_wait4(addr,...);
> }
>
> P1()
> {
> 	futex_wait1(addr+1,...);
> }
>
> Will have one of them return something bad.

That makes sense.  A strongly typed interface would also reflect that in
the types.

>> > For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
>> > node_id', with the following semantics:
>> >
>> >  - on WAIT, node_id is read and when 0 <= node_id <= nr_nodes, is
>> >    directly used to index into per-node hash-tables. When -1, it is
>> >    replaced by the current node_id and an smp_mb() is issued before we
>> >    load and compare the @uaddr.
>> >
>> >  - on WAKE/REQUEUE, it is an immediate index.
>> 
>> Does this mean the first waiter determines the NUMA index, and all
>> future waiters use the same chain even if they are on different nodes?
>
> Every new waiter could (re)set node_id, after all, when its not actually
> waiting, nobody cares what's in that field.
>
>> I think documenting this as a node index would be a mistake.  It could
>> be an arbitrary hint for locating the corresponding kernel data
>> structures.
>
> Nah, it allows explicit placement, after all, we have set_mempolicy()
> and sched_setaffinity() and all the other NUMA crud so that programs
> that think they know what they're doing, can do explicit placement.

But I'm not sure if it makes sense to read the node ID from the
neighboring value of a futex used in this way.  Or do you think that
userspace might set the node ID to help the kernel implementation, and
not just relying on it to be set by the kernel after initializing it to
-1?

Conversely, even for non-NUMA systems, a lookup hint that allows to
reduce in-kernel futex contention might be helpful.  If it's documented
to be the NUME node ID, that wouldn't be possible.

>> > Any invalid value with result in EINVAL.
>> 
>> Using uaddr-4 is slightly tricky with a 64-bit futex value, due to the
>> need to maintain alignment and avoid padding.
>
> Yes, but it works, unlike uaddr+4 :-) Also, 1 and 2 byte futexes and
> NUMA_FLAG are incompatible due to this, but I feel short futexes and
> NUMA don't really make sense anyway, the only reason to use a short
> futex is to save space, so you don't want another 4 bytes for numa on
> top of that anyway.

I think it would be much easier to make the NUMA hint the same size of
the futex, so 4 and 8 bytes.  It could also make sense to require 8 and
16 byte alignment, to permit different implementation choices in the
future.

So we'd have:

struct futex8  { u8 value; };
struct futex16 { u16 value __attribute__ ((aligned (2))); };
struct futex32 { u32 value __attribute__ ((aligned (4))); };
struct futex64 { u64 value __attribute__ ((aligned (8))); };
struct futex32_numa { u32 value __attribute__ ((aligned (8))); u32 hint; };
struct futex64_numa { u64 value __attribute__ ((aligned (16))); u64 hint; };

Thanks,
Florian


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
  2020-03-03 13:47                     ` 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes] Florian Weimer
@ 2020-03-03 15:01                       ` Peter Zijlstra
  2020-03-05 16:14                         ` André Almeida
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-03-03 15:01 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pierre-Loup A. Griffais, Thomas Gleixner, André Almeida,
	linux-kernel, kernel, krisman, shuah, linux-kselftest, rostedt,
	ryao, dvhart, mingo, z.figura12, steven, steven, malteskarupke,
	carlos, adhemerval.zanella, libc-alpha, linux-api

On Tue, Mar 03, 2020 at 02:47:11PM +0100, Florian Weimer wrote:
> (added missing Cc: for linux-api, better late than never I guess)
> 
> * Peter Zijlstra:
> 
> >> What's the actual type of *uaddr?  Does it vary by size (which I assume
> >> is in bits?)?  Are there alignment constraints?
> >
> > Yeah, u8, u16, u32, u64 depending on the size specified in flags.
> > Naturally aligned.
> 
> So 4-byte alignment for u32 and 8-byte alignment for u64 on all
> architectures?
> 
> (I really want to nail this down, sorry.)

Exactly so.

> >> These system calls seemed to be type-polymorphic still, which is
> >> problematic for defining a really nice C interface.  I would really like
> >> to have a strongly typed interface for this, with a nice struct futex
> >> wrapper type (even if it means that we need four of them).
> >
> > You mean like: futex_wait1(u8 *,...) futex_wait2(u16 *,...)
> > futex_wait4(u32 *,...) etc.. ?
> >
> > I suppose making it 16 or so syscalls (more if we want WAKE_OP or
> > requeue across size) is a bit daft, so yeah, sucks.
> 
> We could abstract this in the userspace wrapper.  It would help to have
> an explicit size argument, or at least an extension-safe way to pass
> this information to the kernel.  I guess if everything else fails, we
> could use the flags bits for that, as long as it is clear that the
> interface will only support these six types (four without NUMA, two with
> NUMA).

The problem is the cmp_requeue syscall, that already has 6 arguments. I
don't see where else than the flags field we can stuff this :/

> >> Will all architectures support all sizes?  If not, how do we probe which
> >> size/flags combinations are supported?
> >
> > Up to the native word size (long), IOW ILP32 will not support u64.
> 
> Many ILP32 targets could support atomic accesses on 8-byte storage
> units, as long as there is 8-byte alignment.  But given how common
> 4-byte-align u64 is on 32-bit, maybe that's not such a good idea.

'Many' might be over-stating it, but yeah, there are definitely a bunch
of them that can do it (x86, armv7-lpae, arc, are the ones I know from
memory). The problem is that the syscalls then look like:

  sys_futex_wait(void *uaddr, u64 val, unsigned long flags, ktime_t *timo);
  struct futex_wait {
	  void *uaddr;
	  u64 val;
	  u64 flags;
  };
  sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
		  u64 flags, ktime_t *timo);
  sys_futex_wake(void *uaddr, unsigned int nr, u64 flags);
  sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
		  unsigned int nr_requeue, u64 cmpval, unsigned long flags);

And that makes 7 arguments for cmp_requeue, which can't be. Maybe we if
combine nr_wake and nr_requeue in one as 2 u16... ?

And then we need to go detector if the platform supports it or not..

> >> > For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
> >> > node_id', with the following semantics:
> >> >
> >> >  - on WAIT, node_id is read and when 0 <= node_id <= nr_nodes, is
> >> >    directly used to index into per-node hash-tables. When -1, it is
> >> >    replaced by the current node_id and an smp_mb() is issued before we
> >> >    load and compare the @uaddr.
> >> >
> >> >  - on WAKE/REQUEUE, it is an immediate index.
> >> 
> >> Does this mean the first waiter determines the NUMA index, and all
> >> future waiters use the same chain even if they are on different nodes?
> >
> > Every new waiter could (re)set node_id, after all, when its not actually
> > waiting, nobody cares what's in that field.
> >
> >> I think documenting this as a node index would be a mistake.  It could
> >> be an arbitrary hint for locating the corresponding kernel data
> >> structures.
> >
> > Nah, it allows explicit placement, after all, we have set_mempolicy()
> > and sched_setaffinity() and all the other NUMA crud so that programs
> > that think they know what they're doing, can do explicit placement.
> 
> But I'm not sure if it makes sense to read the node ID from the
> neighboring value of a futex used in this way.  Or do you think that
> userspace might set the node ID to help the kernel implementation, and
> not just relying on it to be set by the kernel after initializing it to
> -1?

I'm fairly sure that there will be a number of users that will
definitely want to do that; this would be the same people that use
set_mempolicy() and sched_setaffinity() and do all the other numa
binding crud.

HPC, certain database vendors, possibly RT and KVM users.

> Conversely, even for non-NUMA systems, a lookup hint that allows to
> reduce in-kernel futex contention might be helpful.  If it's documented
> to be the NUME node ID, that wouldn't be possible.

Do we really have significant contention on small systems? And how would
increasing the hash-table not solve that?

> >> > Any invalid value with result in EINVAL.
> >> 
> >> Using uaddr-4 is slightly tricky with a 64-bit futex value, due to the
> >> need to maintain alignment and avoid padding.
> >
> > Yes, but it works, unlike uaddr+4 :-) Also, 1 and 2 byte futexes and
> > NUMA_FLAG are incompatible due to this, but I feel short futexes and
> > NUMA don't really make sense anyway, the only reason to use a short
> > futex is to save space, so you don't want another 4 bytes for numa on
> > top of that anyway.
> 
> I think it would be much easier to make the NUMA hint the same size of
> the futex, so 4 and 8 bytes.  It could also make sense to require 8 and
> 16 byte alignment, to permit different implementation choices in the
> future.
> 
> So we'd have:
> 
> struct futex8  { u8 value; };
> struct futex16 { u16 value __attribute__ ((aligned (2))); };
> struct futex32 { u32 value __attribute__ ((aligned (4))); };
> struct futex64 { u64 value __attribute__ ((aligned (8))); };
> struct futex32_numa { u32 value __attribute__ ((aligned (8))); u32 hint; };
> struct futex64_numa { u64 value __attribute__ ((aligned (16))); u64 hint; };

That works, I suppose... although I'm sure someone will curse us for it
when trying to pack some extra things in his cacheline.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
  2020-03-03 15:01                       ` Peter Zijlstra
@ 2020-03-05 16:14                         ` André Almeida
  2020-03-05 16:25                           ` Florian Weimer
  2020-03-05 18:51                           ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: André Almeida @ 2020-03-05 16:14 UTC (permalink / raw)
  To: Peter Zijlstra, Florian Weimer
  Cc: Pierre-Loup A. Griffais, Thomas Gleixner, linux-kernel, kernel,
	krisman, shuah, linux-kselftest, rostedt, ryao, dvhart, mingo,
	z.figura12, steven, steven, malteskarupke, carlos,
	adhemerval.zanella, libc-alpha, linux-api

On 3/3/20 12:01 PM, Peter Zijlstra wrote:
> On Tue, Mar 03, 2020 at 02:47:11PM +0100, Florian Weimer wrote:
>> (added missing Cc: for linux-api, better late than never I guess)
>>
>> * Peter Zijlstra:
>>
>>>> What's the actual type of *uaddr?  Does it vary by size (which I assume
>>>> is in bits?)?  Are there alignment constraints?
>>>
>>> Yeah, u8, u16, u32, u64 depending on the size specified in flags.
>>> Naturally aligned.
>>
>> So 4-byte alignment for u32 and 8-byte alignment for u64 on all
>> architectures?
>>
>> (I really want to nail this down, sorry.)
> 
> Exactly so.
> 
>>>> These system calls seemed to be type-polymorphic still, which is
>>>> problematic for defining a really nice C interface.  I would really like
>>>> to have a strongly typed interface for this, with a nice struct futex
>>>> wrapper type (even if it means that we need four of them).
>>>
>>> You mean like: futex_wait1(u8 *,...) futex_wait2(u16 *,...)
>>> futex_wait4(u32 *,...) etc.. ?
>>>
>>> I suppose making it 16 or so syscalls (more if we want WAKE_OP or
>>> requeue across size) is a bit daft, so yeah, sucks.
>>
>> We could abstract this in the userspace wrapper.  It would help to have
>> an explicit size argument, or at least an extension-safe way to pass
>> this information to the kernel.  I guess if everything else fails, we
>> could use the flags bits for that, as long as it is clear that the
>> interface will only support these six types (four without NUMA, two with
>> NUMA).
> 
> The problem is the cmp_requeue syscall, that already has 6 arguments. I
> don't see where else than the flags field we can stuff this :/
> 
>>>> Will all architectures support all sizes?  If not, how do we probe which
>>>> size/flags combinations are supported?
>>>
>>> Up to the native word size (long), IOW ILP32 will not support u64.
>>
>> Many ILP32 targets could support atomic accesses on 8-byte storage
>> units, as long as there is 8-byte alignment.  But given how common
>> 4-byte-align u64 is on 32-bit, maybe that's not such a good idea.
> 
> 'Many' might be over-stating it, but yeah, there are definitely a bunch
> of them that can do it (x86, armv7-lpae, arc, are the ones I know from
> memory). The problem is that the syscalls then look like:
> 
>   sys_futex_wait(void *uaddr, u64 val, unsigned long flags, ktime_t *timo);
>   struct futex_wait {
> 	  void *uaddr;
> 	  u64 val;
> 	  u64 flags;
>   };
>   sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
> 		  u64 flags, ktime_t *timo);
>   sys_futex_wake(void *uaddr, unsigned int nr, u64 flags);
>   sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
> 		  unsigned int nr_requeue, u64 cmpval, unsigned long flags);
> 
> And that makes 7 arguments for cmp_requeue, which can't be. Maybe we if
> combine nr_wake and nr_requeue in one as 2 u16... ?
> 
> And then we need to go detector if the platform supports it or not..
> 

Thanks everyone for the feedback around our mechanism. Are the
performance benefits of implementing a syscall to wait on a single futex
significant enough to maintain it instead of just using
`sys_futex_waitv()` with `nr_waiters = 1`? If we join both cases in a
single interface, we may even add a new member for NUMA hint in `struct
futex_wait`.

>>>>> For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
>>>>> node_id', with the following semantics:
>>>>>
>>>>>  - on WAIT, node_id is read and when 0 <= node_id <= nr_nodes, is
>>>>>    directly used to index into per-node hash-tables. When -1, it is
>>>>>    replaced by the current node_id and an smp_mb() is issued before we
>>>>>    load and compare the @uaddr.
>>>>>
>>>>>  - on WAKE/REQUEUE, it is an immediate index.
>>>>
>>>> Does this mean the first waiter determines the NUMA index, and all
>>>> future waiters use the same chain even if they are on different nodes?
>>>
>>> Every new waiter could (re)set node_id, after all, when its not actually
>>> waiting, nobody cares what's in that field.
>>>
>>>> I think documenting this as a node index would be a mistake.  It could
>>>> be an arbitrary hint for locating the corresponding kernel data
>>>> structures.
>>>
>>> Nah, it allows explicit placement, after all, we have set_mempolicy()
>>> and sched_setaffinity() and all the other NUMA crud so that programs
>>> that think they know what they're doing, can do explicit placement.
>>
>> But I'm not sure if it makes sense to read the node ID from the
>> neighboring value of a futex used in this way.  Or do you think that
>> userspace might set the node ID to help the kernel implementation, and
>> not just relying on it to be set by the kernel after initializing it to
>> -1?
> 
> I'm fairly sure that there will be a number of users that will
> definitely want to do that; this would be the same people that use
> set_mempolicy() and sched_setaffinity() and do all the other numa
> binding crud.
> 
> HPC, certain database vendors, possibly RT and KVM users.
> 
>> Conversely, even for non-NUMA systems, a lookup hint that allows to
>> reduce in-kernel futex contention might be helpful.  If it's documented
>> to be the NUME node ID, that wouldn't be possible.
> 
> Do we really have significant contention on small systems? And how would
> increasing the hash-table not solve that?
> 
>>>>> Any invalid value with result in EINVAL.
>>>>
>>>> Using uaddr-4 is slightly tricky with a 64-bit futex value, due to the
>>>> need to maintain alignment and avoid padding.
>>>
>>> Yes, but it works, unlike uaddr+4 :-) Also, 1 and 2 byte futexes and
>>> NUMA_FLAG are incompatible due to this, but I feel short futexes and
>>> NUMA don't really make sense anyway, the only reason to use a short
>>> futex is to save space, so you don't want another 4 bytes for numa on
>>> top of that anyway.
>>
>> I think it would be much easier to make the NUMA hint the same size of
>> the futex, so 4 and 8 bytes.  It could also make sense to require 8 and
>> 16 byte alignment, to permit different implementation choices in the
>> future.
>>
>> So we'd have:
>>
>> struct futex8  { u8 value; };
>> struct futex16 { u16 value __attribute__ ((aligned (2))); };
>> struct futex32 { u32 value __attribute__ ((aligned (4))); };
>> struct futex64 { u64 value __attribute__ ((aligned (8))); };
>> struct futex32_numa { u32 value __attribute__ ((aligned (8))); u32 hint; };
>> struct futex64_numa { u64 value __attribute__ ((aligned (16))); u64 hint; };
> 
> That works, I suppose... although I'm sure someone will curse us for it
> when trying to pack some extra things in his cacheline.
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
  2020-03-05 16:14                         ` André Almeida
@ 2020-03-05 16:25                           ` Florian Weimer
  2020-03-05 18:51                           ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2020-03-05 16:25 UTC (permalink / raw)
  To: André Almeida
  Cc: Peter Zijlstra, Pierre-Loup A. Griffais, Thomas Gleixner,
	linux-kernel, kernel, krisman, shuah, linux-kselftest, rostedt,
	ryao, dvhart, mingo, z.figura12, steven, steven, malteskarupke,
	carlos, adhemerval.zanella, libc-alpha, linux-api

* André Almeida:

> Thanks everyone for the feedback around our mechanism. Are the
> performance benefits of implementing a syscall to wait on a single futex
> significant enough to maintain it instead of just using
> `sys_futex_waitv()` with `nr_waiters = 1`? If we join both cases in a
> single interface, we may even add a new member for NUMA hint in `struct
> futex_wait`.

Some seccomp user might want to verify the address, and that's easier if
it's in an argument.  But that's just a rather minor aspect.

Do you propose to drop the storage requirement for the NUMA hint
next to the futex completely?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
  2020-03-05 16:14                         ` André Almeida
  2020-03-05 16:25                           ` Florian Weimer
@ 2020-03-05 18:51                           ` Peter Zijlstra
  2020-03-06 16:57                             ` David Laight
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-03-05 18:51 UTC (permalink / raw)
  To: André Almeida
  Cc: Florian Weimer, Pierre-Loup A. Griffais, Thomas Gleixner,
	linux-kernel, kernel, krisman, shuah, linux-kselftest, rostedt,
	ryao, dvhart, mingo, z.figura12, steven, steven, malteskarupke,
	carlos, adhemerval.zanella, libc-alpha, linux-api

On Thu, Mar 05, 2020 at 01:14:17PM -0300, André Almeida wrote:

> >   sys_futex_wait(void *uaddr, u64 val, unsigned long flags, ktime_t *timo);
> >   struct futex_wait {
> > 	  void *uaddr;
> > 	  u64 val;
> > 	  u64 flags;
> >   };
> >   sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
> > 		  u64 flags, ktime_t *timo);
> >   sys_futex_wake(void *uaddr, unsigned int nr, u64 flags);
> >   sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
> > 		  unsigned int nr_requeue, u64 cmpval, unsigned long flags);
> > 
> > And that makes 7 arguments for cmp_requeue, which can't be. Maybe we if
> > combine nr_wake and nr_requeue in one as 2 u16... ?
> > 
> > And then we need to go detector if the platform supports it or not..
> > 
> 
> Thanks everyone for the feedback around our mechanism. Are the
> performance benefits of implementing a syscall to wait on a single futex
> significant enough to maintain it instead of just using
> `sys_futex_waitv()` with `nr_waiters = 1`? If we join both cases in a
> single interface, we may even add a new member for NUMA hint in `struct
> futex_wait`.

My consideration was that avoiding the get_user/copy_from_user might
become measurable on !PTI systems with SMAP.

But someone would have to build it and measure it before we can be sure
of course.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
  2020-03-05 18:51                           ` Peter Zijlstra
@ 2020-03-06 16:57                             ` David Laight
  0 siblings, 0 replies; 6+ messages in thread
From: David Laight @ 2020-03-06 16:57 UTC (permalink / raw)
  To: 'Peter Zijlstra', André Almeida
  Cc: Florian Weimer, Pierre-Loup A. Griffais, Thomas Gleixner,
	linux-kernel, kernel, krisman, shuah, linux-kselftest, rostedt,
	ryao, dvhart, mingo, z.figura12, steven, steven, malteskarupke,
	carlos, adhemerval.zanella, libc-alpha, linux-api

From: Peter Zijlstra
> Sent: 05 March 2020 18:52
+> On Thu, Mar 05, 2020 at 01:14:17PM -0300, André Almeida wrote:
> 
> > >   sys_futex_wait(void *uaddr, u64 val, unsigned long flags, ktime_t *timo);
> > >   struct futex_wait {
> > > 	  void *uaddr;
> > > 	  u64 val;
> > > 	  u64 flags;
> > >   };
> > >   sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
> > > 		  u64 flags, ktime_t *timo);
> > >   sys_futex_wake(void *uaddr, unsigned int nr, u64 flags);
> > >   sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
> > > 		  unsigned int nr_requeue, u64 cmpval, unsigned long flags);
> > >
> > > And that makes 7 arguments for cmp_requeue, which can't be. Maybe we if
> > > combine nr_wake and nr_requeue in one as 2 u16... ?
> > >
> > > And then we need to go detector if the platform supports it or not..
> > >
> >
> > Thanks everyone for the feedback around our mechanism. Are the
> > performance benefits of implementing a syscall to wait on a single futex
> > significant enough to maintain it instead of just using
> > `sys_futex_waitv()` with `nr_waiters = 1`? If we join both cases in a
> > single interface, we may even add a new member for NUMA hint in `struct
> > futex_wait`.
> 
> My consideration was that avoiding the get_user/copy_from_user might
> become measurable on !PTI systems with SMAP.
> 
> But someone would have to build it and measure it before we can be sure
> of course.

An extra copy_from_user is likely to be noticable.
It certainly makes recvmsg() slower than recv().
Especially if the hardended usercopy crap gets involved.

	David
 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-03-06 16:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200213214525.183689-1-andrealmeid@collabora.com>
     [not found] ` <20200213214525.183689-2-andrealmeid@collabora.com>
     [not found]   ` <20200228190717.GM18400@hirez.programming.kicks-ass.net>
     [not found]     ` <20200228194958.GO14946@hirez.programming.kicks-ass.net>
     [not found]       ` <87tv3aflqm.fsf@nanos.tec.linutronix.de>
     [not found]         ` <967d5047-2cb6-d6d8-6107-edb99a4c9696@valvesoftware.com>
     [not found]           ` <87o8thg031.fsf@nanos.tec.linutronix.de>
     [not found]             ` <beb82055-96fa-cb64-a06e-9d7a0946587b@valvesoftware.com>
     [not found]               ` <20200303120050.GC2596@hirez.programming.kicks-ass.net>
     [not found]                 ` <87pndth9ur.fsf@oldenburg2.str.redhat.com>
     [not found]                   ` <20200303132150.GD2596@hirez.programming.kicks-ass.net>
2020-03-03 13:47                     ` 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes] Florian Weimer
2020-03-03 15:01                       ` Peter Zijlstra
2020-03-05 16:14                         ` André Almeida
2020-03-05 16:25                           ` Florian Weimer
2020-03-05 18:51                           ` Peter Zijlstra
2020-03-06 16:57                             ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).