[v3,13/13] epoll: implement epoll_create2() syscall
diff mbox series

Message ID 20190516085810.31077-14-rpenyaev@suse.de
State Superseded
Headers show
Series
  • epoll: support pollable epoll from userspace
Related show

Commit Message

Roman Penyaev May 16, 2019, 8:58 a.m. UTC
epoll_create2() is needed to accept EPOLL_USERPOLL flags
and size, i.e. this patch wires up polling from userspace.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Comments

Arnd Bergmann May 16, 2019, 10:03 a.m. UTC | #1
On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> wrote:
>
> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> and size, i.e. this patch wires up polling from userspace.

Could you add the system call to all syscall*.tbl files at the same time here?

        Arnd
Roman Penyaev May 16, 2019, 10:20 a.m. UTC | #2
On 2019-05-16 12:03, Arnd Bergmann wrote:
> On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> 
> wrote:
>> 
>> epoll_create2() is needed to accept EPOLL_USERPOLL flags
>> and size, i.e. this patch wires up polling from userspace.
> 
> Could you add the system call to all syscall*.tbl files at the same 
> time here?

For all other archs, you mean?  Sure.  But what is the rule of thumb?
Sometimes people tend to add to the most common x86 and other tables
are left untouched, but then you commit the rest, e.g.

commit 39036cd2727395c3369b1051005da74059a85317
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Feb 28 13:59:19 2019 +0100

     arch: add pidfd and io_uring syscalls everywhere



--
Roman
Arnd Bergmann May 16, 2019, 10:57 a.m. UTC | #3
On Thu, May 16, 2019 at 12:20 PM Roman Penyaev <rpenyaev@suse.de> wrote:
>
> On 2019-05-16 12:03, Arnd Bergmann wrote:
> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de>
> > wrote:
> >>
> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> >> and size, i.e. this patch wires up polling from userspace.
> >
> > Could you add the system call to all syscall*.tbl files at the same
> > time here?
>
> For all other archs, you mean?  Sure.  But what is the rule of thumb?
> Sometimes people tend to add to the most common x86 and other tables
> are left untouched, but then you commit the rest, e.g.
>
> commit 39036cd2727395c3369b1051005da74059a85317
> Author: Arnd Bergmann <arnd@arndb.de>
> Date:   Thu Feb 28 13:59:19 2019 +0100
>
>      arch: add pidfd and io_uring syscalls everywhere

We only recently introduced syscall.tbl files in a common format,
which makes it much easier to add new ones. I hope we can
do it for all architectures right away from now on.

I just noticed that the new mount API assigns six new system
calls as well, but did not use the same numbers across
architectures. I hope we can still rectify that before -rc1
and use the next available ones (428..433), then yours should
be 434 on all architectures, with the exception of arch/alpha.

      Arnd
Andrew Morton May 22, 2019, 2:33 a.m. UTC | #4
On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <rpenyaev@suse.de> wrote:

> On 2019-05-16 12:03, Arnd Bergmann wrote:
> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> 
> > wrote:
> >> 
> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> >> and size, i.e. this patch wires up polling from userspace.
> > 
> > Could you add the system call to all syscall*.tbl files at the same 
> > time here?
> 
> For all other archs, you mean?  Sure.  But what is the rule of thumb?
> Sometimes people tend to add to the most common x86 and other tables
> are left untouched, but then you commit the rest, e.g.
> 
> commit 39036cd2727395c3369b1051005da74059a85317
> Author: Arnd Bergmann <arnd@arndb.de>
> Date:   Thu Feb 28 13:59:19 2019 +0100
> 
>      arch: add pidfd and io_uring syscalls everywhere
> 

I thought the preferred approach was to wire up the architectures on
which the submitter has tested the syscall, then allow the arch
maintainers to enable the syscall independently?

And to help them in this, provide a test suite for the new syscall
under tools/testing/selftests/.

https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c
will certainly help but I do think it would be better to move the test
into the kernel tree to keep it maintained and so that many people run
it in their various setups on an ongoing basis.
Roman Penyaev May 22, 2019, 9:11 a.m. UTC | #5
On 2019-05-22 04:33, Andrew Morton wrote:
> On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <rpenyaev@suse.de> 
> wrote:
> 
>> On 2019-05-16 12:03, Arnd Bergmann wrote:
>> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de>
>> > wrote:
>> >>
>> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
>> >> and size, i.e. this patch wires up polling from userspace.
>> >
>> > Could you add the system call to all syscall*.tbl files at the same
>> > time here?
>> 
>> For all other archs, you mean?  Sure.  But what is the rule of thumb?
>> Sometimes people tend to add to the most common x86 and other tables
>> are left untouched, but then you commit the rest, e.g.
>> 
>> commit 39036cd2727395c3369b1051005da74059a85317
>> Author: Arnd Bergmann <arnd@arndb.de>
>> Date:   Thu Feb 28 13:59:19 2019 +0100
>> 
>>      arch: add pidfd and io_uring syscalls everywhere
>> 
> 
> I thought the preferred approach was to wire up the architectures on
> which the submitter has tested the syscall, then allow the arch
> maintainers to enable the syscall independently?
> 
> And to help them in this, provide a test suite for the new syscall
> under tools/testing/selftests/.
> 
> https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c
> will certainly help but I do think it would be better to move the test
> into the kernel tree to keep it maintained and so that many people run
> it in their various setups on an ongoing basis.

Yes, on a next iteration I will add the tool to selftests. Thanks.

--
Roman
Arnd Bergmann May 22, 2019, 11:14 a.m. UTC | #6
On Wed, May 22, 2019 at 4:33 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <rpenyaev@suse.de> wrote:
> > On 2019-05-16 12:03, Arnd Bergmann wrote:
> > > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de>
> > > wrote:
> > >>
> > >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> > >> and size, i.e. this patch wires up polling from userspace.
> > >
> > > Could you add the system call to all syscall*.tbl files at the same
> > > time here?
> >
> > For all other archs, you mean?  Sure.  But what is the rule of thumb?
> > Sometimes people tend to add to the most common x86 and other tables
> > are left untouched, but then you commit the rest, e.g.
> >
> > commit 39036cd2727395c3369b1051005da74059a85317
> > Author: Arnd Bergmann <arnd@arndb.de>
> > Date:   Thu Feb 28 13:59:19 2019 +0100
> >
> >      arch: add pidfd and io_uring syscalls everywhere
> >
>
> I thought the preferred approach was to wire up the architectures on
> which the submitter has tested the syscall, then allow the arch
> maintainers to enable the syscall independently?

I'm hoping to change that practice now, as it has not worked well
in the past:

- half the architectures now use asm-generic/unistd.h, so they are
  already wired up at the same time, regardless of testing
- in the other half, not adding them at the same time actually
  made it harder to test, as it was significantly harder to figure
  out how to build a modified kernel for a given architecture
  than to run the test case
- Not having all architectures add a new call at the same time caused
  the architectures to get out of sync when some got added and others
  did not. Now that we use the same numbers across all architectures,
  that would be even more confusing.

My plan for the long run is to only have one file to which new
system calls get added in the future.

> And to help them in this, provide a test suite for the new syscall
> under tools/testing/selftests/.
>
> https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c
> will certainly help but I do think it would be better to move the test
> into the kernel tree to keep it maintained and so that many people run
> it in their various setups on an ongoing basis.

No disagreement on that.

      Arnd
Andrew Morton May 22, 2019, 6:36 p.m. UTC | #7
On Wed, 22 May 2019 13:14:41 +0200 Arnd Bergmann <arnd@arndb.de> wrote:

> > I thought the preferred approach was to wire up the architectures on
> > which the submitter has tested the syscall, then allow the arch
> > maintainers to enable the syscall independently?
> 
> I'm hoping to change that practice now, as it has not worked well
> in the past:
> 
> - half the architectures now use asm-generic/unistd.h, so they are
>   already wired up at the same time, regardless of testing
> - in the other half, not adding them at the same time actually
>   made it harder to test, as it was significantly harder to figure
>   out how to build a modified kernel for a given architecture
>   than to run the test case
> - Not having all architectures add a new call at the same time caused
>   the architectures to get out of sync when some got added and others
>   did not. Now that we use the same numbers across all architectures,
>   that would be even more confusing.
>
> My plan for the long run is to only have one file to which new
> system calls get added in the future.

Fair enough.  We're adding code to architectures without having tested
it on those architectures but we do that all the time anyway - I guess
there's not a lot of point in special-casing new syscalls.

Patch
diff mbox series

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cd5f982b1e5..f0d271875f4e 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@ 
 425	i386	io_uring_setup		sys_io_uring_setup		__ia32_sys_io_uring_setup
 426	i386	io_uring_enter		sys_io_uring_enter		__ia32_sys_io_uring_enter
 427	i386	io_uring_register	sys_io_uring_register		__ia32_sys_io_uring_register
+428	i386	epoll_create2		sys_epoll_create2		__ia32_sys_epoll_create2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 64ca0d06259a..5ee9bb31a552 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@ 
 425	common	io_uring_setup		__x64_sys_io_uring_setup
 426	common	io_uring_enter		__x64_sys_io_uring_enter
 427	common	io_uring_register	__x64_sys_io_uring_register
+428	common	epoll_create2		__x64_sys_epoll_create2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9ff666ce7cb5..b44c3a0c4ad0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2614,6 +2614,14 @@  static int do_epoll_create(int flags, size_t size)
 	return error;
 }
 
+SYSCALL_DEFINE2(epoll_create2, int, flags, size_t, size)
+{
+	if (size == 0)
+		return -EINVAL;
+
+	return do_epoll_create(flags, size);
+}
+
 SYSCALL_DEFINE1(epoll_create1, int, flags)
 {
 	return do_epoll_create(flags, 0);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..5049b0d16949 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -358,6 +358,7 @@  asmlinkage long sys_eventfd2(unsigned int count, int flags);
 
 /* fs/eventpoll.c */
 asmlinkage long sys_epoll_create1(int flags);
+asmlinkage long sys_epoll_create2(int flags, size_t size);
 asmlinkage long sys_epoll_ctl(int epfd, int op, int fd,
 				struct epoll_event __user *event);
 asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index dee7292e1df6..fccfaab366ee 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -832,9 +832,11 @@  __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
 __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
 #define __NR_io_uring_register 427
 __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_epoll_create2 428
+__SYSCALL(__NR_epoll_create2, sys_epoll_create2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 428
+#define __NR_syscalls 429
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d9ae5ea6caf..665908b8a326 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -65,6 +65,7 @@  COND_SYSCALL(eventfd2);
 
 /* fs/eventfd.c */
 COND_SYSCALL(epoll_create1);
+COND_SYSCALL(epoll_create2);
 COND_SYSCALL(epoll_ctl);
 COND_SYSCALL(epoll_pwait);
 COND_SYSCALL_COMPAT(epoll_pwait);