All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/1] RFC: Allow busy poll to be set per epoll instance
@ 2024-01-20  0:42 Joe Damato
  2024-01-20  0:42 ` [RFC 1/1] eventpoll: support busy poll " Joe Damato
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Damato @ 2024-01-20  0:42 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: chuck.lever, jlayton, linux-api, edumazet, davem,
	alexander.duyck, sridhar.samudrala, kuba, Joe Damato

Greetings:

TL;DR This RFC builds on bf3b9f6372c4 ("epoll: Add busy poll support to
epoll with socket fds.") by adding two fcntl knobs for enabling
epoll-based busy poll on a per epoll basis instead of the current
system-wide sysctl. This change makes epoll-based busy poll much more
usable.

I have another implementation which uses epoll_ctl and adds a new
EPOLL_CTL_BUSY_POLL_TIMEOUT knob instead of using fcntl, but fcntl
seemed to be slightly cleaner.

I am happy to use whatever interface is desired by the kernel community
in order to allow for per-epoll instance busy poll to be supported.

Longer explanation:

Presently epoll has support for a very useful form of busy poll based on
the incoming NAPI ID (see also: SO_INCOMING_NAPI_ID [1]).

This form of busy poll allows epoll_wait to drive NAPI packet processing
which can allow for user applications to decide when it is appropriate
to process network data vs being pre-empted during less optimal times.

For example, a network application might process an entire datagram and
get better use of L2/L3 cache by deferring packet processing until all
events are processed and epoll_wait is called.

The documentation available on this is, IMHO, a bit confusing so please
allow me to explain how to use this kernel feature.

In order to use this feature, user applications must do three things:

1. Ensure each application thread has its own epoll instance mapping
1-to-1 with NIC RX queues. An n-tuple filter would likely be used to
direct connections with specific dest ports to these queues.

2. Ensure that all incoming connections added to an epoll instance
have the same NAPI ID. This can be done with a BPF filter when
SO_REUSEPORT is used or getsockopt + SO_INCOMING_NAPI_ID when a single
accept thread is used which dispatches incoming connections to threads.

3. Lastly, busy poll must be enabled via a sysctl
(/proc/sys/net/core/busy_poll).

The unfortunate part about step 3 above is that this enables busy poll
system-wide which affects all user applications on the system.

It is worth noting that setting /proc/sys/net/core/busy_poll has
different effects on different system calls:

- poll and select based applications would not be affected as busy
  polling is only enabled when this sysctl is set *and* sockets have
  SO_BUSY_POLL set.
- All epoll based applications on the system, however, will busy poll
  when this sysctl is set.

If the user wants to run one low latency epoll-based server application with
epoll-based busy poll, but would like to run the rest of the applications on
the system (which may also use epoll) without busy poll, this
system-wide sysctl presents a significant problem.

This change preserves the system-wide sysctl, but adds a mechanism (via
fcntl) to enable or disable busy poll for epoll instances as needed.

This change is extremely useful for low latency network applications
that need to run side-by-side with other network applications where
latency is not a major concern.

As mentioned above, the epoll_ctl approach I have (which works) seemed
less clean than the fcntl approach in this RFC. I would be happy to use
whatever interface the kernel maintainers prefer to make epoll based busy
poll more convenient for user applications to use.

Thanks,
Joe

[1]: https://lore.kernel.org/lkml/20170324170836.15226.87178.stgit@localhost.localdomain/

Joe Damato (1):
  eventpoll: support busy poll per epoll instance

 fs/eventpoll.c                   | 71 ++++++++++++++++++++++++++++++--
 fs/fcntl.c                       |  5 +++
 include/linux/eventpoll.h        |  2 +
 include/uapi/linux/fcntl.h       |  6 +++
 tools/include/uapi/linux/fcntl.h |  6 +++
 tools/perf/trace/beauty/fcntl.c  |  3 +-
 6 files changed, 88 insertions(+), 5 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC 1/1] eventpoll: support busy poll per epoll instance
  2024-01-20  0:42 [RFC 0/1] RFC: Allow busy poll to be set per epoll instance Joe Damato
@ 2024-01-20  0:42 ` Joe Damato
  2024-01-21 14:21   ` kernel test robot
                     ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Joe Damato @ 2024-01-20  0:42 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: chuck.lever, jlayton, linux-api, edumazet, davem,
	alexander.duyck, sridhar.samudrala, kuba, Joe Damato

Add F_EPOLL_{S,G}ET_BUSY_POLL_USECS to allow setting a busy poll timeout
per epoll instance so that individual applications can enable (or
disable) epoll based busy poll as needed.

Prior to this change, epoll-based busy poll could only be enabled
system-wide, which limits the usefulness of busy poll.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 fs/eventpoll.c                   | 71 ++++++++++++++++++++++++++++++--
 fs/fcntl.c                       |  5 +++
 include/linux/eventpoll.h        |  2 +
 include/uapi/linux/fcntl.h       |  6 +++
 tools/include/uapi/linux/fcntl.h |  6 +++
 tools/perf/trace/beauty/fcntl.c  |  3 +-
 6 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3534d36a1474..a8087c2b47ef 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -227,6 +227,8 @@ struct eventpoll {
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	/* used to track busy poll napi_id */
 	unsigned int napi_id;
+	/* busy poll timeout */
+	u64 busy_poll_usecs;
 #endif
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -386,12 +388,39 @@ static inline int ep_events_available(struct eventpoll *ep)
 		READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
 }
 
+/**
+ * busy_loop_ep_timeout - check if busy poll has timed out. The timeout value
+ * from the epoll instance ep is preferred, but if it is not set fallback to
+ * the system-wide global via busy_loop_timeout.
+ */
+static inline bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)
+{
 #ifdef CONFIG_NET_RX_BUSY_POLL
+	unsigned long bp_usec = READ_ONCE(ep->busy_poll_usecs);
+
+	if (bp_usec) {
+		unsigned long end_time = start_time + bp_usec;
+		unsigned long now = busy_loop_current_time();
+
+		return time_after(now, end_time);
+	} else {
+		return busy_loop_timeout(start_time);
+	}
+#endif
+	return true;
+}
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static bool ep_busy_loop_on(struct eventpoll *ep)
+{
+	return !!ep->busy_poll_usecs ^ net_busy_loop_on();
+}
+
 static bool ep_busy_loop_end(void *p, unsigned long start_time)
 {
 	struct eventpoll *ep = p;
 
-	return ep_events_available(ep) || busy_loop_timeout(start_time);
+	return ep_events_available(ep) || busy_loop_ep_timeout(start_time, ep);
 }
 
 /*
@@ -404,7 +433,7 @@ static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
 {
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
-	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
+	if ((napi_id >= MIN_NAPI_ID) && ep_busy_loop_on(ep)) {
 		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
 			       BUSY_POLL_BUDGET);
 		if (ep_events_available(ep))
@@ -430,7 +459,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 	struct socket *sock;
 	struct sock *sk;
 
-	if (!net_busy_loop_on())
+	ep = epi->ep;
+	if (!ep_busy_loop_on(ep))
 		return;
 
 	sock = sock_from_file(epi->ffd.file);
@@ -442,7 +472,6 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 		return;
 
 	napi_id = READ_ONCE(sk->sk_napi_id);
-	ep = epi->ep;
 
 	/* Non-NAPI IDs can be rejected
 	 *	or
@@ -466,6 +495,10 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 {
 }
 
+static inline bool ep_busy_loop_on(struct eventpoll *ep)
+{
+	return false;
+}
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
 /*
@@ -933,6 +966,33 @@ static const struct file_operations eventpoll_fops = {
 	.llseek		= noop_llseek,
 };
 
+unsigned long eventpoll_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	int ret;
+	struct eventpoll *ep;
+
+	if (!is_file_epoll(file))
+		return -EINVAL;
+
+	ep = file->private_data;
+
+	switch (cmd) {
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	case F_EPOLL_SET_BUSY_POLL_USECS:
+		ret = ep->busy_poll_usecs = arg;
+		break;
+	case F_EPOLL_GET_BUSY_POLL_USECS:
+		ret = ep->busy_poll_usecs;
+		break;
+#endif
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 /*
  * This is called from eventpoll_release() to unlink files from the eventpoll
  * interface. We need to have this facility to cleanup correctly files that are
@@ -2058,6 +2118,9 @@ static int do_epoll_create(int flags)
 		error = PTR_ERR(file);
 		goto out_free_fd;
 	}
+#ifndef CONFIG_NET_RX_BUSY_POLL
+	ep->busy_poll_usecs = 0;
+#endif
 	ep->file = file;
 	fd_install(fd, file);
 	return fd;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index c80a6acad742..f232e7c2eb9d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -9,6 +9,7 @@
 #include <linux/init.h>
 #include <linux/mm.h>
 #include <linux/sched/task.h>
+#include <linux/eventpoll.h>
 #include <linux/fs.h>
 #include <linux/filelock.h>
 #include <linux/file.h>
@@ -419,6 +420,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_SET_RW_HINT:
 		err = fcntl_rw_hint(filp, cmd, arg);
 		break;
+	case F_EPOLL_GET_BUSY_POLL_USECS:
+	case F_EPOLL_SET_BUSY_POLL_USECS:
+		err = eventpoll_fcntl(filp, cmd, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 3337745d81bd..3e6a49d14f52 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -22,6 +22,8 @@ struct file;
 struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
 #endif
 
+unsigned long eventpoll_fcntl(struct file *file, unsigned int cmd, unsigned long arg);
+
 /* Used to release the epoll bits inside the "struct file" */
 void eventpoll_release_file(struct file *file);
 
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 282e90aeb163..522134ab9580 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -56,6 +56,12 @@
 #define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
 #define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
 
+/*
+ * Set/Get busy poll usecs for an epoll instance.
+ */
+#define F_EPOLL_GET_BUSY_POLL_USECS (F_LINUX_SPECIFIC_BASE + 15)
+#define F_EPOLL_SET_BUSY_POLL_USECS (F_LINUX_SPECIFIC_BASE + 16)
+
 /*
  * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
  * used to clear any hints previously set.
diff --git a/tools/include/uapi/linux/fcntl.h b/tools/include/uapi/linux/fcntl.h
index 6c80f96049bd..1937f8b74783 100644
--- a/tools/include/uapi/linux/fcntl.h
+++ b/tools/include/uapi/linux/fcntl.h
@@ -56,6 +56,12 @@
 #define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
 #define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
 
+/*
+ * Set/Get busy poll usecs for an epoll instance.
+ */
+#define F_EPOLL_GET_BUSY_POLL_USECS (F_LINUX_SPECIFIC_BASE + 15)
+#define F_EPOLL_SET_BUSY_POLL_USECS (F_LINUX_SPECIFIC_BASE + 16)
+
 /*
  * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
  * used to clear any hints previously set.
diff --git a/tools/perf/trace/beauty/fcntl.c b/tools/perf/trace/beauty/fcntl.c
index 56ef83b3d130..dae5647c5c1a 100644
--- a/tools/perf/trace/beauty/fcntl.c
+++ b/tools/perf/trace/beauty/fcntl.c
@@ -94,7 +94,8 @@ size_t syscall_arg__scnprintf_fcntl_arg(char *bf, size_t size, struct syscall_ar
 	    cmd == F_OFD_SETLK || cmd == F_OFD_SETLKW || cmd == F_OFD_GETLK ||
 	    cmd == F_GETOWN_EX || cmd == F_SETOWN_EX ||
 	    cmd == F_GET_RW_HINT || cmd == F_SET_RW_HINT ||
-	    cmd == F_GET_FILE_RW_HINT || cmd == F_SET_FILE_RW_HINT)
+	    cmd == F_GET_FILE_RW_HINT || cmd == F_SET_FILE_RW_HINT ||
+	    cmd == F_EPOLL_GET_BUSY_POLL_USECS || cmd == F_EPOLL_SET_BUSY_POLL_USECS)
 		return syscall_arg__scnprintf_hex(bf, size, arg);
 
 	return syscall_arg__scnprintf_long(bf, size, arg);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC 1/1] eventpoll: support busy poll per epoll instance
  2024-01-20  0:42 ` [RFC 1/1] eventpoll: support busy poll " Joe Damato
@ 2024-01-21 14:21   ` kernel test robot
  2024-01-21 14:42   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2024-01-21 14:21 UTC (permalink / raw)
  To: Joe Damato; +Cc: oe-kbuild-all

Hi Joe,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on perf-tools-next/perf-tools-next]
[also build test WARNING on tip/perf/core perf-tools/perf-tools linus/master horms-ipvs/master acme/perf/core v6.7 next-20240119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joe-Damato/eventpoll-support-busy-poll-per-epoll-instance/20240120-084514
base:   https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link:    https://lore.kernel.org/r/20240120004247.42036-2-jdamato%40fastly.com
patch subject: [RFC 1/1] eventpoll: support busy poll per epoll instance
config: i386-buildonly-randconfig-001-20240121 (https://download.01.org/0day-ci/archive/20240121/202401212239.LAqIJgfm-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240121/202401212239.LAqIJgfm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401212239.LAqIJgfm-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> fs/eventpoll.c:398: warning: Function parameter or member 'start_time' not described in 'busy_loop_ep_timeout'
>> fs/eventpoll.c:398: warning: Function parameter or member 'ep' not described in 'busy_loop_ep_timeout'


vim +398 fs/eventpoll.c

   391	
   392	/**
   393	 * busy_loop_ep_timeout - check if busy poll has timed out. The timeout value
   394	 * from the epoll instance ep is preferred, but if it is not set fallback to
   395	 * the system-wide global via busy_loop_timeout.
   396	 */
   397	static inline bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)
 > 398	{
   399	#ifdef CONFIG_NET_RX_BUSY_POLL
   400		unsigned long bp_usec = READ_ONCE(ep->busy_poll_usecs);
   401	
   402		if (bp_usec) {
   403			unsigned long end_time = start_time + bp_usec;
   404			unsigned long now = busy_loop_current_time();
   405	
   406			return time_after(now, end_time);
   407		} else {
   408			return busy_loop_timeout(start_time);
   409		}
   410	#endif
   411		return true;
   412	}
   413	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/1] eventpoll: support busy poll per epoll instance
  2024-01-20  0:42 ` [RFC 1/1] eventpoll: support busy poll " Joe Damato
  2024-01-21 14:21   ` kernel test robot
@ 2024-01-21 14:42   ` kernel test robot
  2024-01-21 16:06   ` kernel test robot
  2024-01-22 15:25   ` Christian Brauner
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2024-01-21 14:42 UTC (permalink / raw)
  To: Joe Damato; +Cc: oe-kbuild-all

Hi Joe,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on perf-tools-next/perf-tools-next]
[also build test ERROR on tip/perf/core perf-tools/perf-tools linus/master horms-ipvs/master acme/perf/core v6.7 next-20240119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joe-Damato/eventpoll-support-busy-poll-per-epoll-instance/20240120-084514
base:   https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link:    https://lore.kernel.org/r/20240120004247.42036-2-jdamato%40fastly.com
patch subject: [RFC 1/1] eventpoll: support busy poll per epoll instance
config: i386-buildonly-randconfig-003-20240121 (https://download.01.org/0day-ci/archive/20240121/202401212215.zDl6deZY-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240121/202401212215.zDl6deZY-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401212215.zDl6deZY-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   fs/eventpoll.c: In function 'eventpoll_fcntl':
>> fs/eventpoll.c:973:27: warning: variable 'ep' set but not used [-Wunused-but-set-variable]
     973 |         struct eventpoll *ep;
         |                           ^~
   fs/eventpoll.c: In function 'do_epoll_create':
>> fs/eventpoll.c:2123:11: error: 'struct eventpoll' has no member named 'busy_poll_usecs'
    2123 |         ep->busy_poll_usecs = 0;
         |           ^~


vim +2123 fs/eventpoll.c

  2086	
  2087	/*
  2088	 * Open an eventpoll file descriptor.
  2089	 */
  2090	static int do_epoll_create(int flags)
  2091	{
  2092		int error, fd;
  2093		struct eventpoll *ep = NULL;
  2094		struct file *file;
  2095	
  2096		/* Check the EPOLL_* constant for consistency.  */
  2097		BUILD_BUG_ON(EPOLL_CLOEXEC != O_CLOEXEC);
  2098	
  2099		if (flags & ~EPOLL_CLOEXEC)
  2100			return -EINVAL;
  2101		/*
  2102		 * Create the internal data structure ("struct eventpoll").
  2103		 */
  2104		error = ep_alloc(&ep);
  2105		if (error < 0)
  2106			return error;
  2107		/*
  2108		 * Creates all the items needed to setup an eventpoll file. That is,
  2109		 * a file structure and a free file descriptor.
  2110		 */
  2111		fd = get_unused_fd_flags(O_RDWR | (flags & O_CLOEXEC));
  2112		if (fd < 0) {
  2113			error = fd;
  2114			goto out_free_ep;
  2115		}
  2116		file = anon_inode_getfile("[eventpoll]", &eventpoll_fops, ep,
  2117					 O_RDWR | (flags & O_CLOEXEC));
  2118		if (IS_ERR(file)) {
  2119			error = PTR_ERR(file);
  2120			goto out_free_fd;
  2121		}
  2122	#ifndef CONFIG_NET_RX_BUSY_POLL
> 2123		ep->busy_poll_usecs = 0;
  2124	#endif
  2125		ep->file = file;
  2126		fd_install(fd, file);
  2127		return fd;
  2128	
  2129	out_free_fd:
  2130		put_unused_fd(fd);
  2131	out_free_ep:
  2132		ep_clear_and_put(ep);
  2133		return error;
  2134	}
  2135	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/1] eventpoll: support busy poll per epoll instance
  2024-01-20  0:42 ` [RFC 1/1] eventpoll: support busy poll " Joe Damato
  2024-01-21 14:21   ` kernel test robot
  2024-01-21 14:42   ` kernel test robot
@ 2024-01-21 16:06   ` kernel test robot
  2024-01-22 15:25   ` Christian Brauner
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2024-01-21 16:06 UTC (permalink / raw)
  To: Joe Damato; +Cc: oe-kbuild-all

Hi Joe,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on perf-tools-next/perf-tools-next]
[also build test ERROR on tip/perf/core perf-tools/perf-tools linus/master horms-ipvs/master acme/perf/core v6.7 next-20240119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joe-Damato/eventpoll-support-busy-poll-per-epoll-instance/20240120-084514
base:   https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link:    https://lore.kernel.org/r/20240120004247.42036-2-jdamato%40fastly.com
patch subject: [RFC 1/1] eventpoll: support busy poll per epoll instance
config: i386-buildonly-randconfig-006-20240121 (https://download.01.org/0day-ci/archive/20240121/202401212307.5cz428Nn-lkp@intel.com/config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240121/202401212307.5cz428Nn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401212307.5cz428Nn-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/fcntl.c: In function 'do_fcntl':
>> fs/fcntl.c:425:9: error: implicit declaration of function 'eventpoll_fcntl'; did you mean 'eventpoll_release'? [-Werror=implicit-function-declaration]
      err = eventpoll_fcntl(filp, cmd, arg);
            ^~~~~~~~~~~~~~~
            eventpoll_release
   cc1: some warnings being treated as errors


vim +425 fs/fcntl.c

   316	
   317	static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
   318			struct file *filp)
   319	{
   320		void __user *argp = (void __user *)arg;
   321		int argi = (int)arg;
   322		struct flock flock;
   323		long err = -EINVAL;
   324	
   325		switch (cmd) {
   326		case F_DUPFD:
   327			err = f_dupfd(argi, filp, 0);
   328			break;
   329		case F_DUPFD_CLOEXEC:
   330			err = f_dupfd(argi, filp, O_CLOEXEC);
   331			break;
   332		case F_GETFD:
   333			err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
   334			break;
   335		case F_SETFD:
   336			err = 0;
   337			set_close_on_exec(fd, argi & FD_CLOEXEC);
   338			break;
   339		case F_GETFL:
   340			err = filp->f_flags;
   341			break;
   342		case F_SETFL:
   343			err = setfl(fd, filp, argi);
   344			break;
   345	#if BITS_PER_LONG != 32
   346		/* 32-bit arches must use fcntl64() */
   347		case F_OFD_GETLK:
   348	#endif
   349		case F_GETLK:
   350			if (copy_from_user(&flock, argp, sizeof(flock)))
   351				return -EFAULT;
   352			err = fcntl_getlk(filp, cmd, &flock);
   353			if (!err && copy_to_user(argp, &flock, sizeof(flock)))
   354				return -EFAULT;
   355			break;
   356	#if BITS_PER_LONG != 32
   357		/* 32-bit arches must use fcntl64() */
   358		case F_OFD_SETLK:
   359		case F_OFD_SETLKW:
   360			fallthrough;
   361	#endif
   362		case F_SETLK:
   363		case F_SETLKW:
   364			if (copy_from_user(&flock, argp, sizeof(flock)))
   365				return -EFAULT;
   366			err = fcntl_setlk(fd, filp, cmd, &flock);
   367			break;
   368		case F_GETOWN:
   369			/*
   370			 * XXX If f_owner is a process group, the
   371			 * negative return value will get converted
   372			 * into an error.  Oops.  If we keep the
   373			 * current syscall conventions, the only way
   374			 * to fix this will be in libc.
   375			 */
   376			err = f_getown(filp);
   377			force_successful_syscall_return();
   378			break;
   379		case F_SETOWN:
   380			err = f_setown(filp, argi, 1);
   381			break;
   382		case F_GETOWN_EX:
   383			err = f_getown_ex(filp, arg);
   384			break;
   385		case F_SETOWN_EX:
   386			err = f_setown_ex(filp, arg);
   387			break;
   388		case F_GETOWNER_UIDS:
   389			err = f_getowner_uids(filp, arg);
   390			break;
   391		case F_GETSIG:
   392			err = filp->f_owner.signum;
   393			break;
   394		case F_SETSIG:
   395			/* arg == 0 restores default behaviour. */
   396			if (!valid_signal(argi)) {
   397				break;
   398			}
   399			err = 0;
   400			filp->f_owner.signum = argi;
   401			break;
   402		case F_GETLEASE:
   403			err = fcntl_getlease(filp);
   404			break;
   405		case F_SETLEASE:
   406			err = fcntl_setlease(fd, filp, argi);
   407			break;
   408		case F_NOTIFY:
   409			err = fcntl_dirnotify(fd, filp, argi);
   410			break;
   411		case F_SETPIPE_SZ:
   412		case F_GETPIPE_SZ:
   413			err = pipe_fcntl(filp, cmd, argi);
   414			break;
   415		case F_ADD_SEALS:
   416		case F_GET_SEALS:
   417			err = memfd_fcntl(filp, cmd, argi);
   418			break;
   419		case F_GET_RW_HINT:
   420		case F_SET_RW_HINT:
   421			err = fcntl_rw_hint(filp, cmd, arg);
   422			break;
   423		case F_EPOLL_GET_BUSY_POLL_USECS:
   424		case F_EPOLL_SET_BUSY_POLL_USECS:
 > 425			err = eventpoll_fcntl(filp, cmd, arg);
   426			break;
   427		default:
   428			break;
   429		}
   430		return err;
   431	}
   432	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/1] eventpoll: support busy poll per epoll instance
  2024-01-20  0:42 ` [RFC 1/1] eventpoll: support busy poll " Joe Damato
                     ` (2 preceding siblings ...)
  2024-01-21 16:06   ` kernel test robot
@ 2024-01-22 15:25   ` Christian Brauner
  2024-01-22 21:16     ` Joe Damato
  3 siblings, 1 reply; 7+ messages in thread
From: Christian Brauner @ 2024-01-22 15:25 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, linux-kernel, chuck.lever, jlayton, linux-api, edumazet,
	davem, alexander.duyck, sridhar.samudrala, kuba

On Sat, Jan 20, 2024 at 12:42:47AM +0000, Joe Damato wrote:
> Add F_EPOLL_{S,G}ET_BUSY_POLL_USECS to allow setting a busy poll timeout
> per epoll instance so that individual applications can enable (or
> disable) epoll based busy poll as needed.
> 
> Prior to this change, epoll-based busy poll could only be enabled
> system-wide, which limits the usefulness of busy poll.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---

This should be an ioctl on the epoll fd, not a fcntl(). fcntl()s
should aim to be generic which this isn't. We've recently rejected a
memfd addition as a fcntl() as well for the same reason.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/1] eventpoll: support busy poll per epoll instance
  2024-01-22 15:25   ` Christian Brauner
@ 2024-01-22 21:16     ` Joe Damato
  0 siblings, 0 replies; 7+ messages in thread
From: Joe Damato @ 2024-01-22 21:16 UTC (permalink / raw)
  To: Christian Brauner
  Cc: netdev, linux-kernel, chuck.lever, jlayton, linux-api, edumazet,
	davem, alexander.duyck, sridhar.samudrala, kuba

On Mon, Jan 22, 2024 at 04:25:01PM +0100, Christian Brauner wrote:
> On Sat, Jan 20, 2024 at 12:42:47AM +0000, Joe Damato wrote:
> > Add F_EPOLL_{S,G}ET_BUSY_POLL_USECS to allow setting a busy poll timeout
> > per epoll instance so that individual applications can enable (or
> > disable) epoll based busy poll as needed.
> > 
> > Prior to this change, epoll-based busy poll could only be enabled
> > system-wide, which limits the usefulness of busy poll.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> 
> This should be an ioctl on the epoll fd, not a fcntl(). fcntl()s
> should aim to be generic which this isn't. We've recently rejected a
> memfd addition as a fcntl() as well for the same reason.

OK, thanks for the review. An ioctl makes more sense, I agree.

I'll rewrite it as you've suggested and send another RFC.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-01-22 21:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-20  0:42 [RFC 0/1] RFC: Allow busy poll to be set per epoll instance Joe Damato
2024-01-20  0:42 ` [RFC 1/1] eventpoll: support busy poll " Joe Damato
2024-01-21 14:21   ` kernel test robot
2024-01-21 14:42   ` kernel test robot
2024-01-21 16:06   ` kernel test robot
2024-01-22 15:25   ` Christian Brauner
2024-01-22 21:16     ` Joe Damato

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.