* [PATCH v4 0/2] io_uring: Add support for napi_busy_poll
@ 2022-03-01 13:47 Olivier Langlois
  2022-03-01 13:47 ` [PATCH v4 1/2] io_uring: minor io_cqring_wait() optimization Olivier Langlois
  2022-03-01 13:47 ` [PATCH v4 2/2] io_uring: Add support for napi_busy_poll Olivier Langlois
  0 siblings, 2 replies; 12+ messages in thread
From: Olivier Langlois @ 2022-03-01 13:47 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: Hao Xu, io-uring, linux-kernel

The sqpoll thread can be used to perform the napi busy poll in a
similar way to how it does io polling for file systems that support
direct access bypassing the page cache.

The other way that io_uring can be used for napi busy poll is by
calling io_uring_enter() to get events.

If the user specifies a timeout value, it is distributed between polling
and sleeping by using the system-wide setting
/proc/sys/net/core/busy_poll.

The changes have been tested with this program:
https://github.com/lano1106/io_uring_udp_ping

and the result is:
Without sqpoll:
NAPI busy loop disabled:
rtt min/avg/max/mdev = 40.631/42.050/58.667/1.547 us
NAPI busy loop enabled:
rtt min/avg/max/mdev = 30.619/31.753/61.433/1.456 us

With sqpoll:
NAPI busy loop disabled:
rtt min/avg/max/mdev = 42.087/44.438/59.508/1.533 us
NAPI busy loop enabled:
rtt min/avg/max/mdev = 35.779/37.347/52.201/0.924 us
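
For reference, a minimal sketch of how the io_uring_enter() path can be
exercised from user space is shown below. This is not the test program
linked above; the liburing calls, the pre-connected UDP socket 'sockfd',
and a non-zero /proc/sys/net/core/busy_poll value are assumptions, and it
requires a liburing recent enough to pass the timeout through
io_uring_enter()'s extended arguments:

/*
 * Sketch only: drive the napi busy-poll path through io_uring_enter().
 * Assumes /proc/sys/net/core/busy_poll is set to a non-zero value
 * (e.g. 50) and that sockfd is a connected UDP socket with traffic.
 */
#include <liburing.h>
#include <errno.h>
#include <stdio.h>

static int wait_one_datagram(int sockfd)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 100000000 };
	char buf[64];
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret)
		return ret;

	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		io_uring_queue_exit(&ring);
		return -ENOMEM;
	}
	io_uring_prep_recv(sqe, sockfd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/*
	 * The 100ms timeout is split by the kernel between napi busy
	 * polling and sleeping, as described above.
	 */
	ret = io_uring_wait_cqe_timeout(&ring, &cqe, &ts);
	if (!ret) {
		printf("recv completed, res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return ret;
}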

v2:
 * Evaluate list_empty(&ctx->napi_list) outside io_napi_busy_loop() to keep
   __io_sq_thread() execution as fast as possible
 * In io_cqring_wait(), move up the sig block to avoid needless computation
   if the block exits the function
 * In io_cqring_wait(), protect ctx->napi_list from race condition by
   splicing it into a local list
 * In io_cqring_wait(), allow busy polling when uts is missing
 * Fix kernel test robot issues
v3:
 * Fix do_div() type mismatch warning
 * Reduce uring_lock contention by creating a spinlock for protecting
   napi_list
 * Correctly support MULTISHOT poll requests
v4:
 * Put back benchmark result in commit text

Olivier Langlois (2):
  io_uring: minor io_cqring_wait() optimization
  io_uring: Add support for napi_busy_poll

 fs/io_uring.c | 246 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 237 insertions(+), 9 deletions(-)


base-commit: 719fce7539cd3e186598e2aed36325fe892150cf
-- 
2.35.1



* [PATCH v4 1/2] io_uring: minor io_cqring_wait() optimization
  2022-03-01 13:47 [PATCH v4 0/2] io_uring: Add support for napi_busy_poll Olivier Langlois
@ 2022-03-01 13:47 ` Olivier Langlois
  2022-03-01 13:47 ` [PATCH v4 2/2] io_uring: Add support for napi_busy_poll Olivier Langlois
  1 sibling, 0 replies; 12+ messages in thread
From: Olivier Langlois @ 2022-03-01 13:47 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: Hao Xu, io-uring, linux-kernel

Move up the block manipulating the sig variable so that code which may
encounter an error and exit the function runs first, before continuing
with the rest of the function, thereby avoiding useless computations

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
---
 fs/io_uring.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4715980e9015..f7b8df79a02b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7732,14 +7732,6 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			break;
 	} while (1);
 
-	if (uts) {
-		struct timespec64 ts;
-
-		if (get_timespec64(&ts, uts))
-			return -EFAULT;
-		timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
-	}
-
 	if (sig) {
 #ifdef CONFIG_COMPAT
 		if (in_compat_syscall())
@@ -7753,6 +7745,14 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			return ret;
 	}
 
+	if (uts) {
+		struct timespec64 ts;
+
+		if (get_timespec64(&ts, uts))
+			return -EFAULT;
+		timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+	}
+
 	init_waitqueue_func_entry(&iowq.wq, io_wake_function);
 	iowq.wq.private = current;
 	INIT_LIST_HEAD(&iowq.wq.entry);
-- 
2.35.1



* [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-01 13:47 [PATCH v4 0/2] io_uring: Add support for napi_busy_poll Olivier Langlois
  2022-03-01 13:47 ` [PATCH v4 1/2] io_uring: minor io_cqring_wait() optimization Olivier Langlois
@ 2022-03-01 13:47 ` Olivier Langlois
  2022-03-01 18:31   ` Hao Xu
  1 sibling, 1 reply; 12+ messages in thread
From: Olivier Langlois @ 2022-03-01 13:47 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: Hao Xu, io-uring, linux-kernel

The sqpoll thread can be used to perform the napi busy poll in a
similar way to how it does io polling for file systems that support
direct access bypassing the page cache.

The other way that io_uring can be used for napi busy poll is by
calling io_uring_enter() to get events.

If the user specifies a timeout value, it is distributed between polling
and sleeping by using the system-wide setting
/proc/sys/net/core/busy_poll.

The changes have been tested with this program:
https://github.com/lano1106/io_uring_udp_ping

and the result is:
Without sqpoll:
NAPI busy loop disabled:
rtt min/avg/max/mdev = 40.631/42.050/58.667/1.547 us
NAPI busy loop enabled:
rtt min/avg/max/mdev = 30.619/31.753/61.433/1.456 us

With sqpoll:
NAPI busy loop disabled:
rtt min/avg/max/mdev = 42.087/44.438/59.508/1.533 us
NAPI busy loop enabled:
rtt min/avg/max/mdev = 35.779/37.347/52.201/0.924 us

Co-developed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
---
 fs/io_uring.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 229 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f7b8df79a02b..37c065786e4b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -63,6 +63,7 @@
 #include <net/sock.h>
 #include <net/af_unix.h>
 #include <net/scm.h>
+#include <net/busy_poll.h>
 #include <linux/anon_inodes.h>
 #include <linux/sched/mm.h>
 #include <linux/uaccess.h>
@@ -395,6 +396,11 @@ struct io_ring_ctx {
 	struct list_head	sqd_list;
 
 	unsigned long		check_cq_overflow;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	/* used to track busy poll napi_id */
+	struct list_head	napi_list;
+	spinlock_t		napi_lock;	/* napi_list lock */
+#endif
 
 	struct {
 		unsigned		cached_cq_tail;
@@ -1464,6 +1470,10 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_WQ_LIST(&ctx->locked_free_list);
 	INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
 	INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	INIT_LIST_HEAD(&ctx->napi_list);
+	spin_lock_init(&ctx->napi_lock);
+#endif
 	return ctx;
 err:
 	kfree(ctx->dummy_ubuf);
@@ -5399,6 +5409,108 @@ IO_NETOP_FN(send);
 IO_NETOP_FN(recv);
 #endif /* CONFIG_NET */
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+
+#define NAPI_TIMEOUT			(60 * SEC_CONVERSION)
+
+struct napi_entry {
+	struct list_head	list;
+	unsigned int		napi_id;
+	unsigned long		timeout;
+};
+
+/*
+ * Add busy poll NAPI ID from sk.
+ */
+static void io_add_napi(struct file *file, struct io_ring_ctx *ctx)
+{
+	unsigned int napi_id;
+	struct socket *sock;
+	struct sock *sk;
+	struct napi_entry *ne;
+
+	if (!net_busy_loop_on())
+		return;
+
+	sock = sock_from_file(file);
+	if (!sock)
+		return;
+
+	sk = sock->sk;
+	if (!sk)
+		return;
+
+	napi_id = READ_ONCE(sk->sk_napi_id);
+
+	/* Non-NAPI IDs can be rejected */
+	if (napi_id < MIN_NAPI_ID)
+		return;
+
+	spin_lock(&ctx->napi_lock);
+	list_for_each_entry(ne, &ctx->napi_list, list) {
+		if (ne->napi_id == napi_id) {
+			ne->timeout = jiffies + NAPI_TIMEOUT;
+			goto out;
+		}
+	}
+
+	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
+	if (!ne)
+		goto out;
+
+	ne->napi_id = napi_id;
+	ne->timeout = jiffies + NAPI_TIMEOUT;
+	list_add_tail(&ne->list, &ctx->napi_list);
+out:
+	spin_unlock(&ctx->napi_lock);
+}
+
+static inline void io_check_napi_entry_timeout(struct napi_entry *ne)
+{
+	if (time_after(jiffies, ne->timeout)) {
+		list_del(&ne->list);
+		kfree(ne);
+	}
+}
+
+/*
+ * Busy poll if globally on and supporting sockets found
+ */
+static bool io_napi_busy_loop(struct list_head *napi_list)
+{
+	struct napi_entry *ne, *n;
+
+	list_for_each_entry_safe(ne, n, napi_list, list) {
+		napi_busy_loop(ne->napi_id, NULL, NULL, true,
+			       BUSY_POLL_BUDGET);
+		io_check_napi_entry_timeout(ne);
+	}
+	return !list_empty(napi_list);
+}
+
+static void io_free_napi_list(struct io_ring_ctx *ctx)
+{
+	spin_lock(&ctx->napi_lock);
+	while (!list_empty(&ctx->napi_list)) {
+		struct napi_entry *ne =
+			list_first_entry(&ctx->napi_list, struct napi_entry,
+					 list);
+
+		list_del(&ne->list);
+		kfree(ne);
+	}
+	spin_unlock(&ctx->napi_lock);
+}
+#else
+static inline void io_add_napi(struct file *file, struct io_ring_ctx *ctx)
+{
+}
+
+static inline void io_free_napi_list(struct io_ring_ctx *ctx)
+{
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
 struct io_poll_table {
 	struct poll_table_struct pt;
 	struct io_kiocb *req;
@@ -5545,6 +5657,7 @@ static int io_poll_check_events(struct io_kiocb *req)
 			if (unlikely(!filled))
 				return -ECANCELED;
 			io_cqring_ev_posted(ctx);
+			io_add_napi(req->file, ctx);
 		} else if (req->result) {
 			return 0;
 		}
@@ -5777,6 +5890,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 		__io_poll_execute(req, mask);
 		return 0;
 	}
+	io_add_napi(req->file, req->ctx);
 
 	/*
 	 * Release ownership. If someone tried to queue a tw while it was
@@ -7519,7 +7633,11 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 		    !(ctx->flags & IORING_SETUP_R_DISABLED))
 			ret = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
-
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		if (!list_empty(&ctx->napi_list) &&
+		    io_napi_busy_loop(&ctx->napi_list))
+			++ret;
+#endif
 		if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
 			wake_up(&ctx->sqo_sq_wait);
 		if (creds)
@@ -7650,6 +7768,9 @@ struct io_wait_queue {
 	struct io_ring_ctx *ctx;
 	unsigned cq_tail;
 	unsigned nr_timeouts;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	unsigned busy_poll_to;
+#endif
 };
 
 static inline bool io_should_wake(struct io_wait_queue *iowq)
@@ -7711,6 +7832,87 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	return 1;
 }
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static void io_adjust_busy_loop_timeout(struct timespec64 *ts,
+					struct io_wait_queue *iowq)
+{
+	unsigned busy_poll_to = READ_ONCE(sysctl_net_busy_poll);
+	struct timespec64 pollto = ns_to_timespec64(1000 * (s64)busy_poll_to);
+
+	if (timespec64_compare(ts, &pollto) > 0) {
+		*ts = timespec64_sub(*ts, pollto);
+		iowq->busy_poll_to = busy_poll_to;
+	} else {
+		u64 to = timespec64_to_ns(ts);
+
+		do_div(to, 1000);
+		iowq->busy_poll_to = to;
+		ts->tv_sec = 0;
+		ts->tv_nsec = 0;
+	}
+}
+
+static inline bool io_busy_loop_timeout(unsigned long start_time,
+					unsigned long bp_usec)
+{
+	if (bp_usec) {
+		unsigned long end_time = start_time + bp_usec;
+		unsigned long now = busy_loop_current_time();
+
+		return time_after(now, end_time);
+	}
+	return true;
+}
+
+static bool io_busy_loop_end(void *p, unsigned long start_time)
+{
+	struct io_wait_queue *iowq = p;
+
+	return signal_pending(current) ||
+	       io_should_wake(iowq) ||
+	       io_busy_loop_timeout(start_time, iowq->busy_poll_to);
+}
+
+static void io_blocking_napi_busy_loop(struct list_head *napi_list,
+				       struct io_wait_queue *iowq)
+{
+	unsigned long start_time =
+		list_is_singular(napi_list) ? 0 :
+		busy_loop_current_time();
+
+	do {
+		if (list_is_singular(napi_list)) {
+			struct napi_entry *ne =
+				list_first_entry(napi_list,
+						 struct napi_entry, list);
+
+			napi_busy_loop(ne->napi_id, io_busy_loop_end, iowq,
+				       true, BUSY_POLL_BUDGET);
+			io_check_napi_entry_timeout(ne);
+			break;
+		}
+	} while (io_napi_busy_loop(napi_list) &&
+		 !io_busy_loop_end(iowq, start_time));
+}
+
+static void io_putback_napi_list(struct io_ring_ctx *ctx,
+				 struct list_head *napi_list)
+{
+	struct napi_entry *cne, *lne;
+
+	spin_lock(&ctx->napi_lock);
+	list_for_each_entry(cne, &ctx->napi_list, list)
+		list_for_each_entry(lne, napi_list, list)
+			if (cne->napi_id == lne->napi_id) {
+				list_del(&lne->list);
+				kfree(lne);
+				break;
+			}
+	list_splice(napi_list, &ctx->napi_list);
+	spin_unlock(&ctx->napi_lock);
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
 /*
  * Wait until events become available, if we don't already have some. The
  * application must reap them itself, as they reside on the shared cq ring.
@@ -7723,6 +7925,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	struct io_rings *rings = ctx->rings;
 	ktime_t timeout = KTIME_MAX;
 	int ret;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	LIST_HEAD(local_napi_list);
+#endif
 
 	do {
 		io_cqring_overflow_flush(ctx);
@@ -7745,13 +7950,29 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			return ret;
 	}
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	iowq.busy_poll_to = 0;
+	if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
+		spin_lock(&ctx->napi_lock);
+		list_splice_init(&ctx->napi_list, &local_napi_list);
+		spin_unlock(&ctx->napi_lock);
+	}
+#endif
 	if (uts) {
 		struct timespec64 ts;
 
 		if (get_timespec64(&ts, uts))
 			return -EFAULT;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		if (!list_empty(&local_napi_list))
+			io_adjust_busy_loop_timeout(&ts, &iowq);
+#endif
 		timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
 	}
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	else if (!list_empty(&local_napi_list))
+		iowq.busy_poll_to = READ_ONCE(sysctl_net_busy_poll);
+#endif
 
 	init_waitqueue_func_entry(&iowq.wq, io_wake_function);
 	iowq.wq.private = current;
@@ -7761,6 +7982,12 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 
 	trace_io_uring_cqring_wait(ctx, min_events);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	if (iowq.busy_poll_to)
+		io_blocking_napi_busy_loop(&local_napi_list, &iowq);
+	if (!list_empty(&local_napi_list))
+		io_putback_napi_list(ctx, &local_napi_list);
+#endif
 	do {
 		/* if we can't even flush overflow, don't wait for more */
 		if (!io_cqring_overflow_flush(ctx)) {
@@ -9483,6 +9710,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_req_caches_free(ctx);
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
+	io_free_napi_list(ctx);
 	kfree(ctx->cancel_hash);
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx);
-- 
2.35.1



* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-01 13:47 ` [PATCH v4 2/2] io_uring: Add support for napi_busy_poll Olivier Langlois
@ 2022-03-01 18:31   ` Hao Xu
  2022-03-01 20:06     ` Olivier Langlois
  2022-03-02  5:12     ` Olivier Langlois
  0 siblings, 2 replies; 12+ messages in thread
From: Hao Xu @ 2022-03-01 18:31 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel

Hi Olivier,

On 3/1/22 21:47, Olivier Langlois wrote:
> The sqpoll thread can be used for performing the napi busy poll in a
> similar way that it does io polling for file systems supporting direct
> access bypassing the page cache.
>
> The other way that io_uring can be used for napi busy poll is by
> calling io_uring_enter() to get events.
>
> If the user specify a timeout value, it is distributed between polling
> and sleeping by using the systemwide setting
> /proc/sys/net/core/busy_poll.
>
> The changes have been tested with this program:
> https://github.com/lano1106/io_uring_udp_ping
>
> and the result is:
> Without sqpoll:
> NAPI busy loop disabled:
> rtt min/avg/max/mdev = 40.631/42.050/58.667/1.547 us
> NAPI busy loop enabled:
> rtt min/avg/max/mdev = 30.619/31.753/61.433/1.456 us
>
> With sqpoll:
> NAPI busy loop disabled:
> rtt min/avg/max/mdev = 42.087/44.438/59.508/1.533 us
> NAPI busy loop enabled:
> rtt min/avg/max/mdev = 35.779/37.347/52.201/0.924 us
>
> Co-developed-by: Hao Xu <haoxu@linux.alibaba.com>
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> ---
>   fs/io_uring.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 229 insertions(+), 1 deletion(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index f7b8df79a02b..37c065786e4b 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -63,6 +63,7 @@
>   #include <net/sock.h>
>   #include <net/af_unix.h>
>   #include <net/scm.h>
> +#include <net/busy_poll.h>
>   #include <linux/anon_inodes.h>
>   #include <linux/sched/mm.h>
>   #include <linux/uaccess.h>
> @@ -395,6 +396,11 @@ struct io_ring_ctx {
>   	struct list_head	sqd_list;
>   
>   	unsigned long		check_cq_overflow;
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	/* used to track busy poll napi_id */
> +	struct list_head	napi_list;
> +	spinlock_t		napi_lock;	/* napi_list lock */
> +#endif
>   
>   	struct {
>   		unsigned		cached_cq_tail;
> @@ -1464,6 +1470,10 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>   	INIT_WQ_LIST(&ctx->locked_free_list);
>   	INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
>   	INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	INIT_LIST_HEAD(&ctx->napi_list);
> +	spin_lock_init(&ctx->napi_lock);
> +#endif
>   	return ctx;
>   err:
>   	kfree(ctx->dummy_ubuf);
> @@ -5399,6 +5409,108 @@ IO_NETOP_FN(send);
>   IO_NETOP_FN(recv);
>   #endif /* CONFIG_NET */
>   
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +
> +#define NAPI_TIMEOUT			(60 * SEC_CONVERSION)
> +
> +struct napi_entry {
> +	struct list_head	list;
> +	unsigned int		napi_id;
> +	unsigned long		timeout;
> +};
> +
> +/*
> + * Add busy poll NAPI ID from sk.
> + */
> +static void io_add_napi(struct file *file, struct io_ring_ctx *ctx)
> +{
> +	unsigned int napi_id;
> +	struct socket *sock;
> +	struct sock *sk;
> +	struct napi_entry *ne;
> +
> +	if (!net_busy_loop_on())
> +		return;
> +
> +	sock = sock_from_file(file);
> +	if (!sock)
> +		return;
> +
> +	sk = sock->sk;
> +	if (!sk)
> +		return;
> +
> +	napi_id = READ_ONCE(sk->sk_napi_id);
> +
> +	/* Non-NAPI IDs can be rejected */
> +	if (napi_id < MIN_NAPI_ID)
> +		return;
> +
> +	spin_lock(&ctx->napi_lock);
> +	list_for_each_entry(ne, &ctx->napi_list, list) {
> +		if (ne->napi_id == napi_id) {
> +			ne->timeout = jiffies + NAPI_TIMEOUT;
> +			goto out;
> +		}
> +	}
> +
> +	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
> +	if (!ne)
> +		goto out;

IMHO, we need to handle -ENOMEM here, I cut off the error handling when
I did the quick coding. Sorry for misleading.

> +
> +	ne->napi_id = napi_id;
> +	ne->timeout = jiffies + NAPI_TIMEOUT;
> +	list_add_tail(&ne->list, &ctx->napi_list);
> +out:
> +	spin_unlock(&ctx->napi_lock);
> +}
> +
> +static inline void io_check_napi_entry_timeout(struct napi_entry *ne)
> +{
> +	if (time_after(jiffies, ne->timeout)) {
> +		list_del(&ne->list);
> +		kfree(ne);
> +	}
> +}
> +
> +/*
> + * Busy poll if globally on and supporting sockets found
> + */
> +static bool io_napi_busy_loop(struct list_head *napi_list)
> +{
> +	struct napi_entry *ne, *n;
> +
> +	list_for_each_entry_safe(ne, n, napi_list, list) {
> +		napi_busy_loop(ne->napi_id, NULL, NULL, true,
> +			       BUSY_POLL_BUDGET);
> +		io_check_napi_entry_timeout(ne);
> +	}
> +	return !list_empty(napi_list);
> +}
> +
> +static void io_free_napi_list(struct io_ring_ctx *ctx)
> +{
> +	spin_lock(&ctx->napi_lock);
> +	while (!list_empty(&ctx->napi_list)) {
> +		struct napi_entry *ne =
> +			list_first_entry(&ctx->napi_list, struct napi_entry,
> +					 list);
> +
> +		list_del(&ne->list);
> +		kfree(ne);
> +	}
> +	spin_unlock(&ctx->napi_lock);
> +}
> +#else
> +static inline void io_add_napi(struct file *file, struct io_ring_ctx *ctx)
> +{
> +}
> +
> +static inline void io_free_napi_list(struct io_ring_ctx *ctx)
> +{
> +}
> +#endif /* CONFIG_NET_RX_BUSY_POLL */
> +
>   struct io_poll_table {
>   	struct poll_table_struct pt;
>   	struct io_kiocb *req;
> @@ -5545,6 +5657,7 @@ static int io_poll_check_events(struct io_kiocb *req)
>   			if (unlikely(!filled))
>   				return -ECANCELED;
>   			io_cqring_ev_posted(ctx);
> +			io_add_napi(req->file, ctx);
>   		} else if (req->result) {
>   			return 0;
>   		}
> @@ -5777,6 +5890,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
>   		__io_poll_execute(req, mask);
>   		return 0;
>   	}
> +	io_add_napi(req->file, req->ctx);
>   
>   	/*
>   	 * Release ownership. If someone tried to queue a tw while it was
> @@ -7519,7 +7633,11 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
>   		    !(ctx->flags & IORING_SETUP_R_DISABLED))
>   			ret = io_submit_sqes(ctx, to_submit);
>   		mutex_unlock(&ctx->uring_lock);
> -
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +		if (!list_empty(&ctx->napi_list) &&
> +		    io_napi_busy_loop(&ctx->napi_list))

I'm afraid we may need lock for sqpoll too, since io_add_napi() could be 
in iowq context.

I'll take a look at the lock stuff of this patch tomorrow, too late now 
in my timezone.

> +			++ret;
> +#endif
>   		if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
>   			wake_up(&ctx->sqo_sq_wait);
>   		if (creds)
> @@ -7650,6 +7768,9 @@ struct io_wait_queue {
>   	struct io_ring_ctx *ctx;
>   	unsigned cq_tail;
>   	unsigned nr_timeouts;
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	unsigned busy_poll_to;
> +#endif
>   };
>   
>   static inline bool io_should_wake(struct io_wait_queue *iowq)
> @@ -7711,6 +7832,87 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>   	return 1;
>   }
>   
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +static void io_adjust_busy_loop_timeout(struct timespec64 *ts,
> +					struct io_wait_queue *iowq)
> +{
> +	unsigned busy_poll_to = READ_ONCE(sysctl_net_busy_poll);
> +	struct timespec64 pollto = ns_to_timespec64(1000 * (s64)busy_poll_to);
> +
> +	if (timespec64_compare(ts, &pollto) > 0) {
> +		*ts = timespec64_sub(*ts, pollto);
> +		iowq->busy_poll_to = busy_poll_to;
> +	} else {
> +		u64 to = timespec64_to_ns(ts);
> +
> +		do_div(to, 1000);
> +		iowq->busy_poll_to = to;
> +		ts->tv_sec = 0;
> +		ts->tv_nsec = 0;
> +	}
> +}
> +
> +static inline bool io_busy_loop_timeout(unsigned long start_time,
> +					unsigned long bp_usec)
> +{
> +	if (bp_usec) {
> +		unsigned long end_time = start_time + bp_usec;
> +		unsigned long now = busy_loop_current_time();
> +
> +		return time_after(now, end_time);
> +	}
> +	return true;
> +}
> +
> +static bool io_busy_loop_end(void *p, unsigned long start_time)
> +{
> +	struct io_wait_queue *iowq = p;
> +
> +	return signal_pending(current) ||
> +	       io_should_wake(iowq) ||
> +	       io_busy_loop_timeout(start_time, iowq->busy_poll_to);
> +}
> +
> +static void io_blocking_napi_busy_loop(struct list_head *napi_list,
> +				       struct io_wait_queue *iowq)
> +{
> +	unsigned long start_time =
> +		list_is_singular(napi_list) ? 0 :
> +		busy_loop_current_time();
> +
> +	do {
> +		if (list_is_singular(napi_list)) {
> +			struct napi_entry *ne =
> +				list_first_entry(napi_list,
> +						 struct napi_entry, list);
> +
> +			napi_busy_loop(ne->napi_id, io_busy_loop_end, iowq,
> +				       true, BUSY_POLL_BUDGET);
> +			io_check_napi_entry_timeout(ne);
> +			break;
> +		}
> +	} while (io_napi_busy_loop(napi_list) &&
> +		 !io_busy_loop_end(iowq, start_time));
> +}
> +

How about:

if (list is singular) {
	do something;
	return;
}

while (!io_busy_loop_end() && io_napi_busy_loop())
	;

Btw, start_time seems not used in singular branch.


Regards,

Hao

> +static void io_putback_napi_list(struct io_ring_ctx *ctx,
> +				 struct list_head *napi_list)
> +{
> +	struct napi_entry *cne, *lne;
> +
> +	spin_lock(&ctx->napi_lock);
> +	list_for_each_entry(cne, &ctx->napi_list, list)
> +		list_for_each_entry(lne, napi_list, list)
> +			if (cne->napi_id == lne->napi_id) {
> +				list_del(&lne->list);
> +				kfree(lne);
> +				break;
> +			}
> +	list_splice(napi_list, &ctx->napi_list);
> +	spin_unlock(&ctx->napi_lock);
> +}
> +#endif /* CONFIG_NET_RX_BUSY_POLL */
> +
>   /*
>    * Wait until events become available, if we don't already have some. The
>    * application must reap them itself, as they reside on the shared cq ring.
> @@ -7723,6 +7925,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>   	struct io_rings *rings = ctx->rings;
>   	ktime_t timeout = KTIME_MAX;
>   	int ret;
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	LIST_HEAD(local_napi_list);
> +#endif
>   
>   	do {
>   		io_cqring_overflow_flush(ctx);
> @@ -7745,13 +7950,29 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>   			return ret;
>   	}
>   
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	iowq.busy_poll_to = 0;
> +	if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
> +		spin_lock(&ctx->napi_lock);
> +		list_splice_init(&ctx->napi_list, &local_napi_list);
> +		spin_unlock(&ctx->napi_lock);
> +	}
> +#endif
>   	if (uts) {
>   		struct timespec64 ts;
>   
>   		if (get_timespec64(&ts, uts))
>   			return -EFAULT;
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +		if (!list_empty(&local_napi_list))
> +			io_adjust_busy_loop_timeout(&ts, &iowq);
> +#endif
>   		timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
>   	}
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	else if (!list_empty(&local_napi_list))
> +		iowq.busy_poll_to = READ_ONCE(sysctl_net_busy_poll);
> +#endif
>   
>   	init_waitqueue_func_entry(&iowq.wq, io_wake_function);
>   	iowq.wq.private = current;
> @@ -7761,6 +7982,12 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>   	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
>   
>   	trace_io_uring_cqring_wait(ctx, min_events);
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	if (iowq.busy_poll_to)
> +		io_blocking_napi_busy_loop(&local_napi_list, &iowq);
> +	if (!list_empty(&local_napi_list))
> +		io_putback_napi_list(ctx, &local_napi_list);
> +#endif
>   	do {
>   		/* if we can't even flush overflow, don't wait for more */
>   		if (!io_cqring_overflow_flush(ctx)) {
> @@ -9483,6 +9710,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>   	io_req_caches_free(ctx);
>   	if (ctx->hash_map)
>   		io_wq_put_hash(ctx->hash_map);
> +	io_free_napi_list(ctx);
>   	kfree(ctx->cancel_hash);
>   	kfree(ctx->dummy_ubuf);
>   	kfree(ctx);


* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-01 18:31   ` Hao Xu
@ 2022-03-01 20:06     ` Olivier Langlois
  2022-03-01 20:14       ` Olivier Langlois
  2022-03-02  6:27       ` Hao Xu
  2022-03-02  5:12     ` Olivier Langlois
  1 sibling, 2 replies; 12+ messages in thread
From: Olivier Langlois @ 2022-03-01 20:06 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel

On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
> 
> > +       ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
> > +       if (!ne)
> > +               goto out;
> 
> IMHO, we need to handle -ENOMEM here, I cut off the error handling
> when
> 
> I did the quick coding. Sorry for misleading.

If you are correct, I would be shocked about this.

I went back to my 'Linux Device Drivers' book and nowhere is it
mentioned that kmalloc() can return something other than a pointer

No mention at all about the return value

in man page:
https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html
API doc:

https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kmalloc#c.kmalloc

header file:
https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L522

I did browse into the kmalloc code. There are a lot of paths to cover
but from a preliminary reading, it pretty much seems that kmalloc only
returns a valid pointer or NULL...

/**
 * kmem_cache_alloc - Allocate an object
 * @cachep: The cache to allocate from.
 * @flags: See kmalloc().
 *
 * Allocate an object from this cache.  The flags are only relevant
 * if the cache has no available objects.
 *
 * Return: pointer to the new object or %NULL in case of error
 */
 
 /**
 * __do_kmalloc - allocate memory
 * @size: how many bytes of memory are required.
 * @flags: the type of memory to allocate (see kmalloc).
 * @caller: function caller for debug tracking of the caller
 *
 * Return: pointer to the allocated memory or %NULL in case of error
 */

I'll need someone else to confirm the possible kmalloc() return
values, with perhaps an example.

I am a bit skeptical that something special needs to be done here...

Or perhaps you are suggesting that io_add_napi() returns an error code
when allocation fails.

as done here:
https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/core_marvel.c#L867

If that is what you suggest, what would this info do for the caller?

IMHO, it wouldn't help in any way...
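
For the sake of the discussion, here is roughly what the alternative
would look like if io_add_napi() were made to report the allocation
failure (a sketch derived from the patch above, not something I am
proposing to merge as-is):

static int io_add_napi(struct file *file, struct io_ring_ctx *ctx)
{
	unsigned int napi_id;
	struct socket *sock;
	struct sock *sk;
	struct napi_entry *ne;
	int ret = 0;

	if (!net_busy_loop_on())
		return 0;

	sock = sock_from_file(file);
	if (!sock)
		return 0;

	sk = sock->sk;
	if (!sk)
		return 0;

	napi_id = READ_ONCE(sk->sk_napi_id);

	/* Non-NAPI IDs can be rejected */
	if (napi_id < MIN_NAPI_ID)
		return 0;

	spin_lock(&ctx->napi_lock);
	list_for_each_entry(ne, &ctx->napi_list, list) {
		if (ne->napi_id == napi_id) {
			ne->timeout = jiffies + NAPI_TIMEOUT;
			goto out;
		}
	}

	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
	if (!ne) {
		/* the only difference: report the failure to the caller */
		ret = -ENOMEM;
		goto out;
	}

	ne->napi_id = napi_id;
	ne->timeout = jiffies + NAPI_TIMEOUT;
	list_add_tail(&ne->list, &ctx->napi_list);
out:
	spin_unlock(&ctx->napi_lock);
	return ret;
}

and the callers would then have to decide what to do with -ENOMEM,
which is precisely the part that does not seem useful to me.
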
> 
> > 
> > @@ -7519,7 +7633,11 @@ static int __io_sq_thread(struct io_ring_ctx
> > *ctx, bool cap_entries)
> >                     !(ctx->flags & IORING_SETUP_R_DISABLED))
> >                         ret = io_submit_sqes(ctx, to_submit);
> >                 mutex_unlock(&ctx->uring_lock);
> > -
> > +#ifdef CONFIG_NET_RX_BUSY_POLL
> > +               if (!list_empty(&ctx->napi_list) &&
> > +                   io_napi_busy_loop(&ctx->napi_list))
> 
> I'm afraid we may need lock for sqpoll too, since io_add_napi() could
> be 
> in iowq context.
> 
> I'll take a look at the lock stuff of this patch tomorrow, too late
> now 
> in my timezone.

Ok, please do. I'm not a big user of io workers. I may have neglected to
consider this possibility.

If that is the case, I think that this would be very easy to fix by
locking the spinlock while __io_sq_thread() is using napi_list.
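
Something along the lines of what io_cqring_wait() already does, i.e.
splicing the list under the spinlock before busy polling it (a rough,
untested sketch reusing the helpers from the patch):

#ifdef CONFIG_NET_RX_BUSY_POLL
		if (!list_empty(&ctx->napi_list)) {
			LIST_HEAD(local_napi_list);

			/* detach the list under the lock... */
			spin_lock(&ctx->napi_lock);
			list_splice_init(&ctx->napi_list, &local_napi_list);
			spin_unlock(&ctx->napi_lock);

			/* ...busy poll it unlocked, then put it back */
			if (io_napi_busy_loop(&local_napi_list))
				++ret;
			io_putback_napi_list(ctx, &local_napi_list);
		}
#endif
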
> 
> How about:
> 
> if (list is singular) {
> 
>      do something;
> 
>      return;
> 
> }
> 
> while (!io_busy_loop_end() && io_napi_busy_loop())
> 
>      ;
> 

Is there a concern with the current code?
What would be the benefit of your suggestion over the current code?

To me, it seems that if io_blocking_napi_busy_loop() is called, a
reasonable expectation would be that some busy looping is done, or else
you could return from the function without doing anything, which would,
IMHO, be misleading.

By definition, napi_busy_loop() is not blocking, and if you want the
device to stay in busy poll mode, you need to busy poll once in a while
or else, after a certain time, the device will revert to its
interrupt mode.

IOW, io_blocking_napi_busy_loop() follows the same logic used by
napi_busy_loop(), which does not call loop_end() before having performed
one loop iteration.

> Btw, start_time seems not used in singular branch.

I know. This is why it is conditionally initialized.

Greetings,



* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-01 20:06     ` Olivier Langlois
@ 2022-03-01 20:14       ` Olivier Langlois
  2022-03-02  6:27       ` Hao Xu
  1 sibling, 0 replies; 12+ messages in thread
From: Olivier Langlois @ 2022-03-01 20:14 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel

On Tue, 2022-03-01 at 15:06 -0500, Olivier Langlois wrote:
> On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
> > 
> > 
> > How about:
> > 
> > if (list is singular) {
> > 
> >      do something;
> > 
> >      return;
> > 
> > }
> > 
> > while (!io_busy_loop_end() && io_napi_busy_loop())
> > 
> >      ;
> > 
> 
> is there a concern with the current code?
> What would be the benefit of your suggestion over current code?
> 
> To me, it seems that if io_blocking_napi_busy_loop() is called, a
> reasonable expectation would be that some busy looping is done or
> else
> you could return the function without doing anything which would,
> IMHO,
> be misleading.
> 
> By definition, napi_busy_loop() is not blocking and if you desire the
> device to be in busy poll mode, you need to do it once in a while or
> else, after a certain time, the device will return back to its
> interrupt mode.
> 
> IOW, io_blocking_napi_busy_loop() follows the same logic used by
> napi_busy_loop() that does not call loop_end() before having perform
> 1
> loop iteration.
> 
> > Btw, start_time seems not used in singular branch.
> 
> I know. This is why it is conditionally initialized.
> 
> Greetings,
> 
Another argument for leaving the code the way it is:
io_napi_busy_loop() has another purpose on top of iterating the
napi_list and calling napi_busy_loop() for each entry.

The function also checks the validity of the list entries and frees them
when they time out. By not calling io_napi_busy_loop() you would bypass
this check, and that could result in timed-out entries never being
disposed of.



* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-01 18:31   ` Hao Xu
  2022-03-01 20:06     ` Olivier Langlois
@ 2022-03-02  5:12     ` Olivier Langlois
  2022-03-02  6:35       ` Hao Xu
  1 sibling, 1 reply; 12+ messages in thread
From: Olivier Langlois @ 2022-03-02  5:12 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel

On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
> > +static void io_blocking_napi_busy_loop(struct list_head
> > *napi_list,
> > +                                      struct io_wait_queue *iowq)
> > +{
> > +       unsigned long start_time =
> > +               list_is_singular(napi_list) ? 0 :
> > +               busy_loop_current_time();
> > +
> > +       do {
> > +               if (list_is_singular(napi_list)) {
> > +                       struct napi_entry *ne =
> > +                               list_first_entry(napi_list,
> > +                                                struct napi_entry,
> > list);
> > +
> > +                       napi_busy_loop(ne->napi_id,
> > io_busy_loop_end, iowq,
> > +                                      true, BUSY_POLL_BUDGET);
> > +                       io_check_napi_entry_timeout(ne);
> > +                       break;
> > +               }
> > +       } while (io_napi_busy_loop(napi_list) &&
> > +                !io_busy_loop_end(iowq, start_time));
> > +}
> > +
> 
> How about:
> 
> if (list is singular) {
> 
>      do something;
> 
>      return;
> 
> }
> 
> while (!io_busy_loop_end() && io_napi_busy_loop())
> 
>      ;
> 
> 
> Btw, start_time seems not used in singular branch.

Hao,

it took me a few readings before being able to figure out the idea
behind your suggestion. Sorry about that!

So, if I get it correctly, you are proposing to extract the singular
block out of the while loop...

IMHO, this is not a good idea because you could start iterating the
do/while loop with a multiple-entry list that ends up becoming a
singular list after one or a few iterations.

Check what io_napi_busy_loop() is doing...

It does not look like it, but a lot of thought has been put into
writing io_blocking_napi_busy_loop()...



* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-01 20:06     ` Olivier Langlois
  2022-03-01 20:14       ` Olivier Langlois
@ 2022-03-02  6:27       ` Hao Xu
  2022-03-02  6:38         ` Hao Xu
  1 sibling, 1 reply; 12+ messages in thread
From: Hao Xu @ 2022-03-02  6:27 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel


On 3/2/22 04:06, Olivier Langlois wrote:
> On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
>>> +       ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>>> +       if (!ne)
>>> +               goto out;
>> IMHO, we need to handle -ENOMEM here, I cut off the error handling
>> when
>>
>> I did the quick coding. Sorry for misleading.
> If you are correct, I would be shocked about this.
>
> I did return in my 'Linux Device Drivers' book and nowhere it is
> mentionned that the kmalloc() can return something else than a pointer
>
> No mention at all about the return value
>
> in man page:
> https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html
> API doc:
>
> https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kmalloc#c.kmalloc
>
> header file:
> https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L522
>
> I did browse into the kmalloc code. There is a lot of paths to cover
> but from preliminary reading, it pretty much seems that kmalloc only
> returns a valid pointer or NULL...
>
> /**
>   * kmem_cache_alloc - Allocate an object
>   * @cachep: The cache to allocate from.
>   * @flags: See kmalloc().
>   *
>   * Allocate an object from this cache.  The flags are only relevant
>   * if the cache has no available objects.
>   *
>   * Return: pointer to the new object or %NULL in case of error
>   */
>   
>   /**
>   * __do_kmalloc - allocate memory
>   * @size: how many bytes of memory are required.
>   * @flags: the type of memory to allocate (see kmalloc).
>   * @caller: function caller for debug tracking of the caller
>   *
>   * Return: pointer to the allocated memory or %NULL in case of error
>   */
>
> I'll need someone else to confirm about possible kmalloc() return
> values with perhaps an example
>
> I am a bit skeptic that something special needs to be done here...
>
> Or perhaps you are suggesting that io_add_napi() returns an error code
> when allocation fails.
This is what I mean.
>
> as done here:
> https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/core_marvel.c#L867
>
> If that is what you suggest, what would this info do for the caller?
>
> IMHO, it wouldn't help in any way...

Hmm, I'm not sure, you're probably right based on that ENOMEM here
shouldn't fail the arm poll, but we wanna do it, we can do something
like what we do for kmalloc() in io_arm_poll_handler(). I'll leave it
to others.

>>> @@ -7519,7 +7633,11 @@ static int __io_sq_thread(struct io_ring_ctx
>>> *ctx, bool cap_entries)
>>>                      !(ctx->flags & IORING_SETUP_R_DISABLED))
>>>                          ret = io_submit_sqes(ctx, to_submit);
>>>                  mutex_unlock(&ctx->uring_lock);
>>> -
>>> +#ifdef CONFIG_NET_RX_BUSY_POLL
>>> +               if (!list_empty(&ctx->napi_list) &&
>>> +                   io_napi_busy_loop(&ctx->napi_list))
>> I'm afraid we may need lock for sqpoll too, since io_add_napi() could
>> be
>> in iowq context.
>>
>> I'll take a look at the lock stuff of this patch tomorrow, too late
>> now
>> in my timezone.
> Ok, please do. I'm not a big user of io workers. I may have omitted to
> consider this possibility.
>
> If that is the case, I think that this would be very easy to fix by
> locking the spinlock while __io_sq_thread() is using napi_list.
>> How about:
>>
>> if (list is singular) {
>>
>>       do something;
>>
>>       return;
>>
>> }
>>
>> while (!io_busy_loop_end() && io_napi_busy_loop())
>>
>>       ;
>>
> is there a concern with the current code?
> What would be the benefit of your suggestion over current code?

No, it's just a coding style concern, since I see

do {
	if () {
		break;
	}
} while ();

which means the if statement is actually not in the loop. Anyway, it's
just personal taste.

>
> To me, it seems that if io_blocking_napi_busy_loop() is called, a
> reasonable expectation would be that some busy looping is done or else
> you could return the function without doing anything which would, IMHO,
> be misleading.
>
> By definition, napi_busy_loop() is not blocking and if you desire the
> device to be in busy poll mode, you need to do it once in a while or
> else, after a certain time, the device will return back to its
> interrupt mode.
>
> IOW, io_blocking_napi_busy_loop() follows the same logic used by
> napi_busy_loop() that does not call loop_end() before having perform 1
> loop iteration.
I see, thanks for the explanation. I'm ok with this.
>
>> Btw, start_time seems not used in singular branch.
> I know. This is why it is conditionally initialized.

like what I said, just personal taste.


+static void io_blocking_napi_busy_loop(struct list_head *napi_list,
+                                      struct io_wait_queue *iowq)
+{
+       if (list_is_singular(napi_list)) {
+               struct napi_entry *ne =
+                       list_first_entry(napi_list,
+                                        struct napi_entry, list);
+
+               napi_busy_loop(ne->napi_id, io_busy_loop_end, iowq,
+                              true, BUSY_POLL_BUDGET);
+               io_check_napi_entry_timeout(ne);
+               return;
+       }
+
+       while (io_napi_busy_loop(napi_list)) {
+               if (io_busy_loop_end(iowq, busy_loop_current_time()))
+                       break;
+       }
+}


>
> Greetings,


* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-02  5:12     ` Olivier Langlois
@ 2022-03-02  6:35       ` Hao Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Hao Xu @ 2022-03-02  6:35 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel


On 3/2/22 13:12, Olivier Langlois wrote:
> On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
>>> +static void io_blocking_napi_busy_loop(struct list_head
>>> *napi_list,
>>> +                                      struct io_wait_queue *iowq)
>>> +{
>>> +       unsigned long start_time =
>>> +               list_is_singular(napi_list) ? 0 :
>>> +               busy_loop_current_time();
>>> +
>>> +       do {
>>> +               if (list_is_singular(napi_list)) {
>>> +                       struct napi_entry *ne =
>>> +                               list_first_entry(napi_list,
>>> +                                                struct napi_entry,
>>> list);
>>> +
>>> +                       napi_busy_loop(ne->napi_id,
>>> io_busy_loop_end, iowq,
>>> +                                      true, BUSY_POLL_BUDGET);
>>> +                       io_check_napi_entry_timeout(ne);
>>> +                       break;
>>> +               }
>>> +       } while (io_napi_busy_loop(napi_list) &&
>>> +                !io_busy_loop_end(iowq, start_time));
>>> +}
>>> +
>> How about:
>>
>> if (list is singular) {
>>
>>       do something;
>>
>>       return;
>>
>> }
>>
>> while (!io_busy_loop_end() && io_napi_busy_loop())
>>
>>       ;
>>
>>
>> Btw, start_time seems not used in singular branch.
> Hao,
>
> it takes me few readings before being able to figure out the idea
> behind your suggestions. Sorry about that!
>
> So, if I get it correctly, you are proposing extract out the singular
> block out of the while loop...
>
> IMHO, this is not a good idea because you could start iterating the
> do/while loop with a multiple entries list that ends up becoming a
> singular list after one or few iterations.
My bad, I get your point now.
>
> Check what io_napi_busy_loop() is doing...
>
> It does not look like that but a lot thoughts have been put into
> writing io_blocking_napi_busy_loop()...
True.


* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-02  6:27       ` Hao Xu
@ 2022-03-02  6:38         ` Hao Xu
  2022-03-02 22:03           ` Olivier Langlois
  0 siblings, 1 reply; 12+ messages in thread
From: Hao Xu @ 2022-03-02  6:38 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel


On 3/2/22 14:27, Hao Xu wrote:
>
> On 3/2/22 04:06, Olivier Langlois wrote:
>> On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
>>>> +       ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>>>> +       if (!ne)
>>>> +               goto out;
>>> IMHO, we need to handle -ENOMEM here, I cut off the error handling
>>> when
>>>
>>> I did the quick coding. Sorry for misleading.
>> If you are correct, I would be shocked about this.
>>
>> I did return in my 'Linux Device Drivers' book and nowhere it is
>> mentionned that the kmalloc() can return something else than a pointer
>>
>> No mention at all about the return value
>>
>> in man page:
>> https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html
>> API doc:
>>
>> https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kmalloc#c.kmalloc 
>>
>>
>> header file:
>> https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L522
>>
>> I did browse into the kmalloc code. There is a lot of paths to cover
>> but from preliminary reading, it pretty much seems that kmalloc only
>> returns a valid pointer or NULL...
>>
>> /**
>>   * kmem_cache_alloc - Allocate an object
>>   * @cachep: The cache to allocate from.
>>   * @flags: See kmalloc().
>>   *
>>   * Allocate an object from this cache.  The flags are only relevant
>>   * if the cache has no available objects.
>>   *
>>   * Return: pointer to the new object or %NULL in case of error
>>   */
>>     /**
>>   * __do_kmalloc - allocate memory
>>   * @size: how many bytes of memory are required.
>>   * @flags: the type of memory to allocate (see kmalloc).
>>   * @caller: function caller for debug tracking of the caller
>>   *
>>   * Return: pointer to the allocated memory or %NULL in case of error
>>   */
>>
>> I'll need someone else to confirm about possible kmalloc() return
>> values with perhaps an example
>>
>> I am a bit skeptic that something special needs to be done here...
>>
>> Or perhaps you are suggesting that io_add_napi() returns an error code
>> when allocation fails.
> This is what I mean.
>>
>> as done here:
>> https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/core_marvel.c#L867 
>>
>>
>> If that is what you suggest, what would this info do for the caller?
>>
>> IMHO, it wouldn't help in any way...
>
> Hmm, I'm not sure, you're probably right based on that ENOMEM here 
> shouldn't
>
> fail the arm poll, but we wanna do it, we can do something like what 
> we do for
                             ^---but if we wanna do it


* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-02  6:38         ` Hao Xu
@ 2022-03-02 22:03           ` Olivier Langlois
  2022-03-03  7:12             ` Hao Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Olivier Langlois @ 2022-03-02 22:03 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel

On Wed, 2022-03-02 at 14:38 +0800, Hao Xu wrote:
> > > 
> > > 
> > > If that is what you suggest, what would this info do for the
> > > caller?
> > > 
> > > IMHO, it wouldn't help in any way...
> > 
> > Hmm, I'm not sure, you're probably right based on that ENOMEM here 
> > shouldn't
> > 
> > fail the arm poll, but we wanna do it, we can do something like
> > what 
> > we do for
>                              ^---but if we wanna do it
My position is that being able to perform busy poll is a nice-to-have
feature if the necessary resources are available. If not, the request
will still be handled correctly, so nothing special should be done in
case of a mem alloc problem.

But fair enough, let's wait for Jens and Pavel to chime in if they
would like to see something done here.

Besides that, all I need to know to start working on a v5 is whether
napi_list needs to be protected in __io_sq_thread() with regard to io
worker threads.

I'll look into this question too...



* Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll
  2022-03-02 22:03           ` Olivier Langlois
@ 2022-03-03  7:12             ` Hao Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Hao Xu @ 2022-03-03  7:12 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, Pavel Begunkov; +Cc: io-uring, linux-kernel


On 3/3/22 06:03, Olivier Langlois wrote:
> On Wed, 2022-03-02 at 14:38 +0800, Hao Xu wrote:
>>>>
>>>> If that is what you suggest, what would this info do for the
>>>> caller?
>>>>
>>>> IMHO, it wouldn't help in any way...
>>> Hmm, I'm not sure, you're probably right based on that ENOMEM here
>>> shouldn't
>>>
>>> fail the arm poll, but we wanna do it, we can do something like
>>> what
>>> we do for
>>                               ^---but if we wanna do it
> My position is that being able to perform busy poll is a nice to have
> feature if the necessary resources are available. If not the request
> will still be handled correctly so nothing special should be done in
> case of mem alloc problem.
Exactly what I meant.
>
> but fair enough, lets wait for Jens and Pavel to chime him if they
> would like to see something to be done here.
Agree.
>
> Beside that, all I need to know is if napi_list needs to be protected
> in __io_sq_thread with regards to io worket threads to start working on
> a v5.

Sorry for the delay, I was stuck on other things. We definitely need a
lock in this case too. It should only be a few lines of code; I would
super appreciate it if you could add it.


Thanks,

Hao

>
> I'll look into this question too...

