All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/4] liburing: add api for napi busy poll
@ 2023-02-05 18:51 Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 1/4] liburing: add api to set napi busy poll settings Stefan Roesch
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Stefan Roesch @ 2023-02-05 18:51 UTC (permalink / raw)
  To: io-uring, kernel-team; +Cc: shr, axboe, ammarfaizi2

This adds two new api's to set/clear the napi busy poll settings. The two
new functions are called:
- io_uring_register_napi
- io_uring_unregister_napi

The patch series also contains the documentation for the two new functions
and two example programs. The client program is called napi-busy-poll-client
and the server program napi-busy-poll-server. The client measures the
roundtrip times of requests.

There is also a kernel patch "io-uring: support napi busy poll" to enable
this feature on the kernel side.

Changes:
- V8:
  - Use tab in liburing-ffi.map file
  - Small fix to io_uring_register_napi man page
- V7:
  - Add new functions to liburing-ffi.map file.
  - Fix some whitespace issues.
  - Address the compile errors from CI in the two files
    napi-busy-poll-client.c and napi-busy-poll-server.c
- V6:
  - Check return value of unregister napi call and verify that busy poll
    timeout and prefer busy poll return the expected values.
- V5:
  - Fixes to documentation.
  - Correct opcode for unregister call
  - Initialize napi structure in example programs
  - Address tab issues in recordRTT()
- V4:
  - Modify functions to use a structure to pass the napi busy poll settings
    to the kernel.
  - Return previous values when returning from the above functions.
  - Rename the functions and remove one function (no longer needed as the
    data is passed as a structure)
- V3:
  - Updated liburing.map file
  - Moved example programs from the test directory to the example directory.
    The two example programs don't fit well in the test category and need to
    be run from separate hosts.
  - Added the io_uring_register_napi_prefer_busy_poll API.
  - Added the call to io_uring_register_napi_prefer_busy_poll to the example
    programs
  - Updated the documentation
- V2:
  - Updated the liburing.map file for the two new functions.
    (added a 2.4 section)
  - Added a description of the new feature to the changelog file
  - Fixed the indentation of the longopts structure
  - Used defined exit constants
  - Fixed encodeUserData to support 32 bit builds


Stefan Roesch (4):
  liburing: add api to set napi busy poll settings
  liburing: add documentation for new napi busy polling
  liburing: add example programs for napi busy poll
  liburing: update changelog with new feature

 .gitignore                       |   2 +
 CHANGELOG                        |   1 +
 examples/Makefile                |   2 +
 examples/napi-busy-poll-client.c | 451 +++++++++++++++++++++++++++++++
 examples/napi-busy-poll-server.c | 386 ++++++++++++++++++++++++++
 man/io_uring_register_napi.3     |  40 +++
 man/io_uring_unregister_napi.3   |  27 ++
 src/include/liburing.h           |   3 +
 src/include/liburing/io_uring.h  |  12 +
 src/liburing-ffi.map             |   2 +
 src/liburing.map                 |   3 +
 src/register.c                   |  12 +
 12 files changed, 941 insertions(+)
 create mode 100644 examples/napi-busy-poll-client.c
 create mode 100644 examples/napi-busy-poll-server.c
 create mode 100644 man/io_uring_register_napi.3
 create mode 100644 man/io_uring_unregister_napi.3


base-commit: d32d53b65377d846635387ce6c1cd2ed0d700f92
-- 
2.30.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v8 1/4] liburing: add api to set napi busy poll settings
  2023-02-05 18:51 [PATCH v8 0/4] liburing: add api for napi busy poll Stefan Roesch
@ 2023-02-05 18:51 ` Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 2/4] liburing: add documentation for new napi busy polling Stefan Roesch
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Stefan Roesch @ 2023-02-05 18:51 UTC (permalink / raw)
  To: io-uring, kernel-team; +Cc: shr, axboe, ammarfaizi2

This adds two functions to manage the napi busy poll settings:
- io_uring_register_napi
- io_uring_unregister_napi

Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/include/liburing.h          |  3 +++
 src/include/liburing/io_uring.h | 12 ++++++++++++
 src/liburing-ffi.map            |  2 ++
 src/liburing.map                |  3 +++
 src/register.c                  | 12 ++++++++++++
 5 files changed, 32 insertions(+)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 1f91983..13546c4 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -240,6 +240,9 @@ int io_uring_register_sync_cancel(struct io_uring *ring,
 int io_uring_register_file_alloc_range(struct io_uring *ring,
 					unsigned off, unsigned len);
 
+int io_uring_register_napi(struct io_uring *ring, struct io_uring_napi *napi);
+int io_uring_unregister_napi(struct io_uring *ring, struct io_uring_napi *napi);
+
 int io_uring_get_events(struct io_uring *ring);
 int io_uring_submit_and_get_events(struct io_uring *ring);
 
diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
index 636a4c2..ab6c86e 100644
--- a/src/include/liburing/io_uring.h
+++ b/src/include/liburing/io_uring.h
@@ -518,6 +518,10 @@ enum {
 	/* register a range of fixed file slots for automatic slot allocation */
 	IORING_REGISTER_FILE_ALLOC_RANGE	= 25,
 
+	/* set/clear busy poll settings */
+	IORING_REGISTER_NAPI			= 26,
+	IORING_UNREGISTER_NAPI			= 27,
+
 	/* this goes last */
 	IORING_REGISTER_LAST
 };
@@ -640,6 +644,14 @@ struct io_uring_buf_reg {
 	__u64	resv[3];
 };
 
+/* argument for IORING_(UN)REGISTER_NAPI */
+struct io_uring_napi {
+	__u32   busy_poll_to;
+	__u8    prefer_busy_poll;
+	__u8    pad[3];
+	__u64   resv;
+};
+
 /*
  * io_uring_restriction->opcode values
  */
diff --git a/src/liburing-ffi.map b/src/liburing-ffi.map
index cebf882..ed29000 100644
--- a/src/liburing-ffi.map
+++ b/src/liburing-ffi.map
@@ -166,6 +166,8 @@ LIBURING_2.4 {
 		io_uring_prep_recv;
 		io_uring_prep_msg_ring_cqe_flags;
 		io_uring_prep_msg_ring_fd;
+		io_uring_register_napi;
+		io_uring_unregister_napi;
 	local:
 		*;
 };
diff --git a/src/liburing.map b/src/liburing.map
index 1dbe765..013dc4a 100644
--- a/src/liburing.map
+++ b/src/liburing.map
@@ -76,4 +76,7 @@ LIBURING_2.4 {
 
 		io_uring_enable_rings;
 		io_uring_register_restrictions;
+
+		io_uring_register_napi;
+		io_uring_unregister_napi;
 } LIBURING_2.3;
diff --git a/src/register.c b/src/register.c
index ac4c9e3..966839b 100644
--- a/src/register.c
+++ b/src/register.c
@@ -342,3 +342,15 @@ int io_uring_register_file_alloc_range(struct io_uring *ring,
 				       IORING_REGISTER_FILE_ALLOC_RANGE, &range,
 				       0);
 }
+
+int io_uring_register_napi(struct io_uring *ring, struct io_uring_napi *napi)
+{
+	return __sys_io_uring_register(ring->ring_fd,
+				IORING_REGISTER_NAPI, napi, 0);
+}
+
+int io_uring_unregister_napi(struct io_uring *ring, struct io_uring_napi *napi)
+{
+	return __sys_io_uring_register(ring->ring_fd,
+				IORING_UNREGISTER_NAPI, napi, 0);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v8 2/4] liburing: add documentation for new napi busy polling
  2023-02-05 18:51 [PATCH v8 0/4] liburing: add api for napi busy poll Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 1/4] liburing: add api to set napi busy poll settings Stefan Roesch
@ 2023-02-05 18:51 ` Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 3/4] liburing: add example programs for napi busy poll Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 4/4] liburing: update changelog with new feature Stefan Roesch
  3 siblings, 0 replies; 5+ messages in thread
From: Stefan Roesch @ 2023-02-05 18:51 UTC (permalink / raw)
  To: io-uring, kernel-team; +Cc: shr, axboe, ammarfaizi2

This adds two man pages for the two new functions:
- io_uring_register_nap
- io_uring_unregister_napi

Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 man/io_uring_register_napi.3   | 40 ++++++++++++++++++++++++++++++++++
 man/io_uring_unregister_napi.3 | 27 +++++++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100644 man/io_uring_register_napi.3
 create mode 100644 man/io_uring_unregister_napi.3

diff --git a/man/io_uring_register_napi.3 b/man/io_uring_register_napi.3
new file mode 100644
index 0000000..6ce8cff
--- /dev/null
+++ b/man/io_uring_register_napi.3
@@ -0,0 +1,40 @@
+.\" Copyright (C) 2022 Stefan Roesch <shr@devkernel.io>
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_register_napi 3 "November 16, 2022" "liburing-2.4" "liburing Manual"
+.SH NAME
+io_uring_register_napi \- register NAPI busy poll settings
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_register_napi(struct io_uring *" ring ","
+.BI "                           struct io_uring_napi *" napi)
+.PP
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_register_napi (3)
+function registers the NAPI settings for subsequent operations. The NAPI
+settings are specified in the structure that is passed in the
+.I napi
+parameter. The structure consists of the napi timeout
+.I busy_poll_to
+(napi busy poll timeout in us) and
+.IR prefer_busy_poll .
+
+Registering a NAPI settings sets the mode when calling the function
+napi_busy_loop and corresponds to the SO_PREFER_BUSY_POLL socket
+option.
+
+NAPI busy poll can reduce the network roundtrip time.
+
+
+.SH RETURN VALUE
+On success
+.BR io_uring_register_napi (3)
+return 0. On failure they return
+.BR -errno .
+It also updates the napi structure with the current values.
diff --git a/man/io_uring_unregister_napi.3 b/man/io_uring_unregister_napi.3
new file mode 100644
index 0000000..f7087ef
--- /dev/null
+++ b/man/io_uring_unregister_napi.3
@@ -0,0 +1,27 @@
+.\" Copyright (C) 2022 Stefan Roesch <shr@devkernel.io>
+.\"
+.\" SPDX-License-Identifier: LGPL-2.0-or-later
+.\"
+.TH io_uring_unregister_napi 3 "November 16, 2022" "liburing-2.4" "liburing Manual"
+.SH NAME
+io_uring_unregister_napi \- unregister NAPI busy poll settings
+.SH SYNOPSIS
+.nf
+.B #include <liburing.h>
+.PP
+.BI "int io_uring_unregister_napi(struct io_uring *" ring ","
+.BI "                             struct io_uring_napi *" napi)
+.PP
+.fi
+.SH DESCRIPTION
+.PP
+The
+.BR io_uring_unregister_napi (3)
+function unregisters the NAPI busy poll settings for subsequent operations.
+
+.SH RETURN VALUE
+On success
+.BR io_uring_unregister_napi (3)
+return 0. On failure they return
+.BR -errno .
+It also updates the napi structure with the current values.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v8 3/4] liburing: add example programs for napi busy poll
  2023-02-05 18:51 [PATCH v8 0/4] liburing: add api for napi busy poll Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 1/4] liburing: add api to set napi busy poll settings Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 2/4] liburing: add documentation for new napi busy polling Stefan Roesch
@ 2023-02-05 18:51 ` Stefan Roesch
  2023-02-05 18:51 ` [PATCH v8 4/4] liburing: update changelog with new feature Stefan Roesch
  3 siblings, 0 replies; 5+ messages in thread
From: Stefan Roesch @ 2023-02-05 18:51 UTC (permalink / raw)
  To: io-uring, kernel-team; +Cc: shr, axboe, ammarfaizi2

This adds two example programs to test the napi busy poll functionality.
It consists of a client program and a server program. To get a napi id,
the client and the server program need to be run on different hosts.

To test the napi busy poll timeout, the -t needs to be specified. A
reasonable value for the busy poll timeout is 100. By specifying the
busy poll timeout on the server and the client the best results are
accomplished.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
Acked-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 .gitignore                       |   2 +
 examples/Makefile                |   2 +
 examples/napi-busy-poll-client.c | 451 +++++++++++++++++++++++++++++++
 examples/napi-busy-poll-server.c | 386 ++++++++++++++++++++++++++
 4 files changed, 841 insertions(+)
 create mode 100644 examples/napi-busy-poll-client.c
 create mode 100644 examples/napi-busy-poll-server.c

diff --git a/.gitignore b/.gitignore
index 0cd84dc..edccea9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,6 +18,8 @@
 /examples/io_uring-test
 /examples/io_uring-udp
 /examples/link-cp
+/examples/napi-busy-poll-client
+/examples/napi-busy-poll-server
 /examples/ucontext-cp
 /examples/poll-bench
 /examples/send-zerocopy
diff --git a/examples/Makefile b/examples/Makefile
index e561e05..59f1260 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -15,6 +15,8 @@ example_srcs := \
 	io_uring-test.c \
 	io_uring-udp.c \
 	link-cp.c \
+	napi-busy-poll-client.c \
+	napi-busy-poll-server.c \
 	poll-bench.c \
 	send-zerocopy.c
 
diff --git a/examples/napi-busy-poll-client.c b/examples/napi-busy-poll-client.c
new file mode 100644
index 0000000..37adaee
--- /dev/null
+++ b/examples/napi-busy-poll-client.c
@@ -0,0 +1,451 @@
+#include <ctype.h>
+#include <errno.h>
+#include <float.h>
+#include <getopt.h>
+#include <liburing.h>
+#include <math.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <time.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <netdb.h>
+#include <netinet/in.h>
+
+#define MAXBUFLEN 100
+#define PORTNOLEN 10
+#define ADDRLEN   80
+#define RINGSIZE  1024
+
+#define printable(ch) (isprint((unsigned char)ch) ? ch : '#')
+
+enum {
+	IOURING_RECV,
+	IOURING_SEND,
+	IOURING_RECVMSG,
+	IOURING_SENDMSG
+};
+
+struct ctx
+{
+	struct io_uring ring;
+	struct sockaddr_in6 saddr;
+
+	int sockfd;
+	int buffer_len;
+	int num_pings;
+	bool napi_check;
+
+	union {
+		char buffer[MAXBUFLEN];
+		struct timespec ts;
+	};
+
+	int rtt_index;
+	double *rtt;
+};
+
+struct options
+{
+	int  num_pings;
+	__u32  timeout;
+
+	bool sq_poll;
+	bool busy_loop;
+	bool prefer_busy_poll;
+
+	char port[PORTNOLEN];
+	char addr[ADDRLEN];
+};
+
+static struct option longopts[] =
+{
+	{"address"  , 1, NULL, 'a'},
+	{"busy"     , 0, NULL, 'b'},
+	{"help"     , 0, NULL, 'h'},
+	{"num_pings", 1, NULL, 'n'},
+	{"port"     , 1, NULL, 'p'},
+	{"prefer"   , 1, NULL, 'u'},
+	{"sqpoll"   , 0, NULL, 's'},
+	{"timeout"  , 1, NULL, 't'},
+	{NULL       , 0, NULL,  0 }
+};
+
+static void printUsage(const char *name)
+{
+	fprintf(stderr,
+	"Usage: %s [-l|--listen] [-a|--address ip_address] [-p|--port port-no] [-s|--sqpoll]"
+	" [-b|--busy] [-n|--num pings] [-t|--timeout busy-poll-timeout] [-u||--prefer] [-h|--help]\n"
+	"--address\n"
+	"-a        : remote or local ipv6 address\n"
+	"--busy\n"
+	"-b        : busy poll io_uring instead of blocking.\n"
+	"--num_pings\n"
+	"-n        : number of pings\n"
+	"--port\n"
+	"-p        : port\n"
+	"--sqpoll\n"
+	"-s        : Configure io_uring to use SQPOLL thread\n"
+	"--timeout\n"
+	"-t        : Configure NAPI busy poll timeoutn"
+	"--prefer\n"
+	"-u        : prefer NAPI busy poll\n"
+	"--help\n"
+	"-h        : Display this usage message\n\n",
+	name);
+}
+
+static void printError(const char *msg, int opt)
+{
+	if (msg && opt)
+		fprintf(stderr, "%s (-%c)\n", msg, printable(opt));
+}
+
+static void setProcessScheduler(void)
+{
+	struct sched_param param;
+
+	param.sched_priority = sched_get_priority_max(SCHED_FIFO);
+	if (sched_setscheduler(0, SCHED_FIFO, &param) < 0)
+		fprintf(stderr, "sched_setscheduler() failed: (%d) %s\n",
+			errno, strerror(errno));
+}
+
+static double diffTimespec(const struct timespec *time1, const struct timespec *time0)
+{
+	return (time1->tv_sec - time0->tv_sec)
+		+ (time1->tv_nsec - time0->tv_nsec) / 1000000000.0;
+}
+
+static uint64_t encodeUserData(char type, int fd)
+{
+	return (uint32_t)fd | ((uint64_t)type << 56);
+}
+
+static void decodeUserData(uint64_t data, char *type, int *fd)
+{
+	*type = data >> 56;
+	*fd   = data & 0xffffffffU;
+}
+
+static const char *opTypeToStr(char type)
+{
+	const char *res;
+
+	switch (type) {
+	case IOURING_RECV:
+		res = "IOURING_RECV";
+		break;
+	case IOURING_SEND:
+		res = "IOURING_SEND";
+		break;
+	case IOURING_RECVMSG:
+		res = "IOURING_RECVMSG";
+		break;
+	case IOURING_SENDMSG:
+		res = "IOURING_SENDMSG";
+		break;
+	default:
+		res = "Unknown";
+	}
+
+	return res;
+}
+
+static void reportNapi(struct ctx *ctx)
+{
+	unsigned int napi_id = 0;
+	socklen_t len = sizeof(napi_id);
+
+	getsockopt(ctx->sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len);
+	if (napi_id)
+		printf(" napi id: %d\n", napi_id);
+	else
+		printf(" unassigned napi id\n");
+
+	ctx->napi_check = true;
+}
+
+static void sendPing(struct ctx *ctx)
+{
+	struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring);
+
+	clock_gettime(CLOCK_REALTIME, (struct timespec *)ctx->buffer);
+
+	io_uring_prep_send(sqe, ctx->sockfd, ctx->buffer, sizeof(struct timespec), 0);
+	sqe->user_data = encodeUserData(IOURING_SEND, ctx->sockfd);
+}
+
+static void receivePing(struct ctx *ctx)
+{
+	struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring);
+
+	io_uring_prep_recv(sqe, ctx->sockfd, ctx->buffer, MAXBUFLEN, 0);
+	sqe->user_data = encodeUserData(IOURING_RECV, ctx->sockfd);
+}
+
+static void recordRTT(struct ctx *ctx)
+{
+	struct timespec startTs = ctx->ts;
+
+	// Send next ping.
+	sendPing(ctx);
+
+	// Store round-trip time.
+	ctx->rtt[ctx->rtt_index] = diffTimespec(&ctx->ts, &startTs);
+	ctx->rtt_index++;
+}
+
+static void printStats(struct ctx *ctx)
+{
+	double minRTT    = DBL_MAX;
+	double maxRTT    = 0.0;
+	double avgRTT    = 0.0;
+	double stddevRTT = 0.0;
+
+	// Calculate min, max, avg.
+	for (int i = 0; i < ctx->rtt_index; i++) {
+		if (ctx->rtt[i] < minRTT)
+			minRTT = ctx->rtt[i];
+		if (ctx->rtt[i] > maxRTT)
+			maxRTT = ctx->rtt[i];
+
+		avgRTT += ctx->rtt[i];
+	}
+	avgRTT /= ctx->rtt_index;
+
+	// Calculate stddev.
+	for (int i = 0; i < ctx->rtt_index; i++)
+		stddevRTT += fabs(ctx->rtt[i] - avgRTT);
+	stddevRTT /= ctx->rtt_index;
+
+	fprintf(stdout, " rtt(us) min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f\n",
+		minRTT * 1000000, avgRTT * 1000000, maxRTT * 1000000, stddevRTT * 1000000);
+}
+
+static int completion(struct ctx *ctx, struct io_uring_cqe *cqe)
+{
+	char type;
+	int  fd;
+	int  res = cqe->res;
+
+	decodeUserData(cqe->user_data, &type, &fd);
+	if (res < 0) {
+		fprintf(stderr, "unexpected %s failure: (%d) %s\n",
+			opTypeToStr(type), -res, strerror(-res));
+		return -1;
+	}
+
+	switch (type) {
+	case IOURING_SEND:
+		receivePing(ctx);
+		break;
+	case IOURING_RECV:
+		if (res != sizeof(struct timespec)) {
+			fprintf(stderr, "unexpected ping reply len: %d\n", res);
+			abort();
+		}
+
+		if (!ctx->napi_check) {
+			reportNapi(ctx);
+			sendPing(ctx);
+		} else {
+			recordRTT(ctx);
+		}
+
+		--ctx->num_pings;
+		break;
+
+	default:
+		fprintf(stderr, "unexpected %s completion\n",
+			opTypeToStr(type));
+		return -1;
+		break;
+	}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct ctx       ctx;
+	struct options   opt;
+	struct __kernel_timespec *tsPtr;
+	struct __kernel_timespec ts;
+	struct io_uring_params params;
+	struct io_uring_napi napi;
+	int flag;
+
+	memset(&opt, 0, sizeof(struct options));
+
+	// Process flags.
+	while ((flag = getopt_long(argc, argv, ":hsbua:n:p:t:", longopts, NULL)) != -1) {
+		switch (flag) {
+		case 'a':
+			strcpy(opt.addr, optarg);
+			break;
+		case 'b':
+			opt.busy_loop = true;
+			break;
+		case 'h':
+			printUsage(argv[0]);
+			exit(0);
+			break;
+		case 'n':
+			opt.num_pings = atoi(optarg) + 1;
+			break;
+		case 'p':
+			strcpy(opt.port, optarg);
+			break;
+		case 's':
+			opt.sq_poll = true;
+			break;
+		case 't':
+			opt.timeout = atoi(optarg);
+			break;
+		case 'u':
+			opt.prefer_busy_poll = true;
+			break;
+		case ':':
+			printError("Missing argument", optopt);
+			printUsage(argv[0]);
+			exit(-1);
+			break;
+		case '?':
+			printError("Unrecognized option", optopt);
+			printUsage(argv[0]);
+			exit(-1);
+			break;
+
+		default:
+			fprintf(stderr, "Fatal: Unexpected case in CmdLineProcessor switch()\n");
+			exit(-1);
+			break;
+		}
+	}
+
+	if (strlen(opt.addr) == 0) {
+		fprintf(stderr, "address option is mandatory\n");
+		printUsage(argv[0]);
+		exit(1);
+	}
+
+	ctx.saddr.sin6_port   = htons(atoi(opt.port));
+	ctx.saddr.sin6_family = AF_INET6;
+
+	if (inet_pton(AF_INET6, opt.addr, &ctx.saddr.sin6_addr) <= 0) {
+		fprintf(stderr, "inet_pton error for %s\n", optarg);
+		printUsage(argv[0]);
+		exit(1);
+	}
+
+	// Connect to server.
+	fprintf(stdout, "Connecting to %s... (port=%s) to send %d pings\n", opt.addr, opt.port, opt.num_pings - 1);
+
+	if ((ctx.sockfd = socket(AF_INET6, SOCK_DGRAM, 0)) < 0) {
+		fprintf(stderr, "socket() failed: (%d) %s\n", errno, strerror(errno));
+		exit(1);
+	}
+
+	if (connect(ctx.sockfd, (struct sockaddr *)&ctx.saddr, sizeof(struct sockaddr_in6)) < 0) {
+		fprintf(stderr, "connect() failed: (%d) %s\n", errno, strerror(errno));
+		exit(1);
+	}
+
+	// Setup ring.
+	memset(&params, 0, sizeof(params));
+	memset(&ts, 0, sizeof(ts));
+	memset(&napi, 0, sizeof(napi));
+
+	if (opt.sq_poll) {
+		params.flags = IORING_SETUP_SQPOLL;
+		params.sq_thread_idle = 50;
+	}
+
+	if (io_uring_queue_init_params(RINGSIZE, &ctx.ring, &params) < 0) {
+		fprintf(stderr, "io_uring_queue_init_params() failed: (%d) %s\n",
+			errno, strerror(errno));
+		exit(1);
+	}
+
+	if (opt.timeout || opt.prefer_busy_poll) {
+		napi.prefer_busy_poll = opt.prefer_busy_poll;
+		napi.busy_poll_to = opt.timeout;
+
+		io_uring_register_napi(&ctx.ring, &napi);
+	}
+
+	if (opt.busy_loop)
+		tsPtr = &ts;
+	else
+		tsPtr = NULL;
+
+	// Use realtime scheduler.
+	setProcessScheduler();
+
+	// Copy payload.
+	clock_gettime(CLOCK_REALTIME, &ctx.ts);
+
+	// Setup context.
+	ctx.napi_check = false;
+	ctx.buffer_len = sizeof(struct timespec);
+	ctx.num_pings  = opt.num_pings;
+
+	ctx.rtt_index = 0;
+	ctx.rtt = (double *)malloc(sizeof(double) * opt.num_pings);
+	if (!ctx.rtt) {
+		fprintf(stderr, "Cannot allocate results array\n");
+		exit(1);
+	}
+
+	// Send initial message to get napi id.
+	sendPing(&ctx);
+
+	while (ctx.num_pings != 0) {
+		int res;
+		unsigned num_completed = 0;
+		unsigned head;
+		struct io_uring_cqe *cqe;
+
+		do {
+			res = io_uring_submit_and_wait_timeout(&ctx.ring, &cqe, 1, tsPtr, NULL);
+		}
+		while (res < 0 && errno == ETIME);
+
+		io_uring_for_each_cqe(&ctx.ring, head, cqe) {
+			++num_completed;
+			if (completion(&ctx, cqe))
+				goto out;
+		}
+
+		if (num_completed)
+			io_uring_cq_advance(&ctx.ring, num_completed);
+	}
+
+	printStats(&ctx);
+
+out:
+	// Clean up.
+	if (opt.timeout || opt.prefer_busy_poll) {
+		io_uring_unregister_napi(&ctx.ring, &napi);
+		if (opt.timeout          != napi.busy_poll_to ||
+		    opt.prefer_busy_poll != napi.prefer_busy_poll) {
+			fprintf(stderr, "Expected busy poll to = %d, got %d\n",
+				opt.timeout, napi.busy_poll_to);
+			fprintf(stderr, "Expected prefer busy poll = %d, got %d\n",
+				opt.prefer_busy_poll, napi.prefer_busy_poll);
+		}
+	} else {
+		io_uring_unregister_napi(&ctx.ring, NULL);
+	}
+	io_uring_queue_exit(&ctx.ring);
+
+	free(ctx.rtt);
+	close(ctx.sockfd);
+
+	return 0;
+}
diff --git a/examples/napi-busy-poll-server.c b/examples/napi-busy-poll-server.c
new file mode 100644
index 0000000..ccfd91f
--- /dev/null
+++ b/examples/napi-busy-poll-server.c
@@ -0,0 +1,386 @@
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <liburing.h>
+#include <math.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <time.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <netdb.h>
+#include <netinet/in.h>
+
+#define MAXBUFLEN 100
+#define PORTNOLEN 10
+#define ADDRLEN   80
+#define RINGSIZE  1024
+
+#define printable(ch) (isprint((unsigned char)ch) ? ch : '#')
+
+enum {
+	IOURING_RECV,
+	IOURING_SEND,
+	IOURING_RECVMSG,
+	IOURING_SENDMSG
+};
+
+struct ctx
+{
+	struct io_uring     ring;
+	struct sockaddr_in6 saddr;
+	struct iovec        iov;
+	struct msghdr       msg;
+
+	int sockfd;
+	int buffer_len;
+	int num_pings;
+	bool napi_check;
+
+	union {
+		char buffer[MAXBUFLEN];
+		struct timespec ts;
+	};
+};
+
+struct options
+{
+	int  num_pings;
+	__u32 timeout;
+
+	bool listen;
+	bool sq_poll;
+	bool busy_loop;
+	bool prefer_busy_poll;
+
+	char port[PORTNOLEN];
+	char addr[ADDRLEN];
+};
+
+static struct option longopts[] =
+{
+	{"address"  , 1, NULL, 'a'},
+	{"busy"     , 0, NULL, 'b'},
+	{"help"     , 0, NULL, 'h'},
+	{"listen"   , 0, NULL, 'l'},
+	{"num_pings", 1, NULL, 'n'},
+	{"port"     , 1, NULL, 'p'},
+	{"prefer"   , 1, NULL, 'u'},
+	{"sqpoll"   , 0, NULL, 's'},
+	{"timeout"  , 1, NULL, 't'},
+	{NULL       , 0, NULL,  0 }
+};
+
+static void printUsage(const char *name)
+{
+	fprintf(stderr,
+	"Usage: %s [-l|--listen] [-a|--address ip_address] [-p|--port port-no] [-s|--sqpoll]"
+	" [-b|--busy] [-n|--num pings] [-t|--timeout busy-poll-timeout] [-u|--prefer] [-h|--help]\n"
+	" --listen\n"
+	"-l        : Server mode\n"
+	"--address\n"
+	"-a        : remote or local ipv6 address\n"
+	"--busy\n"
+	"-b        : busy poll io_uring instead of blocking.\n"
+	"--num_pings\n"
+	"-n        : number of pings\n"
+	"--port\n"
+	"-p        : port\n"
+	"--sqpoll\n"
+	"-s        : Configure io_uring to use SQPOLL thread\n"
+	"--timeout\n"
+	"-t        : Configure NAPI busy poll timeoutn"
+	"--prefer\n"
+	"-u        : prefer NAPI busy poll\n"
+	"--help\n"
+	"-h        : Display this usage message\n\n",
+	name);
+}
+
+static void printError(const char *msg, int opt)
+{
+	if (msg && opt)
+		fprintf(stderr, "%s (-%c)\n", msg, printable(opt));
+}
+
+static void setProcessScheduler(void)
+{
+	struct sched_param param;
+
+	param.sched_priority = sched_get_priority_max(SCHED_FIFO);
+	if (sched_setscheduler(0, SCHED_FIFO, &param) < 0)
+		fprintf(stderr, "sched_setscheduler() failed: (%d) %s\n",
+			errno, strerror(errno));
+}
+
+static uint64_t encodeUserData(char type, int fd)
+{
+	return (uint32_t)fd | ((__u64)type << 56);
+}
+
+static void decodeUserData(uint64_t data, char *type, int *fd)
+{
+	*type = data >> 56;
+	*fd   = data & 0xffffffffU;
+}
+
+static const char *opTypeToStr(char type)
+{
+	const char *res;
+
+	switch (type) {
+	case IOURING_RECV:
+		res = "IOURING_RECV";
+		break;
+	case IOURING_SEND:
+		res = "IOURING_SEND";
+		break;
+	case IOURING_RECVMSG:
+		res = "IOURING_RECVMSG";
+		break;
+	case IOURING_SENDMSG:
+		res = "IOURING_SENDMSG";
+		break;
+	default:
+		res = "Unknown";
+	}
+
+	return res;
+}
+
+static void reportNapi(struct ctx *ctx)
+{
+	unsigned int napi_id = 0;
+	socklen_t len = sizeof(napi_id);
+
+	getsockopt(ctx->sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len);
+	if (napi_id)
+		printf(" napi id: %d\n", napi_id);
+	else
+		printf(" unassigned napi id\n");
+
+	ctx->napi_check = true;
+}
+
+static void sendPing(struct ctx *ctx)
+{
+
+	struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring);
+
+	io_uring_prep_sendmsg(sqe, ctx->sockfd, &ctx->msg, 0);
+	sqe->user_data = encodeUserData(IOURING_SENDMSG, ctx->sockfd);
+}
+
+static void receivePing(struct ctx *ctx)
+{
+	bzero(&ctx->msg, sizeof(struct msghdr));
+	ctx->msg.msg_name    = &ctx->saddr;
+	ctx->msg.msg_namelen = sizeof(struct sockaddr_in6);
+	ctx->iov.iov_base    = ctx->buffer;
+	ctx->iov.iov_len     = MAXBUFLEN;
+	ctx->msg.msg_iov     = &ctx->iov;
+	ctx->msg.msg_iovlen  = 1;
+
+	struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring);
+	io_uring_prep_recvmsg(sqe, ctx->sockfd, &ctx->msg, 0);
+	sqe->user_data = encodeUserData(IOURING_RECVMSG, ctx->sockfd);
+}
+
+static void completion(struct ctx *ctx, struct io_uring_cqe *cqe)
+{
+	char type;
+	int  fd;
+	int  res = cqe->res;
+
+	decodeUserData(cqe->user_data, &type, &fd);
+	if (res < 0) {
+		fprintf(stderr, "unexpected %s failure: (%d) %s\n",
+			opTypeToStr(type), -res, strerror(-res));
+		abort();
+	}
+
+	switch (type) {
+	case IOURING_SENDMSG:
+		receivePing(ctx);
+		--ctx->num_pings;
+		break;
+	case IOURING_RECVMSG:
+		ctx->iov.iov_len = res;
+		sendPing(ctx);
+		if (!ctx->napi_check)
+			reportNapi(ctx);
+		break;
+	default:
+		fprintf(stderr, "unexpected %s completion\n",
+			opTypeToStr(type));
+		abort();
+		break;
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	int flag;
+	struct ctx       ctx;
+	struct options   opt;
+	struct __kernel_timespec *tsPtr;
+	struct __kernel_timespec ts;
+	struct io_uring_params params;
+	struct io_uring_napi napi;
+
+	memset(&opt, 0, sizeof(struct options));
+
+	// Process flags.
+	while ((flag = getopt_long(argc, argv, ":lhsbua:n:p:t:", longopts, NULL)) != -1) {
+		switch (flag) {
+		case 'a':
+			strcpy(opt.addr, optarg);
+			break;
+		case 'b':
+			opt.busy_loop = true;
+			break;
+		case 'h':
+			printUsage(argv[0]);
+			exit(0);
+			break;
+		case 'l':
+			opt.listen = true;
+			break;
+		case 'n':
+			opt.num_pings = atoi(optarg) + 1;
+			break;
+		case 'p':
+			strcpy(opt.port, optarg);
+			break;
+		case 's':
+			opt.sq_poll = true;
+			break;
+		case 't':
+			opt.timeout = atoi(optarg);
+			break;
+		case 'u':
+			opt.prefer_busy_poll = true;
+			break;
+		case ':':
+			printError("Missing argument", optopt);
+			printUsage(argv[0]);
+			exit(-1);
+			break;
+		case '?':
+			printError("Unrecognized option", optopt);
+			printUsage(argv[0]);
+			exit(-1);
+			break;
+
+		default:
+			fprintf(stderr, "Fatal: Unexpected case in CmdLineProcessor switch()\n");
+			exit(-1);
+			break;
+		}
+	}
+
+	if (strlen(opt.addr) == 0) {
+		fprintf(stderr, "address option is mandatory\n");
+		printUsage(argv[0]);
+		exit(1);
+	}
+
+	ctx.saddr.sin6_port   = htons(atoi(opt.port));
+	ctx.saddr.sin6_family = AF_INET6;
+
+	if (inet_pton(AF_INET6, opt.addr, &ctx.saddr.sin6_addr) <= 0) {
+		fprintf(stderr, "inet_pton error for %s\n", optarg);
+		printUsage(argv[0]);
+		exit(1);
+	}
+
+	// Connect to server.
+	fprintf(stdout, "Listening %s : %s...\n", opt.addr, opt.port);
+
+	if ((ctx.sockfd = socket(AF_INET6, SOCK_DGRAM, 0)) < 0) {
+		fprintf(stderr, "socket() failed: (%d) %s\n", errno, strerror(errno));
+		exit(1);
+	}
+
+	if (bind(ctx.sockfd, (struct sockaddr *)&ctx.saddr, sizeof(struct sockaddr_in6)) < 0) {
+		fprintf(stderr, "bind() failed: (%d) %s\n", errno, strerror(errno));
+		exit(1);
+	}
+
+	// Setup ring.
+	memset(&params, 0, sizeof(params));
+	memset(&ts, 0, sizeof(ts));
+	memset(&napi, 0, sizeof(napi));
+
+	if (opt.sq_poll) {
+		params.flags = IORING_SETUP_SQPOLL;
+		params.sq_thread_idle = 50;
+	}
+
+	if (io_uring_queue_init_params(RINGSIZE, &ctx.ring, &params) < 0) {
+		fprintf(stderr, "io_uring_queue_init_params() failed: (%d) %s\n",
+			errno, strerror(errno));
+		exit(1);
+	}
+
+	if (opt.timeout || opt.prefer_busy_poll) {
+		napi.prefer_busy_poll = opt.prefer_busy_poll;
+		napi.busy_poll_to = opt.timeout;
+
+		io_uring_register_napi(&ctx.ring, &napi);
+	}
+
+	if (opt.busy_loop)
+		tsPtr = &ts;
+	else
+		tsPtr = NULL;
+
+
+	// Use realtime scheduler.
+	setProcessScheduler();
+
+	// Copy payload.
+	clock_gettime(CLOCK_REALTIME, &ctx.ts);
+
+	// Setup context.
+	ctx.napi_check = false;
+	ctx.buffer_len = sizeof(struct timespec);
+	ctx.num_pings  = opt.num_pings;
+
+	// Receive initial message to get napi id.
+	receivePing(&ctx);
+
+	while (ctx.num_pings != 0) {
+		int res;
+		unsigned int num_completed = 0;
+		unsigned int head;
+		struct io_uring_cqe *cqe;
+
+		do {
+			res = io_uring_submit_and_wait_timeout(&ctx.ring, &cqe, 1, tsPtr, NULL);
+		}
+		while (res < 0 && errno == ETIME);
+
+		io_uring_for_each_cqe(&ctx.ring, head, cqe) {
+			++num_completed;
+			completion(&ctx, cqe);
+		}
+
+		if (num_completed) {
+			io_uring_cq_advance(&ctx.ring, num_completed);
+		}
+	}
+
+	// Clean up.
+	if (opt.timeout || opt.prefer_busy_poll)
+		io_uring_unregister_napi(&ctx.ring, &napi);
+
+	io_uring_queue_exit(&ctx.ring);
+	close(ctx.sockfd);
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v8 4/4] liburing: update changelog with new feature
  2023-02-05 18:51 [PATCH v8 0/4] liburing: add api for napi busy poll Stefan Roesch
                   ` (2 preceding siblings ...)
  2023-02-05 18:51 ` [PATCH v8 3/4] liburing: add example programs for napi busy poll Stefan Roesch
@ 2023-02-05 18:51 ` Stefan Roesch
  3 siblings, 0 replies; 5+ messages in thread
From: Stefan Roesch @ 2023-02-05 18:51 UTC (permalink / raw)
  To: io-uring, kernel-team; +Cc: shr, axboe, ammarfaizi2

Add a new entry to the changelog file for the napi busy poll feature.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
Acked-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 0722aae..7810137 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,7 @@ liburing-2.4 release
 - Add io_uring_prep_msg_ring_cqe_flags() function.
 - Deprecate --nolibc configure option.
 - CONFIG_NOLIBC is always enabled on x86-64, x86, and aarch64.
+- Support for napi busy polling
 
 liburing-2.3 release
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-02-05 18:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-05 18:51 [PATCH v8 0/4] liburing: add api for napi busy poll Stefan Roesch
2023-02-05 18:51 ` [PATCH v8 1/4] liburing: add api to set napi busy poll settings Stefan Roesch
2023-02-05 18:51 ` [PATCH v8 2/4] liburing: add documentation for new napi busy polling Stefan Roesch
2023-02-05 18:51 ` [PATCH v8 3/4] liburing: add example programs for napi busy poll Stefan Roesch
2023-02-05 18:51 ` [PATCH v8 4/4] liburing: update changelog with new feature Stefan Roesch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.