* [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk
@ 2023-02-22 13:25 Xiaoguang Wang
  2023-02-22 13:25 ` [RFC v2 1/4] bpf: add UBLK program type Xiaoguang Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-22 13:25 UTC (permalink / raw)
  To: linux-block, io-uring, bpf; +Cc: ming.lei, axboe, asml.silence, ZiyangZhang

Normally, userspace block device implementations need to copy data between
the kernel block layer's io requests and the userspace daemon. For example,
ublk and tcmu both have similar logic, and this copy consumes noticeable
cpu resources, especially for large io.

There are methods that try to reduce this cpu overhead so that a userspace
block device's io performance can be improved further. These methods
include: 1) using special hardware to do the memory copy, but not all
architectures have such hardware; 2) software methods, such as mmap-ing
the kernel block layer's io request data into the userspace daemon [1],
but that has page table map/unmap and tlb flush overhead, security issues,
etc., and is probably only friendly to large io.

To solve this problem, I'd propose a new method which combines the
respective advantages of io_uring and ebpf. Add a new program type
BPF_PROG_TYPE_UBLK for ublk; the userspace block device daemon process
registers ebpf progs, which use the bpf helper offered by the ublk bpf
prog type to submit io requests in the kernel on behalf of the daemon
process. Note that these io requests use the kernel block layer io
requests' pages to do io, so the memory copy overhead is gone.

Currently only one helper has been added:
    u64 bpf_ublk_queue_sqe(struct ublk_io_bpf_ctx *bpf_ctx,
                struct io_uring_sqe *sqe, u32 sqe_len, u32 fd)

This helper uses io_uring to submit io requests, so we need to make
io_uring able to submit an sqe located in the kernel (some of the code
idea comes from Pavel's patchset [2], but Pavel's patch requires that
sqe->buf come from a userspace addr). The bpf prog initializes sqes, but
does not need to initialize the sqes' buf field; sqe->buf will come from
kernel block layer io requests in some form. See patch 2 for more.
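
To make the shape of this concrete, here is a userspace-compilable sketch
of the sqe initialization an ublk ebpf prog for a loop target might do
before calling bpf_ublk_queue_sqe(). The struct is a trimmed-down stand-in
for the real struct io_uring_sqe, and the helper name and parameters are
illustrative; only the IORING_OP_WRITE value comes from the io_uring uapi:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trimmed-down stand-in for struct io_uring_sqe; the real layout is in
 * include/uapi/linux/io_uring.h. */
struct io_uring_sqe_min {
	uint8_t  opcode;
	int32_t  fd;
	uint64_t off;
	uint64_t addr;	/* stays 0: the kernel supplies block layer pages */
	uint32_t len;
};

#define IORING_OP_WRITE 23	/* value from the io_uring uapi enum */

/* The kind of init an ebpf prog might do before bpf_ublk_queue_sqe();
 * start_sector/nr_sectors would come from struct ublk_bpf_ctx. */
static void init_loop_sqe(struct io_uring_sqe_min *sqe, int backing_fd,
			  uint64_t start_sector, uint32_t nr_sectors)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_WRITE;
	sqe->fd = backing_fd;
	sqe->off = start_sector << 9;	/* sector to byte offset */
	sqe->len = nr_sectors << 9;
	/* sqe->addr is intentionally left 0 */
}
```

The point to note is that sqe->addr is deliberately left 0: with this
patchset the buffer is wired up in the kernel from the block layer
request's pages instead of a userspace address.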

By using ebpf, we can implement various userspace io logic in the kernel,
and the ultimate goal is to let users build an in-kernel io agent for the
userspace daemon: the userspace block device's daemon just registers an
ebpf prog at startup, though I think there'll be a long way to go. There
will be at least these advantages:
  1. Remove the memory copy between the kernel block layer and the
userspace daemon completely.
  2. Save memory. The userspace daemon doesn't need to maintain memory
to issue and complete io requests; the kernel block layer io requests'
memory is used directly.
  3. We may reduce the number of round trips between the kernel and the
userspace daemon, and so reduce kernel & userspace context switch
overheads.

HOW to test:
  git clone https://github.com/ming1/ubdsrv
  cd ubdsrv
  git am -3 0001-Add-ebpf-support.patch
  # replace "/root/ublk/" with your own linux build directory
  cd bpf; make; cd ..;
  ./build_with_liburing_src
  ./ublk add -t loop -q 1 -d 128 -f loop.file

fio job file:
  [global]
  direct=1
  filename=/dev/ublkb0
  time_based
  runtime=60
  numjobs=1
  cpus_allowed=1

  [rand-read-4k]
  bs=2048K
  iodepth=16
  ioengine=libaio
  rw=randwrite
  stonewall

Without this patch:
  READ: bw=373MiB/s (392MB/s), 373MiB/s-373MiB/s (392MB/s-392MB/s), io=21.9GiB (23.5GB), run=60042-60042msec
  WRITE: bw=371MiB/s (389MB/s), 371MiB/s-371MiB/s (389MB/s-389MB/s), io=21.8GiB (23.4GB), run=60042-60042msec
  ublk daemon's cpu utilization is about 12.5%, as shown by top.

With this patch:
  READ: bw=373MiB/s (392MB/s), 373MiB/s-373MiB/s (392MB/s-392MB/s), io=21.9GiB (23.5GB), run=60043-60043msec
  WRITE: bw=371MiB/s (389MB/s), 371MiB/s-371MiB/s (389MB/s-389MB/s), io=21.8GiB (23.4GB), run=60043-60043msec
  ublk daemon's cpu utilization is about 1%, as shown by top.

From the above tests, this method clearly reduces the cpu copy overhead.

TODO:
I must say this patchset is still just an RFC for the design.

1. Currently, this patchset only makes the ublk ebpf prog submit io requests
using io_uring in the kernel; cqe events still need to be handled in the
userspace daemon. Once we later succeed in making io_uring handle cqes in
the kernel, the ublk ebpf prog will be able to implement io entirely in
the kernel.

2. I have not done much testing yet; will run liburing/ublk/blktests later.

3. Try to build more complicated ebpf progs.

Any review and suggestions are welcome, thanks.

[1] https://lore.kernel.org/all/20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com/
[2] https://lore.kernel.org/all/cover.1621424513.git.asml.silence@gmail.com/

Xiaoguang Wang (4):
  bpf: add UBLK program type
  io_uring: enable io_uring to submit sqes located in kernel
  io_uring: introduce IORING_URING_CMD_UNLOCK flag
  ublk_drv: add ebpf support

 drivers/block/ublk_drv.c       | 284 +++++++++++++++++++++++++++++++--
 include/linux/bpf_types.h      |   2 +
 include/linux/io_uring.h       |  12 ++
 include/linux/io_uring_types.h |   8 +-
 include/uapi/linux/bpf.h       |   2 +
 include/uapi/linux/io_uring.h  |   5 +
 include/uapi/linux/ublk_cmd.h  |  18 +++
 io_uring/io_uring.c            |  59 ++++++-
 io_uring/rsrc.c                |  18 +++
 io_uring/rsrc.h                |   4 +
 io_uring/rw.c                  |   7 +
 io_uring/uring_cmd.c           |   6 +-
 kernel/bpf/syscall.c           |   1 +
 kernel/bpf/verifier.c          |  10 +-
 scripts/bpf_doc.py             |   4 +
 tools/include/uapi/linux/bpf.h |  10 ++
 tools/lib/bpf/libbpf.c         |   1 +
 17 files changed, 434 insertions(+), 17 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC v2 1/4] bpf: add UBLK program type
  2023-02-22 13:25 [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
@ 2023-02-22 13:25 ` Xiaoguang Wang
  2023-02-23  1:21   ` kernel test robot
  2023-02-22 13:25 ` [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel Xiaoguang Wang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-22 13:25 UTC (permalink / raw)
  To: linux-block, io-uring, bpf; +Cc: ming.lei, axboe, asml.silence, ZiyangZhang

Normally, userspace block device implementations need to copy data between
the kernel block layer's io requests and the userspace daemon. For example,
ublk and tcmu both have similar logic, and this copy consumes noticeable
cpu resources, especially for large io.

There are methods that try to reduce this cpu overhead so that a userspace
block device's io performance can be improved further. These methods
include: 1) using special hardware to do the memory copy, but not all
architectures have such hardware; 2) software methods, such as mmap-ing
the kernel block layer's io request data into the userspace daemon [1],
but that has page table map/unmap and tlb flush overhead, security issues,
etc., and is probably only friendly to large io.

To solve this problem, I'd propose a new method which combines the
respective advantages of io_uring and ebpf. Add a new program type
BPF_PROG_TYPE_UBLK for ublk; the userspace block device daemon process
registers ebpf progs, which use the bpf helper offered by the ublk bpf
prog type to submit io requests in the kernel on behalf of the daemon
process. Note that these io requests use the kernel block layer io
requests' pages to do io, so the memory copy overhead is gone.

[1] https://lore.kernel.org/all/20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com/

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 drivers/block/ublk_drv.c       | 23 +++++++++++++++++++++++
 include/linux/bpf_types.h      |  2 ++
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/syscall.c           |  1 +
 kernel/bpf/verifier.c          |  9 +++++++--
 tools/include/uapi/linux/bpf.h |  1 +
 tools/lib/bpf/libbpf.c         |  1 +
 7 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6368b56eacf1..b628e9eaefa6 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -43,6 +43,8 @@
 #include <asm/page.h>
 #include <linux/task_work.h>
 #include <uapi/linux/ublk_cmd.h>
+#include <linux/filter.h>
+#include <linux/bpf.h>
 
 #define UBLK_MINORS		(1U << MINORBITS)
 
@@ -187,6 +189,27 @@ static DEFINE_MUTEX(ublk_ctl_mutex);
 
 static struct miscdevice ublk_misc;
 
+static const struct bpf_func_proto *
+ublk_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return bpf_base_func_proto(func_id);
+}
+
+static bool ublk_bpf_is_valid_access(int off, int size,
+			enum bpf_access_type type,
+			const struct bpf_prog *prog,
+			struct bpf_insn_access_aux *info)
+{
+	return false;
+}
+
+const struct bpf_prog_ops bpf_ublk_prog_ops = {};
+
+const struct bpf_verifier_ops bpf_ublk_verifier_ops = {
+	.get_func_proto		= ublk_bpf_func_proto,
+	.is_valid_access	= ublk_bpf_is_valid_access,
+};
+
 static void ublk_dev_param_basic_apply(struct ublk_device *ub)
 {
 	struct request_queue *q = ub->ub_disk->queue;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index d4ee3ccd3753..4ef0bc0251b7 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -79,6 +79,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 #endif
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
+BPF_PROG_TYPE(BPF_PROG_TYPE_UBLK, bpf_ublk,
+	      void *, void *)
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 464ca3f01fe7..515b7b995b3a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -986,6 +986,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_UBLK,
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ecca9366c7a6..eb1752243f4f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2432,6 +2432,7 @@ static bool is_net_admin_prog_type(enum bpf_prog_type prog_type)
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_UBLK:
 	case BPF_PROG_TYPE_EXT: /* extends any prog */
 		return true;
 	case BPF_PROG_TYPE_CGROUP_SKB:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7ee218827259..1e5bc89aea36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12235,6 +12235,10 @@ static int check_return_code(struct bpf_verifier_env *env)
 		}
 		break;
 
+	case BPF_PROG_TYPE_UBLK:
+		range = tnum_const(0);
+		break;
+
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.
@@ -16770,8 +16774,9 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}
 
 	if (prog->aux->sleepable && prog->type != BPF_PROG_TYPE_TRACING &&
-	    prog->type != BPF_PROG_TYPE_LSM && prog->type != BPF_PROG_TYPE_KPROBE) {
-		verbose(env, "Only fentry/fexit/fmod_ret, lsm, and kprobe/uprobe programs can be sleepable\n");
+	    prog->type != BPF_PROG_TYPE_LSM && prog->type != BPF_PROG_TYPE_KPROBE &&
+	    prog->type != BPF_PROG_TYPE_UBLK) {
+		verbose(env, "Only fentry/fexit/fmod_ret, lsm, and kprobe/uprobe, ublk programs can be sleepable\n");
 		return -EINVAL;
 	}
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 464ca3f01fe7..515b7b995b3a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -986,6 +986,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_UBLK,
 };
 
 enum bpf_attach_type {
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2a82f49ce16f..891ae1830ac7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8606,6 +8606,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("cgroup/dev",		CGROUP_DEVICE, BPF_CGROUP_DEVICE, SEC_ATTACHABLE_OPT),
 	SEC_DEF("struct_ops+",		STRUCT_OPS, 0, SEC_NONE),
 	SEC_DEF("sk_lookup",		SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE),
+	SEC_DEF("ublk.s/",		UBLK, 0, SEC_SLEEPABLE),
 };
 
 static size_t custom_sec_def_cnt;
-- 
2.31.1



* [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel
  2023-02-22 13:25 [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
  2023-02-22 13:25 ` [RFC v2 1/4] bpf: add UBLK program type Xiaoguang Wang
@ 2023-02-22 13:25 ` Xiaoguang Wang
  2023-02-23  0:39   ` kernel test robot
  2023-02-23  1:31   ` kernel test robot
  2023-02-22 13:25 ` [RFC v2 3/4] io_uring: introduce IORING_URING_CMD_UNLOCK flag Xiaoguang Wang
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-22 13:25 UTC (permalink / raw)
  To: linux-block, io-uring, bpf; +Cc: ming.lei, axboe, asml.silence, ZiyangZhang

This feature can be used by userspace block devices to reduce the kernel
& userspace memory copy overhead. With it, a userspace block device
driver can submit and complete io requests using the kernel block layer
io requests' memory directly, and further, by using ebpf, we can
customize how an sqe is initialized and how io is submitted and completed.
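
As a rough userspace illustration of the control flow this patch adds to
__io_import_iovec() (names, the flag bit value, and the structs below are
stand-ins, not the kernel's), the import path now picks between two buffer
sources depending on REQ_F_ITER:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit, not the kernel's actual REQ_F_ITER value. */
#define REQ_F_ITER (1ULL << 20)

struct fixed_iter_min { uint64_t count; };

/* Stand-in for the relevant bits of struct io_kiocb. */
struct req_min {
	uint64_t flags;
	const struct fixed_iter_min *iter; /* prebuilt kernel iterator */
	uint64_t addr;                     /* sqe->addr, userspace buffer */
};

enum buf_source { BUF_FROM_KERNEL_ITER, BUF_FROM_USER_ADDR };

/* With REQ_F_ITER set, the request imports the prebuilt kernel iterator
 * (io_import_iter() in the patch); otherwise it falls back to treating
 * sqe->addr as a userspace pointer, as before. */
static enum buf_source import_buffer(const struct req_min *req)
{
	if (req->flags & REQ_F_ITER)
		return BUF_FROM_KERNEL_ITER;	/* zero copy path */
	return BUF_FROM_USER_ADDR;		/* normal path */
}
```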

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 include/linux/io_uring.h       | 12 +++++++
 include/linux/io_uring_types.h |  8 ++++-
 io_uring/io_uring.c            | 59 ++++++++++++++++++++++++++++++++--
 io_uring/rsrc.c                | 18 +++++++++++
 io_uring/rsrc.h                |  4 +++
 io_uring/rw.c                  |  7 ++++
 6 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 934e5dd4ccc0..b6816de8e31d 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -4,6 +4,7 @@
 
 #include <linux/sched.h>
 #include <linux/xarray.h>
+#include <linux/uio.h>
 #include <uapi/linux/io_uring.h>
 
 enum io_uring_cmd_flags {
@@ -36,6 +37,10 @@ struct io_uring_cmd {
 	u8		pdu[32]; /* available inline for free use */
 };
 
+struct io_fixed_iter {
+	struct iov_iter iter;
+};
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
@@ -65,6 +70,8 @@ static inline void io_uring_free(struct task_struct *tsk)
 	if (tsk->io_uring)
 		__io_uring_free(tsk);
 }
+int io_uring_submit_sqe(int fd, const struct io_uring_sqe *sqe, u32 sqe_len,
+			const struct io_fixed_iter *iter);
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd)
@@ -96,6 +103,11 @@ static inline const char *io_uring_get_opcode(u8 opcode)
 {
 	return "";
 }
+static inline int io_uring_submit_sqe(int fd, const struct io_uring_sqe *sqe,
+			u32 sqe_len, const struct io_fixed_iter *iter)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 128a67a40065..07c14854dc21 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -398,6 +398,7 @@ enum {
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_ITER_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -467,6 +468,8 @@ enum {
 	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
 	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
+	/* buffer comes from fixed iter */
+	REQ_F_ITER		= BIT(REQ_F_ITER_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
@@ -527,7 +530,7 @@ struct io_kiocb {
 	 * and after selection it points to the buffer ID itself.
 	 */
 	u16				buf_index;
-	unsigned int			flags;
+	u64				flags;
 
 	struct io_cqe			cqe;
 
@@ -540,6 +543,9 @@ struct io_kiocb {
 		/* store used ubuf, so we can prevent reloading */
 		struct io_mapped_ubuf	*imu;
 
+		/* store fixed iter */
+		const struct io_fixed_iter	*iter;
+
 		/* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
 		struct io_buffer	*kbuf;
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..880b913d6d35 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2232,7 +2232,8 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 }
 
 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
-			 const struct io_uring_sqe *sqe)
+			 const struct io_uring_sqe *sqe,
+			 const struct io_fixed_iter *iter)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_link *link = &ctx->submit_state.link;
@@ -2241,6 +2242,10 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	ret = io_init_req(ctx, req, sqe);
 	if (unlikely(ret))
 		return io_submit_fail_init(sqe, req, ret);
+	if (unlikely(iter)) {
+		req->iter = iter;
+		req->flags |= REQ_F_ITER;
+	}
 
 	/* don't need @sqe from now on */
 	trace_io_uring_submit_sqe(req, true);
@@ -2392,7 +2397,7 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		 * Continue submitting even for sqe failure if the
 		 * ring was setup with IORING_SETUP_SUBMIT_ALL
 		 */
-		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
+		if (unlikely(io_submit_sqe(ctx, req, sqe, NULL)) &&
 		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
 			left--;
 			break;
@@ -3272,6 +3277,54 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
 	return 0;
 }
 
+int io_uring_submit_sqe(int fd, const struct io_uring_sqe *sqe, u32 sqe_len,
+			const struct io_fixed_iter *iter)
+{
+	struct io_kiocb *req;
+	struct fd f;
+	int ret;
+	struct io_ring_ctx *ctx;
+
+	f = fdget(fd);
+	if (unlikely(!f.file))
+		return -EBADF;
+
+	if (unlikely(!io_is_uring_fops(f.file))) {
+		fdput(f);
+		return -EOPNOTSUPP;
+	}
+	ctx = f.file->private_data;
+
+	mutex_lock(&ctx->uring_lock);
+	ret = -EAGAIN;
+	if (unlikely(!io_alloc_req_refill(ctx)))
+		goto out;
+	req = io_alloc_req(ctx);
+	if (unlikely(!req)) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	if (!percpu_ref_tryget_many(&ctx->refs, 1)) {
+		kmem_cache_free(req_cachep, req);
+		ret = -EAGAIN;
+		goto out;
+	}
+	percpu_counter_add(&current->io_uring->inflight, 1);
+	refcount_add(1, &current->usage);
+
+	/* io_submit_sqe() returns 0 on success; ret is 1 if the sqe was submitted */
+	ret = !io_submit_sqe(ctx, req, sqe, iter);
+	mutex_unlock(&ctx->uring_lock);
+	fdput(f);
+	return ret;
+
+out:
+	mutex_unlock(&ctx->uring_lock);
+	fdput(f);
+	return ret;
+}
+EXPORT_SYMBOL(io_uring_submit_sqe);
+
 SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		u32, min_complete, u32, flags, const void __user *, argp,
 		size_t, argsz)
@@ -4270,7 +4323,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
 	BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64));
 
 	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 18de10c68a15..cf1e53ba69b7 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1380,3 +1380,21 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 
 	return 0;
 }
+
+int io_import_iter(int ddir, struct iov_iter *iter,
+		   const struct io_fixed_iter *fixed_iter,
+		   u64 offset, size_t len)
+{
+	size_t count;
+
+	if (WARN_ON_ONCE(!fixed_iter))
+		return -EFAULT;
+
+	count = iov_iter_count(&(fixed_iter->iter));
+	if (offset >= count || (offset + len) > count)
+		return -EFAULT;
+
+	*iter = fixed_iter->iter;
+	return 0;
+}
+
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 2b8743645efc..823001dbdcd0 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -69,6 +69,10 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len);
 
+int io_import_iter(int ddir, struct iov_iter *iter,
+		   const struct io_fixed_iter *fixed_iter,
+		   u64 buf_addr, size_t len);
+
 void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 9c3ddd46a1ad..74079bcd7d6c 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -378,6 +378,13 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 		return NULL;
 	}
 
+	if (unlikely(req->flags & REQ_F_ITER)) {
+		ret = io_import_iter(ddir, iter, req->iter, rw->addr, rw->len);
+		if (ret)
+			return ERR_PTR(ret);
+		return NULL;
+	}
+
 	buf = u64_to_user_ptr(rw->addr);
 	sqe_len = rw->len;
 
-- 
2.31.1



* [RFC v2 3/4] io_uring: introduce IORING_URING_CMD_UNLOCK flag
  2023-02-22 13:25 [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
  2023-02-22 13:25 ` [RFC v2 1/4] bpf: add UBLK program type Xiaoguang Wang
  2023-02-22 13:25 ` [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel Xiaoguang Wang
@ 2023-02-22 13:25 ` Xiaoguang Wang
  2023-02-22 13:25 ` [RFC v2 4/4] ublk_drv: add ebpf support Xiaoguang Wang
  2023-02-22 13:27 ` [PATCH] Add " Xiaoguang Wang
  4 siblings, 0 replies; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-22 13:25 UTC (permalink / raw)
  To: linux-block, io-uring, bpf; +Cc: ming.lei, axboe, asml.silence, ZiyangZhang

task_work_cb and its child functions may call io_uring_submit_sqe()
from io_uring_cmd's callback, so to avoid a deadlock on ctx->uring_lock,
introduce IORING_URING_CMD_UNLOCK to drop uring_lock temporarily in
io_uring_cmd_work().
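
A userspace model of the deadlock being avoided (the lock is a plain flag
here; in the kernel it is ctx->uring_lock, and the callback may end up in
io_uring_submit_sqe(), which takes that same mutex):

```c
#include <assert.h>
#include <stdbool.h>

static bool uring_locked;
static int submitted;

static bool lock_uring(void)
{
	if (uring_locked)
		return false;	/* would deadlock in the kernel */
	uring_locked = true;
	return true;
}

static void unlock_uring(void) { uring_locked = false; }

/* Stand-in for a ->task_work_cb that submits an sqe in the kernel and
 * therefore needs uring_lock itself. */
static bool task_work_cb(void)
{
	if (!lock_uring())
		return false;
	submitted++;
	unlock_uring();
	return true;
}

/* Mirrors the patched io_uring_cmd_work(): when the UNLOCK flag is set,
 * drop the lock around the callback and retake it afterwards. */
static bool cmd_work(bool unlock_flag)
{
	bool ok;

	if (!lock_uring())
		return false;
	if (unlock_flag)
		unlock_uring();
	ok = task_work_cb();
	if (unlock_flag && !lock_uring())
		return false;
	unlock_uring();
	return ok;
}
```

Without the flag, the callback finds the lock already held by its own
caller and can make no progress; with it, the submit path succeeds.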

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 include/uapi/linux/io_uring.h | 5 +++++
 io_uring/uring_cmd.c          | 6 +++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2780bce62faf..45ea8c35d251 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -232,8 +232,13 @@ enum io_uring_op {
  * sqe->uring_cmd_flags
  * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
  *				along with setting sqe->buf_index.
+ *
+ * IORING_URING_CMD_UNLOCK	Notify io_uring_cmd's task_work_cb to
+ *				unlock uring_lock, some ->uring_cmd()
+ *				implementations need it.
  */
 #define IORING_URING_CMD_FIXED	(1U << 0)
+#define IORING_URING_CMD_UNLOCK	(1U << 1)
 
 
 /*
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 446a189b78b0..11488a702832 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -16,7 +16,11 @@ static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
 
+	if ((ioucmd->flags & IORING_URING_CMD_UNLOCK) && *locked)
+		mutex_unlock(&(req->ctx->uring_lock));
 	ioucmd->task_work_cb(ioucmd);
+	if ((ioucmd->flags & IORING_URING_CMD_UNLOCK) && *locked)
+		mutex_lock(&(req->ctx->uring_lock));
 }
 
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
@@ -82,7 +86,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EINVAL;
 
 	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
-	if (ioucmd->flags & ~IORING_URING_CMD_FIXED)
+	if (ioucmd->flags & ~(IORING_URING_CMD_FIXED | IORING_URING_CMD_UNLOCK))
 		return -EINVAL;
 
 	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
-- 
2.31.1



* [RFC v2 4/4] ublk_drv: add ebpf support
  2023-02-22 13:25 [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
                   ` (2 preceding siblings ...)
  2023-02-22 13:25 ` [RFC v2 3/4] io_uring: introduce IORING_URING_CMD_UNLOCK flag Xiaoguang Wang
@ 2023-02-22 13:25 ` Xiaoguang Wang
  2023-02-22 19:25   ` Alexei Starovoitov
  2023-02-23  0:59   ` kernel test robot
  2023-02-22 13:27 ` [PATCH] Add " Xiaoguang Wang
  4 siblings, 2 replies; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-22 13:25 UTC (permalink / raw)
  To: linux-block, io-uring, bpf; +Cc: ming.lei, axboe, asml.silence, ZiyangZhang

Currently only one ebpf helper, bpf_ublk_queue_sqe(), is added; a ublksrv
target can use this helper to write an ebpf prog to support ublk kernel &
userspace zero copy. Please see the ublksrv test code for more info.
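
For the split-bio handling in ublk_init_uring_fixed_iter() below, here is
a userspace-compilable sketch (with minimal stand-in types, not the
kernel's struct bio_vec / iov_iter) of why the constructed iterator must
start bi_bvec_done bytes into the first bvec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for struct bio_vec and iov_iter. */
struct bvec_min {
	uint8_t *base;
	size_t   len;
};

struct iter_min {
	const struct bvec_min *bvec;
	size_t iov_offset;	/* mirrors iov_iter.iov_offset */
	size_t count;
};

/* When a bio was split, bio->bi_iter.bi_bvec_done bytes of the first
 * bvec are already consumed, so the iterator must not start at byte 0
 * of that bvec; this mirrors the iov_offset assignment in
 * ublk_init_uring_fixed_iter(). */
static void iter_init(struct iter_min *it, const struct bvec_min *bv,
		      size_t bvec_done, size_t count)
{
	it->bvec = bv;
	it->iov_offset = bvec_done;
	it->count = count;
}

static uint8_t iter_first_byte(const struct iter_min *it)
{
	return it->bvec->base[it->iov_offset];
}
```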

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 drivers/block/ublk_drv.c       | 263 +++++++++++++++++++++++++++++++--
 include/uapi/linux/bpf.h       |   1 +
 include/uapi/linux/ublk_cmd.h  |  18 +++
 kernel/bpf/verifier.c          |   3 +-
 scripts/bpf_doc.py             |   4 +
 tools/include/uapi/linux/bpf.h |   9 ++
 6 files changed, 286 insertions(+), 12 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index b628e9eaefa6..d17ddb6fc27f 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -105,6 +105,12 @@ struct ublk_uring_cmd_pdu {
  */
 #define UBLK_IO_FLAG_NEED_GET_DATA 0x08
 
+/*
+ * UBLK_IO_FLAG_BPF is set if the IO command has been handled by the ebpf
+ * prog instead of the userspace daemon.
+ */
+#define UBLK_IO_FLAG_BPF	0x10
+
 struct ublk_io {
 	/* userspace buffer address from io cmd */
 	__u64	addr;
@@ -114,6 +120,11 @@ struct ublk_io {
 	struct io_uring_cmd *cmd;
 };
 
+struct ublk_req_iter {
+	struct io_fixed_iter fixed_iter;
+	struct bio_vec *bvec;
+};
+
 struct ublk_queue {
 	int q_id;
 	int q_depth;
@@ -163,6 +174,9 @@ struct ublk_device {
 	unsigned int		nr_queues_ready;
 	atomic_t		nr_aborted_queues;
 
+	struct bpf_prog		*io_bpf_prog;
+	struct ublk_req_iter	*iter_table;
+
 	/*
 	 * Our ubq->daemon may be killed without any notification, so
 	 * monitor each queue's daemon periodically
@@ -189,10 +203,48 @@ static DEFINE_MUTEX(ublk_ctl_mutex);
 
 static struct miscdevice ublk_misc;
 
+struct ublk_io_bpf_ctx {
+	struct ublk_bpf_ctx ctx;
+	struct ublk_device *ub;
+};
+
+static inline struct ublk_req_iter *ublk_get_req_iter(struct ublk_device *ub,
+					int qid, int tag)
+{
+	return &(ub->iter_table[qid * ub->dev_info.queue_depth + tag]);
+}
+
+BPF_CALL_4(bpf_ublk_queue_sqe, struct ublk_io_bpf_ctx *, bpf_ctx,
+	   struct io_uring_sqe *, sqe, u32, sqe_len, u32, fd)
+{
+	struct ublk_req_iter *req_iter;
+	u16 q_id = bpf_ctx->ctx.q_id;
+	u16 tag = bpf_ctx->ctx.tag;
+
+	req_iter = ublk_get_req_iter(bpf_ctx->ub, q_id, tag);
+	io_uring_submit_sqe(fd, sqe, sqe_len, &(req_iter->fixed_iter));
+	return 0;
+}
+
+const struct bpf_func_proto ublk_bpf_queue_sqe_proto = {
+	.func = bpf_ublk_queue_sqe,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_ANYTHING,
+	.arg2_type = ARG_ANYTHING,
+	.arg3_type = ARG_ANYTHING,
+	.arg4_type = ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 ublk_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	switch (func_id) {
+	case BPF_FUNC_ublk_queue_sqe:
+		return &ublk_bpf_queue_sqe_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
 }
 
 static bool ublk_bpf_is_valid_access(int off, int size,
@@ -200,6 +252,23 @@ static bool ublk_bpf_is_valid_access(int off, int size,
 			const struct bpf_prog *prog,
 			struct bpf_insn_access_aux *info)
 {
+	if (off < 0 || off >= sizeof(struct ublk_bpf_ctx))
+		return false;
+	if (off % size != 0)
+		return false;
+
+	switch (off) {
+	case offsetof(struct ublk_bpf_ctx, q_id):
+		return size == sizeof_field(struct ublk_bpf_ctx, q_id);
+	case offsetof(struct ublk_bpf_ctx, tag):
+		return size == sizeof_field(struct ublk_bpf_ctx, tag);
+	case offsetof(struct ublk_bpf_ctx, op):
+		return size == sizeof_field(struct ublk_bpf_ctx, op);
+	case offsetof(struct ublk_bpf_ctx, nr_sectors):
+		return size == sizeof_field(struct ublk_bpf_ctx, nr_sectors);
+	case offsetof(struct ublk_bpf_ctx, start_sector):
+		return size == sizeof_field(struct ublk_bpf_ctx, start_sector);
+	}
 	return false;
 }
 
@@ -324,7 +393,7 @@ static void ublk_put_device(struct ublk_device *ub)
 static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev,
 		int qid)
 {
-       return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
+	return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
 }
 
 static inline bool ublk_rq_has_data(const struct request *rq)
@@ -618,7 +687,6 @@ static void ublk_complete_rq(struct request *req)
 {
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 	struct ublk_io *io = &ubq->ios[req->tag];
-	unsigned int unmapped_bytes;
 
 	/* failed read IO if nothing is read */
 	if (!io->res && req_op(req) == REQ_OP_READ)
@@ -641,15 +709,19 @@ static void ublk_complete_rq(struct request *req)
 	}
 
 	/* for READ request, writing data in iod->addr to rq buffers */
-	unmapped_bytes = ublk_unmap_io(ubq, req, io);
+	if (likely(!(io->flags & UBLK_IO_FLAG_BPF))) {
+		unsigned int unmapped_bytes;
 
-	/*
-	 * Extremely impossible since we got data filled in just before
-	 *
-	 * Re-read simply for this unlikely case.
-	 */
-	if (unlikely(unmapped_bytes < io->res))
-		io->res = unmapped_bytes;
+		unmapped_bytes = ublk_unmap_io(ubq, req, io);
+
+		/*
+		 * Extremely impossible since we got data filled in just before
+		 *
+		 * Re-read simply for this unlikely case.
+		 */
+		if (unlikely(unmapped_bytes < io->res))
+			io->res = unmapped_bytes;
+	}
 
 	if (blk_update_request(req, BLK_STS_OK, io->res))
 		blk_mq_requeue_request(req, true);
@@ -708,12 +780,92 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 	mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0);
 }
 
+static int ublk_init_uring_fixed_iter(struct ublk_queue *ubq, struct request *rq)
+{
+	struct ublk_device *ub = ubq->dev;
+	struct bio *bio = rq->bio;
+	struct bio_vec *bvec;
+	struct req_iterator rq_iter;
+	struct bio_vec tmp;
+	int nr_bvec = 0;
+	struct ublk_req_iter *req_iter;
+	unsigned int rw, offset;
+
+	req_iter = ublk_get_req_iter(ub, ubq->q_id, rq->tag);
+	if (req_op(rq) == REQ_OP_READ)
+		rw = ITER_DEST;
+	else
+		rw = ITER_SOURCE;
+
+	rq_for_each_bvec(tmp, rq, rq_iter)
+		nr_bvec++;
+	if (rq->bio != rq->biotail) {
+		bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec), GFP_NOIO);
+		if (!bvec)
+			return -ENOMEM;
+		req_iter->bvec = bvec;
+
+		/*
+		 * The bios of the request may be started from the middle of
+		 * the 'bvec' because of bio splitting, so we can't directly
+		 * copy bio->bi_iov_vec to new bvec. The rq_for_each_bvec
+		 * API will take care of all details for us.
+		 */
+		rq_for_each_bvec(tmp, rq, rq_iter) {
+			*bvec = tmp;
+			bvec++;
+		}
+		bvec = req_iter->bvec;
+		offset = 0;
+	} else {
+		/*
+		 * Same here, this bio may be started from the middle of the
+		 * 'bvec' because of bio splitting, so offset from the bvec
+		 * must be passed to iov iterator
+		 */
+		offset = bio->bi_iter.bi_bvec_done;
+		bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+		req_iter->bvec = NULL;
+	}
+
+	iov_iter_bvec(&(req_iter->fixed_iter.iter), rw, bvec, nr_bvec, blk_rq_bytes(rq));
+	req_iter->fixed_iter.iter.iov_offset = offset;
+	return 0;
+}
+
+static int ublk_run_bpf_prog(struct ublk_queue *ubq, struct request *rq)
+{
+	int ret;
+	struct ublk_device *ub = ubq->dev;
+	struct ublk_io_bpf_ctx bpf_ctx;
+	u32 bpf_act;
+
+	if (!ub->io_bpf_prog)
+		return 0;
+
+	ret = ublk_init_uring_fixed_iter(ubq, rq);
+	if (ret < 0)
+		return UBLK_BPF_IO_ABORTED;
+
+	bpf_ctx.ub = ub;
+	bpf_ctx.ctx.q_id = ubq->q_id;
+	bpf_ctx.ctx.tag = rq->tag;
+	bpf_ctx.ctx.op = req_op(rq);
+	bpf_ctx.ctx.nr_sectors = blk_rq_sectors(rq);
+	bpf_ctx.ctx.start_sector = blk_rq_pos(rq);
+	bpf_act = bpf_prog_run_pin_on_cpu(ub->io_bpf_prog, &bpf_ctx);
+	return bpf_act;
+}
+
 static inline void __ublk_rq_task_work(struct request *req)
 {
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
+	struct ublk_device *ub = ubq->dev;
 	int tag = req->tag;
 	struct ublk_io *io = &ubq->ios[tag];
 	unsigned int mapped_bytes;
+	u32 bpf_act;
+	bool io_done = false;
 
 	pr_devel("%s: complete: op %d, qid %d tag %d io_flags %x addr %llx\n",
 			__func__, io->cmd->cmd_op, ubq->q_id, req->tag, io->flags,
@@ -762,6 +914,10 @@ static inline void __ublk_rq_task_work(struct request *req)
 				ublk_get_iod(ubq, req->tag)->addr);
 	}
 
+	if (unlikely(ub->io_bpf_prog))
+		goto call_ebpf;
+
+normal_path:
 	mapped_bytes = ublk_map_io(ubq, req, io);
 
 	/* partially mapped, update io descriptor */
@@ -784,7 +940,21 @@ static inline void __ublk_rq_task_work(struct request *req)
 			mapped_bytes >> 9;
 	}
 
+	if (!io_done)
+		ubq_complete_io_cmd(io, UBLK_IO_RES_OK);
+	return;
+
+call_ebpf:
 	ubq_complete_io_cmd(io, UBLK_IO_RES_OK);
+	bpf_act = ublk_run_bpf_prog(ubq, req);
+	switch (bpf_act) {
+	case UBLK_BPF_IO_ABORTED:
+	case UBLK_BPF_IO_DROP:
+	case UBLK_BPF_IO_PASS:
+		io_done = true;
+		goto normal_path;
+	}
+	io->flags |= UBLK_IO_FLAG_BPF;
 }
 
 static inline void ublk_forward_io_cmds(struct ublk_queue *ubq)
@@ -1231,6 +1401,10 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 			__func__, cmd->cmd_op, ub_cmd->q_id, tag,
 			ub_cmd->result);
 
+	/* Work around task_work_add() not being exported. */
+	if (unlikely(ub->io_bpf_prog && !(cmd->flags & IORING_URING_CMD_UNLOCK)))
+		goto out;
+
 	if (!(issue_flags & IO_URING_F_SQE128))
 		goto out;
 
@@ -1295,6 +1469,14 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		io->flags |= UBLK_IO_FLAG_ACTIVE;
 		io->cmd = cmd;
 		ublk_commit_completion(ub, ub_cmd);
+		if (io->flags & UBLK_IO_FLAG_BPF) {
+			struct ublk_req_iter *req_iter;
+
+			req_iter = ublk_get_req_iter(ub, ubq->q_id, tag);
+			io->flags &= ~UBLK_IO_FLAG_BPF;
+			kfree(req_iter->bvec);
+			req_iter->bvec = NULL;
+		}
 		break;
 	case UBLK_IO_NEED_GET_DATA:
 		if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
@@ -2009,6 +2191,59 @@ static int ublk_ctrl_end_recovery(struct io_uring_cmd *cmd)
 	return ret;
 }
 
+static int ublk_ctrl_reg_bpf_prog(struct io_uring_cmd *cmd)
+{
+	struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
+	struct ublk_device *ub;
+	struct bpf_prog *prog;
+	int ret = 0, nr_queues, depth;
+
+	ub = ublk_get_device_from_id(header->dev_id);
+	if (!ub)
+		return -EINVAL;
+
+	mutex_lock(&ub->mutex);
+	nr_queues = ub->dev_info.nr_hw_queues;
+	depth = ub->dev_info.queue_depth;
+	ub->iter_table = kzalloc(sizeof(struct ublk_req_iter) * depth * nr_queues,
+				 GFP_KERNEL);
+	if (!ub->iter_table) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	prog = bpf_prog_get_type(header->data[0], BPF_PROG_TYPE_UBLK);
+	if (IS_ERR(prog)) {
+		kfree(ub->iter_table);
+		ret = PTR_ERR(prog);
+		goto out_unlock;
+	}
+	ub->io_bpf_prog = prog;
+
+out_unlock:
+	mutex_unlock(&ub->mutex);
+	ublk_put_device(ub);
+	return ret;
+}
+
+static int ublk_ctrl_unreg_bpf_prog(struct io_uring_cmd *cmd)
+{
+	struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
+	struct ublk_device *ub;
+
+	ub = ublk_get_device_from_id(header->dev_id);
+	if (!ub)
+		return -EINVAL;
+
+	mutex_lock(&ub->mutex);
+	bpf_prog_put(ub->io_bpf_prog);
+	ub->io_bpf_prog = NULL;
+	kfree(ub->iter_table);
+	ub->iter_table = NULL;
+	mutex_unlock(&ub->mutex);
+	ublk_put_device(ub);
+	return 0;
+}
+
 static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
 		unsigned int issue_flags)
 {
@@ -2059,6 +2294,12 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
 	case UBLK_CMD_END_USER_RECOVERY:
 		ret = ublk_ctrl_end_recovery(cmd);
 		break;
+	case UBLK_CMD_REG_BPF_PROG:
+		ret = ublk_ctrl_reg_bpf_prog(cmd);
+		break;
+	case UBLK_CMD_UNREG_BPF_PROG:
+		ret = ublk_ctrl_unreg_bpf_prog(cmd);
+		break;
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 515b7b995b3a..578d65e9f30e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5699,6 +5699,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(ublk_queue_sqe, 212, ##ctx)			\
 	/* */
 
 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 8f88e3a29998..fbfe5145221e 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -17,6 +17,8 @@
 #define	UBLK_CMD_STOP_DEV	0x07
 #define	UBLK_CMD_SET_PARAMS	0x08
 #define	UBLK_CMD_GET_PARAMS	0x09
+#define UBLK_CMD_REG_BPF_PROG		0x0a
+#define UBLK_CMD_UNREG_BPF_PROG		0x0b
 #define	UBLK_CMD_START_USER_RECOVERY	0x10
 #define	UBLK_CMD_END_USER_RECOVERY	0x11
 /*
@@ -230,4 +232,20 @@ struct ublk_params {
 	struct ublk_param_discard	discard;
 };
 
+struct ublk_bpf_ctx {
+	__u32   t_val;
+	__u16   q_id;
+	__u16   tag;
+	__u8    op;
+	__u32   nr_sectors;
+	__u64   start_sector;
+};
+
+enum {
+	UBLK_BPF_IO_ABORTED = 0,
+	UBLK_BPF_IO_DROP,
+	UBLK_BPF_IO_PASS,
+	UBLK_BPF_IO_REDIRECT,
+};
+
 #endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1e5bc89aea36..b1645a3d93a2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -24,6 +24,7 @@
 #include <linux/bpf_lsm.h>
 #include <linux/btf_ids.h>
 #include <linux/poison.h>
+#include <linux/ublk_cmd.h>
 
 #include "disasm.h"
 
@@ -12236,7 +12237,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 		break;
 
 	case BPF_PROG_TYPE_UBLK:
-		range = tnum_const(0);
+		range = tnum_range(UBLK_BPF_IO_ABORTED, UBLK_BPF_IO_REDIRECT);
 		break;
 
 	case BPF_PROG_TYPE_EXT:
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index e8d90829f23e..f8672294e145 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -700,6 +700,8 @@ class PrinterHelpers(Printer):
             'struct bpf_dynptr',
             'struct iphdr',
             'struct ipv6hdr',
+            'struct ublk_io_bpf_ctx',
+            'struct io_uring_sqe',
     ]
     known_types = {
             '...',
@@ -755,6 +757,8 @@ class PrinterHelpers(Printer):
             'const struct bpf_dynptr',
             'struct iphdr',
             'struct ipv6hdr',
+            'struct ublk_io_bpf_ctx',
+            'struct io_uring_sqe',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 515b7b995b3a..e3a81e576ec1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5485,6 +5485,14 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ *
+ * u64 bpf_ublk_queue_sqe(struct ublk_io_bpf_ctx *ctx, struct io_uring_sqe *sqe, u32 offset, u32 len)
+ *	Description
+ *		Submit ublk io requests.
+ *	Return
+ *		0 on success.
+ *
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -5699,6 +5707,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(ublk_queue_sqe, 212, ##ctx)			\
 	/* */
 
 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] Add ebpf support.
  2023-02-22 13:25 [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
                   ` (3 preceding siblings ...)
  2023-02-22 13:25 ` [RFC v2 4/4] ublk_drv: add ebpf support Xiaoguang Wang
@ 2023-02-22 13:27 ` Xiaoguang Wang
  4 siblings, 0 replies; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-22 13:27 UTC (permalink / raw)
  To: linux-block, io-uring, bpf; +Cc: ming.lei, axboe, asml.silence, ZiyangZhang

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 Makefile.am            |   2 +-
 bpf/Makefile           |  48 ++++++++++++++++++++
 bpf/ublk.bpf.c         | 101 +++++++++++++++++++++++++++++++++++++++++
 bpf/ublk.c             |  56 +++++++++++++++++++++++
 include/ublk_cmd.h     |   2 +
 include/ublksrv.h      |   8 ++++
 include/ublksrv_priv.h |   1 +
 include/ublksrv_tgt.h  |   1 +
 lib/ublksrv.c          |   6 ++-
 lib/ublksrv_cmd.c      |  19 ++++++++
 tgt_loop.cpp           |  31 ++++++++++++-
 ublksrv_tgt.cpp        |  32 +++++++++++++
 12 files changed, 304 insertions(+), 3 deletions(-)
 create mode 100644 bpf/Makefile
 create mode 100644 bpf/ublk.bpf.c
 create mode 100644 bpf/ublk.c

diff --git a/Makefile.am b/Makefile.am
index a340bed..04ecbab 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -9,7 +9,7 @@ EXTRA_DIST = \
 
 SUBDIRS = include lib tests
 
-AM_CXXFLAGS = -fcoroutines -std=c++20
+AM_CXXFLAGS = -fcoroutines -std=c++20 /root/ublk/tools/bpf/bpftool/bootstrap/libbpf/libbpf.a -lelf -lz
 
 sbin_PROGRAMS = ublk ublk_user_id
 noinst_PROGRAMS = demo_null demo_event
diff --git a/bpf/Makefile b/bpf/Makefile
new file mode 100644
index 0000000..6f6ad6c
--- /dev/null
+++ b/bpf/Makefile
@@ -0,0 +1,48 @@
+out_dir := .tmp
+CLANG ?= clang
+LLVM_STRIP ?= llvm-strip
+BPFTOOL ?= /root/ublk/tools/bpf/bpftool/bpftool
+INCLUDES := -I$(out_dir)
+CFLAGS := -g -O2 -Wall
+ARCH := $(shell uname -m | sed 's/x86_64/x86/')
+# LIBBPF := <linux-tree>/tools/lib/bpf/libbpf.a
+LIBBPF := /root/ublk/tools/bpf/bpftool/bootstrap/libbpf/libbpf.a
+
+targets = ublk
+
+all: $(targets)
+
+$(targets): %: $(out_dir)/%.o | $(out_dir) libbpf_target
+	$(QUIET_CC)$(CC) $(CFLAGS) $^ -lelf -lz  $(LIBBPF) -o $@
+
+$(patsubst %,$(out_dir)/%.o,$(targets)): %.o: %.skel.h
+
+$(out_dir)/%.o: %.c $(wildcard %.h) | $(out_dir)
+	$(QUIET_CC)$(CC) $(CFLAGS) $(INCLUDES) -c $(filter %.c,$^) -o $@
+
+$(out_dir)/%.skel.h: $(out_dir)/%.bpf.o | $(out_dir)
+	$(BPFTOOL) gen skeleton $< > $@
+
+$(out_dir)/%.bpf.o: %.bpf.c $(wildcard %.h) vmlinux | $(out_dir)
+	$(QUIET_CC)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH)	\
+		     $(INCLUDES) -c $(filter %.c,$^) -o $@ &&		\
+	$(LLVM_STRIP) -g $@
+
+$(out_dir):
+	mkdir -p $@
+
+vmlinux:
+ifeq (,$(wildcard ./vmlinux.h))
+	$(BPFTOOL) btf dump file /sys/kernel/btf/vmlinux format c > ./vmlinux.h
+endif
+
+libbpf_target:
+ifndef LIBBPF
+	$(error LIBBPF is undefined)
+endif
+	@
+
+clean:
+	$(Q)rm -rf $(out_dir) $(targets) ./vmlinux.h
+
+.PHONY: all clean vmlinux libbpf_target
diff --git a/bpf/ublk.bpf.c b/bpf/ublk.bpf.c
new file mode 100644
index 0000000..45eb8db
--- /dev/null
+++ b/bpf/ublk.bpf.c
@@ -0,0 +1,101 @@
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+
+static long (*bpf_ublk_queue_sqe)(void *ctx, struct io_uring_sqe *sqe,
+		u32 sqe_len, u32 fd) = (void *) 212;
+
+int target_fd = -1;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 128);
+	__type(key, int);
+	__type(value, int);
+} uring_fd_map SEC(".maps");
+
+static inline void io_uring_prep_rw(__u8 op, struct io_uring_sqe *sqe, int fd,
+				    const void *addr, unsigned int len,
+				    __u64 offset)
+{
+	sqe->opcode = op;
+	sqe->flags = 0;
+	sqe->ioprio = 0;
+	sqe->fd = fd;
+	sqe->off = offset;
+	sqe->addr = (unsigned long) addr;
+	sqe->len = len;
+	sqe->fsync_flags = 0;
+	sqe->buf_index = 0;
+	sqe->personality = 0;
+	sqe->splice_fd_in = 0;
+	sqe->addr3 = 0;
+	sqe->__pad2[0] = 0;
+}
+
+static inline void io_uring_prep_nop(struct io_uring_sqe *sqe)
+{
+	io_uring_prep_rw(IORING_OP_NOP, sqe, -1, 0, 0, 0);
+}
+
+static inline void io_uring_prep_read(struct io_uring_sqe *sqe, int fd,
+			void *buf, unsigned int nbytes, off_t offset)
+{
+	io_uring_prep_rw(IORING_OP_READ, sqe, fd, buf, nbytes, offset);
+}
+
+static inline void io_uring_prep_write(struct io_uring_sqe *sqe, int fd,
+	const void *buf, unsigned int nbytes, off_t offset)
+{
+	io_uring_prep_rw(IORING_OP_WRITE, sqe, fd, buf, nbytes, offset);
+}
+
+static inline __u64 build_user_data(unsigned int tag, unsigned int op,
+			unsigned int tgt_data, unsigned int is_target_io,
+			unsigned int is_bpf_io)
+{
+	return tag | (op << 16) | (tgt_data << 24) | (__u64)is_target_io << 63 |
+		(__u64)is_bpf_io << 60;
+}
+
+SEC("ublk.s/")
+int ublk_io_bpf_prog(struct ublk_bpf_ctx *ctx)
+{
+	struct io_uring_sqe *sqe;
+	char sqe_data[128] = {0};
+	int q_id = ctx->q_id;
+	u8 op;
+	u32 nr_sectors = ctx->nr_sectors;
+	u64 start_sector = ctx->start_sector;
+	int *ring_fd;
+
+	ring_fd = bpf_map_lookup_elem(&uring_fd_map, &q_id);
+	if (!ring_fd)
+		return UBLK_BPF_IO_PASS;
+
+	bpf_probe_read_kernel(&op, 1, &ctx->op);
+	sqe = (struct io_uring_sqe *)sqe_data;
+	if (op == REQ_OP_READ) {
+		char fmt[] = "sqe for REQ_OP_READ is issued\n";
+
+		bpf_trace_printk(fmt, sizeof(fmt));
+		io_uring_prep_read(sqe, target_fd, 0, nr_sectors << 9,
+				   start_sector << 9);
+		sqe->user_data = build_user_data(ctx->tag, op, 0, 1, 1);
+		bpf_ublk_queue_sqe(ctx, sqe, 128, *ring_fd);
+	} else if (op == REQ_OP_WRITE) {
+		char fmt[] = "sqe for REQ_OP_WRITE is issued\n";
+
+		bpf_trace_printk(fmt, sizeof(fmt));
+		io_uring_prep_write(sqe, target_fd, 0, nr_sectors << 9,
+				    start_sector << 9);
+		sqe->user_data = build_user_data(ctx->tag, op, 0, 1, 1);
+		bpf_ublk_queue_sqe(ctx, sqe, 128, *ring_fd);
+	} else {
+		return UBLK_BPF_IO_PASS;
+	}
+	return UBLK_BPF_IO_REDIRECT;
+}
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/bpf/ublk.c b/bpf/ublk.c
new file mode 100644
index 0000000..296005d
--- /dev/null
+++ b/bpf/ublk.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include <argp.h>
+#include <assert.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/time.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "ublk.skel.h"
+
+static void ublk_ebp_prep(struct ublk_bpf **pobj)
+{
+	struct ublk_bpf *obj;
+	int ret, prog_fds;
+
+	obj = ublk_bpf__open();
+	if (!obj) {
+		fprintf(stderr, "failed to open and/or load BPF object\n");
+		exit(1);
+	}
+	ret = ublk_bpf__load(obj);
+	if (ret) {
+		fprintf(stderr, "failed to load BPF object: %d\n", ret);
+		exit(1);
+	}
+
+	prog_fds = bpf_program__fd(obj->progs.ublk_io_bpf_prog);
+	*pobj = obj;
+
+	ret = bpf_map__set_max_entries(obj->maps.uring_fd_map, 16);
+
+	printf("prog_fds: %d\n", prog_fds);
+}
+
+static int ublk_ebpf_test(void)
+{
+	struct ublk_bpf *obj;
+
+	ublk_ebp_prep(&obj);
+	sleep(5);
+	ublk_bpf__destroy(obj);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	fprintf(stderr, "test1() ============\n");
+	ublk_ebpf_test();
+
+	return 0;
+}
diff --git a/include/ublk_cmd.h b/include/ublk_cmd.h
index f6238cc..893ba8c 100644
--- a/include/ublk_cmd.h
+++ b/include/ublk_cmd.h
@@ -17,6 +17,8 @@
 #define	UBLK_CMD_STOP_DEV	0x07
 #define	UBLK_CMD_SET_PARAMS	0x08
 #define	UBLK_CMD_GET_PARAMS	0x09
+#define UBLK_CMD_REG_BPF_PROG		0x0a
+#define UBLK_CMD_UNREG_BPF_PROG		0x0b
 #define	UBLK_CMD_START_USER_RECOVERY	0x10
 #define	UBLK_CMD_END_USER_RECOVERY	0x11
 #define	UBLK_CMD_GET_DEV_INFO2		0x12
diff --git a/include/ublksrv.h b/include/ublksrv.h
index d38bd46..800a6a0 100644
--- a/include/ublksrv.h
+++ b/include/ublksrv.h
@@ -106,6 +106,7 @@ struct ublksrv_tgt_info {
 	unsigned int nr_fds;
 	int fds[UBLKSRV_TGT_MAX_FDS];
 	void *tgt_data;
+	void *tgt_bpf_obj;
 
 	/*
 	 * Extra IO slots for each queue, target code can reserve some
@@ -263,6 +264,8 @@ struct ublksrv_tgt_type {
 	int (*init_queue)(const struct ublksrv_queue *, void **queue_data_ptr);
 	void (*deinit_queue)(const struct ublksrv_queue *);
 
+	int (*init_queue_bpf)(const struct ublksrv_dev *dev, const struct ublksrv_queue *q);
+
 	unsigned long reserved[5];
 };
 
@@ -318,6 +321,11 @@ extern void ublksrv_ctrl_prep_recovery(struct ublksrv_ctrl_dev *dev,
 		const char *recovery_jbuf);
 extern const char *ublksrv_ctrl_get_recovery_jbuf(const struct ublksrv_ctrl_dev *dev);
 
+extern void ublksrv_ctrl_set_bpf_obj_info(struct ublksrv_ctrl_dev *dev,
+					  void *obj);
+extern int ublksrv_ctrl_reg_bpf_prog(struct ublksrv_ctrl_dev *dev,
+				     int io_bpf_fd);
+
 /* ublksrv device ("/dev/ublkcN") level APIs */
 extern const struct ublksrv_dev *ublksrv_dev_init(const struct ublksrv_ctrl_dev *
 		ctrl_dev);
diff --git a/include/ublksrv_priv.h b/include/ublksrv_priv.h
index 2996baa..8da8866 100644
--- a/include/ublksrv_priv.h
+++ b/include/ublksrv_priv.h
@@ -42,6 +42,7 @@ struct ublksrv_ctrl_dev {
 
 	const char *tgt_type;
 	const struct ublksrv_tgt_type *tgt_ops;
+	void *bpf_obj;
 
 	/*
 	 * default is UBLKSRV_RUN_DIR but can be specified via command line,
diff --git a/include/ublksrv_tgt.h b/include/ublksrv_tgt.h
index 234d31e..e0db7d9 100644
--- a/include/ublksrv_tgt.h
+++ b/include/ublksrv_tgt.h
@@ -9,6 +9,7 @@
 #include <getopt.h>
 #include <string.h>
 #include <stdarg.h>
+#include <limits.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
diff --git a/lib/ublksrv.c b/lib/ublksrv.c
index 96bed95..1cf24ae 100644
--- a/lib/ublksrv.c
+++ b/lib/ublksrv.c
@@ -163,7 +163,7 @@ static inline int ublksrv_queue_io_cmd(struct _ublksrv_queue *q,
 	sqe->fd		= 0;	/*dev->cdev_fd*/
 	sqe->opcode	=  IORING_OP_URING_CMD;
 	sqe->flags	= IOSQE_FIXED_FILE;
-	sqe->rw_flags	= 0;
+	sqe->uring_cmd_flags = 2;
 	cmd->tag	= tag;
 	cmd->addr	= (__u64)io->buf_addr;
 	cmd->q_id	= q->q_id;
@@ -603,6 +603,9 @@ skip_alloc_buf:
 		goto fail;
 	}
 
+	if (dev->tgt.ops->init_queue_bpf)
+		dev->tgt.ops->init_queue_bpf(tdev, local_to_tq(q));
+
 	ublksrv_dev_init_io_cmds(dev, q);
 
 	/*
@@ -723,6 +726,7 @@ const struct ublksrv_dev *ublksrv_dev_init(const struct ublksrv_ctrl_dev *ctrl_d
 	}
 
 	tgt->fds[0] = dev->cdev_fd;
+	tgt->tgt_bpf_obj = ctrl_dev->bpf_obj;
 
 	ret = ublksrv_tgt_init(dev, ctrl_dev->tgt_type, ctrl_dev->tgt_ops,
 			ctrl_dev->tgt_argc, ctrl_dev->tgt_argv);
diff --git a/lib/ublksrv_cmd.c b/lib/ublksrv_cmd.c
index 0d7265d..1c1f3fc 100644
--- a/lib/ublksrv_cmd.c
+++ b/lib/ublksrv_cmd.c
@@ -502,6 +502,25 @@ int ublksrv_ctrl_end_recovery(struct ublksrv_ctrl_dev *dev, int daemon_pid)
 	return ret;
 }
 
+int ublksrv_ctrl_reg_bpf_prog(struct ublksrv_ctrl_dev *dev,
+			      int io_bpf_fd)
+{
+	struct ublksrv_ctrl_cmd_data data = {
+		.cmd_op = UBLK_CMD_REG_BPF_PROG,
+		.flags = CTRL_CMD_HAS_DATA,
+	};
+	int ret;
+
+	data.data[0] = io_bpf_fd;
+	ret = __ublksrv_ctrl_cmd(dev, &data);
+	return ret;
+}
+
+void ublksrv_ctrl_set_bpf_obj_info(struct ublksrv_ctrl_dev *dev,  void *obj)
+{
+	dev->bpf_obj = obj;
+}
+
 const struct ublksrv_ctrl_dev_info *ublksrv_ctrl_get_dev_info(
 		const struct ublksrv_ctrl_dev *dev)
 {
diff --git a/tgt_loop.cpp b/tgt_loop.cpp
index 79a65d3..6e884b0 100644
--- a/tgt_loop.cpp
+++ b/tgt_loop.cpp
@@ -4,7 +4,11 @@
 
 #include <poll.h>
 #include <sys/epoll.h>
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
 #include "ublksrv_tgt.h"
+#include "bpf/.tmp/ublk.skel.h"
 
 static bool backing_supports_discard(char *name)
 {
@@ -88,6 +92,20 @@ static int loop_recovery_tgt(struct ublksrv_dev *dev, int type)
 	return 0;
 }
 
+static int loop_init_queue_bpf(const struct ublksrv_dev *dev,
+			       const struct ublksrv_queue *q)
+{
+	int ret, q_id, ring_fd;
+	const struct ublksrv_tgt_info *tgt = &dev->tgt;
+	struct ublk_bpf *obj = (struct ublk_bpf*)tgt->tgt_bpf_obj;
+
+	q_id = q->q_id;
+	ring_fd = q->ring_ptr->ring_fd;
+	ret = bpf_map_update_elem(bpf_map__fd(obj->maps.uring_fd_map), &q_id,
+				  &ring_fd,  0);
+	return ret;
+}
+
 static int loop_init_tgt(struct ublksrv_dev *dev, int type, int argc, char
 		*argv[])
 {
@@ -125,6 +143,7 @@ static int loop_init_tgt(struct ublksrv_dev *dev, int type, int argc, char
 		},
 	};
 	bool can_discard = false;
+	struct ublk_bpf *bpf_obj;
 
 	strcpy(tgt_json.name, "loop");
 
@@ -218,6 +237,10 @@ static int loop_init_tgt(struct ublksrv_dev *dev, int type, int argc, char
 			jbuf = ublksrv_tgt_realloc_json_buf(dev, &jbuf_size);
 	} while (ret < 0);
 
+	if (tgt->tgt_bpf_obj) {
+		bpf_obj = (struct ublk_bpf *)tgt->tgt_bpf_obj;
+		bpf_obj->data->target_fd = tgt->fds[1];
+	}
 	return 0;
 }
 
@@ -252,9 +275,14 @@ static int loop_queue_tgt_io(const struct ublksrv_queue *q,
 		const struct ublk_io_data *data, int tag)
 {
 	const struct ublksrv_io_desc *iod = data->iod;
-	struct io_uring_sqe *sqe = io_uring_get_sqe(q->ring_ptr);
+	struct io_uring_sqe *sqe;
 	unsigned ublk_op = ublksrv_get_op(iod);
 
+	/* Currently the ebpf prog will handle read/write requests. */
+	if ((ublk_op == UBLK_IO_OP_READ) || (ublk_op == UBLK_IO_OP_WRITE))
+		return 1;
+
+	sqe = io_uring_get_sqe(q->ring_ptr);
 	if (!sqe)
 		return 0;
 
@@ -374,6 +402,7 @@ struct ublksrv_tgt_type  loop_tgt_type = {
 	.type	= UBLKSRV_TGT_TYPE_LOOP,
 	.name	=  "loop",
 	.recovery_tgt = loop_recovery_tgt,
+	.init_queue_bpf = loop_init_queue_bpf,
 };
 
 static void tgt_loop_init() __attribute__((constructor));
diff --git a/ublksrv_tgt.cpp b/ublksrv_tgt.cpp
index 5ed328d..34e59b2 100644
--- a/ublksrv_tgt.cpp
+++ b/ublksrv_tgt.cpp
@@ -2,6 +2,7 @@
 
 #include "config.h"
 #include "ublksrv_tgt.h"
+#include "bpf/.tmp/ublk.skel.h"
 
 /* per-task variable */
 static pthread_mutex_t jbuf_lock;
@@ -575,6 +576,30 @@ static void ublksrv_tgt_set_params(struct ublksrv_ctrl_dev *cdev,
 	}
 }
 
+static int ublksrv_tgt_load_bpf_prog(struct ublksrv_ctrl_dev *cdev)
+{
+	struct ublk_bpf *obj;
+	int ret, io_bpf_fd;
+
+	obj = ublk_bpf__open();
+	if (!obj) {
+		fprintf(stderr, "failed to open BPF object\n");
+		return -1;
+	}
+	ret = ublk_bpf__load(obj);
+	if (ret) {
+		fprintf(stderr, "failed to load BPF object\n");
+		return -1;
+	}
+
+	io_bpf_fd = bpf_program__fd(obj->progs.ublk_io_bpf_prog);
+	ret = ublksrv_ctrl_reg_bpf_prog(cdev, io_bpf_fd);
+	if (!ret)
+		ublksrv_ctrl_set_bpf_obj_info(cdev, obj);
+	return ret;
+}
+
 static int cmd_dev_add(int argc, char *argv[])
 {
 	static const struct option longopts[] = {
@@ -696,6 +721,13 @@ static int cmd_dev_add(int argc, char *argv[])
 		goto fail;
 	}
 
+	ret = ublksrv_tgt_load_bpf_prog(dev);
+	if (ret < 0) {
+		fprintf(stderr, "dev %d load bpf prog failed, ret %d\n",
+			data.dev_id, ret);
+		goto fail_stop_daemon;
+	}
+
 	{
 		const struct ublksrv_ctrl_dev_info *info =
 			ublksrv_ctrl_get_dev_info(dev);
-- 
2.31.1



* Re: [RFC v2 4/4] ublk_drv: add ebpf support
  2023-02-22 13:25 ` [RFC v2 4/4] ublk_drv: add ebpf support Xiaoguang Wang
@ 2023-02-22 19:25   ` Alexei Starovoitov
  2023-02-23 14:01     ` Xiaoguang Wang
  2023-02-23  0:59   ` kernel test robot
  1 sibling, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2023-02-22 19:25 UTC (permalink / raw)
  To: Xiaoguang Wang
  Cc: linux-block, io-uring, bpf, Ming Lei, Jens Axboe, Pavel Begunkov,
	ZiyangZhang

On Wed, Feb 22, 2023 at 5:29 AM Xiaoguang Wang
<xiaoguang.wang@linux.alibaba.com> wrote:
>
> Currently only one bpf_ublk_queue_sqe() ebpf helper is added; the ublksrv
> target can use this helper to write an ebpf prog that supports ublk kernel &
> userspace zero copy. Please see the ublksrv test code for more info.
>
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> ---
>  drivers/block/ublk_drv.c       | 263 +++++++++++++++++++++++++++++++--
>  include/uapi/linux/bpf.h       |   1 +
>  include/uapi/linux/ublk_cmd.h  |  18 +++
>  kernel/bpf/verifier.c          |   3 +-
>  scripts/bpf_doc.py             |   4 +
>  tools/include/uapi/linux/bpf.h |   9 ++
>  6 files changed, 286 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index b628e9eaefa6..d17ddb6fc27f 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -105,6 +105,12 @@ struct ublk_uring_cmd_pdu {
>   */
>  #define UBLK_IO_FLAG_NEED_GET_DATA 0x08
>
> +/*
> + * UBLK_IO_FLAG_BPF is set if IO command has be handled by ebpf prog instead
> + * of user space daemon.
> + */
> +#define UBLK_IO_FLAG_BPF       0x10
> +
>  struct ublk_io {
>         /* userspace buffer address from io cmd */
>         __u64   addr;
> @@ -114,6 +120,11 @@ struct ublk_io {
>         struct io_uring_cmd *cmd;
>  };
>
> +struct ublk_req_iter {
> +       struct io_fixed_iter fixed_iter;
> +       struct bio_vec *bvec;
> +};
> +
>  struct ublk_queue {
>         int q_id;
>         int q_depth;
> @@ -163,6 +174,9 @@ struct ublk_device {
>         unsigned int            nr_queues_ready;
>         atomic_t                nr_aborted_queues;
>
> +       struct bpf_prog         *io_bpf_prog;
> +       struct ublk_req_iter    *iter_table;
> +
>         /*
>          * Our ubq->daemon may be killed without any notification, so
>          * monitor each queue's daemon periodically
> @@ -189,10 +203,48 @@ static DEFINE_MUTEX(ublk_ctl_mutex);
>
>  static struct miscdevice ublk_misc;
>
> +struct ublk_io_bpf_ctx {
> +       struct ublk_bpf_ctx ctx;
> +       struct ublk_device *ub;
> +};
> +
> +static inline struct ublk_req_iter *ublk_get_req_iter(struct ublk_device *ub,
> +                                       int qid, int tag)
> +{
> +       return &(ub->iter_table[qid * ub->dev_info.queue_depth + tag]);
> +}
> +
> +BPF_CALL_4(bpf_ublk_queue_sqe, struct ublk_io_bpf_ctx *, bpf_ctx,
> +          struct io_uring_sqe *, sqe, u32, sqe_len, u32, fd)
> +{
> +       struct ublk_req_iter *req_iter;
> +       u16 q_id = bpf_ctx->ctx.q_id;
> +       u16 tag = bpf_ctx->ctx.tag;
> +
> +       req_iter = ublk_get_req_iter(bpf_ctx->ub, q_id, tag);
> +       io_uring_submit_sqe(fd, sqe, sqe_len, &(req_iter->fixed_iter));
> +       return 0;
> +}
> +
> +const struct bpf_func_proto ublk_bpf_queue_sqe_proto = {
> +       .func = bpf_ublk_queue_sqe,
> +       .gpl_only = false,
> +       .ret_type = RET_INTEGER,
> +       .arg1_type = ARG_ANYTHING,
> +       .arg2_type = ARG_ANYTHING,
> +       .arg3_type = ARG_ANYTHING,
> +       .arg4_type = ARG_ANYTHING,
> +};

You know that the above is unsafe, right?

> +
>  static const struct bpf_func_proto *
>  ublk_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> -       return bpf_base_func_proto(func_id);
> +       switch (func_id) {
> +       case BPF_FUNC_ublk_queue_sqe:
> +               return &ublk_bpf_queue_sqe_proto;
> +       default:
> +               return bpf_base_func_proto(func_id);
> +       }
>  }
>
>  static bool ublk_bpf_is_valid_access(int off, int size,
> @@ -200,6 +252,23 @@ static bool ublk_bpf_is_valid_access(int off, int size,
>                         const struct bpf_prog *prog,
>                         struct bpf_insn_access_aux *info)
>  {
> +       if (off < 0 || off >= sizeof(struct ublk_bpf_ctx))
> +               return false;
> +       if (off % size != 0)
> +               return false;
> +
> +       switch (off) {
> +       case offsetof(struct ublk_bpf_ctx, q_id):
> +               return size == sizeof_field(struct ublk_bpf_ctx, q_id);
> +       case offsetof(struct ublk_bpf_ctx, tag):
> +               return size == sizeof_field(struct ublk_bpf_ctx, tag);
> +       case offsetof(struct ublk_bpf_ctx, op):
> +               return size == sizeof_field(struct ublk_bpf_ctx, op);
> +       case offsetof(struct ublk_bpf_ctx, nr_sectors):
> +               return size == sizeof_field(struct ublk_bpf_ctx, nr_sectors);
> +       case offsetof(struct ublk_bpf_ctx, start_sector):
> +               return size == sizeof_field(struct ublk_bpf_ctx, start_sector);
> +       }
>         return false;

We don't introduce stable 'ctx' anymore.
Please see how hid-bpf is doing things.
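The hid-bpf direction referenced here avoids introducing a stable UAPI context struct and a numbered helper: kernel functions are exposed as BTF-registered kfuncs, and the prog reads kernel structs directly through BTF/CO-RE. A rough, non-buildable sketch of that shape for ublk (all identifiers are illustrative assumptions, not the actual hid-bpf or ublk API):

```c
/* Sketch only: expose the queueing primitive as a kfunc instead of
 * helper #212, so no ublk_bpf_ctx UAPI and no is_valid_access table
 * need to be maintained.
 */
__bpf_kfunc int bpf_ublk_queue_sqe(struct ublk_io_bpf_ctx *ctx,
				   struct io_uring_sqe *sqe,
				   u32 sqe_len, u32 fd);

BTF_SET8_START(ublk_kfunc_ids)
BTF_ID_FLAGS(func, bpf_ublk_queue_sqe)
BTF_SET8_END(ublk_kfunc_ids)

static const struct btf_kfunc_id_set ublk_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &ublk_kfunc_ids,
};

static int __init ublk_bpf_init(void)
{
	/* hid-bpf registers its kfuncs for tracing-style programs */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
					 &ublk_kfunc_set);
}
```

With this shape the verifier's BTF knowledge replaces both the hand-written ublk_bpf_is_valid_access() checks and the ARG_ANYTHING helper proto.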


* Re: [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel
  2023-02-22 13:25 ` [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel Xiaoguang Wang
@ 2023-02-23  0:39   ` kernel test robot
  2023-02-23  1:31   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-02-23  0:39 UTC (permalink / raw)
  To: Xiaoguang Wang; +Cc: oe-kbuild-all

Hi Xiaoguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on v6.2]
[cannot apply to bpf-next/master bpf/master axboe-block/for-next linus/master next-20230222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
patch link:    https://lore.kernel.org/r/20230222132534.114574-3-xiaoguang.wang%40linux.alibaba.com
patch subject: [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel
config: csky-randconfig-r034-20230222 (https://download.01.org/0day-ci/archive/20230223/202302230852.2d5z1aBz-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/160719e38c318893c448e87e839f2c68c211c500
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
        git checkout 160719e38c318893c448e87e839f2c68c211c500
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=csky olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=csky SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302230852.2d5z1aBz-lkp@intel.com/

All errors (new ones prefixed by >>):

   csky-linux-ld: kernel/exit.o: in function `io_uring_submit_sqe':
>> exit.c:(.text+0x9f0): multiple definition of `io_uring_submit_sqe'; kernel/fork.o:fork.c:(.text+0xd94): first defined here
   csky-linux-ld: fs/exec.o: in function `io_uring_submit_sqe':
   exec.c:(.text+0x2080): multiple definition of `io_uring_submit_sqe'; kernel/fork.o:fork.c:(.text+0xd94): first defined here
   csky-linux-ld: security/selinux/hooks.o: in function `io_uring_submit_sqe':
   hooks.c:(.text+0x8420): multiple definition of `io_uring_submit_sqe'; kernel/fork.o:fork.c:(.text+0xd94): first defined here
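This "multiple definition" failure is the classic symptom of defining a function body in a header without `static` (or `static inline`): every translation unit that includes the header then emits its own global symbol. A minimal userspace reproduction (hypothetical file names, assuming a GNU toolchain):

```shell
# a non-static function body in a header, included from two .c files
cat > hdr.h <<'EOF'
int io_uring_submit_demo(void) { return 0; }
EOF
printf '#include "hdr.h"\nint main(void) { return io_uring_submit_demo(); }\n' > a.c
printf '#include "hdr.h"\n' > b.c

# first link fails: both a.o and b.o carry a strong definition
cc a.c b.c -o demo 2>/dev/null && echo link-ok || echo multiple-definition

# marking the header definition 'static inline' keeps each copy TU-local
sed -i 's/^int io_uring_submit_demo/static inline int io_uring_submit_demo/' hdr.h
cc a.c b.c -o demo 2>/dev/null && echo link-ok || echo multiple-definition
```

The fix for the patch would be analogous: either declare io_uring_submit_sqe() in the header and define it once in io_uring, or make the header definition `static inline`.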

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [RFC v2 4/4] ublk_drv: add ebpf support
  2023-02-22 13:25 ` [RFC v2 4/4] ublk_drv: add ebpf support Xiaoguang Wang
  2023-02-22 19:25   ` Alexei Starovoitov
@ 2023-02-23  0:59   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-02-23  0:59 UTC (permalink / raw)
  To: Xiaoguang Wang; +Cc: oe-kbuild-all

Hi Xiaoguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on v6.2]
[cannot apply to bpf-next/master bpf/master axboe-block/for-next linus/master next-20230222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
patch link:    https://lore.kernel.org/r/20230222132534.114574-5-xiaoguang.wang%40linux.alibaba.com
patch subject: [RFC v2 4/4] ublk_drv: add ebpf support
config: arc-randconfig-r015-20230222 (https://download.01.org/0day-ci/archive/20230223/202302230856.gpapZydk-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/edb68bda205cafb786988ea5c1a3e4465c92fb49
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
        git checkout edb68bda205cafb786988ea5c1a3e4465c92fb49
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302230856.gpapZydk-lkp@intel.com/

All errors (new ones prefixed by >>):

   arc-elf-ld: drivers/block/ublk_drv.o: in function `static_key_count':
>> include/linux/jump_label.h:260: undefined reference to `bpf_stats_enabled_key'
>> arc-elf-ld: include/linux/jump_label.h:260: undefined reference to `bpf_stats_enabled_key'


vim +260 include/linux/jump_label.h

1f69bf9c613760 Jason Baron    2016-08-03  257  
656d054e0a15ec Peter Zijlstra 2022-05-02  258  static __always_inline int static_key_count(struct static_key *key)
4c5ea0a9cd02d6 Paolo Bonzini  2016-06-21  259  {
656d054e0a15ec Peter Zijlstra 2022-05-02 @260  	return arch_atomic_read(&key->enabled);
4c5ea0a9cd02d6 Paolo Bonzini  2016-06-21  261  }
4c5ea0a9cd02d6 Paolo Bonzini  2016-06-21  262  
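The undefined reference to `bpf_stats_enabled_key` usually means ublk_drv now inlines `bpf_prog_run()` (whose static-branch check reads that key, which is defined in kernel/bpf/ only under CONFIG_BPF_SYSCALL) on a randconfig where the BPF core is not built. A minimal Kconfig sketch of the usual fix — the `depends on` line is my assumption, not part of the posted series:

```kconfig
# drivers/block/Kconfig (sketch): make the driver's new BPF use explicit.
config BLK_DEV_UBLK
	tristate "Userspace block driver (Experimental)"
	select IO_URING
	# Assumed fix: bpf_prog_run() references bpf_stats_enabled_key,
	# which only exists when the BPF core is built in.
	depends on BPF_SYSCALL
```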

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [RFC v2 1/4] bpf: add UBLK program type
  2023-02-22 13:25 ` [RFC v2 1/4] bpf: add UBLK program type Xiaoguang Wang
@ 2023-02-23  1:21   ` kernel test robot
  0 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-02-23  1:21 UTC (permalink / raw)
  To: Xiaoguang Wang; +Cc: llvm, oe-kbuild-all

Hi Xiaoguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on v6.2]
[cannot apply to bpf-next/master bpf/master axboe-block/for-next linus/master next-20230222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
patch link:    https://lore.kernel.org/r/20230222132534.114574-2-xiaoguang.wang%40linux.alibaba.com
patch subject: [RFC v2 1/4] bpf: add UBLK program type
config: s390-randconfig-r024-20230222 (https://download.01.org/0day-ci/archive/20230223/202302230957.cU6XDCTE-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project db89896bbbd2251fff457699635acbbedeead27f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/29aa0e051b8dc06ecf3f2bf3ebd4131552988a7a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
        git checkout 29aa0e051b8dc06ecf3f2bf3ebd4131552988a7a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302230957.cU6XDCTE-lkp@intel.com/

All errors (new ones prefixed by >>):

>> s390x-linux-ld: kernel/bpf/syscall.o:(.data.rel.ro+0x9c0): undefined reference to `bpf_ublk_prog_ops'
>> s390x-linux-ld: kernel/bpf/verifier.o:(.data.rel.ro+0x160): undefined reference to `bpf_ublk_verifier_ops'
   s390x-linux-ld: drivers/irqchip/irq-al-fic.o: in function `al_fic_init_dt':
   irq-al-fic.c:(.init.text+0x28): undefined reference to `of_iomap'
   s390x-linux-ld: irq-al-fic.c:(.init.text+0x4a2): undefined reference to `iounmap'
   s390x-linux-ld: drivers/dma/fsl-edma.o: in function `fsl_edma_probe':
   fsl-edma.c:(.text+0x21e): undefined reference to `devm_ioremap_resource'
   s390x-linux-ld: fsl-edma.c:(.text+0x376): undefined reference to `devm_ioremap_resource'
   s390x-linux-ld: drivers/dma/qcom/hidma.o: in function `hidma_probe':
   hidma.c:(.text+0x80): undefined reference to `devm_ioremap_resource'
   s390x-linux-ld: hidma.c:(.text+0xd2): undefined reference to `devm_ioremap_resource'
   s390x-linux-ld: drivers/char/xillybus/xillybus_of.o: in function `xilly_drv_probe':
   xillybus_of.c:(.text+0x66): undefined reference to `devm_platform_ioremap_resource'
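Only the first two errors (prefixed `>>`) are introduced by the patch: the `BPF_PROG_TYPE()` table in include/linux/bpf_types.h makes built-in syscall.o and verifier.o reference `bpf_ublk_prog_ops`/`bpf_ublk_verifier_ops`, and on this randconfig nothing that defines them is built. A hedged sketch of the usual guard — the CONFIG symbol here is hypothetical:

```c
/*
 * Sketch (not the actual patch): each entry in include/linux/bpf_types.h
 * expands into references to <name>_prog_ops / <name>_verifier_ops from
 * kernel/bpf/syscall.c and kernel/bpf/verifier.c.  If those ops live in
 * code that can be =m or =n, the entry must be guarded so the built-in
 * BPF core never references symbols that were not compiled in:
 */
#if IS_ENABLED(CONFIG_UBLK_BPF)		/* hypothetical Kconfig symbol */
BPF_PROG_TYPE(BPF_PROG_TYPE_UBLK, ublk_bpf,
	      struct ublk_bpf_ctx, struct ublk_bpf_ctx)
#endif
/*
 * Alternatively, define the ops inside kernel/bpf/ itself (always built
 * when BPF_SYSCALL=y), as the existing program types do.
 */
```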

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel
  2023-02-22 13:25 ` [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel Xiaoguang Wang
  2023-02-23  0:39   ` kernel test robot
@ 2023-02-23  1:31   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-02-23  1:31 UTC (permalink / raw)
  To: Xiaoguang Wang; +Cc: oe-kbuild-all

Hi Xiaoguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on v6.2]
[cannot apply to bpf-next/master bpf/master axboe-block/for-next linus/master next-20230222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
patch link:    https://lore.kernel.org/r/20230222132534.114574-3-xiaoguang.wang%40linux.alibaba.com
patch subject: [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel
config: parisc-randconfig-r033-20230222 (https://download.01.org/0day-ci/archive/20230223/202302230919.13HuuMnB-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/160719e38c318893c448e87e839f2c68c211c500
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xiaoguang-Wang/bpf-add-UBLK-program-type/20230222-235148
        git checkout 160719e38c318893c448e87e839f2c68c211c500
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=parisc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302230919.13HuuMnB-lkp@intel.com/

All errors (new ones prefixed by >>):

   hppa-linux-ld: hppa-linux-ld: DWARF error: could not find abbrev number 148863851
   kernel/exit.o: in function `io_uring_submit_sqe':
>> exit.c:(.text+0xee4): multiple definition of `io_uring_submit_sqe'; hppa-linux-ld: DWARF error: could not find abbrev number 100
   kernel/fork.o:fork.c:(.text+0x1528): first defined here
   hppa-linux-ld: hppa-linux-ld: DWARF error: could not find abbrev number 2312747227
   fs/exec.o: in function `io_uring_submit_sqe':
   exec.c:(.text+0x26e4): multiple definition of `io_uring_submit_sqe'; kernel/fork.o:fork.c:(.text+0x1528): first defined here

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [RFC v2 4/4] ublk_drv: add ebpf support
  2023-02-22 19:25   ` Alexei Starovoitov
@ 2023-02-23 14:01     ` Xiaoguang Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Xiaoguang Wang @ 2023-02-23 14:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-block, io-uring, bpf, Ming Lei, Jens Axboe, Pavel Begunkov,
	ZiyangZhang

hi,

> On Wed, Feb 22, 2023 at 5:29 AM Xiaoguang Wang
> <xiaoguang.wang@linux.alibaba.com> wrote:
>> Currently only one bpf_ublk_queue_sqe() ebpf helper is added; a ublksrv target
>> can use this helper to write an ebpf prog to support ublk kernel & userspace
>> zero copy, please see the ublksrv test code for more info.
>>
>>
>> +const struct bpf_func_proto ublk_bpf_queue_sqe_proto = {
>> +       .func = bpf_ublk_queue_sqe,
>> +       .gpl_only = false,
>> +       .ret_type = RET_INTEGER,
>> +       .arg1_type = ARG_ANYTHING,
>> +       .arg2_type = ARG_ANYTHING,
>> +       .arg3_type = ARG_ANYTHING,
>> +       .arg4_type = ARG_ANYTHING,
>> +};
> You know that the above is unsafe, right?
Yes, I know it's not safe; I will improve it in the next version.
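For reference, the unsafety is that `ARG_ANYTHING` tells the verifier to accept any 64-bit value, so the helper receives unchecked pointers and lengths from the prog. A sketch of the tighter argument types the verifier can actually validate — the enums are real verifier argument types, but applying them to this helper this way is my assumption:

```c
/*
 * Sketch only: let the verifier check what the prog passes instead of
 * trusting raw u64 values.
 *   ARG_PTR_TO_CTX  - must be the program's context pointer
 *   ARG_PTR_TO_MEM  - verified readable prog memory (stack/map value)
 *   ARG_CONST_SIZE  - size bound for the preceding memory argument
 */
const struct bpf_func_proto ublk_bpf_queue_sqe_proto = {
	.func		= bpf_ublk_queue_sqe,
	.gpl_only	= false,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_PTR_TO_CTX,	/* struct ublk_io_bpf_ctx * */
	.arg2_type	= ARG_PTR_TO_MEM,	/* sqe bytes owned by the prog */
	.arg3_type	= ARG_CONST_SIZE,	/* sqe_len, verifier-checked */
	.arg4_type	= ARG_ANYTHING,		/* fd: a plain scalar is fine */
};
```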

>
>> +
>>  static const struct bpf_func_proto *
>>  ublk_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  {
>> -       return bpf_base_func_proto(func_id);
>> +       switch (func_id) {
>> +       case BPF_FUNC_ublk_queue_sqe:
>> +               return &ublk_bpf_queue_sqe_proto;
>> +       default:
>> +               return bpf_base_func_proto(func_id);
>> +       }
>>  }
>>
>>  static bool ublk_bpf_is_valid_access(int off, int size,
>> @@ -200,6 +252,23 @@ static bool ublk_bpf_is_valid_access(int off, int size,
>>                         const struct bpf_prog *prog,
>>                         struct bpf_insn_access_aux *info)
>>  {
>> +       if (off < 0 || off >= sizeof(struct ublk_bpf_ctx))
>> +               return false;
>> +       if (off % size != 0)
>> +               return false;
>> +
>> +       switch (off) {
>> +       case offsetof(struct ublk_bpf_ctx, q_id):
>> +               return size == sizeof_field(struct ublk_bpf_ctx, q_id);
>> +       case offsetof(struct ublk_bpf_ctx, tag):
>> +               return size == sizeof_field(struct ublk_bpf_ctx, tag);
>> +       case offsetof(struct ublk_bpf_ctx, op):
>> +               return size == sizeof_field(struct ublk_bpf_ctx, op);
>> +       case offsetof(struct ublk_bpf_ctx, nr_sectors):
>> +               return size == sizeof_field(struct ublk_bpf_ctx, nr_sectors);
>> +       case offsetof(struct ublk_bpf_ctx, start_sector):
>> +               return size == sizeof_field(struct ublk_bpf_ctx, start_sector);
>> +       }
>>         return false;
> We don't introduce stable 'ctx' anymore.
> Please see how hid-bpf is doing things.
ok, will learn it, thanks.
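For anyone following along: hid-bpf avoids a stable UAPI ctx and helper protos by exporting kernel functions as BTF kfuncs, which the verifier type-checks from BTF without freezing any layout into UAPI. A rough sketch of what that shape might look like here — all names and the exact registration macros are illustrative, matched to the ~6.2-era API, not a working patch:

```c
/*
 * Sketch of the kfunc style (cf. drivers/hid/bpf): export the function
 * itself, let BTF describe the argument types, register the set for the
 * program type.  Nothing below becomes stable UAPI.
 */
__bpf_kfunc int bpf_ublk_queue_sqe(struct ublk_io_bpf_ctx *ctx,
				   struct io_uring_sqe *sqe,
				   u32 sqe_len, u32 fd)
{
	/* ... submit io on behalf of the daemon process ... */
	return 0;
}

BTF_SET8_START(ublk_kfunc_ids)
BTF_ID_FLAGS(func, bpf_ublk_queue_sqe)
BTF_SET8_END(ublk_kfunc_ids)

static const struct btf_kfunc_id_set ublk_kfunc_set = {
	.owner	= THIS_MODULE,
	.set	= &ublk_kfunc_ids,
};

static int __init ublk_bpf_init(void)
{
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_UBLK, &ublk_kfunc_set);
}
```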

Regards,
Xiaoguang Wang



end of thread, other threads:[~2023-02-23 14:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-22 13:25 [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
2023-02-22 13:25 ` [RFC v2 1/4] bpf: add UBLK program type Xiaoguang Wang
2023-02-23  1:21   ` kernel test robot
2023-02-22 13:25 ` [RFC v2 2/4] io_uring: enable io_uring to submit sqes located in kernel Xiaoguang Wang
2023-02-23  0:39   ` kernel test robot
2023-02-23  1:31   ` kernel test robot
2023-02-22 13:25 ` [RFC v2 3/4] io_uring: introduce IORING_URING_CMD_UNLOCK flag Xiaoguang Wang
2023-02-22 13:25 ` [RFC v2 4/4] ublk_drv: add ebpf support Xiaoguang Wang
2023-02-22 19:25   ` Alexei Starovoitov
2023-02-23 14:01     ` Xiaoguang Wang
2023-02-23  0:59   ` kernel test robot
2023-02-22 13:27 ` [PATCH] Add " Xiaoguang Wang
