All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033
@ 2023-02-24 12:45 Ming Lei
  2023-02-24 12:45 ` [PATCH blktests v4 1/2] src: add mini ublk source code Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Ming Lei @ 2023-02-24 12:45 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki
  Cc: linux-block, Chaitanya Kulkarni, Ziyang Zhang, Ming Lei

Hello,

The 1st patch adds one mini ublk program, which only supports null &
loop targets.

The 2nd patch add blktests/033 for covering gendisk leak issue.

v4:
	- fix one error handling bug in src/miniublk.c
	- patch style improvement on src/miniublk.c
	- both are suggested by Chaitanya Kulkarni

v3:
	- move minublk.c into src/
	- remove '-i' in top Makefile
	- fix commit log for 1/2
	- add 'udevadm settle' after adding device, so that the following
	test can be done reliably
	- fix _init_ublk()
	- redirect runtime log into $FULL
	- all are suggested by Shinichiro Kawasaki

v2:
	- cleanup & bugfix on miniublk.c
	- avoid ignoring build warning just for addressing missed liburing
	  or ublk kernel header
	- patch style fix
	- make 'make check' happy

Ming Lei (2):
  src: add mini ublk source code
  block/033: add test to cover gendisk leak

 common/ublk         |   35 ++
 src/.gitignore      |    1 +
 src/Makefile        |   18 +
 src/miniublk.c      | 1367 +++++++++++++++++++++++++++++++++++++++++++
 tests/block/033     |   41 ++
 tests/block/033.out |    2 +
 6 files changed, 1464 insertions(+)
 create mode 100644 common/ublk
 create mode 100644 src/miniublk.c
 create mode 100755 tests/block/033
 create mode 100644 tests/block/033.out

-- 
2.31.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH blktests v4 1/2] src: add mini ublk source code
  2023-02-24 12:45 [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Ming Lei
@ 2023-02-24 12:45 ` Ming Lei
  2023-02-24 12:45 ` [PATCH blktests v4 2/2] block/033: add test to cover gendisk leak Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2023-02-24 12:45 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki
  Cc: linux-block, Chaitanya Kulkarni, Ziyang Zhang, Ming Lei

Prepare for adding ublk related test:

1) ublk delete is sync removal, this way is convenient to
   blkg/queue/disk instance leak issue

2) mini ublk has two builtin target(null, loop), and loop IO is
handled by io_uring, so we can use ublk to cover part of io_uring
workloads

3) not like loop/nbd, ublk won't pre-allocate/add disk, and always
add/delete disk dynamically, this way may cover disk plug & unplug
tests

4) ublk specific test given people starts to use it, so better to
let blktest cover ublk related tests

Add mini ublk source for test purpose only, which is easy to use:

./miniublk add -t {null|loop} [-q nr_queues] [-d depth] [-n dev_id]
	 default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)
	 -t loop -f backing_file
	 -t null
./miniublk del [-n dev_id] -a
	 -a delete all devices, -n delete specified device
./miniublk list [-n dev_id] -a
	 -a list all devices, -n list specified device, default -a

miniublk depends on liburing 2.2, adds HAVE_LIBURING for checking if
liburing 2.2 exists; also add HAVE_UBLK_HEADER for checking ublk kernel
UAPI header exits. If either of two dependencies can't be met, simply
ignore miniublk target.

Also v6.0 is the 1st linux kernel release with ublk.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 src/.gitignore |    1 +
 src/Makefile   |   18 +
 src/miniublk.c | 1367 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1386 insertions(+)
 create mode 100644 src/miniublk.c

diff --git a/src/.gitignore b/src/.gitignore
index 355bed3..df7aff5 100644
--- a/src/.gitignore
+++ b/src/.gitignore
@@ -8,3 +8,4 @@
 /sg/dxfer-from-dev
 /sg/syzkaller1
 /zbdioctl
+/miniublk
diff --git a/src/Makefile b/src/Makefile
index 3b587f6..81c6541 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -2,6 +2,10 @@ HAVE_C_HEADER = $(shell if echo "\#include <$(1)>" |		\
 		$(CC) -E - > /dev/null 2>&1; then echo "$(2)";	\
 		else echo "$(3)"; fi)
 
+HAVE_C_MACRO = $(shell if echo "#include <$(1)>" |	\
+		$(CC) -E - 2>&1 /dev/null | grep $(2) > /dev/null 2>&1; \
+		then echo 1;else echo 0; fi)
+
 C_TARGETS := \
 	loblksize \
 	loop_change_fd \
@@ -13,16 +17,27 @@ C_TARGETS := \
 	sg/syzkaller1 \
 	zbdioctl
 
+C_MINIUBLK := miniublk
+
+HAVE_LIBURING := $(call HAVE_C_MACRO,liburing.h,IORING_OP_URING_CMD)
+HAVE_UBLK_HEADER := $(call HAVE_C_HEADER,linux/ublk_cmd.h,1)
+
 CXX_TARGETS := \
 	discontiguous-io
 
+ifeq ($(HAVE_LIBURING)$(HAVE_UBLK_HEADER), 11)
+TARGETS := $(C_TARGETS) $(CXX_TARGETS) $(C_MINIUBLK)
+else
+$(info Skip $(C_MINIUBLK) build due to missing kernel header(v6.0+) or liburing(2.2+))
 TARGETS := $(C_TARGETS) $(CXX_TARGETS)
+endif
 
 CONFIG_DEFS := $(call HAVE_C_HEADER,linux/blkzoned.h,-DHAVE_LINUX_BLKZONED_H)
 
 override CFLAGS   := -O2 -Wall -Wshadow $(CFLAGS) $(CONFIG_DEFS)
 override CXXFLAGS := -O2 -std=c++11 -Wall -Wextra -Wshadow -Wno-sign-compare \
 		     -Werror $(CXXFLAGS) $(CONFIG_DEFS)
+MINIUBLK_FLAGS :=  -D_GNU_SOURCE -lpthread -luring
 
 all: $(TARGETS)
 
@@ -39,4 +54,7 @@ $(C_TARGETS): %: %.c
 $(CXX_TARGETS): %: %.cpp
 	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -o $@ $^
 
+$(C_MINIUBLK): %: miniublk.c
+	$(CC) $(CFLAGS) $(MINIUBLK_FLAGS) -o $@ miniublk.c
+
 .PHONY: all clean install
diff --git a/src/miniublk.c b/src/miniublk.c
new file mode 100644
index 0000000..fe10291
--- /dev/null
+++ b/src/miniublk.c
@@ -0,0 +1,1367 @@
+// SPDX-License-Identifier: GPL-3.0+
+// Copyright (C) 2023 Ming Lei
+
+/*
+ * io_uring based mini ublk implementation with null/loop target,
+ * for test purpose only.
+ *
+ * So please keep it clean & simple & reliable.
+ */
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <pthread.h>
+#include <getopt.h>
+#include <limits.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <liburing.h>
+#include <linux/ublk_cmd.h>
+
+#define CTRL_DEV		"/dev/ublk-control"
+#define UBLKC_DEV		"/dev/ublkc"
+#define UBLK_CTRL_RING_DEPTH            32
+
+/* queue idle timeout */
+#define UBLKSRV_IO_IDLE_SECS		20
+
+#define UBLK_IO_MAX_BYTES               65536
+#define UBLK_MAX_QUEUES                 4
+#define UBLK_QUEUE_DEPTH                128
+
+#define UBLK_DBG_DEV            (1U << 0)
+#define UBLK_DBG_QUEUE          (1U << 1)
+#define UBLK_DBG_IO_CMD         (1U << 2)
+#define UBLK_DBG_IO             (1U << 3)
+#define UBLK_DBG_CTRL_CMD       (1U << 4)
+#define UBLK_LOG		(1U << 5)
+
+struct ublk_dev;
+struct ublk_queue;
+
+struct ublk_ctrl_cmd_data {
+	__u32 cmd_op;
+#define CTRL_CMD_HAS_DATA	1
+#define CTRL_CMD_HAS_BUF	2
+	__u32 flags;
+
+	__u64 data[2];
+	__u64 addr;
+	__u32 len;
+};
+
+struct ublk_io {
+	char *buf_addr;
+
+#define UBLKSRV_NEED_FETCH_RQ		(1UL << 0)
+#define UBLKSRV_NEED_COMMIT_RQ_COMP	(1UL << 1)
+#define UBLKSRV_IO_FREE			(1UL << 2)
+	unsigned int flags;
+
+	unsigned int result;
+};
+
+struct ublk_tgt_ops {
+	const char *name;
+	int (*init_tgt)(struct ublk_dev *);
+	void (*deinit_tgt)(struct ublk_dev *);
+
+	int (*queue_io)(struct ublk_queue *, int tag);
+	void (*tgt_io_done)(struct ublk_queue *,
+			int tag, const struct io_uring_cqe *);
+};
+
+struct ublk_tgt {
+	unsigned long dev_size;
+	const struct ublk_tgt_ops *ops;
+	int argc;
+	char **argv;
+	struct ublk_params params;
+};
+
+struct ublk_queue {
+	int q_id;
+	int q_depth;
+	unsigned int cmd_inflight;
+	unsigned int io_inflight;
+	struct ublk_dev *dev;
+	const struct ublk_tgt_ops *tgt_ops;
+	char *io_cmd_buf;
+	struct io_uring ring;
+	struct ublk_io ios[UBLK_QUEUE_DEPTH];
+#define UBLKSRV_QUEUE_STOPPING	(1U << 0)
+#define UBLKSRV_QUEUE_IDLE	(1U << 1)
+	unsigned state;
+	pid_t tid;
+	pthread_t thread;
+};
+
+struct ublk_dev {
+	struct ublk_tgt tgt;
+	struct ublksrv_ctrl_dev_info  dev_info;
+	struct ublk_queue q[UBLK_MAX_QUEUES];
+
+	int fds[2];	/* fds[0] points to /dev/ublkcN */
+	int nr_fds;
+	int ctrl_fd;
+	struct io_uring ring;
+};
+
+#ifndef offsetof
+#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)
+#endif
+
+#ifndef container_of
+#define container_of(ptr, type, member) ({                              \
+	unsigned long __mptr = (unsigned long)(ptr);                    \
+	((type *)(__mptr - offsetof(type, member))); })
+#endif
+
+#define round_up(val, rnd) \
+	(((val) + ((rnd) - 1)) & ~((rnd) - 1))
+
+#define ublk_assert(x)  do { \
+	if (!(x)) {     \
+		ublk_err("%s %d: assert!\n", __func__, __LINE__); \
+		assert(x);      \
+	}       \
+} while (0)
+
+static const struct ublk_tgt_ops *ublk_find_tgt(const char *name);
+
+static unsigned int ublk_dbg_mask = UBLK_LOG;
+
+static inline unsigned ilog2(unsigned x)
+{
+    return sizeof(unsigned) * 8 - 1 - __builtin_clz(x);
+}
+
+static inline int is_target_io(__u64 user_data)
+{
+	return (user_data & (1ULL << 63)) != 0;
+}
+
+static inline __u64 build_user_data(unsigned tag, unsigned op,
+		unsigned tgt_data, unsigned is_target_io)
+{
+	assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16));
+
+	return tag | (op << 16) | (tgt_data << 24) | (__u64)is_target_io << 63;
+}
+
+static inline unsigned int user_data_to_tag(__u64 user_data)
+{
+	return user_data & 0xffff;
+}
+
+static inline unsigned int user_data_to_op(__u64 user_data)
+{
+	return (user_data >> 16) & 0xff;
+}
+
+static void ublk_err(const char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+}
+
+static void ublk_log(const char *fmt, ...)
+{
+	if (ublk_dbg_mask & UBLK_LOG) {
+		va_list ap;
+
+		va_start(ap, fmt);
+		vfprintf(stdout, fmt, ap);
+	}
+}
+
+static void ublk_dbg(int level, const char *fmt, ...)
+{
+	if (level & ublk_dbg_mask) {
+		va_list ap;
+		va_start(ap, fmt);
+		vfprintf(stdout, fmt, ap);
+        }
+}
+
+static inline void *ublk_get_sqe_cmd(const struct io_uring_sqe *sqe)
+{
+	return (void *)&sqe->addr3;
+}
+
+static inline void ublk_mark_io_done(struct ublk_io *io, int res)
+{
+	io->flags |= (UBLKSRV_NEED_COMMIT_RQ_COMP | UBLKSRV_IO_FREE);
+	io->result = res;
+}
+
+static inline const struct ublksrv_io_desc *ublk_get_iod(
+                const struct ublk_queue *q, int tag)
+{
+        return (struct ublksrv_io_desc *)
+                &(q->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
+}
+
+static inline void ublk_set_sqe_cmd_op(struct io_uring_sqe *sqe,
+		__u32 cmd_op)
+{
+        __u32 *addr = (__u32 *)&sqe->off;
+
+        addr[0] = cmd_op;
+        addr[1] = 0;
+}
+
+static inline int ublk_setup_ring(struct io_uring *r, int depth,
+		int cq_depth, unsigned flags)
+{
+	struct io_uring_params p;
+
+	memset(&p, 0, sizeof(p));
+	p.flags = flags | IORING_SETUP_CQSIZE;
+	p.cq_entries = cq_depth;
+
+	return io_uring_queue_init_params(depth, r, &p);
+}
+
+static inline void ublk_ctrl_init_cmd(struct ublk_dev *dev,
+		struct io_uring_sqe *sqe,
+		struct ublk_ctrl_cmd_data *data)
+{
+	struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+	struct ublksrv_ctrl_cmd *cmd = (struct ublksrv_ctrl_cmd *)ublk_get_sqe_cmd(sqe);
+
+	sqe->fd = dev->ctrl_fd;
+	sqe->opcode = IORING_OP_URING_CMD;
+	sqe->ioprio = 0;
+
+	if (data->flags & CTRL_CMD_HAS_BUF) {
+		cmd->addr = data->addr;
+		cmd->len = data->len;
+	}
+
+	if (data->flags & CTRL_CMD_HAS_DATA)
+		cmd->data[0] = data->data[0];
+
+	cmd->dev_id = info->dev_id;
+	cmd->queue_id = -1;
+
+	ublk_set_sqe_cmd_op(sqe, data->cmd_op);
+
+	io_uring_sqe_set_data(sqe, cmd);
+}
+
+static int __ublk_ctrl_cmd(struct ublk_dev *dev,
+		struct ublk_ctrl_cmd_data *data)
+{
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	int ret = -EINVAL;
+
+	sqe = io_uring_get_sqe(&dev->ring);
+	if (!sqe) {
+		ublk_err("%s: can't get sqe ret %d\n", __func__, ret);
+		return ret;
+	}
+
+	ublk_ctrl_init_cmd(dev, sqe, data);
+
+	ret = io_uring_submit(&dev->ring);
+	if (ret < 0) {
+		ublk_err("uring submit ret %d\n", ret);
+		return ret;
+	}
+
+	ret = io_uring_wait_cqe(&dev->ring, &cqe);
+	if (ret < 0) {
+		ublk_err("wait cqe: %s\n", strerror(-ret));
+		return ret;
+	}
+	io_uring_cqe_seen(&dev->ring, cqe);
+
+	return cqe->res;
+}
+
+int ublk_ctrl_stop_dev(struct ublk_dev *dev)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op	= UBLK_CMD_STOP_DEV,
+	};
+
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+int ublk_ctrl_start_dev(struct ublk_dev *dev,
+		int daemon_pid)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op	= UBLK_CMD_START_DEV,
+		.flags	= CTRL_CMD_HAS_DATA,
+	};
+
+	dev->dev_info.ublksrv_pid = data.data[0] = daemon_pid;
+
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+int ublk_ctrl_add_dev(struct ublk_dev *dev)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op	= UBLK_CMD_ADD_DEV,
+		.flags	= CTRL_CMD_HAS_BUF,
+		.addr = (__u64)&dev->dev_info,
+		.len = sizeof(struct ublksrv_ctrl_dev_info),
+	};
+
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+int ublk_ctrl_del_dev(struct ublk_dev *dev)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op = UBLK_CMD_DEL_DEV,
+		.flags = 0,
+	};
+
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+int ublk_ctrl_get_info(struct ublk_dev *dev)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op	= UBLK_CMD_GET_DEV_INFO,
+		.flags	= CTRL_CMD_HAS_BUF,
+		.addr = (__u64)&dev->dev_info,
+		.len = sizeof(struct ublksrv_ctrl_dev_info),
+	};
+
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+int ublk_ctrl_set_params(struct ublk_dev *dev,
+		struct ublk_params *params)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op	= UBLK_CMD_SET_PARAMS,
+		.flags	= CTRL_CMD_HAS_BUF,
+		.addr = (__u64)params,
+		.len = sizeof(*params),
+	};
+	params->len = sizeof(*params);
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_get_params(struct ublk_dev *dev,
+		struct ublk_params *params)
+{
+	struct ublk_ctrl_cmd_data data = {
+		.cmd_op	= UBLK_CMD_GET_PARAMS,
+		.flags	= CTRL_CMD_HAS_BUF,
+		.addr = (__u64)params,
+		.len = sizeof(*params),
+	};
+
+	params->len = sizeof(*params);
+
+	return __ublk_ctrl_cmd(dev, &data);
+}
+
+static const char *ublk_dev_state_desc(struct ublk_dev *dev)
+{
+	switch (dev->dev_info.state) {
+	case UBLK_S_DEV_DEAD:
+		return "DEAD";
+	case UBLK_S_DEV_LIVE:
+		return "LIVE";
+	default:
+		return "UNKNOWN";
+	};
+}
+
+static void ublk_ctrl_dump(struct ublk_dev *dev, bool show_queue)
+{
+	struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+	int ret;
+	struct ublk_params p;
+
+	ret = ublk_ctrl_get_params(dev, &p);
+	if (ret < 0) {
+		ublk_err("failed to get params %m\n");
+		return;
+	}
+
+	ublk_log("dev id %d: nr_hw_queues %d queue_depth %d block size %d dev_capacity %lld\n",
+			info->dev_id,
+                        info->nr_hw_queues, info->queue_depth,
+                        1 << p.basic.logical_bs_shift, p.basic.dev_sectors);
+	ublk_log("\tmax rq size %d daemon pid %d flags 0x%llx state %s\n",
+                        info->max_io_buf_bytes,
+			info->ublksrv_pid, info->flags,
+			ublk_dev_state_desc(dev));
+	if (show_queue) {
+		int i;
+
+		for (i = 0; i < dev->dev_info.nr_hw_queues; i++)
+			ublk_log("\tqueue 0 tid: %d\n", dev->q[i].tid);
+	}
+	fflush(stdout);
+}
+
+static void ublk_ctrl_deinit(struct ublk_dev *dev)
+{
+	close(dev->ctrl_fd);
+	free(dev);
+}
+
+static struct ublk_dev *ublk_ctrl_init()
+{
+	struct ublk_dev *dev = (struct ublk_dev *)calloc(1, sizeof(*dev));
+	struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+	int ret;
+
+	dev->ctrl_fd = open(CTRL_DEV, O_RDWR);
+	if (dev->ctrl_fd < 0) {
+		ublk_err("control dev %s can't be opened: %m %d\n", CTRL_DEV, errno);
+		exit(dev->ctrl_fd);
+	}
+	info->max_io_buf_bytes = UBLK_IO_MAX_BYTES;
+
+	ret = ublk_setup_ring(&dev->ring, UBLK_CTRL_RING_DEPTH,
+			UBLK_CTRL_RING_DEPTH, IORING_SETUP_SQE128);
+	if (ret < 0) {
+		ublk_err("queue_init: %s\n", strerror(-ret));
+		free(dev);
+		return NULL;
+	}
+	dev->nr_fds = 1;
+
+	return dev;
+}
+
+static int ublk_queue_cmd_buf_sz(struct ublk_queue *q)
+{
+	int size =  q->q_depth * sizeof(struct ublksrv_io_desc);
+	unsigned int page_sz = getpagesize();
+
+	return round_up(size, page_sz);
+}
+
+static void ublk_queue_deinit(struct ublk_queue *q)
+{
+	int i;
+	int nr_ios = q->q_depth;
+
+	io_uring_unregister_ring_fd(&q->ring);
+
+	if (q->ring.ring_fd > 0) {
+		io_uring_unregister_files(&q->ring);
+		close(q->ring.ring_fd);
+		q->ring.ring_fd = -1;
+	}
+
+	if (q->io_cmd_buf)
+		munmap(q->io_cmd_buf, ublk_queue_cmd_buf_sz(q));
+
+	for (i = 0; i < nr_ios; i++)
+		free(q->ios[i].buf_addr);
+}
+
+static int ublk_queue_init(struct ublk_queue *q)
+{
+	struct ublk_dev *dev = q->dev;
+	int depth = dev->dev_info.queue_depth;
+	int i, ret = -1;
+	int cmd_buf_size, io_buf_size;
+	unsigned long off;
+	int ring_depth = depth, cq_depth = depth;
+
+	q->tgt_ops = dev->tgt.ops;
+	q->state = 0;
+	q->q_depth = depth;
+	q->cmd_inflight = 0;
+	q->tid = gettid();
+
+	cmd_buf_size = ublk_queue_cmd_buf_sz(q);
+	off = UBLKSRV_CMD_BUF_OFFSET +
+		q->q_id * (UBLK_MAX_QUEUE_DEPTH * sizeof(struct ublksrv_io_desc));
+	q->io_cmd_buf = (char *)mmap(0, cmd_buf_size, PROT_READ,
+			MAP_SHARED | MAP_POPULATE, dev->fds[0], off);
+	if (q->io_cmd_buf == MAP_FAILED) {
+		ublk_err("ublk dev %d queue %d map io_cmd_buf failed %m\n",
+				q->dev->dev_info.dev_id, q->q_id);
+		goto fail;
+	}
+
+	io_buf_size = dev->dev_info.max_io_buf_bytes;
+	for (i = 0; i < q->q_depth; i++) {
+		q->ios[i].buf_addr = NULL;
+
+		if (posix_memalign((void **)&q->ios[i].buf_addr,
+					getpagesize(), io_buf_size)) {
+			ublk_err("ublk dev %d queue %d io %d posix_memalign failed %m\n",
+					dev->dev_info.dev_id, q->q_id, i);
+			goto fail;
+		}
+		q->ios[i].flags = UBLKSRV_NEED_FETCH_RQ | UBLKSRV_IO_FREE;
+	}
+
+	ret = ublk_setup_ring(&q->ring, ring_depth, cq_depth,
+			IORING_SETUP_SQE128 | IORING_SETUP_COOP_TASKRUN);
+	if (ret < 0) {
+		ublk_err("ublk dev %d queue %d setup io_uring failed %d\n",
+				q->dev->dev_info.dev_id, q->q_id, ret);
+		goto fail;
+	}
+
+	io_uring_register_ring_fd(&q->ring);
+
+	ret = io_uring_register_files(&q->ring, dev->fds, dev->nr_fds);
+	if (ret) {
+		ublk_err("ublk dev %d queue %d register files failed %d\n",
+				q->dev->dev_info.dev_id, q->q_id, ret);
+		goto fail;
+	}
+
+	return 0;
+ fail:
+	ublk_queue_deinit(q);
+	ublk_err("ublk dev %d queue %d failed\n",
+			dev->dev_info.dev_id, q->q_id);
+	return -ENOMEM;
+}
+
+static int ublk_dev_prep(struct ublk_dev *dev)
+{
+	int dev_id = dev->dev_info.dev_id;
+	char buf[64];
+	int ret = 0;
+
+	snprintf(buf, 64, "%s%d", UBLKC_DEV, dev_id);
+	dev->fds[0] = open(buf, O_RDWR);
+	if (dev->fds[0] < 0) {
+		ret = -EBADF;
+		ublk_err("can't open %s, ret %d\n", buf, dev->fds[0]);
+		goto fail;
+	}
+
+	if (dev->tgt.ops->init_tgt)
+		ret = dev->tgt.ops->init_tgt(dev);
+
+	return ret;
+fail:
+	close(dev->fds[0]);
+	return ret;
+}
+
+static void ublk_dev_unprep(struct ublk_dev *dev)
+{
+	if (dev->tgt.ops->deinit_tgt)
+		dev->tgt.ops->deinit_tgt(dev);
+	close(dev->fds[0]);
+}
+
+static int ublk_queue_io_cmd(struct ublk_queue *q,
+		struct ublk_io *io, unsigned tag)
+{
+	struct ublksrv_io_cmd *cmd;
+	struct io_uring_sqe *sqe;
+	unsigned int cmd_op = 0;
+	__u64 user_data;
+
+	/* only freed io can be issued */
+	if (!(io->flags & UBLKSRV_IO_FREE))
+		return 0;
+
+	/* we issue because we need either fetching or committing */
+	if (!(io->flags &
+		(UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP)))
+		return 0;
+
+	if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
+		cmd_op = UBLK_IO_COMMIT_AND_FETCH_REQ;
+	else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
+		cmd_op = UBLK_IO_FETCH_REQ;
+
+	sqe = io_uring_get_sqe(&q->ring);
+	if (!sqe) {
+		ublk_err("%s: run out of sqe %d, tag %d\n",
+				__func__, q->q_id, tag);
+		return -1;
+	}
+
+	cmd = (struct ublksrv_io_cmd *)ublk_get_sqe_cmd(sqe);
+
+	if (cmd_op == UBLK_IO_COMMIT_AND_FETCH_REQ)
+		cmd->result = io->result;
+
+	/* These fields should be written once, never change */
+	ublk_set_sqe_cmd_op(sqe, cmd_op);
+	sqe->fd		= 0;	/* dev->fds[0] */
+	sqe->opcode	= IORING_OP_URING_CMD;
+	sqe->flags	= IOSQE_FIXED_FILE;
+	sqe->rw_flags	= 0;
+	cmd->tag	= tag;
+	cmd->addr	= (__u64)io->buf_addr;
+	cmd->q_id	= q->q_id;
+
+	user_data = build_user_data(tag, cmd_op, 0, 0);
+	io_uring_sqe_set_data64(sqe, user_data);
+
+	io->flags = 0;
+
+	q->cmd_inflight += 1;
+
+	ublk_dbg(UBLK_DBG_IO_CMD, "%s: (qid %d tag %u cmd_op %u) iof %x stopping %d\n",
+			__func__, q->q_id, tag, cmd_op,
+			io->flags, !!(q->state & UBLKSRV_QUEUE_STOPPING));
+	return 1;
+}
+
+static int ublk_complete_io(struct ublk_queue *q,
+		unsigned tag, int res)
+{
+	struct ublk_io *io = &q->ios[tag];
+
+	ublk_mark_io_done(io, res);
+
+	return ublk_queue_io_cmd(q, io, tag);
+}
+
+static void ublk_submit_fetch_commands(struct ublk_queue *q)
+{
+	int i = 0;
+
+	for (i = 0; i < q->q_depth; i++)
+		ublk_queue_io_cmd(q, &q->ios[i], i);
+}
+
+static int ublk_queue_is_idle(struct ublk_queue *q)
+{
+	return !io_uring_sq_ready(&q->ring) && !q->io_inflight;
+}
+
+static int ublk_queue_is_done(struct ublk_queue *q)
+{
+	return (q->state & UBLKSRV_QUEUE_STOPPING) && ublk_queue_is_idle(q);
+}
+
+static void ublk_queue_discard_io_pages(struct ublk_queue *q)
+{
+	const struct ublk_dev *dev = q->dev;
+	unsigned int io_buf_size = dev->dev_info.max_io_buf_bytes;
+	int i = 0;
+
+	for (i = 0; i < q->q_depth; i++)
+		madvise(q->ios[i].buf_addr, io_buf_size, MADV_DONTNEED);
+}
+
+static void ublk_queue_idle_enter(struct ublk_queue *q)
+{
+	if (q->state & UBLKSRV_QUEUE_IDLE)
+		return;
+
+	ublk_dbg(UBLK_DBG_QUEUE, "dev%d-q%d: enter idle %x\n",
+			q->dev->dev_info.dev_id, q->q_id, q->state);
+	ublk_queue_discard_io_pages(q);
+	q->state |= UBLKSRV_QUEUE_IDLE;
+}
+
+static void ublk_queue_idle_exit(struct ublk_queue *q)
+{
+	if (q->state & UBLKSRV_QUEUE_IDLE) {
+		ublk_dbg(UBLK_DBG_QUEUE, "dev%d-q%d: exit idle %x\n",
+			q->dev->dev_info.dev_id, q->q_id, q->state);
+		q->state &= ~UBLKSRV_QUEUE_IDLE;
+	}
+}
+
+static inline void ublksrv_handle_tgt_cqe(struct ublk_queue *q,
+		struct io_uring_cqe *cqe)
+{
+	unsigned tag = user_data_to_tag(cqe->user_data);
+
+	if (cqe->res < 0 && cqe->res != -EAGAIN)
+		ublk_err("%s: failed tgt io: res %d qid %u tag %u, cmd_op %u\n",
+			__func__, cqe->res, q->q_id,
+			user_data_to_tag(cqe->user_data),
+			user_data_to_op(cqe->user_data));
+
+	if (q->tgt_ops->tgt_io_done)
+		q->tgt_ops->tgt_io_done(q, tag, cqe);
+}
+
+static void ublk_handle_cqe(struct io_uring *r,
+		struct io_uring_cqe *cqe, void *data)
+{
+	struct ublk_queue *q = container_of(r, struct ublk_queue, ring);
+	unsigned tag = user_data_to_tag(cqe->user_data);
+	unsigned cmd_op = user_data_to_op(cqe->user_data);
+	int fetch = (cqe->res != UBLK_IO_RES_ABORT) &&
+		!(q->state & UBLKSRV_QUEUE_STOPPING);
+	struct ublk_io *io;
+
+	ublk_dbg(UBLK_DBG_IO_CMD, "%s: res %d (qid %d tag %u cmd_op %u target %d) stopping %d\n",
+			__func__, cqe->res, q->q_id, tag, cmd_op,
+			is_target_io(cqe->user_data),
+			(q->state & UBLKSRV_QUEUE_STOPPING));
+
+	/* Don't retrieve io in case of target io */
+	if (is_target_io(cqe->user_data)) {
+		ublksrv_handle_tgt_cqe(q, cqe);
+		return;
+	}
+
+	io = &q->ios[tag];
+	q->cmd_inflight--;
+
+	if (!fetch) {
+		q->state |= UBLKSRV_QUEUE_STOPPING;
+		io->flags &= ~UBLKSRV_NEED_FETCH_RQ;
+	}
+
+	if (cqe->res == UBLK_IO_RES_OK) {
+		ublk_assert(tag < q->q_depth);
+		q->tgt_ops->queue_io(q, tag);
+	} else {
+		/*
+		 * COMMIT_REQ will be completed immediately since no fetching
+		 * piggyback is required.
+		 *
+		 * Marking IO_FREE only, then this io won't be issued since
+		 * we only issue io with (UBLKSRV_IO_FREE | UBLKSRV_NEED_*)
+		 *
+		 * */
+		io->flags = UBLKSRV_IO_FREE;
+	}
+}
+
+static int ublk_reap_events_uring(struct io_uring *r)
+{
+	struct io_uring_cqe *cqe;
+	unsigned head;
+	int count = 0;
+
+	io_uring_for_each_cqe(r, head, cqe) {
+		ublk_handle_cqe(r, cqe, NULL);
+		count += 1;
+	}
+	io_uring_cq_advance(r, count);
+
+	return count;
+}
+
+static int ublk_process_io(struct ublk_queue *q)
+{
+	int ret, reapped;
+	struct __kernel_timespec ts = {
+		.tv_sec = UBLKSRV_IO_IDLE_SECS,
+		.tv_nsec = 0
+        };
+	struct __kernel_timespec *tsp = (q->state & UBLKSRV_QUEUE_IDLE) ?
+		NULL : &ts;
+	struct io_uring_cqe *cqe;
+
+	ublk_dbg(UBLK_DBG_QUEUE, "dev%d-q%d: to_submit %d inflight cmd %u stopping %d\n",
+				q->dev->dev_info.dev_id,
+				q->q_id, io_uring_sq_ready(&q->ring),
+				q->cmd_inflight,
+				(q->state & UBLKSRV_QUEUE_STOPPING));
+
+	if (ublk_queue_is_done(q))
+		return -ENODEV;
+
+	ret = io_uring_submit_and_wait_timeout(&q->ring, &cqe, 1, tsp, NULL);
+	reapped = ublk_reap_events_uring(&q->ring);
+
+	ublk_dbg(UBLK_DBG_QUEUE, "submit result %d, reapped %d stop %d idle %d\n",
+			ret, reapped, (q->state & UBLKSRV_QUEUE_STOPPING),
+			(q->state & UBLKSRV_QUEUE_IDLE));
+
+	if (!(q->state & UBLKSRV_QUEUE_STOPPING)) {
+		if (ret == -ETIME && reapped == 0 && ublk_queue_is_idle(q))
+			ublk_queue_idle_enter(q);
+		else
+			ublk_queue_idle_exit(q);
+	}
+	return reapped;
+}
+
+static void *ublk_io_handler_fn(void *data)
+{
+	struct ublk_queue *q = data;
+	int dev_id = q->dev->dev_info.dev_id;
+	int ret;
+
+	ret = ublk_queue_init(q);
+	if (ret) {
+		ublk_err("ublk dev %d queue %d init queue failed\n",
+				dev_id, q->q_id);
+		return NULL;
+	}
+
+	/* submit all io commands to ublk driver */
+	ublk_submit_fetch_commands(q);
+
+	ublk_dbg(UBLK_DBG_QUEUE, "tid %d: ublk dev %d queue %d started\n",
+			gettid(),
+			dev_id, q->q_id);
+	do {
+		if (ublk_process_io(q) < 0)
+			break;
+	} while (1);
+
+	ublk_dbg(UBLK_DBG_QUEUE, "ublk dev %d queue %d exited\n", dev_id, q->q_id);
+	ublk_queue_deinit(q);
+	return NULL;
+}
+
+static void ublk_set_parameters(struct ublk_dev *dev)
+{
+	int ret;
+
+	ret = ublk_ctrl_set_params(dev, &dev->tgt.params);
+	if (ret)
+		ublk_err("dev %d set basic parameter failed %d\n",
+				dev->dev_info.dev_id, ret);
+}
+
+static int ublk_start_daemon(struct ublk_dev *dev)
+{
+	int ret, i;
+	void *thread_ret;
+	const struct ublksrv_ctrl_dev_info *dinfo = &dev->dev_info;
+
+	daemon(1, 1);
+
+	ublk_dbg(UBLK_DBG_DEV, "%s enter\n", __func__);
+
+	ret = ublk_dev_prep(dev);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < dinfo->nr_hw_queues; i++) {
+		dev->q[i].dev = dev;
+		dev->q[i].q_id = i;
+		pthread_create(&dev->q[i].thread, NULL,
+				ublk_io_handler_fn,
+				&dev->q[i]);
+	}
+
+	ublk_set_parameters(dev);
+
+	/* everything is fine now, start us */
+	ret = ublk_ctrl_start_dev(dev, getpid());
+	if (ret < 0)
+		goto fail;
+
+	ublk_ctrl_get_info(dev);
+	ublk_ctrl_dump(dev, true);
+
+	/* wait until we are terminated */
+	for (i = 0; i < dinfo->nr_hw_queues; i++)
+		pthread_join(dev->q[i].thread, &thread_ret);
+ fail:
+	ublk_dev_unprep(dev);
+	ublk_dbg(UBLK_DBG_DEV, "%s exit\n", __func__);
+
+	return ret;
+}
+
+static int cmd_dev_add(int argc, char *argv[])
+{
+	static const struct option longopts[] = {
+		{ "type",		1,	NULL, 't' },
+		{ "number",		1,	NULL, 'n' },
+		{ "queues",		1,	NULL, 'q' },
+		{ "depth",		1,	NULL, 'd' },
+		{ "debug_mask",	1,	NULL, 0},
+		{ "quiet",	0,	NULL, 0},
+		{ NULL }
+	};
+	const struct ublk_tgt_ops *ops;
+	struct ublksrv_ctrl_dev_info *info;
+	struct ublk_dev *dev;
+	int ret, option_idx, opt;
+	const char *tgt_type = NULL;
+	int dev_id = -1;
+	unsigned nr_queues = 2, depth = UBLK_QUEUE_DEPTH;
+
+	while ((opt = getopt_long(argc, argv, "-:t:n:d:q:",
+				  longopts, &option_idx)) != -1) {
+		switch (opt) {
+		case 'n':
+			dev_id = strtol(optarg, NULL, 10);
+			break;
+		case 't':
+			tgt_type = optarg;
+			break;
+		case 'q':
+			nr_queues = strtol(optarg, NULL, 10);
+			break;
+		case 'd':
+			depth = strtol(optarg, NULL, 10);
+			break;
+		case 0:
+			if (!strcmp(longopts[option_idx].name, "debug_mask"))
+				ublk_dbg_mask = strtol(optarg, NULL, 16);
+			if (!strcmp(longopts[option_idx].name, "quiet"))
+				ublk_dbg_mask = 0;
+			break;
+		}
+	}
+
+	optind = 0;
+
+	ops = ublk_find_tgt(tgt_type);
+	if (!ops) {
+		ublk_err("%s: no such tgt type, type %s\n",
+				__func__, tgt_type);
+		return -ENODEV;
+	}
+
+	if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) {
+		ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n",
+				__func__, nr_queues, depth);
+		return -EINVAL;
+	}
+
+	dev = ublk_ctrl_init();
+	if (!dev) {
+		ublk_err("%s: can't alloc dev id %d, type %s\n",
+				__func__, dev_id, tgt_type);
+		return -ENOMEM;
+	}
+
+	info = &dev->dev_info;
+	info->dev_id = dev_id;
+        info->nr_hw_queues = nr_queues;
+        info->queue_depth = depth;
+	dev->tgt.ops = ops;
+	dev->tgt.argc = argc;
+	dev->tgt.argv = argv;
+
+	ret = ublk_ctrl_add_dev(dev);
+	if (ret < 0) {
+		ublk_err("%s: can't add dev id %d, type %s ret %d\n",
+				__func__, dev_id, tgt_type, ret);
+		goto fail;
+	}
+
+	ret = ublk_start_daemon(dev);
+	if (ret < 0) {
+		ublk_err("%s: can't start daemon id %d, type %s\n",
+				__func__, dev_id, tgt_type);
+		goto fail_del;
+	}
+fail_del:
+	ublk_ctrl_del_dev(dev);
+fail:
+	ublk_ctrl_deinit(dev);
+	return ret;
+}
+
+static int ublk_stop_io_daemon(const struct ublk_dev *dev)
+{
+	int daemon_pid = dev->dev_info.ublksrv_pid;
+	int cnt = 0, ret;
+
+	if (daemon_pid == -1)
+		return 0;
+
+	/* wait until daemon is exited, or timeout after 3 seconds */
+	do {
+		ret = kill(daemon_pid, 0);
+		if (ret)
+			break;
+		usleep(500000);
+		cnt++;
+	} while (!ret && cnt < 6);
+
+	ublk_dbg(UBLK_DBG_DEV, "%s: pid %d ret %d\n", __func__, daemon_pid, ret);
+
+	return ret != 0 ? 0 : -1;
+}
+
+static int __cmd_dev_del(int number, bool log)
+{
+	struct ublk_dev *dev;
+	int ret;
+
+	dev = ublk_ctrl_init();
+	dev->dev_info.dev_id = number;
+
+	ret = ublk_ctrl_get_info(dev);
+	if (ret < 0) {
+		goto fail;
+	}
+
+	ret = ublk_ctrl_stop_dev(dev);
+	if (ret < 0) {
+		if (log)
+			ublk_err("stop dev %d failed\n", number);
+		goto fail;
+	}
+
+	ret = ublk_stop_io_daemon(dev);
+	if (ret < 0) {
+		if (log)
+			ublk_err("stop daemon %d failed\n", number);
+	}
+
+	ublk_ctrl_del_dev(dev);
+fail:
+	ublk_ctrl_deinit(dev);
+	return ret;
+}
+
+static int cmd_dev_del(int argc, char *argv[])
+{
+	static const struct option longopts[] = {
+		{ "number",		1,	NULL, 'n' },
+		{ "all",		0,	NULL, 'a' },
+		{ "debug_mask",	1,	NULL, 0},
+		{ NULL }
+	};
+	int number = -2;
+	int opt, i, option_idx;
+
+	while ((opt = getopt_long(argc, argv, "n:a",
+				  longopts, &option_idx)) != -1) {
+		switch (opt) {
+		case 'a':
+			number = -1;
+			break;
+
+		case 'n':
+			number = strtol(optarg, NULL, 10);
+			break;
+		case 0:
+			if (!strcmp(longopts[option_idx].name, "debug_mask"))
+				ublk_dbg_mask = strtol(optarg, NULL, 16);
+			break;
+		}
+	}
+
+	if (number >= 0)
+		return __cmd_dev_del(number, true);
+	else if (number != -1) {
+		ublk_err("%s: pass wrong devid or not delete via -a\n");
+		return -EINVAL;
+	}
+
+	for (i = 0; i < 255; i++)
+		__cmd_dev_del(i, false);
+
+	return 0;
+}
+
+static int __cmd_dev_list(int number, bool log)
+{
+	struct ublk_dev *dev = ublk_ctrl_init();
+	int ret;
+
+	dev->dev_info.dev_id = number;
+
+	ret = ublk_ctrl_get_info(dev);
+	if (ret < 0) {
+		if (log)
+			ublk_err("%s: can't get dev info from %d: %d\n",
+					__func__, number, ret);
+	} else {
+		ublk_ctrl_dump(dev, false);
+	}
+
+	ublk_ctrl_deinit(dev);
+
+	return ret;
+}
+
+
+static int cmd_dev_list(int argc, char *argv[])
+{
+	static const struct option longopts[] = {
+		{ "number",		1,	NULL, 'n' },
+		{ "all",		0,	NULL, 'a' },
+		{ NULL }
+	};
+	int number = -1;
+	int opt, i;
+
+	while ((opt = getopt_long(argc, argv, "n:a",
+				  longopts, NULL)) != -1) {
+		switch (opt) {
+		case 'a':
+			break;
+
+		case 'n':
+			number = strtol(optarg, NULL, 10);
+			break;
+		}
+	}
+
+	if (number >= 0)
+		return __cmd_dev_list(number, true);
+
+	for (i = 0; i < 255; i++)
+		__cmd_dev_list(i, false);
+
+	return 0;
+}
+
+static int cmd_dev_help(int argc, char *argv[])
+{
+	printf("%s add -t {null|loop} [-q nr_queues] [-d depth] [-n dev_id] \n",
+			argv[0]);
+	printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n");
+	printf("\t -t loop -f backing_file \n");
+	printf("\t -t null\n");
+	printf("%s del [-n dev_id] -a \n", argv[0]);
+	printf("\t -a delete all devices -n delete specified device\n");
+	printf("%s list [-n dev_id] -a \n", argv[0]);
+	printf("\t -a list all devices, -n list specified device, default -a \n");
+
+	return 0;
+}
+
+static int ublk_null_tgt_init(struct ublk_dev *dev)
+{
+	const struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+	unsigned long dev_size = 250UL << 30;
+
+	dev->tgt.dev_size = dev_size;
+	dev->tgt.params = (struct ublk_params) {
+		.types = UBLK_PARAM_TYPE_BASIC,
+		.basic = {
+			.logical_bs_shift	= 9,
+			.physical_bs_shift	= 12,
+			.io_opt_shift		= 12,
+			.io_min_shift		= 9,
+			.max_sectors		= info->max_io_buf_bytes >> 9,
+			.dev_sectors		= dev_size >> 9,
+		},
+	};
+
+	return 0;
+}
+
+static int ublk_null_queue_io(struct ublk_queue *q, int tag)
+{
+	const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+
+	ublk_complete_io(q, tag, iod->nr_sectors << 9);
+
+	return 0;
+}
+
+static int loop_queue_tgt_io(struct ublk_queue *q, int tag)
+{
+	const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+	struct io_uring_sqe *sqe = io_uring_get_sqe(&q->ring);
+	unsigned ublk_op = ublksrv_get_op(iod);
+
+	if (!sqe)
+		return -ENOMEM;
+
+	switch (ublk_op) {
+	case UBLK_IO_OP_FLUSH:
+		io_uring_prep_sync_file_range(sqe, 1 /*fds[1]*/,
+				iod->nr_sectors << 9,
+				iod->start_sector << 9,
+				IORING_FSYNC_DATASYNC);
+		io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
+		break;
+	case UBLK_IO_OP_WRITE_ZEROES:
+	case UBLK_IO_OP_DISCARD:
+		return -ENOTSUP;
+	case UBLK_IO_OP_READ:
+		io_uring_prep_read(sqe, 1 /*fds[1]*/,
+				(void *)iod->addr,
+				iod->nr_sectors << 9,
+				iod->start_sector << 9);
+		io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
+		break;
+	case UBLK_IO_OP_WRITE:
+		io_uring_prep_write(sqe, 1 /*fds[1]*/,
+				(void *)iod->addr,
+				iod->nr_sectors << 9,
+				iod->start_sector << 9);
+		io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	q->io_inflight++;
+	/* bit63 marks us as tgt io */
+	sqe->user_data = build_user_data(tag, ublk_op, 0, 1);
+
+	ublk_dbg(UBLK_DBG_IO, "%s: tag %d ublk io %x %llx %u\n", __func__, tag,
+			iod->op_flags, iod->start_sector, iod->nr_sectors << 9);
+	return 1;
+}
+
+static int ublk_loop_queue_io(struct ublk_queue *q, int tag)
+{
+	int queued = loop_queue_tgt_io(q, tag);
+
+	if (queued < 0)
+		ublk_complete_io(q, tag, queued);
+
+	return 0;
+}
+
+static void ublk_loop_io_done(struct ublk_queue *q, int tag,
+		const struct io_uring_cqe *cqe)
+{
+	int cqe_tag = user_data_to_tag(cqe->user_data);
+
+	ublk_assert(tag == cqe_tag);
+	ublk_complete_io(q, tag, cqe->res);
+	q->io_inflight--;
+}
+
+static void ublk_loop_tgt_deinit(struct ublk_dev *dev)
+{
+	fsync(dev->fds[1]);
+	close(dev->fds[1]);
+}
+
+static int ublk_loop_tgt_init(struct ublk_dev *dev)
+{
+	static const struct option lo_longopts[] = {
+		{ "file",		1,	NULL, 'f' },
+		{ NULL }
+	};
+	unsigned long long bytes;
+	char **argv = dev->tgt.argv;
+	int argc = dev->tgt.argc;
+	char *file = NULL;
+	struct stat st;
+	int fd, opt;
+	struct ublk_params p = {
+		.types = UBLK_PARAM_TYPE_BASIC,
+		.basic = {
+			.logical_bs_shift	= 9,
+			.physical_bs_shift	= 12,
+			.io_opt_shift	= 12,
+			.io_min_shift	= 9,
+			.max_sectors = dev->dev_info.max_io_buf_bytes >> 9,
+		},
+	};
+
+	while ((opt = getopt_long(argc, argv, "-:f:",
+				  lo_longopts, NULL)) != -1) {
+		switch (opt) {
+		case 'f':
+			file = strdup(optarg);
+			break;
+		}
+	}
+
+	ublk_dbg(UBLK_DBG_DEV, "%s: file %s\n", __func__, file);
+
+	if (!file)
+		return -EINVAL;
+
+	fd = open(file, O_RDWR);
+	if (fd < 0) {
+		ublk_err( "%s: backing file %s can't be opened\n",
+				__func__, file);
+		return -EBADF;
+	}
+
+	if (fstat(fd, &st) < 0) {
+		close(fd);
+		return -EBADF;
+	}
+
+	if (S_ISBLK(st.st_mode)) {
+		unsigned int bs, pbs;
+
+		if (ioctl(fd, BLKGETSIZE64, &bytes) != 0)
+			return -EBADF;
+		if (ioctl(fd, BLKSSZGET, &bs) != 0)
+			return -1;
+		if (ioctl(fd, BLKPBSZGET, &pbs) != 0)
+			return -1;
+		p.basic.logical_bs_shift = ilog2(bs);
+		p.basic.physical_bs_shift = ilog2(pbs);
+	} else if (S_ISREG(st.st_mode)) {
+		bytes = st.st_size;
+	} else {
+		bytes = 0;
+	}
+
+	if (fcntl(fd, F_SETFL, O_DIRECT)) {
+		p.basic.logical_bs_shift = 9;
+		p.basic.physical_bs_shift = 12;
+		ublk_log("%s: ublk-loop fallback to buffered IO\n", __func__);
+	}
+
+	dev->tgt.dev_size = bytes;
+	p.basic.dev_sectors = bytes >> 9;
+	dev->fds[1] = fd;
+	dev->nr_fds += 1;
+	dev->tgt.params = p;
+
+	return 0;
+}
+
+const struct ublk_tgt_ops tgt_ops_list[] = {
+	{
+		.name = "null",
+		.init_tgt = ublk_null_tgt_init,
+		.queue_io = ublk_null_queue_io,
+	},
+
+	{
+		.name = "loop",
+		.init_tgt = ublk_loop_tgt_init,
+		.deinit_tgt = ublk_loop_tgt_deinit,
+		.queue_io = ublk_loop_queue_io,
+		.tgt_io_done = ublk_loop_io_done,
+	},
+};
+
+static const struct ublk_tgt_ops *ublk_find_tgt(const char *name)
+{
+	const struct ublk_tgt_ops *ops;
+	int i;
+
+	if (name == NULL)
+		return NULL;
+
+	for (i = 0; sizeof(tgt_ops_list) / sizeof(*ops); i++)
+		if (strcmp(tgt_ops_list[i].name, name) == 0)
+			return &tgt_ops_list[i];
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	const char *cmd = argv[1];
+	int ret = -EINVAL;
+
+	if (argc == 1)
+		goto out;
+
+	if (!strcmp(cmd, "add"))
+		ret = cmd_dev_add(argc, argv);
+	else if (!strcmp(cmd, "del"))
+		ret = cmd_dev_del(argc, argv);
+	else if (!strcmp(cmd, "list"))
+		ret = cmd_dev_list(argc, argv);
+	else if (!strcmp(cmd, "help"))
+		ret = cmd_dev_help(argc, argv);
+out:
+	if (ret)
+		cmd_dev_help(argc, argv);
+
+	return ret;
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH blktests v4 2/2] block/033: add test to cover gendisk leak
  2023-02-24 12:45 [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Ming Lei
  2023-02-24 12:45 ` [PATCH blktests v4 1/2] src: add mini ublk source code Ming Lei
@ 2023-02-24 12:45 ` Ming Lei
  2023-03-01  8:27 ` [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Shinichiro Kawasaki
  2023-03-02  5:00 ` Chaitanya Kulkarni
  3 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2023-02-24 12:45 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki
  Cc: linux-block, Chaitanya Kulkarni, Ziyang Zhang, Ming Lei

So far only sync ublk removal is supported, and the device's
last reference is dropped in gendisk's ->free_disk(), so it
can be used to test gendisk leak issue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 common/ublk         | 35 +++++++++++++++++++++++++++++++++++
 tests/block/033     | 41 +++++++++++++++++++++++++++++++++++++++++
 tests/block/033.out |  2 ++
 3 files changed, 78 insertions(+)
 create mode 100644 common/ublk
 create mode 100755 tests/block/033
 create mode 100644 tests/block/033.out

diff --git a/common/ublk b/common/ublk
new file mode 100644
index 0000000..932c534
--- /dev/null
+++ b/common/ublk
@@ -0,0 +1,35 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2023 Ming Lei
+#
+# ublk_drv helper functions.
+
+. common/shellcheck
+
+_have_ublk() {
+	_have_driver ublk_drv
+	_have_src_program miniublk
+}
+
+_remove_ublk_devices() {
+	src/miniublk del -a
+}
+
+_init_ublk() {
+	_remove_ublk_devices
+
+	modprobe -rq ublk_drv
+	if ! modprobe ublk_drv; then
+		SKIP_REASONS+=("requires ublk_drv")
+		return 1
+	fi
+
+	udevadm settle
+	return 0
+}
+
+_exit_ublk() {
+	_remove_ublk_devices
+	udevadm settle
+	modprobe -r -q ublk_drv
+}
diff --git a/tests/block/033 b/tests/block/033
new file mode 100755
index 0000000..762f606
--- /dev/null
+++ b/tests/block/033
@@ -0,0 +1,41 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2023 Ming Lei
+#
+# Test if gendisk is leaked, and regression in the following commit
+# c43332fe028c ("blk-cgroup: delay calling blkcg_exit_disk until disk_release")
+# can be covered
+
+. tests/block/rc
+. common/ublk
+
+DESCRIPTION="add & delete ublk device and test if gendisk is leaked"
+QUICK=1
+
+requires() {
+	_have_ublk
+}
+
+test() {
+	local ublk_prog="src/miniublk"
+
+	echo "Running ${TEST_NAME}"
+
+	if ! _init_ublk; then
+		return 1
+	fi
+
+	${ublk_prog} add -t null -n 0 > "$FULL"
+	udevadm settle
+	if ! ${ublk_prog} list -n 0 >> "$FULL"; then
+		echo "fail to list dev"
+	fi
+	if ! dd if=/dev/ublkb0 iflag=direct of=/dev/null bs=1M count=512 >> "$FULL" 2>&1; then
+		echo "fail io"
+	fi
+	${ublk_prog} del -n 0 >> "$FULL"
+
+	_exit_ublk
+
+	echo "Test complete"
+}
diff --git a/tests/block/033.out b/tests/block/033.out
new file mode 100644
index 0000000..067846a
--- /dev/null
+++ b/tests/block/033.out
@@ -0,0 +1,2 @@
+Running block/033
+Test complete
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033
  2023-02-24 12:45 [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Ming Lei
  2023-02-24 12:45 ` [PATCH blktests v4 1/2] src: add mini ublk source code Ming Lei
  2023-02-24 12:45 ` [PATCH blktests v4 2/2] block/033: add test to cover gendisk leak Ming Lei
@ 2023-03-01  8:27 ` Shinichiro Kawasaki
  2023-03-02  5:00 ` Chaitanya Kulkarni
  3 siblings, 0 replies; 7+ messages in thread
From: Shinichiro Kawasaki @ 2023-03-01  8:27 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Chaitanya Kulkarni, Ziyang Zhang

On Feb 24, 2023 / 20:45, Ming Lei wrote:
> Hello,
> 
> The 1st patch adds one mini ublk program, which only supports null &
> loop targets.
> 
> The 2nd patch add blktests/033 for covering gendisk leak issue.
> 
> v4:
> 	- fix one error handling bug in src/miniublk.c
> 	- patch style improvement on src/miniublk.c
> 	- both are suggested by Chaitanya Kulkarni

The changes for v4 looks good to me. I also did a trial run together with the
follow-up patch [*] and observed good result.

[*] https://lore.kernel.org/linux-block/20230301080301.2410060-1-shinichiro.kawasaki@wdc.com/

Chaitanya, will you review v4 again?

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033
  2023-02-24 12:45 [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Ming Lei
                   ` (2 preceding siblings ...)
  2023-03-01  8:27 ` [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Shinichiro Kawasaki
@ 2023-03-02  5:00 ` Chaitanya Kulkarni
  2023-03-02  7:44   ` Shinichiro Kawasaki
  3 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2023-03-02  5:00 UTC (permalink / raw)
  To: Ming Lei, Shin'ichiro Kawasaki; +Cc: linux-block, Ziyang Zhang

On 2/24/2023 4:45 AM, Ming Lei wrote:
> Hello,
> 
> The 1st patch adds one mini ublk program, which only supports null &
> loop targets.
> 
> The 2nd patch add blktests/033 for covering gendisk leak issue.
>

Looks good to me, feel free to merge it.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033
  2023-03-02  5:00 ` Chaitanya Kulkarni
@ 2023-03-02  7:44   ` Shinichiro Kawasaki
  2023-03-02  7:48     ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Shinichiro Kawasaki @ 2023-03-02  7:44 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: Ming Lei, linux-block, Ziyang Zhang

On Mar 02, 2023 / 05:00, Chaitanya Kulkarni wrote:
> On 2/24/2023 4:45 AM, Ming Lei wrote:
> > Hello,
> > 
> > The 1st patch adds one mini ublk program, which only supports null &
> > loop targets.
> > 
> > The 2nd patch add blktests/033 for covering gendisk leak issue.
> >
> 
> Looks good to me, feel free to merge it.
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

Thanks Chaitanya.

Ming, I've merged the patches. Thanks!

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033
  2023-03-02  7:44   ` Shinichiro Kawasaki
@ 2023-03-02  7:48     ` Ming Lei
  0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2023-03-02  7:48 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Chaitanya Kulkarni, linux-block, Ziyang Zhang

On Thu, Mar 02, 2023 at 07:44:46AM +0000, Shinichiro Kawasaki wrote:
> On Mar 02, 2023 / 05:00, Chaitanya Kulkarni wrote:
> > On 2/24/2023 4:45 AM, Ming Lei wrote:
> > > Hello,
> > > 
> > > The 1st patch adds one mini ublk program, which only supports null &
> > > loop targets.
> > > 
> > > The 2nd patch add blktests/033 for covering gendisk leak issue.
> > >
> > 
> > Looks good to me, feel free to merge it.
> > 
> > Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> 
> Thanks Chaitanya.
> 
> Ming, I've merged the patches. Thanks!

Great! Thanks all of you!

Thanks,
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-03-02  7:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-24 12:45 [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Ming Lei
2023-02-24 12:45 ` [PATCH blktests v4 1/2] src: add mini ublk source code Ming Lei
2023-02-24 12:45 ` [PATCH blktests v4 2/2] block/033: add test to cover gendisk leak Ming Lei
2023-03-01  8:27 ` [PATCH blktests v4 0/2] blktests: add mini ublk source and blktests/033 Shinichiro Kawasaki
2023-03-02  5:00 ` Chaitanya Kulkarni
2023-03-02  7:44   ` Shinichiro Kawasaki
2023-03-02  7:48     ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.