* [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
From: Ming Lei @ 2014-06-26  9:41 UTC
  To: Jens Axboe, linux-kernel
  Cc: Rusty Russell, linux-api, virtualization, Michael S. Tsirkin,
	Stefan Hajnoczi, Paolo Bonzini

Hi,

These patches add support for multiple virtual queues (multi-vq) in one
virtio-blk device, and map each virtual queue (vq) to a blk-mq
hardware queue.

With this approach, both the scalability and the performance of a
virtio-blk device can be improved.

To verify the improvement, I implemented virtio-blk multi-vq on top of
qemu's dataplane feature; both handling host notifications from each vq
and processing host I/O are still kept in the per-device iothread
context. The change is based on the qemu v2.0.0 release and can be
accessed from the tree below:

	git://kernel.ubuntu.com/ming/qemu.git #v2.0.0-virtblk-mq.1

To enable the multi-vq feature, 'num_queues=N' needs to be added to the
'-device virtio-blk-pci ...' option of the qemu command line, and it is
suggested to pass 'vectors=N+1' to keep one MSI irq vector per vq; the
feature depends on x-data-plane.
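
For example, a two-queue device might be started like this (an
illustration only: the disk path and the -smp/-m sizing are
placeholders, and the num_queues property comes from the patched qemu
tree above, not stock v2.0.0):

	qemu-system-x86_64 -enable-kvm -smp 4 -m 2048 \
		-drive if=none,id=drive0,file=/path/to/disk.img,cache=none,aio=native \
		-device virtio-blk-pci,drive=drive0,x-data-plane=on,num_queues=2,vectors=3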

Fio (libaio, randread, iodepth=64, bs=4K, jobs=N) is run inside the VM
to verify the improvement.
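
A matching fio invocation might look like the following (the target
device name and the runtime are assumptions):

	fio --name=randread --ioengine=libaio --direct=1 --rw=randread \
		--bs=4K --iodepth=64 --numjobs=N --runtime=60 --time_based \
		--filename=/dev/vdb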

I created a small quad-core VM and ran fio inside it, with num_queues
of the virtio-blk device set to 2, and the improvement is still
obvious. The host is a 2-socket, 8-core (16-thread) server.

1) scalability
- jobs = 2, throughput: +33%
- jobs = 4, throughput: +100%

2) peak throughput: +39%

So in my test, even for a quad-core VM, when the number of virtqueues
is increased from 1 to 2, both scalability and performance improve
a lot.

In the above qemu implementation of the virtio-blk-mq device, a single
iothread handles requests from all vqs, and the throughput above is
already very close to that of the same fio test run on the host side
with a single job. So further improvement should be observed once more
iothreads are used to handle requests from multiple vqs.

TODO:
	- adjust vq's irq smp_affinity according to blk-mq hw queue's cpumask
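
For reference, this TODO item might look roughly like the sketch
below. Note that virtio has no generic API to look up the irq behind a
vq, so vq_to_irq() here is a hypothetical helper, not an existing
function:

	int i;

	for (i = 0; i < vblk->num_vqs; i++) {
		/* hypothetical helper returning the (MSI) irq of this vq */
		int irq = vq_to_irq(vblk->vqs[i].vq);
		struct blk_mq_hw_ctx *hctx = vblk->disk->queue->queue_hw_ctx[i];

		/* steer vq completions to the CPUs of blk-mq hw queue i */
		irq_set_affinity_hint(irq, hctx->cpumask);
	}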

V3:
	- fix use-after-free on vq->name reported by Michael

V2: (suggestions from Michael and Dave Chinner)
	- allocate virtqueues' pointers dynamically
	- make sure the per-queue spinlock isn't kept in same cache line
	- make each queue's name different

V1:
	- remove RFC since no one objects
	- add '__u8 unused' for pending as suggested by Rusty
	- use virtio_cread_feature() directly, suggested by Rusty

Thanks,
--
Ming Lei


* [PATCH v3 1/2] include/uapi/linux/virtio_blk.h: introduce feature of VIRTIO_BLK_F_MQ
From: Ming Lei @ 2014-06-26  9:41 UTC
  To: Jens Axboe, linux-kernel
  Cc: Rusty Russell, linux-api, virtualization, Michael S. Tsirkin,
	Stefan Hajnoczi, Paolo Bonzini, Ming Lei

The current virtio-blk spec only supports one virtual queue for
transferring data between the VM and the host, and inside the VM all
operations on the virtual queue need to hold a single lock, which
causes the following problems:

	- bad scalability
	- bad throughput

This patch introduces the VIRTIO_BLK_F_MQ feature so that more than
one virtual queue can be used by a virtio-blk device; the above
problems can then be solved or eased.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/uapi/linux/virtio_blk.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 6d8e61c..9ad67b2 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -40,6 +40,7 @@
 #define VIRTIO_BLK_F_WCE	9	/* Writeback mode enabled after reset */
 #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
 #define VIRTIO_BLK_F_CONFIG_WCE	11	/* Writeback mode available in config */
+#define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
 
 #ifndef __KERNEL__
 /* Old (deprecated) name for VIRTIO_BLK_F_WCE. */
@@ -77,6 +78,10 @@ struct virtio_blk_config {
 
 	/* writeback mode (if VIRTIO_BLK_F_CONFIG_WCE) */
 	__u8 wce;
+	__u8 unused;
+
+	/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
+	__u16 num_queues;
 } __attribute__((packed));
 
 /*
-- 
1.7.9.5
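
For context, a guest driver consumes this new config field roughly as
follows (a minimal sketch mirroring the driver-side usage in patch
2/2; it falls back to a single vq when the feature is absent):

	u16 num_queues;

	/* fails when VIRTIO_BLK_F_MQ was not negotiated */
	if (virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
				 struct virtio_blk_config, num_queues,
				 &num_queues))
		num_queues = 1;	/* legacy single-queue device */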


* [PATCH v3 2/2] block: virtio-blk: support multi virt queues per virtio-blk device
From: Ming Lei @ 2014-06-26  9:41 UTC
  To: Jens Axboe, linux-kernel
  Cc: Rusty Russell, linux-api, virtualization, Michael S. Tsirkin,
	Stefan Hajnoczi, Paolo Bonzini, Ming Lei

First, this patch supports more than one virtual queue per virtio-blk
device.

Second, it maps each virtual queue to a blk-mq hardware queue.

With this approach, both scalability and performance can be improved.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/block/virtio_blk.c |  104 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 84 insertions(+), 20 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index f63d358..0a58140 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -15,17 +15,22 @@
 #include <linux/numa.h>
 
 #define PART_BITS 4
+#define VQ_NAME_LEN 16
 
 static int major;
 static DEFINE_IDA(vd_index_ida);
 
 static struct workqueue_struct *virtblk_wq;
 
+struct virtio_blk_vq {
+	struct virtqueue *vq;
+	spinlock_t lock;
+	char name[VQ_NAME_LEN];
+} ____cacheline_aligned_in_smp;
+
 struct virtio_blk
 {
 	struct virtio_device *vdev;
-	struct virtqueue *vq;
-	spinlock_t vq_lock;
 
 	/* The disk structure for the kernel. */
 	struct gendisk *disk;
@@ -47,6 +52,10 @@ struct virtio_blk
 
 	/* Ida index - used to track minor number allocations. */
 	int index;
+
+	/* num of vqs */
+	int num_vqs;
+	struct virtio_blk_vq *vqs;
 };
 
 struct virtblk_req
@@ -133,14 +142,15 @@ static void virtblk_done(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
 	bool req_done = false;
+	int qid = vq->index;
 	struct virtblk_req *vbr;
 	unsigned long flags;
 	unsigned int len;
 
-	spin_lock_irqsave(&vblk->vq_lock, flags);
+	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
 	do {
 		virtqueue_disable_cb(vq);
-		while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
+		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
 			blk_mq_complete_request(vbr->req);
 			req_done = true;
 		}
@@ -151,7 +161,7 @@ static void virtblk_done(struct virtqueue *vq)
 	/* In case queue is stopped waiting for more buffers. */
 	if (req_done)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
-	spin_unlock_irqrestore(&vblk->vq_lock, flags);
+	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 }
 
 static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
@@ -160,6 +170,7 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
 	unsigned long flags;
 	unsigned int num;
+	int qid = hctx->queue_num;
 	const bool last = (req->cmd_flags & REQ_END) != 0;
 	int err;
 	bool notify = false;
@@ -202,12 +213,12 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
 	}
 
-	spin_lock_irqsave(&vblk->vq_lock, flags);
-	err = __virtblk_add_req(vblk->vq, vbr, vbr->sg, num);
+	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
+	err = __virtblk_add_req(vblk->vqs[qid].vq, vbr, vbr->sg, num);
 	if (err) {
-		virtqueue_kick(vblk->vq);
+		virtqueue_kick(vblk->vqs[qid].vq);
 		blk_mq_stop_hw_queue(hctx);
-		spin_unlock_irqrestore(&vblk->vq_lock, flags);
+		spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 		/* Out of mem doesn't actually happen, since we fall back
 		 * to direct descriptors */
 		if (err == -ENOMEM || err == -ENOSPC)
@@ -215,12 +226,12 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 		return BLK_MQ_RQ_QUEUE_ERROR;
 	}
 
-	if (last && virtqueue_kick_prepare(vblk->vq))
+	if (last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
 		notify = true;
-	spin_unlock_irqrestore(&vblk->vq_lock, flags);
+	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 
 	if (notify)
-		virtqueue_notify(vblk->vq);
+		virtqueue_notify(vblk->vqs[qid].vq);
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 
@@ -377,12 +388,64 @@ static void virtblk_config_changed(struct virtio_device *vdev)
 static int init_vq(struct virtio_blk *vblk)
 {
 	int err = 0;
+	int i;
+	vq_callback_t **callbacks;
+	const char **names;
+	struct virtqueue **vqs;
+	unsigned short num_vqs;
+	struct virtio_device *vdev = vblk->vdev;
+
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
+				   struct virtio_blk_config, num_queues,
+				   &num_vqs);
+	if (err)
+		num_vqs = 1;
+
+	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
+	if (!vblk->vqs) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
+	if (!names)
+		goto err_names;
+
+	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
+	if (!callbacks)
+		goto err_callbacks;
+
+	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
+	if (!vqs)
+		goto err_vqs;
 
-	/* We expect one virtqueue, for output. */
-	vblk->vq = virtio_find_single_vq(vblk->vdev, virtblk_done, "requests");
-	if (IS_ERR(vblk->vq))
-		err = PTR_ERR(vblk->vq);
+	for (i = 0; i < num_vqs; i++) {
+		callbacks[i] = virtblk_done;
+		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
+		names[i] = vblk->vqs[i].name;
+	}
+
+	/* Discover virtqueues and write information to configuration.  */
+	err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
+	if (err)
+		goto err_find_vqs;
 
+	for (i = 0; i < num_vqs; i++) {
+		spin_lock_init(&vblk->vqs[i].lock);
+		vblk->vqs[i].vq = vqs[i];
+	}
+	vblk->num_vqs = num_vqs;
+
+ err_find_vqs:
+	kfree(vqs);
+ err_vqs:
+	kfree(callbacks);
+ err_callbacks:
+	kfree(names);
+ err_names:
+	if (err)
+		kfree(vblk->vqs);
+ out:
 	return err;
 }
 
@@ -551,7 +614,6 @@ static int virtblk_probe(struct virtio_device *vdev)
 	err = init_vq(vblk);
 	if (err)
 		goto out_free_vblk;
-	spin_lock_init(&vblk->vq_lock);
 
 	/* FIXME: How many partitions?  How long is a piece of string? */
 	vblk->disk = alloc_disk(1 << PART_BITS);
@@ -562,7 +624,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 
 	/* Default queue sizing is to fill the ring. */
 	if (!virtblk_queue_depth) {
-		virtblk_queue_depth = vblk->vq->num_free;
+		virtblk_queue_depth = vblk->vqs[0].vq->num_free;
 		/* ... but without indirect descs, we use 2 descs per req */
 		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
 			virtblk_queue_depth /= 2;
@@ -570,7 +632,6 @@ static int virtblk_probe(struct virtio_device *vdev)
 
 	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
 	vblk->tag_set.ops = &virtio_mq_ops;
-	vblk->tag_set.nr_hw_queues = 1;
 	vblk->tag_set.queue_depth = virtblk_queue_depth;
 	vblk->tag_set.numa_node = NUMA_NO_NODE;
 	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
@@ -578,6 +639,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 		sizeof(struct virtblk_req) +
 		sizeof(struct scatterlist) * sg_elems;
 	vblk->tag_set.driver_data = vblk;
+	vblk->tag_set.nr_hw_queues = vblk->num_vqs;
 
 	err = blk_mq_alloc_tag_set(&vblk->tag_set);
 	if (err)
@@ -727,6 +789,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 	refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
 	put_disk(vblk->disk);
 	vdev->config->del_vqs(vdev);
+	kfree(vblk->vqs);
 	kfree(vblk);
 
 	/* Only free device id if we don't have any users */
@@ -777,7 +840,8 @@ static const struct virtio_device_id id_table[] = {
 static unsigned int features[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
-	VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE
+	VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
+	VIRTIO_BLK_F_MQ,
 };
 
 static struct virtio_driver virtio_blk = {
-- 
1.7.9.5


* Re: [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
From: Ming Lei @ 2014-06-26 12:04 UTC
  To: Jens Axboe, Linux Kernel Mailing List
  Cc: Rusty Russell, linux-api, Linux Virtualization,
	Michael S. Tsirkin, Stefan Hajnoczi, Paolo Bonzini

On Thu, Jun 26, 2014 at 5:41 PM, Ming Lei <ming.lei@canonical.com> wrote:
> Hi,
>
> These patches add support for multiple virtual queues (multi-vq) in one
> virtio-blk device, and map each virtual queue (vq) to a blk-mq
> hardware queue.
>
> With this approach, both the scalability and the performance of a
> virtio-blk device can be improved.
>
> To verify the improvement, I implemented virtio-blk multi-vq on top of
> qemu's dataplane feature; both handling host notifications from each vq
> and processing host I/O are still kept in the per-device iothread
> context. The change is based on the qemu v2.0.0 release and can be
> accessed from the tree below:
>
>         git://kernel.ubuntu.com/ming/qemu.git #v2.0.0-virtblk-mq.1
>
> To enable the multi-vq feature, 'num_queues=N' needs to be added to the
> '-device virtio-blk-pci ...' option of the qemu command line, and it is
> suggested to pass 'vectors=N+1' to keep one MSI irq vector per vq; the
> feature depends on x-data-plane.
>
> Fio (libaio, randread, iodepth=64, bs=4K, jobs=N) is run inside the VM
> to verify the improvement.
>
> I created a small quad-core VM and ran fio inside it, with num_queues
> of the virtio-blk device set to 2, and the improvement is still
> obvious. The host is a 2-socket, 8-core (16-thread) server.
>
> 1) scalability
> - jobs = 2, throughput: +33%
> - jobs = 4, throughput: +100%
>
> 2) peak throughput: +39%
>
> So in my test, even for a quad-core VM, when the number of virtqueues
> is increased from 1 to 2, both scalability and performance improve
> a lot.
>
> In the above qemu implementation of the virtio-blk-mq device, a single
> iothread handles requests from all vqs, and the throughput above is
> already very close to that of the same fio test run on the host side
> with a single job. So further improvement should be observed once more
> iothreads are used to handle requests from multiple vqs.
>
> TODO:
>         - adjust vq's irq smp_affinity according to blk-mq hw queue's cpumask
>
> V3:
>         - fix use-after-free on vq->name reported by Michael
>
> V2: (suggestions from Michael and Dave Chinner)
>         - allocate virtqueues' pointers dynamically
>         - make sure the per-queue spinlock isn't kept in same cache line
>         - make each queue's name different
>
> V1:
>         - remove RFC since no one objects
>         - add '__u8 unused' for pending as suggested by Rusty
>         - use virtio_cread_feature() directly, suggested by Rusty

Sorry, please add Jens' reviewed-by.

    Reviewed-by: Jens Axboe <axboe@kernel.dk>

Thanks,
--
Ming Lei


* Re: [PATCH v3 1/2] include/uapi/linux/virtio_blk.h: introduce feature of VIRTIO_BLK_F_MQ
From: Michael S. Tsirkin @ 2014-06-29 16:32 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, Rusty Russell, linux-api,
	virtualization, Stefan Hajnoczi, Paolo Bonzini

On Thu, Jun 26, 2014 at 05:41:47PM +0800, Ming Lei wrote:
> The current virtio-blk spec only supports one virtual queue for
> transferring data between the VM and the host, and inside the VM all
> operations on the virtual queue need to hold a single lock, which
> causes the following problems:
> 
> 	- bad scalability
> 	- bad throughput
> 
> This patch introduces the VIRTIO_BLK_F_MQ feature so that more than
> one virtual queue can be used by a virtio-blk device; the above
> problems can then be solved or eased.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  include/uapi/linux/virtio_blk.h |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
> index 6d8e61c..9ad67b2 100644
> --- a/include/uapi/linux/virtio_blk.h
> +++ b/include/uapi/linux/virtio_blk.h
> @@ -40,6 +40,7 @@
>  #define VIRTIO_BLK_F_WCE	9	/* Writeback mode enabled after reset */
>  #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
>  #define VIRTIO_BLK_F_CONFIG_WCE	11	/* Writeback mode available in config */
> +#define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
>  
>  #ifndef __KERNEL__
>  /* Old (deprecated) name for VIRTIO_BLK_F_WCE. */
> @@ -77,6 +78,10 @@ struct virtio_blk_config {
>  
>  	/* writeback mode (if VIRTIO_BLK_F_CONFIG_WCE) */
>  	__u8 wce;
> +	__u8 unused;
> +
> +	/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
> +	__u16 num_queues;
>  } __attribute__((packed));
>  
>  /*
> -- 
> 1.7.9.5


* Re: [PATCH v3 2/2] block: virtio-blk: support multi virt queues per virtio-blk device
From: Michael S. Tsirkin @ 2014-06-29 16:32 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, Rusty Russell, linux-api,
	virtualization, Stefan Hajnoczi, Paolo Bonzini

On Thu, Jun 26, 2014 at 05:41:48PM +0800, Ming Lei wrote:
> First, this patch supports more than one virtual queue per virtio-blk
> device.
> 
> Second, it maps each virtual queue to a blk-mq hardware queue.
> 
> With this approach, both scalability and performance can be improved.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/block/virtio_blk.c |  104 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 84 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index f63d358..0a58140 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -15,17 +15,22 @@
>  #include <linux/numa.h>
>  
>  #define PART_BITS 4
> +#define VQ_NAME_LEN 16
>  
>  static int major;
>  static DEFINE_IDA(vd_index_ida);
>  
>  static struct workqueue_struct *virtblk_wq;
>  
> +struct virtio_blk_vq {
> +	struct virtqueue *vq;
> +	spinlock_t lock;
> +	char name[VQ_NAME_LEN];
> +} ____cacheline_aligned_in_smp;
> +
>  struct virtio_blk
>  {
>  	struct virtio_device *vdev;
> -	struct virtqueue *vq;
> -	spinlock_t vq_lock;
>  
>  	/* The disk structure for the kernel. */
>  	struct gendisk *disk;
> @@ -47,6 +52,10 @@ struct virtio_blk
>  
>  	/* Ida index - used to track minor number allocations. */
>  	int index;
> +
> +	/* num of vqs */
> +	int num_vqs;
> +	struct virtio_blk_vq *vqs;
>  };
>  
>  struct virtblk_req
> @@ -133,14 +142,15 @@ static void virtblk_done(struct virtqueue *vq)
>  {
>  	struct virtio_blk *vblk = vq->vdev->priv;
>  	bool req_done = false;
> +	int qid = vq->index;
>  	struct virtblk_req *vbr;
>  	unsigned long flags;
>  	unsigned int len;
>  
> -	spin_lock_irqsave(&vblk->vq_lock, flags);
> +	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
>  	do {
>  		virtqueue_disable_cb(vq);
> -		while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
> +		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
>  			blk_mq_complete_request(vbr->req);
>  			req_done = true;
>  		}
> @@ -151,7 +161,7 @@ static void virtblk_done(struct virtqueue *vq)
>  	/* In case queue is stopped waiting for more buffers. */
>  	if (req_done)
>  		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
> -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> +	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>  }
>  
>  static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> @@ -160,6 +170,7 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>  	unsigned long flags;
>  	unsigned int num;
> +	int qid = hctx->queue_num;
>  	const bool last = (req->cmd_flags & REQ_END) != 0;
>  	int err;
>  	bool notify = false;
> @@ -202,12 +213,12 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
>  	}
>  
> -	spin_lock_irqsave(&vblk->vq_lock, flags);
> -	err = __virtblk_add_req(vblk->vq, vbr, vbr->sg, num);
> +	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
> +	err = __virtblk_add_req(vblk->vqs[qid].vq, vbr, vbr->sg, num);
>  	if (err) {
> -		virtqueue_kick(vblk->vq);
> +		virtqueue_kick(vblk->vqs[qid].vq);
>  		blk_mq_stop_hw_queue(hctx);
> -		spin_unlock_irqrestore(&vblk->vq_lock, flags);
> +		spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>  		/* Out of mem doesn't actually happen, since we fall back
>  		 * to direct descriptors */
>  		if (err == -ENOMEM || err == -ENOSPC)
> @@ -215,12 +226,12 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  		return BLK_MQ_RQ_QUEUE_ERROR;
>  	}
>  
> -	if (last && virtqueue_kick_prepare(vblk->vq))
> +	if (last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
>  		notify = true;
> -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> +	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>  
>  	if (notify)
> -		virtqueue_notify(vblk->vq);
> +		virtqueue_notify(vblk->vqs[qid].vq);
>  	return BLK_MQ_RQ_QUEUE_OK;
>  }
>  
> @@ -377,12 +388,64 @@ static void virtblk_config_changed(struct virtio_device *vdev)
>  static int init_vq(struct virtio_blk *vblk)
>  {
>  	int err = 0;
> +	int i;
> +	vq_callback_t **callbacks;
> +	const char **names;
> +	struct virtqueue **vqs;
> +	unsigned short num_vqs;
> +	struct virtio_device *vdev = vblk->vdev;
> +
> +	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
> +				   struct virtio_blk_config, num_queues,
> +				   &num_vqs);
> +	if (err)
> +		num_vqs = 1;
> +
> +	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
> +	if (!vblk->vqs) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
> +	if (!names)
> +		goto err_names;
> +
> +	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
> +	if (!callbacks)
> +		goto err_callbacks;
> +
> +	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
> +	if (!vqs)
> +		goto err_vqs;
>  
> -	/* We expect one virtqueue, for output. */
> -	vblk->vq = virtio_find_single_vq(vblk->vdev, virtblk_done, "requests");
> -	if (IS_ERR(vblk->vq))
> -		err = PTR_ERR(vblk->vq);
> +	for (i = 0; i < num_vqs; i++) {
> +		callbacks[i] = virtblk_done;
> +		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
> +		names[i] = vblk->vqs[i].name;
> +	}
> +
> +	/* Discover virtqueues and write information to configuration.  */
> +	err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
> +	if (err)
> +		goto err_find_vqs;
>  
> +	for (i = 0; i < num_vqs; i++) {
> +		spin_lock_init(&vblk->vqs[i].lock);
> +		vblk->vqs[i].vq = vqs[i];
> +	}
> +	vblk->num_vqs = num_vqs;
> +
> + err_find_vqs:
> +	kfree(vqs);
> + err_vqs:
> +	kfree(callbacks);
> + err_callbacks:
> +	kfree(names);
> + err_names:
> +	if (err)
> +		kfree(vblk->vqs);
> + out:
>  	return err;
>  }
>  
> @@ -551,7 +614,6 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	err = init_vq(vblk);
>  	if (err)
>  		goto out_free_vblk;
> -	spin_lock_init(&vblk->vq_lock);
>  
>  	/* FIXME: How many partitions?  How long is a piece of string? */
>  	vblk->disk = alloc_disk(1 << PART_BITS);
> @@ -562,7 +624,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  
>  	/* Default queue sizing is to fill the ring. */
>  	if (!virtblk_queue_depth) {
> -		virtblk_queue_depth = vblk->vq->num_free;
> +		virtblk_queue_depth = vblk->vqs[0].vq->num_free;
>  		/* ... but without indirect descs, we use 2 descs per req */
>  		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
>  			virtblk_queue_depth /= 2;
> @@ -570,7 +632,6 @@ static int virtblk_probe(struct virtio_device *vdev)
>  
>  	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>  	vblk->tag_set.ops = &virtio_mq_ops;
> -	vblk->tag_set.nr_hw_queues = 1;
>  	vblk->tag_set.queue_depth = virtblk_queue_depth;
>  	vblk->tag_set.numa_node = NUMA_NO_NODE;
>  	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> @@ -578,6 +639,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  		sizeof(struct virtblk_req) +
>  		sizeof(struct scatterlist) * sg_elems;
>  	vblk->tag_set.driver_data = vblk;
> +	vblk->tag_set.nr_hw_queues = vblk->num_vqs;
>  
>  	err = blk_mq_alloc_tag_set(&vblk->tag_set);
>  	if (err)
> @@ -727,6 +789,7 @@ static void virtblk_remove(struct virtio_device *vdev)
>  	refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
>  	put_disk(vblk->disk);
>  	vdev->config->del_vqs(vdev);
> +	kfree(vblk->vqs);
>  	kfree(vblk);
>  
>  	/* Only free device id if we don't have any users */
> @@ -777,7 +840,8 @@ static const struct virtio_device_id id_table[] = {
>  static unsigned int features[] = {
>  	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
>  	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
> -	VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE
> +	VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
> +	VIRTIO_BLK_F_MQ,
>  };
>  
>  static struct virtio_driver virtio_blk = {
> -- 
> 1.7.9.5


* Re: [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
From: Ming Lei @ 2014-07-01  1:36 UTC
  To: Jens Axboe, Linux Kernel Mailing List
  Cc: Rusty Russell, linux-api, Linux Virtualization,
	Michael S. Tsirkin, Stefan Hajnoczi, Paolo Bonzini

Hi Jens and Rusty,

On Thu, Jun 26, 2014 at 8:04 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Thu, Jun 26, 2014 at 5:41 PM, Ming Lei <ming.lei@canonical.com> wrote:
>> Hi,
>>
>> These patches add support for multiple virtual queues (multi-vq) in one
>> virtio-blk device, and map each virtual queue (vq) to a blk-mq
>> hardware queue.
>>
>> With this approach, both the scalability and the performance of a
>> virtio-blk device can be improved.
>>
>> To verify the improvement, I implemented virtio-blk multi-vq on top of
>> qemu's dataplane feature; both handling host notifications from each vq
>> and processing host I/O are still kept in the per-device iothread
>> context. The change is based on the qemu v2.0.0 release and can be
>> accessed from the tree below:
>>
>>         git://kernel.ubuntu.com/ming/qemu.git #v2.0.0-virtblk-mq.1
>>
>> To enable the multi-vq feature, 'num_queues=N' needs to be added to the
>> '-device virtio-blk-pci ...' option of the qemu command line, and it is
>> suggested to pass 'vectors=N+1' to keep one MSI irq vector per vq; the
>> feature depends on x-data-plane.
>>
>> Fio (libaio, randread, iodepth=64, bs=4K, jobs=N) is run inside the VM
>> to verify the improvement.
>>
>> I created a small quad-core VM and ran fio inside it, with num_queues
>> of the virtio-blk device set to 2, and the improvement is still
>> obvious. The host is a 2-socket, 8-core (16-thread) server.
>>
>> 1) scalability
>> - jobs = 2, throughput: +33%
>> - jobs = 4, throughput: +100%
>>
>> 2) peak throughput: +39%
>>
>> So in my test, even for a quad-core VM, when the number of virtqueues
>> is increased from 1 to 2, both scalability and performance improve
>> a lot.
>>
>> In the above qemu implementation of the virtio-blk-mq device, a single
>> iothread handles requests from all vqs, and the throughput above is
>> already very close to that of the same fio test run on the host side
>> with a single job. So further improvement should be observed once more
>> iothreads are used to handle requests from multiple vqs.
>>
>> TODO:
>>         - adjust vq's irq smp_affinity according to blk-mq hw queue's cpumask
>>
>> V3:
>>         - fix use-after-free on vq->name reported by Michael
>>
>> V2: (suggestions from Michael and Dave Chinner)
>>         - allocate virtqueues' pointers dynamically
>>         - make sure the per-queue spinlock isn't kept in same cache line
>>         - make each queue's name different
>>
>> V1:
>>         - remove RFC since no one objects
>>         - add '__u8 unused' for pending as suggested by Rusty
>>         - use virtio_cread_feature() directly, suggested by Rusty
>
> Sorry, please add Jens' reviewed-by.
>
>     Reviewed-by: Jens Axboe <axboe@kernel.dk>

I would appreciate it very much if one of you could queue these two
patches into your tree so that the userspace work can be kicked off,
since Michael has acked both patches and all comments have been
addressed already.


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
  2014-07-01  1:36     ` Ming Lei
@ 2014-07-01  3:01       ` Jens Axboe
  -1 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2014-07-01  3:01 UTC (permalink / raw)
  To: Ming Lei, Linux Kernel Mailing List
  Cc: Rusty Russell, linux-api, Linux Virtualization,
	Michael S. Tsirkin, Stefan Hajnoczi, Paolo Bonzini

On 2014-06-30 19:36, Ming Lei wrote:
> Hi Jens and Rusty,
>
> I would appreciate it very much if one of you could queue these two
> patches into your tree so that the userspace work can be kicked off;
> Michael has acked both patches and all comments have
> been addressed already.

Given that Michael also acked it and Rusty is on his sabbatical, I'll 
queue it up for 3.17.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
  2014-07-01  3:01       ` Jens Axboe
@ 2014-07-01  8:13         ` Christoph Hellwig
  -1 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2014-07-01  8:13 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, Linux Kernel Mailing List, Rusty Russell, linux-api,
	Linux Virtualization, Michael S. Tsirkin, Stefan Hajnoczi,
	Paolo Bonzini

On Mon, Jun 30, 2014 at 09:01:07PM -0600, Jens Axboe wrote:
> >I would appreciate it very much if one of you could queue these two
> >patches into your tree so that the userspace work can be kicked off;
> >Michael has acked both patches and all comments have
> >been addressed already.
> 
> Given that Michael also acked it and Rusty is on his sabbatical, I'll queue
> it up for 3.17.

So Rusty is offline?  Who is taking care of module/moduleparam patches
in the meantime?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk
  2014-07-01  8:13         ` Christoph Hellwig
@ 2014-07-09  0:07           ` Rusty Russell
  -1 siblings, 0 replies; 23+ messages in thread
From: Rusty Russell @ 2014-07-09  0:07 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Ming Lei, Linux Kernel Mailing List, linux-api,
	Linux Virtualization, Michael S. Tsirkin, Stefan Hajnoczi,
	Paolo Bonzini

Christoph Hellwig <hch@infradead.org> writes:
> On Mon, Jun 30, 2014 at 09:01:07PM -0600, Jens Axboe wrote:
>> >I would appreciate it very much if one of you could queue these two
>> >patches into your tree so that the userspace work can be kicked off;
>> >Michael has acked both patches and all comments have
>> >been addressed already.
>> 
>> Given that Michael also acked it and Rusty is on his sabbatical, I'll queue
>> it up for 3.17.
>
> So Rusty is offline?  Who is taking care of module/moduleparam patches
> in the meantime?

I'm intermittently online :)

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

Thread overview:
2014-06-26  9:41 [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk Ming Lei
2014-06-26  9:41 ` [PATCH v3 1/2] include/uapi/linux/virtio_blk.h: introduce feature of VIRTIO_BLK_F_MQ Ming Lei
2014-06-29 16:32   ` Michael S. Tsirkin
2014-06-26  9:41 ` [PATCH v3 2/2] block: virtio-blk: support multi virt queues per virtio-blk device Ming Lei
2014-06-29 16:32   ` Michael S. Tsirkin
2014-06-26 12:04 ` [PATCH v3 0/2] block: virtio-blk: support multi vq per virtio-blk Ming Lei
2014-07-01  1:36   ` Ming Lei
2014-07-01  3:01     ` Jens Axboe
2014-07-01  8:13       ` Christoph Hellwig
2014-07-09  0:07         ` Rusty Russell
