* [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
@ 2012-07-04  4:24 Nicholas A. Bellinger
  2012-07-04  4:24 ` [PATCH 1/6] vhost: Separate vhost-net features from vhost features Nicholas A. Bellinger
                   ` (11 more replies)
  0 siblings, 12 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

Hi folks,

This series contains the patches required to update tcm_vhost <-> virtio-scsi
connected hosts and guests to run on v3.5-rc2 mainline code.  The series is
available on top of target-pending/auto-next here:

   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost

This includes the necessary vhost changes from Stefan to get tcm_vhost
functioning, along with a virtio-scsi LUN scanning change to address a client bug
I ran into with tcm_vhost.  Also, the tcm_vhost driver has been merged into a single
source + header file that now lives under drivers/vhost/, along with the latest
tcm_vhost changes from Zhi's tcm_vhost tree.

Here are a couple of screenshots of the code in action using raw IBLOCK
backends provided by FusionIO ioDrive Duo:

   http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
   http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png

The next steps on my end will be converting tcm_vhost to submit backend I/O from
cmwq context, and gathering fio benchmark numbers comparing tcm_vhost/virtio-scsi
against virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.

Please have a look, vhost + virtio-scsi folks (mst + paolo CC'ed), and let us
know if you have any concerns.

Thanks!

--nab

Nicholas Bellinger (4):
  vhost: Add vhost_scsi specific defines
  tcm_vhost: Initial merge for vhost level target fabric driver
  virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN
    scanning
  virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs

Stefan Hajnoczi (2):
  vhost: Separate vhost-net features from vhost features
  vhost: make vhost work queue visible

 drivers/scsi/virtio_scsi.c |   20 +-
 drivers/vhost/Kconfig      |    6 +
 drivers/vhost/Makefile     |    1 +
 drivers/vhost/net.c        |    4 +-
 drivers/vhost/tcm_vhost.c  | 1592 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/tcm_vhost.h  |   70 ++
 drivers/vhost/vhost.c      |    5 +-
 drivers/vhost/vhost.h      |    6 +-
 drivers/virtio/virtio.c    |    5 +-
 include/linux/vhost.h      |    9 +
 include/linux/virtio.h     |    1 +
 11 files changed, 1708 insertions(+), 11 deletions(-)
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

-- 
1.7.2.5


* [PATCH 1/6] vhost: Separate vhost-net features from vhost features
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04  4:41   ` Asias He
  2012-07-04  4:24 ` [PATCH 2/6] vhost: make vhost work queue visible Nicholas A. Bellinger
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
	Christoph Hellwig

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

In order for other vhost devices to use the VHOST_FEATURES bits the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
constant.
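
For illustration only (not part of this patch, and the helper name below is made
up): once the split is in place, a non-net vhost driver can advertise just the
generic bits, e.g.:

static long vhost_scsi_get_features(u64 __user *featurep)
{
	u64 features = VHOST_FEATURES;	/* generic vhost bits, no VHOST_NET_F_* / VIRTIO_NET_F_* */

	return copy_to_user(featurep, &features, sizeof(features)) ? -EFAULT : 0;
}

while vhost-net continues to report VHOST_NET_FEATURES, exactly as the hunk
below does.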

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
---
 drivers/vhost/net.c   |    4 ++--
 drivers/vhost/vhost.h |    3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f82a739..072cbba 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			return -EFAULT;
 		return vhost_net_set_backend(n, backend.index, backend.fd);
 	case VHOST_GET_FEATURES:
-		features = VHOST_FEATURES;
+		features = VHOST_NET_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof features))
 			return -EFAULT;
 		return 0;
 	case VHOST_SET_FEATURES:
 		if (copy_from_user(&features, featurep, sizeof features))
 			return -EFAULT;
-		if (features & ~VHOST_FEATURES)
+		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_net_set_features(n, features);
 	case VHOST_RESET_OWNER:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8de1fd5..07b9763 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -201,7 +201,8 @@ enum {
 	VHOST_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
 			 (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
 			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
-			 (1ULL << VHOST_F_LOG_ALL) |
+			 (1ULL << VHOST_F_LOG_ALL),
+	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
 };
-- 
1.7.2.5


* [PATCH 2/6] vhost: make vhost work queue visible
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
  2012-07-04  4:24 ` [PATCH 1/6] vhost: Separate vhost-net features from vhost features Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04  4:24 ` Nicholas A. Bellinger
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke, Stefan Hajnoczi

From: Stefan Hajnoczi <stefanha@gmail.com>

The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm.  Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions.  However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.
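
As a rough usage sketch (the names below are made up for illustration, but this
mirrors what the later tcm_vhost patch does with its completion work item): a
vhost driver embeds a struct vhost_work, initializes it once with
vhost_work_init(), and queues it from other contexts with vhost_work_queue() so
the handler runs in the vhost worker thread with the owner mm:

struct my_vhost_scsi {
	struct vhost_dev dev;
	struct vhost_work completion_work;
};

/* Runs in vhost worker thread context, so the owner mm is available. */
static void my_completion_work_fn(struct vhost_work *work)
{
	/* container_of(work, ...) back to the driver, then complete I/O */
}

static void my_open(struct my_vhost_scsi *s)
{
	vhost_work_init(&s->completion_work, my_completion_work_fn);
}

/* Safe to call from e.g. a target completion callback. */
static void my_cmd_done(struct my_vhost_scsi *s)
{
	vhost_work_queue(&s->dev, &s->completion_work);
}

The existing vhost_poll wrappers are unaffected; they already funnel through
these two functions internally.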

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/vhost/vhost.c |    5 ++---
 drivers/vhost/vhost.h |    3 +++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94dbd25..1aab08b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -64,7 +64,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 	return 0;
 }
 
-static void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 {
 	INIT_LIST_HEAD(&work->node);
 	work->fn = fn;
@@ -137,8 +137,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
 	vhost_work_flush(poll->dev, &poll->work);
 }
 
-static inline void vhost_work_queue(struct vhost_dev *dev,
-				    struct vhost_work *work)
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
 	unsigned long flags;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 07b9763..1125af3 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -43,6 +43,9 @@ struct vhost_poll {
 	struct vhost_dev	 *dev;
 };
 
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     unsigned long mask, struct vhost_dev *dev);
 void vhost_poll_start(struct vhost_poll *poll, struct file *file);
-- 
1.7.2.5



* [PATCH 3/6] vhost: Add vhost_scsi specific defines
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
                   ` (2 preceding siblings ...)
  2012-07-04  4:24 ` Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04  4:24 ` Nicholas A. Bellinger
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@risingtidesystems.com>

This patch adds the initial vhost_scsi_ioctl() callers for VHOST_SCSI_SET_ENDPOINT
and VHOST_SCSI_CLEAR_ENDPOINT respectively, and also adds struct vhost_vring_target
that is used by tcm_vhost code when locating target ports during qemu setup.
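
For context, a minimal userspace sketch of how these ioctls are meant to be
driven (the helper name is made up; error handling and the usual
VHOST_SET_MEM_TABLE / VHOST_SET_VRING_* setup that a real VMM such as qemu
performs are omitted; assumes the /dev/vhost-scsi misc device registered later
in this series):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int vhost_scsi_attach(const char *wwpn, unsigned short tpgt)
{
	struct vhost_vring_target backend;
	int fd = open("/dev/vhost-scsi", O_RDWR);

	if (fd < 0)
		return -1;
	ioctl(fd, VHOST_SET_OWNER, NULL);

	memset(&backend, 0, sizeof(backend));
	strncpy((char *)backend.vhost_wwpn, wwpn, sizeof(backend.vhost_wwpn) - 1);
	backend.vhost_tpgt = tpgt;

	if (ioctl(fd, VHOST_SCSI_SET_ENDPOINT, &backend) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* fd is now bound to the tcm_vhost WWPN/TPGT */
}

VHOST_SCSI_CLEAR_ENDPOINT takes the same struct to detach, mirroring what
vhost_scsi_release() does on close in the tcm_vhost patch.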

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
---
 include/linux/vhost.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index e847f1e..33b313b 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -24,7 +24,11 @@ struct vhost_vring_state {
 struct vhost_vring_file {
 	unsigned int index;
 	int fd; /* Pass -1 to unbind from file. */
+};
 
+struct vhost_vring_target {
+	unsigned char vhost_wwpn[224];
+	unsigned short vhost_tpgt;
 };
 
 struct vhost_vring_addr {
@@ -121,6 +125,11 @@ struct vhost_memory {
  * device.  This can be used to stop the ring (e.g. for migration). */
 #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
 
+/* VHOST_SCSI specific defines */
+
+#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct vhost_vring_target)
+#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_vring_target)
+
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26
-- 
1.7.2.5


* [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
                   ` (4 preceding siblings ...)
  2012-07-04  4:24 ` Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-05 17:47   ` Bart Van Assche
  2012-07-05 17:47   ` Bart Van Assche
  2012-07-04  4:24 ` Nicholas A. Bellinger
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch adds the initial code for tcm_vhost, a Vhost level TCM
fabric driver for virtio SCSI initiators into KVM guest.

This code is currently up and running on v3.5-rc2 host+guest along with
the virtio-scsi vdev->scan() patch to allow a proper scsi_scan_host() to
occur once the tcm_vhost nexus has been established by the paravirtualized
virtio-scsi client.

(nab: Merge into single source + header file, and move to drivers/vhost/)
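
For reviewers coming at this from the virtio-scsi side: the request virtqueue
layout that vhost_scsi_handle_vq() below assumes is that the readable
descriptors carry the virtio_scsi_cmd_req plus any data-out payload, and the
writable descriptors carry the virtio_scsi_cmd_resp plus any data-in payload
(BIDI is still a FIXME).  An illustrative helper, not part of the patch,
restating that out/in classification:

static int vhost_scsi_classify(unsigned out, unsigned in,
			       enum dma_data_direction *dir)
{
	if (out == 1 && in == 1)		/* just req + resp */
		*dir = DMA_NONE;
	else if (out == 1 && in > 1)		/* req, resp, then data-in */
		*dir = DMA_FROM_DEVICE;
	else if (out > 1 && in == 1)		/* req, data-out, then resp */
		*dir = DMA_TO_DEVICE;
	else
		return -EINVAL;			/* BIDI not handled yet */
	return 0;
}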

Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/vhost/Kconfig     |    6 +
 drivers/vhost/Makefile    |    1 +
 drivers/vhost/tcm_vhost.c | 1592 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/tcm_vhost.h |   70 ++
 4 files changed, 1669 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a8642e2 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,9 @@ config VHOST_NET
 	  To compile this driver as a module, choose M here: the module will
 	  be called vhost_net.
 
+config TCM_VHOST
+	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
+	depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+	default n
+	---help---
+	  Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..b10c7b1 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
+obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
 vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
new file mode 100644
index 0000000..cd86633
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.c
@@ -0,0 +1,1592 @@
+/*******************************************************************************
+ * Vhost kernel TCM fabric driver for virtio SCSI initiators
+ *
+ * (C) Copyright 2010-2012 RisingTide Systems LLC.
+ * (C) Copyright 2010-2012 IBM Corp.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
+ *
+ * Authors: Nicholas A. Bellinger <nab@risingtidesystems.com>
+ *          Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ ****************************************************************************/
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <generated/utsrelease.h>
+#include <linux/utsname.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/configfs.h>
+#include <linux/ctype.h>
+#include <linux/compat.h>
+#include <linux/eventfd.h>
+#include <linux/vhost.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <asm/unaligned.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_tcq.h>
+#include <target/target_core_base.h>
+#include <target/target_core_fabric.h>
+#include <target/target_core_fabric_configfs.h>
+#include <target/target_core_configfs.h>
+#include <target/configfs_macros.h>
+#include <linux/vhost.h>
+#include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
+#include <linux/virtio_scsi.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "tcm_vhost.h"
+
+struct vhost_scsi {
+	atomic_t vhost_ref_cnt;
+	struct tcm_vhost_tpg *vs_tpg;
+	struct vhost_dev dev;
+	struct vhost_virtqueue vqs[3];
+
+	struct vhost_work vs_completion_work; /* cmd completion work item */
+	struct list_head vs_completion_list;  /* cmd completion queue */
+	spinlock_t vs_completion_lock;        /* protects vs_completion_list */
+};
+
+/* Local pointer to allocated TCM configfs fabric module */
+struct target_fabric_configfs *tcm_vhost_fabric_configfs;
+
+/* Global spinlock to protect tcm_vhost TPG list for vhost IOCTL access */
+DEFINE_MUTEX(tcm_vhost_mutex);
+LIST_HEAD(tcm_vhost_list);
+
+static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
+{
+	return 1;
+}
+
+static int tcm_vhost_check_false(struct se_portal_group *se_tpg)
+{
+	return 0;
+}
+
+static char *tcm_vhost_get_fabric_name(void)
+{
+	return "vhost";
+}
+
+static u8 tcm_vhost_get_fabric_proto_ident(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_get_fabric_proto_ident(se_tpg);
+	case SCSI_PROTOCOL_FCP:
+		return fc_get_fabric_proto_ident(se_tpg);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_get_fabric_proto_ident(se_tpg);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_get_fabric_proto_ident(se_tpg);
+}
+
+static char *tcm_vhost_get_fabric_wwn(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	return &tport->tport_name[0];
+}
+
+u16 tcm_vhost_get_tag(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	return tpg->tport_tpgt;
+}
+
+static u32 tcm_vhost_get_default_depth(struct se_portal_group *se_tpg)
+{
+	return 1;
+}
+
+static u32 tcm_vhost_get_pr_transport_id(
+	struct se_portal_group *se_tpg,
+	struct se_node_acl *se_nacl,
+	struct t10_pr_registration *pr_reg,
+	int *format_code,
+	unsigned char *buf)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+					format_code, buf);
+	case SCSI_PROTOCOL_FCP:
+		return fc_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+					format_code, buf);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+					format_code, buf);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+			format_code, buf);
+}
+
+static u32 tcm_vhost_get_pr_transport_id_len(
+	struct se_portal_group *se_tpg,
+	struct se_node_acl *se_nacl,
+	struct t10_pr_registration *pr_reg,
+	int *format_code)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+					format_code);
+	case SCSI_PROTOCOL_FCP:
+		return fc_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+					format_code);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+					format_code);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+			format_code);
+}
+
+static char *tcm_vhost_parse_pr_out_transport_id(
+	struct se_portal_group *se_tpg,
+	const char *buf,
+	u32 *out_tid_len,
+	char **port_nexus_ptr)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+					port_nexus_ptr);
+	case SCSI_PROTOCOL_FCP:
+		return fc_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+					port_nexus_ptr);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+					port_nexus_ptr);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+			port_nexus_ptr);
+}
+
+static struct se_node_acl *tcm_vhost_alloc_fabric_acl(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_nacl *nacl;
+
+	nacl = kzalloc(sizeof(struct tcm_vhost_nacl), GFP_KERNEL);
+	if (!nacl) {
+		pr_err("Unable to allocate struct tcm_vhost_nacl\n");
+		return NULL;
+	}
+
+	return &nacl->se_node_acl;
+}
+
+static void tcm_vhost_release_fabric_acl(
+	struct se_portal_group *se_tpg,
+	struct se_node_acl *se_nacl)
+{
+	struct tcm_vhost_nacl *nacl = container_of(se_nacl,
+			struct tcm_vhost_nacl, se_node_acl);
+	kfree(nacl);
+}
+
+static u32 tcm_vhost_tpg_get_inst_index(struct se_portal_group *se_tpg)
+{
+	return 1;
+}
+
+/*
+ * Called by struct target_core_fabric_ops->new_cmd_map()
+ *
+ * Always called in process context.  A non zero return value
+ * here will signal to handle an exception based on the return code.
+ */
+static int tcm_vhost_new_cmd_map(struct se_cmd *se_cmd)
+{
+	struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+				struct tcm_vhost_cmd, tvc_se_cmd);
+	struct scatterlist *sg_ptr, *sg_bidi_ptr = NULL;
+	u32 sg_no_bidi = 0;
+	int ret;
+	/*
+	 * Allocate the necessary tasks to complete the received CDB+data
+	 */
+	ret = target_setup_cmd_from_cdb(se_cmd, tv_cmd->tvc_cdb);
+	if (ret != 0)
+		return ret;
+	/*
+	 * Setup the struct scatterlist memory from the received
+	 * struct tcm_vhost_cmd..
+	 */
+	if (tv_cmd->tvc_sgl_count) {
+		sg_ptr = tv_cmd->tvc_sgl;
+		/*
+		 * For BIDI commands, pass in the extra READ buffer
+		 * to transport_generic_map_mem_to_cmd() below..
+		 */
+/* FIXME: Fix BIDI operation in tcm_vhost_new_cmd_map() */
+#if 0
+		if (se_cmd->se_cmd_flags & SCF_BIDI) {
+			mem_bidi_ptr = NULL;
+			sg_no_bidi = 0;
+		}
+#endif
+	} else {
+		/*
+		 * Used for DMA_NONE
+		 */
+		sg_ptr = NULL;
+	}
+
+	/* Tell the core about our preallocated memory */
+	return transport_generic_map_mem_to_cmd(se_cmd, sg_ptr,
+				tv_cmd->tvc_sgl_count, sg_bidi_ptr,
+				sg_no_bidi);
+}
+
+static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
+{
+	return;
+}
+
+static int tcm_vhost_shutdown_session(struct se_session *se_sess)
+{
+	return 0;
+}
+
+static void tcm_vhost_close_session(struct se_session *se_sess)
+{
+	return;
+}
+
+static u32 tcm_vhost_sess_get_index(struct se_session *se_sess)
+{
+	return 0;
+}
+
+static int tcm_vhost_write_pending(struct se_cmd *se_cmd)
+{
+	/* Go ahead and process the write immediately */
+	transport_generic_process_write(se_cmd);
+	return 0;
+}
+
+static int tcm_vhost_write_pending_status(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static void tcm_vhost_set_default_node_attrs(struct se_node_acl *nacl)
+{
+	return;
+}
+
+static u32 tcm_vhost_get_task_tag(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static int tcm_vhost_get_cmd_state(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *);
+
+static int tcm_vhost_queue_data_in(struct se_cmd *se_cmd)
+{
+	struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+				struct tcm_vhost_cmd, tvc_se_cmd);
+	vhost_scsi_complete_cmd(tv_cmd);
+	return 0;
+}
+
+static int tcm_vhost_queue_status(struct se_cmd *se_cmd)
+{
+	struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+				struct tcm_vhost_cmd, tvc_se_cmd);
+	vhost_scsi_complete_cmd(tv_cmd);
+	return 0;
+}
+
+static int tcm_vhost_queue_tm_rsp(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static u16 tcm_vhost_set_fabric_sense_len(struct se_cmd *se_cmd, u32 sense_length)
+{
+	return 0;
+}
+
+static u16 tcm_vhost_get_fabric_sense_len(void)
+{
+	return 0;
+}
+
+static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
+{
+	struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
+
+	/* TODO locking against target/backend threads? */
+	transport_generic_free_cmd(se_cmd, 1);
+
+	if (tv_cmd->tvc_sgl_count) {
+		u32 i;
+		for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
+			put_page(sg_page(&tv_cmd->tvc_sgl[i]));
+	}
+
+	kfree(tv_cmd);
+}
+
+/* Dequeue a command from the completion list */
+static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(struct vhost_scsi *vs)
+{
+	struct tcm_vhost_cmd *tv_cmd = NULL;
+
+	spin_lock_bh(&vs->vs_completion_lock);
+	if (list_empty(&vs->vs_completion_list)) {
+		spin_unlock_bh(&vs->vs_completion_lock);
+		return NULL;
+	}
+
+	list_for_each_entry(tv_cmd, &vs->vs_completion_list,
+			    tvc_completion_list) {
+		list_del(&tv_cmd->tvc_completion_list);
+		break;
+	}
+	spin_unlock_bh(&vs->vs_completion_lock);
+	return tv_cmd;
+}
+
+/* Fill in status and signal that we are done processing this command
+ *
+ * This is scheduled in the vhost work queue so we are called with the owner
+ * process mm and can access the vring.
+ */
+static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
+{
+	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
+	                                     vs_completion_work);
+	struct tcm_vhost_cmd *tv_cmd;
+
+	while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs)) != NULL) {
+		struct virtio_scsi_cmd_resp v_rsp;
+		struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
+		int ret;
+
+		pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__,
+			tv_cmd, se_cmd->residual_count, se_cmd->scsi_status);
+
+		memset(&v_rsp, 0, sizeof(v_rsp));
+		v_rsp.resid = se_cmd->residual_count;
+		/* TODO is status_qualifier field needed? */
+		v_rsp.status = se_cmd->scsi_status;
+		v_rsp.sense_len = se_cmd->scsi_sense_length;
+		memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf,
+		       v_rsp.sense_len);
+		ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp));
+		if (likely(ret == 0))
+			vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0);
+		else
+			pr_err("Faulted on virtio_scsi_cmd_resp\n");
+
+		vhost_scsi_free_cmd(tv_cmd);
+	}
+
+	vhost_signal(&vs->dev, &vs->vqs[2]);
+}
+
+static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd)
+{
+	struct vhost_scsi *vs = tv_cmd->tvc_vhost;
+
+	pr_debug("%s tv_cmd %p\n", __func__, tv_cmd);
+
+	spin_lock_bh(&vs->vs_completion_lock);
+	list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
+	spin_unlock_bh(&vs->vs_completion_lock);
+
+	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+}
+
+static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
+	struct tcm_vhost_tpg *tv_tpg,
+	struct virtio_scsi_cmd_req *v_req,
+	u32 exp_data_len,
+	int data_direction)
+{
+	struct tcm_vhost_cmd *tv_cmd;
+	struct tcm_vhost_nexus *tv_nexus;
+	struct se_portal_group *se_tpg = &tv_tpg->se_tpg;
+	struct se_session *se_sess;
+	struct se_cmd *se_cmd;
+	int sam_task_attr;
+
+	tv_nexus = tv_tpg->tpg_nexus;
+	if (!tv_nexus) {
+		pr_err("Unable to locate active struct tcm_vhost_nexus\n");
+		return ERR_PTR(-EIO);
+	}
+	se_sess = tv_nexus->tvn_se_sess;
+
+	tv_cmd = kzalloc(sizeof(struct tcm_vhost_cmd), GFP_ATOMIC);
+	if (!tv_cmd) {
+		pr_err("Unable to allocate struct tcm_vhost_cmd\n");
+		return ERR_PTR(-ENOMEM);
+	}
+	INIT_LIST_HEAD(&tv_cmd->tvc_completion_list);
+	tv_cmd->tvc_tag = v_req->tag;
+
+	se_cmd = &tv_cmd->tvc_se_cmd;
+	/*
+	 * Locate the SAM Task Attr from virtio_scsi_cmd_req
+	 */
+	sam_task_attr = v_req->task_attr;
+	/*
+	 * Initialize struct se_cmd descriptor from target_core_mod infrastructure
+	 */
+	transport_init_se_cmd(se_cmd, se_tpg->se_tpg_tfo, se_sess, exp_data_len,
+				data_direction, sam_task_attr,
+				&tv_cmd->tvc_sense_buf[0]);
+
+#if 0	/* FIXME: vhost_scsi_allocate_cmd() BIDI operation */
+	if (bidi)
+		se_cmd->se_cmd_flags |= SCF_BIDI;
+#endif
+	/*
+	 * From here the rest of the se_cmd will be setup and dispatched
+	 * via tcm_vhost_new_cmd_map() from TCM backend thread context
+	 * after transport_generic_handle_cdb_map() has been called from
+	 * vhost_scsi_handle_vq() below..
+	 */
+	return tv_cmd;
+}
+
+/*
+ * Map a user memory range into a scatterlist
+ *
+ * Returns the number of scatterlist entries used or -errno on error.
+ */
+static int vhost_scsi_map_to_sgl(struct scatterlist *sgl,
+		                 unsigned int sgl_count,
+		                 void __user *ptr, size_t len, int write)
+{
+	struct scatterlist *sg = sgl;
+	unsigned int npages = 0;
+	int ret;
+
+	while (len > 0) {
+		struct page *page;
+		unsigned int offset = (uintptr_t)ptr & ~PAGE_MASK;
+		unsigned int nbytes = min(PAGE_SIZE - offset, len);
+
+		if (npages == sgl_count) {
+			ret = -ENOBUFS;
+			goto err;
+		}
+
+		ret = get_user_pages_fast((unsigned long)ptr, 1, write, &page);
+		BUG_ON(ret == 0); /* we should either get our page or fail */
+		if (ret < 0)
+			goto err;
+
+		sg_set_page(sg, page, nbytes, offset);
+		ptr += nbytes;
+		len -= nbytes;
+		sg++;
+		npages++;
+	}
+	return npages;
+
+err:
+	/* Put pages that we hold */
+	for (sg = sgl; sg != &sgl[npages]; sg++)
+		put_page(sg_page(sg));
+	return ret;
+}
+
+static int vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *tv_cmd,
+                                     struct iovec *iov, unsigned int niov,
+				     int write)
+{
+	int ret;
+	unsigned int i;
+	u32 sgl_count;
+	struct scatterlist *sg;
+
+	/*
+	 * Find out how long sglist needs to be
+	 */
+	sgl_count = 0;
+	for (i = 0; i < niov; i++) {
+		sgl_count += (((uintptr_t)iov[i].iov_base + iov[i].iov_len +
+		             PAGE_SIZE - 1) >> PAGE_SHIFT) -
+		             ((uintptr_t)iov[i].iov_base >> PAGE_SHIFT);
+	}
+	/* TODO overflow checking */
+
+	sg = kmalloc(sizeof(tv_cmd->tvc_sgl[0]) * sgl_count, GFP_ATOMIC);
+	if (!sg)
+		return -ENOMEM;
+	pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__,
+	       sg, sgl_count, IS_ERR(sg));
+	sg_init_table(sg, sgl_count);
+
+	tv_cmd->tvc_sgl = sg;
+	tv_cmd->tvc_sgl_count = sgl_count;
+
+	pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count);
+	for (i = 0; i < niov; i++) {
+		ret = vhost_scsi_map_to_sgl(sg, sgl_count, iov[i].iov_base,
+		                            iov[i].iov_len, write);
+		if (ret < 0) {
+			for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
+				put_page(sg_page(&tv_cmd->tvc_sgl[i]));
+			kfree(tv_cmd->tvc_sgl);
+			tv_cmd->tvc_sgl = NULL;
+			tv_cmd->tvc_sgl_count = 0;
+			return ret;
+		}
+
+		sg += ret;
+		sgl_count -= ret;
+	}
+	return 0;
+}
+
+static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
+{
+	struct vhost_virtqueue *vq = &vs->vqs[2];
+	struct virtio_scsi_cmd_req v_req;
+	struct tcm_vhost_tpg *tv_tpg;
+	struct tcm_vhost_cmd *tv_cmd;
+	u32 exp_data_len, data_first, data_num, data_direction;
+	unsigned out, in, i;
+	int head, ret, lun;
+
+	/* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
+	tv_tpg = vs->vs_tpg;
+	if (unlikely(!tv_tpg)) {
+		pr_err("%s endpoint not set\n", __func__);
+		return;
+	}
+
+	mutex_lock(&vq->mutex);
+	vhost_disable_notify(&vs->dev, vq);
+
+	for (;;) {
+		head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
+					ARRAY_SIZE(vq->iov), &out, &in,
+					NULL, NULL);
+		pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n", head, out, in);
+		/* On error, stop handling until the next kick. */
+		if (unlikely(head < 0))
+			break;
+		/* Nothing new?  Wait for eventfd to tell us they refilled. */
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
+				vhost_disable_notify(&vs->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+/* FIXME: BIDI operation */
+		if (out == 1 && in == 1) {
+			data_direction = DMA_NONE;
+			data_first = 0;
+			data_num = 0;
+		} else if (out == 1 && in > 1) {
+			data_direction = DMA_FROM_DEVICE;
+			data_first = out + 1;
+			data_num = in - 1;
+		} else if (out > 1 && in == 1) {
+			data_direction = DMA_TO_DEVICE;
+			data_first = 1;
+			data_num = out - 1;
+		} else {
+			pr_err("Invalid buffer layout out: %u in: %u\n", out, in);
+			break;
+		}
+
+		/*
+		 * Check for a sane resp buffer so we can report errors to
+		 * the guest.
+		 */
+		if (unlikely(vq->iov[out].iov_len !=
+					sizeof(struct virtio_scsi_cmd_resp))) {
+			pr_err("Expecting virtio_scsi_cmd_resp, got %zu bytes\n",
+					vq->iov[out].iov_len);
+			break;
+		}
+
+		if (unlikely(vq->iov[0].iov_len != sizeof(v_req))) {
+			pr_err("Expecting virtio_scsi_cmd_req, got %zu bytes\n",
+					vq->iov[0].iov_len);
+			break;
+		}
+		pr_debug("Calling __copy_from_user: vq->iov[0].iov_base: %p, len: %lu\n",
+				vq->iov[0].iov_base, sizeof(v_req));
+		ret = __copy_from_user(&v_req, vq->iov[0].iov_base, sizeof(v_req));
+		if (unlikely(ret)) {
+			pr_err("Faulted on virtio_scsi_cmd_req\n");
+			break;
+		}
+
+		exp_data_len = 0;
+		for (i = 0; i < data_num; i++) {
+			exp_data_len += vq->iov[data_first + i].iov_len;
+		}
+
+		tv_cmd = vhost_scsi_allocate_cmd(tv_tpg, &v_req,
+					exp_data_len, data_direction);
+		if (IS_ERR(tv_cmd)) {
+			pr_err("vhost_scsi_allocate_cmd failed %ld\n", PTR_ERR(tv_cmd));
+			break;
+		}
+		pr_debug("Allocated tv_cmd: %p exp_data_len: %d, data_direction: %d\n",
+				tv_cmd, exp_data_len, data_direction);
+
+		tv_cmd->tvc_vhost = vs;
+
+		if (unlikely(vq->iov[out].iov_len !=
+		             sizeof(struct virtio_scsi_cmd_resp))) {
+			pr_err("Expecting virtio_scsi_cmd_resp, "
+			       " got %zu bytes, out: %d, in: %d\n", vq->iov[out].iov_len, out, in);
+			break;
+		}
+
+		tv_cmd->tvc_resp = vq->iov[out].iov_base;
+
+		/*
+		 * Copy in the received CDB descriptor into tv_cmd->tvc_cdb
+		 * that will be used by tcm_vhost_new_cmd_map() and down into
+		 * target_setup_cmd_from_cdb()
+		 */
+		memcpy(tv_cmd->tvc_cdb, v_req.cdb, TCM_VHOST_MAX_CDB_SIZE);
+		/*
+		 * Check that the received CDB size does not exceed our
+		 * hardcoded max for tcm_vhost
+		 */
+		/* TODO what if cdb was too small for varlen cdb header? */
+		if (unlikely(scsi_command_size(tv_cmd->tvc_cdb) > TCM_VHOST_MAX_CDB_SIZE)) {
+			pr_err("Received SCSI CDB with command_size: %d that exceeds"
+				" TCM_VHOST_MAX_CDB_SIZE: %d\n",
+				scsi_command_size(tv_cmd->tvc_cdb), TCM_VHOST_MAX_CDB_SIZE);
+			break; /* TODO */
+		}
+		lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
+
+		pr_debug("vhost_scsi got command opcode: %#02x, lun: %d\n",
+			tv_cmd->tvc_cdb[0], lun);
+
+		if (data_direction != DMA_NONE) {
+			ret = vhost_scsi_map_iov_to_sgl(tv_cmd, &vq->iov[data_first],
+					data_num, data_direction == DMA_TO_DEVICE);
+			if (unlikely(ret)) {
+				pr_err("Failed to map iov to sgl\n");
+				break; /* TODO */
+			}
+		}
+
+		/*
+		 * Save the descriptor from vhost_get_vq_desc() to be used to
+		 * complete the virtio-scsi request in TCM callback context via
+		 * tcm_vhost_queue_data_in() and tcm_vhost_queue_status()
+		 */
+		tv_cmd->tvc_vq_desc = head;
+		/*
+		 * Locate the struct se_lun pointer based on v_req->lun, and
+		 * attach it to struct se_cmd
+		 */
+		if (transport_lookup_cmd_lun(&tv_cmd->tvc_se_cmd, lun) < 0) {
+			pr_err("Failed to look up lun: %d\n", lun);
+			/* NON_EXISTENT_LUN */
+			transport_send_check_condition_and_sense(&tv_cmd->tvc_se_cmd,
+					tv_cmd->tvc_se_cmd.scsi_sense_reason, 0);
+			continue;
+		}
+		/*
+		 * Now queue up the newly allocated se_cmd to be processed
+		 * within TCM thread context to finish the setup and dispatched
+		 * into a TCM backend struct se_device.
+		 */
+		transport_generic_handle_cdb_map(&tv_cmd->tvc_se_cmd);
+	}
+
+	mutex_unlock(&vq->mutex);
+}
+
+static void vhost_scsi_ctl_handle_kick(struct vhost_work *work)
+{
+     pr_err("%s: The handling func for control queue.\n", __func__);
+}
+
+static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
+{
+     pr_err("%s: The handling func for event queue.\n", __func__);
+}
+
+static void vhost_scsi_handle_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+						poll.work);
+	struct vhost_scsi *vs = container_of(vq->dev, struct vhost_scsi, dev);
+
+	vhost_scsi_handle_vq(vs);
+}
+
+/*
+ * Called from vhost_scsi_ioctl() context to walk the list of available tcm_vhost_tpg
+ * with an active struct tcm_vhost_nexus
+ */
+static int vhost_scsi_set_endpoint(
+	struct vhost_scsi *vs,
+	struct vhost_vring_target *t)
+{
+	struct tcm_vhost_tport *tv_tport;
+	struct tcm_vhost_tpg *tv_tpg;
+        int index;
+
+	mutex_lock(&vs->dev.mutex);
+	/* Verify that ring has been setup correctly. */
+	for (index = 0; index < vs->dev.nvqs; ++index) {
+		/* Verify that ring has been setup correctly. */
+		if (!vhost_vq_access_ok(&vs->vqs[index])) {
+		        mutex_unlock(&vs->dev.mutex);
+			return -EFAULT;
+		}
+	}
+
+	if (vs->vs_tpg) {
+		mutex_unlock(&vs->dev.mutex);
+		return -EEXIST;
+	}
+	mutex_unlock(&vs->dev.mutex);
+
+	mutex_lock(&tcm_vhost_mutex);
+	list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
+		mutex_lock(&tv_tpg->tv_tpg_mutex);
+		if (!tv_tpg->tpg_nexus) {
+			mutex_unlock(&tv_tpg->tv_tpg_mutex);
+			continue;
+		}
+		if (atomic_read(&tv_tpg->tv_tpg_vhost_count)) {
+			mutex_unlock(&tv_tpg->tv_tpg_mutex);
+			continue;
+		}
+		tv_tport = tv_tpg->tport;
+
+		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn) &&
+		    (tv_tpg->tport_tpgt == t->vhost_tpgt)) {
+			atomic_inc(&tv_tpg->tv_tpg_vhost_count);
+			smp_mb__after_atomic_inc();
+			mutex_unlock(&tv_tpg->tv_tpg_mutex);
+			mutex_unlock(&tcm_vhost_mutex);
+
+			mutex_lock(&vs->dev.mutex);
+			vs->vs_tpg = tv_tpg;
+			atomic_inc(&vs->vhost_ref_cnt);
+			smp_mb__after_atomic_inc();
+			mutex_unlock(&vs->dev.mutex);
+			return 0;
+		}
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+	}
+	mutex_unlock(&tcm_vhost_mutex);
+	return -EINVAL;
+}
+
+static int vhost_scsi_clear_endpoint(
+	struct vhost_scsi *vs,
+	struct vhost_vring_target *t)
+{
+	struct tcm_vhost_tport *tv_tport;
+	struct tcm_vhost_tpg *tv_tpg;
+        int index;
+
+	mutex_lock(&vs->dev.mutex);
+	/* Verify that ring has been setup correctly. */
+	for (index = 0; index < vs->dev.nvqs; ++index) {
+		if (!vhost_vq_access_ok(&vs->vqs[index])) {
+		        mutex_unlock(&vs->dev.mutex);
+			return -EFAULT;
+		}
+	}
+
+	if (!vs->vs_tpg) {
+		mutex_unlock(&vs->dev.mutex);
+		return -ENODEV;
+	}
+	tv_tpg = vs->vs_tpg;
+	tv_tport = tv_tpg->tport;
+
+	if (strcmp(tv_tport->tport_name, t->vhost_wwpn) ||
+	    (tv_tpg->tport_tpgt != t->vhost_tpgt)) {
+		mutex_unlock(&vs->dev.mutex);
+		pr_warn("tv_tport->tport_name: %s, tv_tpg->tport_tpgt: %hu"
+			" does not match t->vhost_wwpn: %s, t->vhost_tpgt: %hu\n",
+			tv_tport->tport_name, tv_tpg->tport_tpgt,
+			t->vhost_wwpn, t->vhost_tpgt);
+		return -EINVAL;
+	}
+        atomic_dec(&tv_tpg->tv_tpg_vhost_count);
+	vs->vs_tpg = NULL;
+	mutex_unlock(&vs->dev.mutex);
+
+	return 0;
+}
+
+static int vhost_scsi_open(struct inode *inode, struct file *f)
+{
+	struct vhost_scsi *s;
+	int r;
+
+	s = kzalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work);
+	INIT_LIST_HEAD(&s->vs_completion_list);
+	spin_lock_init(&s->vs_completion_lock);
+
+	s->vqs[0].handle_kick = vhost_scsi_ctl_handle_kick;
+	s->vqs[1].handle_kick = vhost_scsi_evt_handle_kick;
+	s->vqs[2].handle_kick = vhost_scsi_handle_kick;
+	r = vhost_dev_init(&s->dev, s->vqs, 3);
+	if (r < 0) {
+		kfree(s);
+		return r;
+	}
+
+	f->private_data = s;
+	return 0;
+}
+
+static int vhost_scsi_release(struct inode *inode, struct file *f)
+{
+	struct vhost_scsi *s = f->private_data;
+
+        if (s->vs_tpg && s->vs_tpg->tport) {
+            struct vhost_vring_target backend;
+            memcpy(backend.vhost_wwpn, s->vs_tpg->tport->tport_name, sizeof(backend.vhost_wwpn));
+            backend.vhost_tpgt = s->vs_tpg->tport_tpgt;
+            vhost_scsi_clear_endpoint(s, &backend);
+        }
+
+	vhost_dev_cleanup(&s->dev, false);
+	kfree(s);
+	return 0;
+}
+
+static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
+{
+	if (features & ~VHOST_FEATURES)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&vs->dev.mutex);
+	if ((features & (1 << VHOST_F_LOG_ALL)) &&
+	    !vhost_log_access_ok(&vs->dev)) {
+		mutex_unlock(&vs->dev.mutex);
+		return -EFAULT;
+	}
+	vs->dev.acked_features = features;
+	/* TODO possibly smp_wmb() and flush vqs */
+	mutex_unlock(&vs->dev.mutex);
+	return 0;
+}
+
+static long vhost_scsi_ioctl(struct file *f, unsigned int ioctl,
+				unsigned long arg)
+{
+	struct vhost_scsi *vs = f->private_data;
+	struct vhost_vring_target backend;
+	void __user *argp = (void __user *)arg;
+	u64 __user *featurep = argp;
+	u64 features;
+	int r;
+
+	switch (ioctl) {
+	case VHOST_SCSI_SET_ENDPOINT:
+		if (copy_from_user(&backend, argp, sizeof backend))
+			return -EFAULT;
+
+		return vhost_scsi_set_endpoint(vs, &backend);
+	case VHOST_SCSI_CLEAR_ENDPOINT:
+		if (copy_from_user(&backend, argp, sizeof backend))
+			return -EFAULT;
+
+		return vhost_scsi_clear_endpoint(vs, &backend);
+	case VHOST_GET_FEATURES:
+		features = VHOST_FEATURES;
+		if (copy_to_user(featurep, &features, sizeof features))
+			return -EFAULT;
+		return 0;
+	case VHOST_SET_FEATURES:
+		if (copy_from_user(&features, featurep, sizeof features))
+			return -EFAULT;
+		return vhost_scsi_set_features(vs, features);
+	default:
+		mutex_lock(&vs->dev.mutex);
+		r = vhost_dev_ioctl(&vs->dev, ioctl, arg);
+		mutex_unlock(&vs->dev.mutex);
+		return r;
+	}
+}
+
+static const struct file_operations vhost_scsi_fops = {
+	.owner          = THIS_MODULE,
+	.release        = vhost_scsi_release,
+	.unlocked_ioctl = vhost_scsi_ioctl,
+	/* TODO compat ioctl? */
+	.open           = vhost_scsi_open,
+	.llseek		= noop_llseek,
+};
+
+static struct miscdevice vhost_scsi_misc = {
+	MISC_DYNAMIC_MINOR,
+	"vhost-scsi",
+	&vhost_scsi_fops,
+};
+
+static int __init vhost_scsi_register(void)
+{
+	return misc_register(&vhost_scsi_misc);
+}
+
+static int vhost_scsi_deregister(void)
+{
+	return misc_deregister(&vhost_scsi_misc);
+}
+
+static char *tcm_vhost_dump_proto_id(struct tcm_vhost_tport *tport)
+{
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return "SAS";
+	case SCSI_PROTOCOL_FCP:
+		return "FCP";
+	case SCSI_PROTOCOL_ISCSI:
+		return "iSCSI";
+	default:
+		break;
+	}
+
+	return "Unknown";
+}
+
+static int tcm_vhost_port_link(
+	struct se_portal_group *se_tpg,
+	struct se_lun *lun)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+
+	atomic_inc(&tv_tpg->tv_tpg_port_count);
+	smp_mb__after_atomic_inc();
+
+	return 0;
+}
+
+static void tcm_vhost_port_unlink(
+	struct se_portal_group *se_tpg,
+	struct se_lun *se_lun)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+
+	atomic_dec(&tv_tpg->tv_tpg_port_count);
+	smp_mb__after_atomic_dec();
+}
+
+static struct se_node_acl *tcm_vhost_make_nodeacl(
+	struct se_portal_group *se_tpg,
+	struct config_group *group,
+	const char *name)
+{
+	struct se_node_acl *se_nacl, *se_nacl_new;
+	struct tcm_vhost_nacl *nacl;
+	u64 wwpn = 0;
+	u32 nexus_depth;
+
+	/* tcm_vhost_parse_wwn(name, &wwpn, 1) < 0)
+		return ERR_PTR(-EINVAL); */
+	se_nacl_new = tcm_vhost_alloc_fabric_acl(se_tpg);
+	if (!se_nacl_new)
+		return ERR_PTR(-ENOMEM);
+//#warning FIXME: Hardcoded nexus depth in tcm_vhost_make_nodeacl()
+	nexus_depth = 1;
+	/*
+	 * se_nacl_new may be released by core_tpg_add_initiator_node_acl()
+	 * when converting a NodeACL from demo mode -> explicit
+	 */
+	se_nacl = core_tpg_add_initiator_node_acl(se_tpg, se_nacl_new,
+				name, nexus_depth);
+	if (IS_ERR(se_nacl)) {
+		tcm_vhost_release_fabric_acl(se_tpg, se_nacl_new);
+		return se_nacl;
+	}
+	/*
+	 * Locate our struct tcm_vhost_nacl and set the FC Nport WWPN
+	 */
+	nacl = container_of(se_nacl, struct tcm_vhost_nacl, se_node_acl);
+	nacl->iport_wwpn = wwpn;
+	/* tcm_vhost_format_wwn(&nacl->iport_name[0], TCM_VHOST_NAMELEN, wwpn); */
+
+	return se_nacl;
+}
+
+static void tcm_vhost_drop_nodeacl(struct se_node_acl *se_acl)
+{
+	struct tcm_vhost_nacl *nacl = container_of(se_acl,
+				struct tcm_vhost_nacl, se_node_acl);
+	core_tpg_del_initiator_node_acl(se_acl->se_tpg, se_acl, 1);
+	kfree(nacl);
+}
+
+static int tcm_vhost_make_nexus(
+	struct tcm_vhost_tpg *tv_tpg,
+	const char *name)
+{
+	struct se_portal_group *se_tpg;
+	struct tcm_vhost_nexus *tv_nexus;
+
+	mutex_lock(&tv_tpg->tv_tpg_mutex);
+	if (tv_tpg->tpg_nexus) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		pr_debug("tv_tpg->tpg_nexus already exists\n");
+		return -EEXIST;
+	}
+	se_tpg = &tv_tpg->se_tpg;
+
+	tv_nexus = kzalloc(sizeof(struct tcm_vhost_nexus), GFP_KERNEL);
+	if (!tv_nexus) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		pr_err("Unable to allocate struct tcm_vhost_nexus\n");
+		return -ENOMEM;
+	}
+	/*
+	 *  Initialize the struct se_session pointer
+	 */
+	tv_nexus->tvn_se_sess = transport_init_session();
+	if (IS_ERR(tv_nexus->tvn_se_sess)) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		kfree(tv_nexus);
+		return -ENOMEM;
+	}
+	/*
+	 * Since we are running in 'demo mode' this call will generate a
+	 * struct se_node_acl for the tcm_vhost struct se_portal_group with
+	 * the SCSI Initiator port name of the passed configfs group 'name'.
+	 */
+	tv_nexus->tvn_se_sess->se_node_acl = core_tpg_check_initiator_node_acl(
+				se_tpg, (unsigned char *)name);
+	if (!tv_nexus->tvn_se_sess->se_node_acl) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		pr_debug("core_tpg_check_initiator_node_acl() failed"
+				" for %s\n", name);
+		transport_free_session(tv_nexus->tvn_se_sess);
+		kfree(tv_nexus);
+		return -ENOMEM;
+	}
+	/*
+	 * Now register the TCM vHost virtual I_T Nexus as active with the
+	 * call to __transport_register_session()
+	 */
+	__transport_register_session(se_tpg, tv_nexus->tvn_se_sess->se_node_acl,
+			tv_nexus->tvn_se_sess, tv_nexus);
+	tv_tpg->tpg_nexus = tv_nexus;
+
+	mutex_unlock(&tv_tpg->tv_tpg_mutex);
+	return 0;
+}
+
+static int tcm_vhost_drop_nexus(
+	struct tcm_vhost_tpg *tpg)
+{
+	struct se_session *se_sess;
+	struct tcm_vhost_nexus *tv_nexus;
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	tv_nexus = tpg->tpg_nexus;
+	if (!tv_nexus) {
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		return -ENODEV;
+	}
+
+	se_sess = tv_nexus->tvn_se_sess;
+	if (!se_sess) {
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		return -ENODEV;
+	}
+
+	if (atomic_read(&tpg->tv_tpg_port_count)) {
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		pr_err("Unable to remove TCM_vHost I_T Nexus with"
+			" active TPG port count: %d\n",
+			atomic_read(&tpg->tv_tpg_port_count));
+		return -EPERM;
+	}
+
+	if (atomic_read(&tpg->tv_tpg_vhost_count)) {
+		pr_err("Unable to remove TCM_vHost I_T Nexus with"
+			" active TPG vhost count: %d\n",
+			atomic_read(&tpg->tv_tpg_vhost_count));
+		return -EPERM;
+	}
+
+	pr_debug("TCM_vHost_ConfigFS: Removing I_T Nexus to emulated"
+		" %s Initiator Port: %s\n", tcm_vhost_dump_proto_id(tpg->tport),
+		tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+	/*
+	 * Release the SCSI I_T Nexus to the emulated vHost Target Port
+	 */
+	transport_deregister_session(tv_nexus->tvn_se_sess);
+	tpg->tpg_nexus = NULL;
+	mutex_unlock(&tpg->tv_tpg_mutex);
+
+	kfree(tv_nexus);
+	return 0;
+}
+
+static ssize_t tcm_vhost_tpg_show_nexus(
+	struct se_portal_group *se_tpg,
+	char *page)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_nexus *tv_nexus;
+	ssize_t ret;
+
+	mutex_lock(&tv_tpg->tv_tpg_mutex);
+	tv_nexus = tv_tpg->tpg_nexus;
+	if (!tv_nexus) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		return -ENODEV;
+	}
+	ret = snprintf(page, PAGE_SIZE, "%s\n",
+			tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+	mutex_unlock(&tv_tpg->tv_tpg_mutex);
+
+	return ret;
+}
+
+static ssize_t tcm_vhost_tpg_store_nexus(
+	struct se_portal_group *se_tpg,
+	const char *page,
+	size_t count)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport_wwn = tv_tpg->tport;
+	unsigned char i_port[TCM_VHOST_NAMELEN], *ptr, *port_ptr;
+	int ret;
+	/*
+	 * Shutdown the active I_T nexus if 'NULL' is passed..
+	 */
+	if (!strncmp(page, "NULL", 4)) {
+		ret = tcm_vhost_drop_nexus(tv_tpg);
+		return (!ret) ? count : ret;
+	}
+	/*
+	 * Otherwise make sure the passed virtual Initiator port WWN matches
+	 * the fabric protocol_id set in tcm_vhost_make_tport(), and call
+	 * tcm_vhost_make_nexus().
+	 */
+	if (strlen(page) > TCM_VHOST_NAMELEN) {
+		pr_err("Emulated NAA SAS Address: %s, exceeds"
+				" max: %d\n", page, TCM_VHOST_NAMELEN);
+		return -EINVAL;
+	}
+	snprintf(&i_port[0], TCM_VHOST_NAMELEN, "%s", page);
+
+	ptr = strstr(i_port, "naa.");
+	if (ptr) {
+		if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_SAS) {
+			pr_err("Passed SAS Initiator Port %s does not"
+				" match target port protoid: %s\n", i_port,
+				tcm_vhost_dump_proto_id(tport_wwn));
+			return -EINVAL;
+		}
+		port_ptr = &i_port[0];
+		goto check_newline;
+	}
+	ptr = strstr(i_port, "fc.");
+	if (ptr) {
+		if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_FCP) {
+			pr_err("Passed FCP Initiator Port %s does not"
+				" match target port protoid: %s\n", i_port,
+				tcm_vhost_dump_proto_id(tport_wwn));
+			return -EINVAL;
+		}
+		port_ptr = &i_port[3]; /* Skip over "fc." */
+		goto check_newline;
+	}
+	ptr = strstr(i_port, "iqn.");
+	if (ptr) {
+		if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_ISCSI) {
+			pr_err("Passed iSCSI Initiator Port %s does not"
+				" match target port protoid: %s\n", i_port,
+				tcm_vhost_dump_proto_id(tport_wwn));
+			return -EINVAL;
+		}
+		port_ptr = &i_port[0];
+		goto check_newline;
+	}
+	pr_err("Unable to locate prefix for emulated Initiator Port:"
+			" %s\n", i_port);
+	return -EINVAL;
+	/*
+	 * Clear any trailing newline for the NAA WWN
+	 */
+check_newline:
+	if (i_port[strlen(i_port)-1] == '\n')
+		i_port[strlen(i_port)-1] = '\0';
+
+	ret = tcm_vhost_make_nexus(tv_tpg, port_ptr);
+	if (ret < 0)
+		return ret;
+
+	return count;
+}
+
+TF_TPG_BASE_ATTR(tcm_vhost, nexus, S_IRUGO | S_IWUSR);
+
+static struct configfs_attribute *tcm_vhost_tpg_attrs[] = {
+	&tcm_vhost_tpg_nexus.attr,
+	NULL,
+};
+
+static struct se_portal_group *tcm_vhost_make_tpg(
+	struct se_wwn *wwn,
+	struct config_group *group,
+	const char *name)
+{
+	struct tcm_vhost_tport *tport = container_of(wwn,
+			struct tcm_vhost_tport, tport_wwn);
+
+	struct tcm_vhost_tpg *tpg;
+	unsigned long tpgt;
+	int ret;
+
+	if (strstr(name, "tpgt_") != name)
+		return ERR_PTR(-EINVAL);
+	if (strict_strtoul(name + 5, 10, &tpgt) || tpgt > UINT_MAX)
+		return ERR_PTR(-EINVAL);
+
+	tpg = kzalloc(sizeof(struct tcm_vhost_tpg), GFP_KERNEL);
+	if (!tpg) {
+		pr_err("Unable to allocate struct tcm_vhost_tpg");
+		return ERR_PTR(-ENOMEM);
+	}
+	mutex_init(&tpg->tv_tpg_mutex);
+	INIT_LIST_HEAD(&tpg->tv_tpg_list);
+	tpg->tport = tport;
+	tpg->tport_tpgt = tpgt;
+
+	ret = core_tpg_register(&tcm_vhost_fabric_configfs->tf_ops, wwn,
+				&tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL);
+	if (ret < 0) {
+		kfree(tpg);
+		return NULL;
+	}
+	mutex_lock(&tcm_vhost_mutex);
+	list_add_tail(&tpg->tv_tpg_list, &tcm_vhost_list);
+	mutex_unlock(&tcm_vhost_mutex);
+
+	return &tpg->se_tpg;
+}
+
+static void tcm_vhost_drop_tpg(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+
+	mutex_lock(&tcm_vhost_mutex);
+	list_del(&tpg->tv_tpg_list);
+	mutex_unlock(&tcm_vhost_mutex);
+	/*
+	 * Release the virtual I_T Nexus for this vHost TPG
+	 */
+	tcm_vhost_drop_nexus(tpg);
+	/*
+	 * Deregister the se_tpg from TCM..
+	 */
+	core_tpg_deregister(se_tpg);
+	kfree(tpg);
+}
+
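+/*
+ * Create the emulated target port (struct se_wwn) for a configfs directory
+ * name such as "naa.<id>", "fc.<wwpn>" or "iqn.<name>".  Parsing the name
+ * into a binary WWPN is still a TODO, so tport_wwpn is left at 0 for now.
+ */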
+static struct se_wwn *tcm_vhost_make_tport(
+	struct target_fabric_configfs *tf,
+	struct config_group *group,
+	const char *name)
+{
+	struct tcm_vhost_tport *tport;
+	char *ptr;
+	u64 wwpn = 0;
+	int off = 0;
+
+	/* if (tcm_vhost_parse_wwn(name, &wwpn, 1) < 0)
+		return ERR_PTR(-EINVAL); */
+
+	tport = kzalloc(sizeof(struct tcm_vhost_tport), GFP_KERNEL);
+	if (!tport) {
+		pr_err("Unable to allocate struct tcm_vhost_tport\n");
+		return ERR_PTR(-ENOMEM);
+	}
+	tport->tport_wwpn = wwpn;
+	/* tcm_vhost_format_wwn(&tport->tport_name[0], TCM_VHOST_NAMELEN, wwpn); */
+	/*
+	 * Determine the emulated Protocol Identifier and Target Port Name
+	 * based on the incoming configfs directory name.
+	 */
+	ptr = strstr(name, "naa.");
+	if (ptr) {
+		tport->tport_proto_id = SCSI_PROTOCOL_SAS;
+		goto check_len;
+	}
+	ptr = strstr(name, "fc.");
+	if (ptr) {
+		tport->tport_proto_id = SCSI_PROTOCOL_FCP;
+		off = 3; /* Skip over "fc." */
+		goto check_len;
+	}
+	ptr = strstr(name, "iqn.");
+	if (ptr) {
+		tport->tport_proto_id = SCSI_PROTOCOL_ISCSI;
+		goto check_len;
+	}
+
+	pr_err("Unable to locate prefix for emulated Target Port:"
+			" %s\n", name);
+	return ERR_PTR(-EINVAL);
+
+check_len:
+	if (strlen(name) > TCM_VHOST_NAMELEN) {
+		pr_err("Emulated %s Address: %s, exceeds"
+			" max: %d\n", tcm_vhost_dump_proto_id(tport), name,
+			TCM_VHOST_NAMELEN);
+		kfree(tport);
+		return ERR_PTR(-EINVAL);
+	}
+	snprintf(&tport->tport_name[0], TCM_VHOST_NAMELEN, "%s", &name[off]);
+
+	pr_debug("TCM_VHost_ConfigFS: Allocated emulated Target"
+		" %s Address: %s\n", tcm_vhost_dump_proto_id(tport), name);
+
+	return &tport->tport_wwn;
+}
+
+static void tcm_vhost_drop_tport(struct se_wwn *wwn)
+{
+	struct tcm_vhost_tport *tport = container_of(wwn,
+				struct tcm_vhost_tport, tport_wwn);
+
+	pr_debug("TCM_VHost_ConfigFS: Deallocating emulated Target"
+		" %s Address: %s\n", tcm_vhost_dump_proto_id(tport),
+		tport->tport_name);
+
+	kfree(tport);
+}
+
+static ssize_t tcm_vhost_wwn_show_attr_version(
+	struct target_fabric_configfs *tf,
+	char *page)
+{
+	return sprintf(page, "TCM_VHOST fabric module %s on %s/%s"
+		" on "UTS_RELEASE"\n", TCM_VHOST_VERSION, utsname()->sysname,
+		utsname()->machine);
+}
+
+TF_WWN_ATTR_RO(tcm_vhost, version);
+
+static struct configfs_attribute *tcm_vhost_wwn_attrs[] = {
+	&tcm_vhost_wwn_version.attr,
+	NULL,
+};
+
+static struct target_core_fabric_ops tcm_vhost_ops = {
+	.get_fabric_name		= tcm_vhost_get_fabric_name,
+	.get_fabric_proto_ident		= tcm_vhost_get_fabric_proto_ident,
+	.tpg_get_wwn			= tcm_vhost_get_fabric_wwn,
+	.tpg_get_tag			= tcm_vhost_get_tag,
+	.tpg_get_default_depth		= tcm_vhost_get_default_depth,
+	.tpg_get_pr_transport_id	= tcm_vhost_get_pr_transport_id,
+	.tpg_get_pr_transport_id_len	= tcm_vhost_get_pr_transport_id_len,
+	.tpg_parse_pr_out_transport_id	= tcm_vhost_parse_pr_out_transport_id,
+	.tpg_check_demo_mode		= tcm_vhost_check_true,
+	.tpg_check_demo_mode_cache	= tcm_vhost_check_true,
+	.tpg_check_demo_mode_write_protect = tcm_vhost_check_false,
+	.tpg_check_prod_mode_write_protect = tcm_vhost_check_false,
+	.tpg_alloc_fabric_acl		= tcm_vhost_alloc_fabric_acl,
+	.tpg_release_fabric_acl		= tcm_vhost_release_fabric_acl,
+	.tpg_get_inst_index		= tcm_vhost_tpg_get_inst_index,
+	.new_cmd_map			= tcm_vhost_new_cmd_map,
+	.release_cmd			= tcm_vhost_release_cmd,
+	.shutdown_session		= tcm_vhost_shutdown_session,
+	.close_session			= tcm_vhost_close_session,
+	.sess_get_index			= tcm_vhost_sess_get_index,
+	.sess_get_initiator_sid		= NULL,
+	.write_pending			= tcm_vhost_write_pending,
+	.write_pending_status		= tcm_vhost_write_pending_status,
+	.set_default_node_attributes	= tcm_vhost_set_default_node_attrs,
+	.get_task_tag			= tcm_vhost_get_task_tag,
+	.get_cmd_state			= tcm_vhost_get_cmd_state,
+	.queue_data_in			= tcm_vhost_queue_data_in,
+	.queue_status			= tcm_vhost_queue_status,
+	.queue_tm_rsp			= tcm_vhost_queue_tm_rsp,
+	.get_fabric_sense_len		= tcm_vhost_get_fabric_sense_len,
+	.set_fabric_sense_len		= tcm_vhost_set_fabric_sense_len,
+	/*
+	 * Setup function pointers for generic logic in target_core_fabric_configfs.c
+	 */
+	.fabric_make_wwn		= tcm_vhost_make_tport,
+	.fabric_drop_wwn		= tcm_vhost_drop_tport,
+	.fabric_make_tpg		= tcm_vhost_make_tpg,
+	.fabric_drop_tpg		= tcm_vhost_drop_tpg,
+	.fabric_post_link		= tcm_vhost_port_link,
+	.fabric_pre_unlink		= tcm_vhost_port_unlink,
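+	/* vhost has no fabric network portals, so the NP callbacks stay NULL */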
+	.fabric_make_np			= NULL,
+	.fabric_drop_np			= NULL,
+	.fabric_make_nodeacl		= tcm_vhost_make_nodeacl,
+	.fabric_drop_nodeacl		= tcm_vhost_drop_nodeacl,
+};
+
+static int tcm_vhost_register_configfs(void)
+{
+	struct target_fabric_configfs *fabric;
+	int ret;
+
+	pr_debug("TCM_VHOST fabric module %s on %s/%s"
+		" on "UTS_RELEASE"\n", TCM_VHOST_VERSION, utsname()->sysname,
+		utsname()->machine);
+	/*
+	 * Register the top level struct config_item_type with TCM core
+	 */
+	fabric = target_fabric_configfs_init(THIS_MODULE, "vhost");
+	if (IS_ERR(fabric)) {
+		pr_err("target_fabric_configfs_init() failed\n");
+		return PTR_ERR(fabric);
+	}
+	/*
+	 * Setup fabric->tf_ops from our local tcm_vhost_ops
+	 */
+	fabric->tf_ops = tcm_vhost_ops;
+	/*
+	 * Setup default attribute lists for various fabric->tf_cit_tmpl
+	 */
+	TF_CIT_TMPL(fabric)->tfc_wwn_cit.ct_attrs = tcm_vhost_wwn_attrs;
+	TF_CIT_TMPL(fabric)->tfc_tpg_base_cit.ct_attrs = tcm_vhost_tpg_attrs;
+	TF_CIT_TMPL(fabric)->tfc_tpg_attrib_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_param_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_np_base_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_base_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_attrib_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_auth_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_param_cit.ct_attrs = NULL;
+	/*
+	 * Register the fabric for use within TCM
+	 */
+	ret = target_fabric_configfs_register(fabric);
+	if (ret < 0) {
+		pr_err("target_fabric_configfs_register() failed"
+				" for TCM_VHOST\n");
+		return ret;
+	}
+	/*
+	 * Setup our local pointer to *fabric
+	 */
+	tcm_vhost_fabric_configfs = fabric;
+	pr_debug("TCM_VHOST[0] - Set fabric -> tcm_vhost_fabric_configfs\n");
+	return 0;
+}
+
+static void tcm_vhost_deregister_configfs(void)
+{
+	if (!tcm_vhost_fabric_configfs)
+		return;
+
+	target_fabric_configfs_deregister(tcm_vhost_fabric_configfs);
+	tcm_vhost_fabric_configfs = NULL;
+	pr_debug("TCM_VHOST[0] - Cleared tcm_vhost_fabric_configfs\n");
+}
+
+static int __init tcm_vhost_init(void)
+{
+	int ret;
+
+	ret = vhost_scsi_register();
+	if (ret < 0)
+		return ret;
+
+	ret = tcm_vhost_register_configfs();
+	if (ret < 0) {
+		/* Undo the misc device registration if configfs setup fails */
+		vhost_scsi_deregister();
+		return ret;
+	}
+
+	return 0;
+}
+
+static void tcm_vhost_exit(void)
+{
+	tcm_vhost_deregister_configfs();
+	vhost_scsi_deregister();
+}
+
+MODULE_DESCRIPTION("TCM_VHOST series fabric driver");
+MODULE_LICENSE("GPL");
+module_init(tcm_vhost_init);
+module_exit(tcm_vhost_exit);
diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h
new file mode 100644
index 0000000..0e8951b
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.h
@@ -0,0 +1,70 @@
+#define TCM_VHOST_VERSION  "v0.1"
+#define TCM_VHOST_NAMELEN 256
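+/*
+ * Large enough to hold the fixed 32-byte cdb[] of struct virtio_scsi_cmd_req
+ * (VIRTIO_SCSI_CDB_SIZE), which is copied verbatim into tvc_cdb.
+ */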
+#define TCM_VHOST_MAX_CDB_SIZE 32
+
+struct tcm_vhost_cmd {
+	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
+	int tvc_vq_desc;
+	/* The Tag from include/linux/virtio_scsi.h:struct virtio_scsi_cmd_req */
+	u64 tvc_tag;
+	/* The number of scatterlists associated with this cmd */
+	u32 tvc_sgl_count;
+	/* Pointer to the SGL formatted memory from virtio-scsi */
+	struct scatterlist *tvc_sgl;
+	/* Pointer to response */
+	struct virtio_scsi_cmd_resp __user *tvc_resp;
+	/* Pointer to vhost_scsi for our device */
+	struct vhost_scsi *tvc_vhost;
+	/* The TCM I/O descriptor that is accessed via container_of() */
+	struct se_cmd tvc_se_cmd;
+	/* Copy of the incoming SCSI command descriptor block (CDB) */
+	unsigned char tvc_cdb[TCM_VHOST_MAX_CDB_SIZE];
+	/* Sense buffer that will be mapped into outgoing status */
+	unsigned char tvc_sense_buf[TRANSPORT_SENSE_BUFFER];
+	/* Completed commands list, serviced from vhost worker thread */
+	struct list_head tvc_completion_list;
+};
+
+struct tcm_vhost_nexus {
+	/* Pointer to TCM session for I_T Nexus */
+	struct se_session *tvn_se_sess;
+};
+
+struct tcm_vhost_nacl {
+	/* Binary World Wide unique Port Name for Vhost Initiator port */
+	u64 iport_wwpn;
+	/* ASCII formatted WWPN for the Vhost Initiator port */
+	char iport_name[TCM_VHOST_NAMELEN];
+	/* Returned by tcm_vhost_make_nodeacl() */
+	struct se_node_acl se_node_acl;
+};
+
+struct tcm_vhost_tpg {
+	/* Vhost port target portal group tag for TCM */
+	u16 tport_tpgt;
+	/* Used to track the number of TPG Port/LUN links with regard to explicit I_T Nexus shutdown */
+	atomic_t tv_tpg_port_count;
+	/* Used for vhost_scsi device reference to tpg_nexus */
+	atomic_t tv_tpg_vhost_count;
+	/* list for tcm_vhost_list */
+	struct list_head tv_tpg_list;
+	/* Used to protect access for tpg_nexus */
+	struct mutex tv_tpg_mutex;
+	/* Pointer to the TCM VHost I_T Nexus for this TPG endpoint */
+	struct tcm_vhost_nexus *tpg_nexus;
+	/* Pointer back to tcm_vhost_tport */
+	struct tcm_vhost_tport *tport;
+	/* Returned by tcm_vhost_make_tpg() */
+	struct se_portal_group se_tpg;
+};
+
+struct tcm_vhost_tport {
+	/* SCSI protocol the tport is providing */
+	u8 tport_proto_id;
+	/* Binary World Wide unique Port Name for Vhost Target port */
+	u64 tport_wwpn;
+	/* ASCII formatted WWPN for Vhost Target port */
+	char tport_name[TCM_VHOST_NAMELEN];
+	/* Returned by tcm_vhost_make_tport() */
+	struct se_wwn tport_wwn;
+};
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
                   ` (5 preceding siblings ...)
  2012-07-04  4:24 ` [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04  4:24 ` [PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning Nicholas A. Bellinger
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
	Christoph Hellwig

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch adds the initial code for tcm_vhost, a Vhost level TCM
fabric driver for virtio SCSI initiators into KVM guest.

This code is currently up and running on v3.5-rc2 host+guest along with
the virtio-scsi vdev->scan() patch to allow a proper scsi_scan_host() to
occur once the tcm_vhost nexus has been established by the paravirtualized
virtio-scsi client.

(nab: Merge into single source + header file, and move to drivers/vhost/)

Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/vhost/Kconfig     |    6 +
 drivers/vhost/Makefile    |    1 +
 drivers/vhost/tcm_vhost.c | 1592 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/tcm_vhost.h |   70 ++
 4 files changed, 1669 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a8642e2 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,9 @@ config VHOST_NET
 	  To compile this driver as a module, choose M here: the module will
 	  be called vhost_net.
 
+config TCM_VHOST
+	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
+	depends on TARGET_CORE && EVENTFD && EXPERINETAL && m
+	default n
+	---help---
+	Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..b10c7b1 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
+obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
 vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
new file mode 100644
index 0000000..cd86633
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.c
@@ -0,0 +1,1592 @@
+/*******************************************************************************
+ * Vhost kernel TCM fabric driver for virtio SCSI initiators
+ *
+ * (C) Copyright 2010-2012 RisingTide Systems LLC.
+ * (C) Copyright 2010-2012 IBM Corp.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
+ *
+ * Authors: Nicholas A. Bellinger <nab@risingtidesystems.com>
+ *          Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ ****************************************************************************/
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <generated/utsrelease.h>
+#include <linux/utsname.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/configfs.h>
+#include <linux/ctype.h>
+#include <linux/compat.h>
+#include <linux/eventfd.h>
+#include <linux/vhost.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <asm/unaligned.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_tcq.h>
+#include <target/target_core_base.h>
+#include <target/target_core_fabric.h>
+#include <target/target_core_fabric_configfs.h>
+#include <target/target_core_configfs.h>
+#include <target/configfs_macros.h>
+#include <linux/vhost.h>
+#include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
+#include <linux/virtio_scsi.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "tcm_vhost.h"
+
+struct vhost_scsi {
+	atomic_t vhost_ref_cnt;
+	struct tcm_vhost_tpg *vs_tpg;
+	struct vhost_dev dev;
+	struct vhost_virtqueue vqs[3];
+
+	struct vhost_work vs_completion_work; /* cmd completion work item */
+	struct list_head vs_completion_list;  /* cmd completion queue */
+	spinlock_t vs_completion_lock;        /* protects s_completion_list */
+};
+
+/* Local pointer to allocated TCM configfs fabric module */
+struct target_fabric_configfs *tcm_vhost_fabric_configfs;
+
+/* Global spinlock to protect tcm_vhost TPG list for vhost IOCTL access */
+DEFINE_MUTEX(tcm_vhost_mutex);
+LIST_HEAD(tcm_vhost_list);
+
+static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
+{
+	return 1;
+}
+
+static int tcm_vhost_check_false(struct se_portal_group *se_tpg)
+{
+	return 0;
+}
+
+static char *tcm_vhost_get_fabric_name(void)
+{
+	return "vhost";
+}
+
+static u8 tcm_vhost_get_fabric_proto_ident(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_get_fabric_proto_ident(se_tpg);
+	case SCSI_PROTOCOL_FCP:
+		return fc_get_fabric_proto_ident(se_tpg);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_get_fabric_proto_ident(se_tpg);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_get_fabric_proto_ident(se_tpg);
+}
+
+static char *tcm_vhost_get_fabric_wwn(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	return &tport->tport_name[0];
+}
+
+u16 tcm_vhost_get_tag(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	return tpg->tport_tpgt;
+}
+
+static u32 tcm_vhost_get_default_depth(struct se_portal_group *se_tpg)
+{
+	return 1;
+}
+
+static u32 tcm_vhost_get_pr_transport_id(
+	struct se_portal_group *se_tpg,
+	struct se_node_acl *se_nacl,
+	struct t10_pr_registration *pr_reg,
+	int *format_code,
+	unsigned char *buf)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+					format_code, buf);
+	case SCSI_PROTOCOL_FCP:
+		return fc_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+					format_code, buf);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+					format_code, buf);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+			format_code, buf);
+}
+
+static u32 tcm_vhost_get_pr_transport_id_len(
+	struct se_portal_group *se_tpg,
+	struct se_node_acl *se_nacl,
+	struct t10_pr_registration *pr_reg,
+	int *format_code)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+					format_code);
+	case SCSI_PROTOCOL_FCP:
+		return fc_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+					format_code);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+					format_code);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+			format_code);
+}
+
+static char *tcm_vhost_parse_pr_out_transport_id(
+	struct se_portal_group *se_tpg,
+	const char *buf,
+	u32 *out_tid_len,
+	char **port_nexus_ptr)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport = tpg->tport;
+
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+					port_nexus_ptr);
+	case SCSI_PROTOCOL_FCP:
+		return fc_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+					port_nexus_ptr);
+	case SCSI_PROTOCOL_ISCSI:
+		return iscsi_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+					port_nexus_ptr);
+	default:
+		pr_err("Unknown tport_proto_id: 0x%02x, using"
+			" SAS emulation\n", tport->tport_proto_id);
+		break;
+	}
+
+	return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+			port_nexus_ptr);
+}
+
+static struct se_node_acl *tcm_vhost_alloc_fabric_acl(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_nacl *nacl;
+
+	nacl = kzalloc(sizeof(struct tcm_vhost_nacl), GFP_KERNEL);
+	if (!nacl) {
+		pr_err("Unable to alocate struct tcm_vhost_nacl\n");
+		return NULL;
+	}
+
+	return &nacl->se_node_acl;
+}
+
+static void tcm_vhost_release_fabric_acl(
+	struct se_portal_group *se_tpg,
+	struct se_node_acl *se_nacl)
+{
+	struct tcm_vhost_nacl *nacl = container_of(se_nacl,
+			struct tcm_vhost_nacl, se_node_acl);
+	kfree(nacl);
+}
+
+static u32 tcm_vhost_tpg_get_inst_index(struct se_portal_group *se_tpg)
+{
+	return 1;
+}
+
+/*
+ * Called by struct target_core_fabric_ops->new_cmd_map()
+ *
+ * Always called in process context.  A non zero return value
+ * here will signal to handle an exception based on the return code.
+ */
+static int tcm_vhost_new_cmd_map(struct se_cmd *se_cmd)
+{
+	struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+				struct tcm_vhost_cmd, tvc_se_cmd);
+	struct scatterlist *sg_ptr, *sg_bidi_ptr = NULL;
+	u32 sg_no_bidi = 0;
+	int ret;
+	/*
+	 * Allocate the necessary tasks to complete the received CDB+data
+	 */
+	ret = target_setup_cmd_from_cdb(se_cmd, tv_cmd->tvc_cdb);
+	if (ret != 0)
+		return ret;
+	/*
+	 * Setup the struct scatterlist memory from the received
+	 * struct tcm_vhost_cmd..
+	 */
+	if (tv_cmd->tvc_sgl_count) {
+		sg_ptr = tv_cmd->tvc_sgl;
+		/*
+		 * For BIDI commands, pass in the extra READ buffer
+		 * to transport_generic_map_mem_to_cmd() below..
+		 */
+/* FIXME: Fix BIDI operation in tcm_vhost_new_cmd_map() */
+#if 0
+		if (se_cmd->se_cmd_flags & SCF_BIDI) {
+			mem_bidi_ptr = NULL;
+			sg_no_bidi = 0;
+		}
+#endif
+	} else {
+		/*
+		 * Used for DMA_NONE
+		 */
+		sg_ptr = NULL;
+	}
+
+	/* Tell the core about our preallocated memory */
+	return transport_generic_map_mem_to_cmd(se_cmd, sg_ptr,
+				tv_cmd->tvc_sgl_count, sg_bidi_ptr,
+				sg_no_bidi);
+}
+
+static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
+{
+	return;
+}
+
+static int tcm_vhost_shutdown_session(struct se_session *se_sess)
+{
+	return 0;
+}
+
+static void tcm_vhost_close_session(struct se_session *se_sess)
+{
+	return;
+}
+
+static u32 tcm_vhost_sess_get_index(struct se_session *se_sess)
+{
+	return 0;
+}
+
+static int tcm_vhost_write_pending(struct se_cmd *se_cmd)
+{
+	/* Go ahead and process the write immediately */
+	transport_generic_process_write(se_cmd);
+	return 0;
+}
+
+static int tcm_vhost_write_pending_status(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static void tcm_vhost_set_default_node_attrs(struct se_node_acl *nacl)
+{
+	return;
+}
+
+static u32 tcm_vhost_get_task_tag(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static int tcm_vhost_get_cmd_state(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *);
+
+static int tcm_vhost_queue_data_in(struct se_cmd *se_cmd)
+{
+	struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+				struct tcm_vhost_cmd, tvc_se_cmd);
+	vhost_scsi_complete_cmd(tv_cmd);
+	return 0;
+}
+
+static int tcm_vhost_queue_status(struct se_cmd *se_cmd)
+{
+	struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+				struct tcm_vhost_cmd, tvc_se_cmd);
+	vhost_scsi_complete_cmd(tv_cmd);
+	return 0;
+}
+
+static int tcm_vhost_queue_tm_rsp(struct se_cmd *se_cmd)
+{
+	return 0;
+}
+
+static u16 tcm_vhost_set_fabric_sense_len(struct se_cmd *se_cmd, u32 sense_length)
+{
+	return 0;
+}
+
+static u16 tcm_vhost_get_fabric_sense_len(void)
+{
+	return 0;
+}
+
+static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
+{
+	struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
+
+	/* TODO locking against target/backend threads? */
+	transport_generic_free_cmd(se_cmd, 1);
+
+	if (tv_cmd->tvc_sgl_count) {
+		u32 i;
+		for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
+			put_page(sg_page(&tv_cmd->tvc_sgl[i]));
+	}
+
+	kfree(tv_cmd);
+}
+
+/* Dequeue a command from the completion list */
+static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(struct vhost_scsi *vs)
+{
+	struct tcm_vhost_cmd *tv_cmd = NULL;
+
+	spin_lock_bh(&vs->vs_completion_lock);
+	if (list_empty(&vs->vs_completion_list)) {
+		spin_unlock_bh(&vs->vs_completion_lock);
+		return NULL;
+	}
+
+	list_for_each_entry(tv_cmd, &vs->vs_completion_list,
+			    tvc_completion_list) {
+		list_del(&tv_cmd->tvc_completion_list);
+		break;
+	}
+	spin_unlock_bh(&vs->vs_completion_lock);
+	return tv_cmd;
+}
+
+/* Fill in status and signal that we are done processing this command
+ *
+ * This is scheduled in the vhost work queue so we are called with the owner
+ * process mm and can access the vring.
+ */
+static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
+{
+	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
+	                                     vs_completion_work);
+	struct tcm_vhost_cmd *tv_cmd;
+
+	while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs)) != NULL) {
+		struct virtio_scsi_cmd_resp v_rsp;
+		struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
+		int ret;
+
+		pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__,
+			tv_cmd, se_cmd->residual_count, se_cmd->scsi_status);
+
+		memset(&v_rsp, 0, sizeof(v_rsp));
+		v_rsp.resid = se_cmd->residual_count;
+		/* TODO is status_qualifier field needed? */
+		v_rsp.status = se_cmd->scsi_status;
+		v_rsp.sense_len = se_cmd->scsi_sense_length;
+		memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf,
+		       v_rsp.sense_len);
+		ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp));
+		if (likely(ret == 0))
+			vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0);
+		else
+			pr_err("Faulted on virtio_scsi_cmd_resp\n");
+
+		vhost_scsi_free_cmd(tv_cmd);
+	}
+
+	vhost_signal(&vs->dev, &vs->vqs[2]);
+}
+
+static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd)
+{
+	struct vhost_scsi *vs = tv_cmd->tvc_vhost;
+
+	pr_debug("%s tv_cmd %p\n", __func__, tv_cmd);
+
+	spin_lock_bh(&vs->vs_completion_lock);
+	list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
+	spin_unlock_bh(&vs->vs_completion_lock);
+
+	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+}
+
+static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
+	struct tcm_vhost_tpg *tv_tpg,
+	struct virtio_scsi_cmd_req *v_req,
+	u32 exp_data_len,
+	int data_direction)
+{
+	struct tcm_vhost_cmd *tv_cmd;
+	struct tcm_vhost_nexus *tv_nexus;
+	struct se_portal_group *se_tpg = &tv_tpg->se_tpg;
+	struct se_session *se_sess;
+	struct se_cmd *se_cmd;
+	int sam_task_attr;
+
+	tv_nexus = tv_tpg->tpg_nexus;
+	if (!tv_nexus) {
+		pr_err("Unable to locate active struct tcm_vhost_nexus\n");
+		return ERR_PTR(-EIO);
+	}
+	se_sess = tv_nexus->tvn_se_sess;
+
+	tv_cmd = kzalloc(sizeof(struct tcm_vhost_cmd), GFP_ATOMIC);
+	if (!tv_cmd) {
+		pr_err("Unable to allocate struct tcm_vhost_cmd\n");
+		return ERR_PTR(-ENOMEM);
+	}
+	INIT_LIST_HEAD(&tv_cmd->tvc_completion_list);
+	tv_cmd->tvc_tag = v_req->tag;
+
+	se_cmd = &tv_cmd->tvc_se_cmd;
+	/*
+	 * Locate the SAM Task Attr from virtio_scsi_cmd_req
+	 */
+	sam_task_attr = v_req->task_attr;
+	/*
+	 * Initialize struct se_cmd descriptor from target_core_mod infrastructure
+	 */
+	transport_init_se_cmd(se_cmd, se_tpg->se_tpg_tfo, se_sess, exp_data_len,
+				data_direction, sam_task_attr,
+				&tv_cmd->tvc_sense_buf[0]);
+
+#if 0	/* FIXME: vhost_scsi_allocate_cmd() BIDI operation */
+	if (bidi)
+		se_cmd->se_cmd_flags |= SCF_BIDI;
+#endif
+	/*
+	 * From here the rest of the se_cmd will be setup and dispatched
+	 * via tcm_vhost_new_cmd_map() from TCM backend thread context
+	 * after transport_generic_handle_cdb_map() has been called from
+	 * vhost_scsi_handle_vq() below..
+	 */
+	return tv_cmd;
+}
+
+/*
+ * Map a user memory range into a scatterlist
+ *
+ * Returns the number of scatterlist entries used or -errno on error.
+ */
+static int vhost_scsi_map_to_sgl(struct scatterlist *sgl,
+		                 unsigned int sgl_count,
+		                 void __user *ptr, size_t len, int write)
+{
+	struct scatterlist *sg = sgl;
+	unsigned int npages = 0;
+	int ret;
+
+	while (len > 0) {
+		struct page *page;
+		unsigned int offset = (uintptr_t)ptr & ~PAGE_MASK;
+		unsigned int nbytes = min(PAGE_SIZE - offset, len);
+
+		if (npages == sgl_count) {
+			ret = -ENOBUFS;
+			goto err;
+		}
+
+		ret = get_user_pages_fast((unsigned long)ptr, 1, write, &page);
+		BUG_ON(ret == 0); /* we should either get our page or fail */
+		if (ret < 0)
+			goto err;
+
+		sg_set_page(sg, page, nbytes, offset);
+		ptr += nbytes;
+		len -= nbytes;
+		sg++;
+		npages++;
+	}
+	return npages;
+
+err:
+	/* Put pages that we hold */
+	for (sg = sgl; sg != &sgl[npages]; sg++)
+		put_page(sg_page(sg));
+	return ret;
+}
+
+static int vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *tv_cmd,
+                                     struct iovec *iov, unsigned int niov,
+				     int write)
+{
+	int ret;
+	unsigned int i;
+	u32 sgl_count;
+	struct scatterlist *sg;
+
+	/*
+	 * Find out how long sglist needs to be
+	 */
+	sgl_count = 0;
+	for (i = 0; i < niov; i++) {
+		sgl_count += (((uintptr_t)iov[i].iov_base + iov[i].iov_len +
+		             PAGE_SIZE - 1) >> PAGE_SHIFT) -
+		             ((uintptr_t)iov[i].iov_base >> PAGE_SHIFT);
+	}
+	/* TODO overflow checking */
+
+	sg = kmalloc(sizeof(tv_cmd->tvc_sgl[0]) * sgl_count, GFP_ATOMIC);
+	if (!sg)
+		return -ENOMEM;
+	pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__,
+	       sg, sgl_count, IS_ERR(sg));
+	sg_init_table(sg, sgl_count);
+
+	tv_cmd->tvc_sgl = sg;
+	tv_cmd->tvc_sgl_count = sgl_count;
+
+	pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count);
+	for (i = 0; i < niov; i++) {
+		ret = vhost_scsi_map_to_sgl(sg, sgl_count, iov[i].iov_base,
+		                            iov[i].iov_len, write);
+		if (ret < 0) {
+			for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
+				put_page(sg_page(&tv_cmd->tvc_sgl[i]));
+			kfree(tv_cmd->tvc_sgl);
+			tv_cmd->tvc_sgl = NULL;
+			tv_cmd->tvc_sgl_count = 0;
+			return ret;
+		}
+
+		sg += ret;
+		sgl_count -= ret;
+	}
+	return 0;
+}
+
+static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
+{
+	struct vhost_virtqueue *vq = &vs->vqs[2];
+	struct virtio_scsi_cmd_req v_req;
+	struct tcm_vhost_tpg *tv_tpg;
+	struct tcm_vhost_cmd *tv_cmd;
+	u32 exp_data_len, data_first, data_num, data_direction;
+	unsigned out, in, i;
+	int head, ret, lun;
+
+	/* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
+	tv_tpg = vs->vs_tpg;
+	if (unlikely(!tv_tpg)) {
+		pr_err("%s endpoint not set\n", __func__);
+		return;
+	}
+
+	mutex_lock(&vq->mutex);
+	vhost_disable_notify(&vs->dev, vq);
+
+	for (;;) {
+		head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
+					ARRAY_SIZE(vq->iov), &out, &in,
+					NULL, NULL);
+		pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n", head, out, in);
+		/* On error, stop handling until the next kick. */
+		if (unlikely(head < 0))
+			break;
+		/* Nothing new?  Wait for eventfd to tell us they refilled. */
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
+				vhost_disable_notify(&vs->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+/* FIXME: BIDI operation */
+		if (out == 1 && in == 1) {
+			data_direction = DMA_NONE;
+			data_first = 0;
+			data_num = 0;
+		} else if (out == 1 && in > 1) {
+			data_direction = DMA_FROM_DEVICE;
+			data_first = out + 1;
+			data_num = in - 1;
+		} else if (out > 1 && in == 1) {
+			data_direction = DMA_TO_DEVICE;
+			data_first = 1;
+			data_num = out - 1;
+		} else {
+			pr_err("Invalid buffer layout out: %u in: %u\n", out, in);
+			break;
+		}
+
+		/*
+		 * Check for a sane resp buffer so we can report errors to
+		 * the guest.
+		 */
+		if (unlikely(vq->iov[out].iov_len !=
+					sizeof(struct virtio_scsi_cmd_resp))) {
+			pr_err("Expecting virtio_scsi_cmd_resp, got %zu bytes\n",
+					vq->iov[out].iov_len);
+			break;
+		}
+
+		if (unlikely(vq->iov[0].iov_len != sizeof(v_req))) {
+			pr_err("Expecting virtio_scsi_cmd_req, got %zu bytes\n",
+					vq->iov[0].iov_len);
+			break;
+		}
+		pr_debug("Calling __copy_from_user: vq->iov[0].iov_base: %p, len: %lu\n",
+				vq->iov[0].iov_base, sizeof(v_req));
+		ret = __copy_from_user(&v_req, vq->iov[0].iov_base, sizeof(v_req));
+		if (unlikely(ret)) {
+			pr_err("Faulted on virtio_scsi_cmd_req\n");
+			break;
+		}
+
+		exp_data_len = 0;
+		for (i = 0; i < data_num; i++) {
+			exp_data_len += vq->iov[data_first + i].iov_len;
+		}
+
+		tv_cmd = vhost_scsi_allocate_cmd(tv_tpg, &v_req,
+					exp_data_len, data_direction);
+		if (IS_ERR(tv_cmd)) {
+			pr_err("vhost_scsi_allocate_cmd failed %ld\n", PTR_ERR(tv_cmd));
+			break;
+		}
+		pr_debug("Allocated tv_cmd: %p exp_data_len: %d, data_direction: %d\n",
+				tv_cmd, exp_data_len, data_direction);
+
+		tv_cmd->tvc_vhost = vs;
+
+		if (unlikely(vq->iov[out].iov_len !=
+		             sizeof(struct virtio_scsi_cmd_resp))) {
+			pr_err("Expecting virtio_scsi_cmd_resp, "
+			       " got %zu bytes, out: %d, in: %d\n", vq->iov[out].iov_len, out, in);
+			break;
+		}
+
+		tv_cmd->tvc_resp = vq->iov[out].iov_base;
+
+		/*
+		 * Copy in the recieved CDB descriptor into tv_cmd->tvc_cdb
+		 * that will be used by tcm_vhost_new_cmd_map() and down into
+		 * target_setup_cmd_from_cdb()
+		 */
+		memcpy(tv_cmd->tvc_cdb, v_req.cdb, TCM_VHOST_MAX_CDB_SIZE);
+		/*
+		 * Check that the recieved CDB size does not exceeded our
+		 * hardcoded max for tcm_vhost
+		 */
+		/* TODO what if cdb was too small for varlen cdb header? */
+		if (unlikely(scsi_command_size(tv_cmd->tvc_cdb) > TCM_VHOST_MAX_CDB_SIZE)) {
+			pr_err("Received SCSI CDB with command_size: %d that exceeds"
+				" SCSI_MAX_VARLEN_CDB_SIZE: %d\n",
+				scsi_command_size(tv_cmd->tvc_cdb), TCM_VHOST_MAX_CDB_SIZE);
+			break; /* TODO */
+		}
+		lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
+
+		pr_debug("vhost_scsi got command opcode: %#02x, lun: %d\n",
+			tv_cmd->tvc_cdb[0], lun);
+
+		if (data_direction != DMA_NONE) {
+			ret = vhost_scsi_map_iov_to_sgl(tv_cmd, &vq->iov[data_first],
+					data_num, data_direction == DMA_TO_DEVICE);
+			if (unlikely(ret)) {
+				pr_err("Failed to map iov to sgl\n");
+				break; /* TODO */
+			}
+		}
+
+		/*
+		 * Save the descriptor from vhost_get_vq_desc() to be used to
+		 * complete the virtio-scsi request in TCM callback context via
+		 * tcm_vhost_queue_data_in() and tcm_vhost_queue_status()
+		 */
+		tv_cmd->tvc_vq_desc = head;
+		/*
+		 * Locate the struct se_lun pointer based on v_req->lun, and
+		 * attach it to struct se_cmd
+		 */
+		if (transport_lookup_cmd_lun(&tv_cmd->tvc_se_cmd, lun) < 0) {
+			pr_err("Failed to look up lun: %d\n", lun);
+			/* NON_EXISTENT_LUN */
+			transport_send_check_condition_and_sense(&tv_cmd->tvc_se_cmd,
+					tv_cmd->tvc_se_cmd.scsi_sense_reason, 0);
+			continue;
+		}
+		/*
+		 * Now queue up the newly allocated se_cmd to be processed
+		 * within TCM thread context to finish the setup and dispatched
+		 * into a TCM backend struct se_device.
+		 */
+		transport_generic_handle_cdb_map(&tv_cmd->tvc_se_cmd);
+	}
+
+	mutex_unlock(&vq->mutex);
+}
+
+static void vhost_scsi_ctl_handle_kick(struct vhost_work *work)
+{
+     pr_err("%s: The handling func for control queue.\n", __func__);
+}
+
+static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
+{
+     pr_err("%s: The handling func for event queue.\n", __func__);
+}
+
+static void vhost_scsi_handle_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+						poll.work);
+	struct vhost_scsi *vs = container_of(vq->dev, struct vhost_scsi, dev);
+
+	vhost_scsi_handle_vq(vs);
+}
+
+/*
+ * Called from vhost_scsi_ioctl() context to walk the list of available tcm_vhost_tpg
+ * with an active struct tcm_vhost_nexus
+ */
+static int vhost_scsi_set_endpoint(
+	struct vhost_scsi *vs,
+	struct vhost_vring_target *t)
+{
+	struct tcm_vhost_tport *tv_tport;
+	struct tcm_vhost_tpg *tv_tpg;
+        int index;
+
+	mutex_lock(&vs->dev.mutex);
+	/* Verify that ring has been setup correctly. */
+	for (index = 0; index < vs->dev.nvqs; ++index) {
+		/* Verify that ring has been setup correctly. */
+		if (!vhost_vq_access_ok(&vs->vqs[index])) {
+		        mutex_unlock(&vs->dev.mutex);
+			return -EFAULT;
+		}
+	}
+
+	if (vs->vs_tpg) {
+		mutex_unlock(&vs->dev.mutex);
+		return -EEXIST;
+	}
+	mutex_unlock(&vs->dev.mutex);
+
+	mutex_lock(&tcm_vhost_mutex);
+	list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
+		mutex_lock(&tv_tpg->tv_tpg_mutex);
+		if (!tv_tpg->tpg_nexus) {
+			mutex_unlock(&tv_tpg->tv_tpg_mutex);
+			continue;
+		}
+		if (atomic_read(&tv_tpg->tv_tpg_vhost_count)) {
+			mutex_unlock(&tv_tpg->tv_tpg_mutex);
+			continue;
+		}
+		tv_tport = tv_tpg->tport;
+
+		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn) &&
+		    (tv_tpg->tport_tpgt == t->vhost_tpgt)) {
+			atomic_inc(&tv_tpg->tv_tpg_vhost_count);
+			smp_mb__after_atomic_inc();
+			mutex_unlock(&tv_tpg->tv_tpg_mutex);
+			mutex_unlock(&tcm_vhost_mutex);
+
+			mutex_lock(&vs->dev.mutex);
+			vs->vs_tpg = tv_tpg;
+			atomic_inc(&vs->vhost_ref_cnt);
+			smp_mb__after_atomic_inc();
+			mutex_unlock(&vs->dev.mutex);
+			return 0;
+		}
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+	}
+	mutex_unlock(&tcm_vhost_mutex);
+	return -EINVAL;
+}
+
+static int vhost_scsi_clear_endpoint(
+	struct vhost_scsi *vs,
+	struct vhost_vring_target *t)
+{
+	struct tcm_vhost_tport *tv_tport;
+	struct tcm_vhost_tpg *tv_tpg;
+        int index;
+
+	mutex_lock(&vs->dev.mutex);
+	/* Verify that ring has been setup correctly. */
+	for (index = 0; index < vs->dev.nvqs; ++index) {
+		if (!vhost_vq_access_ok(&vs->vqs[index])) {
+		        mutex_unlock(&vs->dev.mutex);
+			return -EFAULT;
+		}
+	}
+
+	if (!vs->vs_tpg) {
+		mutex_unlock(&vs->dev.mutex);
+		return -ENODEV;
+	}
+	tv_tpg = vs->vs_tpg;
+	tv_tport = tv_tpg->tport;
+
+	if (strcmp(tv_tport->tport_name, t->vhost_wwpn) ||
+	    (tv_tpg->tport_tpgt != t->vhost_tpgt)) {
+		mutex_unlock(&vs->dev.mutex);
+		pr_warn("tv_tport->tport_name: %s, tv_tpg->tport_tpgt: %hu"
+			" does not match t->vhost_wwpn: %s, t->vhost_tpgt: %hu\n",
+			tv_tport->tport_name, tv_tpg->tport_tpgt,
+			t->vhost_wwpn, t->vhost_tpgt);
+		return -EINVAL;
+	}
+        atomic_dec(&tv_tpg->tv_tpg_vhost_count);
+	vs->vs_tpg = NULL;
+	mutex_unlock(&vs->dev.mutex);
+
+	return 0;
+}
+
+static int vhost_scsi_open(struct inode *inode, struct file *f)
+{
+	struct vhost_scsi *s;
+	int r;
+
+	s = kzalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work);
+	INIT_LIST_HEAD(&s->vs_completion_list);
+	spin_lock_init(&s->vs_completion_lock);
+
+	s->vqs[0].handle_kick = vhost_scsi_ctl_handle_kick;
+	s->vqs[1].handle_kick = vhost_scsi_evt_handle_kick;
+	s->vqs[2].handle_kick = vhost_scsi_handle_kick;
+	r = vhost_dev_init(&s->dev, s->vqs, 3);
+	if (r < 0) {
+		kfree(s);
+		return r;
+	}
+
+	f->private_data = s;
+	return 0;
+}
+
+static int vhost_scsi_release(struct inode *inode, struct file *f)
+{
+	struct vhost_scsi *s = f->private_data;
+
+        if (s->vs_tpg && s->vs_tpg->tport) {
+            struct vhost_vring_target backend;
+            memcpy(backend.vhost_wwpn, s->vs_tpg->tport->tport_name, sizeof(backend.vhost_wwpn));
+            backend.vhost_tpgt = s->vs_tpg->tport_tpgt;
+            vhost_scsi_clear_endpoint(s, &backend);
+        }
+
+	vhost_dev_cleanup(&s->dev, false);
+	kfree(s);
+	return 0;
+}
+
+static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
+{
+	if (features & ~VHOST_FEATURES)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&vs->dev.mutex);
+	if ((features & (1 << VHOST_F_LOG_ALL)) &&
+	    !vhost_log_access_ok(&vs->dev)) {
+		mutex_unlock(&vs->dev.mutex);
+		return -EFAULT;
+	}
+	vs->dev.acked_features = features;
+	/* TODO possibly smp_wmb() and flush vqs */
+	mutex_unlock(&vs->dev.mutex);
+	return 0;
+}
+
+static long vhost_scsi_ioctl(struct file *f, unsigned int ioctl,
+				unsigned long arg)
+{
+	struct vhost_scsi *vs = f->private_data;
+	struct vhost_vring_target backend;
+	void __user *argp = (void __user *)arg;
+	u64 __user *featurep = argp;
+	u64 features;
+	int r;
+
+	switch (ioctl) {
+	case VHOST_SCSI_SET_ENDPOINT:
+		if (copy_from_user(&backend, argp, sizeof backend))
+			return -EFAULT;
+
+		return vhost_scsi_set_endpoint(vs, &backend);
+	case VHOST_SCSI_CLEAR_ENDPOINT:
+		if (copy_from_user(&backend, argp, sizeof backend))
+			return -EFAULT;
+
+		return vhost_scsi_clear_endpoint(vs, &backend);
+	case VHOST_GET_FEATURES:
+		features = VHOST_FEATURES;
+		if (copy_to_user(featurep, &features, sizeof features))
+			return -EFAULT;
+		return 0;
+	case VHOST_SET_FEATURES:
+		if (copy_from_user(&features, featurep, sizeof features))
+			return -EFAULT;
+		return vhost_scsi_set_features(vs, features);
+	default:
+		mutex_lock(&vs->dev.mutex);
+		r = vhost_dev_ioctl(&vs->dev, ioctl, arg);
+		mutex_unlock(&vs->dev.mutex);
+		return r;
+	}
+}
+
+static const struct file_operations vhost_scsi_fops = {
+	.owner          = THIS_MODULE,
+	.release        = vhost_scsi_release,
+	.unlocked_ioctl = vhost_scsi_ioctl,
+	/* TODO compat ioctl? */
+	.open           = vhost_scsi_open,
+	.llseek		= noop_llseek,
+};
+
+static struct miscdevice vhost_scsi_misc = {
+	MISC_DYNAMIC_MINOR,
+	"vhost-scsi",
+	&vhost_scsi_fops,
+};
+
+static int __init vhost_scsi_register(void)
+{
+	return misc_register(&vhost_scsi_misc);
+}
+
+static int vhost_scsi_deregister(void)
+{
+	return misc_deregister(&vhost_scsi_misc);
+}
+
+static char *tcm_vhost_dump_proto_id(struct tcm_vhost_tport *tport)
+{
+	switch (tport->tport_proto_id) {
+	case SCSI_PROTOCOL_SAS:
+		return "SAS";
+	case SCSI_PROTOCOL_FCP:
+		return "FCP";
+	case SCSI_PROTOCOL_ISCSI:
+		return "iSCSI";
+	default:
+		break;
+	}
+
+	return "Unknown";
+}
+
+static int tcm_vhost_port_link(
+	struct se_portal_group *se_tpg,
+	struct se_lun *lun)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+
+	atomic_inc(&tv_tpg->tv_tpg_port_count);
+	smp_mb__after_atomic_inc();
+
+	return 0;
+}
+
+static void tcm_vhost_port_unlink(
+	struct se_portal_group *se_tpg,
+	struct se_lun *se_lun)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+
+	atomic_dec(&tv_tpg->tv_tpg_port_count);
+	smp_mb__after_atomic_dec();
+}
+
+static struct se_node_acl *tcm_vhost_make_nodeacl(
+	struct se_portal_group *se_tpg,
+	struct config_group *group,
+	const char *name)
+{
+	struct se_node_acl *se_nacl, *se_nacl_new;
+	struct tcm_vhost_nacl *nacl;
+	u64 wwpn = 0;
+	u32 nexus_depth;
+
+	/* tcm_vhost_parse_wwn(name, &wwpn, 1) < 0)
+		return ERR_PTR(-EINVAL); */
+	se_nacl_new = tcm_vhost_alloc_fabric_acl(se_tpg);
+	if (!se_nacl_new)
+		return ERR_PTR(-ENOMEM);
+//#warning FIXME: Hardcoded nexus depth in tcm_vhost_make_nodeacl()
+	nexus_depth = 1;
+	/*
+	 * se_nacl_new may be released by core_tpg_add_initiator_node_acl()
+	 * when converting a NodeACL from demo mode -> explict
+	 */
+	se_nacl = core_tpg_add_initiator_node_acl(se_tpg, se_nacl_new,
+				name, nexus_depth);
+	if (IS_ERR(se_nacl)) {
+		tcm_vhost_release_fabric_acl(se_tpg, se_nacl_new);
+		return se_nacl;
+	}
+	/*
+	 * Locate our struct tcm_vhost_nacl and set the FC Nport WWPN
+	 */
+	nacl = container_of(se_nacl, struct tcm_vhost_nacl, se_node_acl);
+	nacl->iport_wwpn = wwpn;
+	/* tcm_vhost_format_wwn(&nacl->iport_name[0], TCM_VHOST_NAMELEN, wwpn); */
+
+	return se_nacl;
+}
+
+static void tcm_vhost_drop_nodeacl(struct se_node_acl *se_acl)
+{
+	struct tcm_vhost_nacl *nacl = container_of(se_acl,
+				struct tcm_vhost_nacl, se_node_acl);
+	core_tpg_del_initiator_node_acl(se_acl->se_tpg, se_acl, 1);
+	kfree(nacl);
+}
+
+static int tcm_vhost_make_nexus(
+	struct tcm_vhost_tpg *tv_tpg,
+	const char *name)
+{
+	struct se_portal_group *se_tpg;
+	struct tcm_vhost_nexus *tv_nexus;
+
+	mutex_lock(&tv_tpg->tv_tpg_mutex);
+	if (tv_tpg->tpg_nexus) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		pr_debug("tv_tpg->tpg_nexus already exists\n");
+		return -EEXIST;
+	}
+	se_tpg = &tv_tpg->se_tpg;
+
+	tv_nexus = kzalloc(sizeof(struct tcm_vhost_nexus), GFP_KERNEL);
+	if (!tv_nexus) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		pr_err("Unable to allocate struct tcm_vhost_nexus\n");
+		return -ENOMEM;
+	}
+	/*
+	 *  Initialize the struct se_session pointer
+	 */
+	tv_nexus->tvn_se_sess = transport_init_session();
+	if (IS_ERR(tv_nexus->tvn_se_sess)) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		kfree(tv_nexus);
+		return -ENOMEM;
+	}
+	/*
+	 * Since we are running in 'demo mode' this call with generate a
+	 * struct se_node_acl for the tcm_vhost struct se_portal_group with
+	 * the SCSI Initiator port name of the passed configfs group 'name'.
+	 */
+	tv_nexus->tvn_se_sess->se_node_acl = core_tpg_check_initiator_node_acl(
+				se_tpg, (unsigned char *)name);
+	if (!tv_nexus->tvn_se_sess->se_node_acl) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		pr_debug("core_tpg_check_initiator_node_acl() failed"
+				" for %s\n", name);
+		transport_free_session(tv_nexus->tvn_se_sess);
+		kfree(tv_nexus);
+		return -ENOMEM;
+	}
+	/*
+	 * Now register the TCM vHost virtual I_T Nexus as active with the
+	 * call to __transport_register_session()
+	 */
+	__transport_register_session(se_tpg, tv_nexus->tvn_se_sess->se_node_acl,
+			tv_nexus->tvn_se_sess, tv_nexus);
+	tv_tpg->tpg_nexus = tv_nexus;
+
+	mutex_unlock(&tv_tpg->tv_tpg_mutex);
+	return 0;
+}
+
+static int tcm_vhost_drop_nexus(
+	struct tcm_vhost_tpg *tpg)
+{
+	struct se_session *se_sess;
+	struct tcm_vhost_nexus *tv_nexus;
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	tv_nexus = tpg->tpg_nexus;
+	if (!tv_nexus) {
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		return -ENODEV;
+	}
+
+	se_sess = tv_nexus->tvn_se_sess;
+	if (!se_sess) {
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		return -ENODEV;
+	}
+
+	if (atomic_read(&tpg->tv_tpg_port_count)) {
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		pr_err("Unable to remove TCM_vHost I_T Nexus with"
+			" active TPG port count: %d\n",
+			atomic_read(&tpg->tv_tpg_port_count));
+		return -EPERM;
+	}
+
+	if (atomic_read(&tpg->tv_tpg_vhost_count)) {
+		pr_err("Unable to remove TCM_vHost I_T Nexus with"
+			" active TPG vhost count: %d\n",
+			atomic_read(&tpg->tv_tpg_vhost_count));
+		return -EPERM;
+	}
+
+	pr_debug("TCM_vHost_ConfigFS: Removing I_T Nexus to emulated"
+		" %s Initiator Port: %s\n", tcm_vhost_dump_proto_id(tpg->tport),
+		tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+	/*
+	 * Release the SCSI I_T Nexus to the emulated vHost Target Port
+	 */
+	transport_deregister_session(tv_nexus->tvn_se_sess);
+	tpg->tpg_nexus = NULL;
+	mutex_unlock(&tpg->tv_tpg_mutex);
+
+	kfree(tv_nexus);
+	return 0;
+}
+
+static ssize_t tcm_vhost_tpg_show_nexus(
+	struct se_portal_group *se_tpg,
+	char *page)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_nexus *tv_nexus;
+	ssize_t ret;
+
+	mutex_lock(&tv_tpg->tv_tpg_mutex);
+	tv_nexus = tv_tpg->tpg_nexus;
+	if (!tv_nexus) {
+		mutex_unlock(&tv_tpg->tv_tpg_mutex);
+		return -ENODEV;
+	}
+	ret = snprintf(page, PAGE_SIZE, "%s\n",
+			tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+	mutex_unlock(&tv_tpg->tv_tpg_mutex);
+
+	return ret;
+}
+
+static ssize_t tcm_vhost_tpg_store_nexus(
+	struct se_portal_group *se_tpg,
+	const char *page,
+	size_t count)
+{
+	struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+	struct tcm_vhost_tport *tport_wwn = tv_tpg->tport;
+	unsigned char i_port[TCM_VHOST_NAMELEN], *ptr, *port_ptr;
+	int ret;
+	/*
+	 * Shutdown the active I_T nexus if 'NULL' is passed..
+	 */
+	if (!strncmp(page, "NULL", 4)) {
+		ret = tcm_vhost_drop_nexus(tv_tpg);
+		return (!ret) ? count : ret;
+	}
+	/*
+	 * Otherwise make sure the passed virtual Initiator port WWN matches
+	 * the fabric protocol_id set in tcm_vhost_make_tport(), and call
+	 * tcm_vhost_make_nexus().
+	 */
+	if (strlen(page) > TCM_VHOST_NAMELEN) {
+		pr_err("Emulated NAA Sas Address: %s, exceeds"
+				" max: %d\n", page, TCM_VHOST_NAMELEN);
+		return -EINVAL;
+	}
+	snprintf(&i_port[0], TCM_VHOST_NAMELEN, "%s", page);
+
+	ptr = strstr(i_port, "naa.");
+	if (ptr) {
+		if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_SAS) {
+			pr_err("Passed SAS Initiator Port %s does not"
+				" match target port protoid: %s\n", i_port,
+				tcm_vhost_dump_proto_id(tport_wwn));
+			return -EINVAL;
+		}
+		port_ptr = &i_port[0];
+		goto check_newline;
+	}
+	ptr = strstr(i_port, "fc.");
+	if (ptr) {
+		if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_FCP) {
+			pr_err("Passed FCP Initiator Port %s does not"
+				" match target port protoid: %s\n", i_port,
+				tcm_vhost_dump_proto_id(tport_wwn));
+			return -EINVAL;
+		}
+		port_ptr = &i_port[3]; /* Skip over "fc." */
+		goto check_newline;
+	}
+	ptr = strstr(i_port, "iqn.");
+	if (ptr) {
+		if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_ISCSI) {
+			pr_err("Passed iSCSI Initiator Port %s does not"
+				" match target port protoid: %s\n", i_port,
+				tcm_vhost_dump_proto_id(tport_wwn));
+			return -EINVAL;
+		}
+		port_ptr = &i_port[0];
+		goto check_newline;
+	}
+	pr_err("Unable to locate prefix for emulated Initiator Port:"
+			" %s\n", i_port);
+	return -EINVAL;
+	/*
+	 * Clear any trailing newline for the NAA WWN
+	 */
+check_newline:
+	if (i_port[strlen(i_port)-1] == '\n')
+		i_port[strlen(i_port)-1] = '\0';
+
+	ret = tcm_vhost_make_nexus(tv_tpg, port_ptr);
+	if (ret < 0)
+		return ret;
+
+	return count;
+}
+
+TF_TPG_BASE_ATTR(tcm_vhost, nexus, S_IRUGO | S_IWUSR);
+
+static struct configfs_attribute *tcm_vhost_tpg_attrs[] = {
+	&tcm_vhost_tpg_nexus.attr,
+	NULL,
+};
+
+static struct se_portal_group *tcm_vhost_make_tpg(
+	struct se_wwn *wwn,
+	struct config_group *group,
+	const char *name)
+{
+	struct tcm_vhost_tport*tport = container_of(wwn,
+			struct tcm_vhost_tport, tport_wwn);
+
+	struct tcm_vhost_tpg *tpg;
+	unsigned long tpgt;
+	int ret;
+
+	if (strstr(name, "tpgt_") != name)
+		return ERR_PTR(-EINVAL);
+	if (strict_strtoul(name + 5, 10, &tpgt) || tpgt > UINT_MAX)
+		return ERR_PTR(-EINVAL);
+
+	tpg = kzalloc(sizeof(struct tcm_vhost_tpg), GFP_KERNEL);
+	if (!tpg) {
+		pr_err("Unable to allocate struct tcm_vhost_tpg");
+		return ERR_PTR(-ENOMEM);
+	}
+	mutex_init(&tpg->tv_tpg_mutex);
+	INIT_LIST_HEAD(&tpg->tv_tpg_list);
+	tpg->tport = tport;
+	tpg->tport_tpgt = tpgt;
+
+	ret = core_tpg_register(&tcm_vhost_fabric_configfs->tf_ops, wwn,
+				&tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL);
+	if (ret < 0) {
+		kfree(tpg);
+		return NULL;
+	}
+	mutex_lock(&tcm_vhost_mutex);
+	list_add_tail(&tpg->tv_tpg_list, &tcm_vhost_list);
+	mutex_unlock(&tcm_vhost_mutex);
+
+	return &tpg->se_tpg;
+}
+
+static void tcm_vhost_drop_tpg(struct se_portal_group *se_tpg)
+{
+	struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+				struct tcm_vhost_tpg, se_tpg);
+
+	mutex_lock(&tcm_vhost_mutex);
+	list_del(&tpg->tv_tpg_list);
+	mutex_unlock(&tcm_vhost_mutex);
+	/*
+	 * Release the virtual I_T Nexus for this vHost TPG
+	 */
+	tcm_vhost_drop_nexus(tpg);
+	/*
+	 * Deregister the se_tpg from TCM..
+	 */
+	core_tpg_deregister(se_tpg);
+	kfree(tpg);
+}
+
+static struct se_wwn *tcm_vhost_make_tport(
+	struct target_fabric_configfs *tf,
+	struct config_group *group,
+	const char *name)
+{
+	struct tcm_vhost_tport *tport;
+	char *ptr;
+	u64 wwpn = 0;
+	int off = 0;
+
+	/* if (tcm_vhost_parse_wwn(name, &wwpn, 1) < 0)
+		return ERR_PTR(-EINVAL); */
+
+	tport = kzalloc(sizeof(struct tcm_vhost_tport), GFP_KERNEL);
+	if (!tport) {
+		pr_err("Unable to allocate struct tcm_vhost_tport");
+		return ERR_PTR(-ENOMEM);
+	}
+	tport->tport_wwpn = wwpn;
+	/* tcm_vhost_format_wwn(&tport->tport_name[0], TCM_VHOST__NAMELEN, wwpn); */
+	/*
+	 * Determine the emulated Protocol Identifier and Target Port Name
+	 * based on the incoming configfs directory name.
+	 */
+	ptr = strstr(name, "naa.");
+	if (ptr) {
+		tport->tport_proto_id = SCSI_PROTOCOL_SAS;
+		goto check_len;
+	}
+	ptr = strstr(name, "fc.");
+	if (ptr) {
+		tport->tport_proto_id = SCSI_PROTOCOL_FCP;
+		off = 3; /* Skip over "fc." */
+		goto check_len;
+	}
+	ptr = strstr(name, "iqn.");
+	if (ptr) {
+		tport->tport_proto_id = SCSI_PROTOCOL_ISCSI;
+		goto check_len;
+	}
+
+	pr_err("Unable to locate prefix for emulated Target Port:"
+			" %s\n", name);
+	return ERR_PTR(-EINVAL);
+
+check_len:
+	if (strlen(name) > TCM_VHOST_NAMELEN) {
+		pr_err("Emulated %s Address: %s, exceeds"
+			" max: %d\n", name, tcm_vhost_dump_proto_id(tport),
+			TCM_VHOST_NAMELEN);
+		kfree(tport);
+		return ERR_PTR(-EINVAL);
+	}
+	snprintf(&tport->tport_name[0], TCM_VHOST_NAMELEN, "%s", &name[off]);
+
+	pr_debug("TCM_VHost_ConfigFS: Allocated emulated Target"
+		" %s Address: %s\n", tcm_vhost_dump_proto_id(tport), name);
+
+	return &tport->tport_wwn;
+}
+
+static void tcm_vhost_drop_tport(struct se_wwn *wwn)
+{
+	struct tcm_vhost_tport *tport = container_of(wwn,
+				struct tcm_vhost_tport, tport_wwn);
+
+	pr_debug("TCM_VHost_ConfigFS: Deallocating emulated Target"
+		" %s Address: %s\n", tcm_vhost_dump_proto_id(tport),
+		tport->tport_name);;
+
+	kfree(tport);
+}
+
+static ssize_t tcm_vhost_wwn_show_attr_version(
+	struct target_fabric_configfs *tf,
+	char *page)
+{
+	return sprintf(page, "TCM_VHOST fabric module %s on %s/%s"
+		"on "UTS_RELEASE"\n", TCM_VHOST_VERSION, utsname()->sysname,
+		utsname()->machine);
+}
+
+TF_WWN_ATTR_RO(tcm_vhost, version);
+
+static struct configfs_attribute *tcm_vhost_wwn_attrs[] = {
+	&tcm_vhost_wwn_version.attr,
+	NULL,
+};
+
+static struct target_core_fabric_ops tcm_vhost_ops = {
+	.get_fabric_name		= tcm_vhost_get_fabric_name,
+	.get_fabric_proto_ident		= tcm_vhost_get_fabric_proto_ident,
+	.tpg_get_wwn			= tcm_vhost_get_fabric_wwn,
+	.tpg_get_tag			= tcm_vhost_get_tag,
+	.tpg_get_default_depth		= tcm_vhost_get_default_depth,
+	.tpg_get_pr_transport_id	= tcm_vhost_get_pr_transport_id,
+	.tpg_get_pr_transport_id_len	= tcm_vhost_get_pr_transport_id_len,
+	.tpg_parse_pr_out_transport_id	= tcm_vhost_parse_pr_out_transport_id,
+	.tpg_check_demo_mode		= tcm_vhost_check_true,
+	.tpg_check_demo_mode_cache	= tcm_vhost_check_true,
+	.tpg_check_demo_mode_write_protect = tcm_vhost_check_false,
+	.tpg_check_prod_mode_write_protect = tcm_vhost_check_false,
+	.tpg_alloc_fabric_acl		= tcm_vhost_alloc_fabric_acl,
+	.tpg_release_fabric_acl		= tcm_vhost_release_fabric_acl,
+	.tpg_get_inst_index		= tcm_vhost_tpg_get_inst_index,
+	.new_cmd_map			= tcm_vhost_new_cmd_map,
+	.release_cmd			= tcm_vhost_release_cmd,
+	.shutdown_session		= tcm_vhost_shutdown_session,
+	.close_session			= tcm_vhost_close_session,
+	.sess_get_index			= tcm_vhost_sess_get_index,
+	.sess_get_initiator_sid		= NULL,
+	.write_pending			= tcm_vhost_write_pending,
+	.write_pending_status		= tcm_vhost_write_pending_status,
+	.set_default_node_attributes	= tcm_vhost_set_default_node_attrs,
+	.get_task_tag			= tcm_vhost_get_task_tag,
+	.get_cmd_state			= tcm_vhost_get_cmd_state,
+	.queue_data_in			= tcm_vhost_queue_data_in,
+	.queue_status			= tcm_vhost_queue_status,
+	.queue_tm_rsp			= tcm_vhost_queue_tm_rsp,
+	.get_fabric_sense_len		= tcm_vhost_get_fabric_sense_len,
+	.set_fabric_sense_len		= tcm_vhost_set_fabric_sense_len,
+	/*
+	 * Setup function pointers for generic logic in target_core_fabric_configfs.c
+	 */
+	.fabric_make_wwn		= tcm_vhost_make_tport,
+	.fabric_drop_wwn		= tcm_vhost_drop_tport,
+	.fabric_make_tpg		= tcm_vhost_make_tpg,
+	.fabric_drop_tpg		= tcm_vhost_drop_tpg,
+	.fabric_post_link		= tcm_vhost_port_link,
+	.fabric_pre_unlink		= tcm_vhost_port_unlink,
+	.fabric_make_np			= NULL,
+	.fabric_drop_np			= NULL,
+	.fabric_make_nodeacl		= tcm_vhost_make_nodeacl,
+	.fabric_drop_nodeacl		= tcm_vhost_drop_nodeacl,
+};
+
+static int tcm_vhost_register_configfs(void)
+{
+	struct target_fabric_configfs *fabric;
+	int ret;
+
+	pr_debug("TCM_VHOST fabric module %s on %s/%s"
+		" on "UTS_RELEASE"\n",TCM_VHOST_VERSION, utsname()->sysname,
+		utsname()->machine);
+	/*
+	 * Register the top level struct config_item_type with TCM core
+	 */
+	fabric = target_fabric_configfs_init(THIS_MODULE, "vhost");
+	if (IS_ERR(fabric)) {
+		pr_err("target_fabric_configfs_init() failed\n");
+		return PTR_ERR(fabric);
+	}
+	/*
+	 * Setup fabric->tf_ops from our local tcm_vhost_ops
+	 */
+	fabric->tf_ops = tcm_vhost_ops;
+	/*
+	 * Setup default attribute lists for various fabric->tf_cit_tmpl
+	 */
+	TF_CIT_TMPL(fabric)->tfc_wwn_cit.ct_attrs = tcm_vhost_wwn_attrs;
+	TF_CIT_TMPL(fabric)->tfc_tpg_base_cit.ct_attrs = tcm_vhost_tpg_attrs;
+	TF_CIT_TMPL(fabric)->tfc_tpg_attrib_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_param_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_np_base_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_base_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_attrib_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_auth_cit.ct_attrs = NULL;
+	TF_CIT_TMPL(fabric)->tfc_tpg_nacl_param_cit.ct_attrs = NULL;
+	/*
+	 * Register the fabric for use within TCM
+	 */
+	ret = target_fabric_configfs_register(fabric);
+	if (ret < 0) {
+		pr_err("target_fabric_configfs_register() failed"
+				" for TCM_VHOST\n");
+		return ret;
+	}
+	/*
+	 * Setup our local pointer to *fabric
+	 */
+	tcm_vhost_fabric_configfs = fabric;
+	pr_debug("TCM_VHOST[0] - Set fabric -> tcm_vhost_fabric_configfs\n");
+	return 0;
+};
+
+static void tcm_vhost_deregister_configfs(void)
+{
+	if (!tcm_vhost_fabric_configfs)
+		return;
+
+	target_fabric_configfs_deregister(tcm_vhost_fabric_configfs);
+	tcm_vhost_fabric_configfs = NULL;
+	pr_debug("TCM_VHOST[0] - Cleared tcm_vhost_fabric_configfs\n");
+};
+
+static int __init tcm_vhost_init(void)
+{
+	int ret;
+
+	ret = vhost_scsi_register();
+	if (ret < 0)
+		return ret;
+
+	ret = tcm_vhost_register_configfs();
+	if (ret < 0)
+		return ret;
+
+	return 0;
+};
+
+static void tcm_vhost_exit(void)
+{
+	tcm_vhost_deregister_configfs();
+	vhost_scsi_deregister();
+};
+
+MODULE_DESCRIPTION("TCM_VHOST series fabric driver");
+MODULE_LICENSE("GPL");
+module_init(tcm_vhost_init);
+module_exit(tcm_vhost_exit);
diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h
new file mode 100644
index 0000000..0e8951b
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.h
@@ -0,0 +1,70 @@
+#define TCM_VHOST_VERSION  "v0.1"
+#define TCM_VHOST_NAMELEN 256
+#define TCM_VHOST_MAX_CDB_SIZE 32
+
+struct tcm_vhost_cmd {
+	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
+	int tvc_vq_desc;
+	/* The Tag from include/linux/virtio_scsi.h:struct virtio_scsi_cmd_req */
+	u64 tvc_tag;
+	/* The number of scatterlists associated with this cmd */
+	u32 tvc_sgl_count;
+	/* Pointer to the SGL formatted memory from virtio-scsi */
+	struct scatterlist *tvc_sgl;
+	/* Pointer to response */
+	struct virtio_scsi_cmd_resp __user *tvc_resp;
+	/* Pointer to vhost_scsi for our device */
+	struct vhost_scsi *tvc_vhost;
+	 /* The TCM I/O descriptor that is accessed via container_of() */
+	struct se_cmd tvc_se_cmd;
+	/* Copy of the incoming SCSI command descriptor block (CDB) */
+	unsigned char tvc_cdb[TCM_VHOST_MAX_CDB_SIZE];
+	/* Sense buffer that will be mapped into outgoing status */
+	unsigned char tvc_sense_buf[TRANSPORT_SENSE_BUFFER];
+	/* Completed commands list, serviced from vhost worker thread */
+	struct list_head tvc_completion_list;
+};
+
+struct tcm_vhost_nexus {
+	/* Pointer to TCM session for I_T Nexus */
+	struct se_session *tvn_se_sess;
+};
+
+struct tcm_vhost_nacl {
+	/* Binary World Wide unique Port Name for Vhost Initiator port */
+	u64 iport_wwpn;
+	/* ASCII formatted WWPN for SAS Initiator port */
+	char iport_name[TCM_VHOST_NAMELEN];
+	/* Returned by tcm_vhost_make_nodeacl() */
+	struct se_node_acl se_node_acl;
+};
+
+struct tcm_vhost_tpg {
+	/* Vhost port target portal group tag for TCM */
+	u16 tport_tpgt;
+	/* Used to track number of TPG Port/Lun Links wrt explicit I_T Nexus shutdown */
+	atomic_t tv_tpg_port_count;
+	/* Used for vhost_scsi device reference to tpg_nexus */
+	atomic_t tv_tpg_vhost_count;
+	/* list for tcm_vhost_list */
+	struct list_head tv_tpg_list;
+	/* Used to protect access for tpg_nexus */
+	struct mutex tv_tpg_mutex;
+	/* Pointer to the TCM VHost I_T Nexus for this TPG endpoint */
+	struct tcm_vhost_nexus *tpg_nexus;
+	/* Pointer back to tcm_vhost_tport */
+	struct tcm_vhost_tport *tport;
+	/* Returned by tcm_vhost_make_tpg() */
+	struct se_portal_group se_tpg;
+};
+
+struct tcm_vhost_tport {
+	/* SCSI protocol the tport is providing */
+	u8 tport_proto_id;
+	/* Binary World Wide unique Port Name for Vhost Target port */
+	u64 tport_wwpn;
+	/* ASCII formatted WWPN for Vhost Target port */
+	char tport_name[TCM_VHOST_NAMELEN];
+	/* Returned by tcm_vhost_make_tport() */
+	struct se_wwn tport_wwn;
+};
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
                   ` (7 preceding siblings ...)
  2012-07-04  4:24 ` [PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04 14:50   ` Paolo Bonzini
  2012-07-04  4:24 ` [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs Nicholas A. Bellinger
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch changes virtio-scsi to use a new virtio_driver->scan() callback
so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
operation, instead of from within virtscsi_probe().

This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur.  This fixes a bug
with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.

Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/virtio_scsi.c |   15 ++++++++++++---
 drivers/virtio/virtio.c    |    5 ++++-
 include/linux/virtio.h     |    1 +
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 1b38431..391b30d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -481,9 +481,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
 	err = scsi_add_host(shost, &vdev->dev);
 	if (err)
 		goto scsi_add_host_failed;
-
-	scsi_scan_host(shost);
-
+	/*
+	 * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
+	 * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
+	 */
 	return 0;
 
 scsi_add_host_failed:
@@ -493,6 +494,13 @@ virtscsi_init_failed:
 	return err;
 }
 
+static void virtscsi_scan(struct virtio_device *vdev)
+{
+	struct Scsi_Host *shost = (struct Scsi_Host *)vdev->priv;
+
+	scsi_scan_host(shost);
+}
+
 static void virtscsi_remove_vqs(struct virtio_device *vdev)
 {
 	/* Stop all the virtqueues. */
@@ -537,6 +545,7 @@ static struct virtio_driver virtio_scsi_driver = {
 	.driver.owner = THIS_MODULE,
 	.id_table = id_table,
 	.probe = virtscsi_probe,
+	.scan = virtscsi_scan,
 #ifdef CONFIG_PM
 	.freeze = virtscsi_freeze,
 	.restore = virtscsi_restore,
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f355807..c3b3f7f 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -141,8 +141,11 @@ static int virtio_dev_probe(struct device *_d)
 	err = drv->probe(dev);
 	if (err)
 		add_status(dev, VIRTIO_CONFIG_S_FAILED);
-	else
+	else {
 		add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+		if (drv->scan)
+			drv->scan(dev);
+	}
 
 	return err;
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8efd28a..a1ba8bb 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -92,6 +92,7 @@ struct virtio_driver {
 	const unsigned int *feature_table;
 	unsigned int feature_table_size;
 	int (*probe)(struct virtio_device *dev);
+	void (*scan)(struct virtio_device *dev);
 	void (*remove)(struct virtio_device *dev);
 	void (*config_changed)(struct virtio_device *dev);
 #ifdef CONFIG_PM
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
                   ` (8 preceding siblings ...)
  2012-07-04  4:24 ` Nicholas A. Bellinger
@ 2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04 14:50   ` Paolo Bonzini
  2012-07-04  4:24 ` Nicholas A. Bellinger
  2012-07-04 14:02 ` [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Michael S. Tsirkin
  11 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-04  4:24 UTC (permalink / raw)
  To: target-devel
  Cc: linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This is currently required for connecting to tcm_vhost in order to prevent
the client LUN scan from detecting the same tcm_vhost WWPN on multiple target
IDs.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/virtio_scsi.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 391b30d..8711951 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -475,7 +475,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
 	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
 	shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1;
-	shost->max_id = virtscsi_config_get(vdev, max_target) + 1;
+	/*
+	 * Currently required for tcm_vhost to function..
+	 */
+	shost->max_id = 1;
 	shost->max_channel = 0;
 	shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
 	err = scsi_add_host(shost, &vdev->dev);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/6] vhost: Separate vhost-net features from vhost features
  2012-07-04  4:24 ` [PATCH 1/6] vhost: Separate vhost-net features from vhost features Nicholas A. Bellinger
@ 2012-07-04  4:41   ` Asias He
  0 siblings, 0 replies; 57+ messages in thread
From: Asias He @ 2012-07-04  4:41 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi,
	Paolo Bonzini, lf-virt, Christoph Hellwig

On 07/04/2012 12:24 PM, Nicholas A. Bellinger wrote:
> From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>
> In order for other vhost devices to use the VHOST_FEATURES bits the
> vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
> constant.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>

I think you need to change drivers/vhost/test.c as well.

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 3de00d9..91d6f06 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -261,14 +261,14 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl,
                         return -EFAULT;
                 return vhost_test_run(n, test);
         case VHOST_GET_FEATURES:
-               features = VHOST_FEATURES;
+               features = VHOST_NET_FEATURES;
                 if (copy_to_user(featurep, &features, sizeof features))
                         return -EFAULT;
                 return 0;
         case VHOST_SET_FEATURES:
                 if (copy_from_user(&features, featurep, sizeof features))
                         return -EFAULT;
-               if (features & ~VHOST_FEATURES)
+               if (features & ~VHOST_NET_FEATURES)
                         return -EOPNOTSUPP;
                 return vhost_test_set_features(n, features);
         case VHOST_RESET_OWNER:


> ---
>   drivers/vhost/net.c   |    4 ++--
>   drivers/vhost/vhost.h |    3 ++-
>   2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index f82a739..072cbba 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   			return -EFAULT;
>   		return vhost_net_set_backend(n, backend.index, backend.fd);
>   	case VHOST_GET_FEATURES:
> -		features = VHOST_FEATURES;
> +		features = VHOST_NET_FEATURES;
>   		if (copy_to_user(featurep, &features, sizeof features))
>   			return -EFAULT;
>   		return 0;
>   	case VHOST_SET_FEATURES:
>   		if (copy_from_user(&features, featurep, sizeof features))
>   			return -EFAULT;
> -		if (features & ~VHOST_FEATURES)
> +		if (features & ~VHOST_NET_FEATURES)
>   			return -EOPNOTSUPP;
>   		return vhost_net_set_features(n, features);
>   	case VHOST_RESET_OWNER:
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 8de1fd5..07b9763 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -201,7 +201,8 @@ enum {
>   	VHOST_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
>   			 (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
>   			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
> -			 (1ULL << VHOST_F_LOG_ALL) |
> +			 (1ULL << VHOST_F_LOG_ALL),
> +	VHOST_NET_FEATURES = VHOST_FEATURES |
>   			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
>   			 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
>   };
>


-- 
Asias

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
                   ` (10 preceding siblings ...)
  2012-07-04  4:24 ` Nicholas A. Bellinger
@ 2012-07-04 14:02 ` Michael S. Tsirkin
  2012-07-04 14:52   ` Paolo Bonzini
  11 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-04 14:02 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Zhi Yong Wu,
	Anthony Liguori, target-devel, linux-scsi, Paolo Bonzini,
	lf-virt, Christoph Hellwig

On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> Hi folks,
> 
> This series contains patches required to update tcm_vhost <-> virtio-scsi
> connected hosts <-> guests to run on v3.5-rc2 mainline code.  This series is
> available on top of target-pending/auto-next here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
> 
> This includes the necessary vhost changes from Stefan to to get tcm_vhost
> functioning, along a virtio-scsi LUN scanning change to address a client bug
> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
> source + header file that is now living under /drivers/vhost/, along with latest
> tcm_vhost changes from Zhi's tcm_vhost tree.
> 
> Here are a couple of screenshots of the code in action using raw IBLOCK
> backends provided by FusionIO ioDrive Duo:
> 
>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
> 
> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.

OK so this is an RFC, not for merge yet?

> 
> Please have a look vhost + virtio-scsi folks (mst + paolo CC'ed) and let us
> know if you have any concerns.
> 
> Thanks!
> 
> --nab
> Nicholas Bellinger (4):
>   vhost: Add vhost_scsi specific defines
>   tcm_vhost: Initial merge for vhost level target fabric driver
>   virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN
>     scanning
>   virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
> 
> Stefan Hajnoczi (2):
>   vhost: Separate vhost-net features from vhost features
>   vhost: make vhost work queue visible
> 
>  drivers/scsi/virtio_scsi.c |   20 +-
>  drivers/vhost/Kconfig      |    6 +
>  drivers/vhost/Makefile     |    1 +
>  drivers/vhost/net.c        |    4 +-
>  drivers/vhost/tcm_vhost.c  | 1592 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/vhost/tcm_vhost.h  |   70 ++
>  drivers/vhost/vhost.c      |    5 +-
>  drivers/vhost/vhost.h      |    6 +-
>  drivers/virtio/virtio.c    |    5 +-
>  include/linux/vhost.h      |    9 +
>  include/linux/virtio.h     |    1 +
>  11 files changed, 1708 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/vhost/tcm_vhost.c
>  create mode 100644 drivers/vhost/tcm_vhost.h
> 
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
  2012-07-04  4:24 ` Nicholas A. Bellinger
@ 2012-07-04 14:50   ` Paolo Bonzini
  0 siblings, 0 replies; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-04 14:50 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi, lf-virt,
	Christoph Hellwig

On 04/07/2012 06:24, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> This patch changes virtio-scsi to use a new virtio_driver->scan() callback
> so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
> set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
> operation, instead of from within virtscsi_probe().
> 
> This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
> virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
> had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur.  This fixes a bug
> with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.
> 
> Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Please send this independently.  I think we also want it in stable@vger?

Paolo

> ---
>  drivers/scsi/virtio_scsi.c |   15 ++++++++++++---
>  drivers/virtio/virtio.c    |    5 ++++-
>  include/linux/virtio.h     |    1 +
>  3 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 1b38431..391b30d 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -481,9 +481,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
>  	err = scsi_add_host(shost, &vdev->dev);
>  	if (err)
>  		goto scsi_add_host_failed;
> -
> -	scsi_scan_host(shost);
> -
> +	/*
> +	 * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
> +	 * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
> +	 */
>  	return 0;
>  
>  scsi_add_host_failed:
> @@ -493,6 +494,13 @@ virtscsi_init_failed:
>  	return err;
>  }
>  
> +static void virtscsi_scan(struct virtio_device *vdev)
> +{
> +	struct Scsi_Host *shost = (struct Scsi_Host *)vdev->priv;
> +
> +	scsi_scan_host(shost);
> +}
> +
>  static void virtscsi_remove_vqs(struct virtio_device *vdev)
>  {
>  	/* Stop all the virtqueues. */
> @@ -537,6 +545,7 @@ static struct virtio_driver virtio_scsi_driver = {
>  	.driver.owner = THIS_MODULE,
>  	.id_table = id_table,
>  	.probe = virtscsi_probe,
> +	.scan = virtscsi_scan,
>  #ifdef CONFIG_PM
>  	.freeze = virtscsi_freeze,
>  	.restore = virtscsi_restore,
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index f355807..c3b3f7f 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -141,8 +141,11 @@ static int virtio_dev_probe(struct device *_d)
>  	err = drv->probe(dev);
>  	if (err)
>  		add_status(dev, VIRTIO_CONFIG_S_FAILED);
> -	else
> +	else {
>  		add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +		if (drv->scan)
> +			drv->scan(dev);
> +	}
>  
>  	return err;
>  }
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 8efd28a..a1ba8bb 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -92,6 +92,7 @@ struct virtio_driver {
>  	const unsigned int *feature_table;
>  	unsigned int feature_table_size;
>  	int (*probe)(struct virtio_device *dev);
> +	void (*scan)(struct virtio_device *dev);
>  	void (*remove)(struct virtio_device *dev);
>  	void (*config_changed)(struct virtio_device *dev);
>  #ifdef CONFIG_PM
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
  2012-07-04  4:24 ` [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs Nicholas A. Bellinger
@ 2012-07-04 14:50   ` Paolo Bonzini
  2012-07-05  2:05     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-04 14:50 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi, lf-virt,
	Christoph Hellwig

On 04/07/2012 06:24, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> This is currently required for connecting to tcm_vhost in order to prevent
> the client LUN scan from detecting the same tcm_vhost WWPN on multiple target
> IDs.

But that's what the config field is for... why can't tcm_vhost (or QEMU)
set max_id to 0?
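
The config field here is max_target in struct virtio_scsi_config, which
virtscsi_probe() maps to shost->max_id; quoting the layout from memory,
so take the exact field types as approximate:

struct virtio_scsi_config {
	__u32 num_queues;
	__u32 seg_max;
	__u32 max_sectors;
	__u32 cmd_per_lun;
	__u32 event_info_size;
	__u32 sense_size;
	__u32 cdb_size;
	__u16 max_channel;
	__u16 max_target;
	__u32 max_lun;
} __packed;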

Paolo

> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/scsi/virtio_scsi.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 391b30d..8711951 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -475,7 +475,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
>  	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
>  	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
>  	shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1;
> -	shost->max_id = virtscsi_config_get(vdev, max_target) + 1;
> +	/*
> +	 * Currently required for tcm_vhost to function..
> +	 */
> +	shost->max_id = 1;
>  	shost->max_channel = 0;
>  	shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
>  	err = scsi_add_host(shost, &vdev->dev);
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-04 14:02 ` [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Michael S. Tsirkin
@ 2012-07-04 14:52   ` Paolo Bonzini
  2012-07-04 15:05     ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-04 14:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, lf-virt, Anthony Liguori,
	target-devel, linux-scsi, Zhi Yong Wu, Christoph Hellwig

On 04/07/2012 16:02, Michael S. Tsirkin wrote:
> On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
>> From: Nicholas Bellinger <nab@linux-iscsi.org>
>>
>> Hi folks,
>>
>> This series contains patches required to update tcm_vhost <-> virtio-scsi
>> connected hosts <-> guests to run on v3.5-rc2 mainline code.  This series is
>> available on top of target-pending/auto-next here:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
>>
>> This includes the necessary vhost changes from Stefan to to get tcm_vhost
>> functioning, along a virtio-scsi LUN scanning change to address a client bug
>> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
>> source + header file that is now living under /drivers/vhost/, along with latest
>> tcm_vhost changes from Zhi's tcm_vhost tree.
>>
>> Here are a couple of screenshots of the code in action using raw IBLOCK
>> backends provided by FusionIO ioDrive Duo:
>>
>>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
>>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
>>
>> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
>> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
>> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
> 
> OK so this is an RFC, not for merge yet?

Patch 6 definitely looks RFCish, but patch 5 should go in anyway.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-04 14:52   ` Paolo Bonzini
@ 2012-07-04 15:05     ` Michael S. Tsirkin
  2012-07-04 22:12       ` Anthony Liguori
                         ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-04 15:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, lf-virt, Anthony Liguori,
	target-devel, linux-scsi, Zhi Yong Wu, Christoph Hellwig

On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote:
> Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto:
> > On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> >> From: Nicholas Bellinger <nab@linux-iscsi.org>
> >>
> >> Hi folks,
> >>
> >> This series contains patches required to update tcm_vhost <-> virtio-scsi
> >> connected hosts <-> guests to run on v3.5-rc2 mainline code.  This series is
> >> available on top of target-pending/auto-next here:
> >>
> >>    git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
> >>
> >> This includes the necessary vhost changes from Stefan to to get tcm_vhost
> >> functioning, along a virtio-scsi LUN scanning change to address a client bug
> >> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
> >> source + header file that is now living under /drivers/vhost/, along with latest
> >> tcm_vhost changes from Zhi's tcm_vhost tree.
> >>
> >> Here are a couple of screenshots of the code in action using raw IBLOCK
> >> backends provided by FusionIO ioDrive Duo:
> >>
> >>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
> >>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
> >>
> >> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> >> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> >> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
> > 
> > OK so this is an RFC, not for merge yet?
> 
> Patch 6 definitely looks RFCish, but patch 5 should go in anyway.
> 
> Paolo

I was talking about 4/6 first of all.
Anyway, it's best to split, not to mix RFCs and fixes.

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-04 15:05     ` Michael S. Tsirkin
@ 2012-07-04 22:12       ` Anthony Liguori
  2012-07-05  1:52         ` Nicholas A. Bellinger
  2012-07-05  2:01       ` Nicholas A. Bellinger
  2012-07-05  2:01       ` Nicholas A. Bellinger
  2 siblings, 1 reply; 57+ messages in thread
From: Anthony Liguori @ 2012-07-04 22:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, linux-scsi, kvm-devel, lf-virt, Anthony Liguori,
	target-devel, Paolo Bonzini, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

On 07/04/2012 10:05 AM, Michael S. Tsirkin wrote:
> On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote:
>> Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto:
>>> On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
>>>> From: Nicholas Bellinger<nab@linux-iscsi.org>
>>>>
>>>> Hi folks,
>>>>
>>>> This series contains patches required to update tcm_vhost<->  virtio-scsi
>>>> connected hosts<->  guests to run on v3.5-rc2 mainline code.  This series is
>>>> available on top of target-pending/auto-next here:
>>>>
>>>>     git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
>>>>
>>>> This includes the necessary vhost changes from Stefan to to get tcm_vhost
>>>> functioning, along a virtio-scsi LUN scanning change to address a client bug
>>>> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
>>>> source + header file that is now living under /drivers/vhost/, along with latest
>>>> tcm_vhost changes from Zhi's tcm_vhost tree.
>>>>
>>>> Here are a couple of screenshots of the code in action using raw IBLOCK
>>>> backends provided by FusionIO ioDrive Duo:
>>>>
>>>>     http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
>>>>     http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
>>>>
>>>> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
>>>> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
>>>> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
>>>
>>> OK so this is an RFC, not for merge yet?
>>
>> Patch 6 definitely looks RFCish, but patch 5 should go in anyway.
>>
>> Paolo
>
> I was talking about 4/6 first of all.
> Anyway, it's best to split, not to mix RFCs and fixes.

What is the use-case that we're targeting for this?

I certainly think it's fine to merge this into the kernel...  maybe something 
will use it.  But I'm pretty opposed to taking support for this into QEMU.  It's 
going to create more problems than it solves specifically because I have no idea 
what problem it actually solves.

We cannot avoid doing better SCSI emulation in QEMU.  That cannot be a long term 
strategy on our part and vhost-scsi isn't going to solve that problem for us.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-04 22:12       ` Anthony Liguori
@ 2012-07-05  1:52         ` Nicholas A. Bellinger
  2012-07-05 10:22           ` Paolo Bonzini
                             ` (4 more replies)
  0 siblings, 5 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-05  1:52 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jens Axboe, linux-scsi, kvm-devel, Michael S. Tsirkin, lf-virt,
	Anthony Liguori, target-devel, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig, Stefan Hajnoczi

Hi Anthony & Co,

On Wed, 2012-07-04 at 17:12 -0500, Anthony Liguori wrote:
> On 07/04/2012 10:05 AM, Michael S. Tsirkin wrote:
> > On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote:
> >> Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto:
> >>> On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> >>>> From: Nicholas Bellinger<nab@linux-iscsi.org>
> >>>>
> >>>> Hi folks,
> >>>>
> >>>> This series contains patches required to update tcm_vhost<->  virtio-scsi
> >>>> connected hosts<->  guests to run on v3.5-rc2 mainline code.  This series is
> >>>> available on top of target-pending/auto-next here:
> >>>>
> >>>>     git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
> >>>>
> >>>> This includes the necessary vhost changes from Stefan to to get tcm_vhost
> >>>> functioning, along a virtio-scsi LUN scanning change to address a client bug
> >>>> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
> >>>> source + header file that is now living under /drivers/vhost/, along with latest
> >>>> tcm_vhost changes from Zhi's tcm_vhost tree.
> >>>>
> >>>> Here are a couple of screenshots of the code in action using raw IBLOCK
> >>>> backends provided by FusionIO ioDrive Duo:
> >>>>
> >>>>     http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
> >>>>     http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
> >>>>
> >>>> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> >>>> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> >>>> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
> >>>
> >>> OK so this is an RFC, not for merge yet?
> >>
> >> Patch 6 definitely looks RFCish, but patch 5 should go in anyway.
> >>
> >> Paolo
> >
> > I was talking about 4/6 first of all.
> > Anyway, it's best to split, not to mix RFCs and fixes.
> 
> What is the use-case that we're targeting for this?
> 

The first use case is high-performance small-block random I/O access into a
KVM guest from IBLOCK/FILEIO + pSCSI passthrough backends.  (see below)

The second use case is shared storage access across multiple KVM guests
using TCM level SPC-3 persistent reservations + ALUA multipath logic.

The third use case is future DIF support within virtio-scsi supported
guests that we connect directly to tcm_vhost.

> I certainly think it's fine to merge this into the kernel...  maybe something 
> will use it.  But I'm pretty opposed to taking support for this into QEMU.  It's 
> going to create more problems than it solves specifically because I have no idea 
> what problem it actually solves.
> 

To give an idea of how things are looking on the performance side, here
are some initial numbers for small block (4k) mixed random IOPS using the
following fio test setup:

[randrw]
rw=randrw
rwmixwrite=25
rwmixread=75
size=131072m
ioengine=libaio
direct=1
iodepth=64
blocksize=4k
filename=/dev/sdb

The backend is a single iomemory_vsl (FusionIO) raw flash block_device
using IBLOCK w/ emulate_write_cache=1 set.  Also note the noop scheduler
has been set for the virtio-scsi LUNs.  Here are the QEMU cli opts for both
cases:

./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m 2048 -serial
file:/tmp/vhost-serial.txt -hda debian_squeeze_amd64_standard-old.qcow2
-vhost-scsi id=vhost-scsi0,wwpn=naa.600140579ad21088,tpgt=1 -device
virtio-scsi-pci,vhost-scsi=vhost-scsi0,event_idx=off

./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 4 -m 2048 -serial
file:/tmp/vhost-serial.txt -hda debian_squeeze_amd64_standard-old.qcow2
-drive file=/dev/fioa,format=raw,if=none,id=sdb,cache=none,aio=native
-device virtio-scsi-pci,id=mcbus -device scsi-disk,drive=sdb


fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
------------------------------------------------------------------------------------
25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K


In the first case, virtio-scsi+tcm_vhost is outperforming virtio-scsi-raw
(QEMU SCSI emulation) by 3x with the same raw flash backend device.  In
the second, heavier WRITE case, tcm_vhost is nearing full bare-metal
utilization (~55K vs. ~60K).

Also converting tcm_vhost to use proper cmwq process context I/O
submission will help to get even closer to bare metal speeds for both
work-loads.
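
As a rough sketch of the direction (illustrative only -- the tvc_work
member and the workqueue below do not exist yet in this series):

#include <linux/workqueue.h>

/* allocated once at module init, e.g. alloc_workqueue("tcm_vhost", 0, 0) */
static struct workqueue_struct *tcm_vhost_workqueue;

static void tcm_vhost_submission_work(struct work_struct *work)
{
	struct tcm_vhost_cmd *tv_cmd =
		container_of(work, struct tcm_vhost_cmd, tvc_work);

	/* hand tv_cmd->tvc_se_cmd off to the target core from process context */
}

	/* ...and from the virtqueue handling path, instead of submitting inline: */
	INIT_WORK(&tv_cmd->tvc_work, tcm_vhost_submission_work);
	queue_work(tcm_vhost_workqueue, &tv_cmd->tvc_work);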

> We cannot avoid doing better SCSI emulation in QEMU.  That cannot be a long term 
> strategy on our part and vhost-scsi isn't going to solve that problem for us.
> 

Yes, QEMU needs a sane level of host-OS-independent functional SCSI
emulation; I don't think that is the interesting point up for debate
here..  ;)

I think performance-wise it's now pretty clear that vhost is
outperforming QEMU block with virtio-scsi for intensive small block
randrw workloads.  When connected to raw block flash backends, where we
avoid the SCSI LLD bottleneck for small block random I/O on the KVM host
altogether, the difference between the two cases is even larger based
on these initial benchmarks.

Thanks for your comments!

--nab

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-04 15:05     ` Michael S. Tsirkin
  2012-07-04 22:12       ` Anthony Liguori
  2012-07-05  2:01       ` Nicholas A. Bellinger
@ 2012-07-05  2:01       ` Nicholas A. Bellinger
  2012-07-05  9:31         ` Michael S. Tsirkin
  2 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-05  2:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, target-devel, linux-scsi, lf-virt, kvm-devel,
	Stefan Hajnoczi, Zhi Yong Wu, Anthony Liguori, Christoph Hellwig,
	Jens Axboe, Hannes Reinecke

On Wed, 2012-07-04 at 18:05 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote:
> > Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto:
> > > On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> > >> From: Nicholas Bellinger <nab@linux-iscsi.org>
> > >>
> > >> Hi folks,
> > >>
> > >> This series contains patches required to update tcm_vhost <-> virtio-scsi
> > >> connected hosts <-> guests to run on v3.5-rc2 mainline code.  This series is
> > >> available on top of target-pending/auto-next here:
> > >>
> > >>    git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
> > >>
> > >> This includes the necessary vhost changes from Stefan to to get tcm_vhost
> > >> functioning, along a virtio-scsi LUN scanning change to address a client bug
> > >> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
> > >> source + header file that is now living under /drivers/vhost/, along with latest
> > >> tcm_vhost changes from Zhi's tcm_vhost tree.
> > >>
> > >> Here are a couple of screenshots of the code in action using raw IBLOCK
> > >> backends provided by FusionIO ioDrive Duo:
> > >>
> > >>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
> > >>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
> > >>
> > >> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> > >> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> > >> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
> > > 
> > > OK so this is an RFC, not for merge yet?
> > 
> > Patch 6 definitely looks RFCish, but patch 5 should go in anyway.
> > 
> > Paolo
> 
> I was talking about 4/6 first of all.

So yeah, this code is still considered RFC at this point for-3.6, but
I'd like to get this into target-pending/for-next next week for more
feedback and to start collecting signoffs for the necessary pieces that
affect existing vhost code.

By that time the cmwq conversion of tcm_vhost should be in place as
well..

> Anyway, it's best to split, not to mix RFCs and fixes.
> 

<nod>, I'll send patch #5 separately to linux-scsi -> James and CC
stable following Paolo's request.

Thanks!

--nab

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
  2012-07-04 14:50   ` Paolo Bonzini
@ 2012-07-05  2:05     ` Nicholas A. Bellinger
  2012-07-05  6:42       ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-05  2:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi, lf-virt,
	Christoph Hellwig

On Wed, 2012-07-04 at 16:50 +0200, Paolo Bonzini wrote:
> Il 04/07/2012 06:24, Nicholas A. Bellinger ha scritto:
> > From: Nicholas Bellinger <nab@linux-iscsi.org>
> > 
> > This is currently required for connecting to tcm_vhost in order to prevent
> > the client LUN scan from detecting the same tcm_vhost WWPN on multiple target
> > IDs.
> 
> But that's what the config field is for... why can't tcm_vhost (or QEMU)
> set max_id to 0?
> 

So this patch was carried forward from Stefan's original code; I
thought it was required due to other limitations..

If that's not the case anymore I'm happy to drop it for now and look
into a proper fix outside of virtio-scsi.

> Paolo
> 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> > Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> > ---
> >  drivers/scsi/virtio_scsi.c |    5 ++++-
> >  1 files changed, 4 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> > index 391b30d..8711951 100644
> > --- a/drivers/scsi/virtio_scsi.c
> > +++ b/drivers/scsi/virtio_scsi.c
> > @@ -475,7 +475,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
> >  	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
> >  	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
> >  	shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1;
> > -	shost->max_id = virtscsi_config_get(vdev, max_target) + 1;
> > +	/*
> > +	 * Currently required for tcm_vhost to function..
> > +	 */
> > +	shost->max_id = 1;
> >  	shost->max_channel = 0;
> >  	shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
> >  	err = scsi_add_host(shost, &vdev->dev);
> > 
> 
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
  2012-07-05  2:05     ` Nicholas A. Bellinger
@ 2012-07-05  6:42       ` Paolo Bonzini
  0 siblings, 0 replies; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-05  6:42 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
	Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi, lf-virt,
	Christoph Hellwig

On 05/07/2012 04:05, Nicholas A. Bellinger wrote:
>> > But that's what the config field is for... why can't tcm_vhost (or QEMU)
>> > set max_id to 0?
>> > 
> So this patch was carried forward from Stefan's original code that I
> thought was required due to other limitations..
> 
> If that's not the case anymore I'm happy to drop it for now and look
> into a proper fix outside of virtio-scsi.
> 

I think max_id did not exist in the virtio-scsi configuration at the
time Stefan was working on it.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05  2:01       ` Nicholas A. Bellinger
@ 2012-07-05  9:31         ` Michael S. Tsirkin
  2012-07-06  3:13           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-05  9:31 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, lf-virt, Anthony Liguori,
	target-devel, linux-scsi, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig

On Wed, Jul 04, 2012 at 07:01:05PM -0700, Nicholas A. Bellinger wrote:
> On Wed, 2012-07-04 at 18:05 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote:
> > > Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto:
> > > > On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> > > >> From: Nicholas Bellinger <nab@linux-iscsi.org>
> > > >>
> > > >> Hi folks,
> > > >>
> > > >> This series contains patches required to update tcm_vhost <-> virtio-scsi
> > > >> connected hosts <-> guests to run on v3.5-rc2 mainline code.  This series is
> > > >> available on top of target-pending/auto-next here:
> > > >>
> > > >>    git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
> > > >>
> > > >> This includes the necessary vhost changes from Stefan to to get tcm_vhost
> > > >> functioning, along a virtio-scsi LUN scanning change to address a client bug
> > > >> with tcm_vhost I ran into..  Also, tcm_vhost driver has been merged into a single
> > > >> source + header file that is now living under /drivers/vhost/, along with latest
> > > >> tcm_vhost changes from Zhi's tcm_vhost tree.
> > > >>
> > > >> Here are a couple of screenshots of the code in action using raw IBLOCK
> > > >> backends provided by FusionIO ioDrive Duo:
> > > >>
> > > >>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
> > > >>    http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
> > > >>
> > > >> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> > > >> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> > > >> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
> > > > 
> > > > OK so this is an RFC, not for merge yet?
> > > 
> > > Patch 6 definitely looks RFCish, but patch 5 should go in anyway.
> > > 
> > > Paolo
> > 
> > I was talking about 4/6 first of all.
> 
> So yeah, this code is still considered RFC at this point for-3.6, but
> I'd like to get this into target-pending/for-next in next week for more
> feedback and start collecting signoffs for the necessary pieces that
> effect existing vhost code.
> 
> By that time the cmwq conversion of tcm_vhost should be in place as
> well..

I'll try to give some feedback but I think we do need
to see the qemu patches - they weren't posted yet, were they?
This driver adds a userspace interface, and once
that is merged it has to be supported.
So I think we need buy-in from the qemu side, at least in principle.
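
To be concrete, the userspace interface I mean is the endpoint binding
ioctls from the tcm_vhost tree -- roughly along these lines (I have not
re-checked this posting, so treat the names and layout as approximate):

/* include/linux/vhost.h additions (approximate) */
struct vhost_scsi_target {
	int abi_version;
	unsigned char vhost_wwpn[224];	/* TRANSPORT_IQN_LEN */
	unsigned short vhost_tpgt;
};

#define VHOST_SCSI_SET_ENDPOINT   _IOW(VHOST_VIRTIO, 0x40, struct vhost_scsi_target)
#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_scsi_target)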

> > Anyway, it's best to split, not to mix RFCs and fixes.
> > 
> 
> <nod>, I'll send patch #5 separately to linux-scsi -> James and CC
> stable following Paolo's request.
> 
> Thanks!
> 
> --nab

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05  1:52         ` Nicholas A. Bellinger
@ 2012-07-05 10:22           ` Paolo Bonzini
  2012-07-05 13:53             ` Michael S. Tsirkin
  2012-07-05 17:53           ` Bart Van Assche
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-05 10:22 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel,
	Michael S. Tsirkin, lf-virt, Anthony Liguori, target-devel,
	Zhi Yong Wu, Christoph Hellwig, Stefan Hajnoczi

On 05/07/2012 03:52, Nicholas A. Bellinger wrote:
> 
> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> ------------------------------------------------------------------------------------
> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K

This is impressive, but I think it's still not enough to justify the
inclusion of tcm_vhost.  In my opinion, vhost-blk/vhost-scsi are mostly
worthwhile as drivers for improvements to QEMU performance.  We want to
add more fast paths to QEMU that let us move SCSI and virtio processing
to separate threads, we have proof of concepts that this can be done,
and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.

In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
completely different devices that happen to speak the same SCSI
transport.  Not only must virtio-scsi-vhost be configured outside QEMU
(it doesn't support -device); it (obviously) presents different
inquiry/vpd/mode data than virtio-scsi-qemu, so it is not possible
to migrate one to the other.

I don't think vhost-scsi is particularly useful for virtualization,
honestly.  However, if it is useful for development, testing or
benchmarking of lio itself (does this make any sense? :)) that could be
by itself a good reason to include it.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 10:22           ` Paolo Bonzini
@ 2012-07-05 13:53             ` Michael S. Tsirkin
  2012-07-05 14:06               ` Anthony Liguori
                                 ` (3 more replies)
  0 siblings, 4 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-05 13:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
	Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> > 
> > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> > ------------------------------------------------------------------------------------
> > 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> > 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> 
> This is impressive, but I think it's still not enough to justify the
> inclusion of tcm_vhost.  In my opinion, vhost-blk/vhost-scsi are mostly
> worthwhile as drivers for improvements to QEMU performance.  We want to
> add more fast paths to QEMU that let us move SCSI and virtio processing
> to separate threads, we have proof of concepts that this can be done,
> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.

A general rant below:

OTOH if it works, and adds value, we really should consider including code.
To me, it does not make sense to reject code just because in theory
someone could write even better code. Code walks. Time to market matters too.
Yes, I realize more options increase the support burden. But downstreams can make
their own decisions on whether to support some configurations:
add a configure option to disable it and that's enough.

> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> completely different devices that happen to speak the same SCSI
> transport.  Not only virtio-scsi-vhost must be configured outside QEMU

configuration outside QEMU is OK I think - real users use
management anyway. But maybe we can have helper scripts
like we have for tun?

> and doesn't support -device;

This needs to be fixed I think.

> it (obviously) presents different
> inquiry/vpd/mode data than virtio-scsi-qemu,

Why is this obvious and can't be fixed? Userspace virtio-scsi
is pretty flexible - can't it supply matching inquiry/vpd/mode data
so that switching is transparent to the guest?

> so that it is not possible to migrate one to the other.

Migration between different backend types does not seem all that useful.
The general rule is you need identical flags on both sides to allow
migration, and it is not clear how valuable it is to relax this
somewhat.

> I don't think vhost-scsi is particularly useful for virtualization,
> honestly.  However, if it is useful for development, testing or
> benchmarking of lio itself (does this make any sense? :)) that could be
> by itself a good reason to include it.
> 
> Paolo

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 13:53             ` Michael S. Tsirkin
@ 2012-07-05 14:06               ` Anthony Liguori
  2012-07-05 14:40                 ` Michael S. Tsirkin
                                   ` (2 more replies)
  2012-07-05 14:06               ` Anthony Liguori
                                 ` (2 subsequent siblings)
  3 siblings, 3 replies; 57+ messages in thread
From: Anthony Liguori @ 2012-07-05 14:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Nicholas A. Bellinger, target-devel, linux-scsi,
	lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke

On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote:
> On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
>> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
>>>
>>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
>>> ------------------------------------------------------------------------------------
>>> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
>>> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
>>
>> This is impressive, but I think it's still not enough to justify the
>> inclusion of tcm_vhost.

We have demonstrated better results at much higher IOP rates with virtio-blk in 
userspace so while these results are nice, there's no reason to believe we can't 
do this in userspace.

>> In my opinion, vhost-blk/vhost-scsi are mostly
>> worthwhile as drivers for improvements to QEMU performance.  We want to
>> add more fast paths to QEMU that let us move SCSI and virtio processing
>> to separate threads, we have proof of concepts that this can be done,
>> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
>
> A general rant below:
>
> OTOH if it works, and adds value, we really should consider including code.

Users want something that has lots of features and performs really, really well. 
  They want everything.

Having one device type that is "fast" but has no features and another that is 
"not fast" but has a lot of features forces the user to make a bad choice.  No 
one wins in the end.

virtio-scsi is brand new.  It's not as if we've had any significant time to make 
virtio-scsi-qemu faster.  In fact, tcm_vhost existed before virtio-scsi-qemu did 
if I understand correctly.

> To me, it does not make sense to reject code just because in theory
> someone could write even better code.

There is no theory.  We have proof points with virtio-blk.

> Code walks. Time to marker matters too.

But guest/user-facing decisions cannot be easily unmade, and making the wrong
technical choices because of premature concerns about "time to market" just results
in a long-term mess.

There is no technical reason why tcm_vhost is going to be faster than doing it 
in userspace.  We can demonstrate this with virtio-blk.  This isn't a 
theoretical argument.

> Yes I realize more options increases support. But downstreams can make
> their own decisions on whether to support some configurations:
> add a configure option to disable it and that's enough.
>
>> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
>> completely different devices that happen to speak the same SCSI
>> transport.  Not only virtio-scsi-vhost must be configured outside QEMU
>
> configuration outside QEMU is OK I think - real users use
> management anyway. But maybe we can have helper scripts
> like we have for tun?

Asking a user to write a helper script is pretty awful...

>
>> and doesn't support -device;
>
> This needs to be fixed I think.
>
>> it (obviously) presents different
>> inquiry/vpd/mode data than virtio-scsi-qemu,
>
> Why is this obvious and can't be fixed?

It's an entirely different emulation path.  It's not a simple packet protocol 
like virtio-net.  It's a complex command protocol where the backend maintains a 
very large amount of state.

> Userspace virtio-scsi
> is pretty flexible - can't it supply matching inquiry/vpd/mode data
> so that switching is transparent to the guest?

Basically, the issue is that the kernel has more complete SCSI emulation than
QEMU does right now.

There are lots of ways to try to solve this--like trying to reuse the kernel code
in userspace or just improving the userspace code.  If we were able to make the 
two paths identical, then I strongly suspect there'd be no point in having 
tcm_vhost anyway.

Regards,

Anthony Liguori

>
>> so that it is not possible to migrate one to the other.
>
> Migration between different backend types does not seem all that useful.
> The general rule is you need identical flags on both sides to allow
> migration, and it is not clear how valuable it is to relax this
> somewhat.
>
>> I don't think vhost-scsi is particularly useful for virtualization,
>> honestly.  However, if it is useful for development, testing or
>> benchmarking of lio itself (does this make any sense? :)) that could be
>> by itself a good reason to include it.
>>
>> Paolo
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 13:53             ` Michael S. Tsirkin
  2012-07-05 14:06               ` Anthony Liguori
  2012-07-05 14:06               ` Anthony Liguori
@ 2012-07-05 14:32               ` Paolo Bonzini
  2012-07-05 21:00                 ` Michael S. Tsirkin
  2012-07-06  3:38               ` Nicholas A. Bellinger
  3 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-05 14:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
	Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

Il 05/07/2012 15:53, Michael S. Tsirkin ha scritto:
> On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
>> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
>>>
>>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
>>> ------------------------------------------------------------------------------------
>>> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
>>> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
>>
>> This is impressive, but I think it's still not enough to justify the
>> inclusion of tcm_vhost.  In my opinion, vhost-blk/vhost-scsi are mostly
>> worthwhile as drivers for improvements to QEMU performance.  We want to
>> add more fast paths to QEMU that let us move SCSI and virtio processing
>> to separate threads, we have proof of concepts that this can be done,
>> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
> 
> A general rant below:
> 
> OTOH if it works, and adds value, we really should consider including code.
> To me, it does not make sense to reject code just because in theory
> someone could write even better code.

It's not about writing better code.  It's about having two completely
separate SCSI/block layers with completely different feature sets.

> Code walks. Time to marker matters too.
> Yes I realize more options increases support. But downstreams can make
> their own decisions on whether to support some configurations:
> add a configure option to disable it and that's enough.
> 
>> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
>> completely different devices that happen to speak the same SCSI
>> transport.  Not only virtio-scsi-vhost must be configured outside QEMU
> 
> configuration outside QEMU is OK I think - real users use
> management anyway. But maybe we can have helper scripts
> like we have for tun?

We could add hooks for vhost-scsi in the SCSI devices and let them
configure themselves.  I'm not sure it is a good idea.

>> and doesn't support -device;
> 
> This needs to be fixed I think.

To be clear, it supports -device for the virtio-scsi HBA itself; it
doesn't support using -drive/-device to set up the disks hanging off it.
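
Roughly, the difference looks like this on the command line (illustrative
only -- the vhost-scsi option names below are from the prototype patches
and may not match what eventually gets merged):

  # userspace virtio-scsi: each disk is an ordinary -drive/-device pair
  qemu-system-x86_64 ... \
      -device virtio-scsi-pci,id=scsi0 \
      -drive if=none,id=disk0,file=/dev/sdb,format=raw \
      -device scsi-hd,drive=disk0,bus=scsi0.0

  # vhost-scsi: only the HBA/WWPN is named here; the LUNs behind it are
  # whatever the host-side LIO/configfs setup exposes (WWPN is an example)
  qemu-system-x86_64 ... \
      -device vhost-scsi-pci,wwpn=naa.600140554cf3a18e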

>> it (obviously) presents different
>> inquiry/vpd/mode data than virtio-scsi-qemu,
> 
> Why is this obvious and can't be fixed? Userspace virtio-scsi
> is pretty flexible - can't it supply matching inquiry/vpd/mode data
> so that switching is transparent to the guest?

It cannot support the whole feature set anyway, unless you want to port
thousands of lines from the kernel to QEMU (well, perhaps we'll get
there, but it's far off).  And dually, the in-kernel target of course does
not support qcow2 and friends, though perhaps you could imagine some hack
based on NBD.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 14:06               ` Anthony Liguori
@ 2012-07-05 14:40                 ` Michael S. Tsirkin
  2012-07-05 14:47                   ` Paolo Bonzini
  2012-07-06  3:01                 ` Nicholas A. Bellinger
  2012-07-06  3:01                 ` [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
  2 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-05 14:40 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jens Axboe, linux-scsi, kvm-devel, lf-virt, Anthony Liguori,
	target-devel, Paolo Bonzini, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

On Thu, Jul 05, 2012 at 09:06:35AM -0500, Anthony Liguori wrote:
> On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote:
> >On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> >>Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> >>>
> >>>fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> >>>------------------------------------------------------------------------------------
> >>>25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> >>>75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> >>
> >>This is impressive, but I think it's still not enough to justify the
> >>inclusion of tcm_vhost.
> 
> We have demonstrated better results at much higher IOP rates with
> virtio-blk in userspace so while these results are nice, there's no
> reason to believe we can't do this in userspace.
> 
> >>In my opinion, vhost-blk/vhost-scsi are mostly
> >>worthwhile as drivers for improvements to QEMU performance.  We want to
> >>add more fast paths to QEMU that let us move SCSI and virtio processing
> >>to separate threads, we have proof of concepts that this can be done,
> >>and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
> >
> >A general rant below:
> >
> >OTOH if it works, and adds value, we really should consider including code.
> 
> Users want something that has lots of features and performs really,
> really well.  They want everything.
> 
> Having one device type that is "fast" but has no features and
> another that is "not fast" but has a lot of features forces the user
> to make a bad choice.  No one wins in the end.
>
> virtio-scsi is brand new.  It's not as if we've had any significant
> time to make virtio-scsi-qemu faster.  In fact, tcm_vhost existed
> before virtio-scsi-qemu did if I understand correctly.

Can't the same be said about virtio-scsi - it seems to be
slower, so we force a bad choice between blk and scsi on the user?

> 
> >To me, it does not make sense to reject code just because in theory
> >someone could write even better code.
> 
> There is no theory.  We have proof points with virtio-blk.
> 
> >Code walks. Time to marker matters too.
> 
> But guest/user facing decisions cannot be easily unmade and making
> the wrong technical choices because of premature concerns of "time
> to market" just result in a long term mess.
> 
> There is no technical reason why tcm_vhost is going to be faster
> than doing it in userspace.

But doing what in userspace exactly?

> We can demonstrate this with
> virtio-blk.  This isn't a theoretical argument.
>
> >Yes I realize more options increases support. But downstreams can make
> >their own decisions on whether to support some configurations:
> >add a configure option to disable it and that's enough.
> >
> >>In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> >>completely different devices that happen to speak the same SCSI
> >>transport.  Not only virtio-scsi-vhost must be configured outside QEMU
> >
> >configuration outside QEMU is OK I think - real users use
> >management anyway. But maybe we can have helper scripts
> >like we have for tun?
> 
> Asking a user to write a helper script is pretty awful...

A developer can write a helper. A user should just use management.

> >
> >>and doesn't support -device;
> >
> >This needs to be fixed I think.
> >
> >>it (obviously) presents different
> >>inquiry/vpd/mode data than virtio-scsi-qemu,
> >
> >Why is this obvious and can't be fixed?
> 
> It's an entirely different emulation path.  It's not a simple packet
> protocol like virtio-net.  It's a complex command protocol where the
> backend maintains a very large amount of state.
>
> >Userspace virtio-scsi
> >is pretty flexible - can't it supply matching inquiry/vpd/mode data
> >so that switching is transparent to the guest?
> 
> Basically, the issue is that the kernel has more complete SCSI
> emulation that QEMU does right now.
> 
> There are lots of ways to try to solve this--like try to reuse the
> kernel code in userspace or just improving the userspace code.  If
> we were able to make the two paths identical, then I strongly
> suspect there'd be no point in having tcm_vhost anyway.
>
> Regards,
> 
> Anthony Liguori

However, a question we should ask ourselves is whether this will happen
in practice, and when.

I have no idea, I am just asking questions.

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 14:40                 ` Michael S. Tsirkin
@ 2012-07-05 14:47                   ` Paolo Bonzini
  2012-07-05 17:26                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-05 14:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
	Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

Il 05/07/2012 16:40, Michael S. Tsirkin ha scritto:
>> virtio-scsi is brand new.  It's not as if we've had any significant
>> time to make virtio-scsi-qemu faster.  In fact, tcm_vhost existed
>> before virtio-scsi-qemu did if I understand correctly.

Yes.

> Can't same can be said about virtio scsi - it seems to be
> slower so we force a bad choice between blk and scsi at the user?

virtio-scsi supports multiple devices per PCI slot (or even function),
can talk to tapes, has better passthrough support for disks, and does a
bunch of other things that virtio-blk by design doesn't do.  This
applies to both tcm_vhost and virtio-scsi-qemu.

So far, all that virtio-scsi vs. virtio-blk benchmarks say is that more
benchmarking is needed.  Some people see it faster, some people see it
slower.  In some sense, it's consistent with the expectation that the
two should roughly be the same. :)

>> But guest/user facing decisions cannot be easily unmade and making
>> the wrong technical choices because of premature concerns of "time
>> to market" just result in a long term mess.
>>
>> There is no technical reason why tcm_vhost is going to be faster
>> than doing it in userspace.
> 
> But doing what in userspace exactly?

Processing virtqueues in separate threads, switching the block and SCSI
layer to fine-grained locking, adding some more fast paths.

>> Basically, the issue is that the kernel has more complete SCSI
>> emulation that QEMU does right now.
>>
>> There are lots of ways to try to solve this--like try to reuse the
>> kernel code in userspace or just improving the userspace code.  If
>> we were able to make the two paths identical, then I strongly
>> suspect there'd be no point in having tcm_vhost anyway.
> 
> However, a question we should ask ourselves is whether this will happen
> in practice, and when.

It's already happening, but it takes a substantial amount of preparatory
work before you can actually see results.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 14:47                   ` Paolo Bonzini
@ 2012-07-05 17:26                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-05 17:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
	Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

On Thu, Jul 05, 2012 at 04:47:43PM +0200, Paolo Bonzini wrote:
> Il 05/07/2012 16:40, Michael S. Tsirkin ha scritto:
> >> virtio-scsi is brand new.  It's not as if we've had any significant
> >> time to make virtio-scsi-qemu faster.  In fact, tcm_vhost existed
> >> before virtio-scsi-qemu did if I understand correctly.
> 
> Yes.
> 
> > Can't same can be said about virtio scsi - it seems to be
> > slower so we force a bad choice between blk and scsi at the user?
> 
> virtio-scsi supports multiple devices per PCI slot (or even function),
> can talk to tapes, has better passthrough support for disks, and does a
> bunch of other things that virtio-blk by design doesn't do.  This
> applies to both tcm_vhost and virtio-scsi-qemu.
> 
> So far, all that virtio-scsi vs. virtio-blk benchmarks say is that more
> benchmarking is needed.  Some people see it faster, some people see it
> slower.  In some sense, it's consistent with the expectation that the
> two should roughly be the same. :)

Anyway, all I was saying is that new technology often lacks some features of
the old one. We are not forcing a new, inferior one on anyone, so we can
let it mature in tree.

> >> But guest/user facing decisions cannot be easily unmade and making
> >> the wrong technical choices because of premature concerns of "time
> >> to market" just result in a long term mess.
> >>
> >> There is no technical reason why tcm_vhost is going to be faster
> >> than doing it in userspace.
> > 
> > But doing what in userspace exactly?
> 
> Processing virtqueues in separate threads, switching the block and SCSI
> layer to fine-grained locking, adding some more fast paths.
> 
> >> Basically, the issue is that the kernel has more complete SCSI
> >> emulation that QEMU does right now.
> >>
> >> There are lots of ways to try to solve this--like try to reuse the
> >> kernel code in userspace or just improving the userspace code.  If
> >> we were able to make the two paths identical, then I strongly
> >> suspect there'd be no point in having tcm_vhost anyway.
> > 
> > However, a question we should ask ourselves is whether this will happen
> > in practice, and when.
> 
> It's already happening, but it takes a substantial amount of preparatory
> work before you can actually see results.
> 
> Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
  2012-07-04  4:24 ` [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver Nicholas A. Bellinger
  2012-07-05 17:47   ` Bart Van Assche
@ 2012-07-05 17:47   ` Bart Van Assche
  2012-07-05 17:59     ` Bart Van Assche
  2012-07-05 17:59     ` Bart Van Assche
  1 sibling, 2 replies; 57+ messages in thread
From: Bart Van Assche @ 2012-07-05 17:47 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: target-devel, linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi,
	Zhi Yong Wu, Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke

On 07/04/12 04:24, Nicholas A. Bellinger wrote:

> +/* Fill in status and signal that we are done processing this command
> + *
> + * This is scheduled in the vhost work queue so we are called with the owner
> + * process mm and can access the vring.
> + */
> +static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
> +{


As far as I can see vhost_scsi_complete_cmd_work() runs in the context
of a workqueue kernel thread and hence doesn't have an mm context. Did
I misunderstand something?

Bart.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05  1:52         ` Nicholas A. Bellinger
  2012-07-05 10:22           ` Paolo Bonzini
  2012-07-05 17:53           ` Bart Van Assche
@ 2012-07-05 17:53           ` Bart Van Assche
  2012-07-05 19:57             ` Bart Van Assche
  2012-07-10  0:29           ` Nicholas A. Bellinger
  2012-07-10  0:29           ` Nicholas A. Bellinger
  4 siblings, 1 reply; 57+ messages in thread
From: Bart Van Assche @ 2012-07-05 17:53 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Anthony Liguori, Michael S. Tsirkin, Paolo Bonzini, target-devel,
	linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke

On 07/05/12 01:52, Nicholas A. Bellinger wrote:

> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> ------------------------------------------------------------------------------------
> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K


These numbers are interesting. To me these numbers mean that there is a
huge performance bottleneck in the virtio-scsi-raw storage path. Why is
the virtio-scsi-raw bandwidth only one third of the bare-metal raw block
bandwidth ?

Bart.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
  2012-07-05 17:47   ` Bart Van Assche
@ 2012-07-05 17:59     ` Bart Van Assche
  2012-07-05 17:59     ` Bart Van Assche
  1 sibling, 0 replies; 57+ messages in thread
From: Bart Van Assche @ 2012-07-05 17:59 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: target-devel, linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi,
	Zhi Yong Wu, Anthony Liguori, Paolo Bonzini, Michael S. Tsirkin,
	Christoph Hellwig, Jens Axboe, Hannes Reinecke

On 07/05/12 17:47, Bart Van Assche wrote:

> On 07/04/12 04:24, Nicholas A. Bellinger wrote:
>> +/* Fill in status and signal that we are done processing this command
>> + *
>> + * This is scheduled in the vhost work queue so we are called with the owner
>> + * process mm and can access the vring.
>> + */
>> +static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>> +{
> 
> As far as I can see vhost_scsi_complete_cmd_work() runs on the context
> of a work queue kernel thread and hence doesn't have an mm context. Did
> I misunderstand something ?


Please ignore the above - I've found the answer in vhost_dev_ioctl() and
vhost_dev_set_owner().
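
In short, the mechanism is roughly the following (a simplified sketch, not
the exact mainline code): vhost does not use a generic workqueue, it spawns
its own worker kthread and borrows the owner process' mm, so the queued
work items can touch the vring in userspace memory.

static long vhost_dev_set_owner(struct vhost_dev *dev)
{
        /* remember the owner process' mm and start a dedicated worker */
        dev->mm = get_task_mm(current);
        dev->worker = kthread_create(vhost_worker, dev,
                                     "vhost-%d", current->pid);
        return 0;       /* error handling omitted in this sketch */
}

static int vhost_worker(void *data)
{
        struct vhost_dev *dev = data;

        use_mm(dev->mm);        /* the worker runs with the owner's mm */
        for (;;) {
                if (kthread_should_stop())
                        break;
                /* pop vhost_work items (vhost_scsi_complete_cmd_work among
                 * them) off dev->work_list and invoke work->fn(work) */
        }
        unuse_mm(dev->mm);
        return 0;
}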

Bart.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 17:53           ` Bart Van Assche
@ 2012-07-05 19:57             ` Bart Van Assche
  0 siblings, 0 replies; 57+ messages in thread
From: Bart Van Assche @ 2012-07-05 19:57 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Anthony Liguori, kvm-devel, linux-scsi,
	Michael S. Tsirkin, lf-virt, Anthony Liguori, target-devel,
	Paolo Bonzini, Zhi Yong Wu, Christoph Hellwig, Stefan Hajnoczi

On 07/05/12 17:53, Bart Van Assche wrote:

> On 07/05/12 01:52, Nicholas A. Bellinger wrote:
>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
>> ------------------------------------------------------------------------------------
>> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
>> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> 
> These numbers are interesting. To me these numbers mean that there is a
> huge performance bottleneck in the virtio-scsi-raw storage path. Why is
> the virtio-scsi-raw bandwidth only one third of the bare-metal raw block
> bandwidth ?


(replying to my own e-mail)

Or maybe the above numbers mean that in the virtio-scsi-raw test I/O was
serialized (I/O depth 1) while the other two tests use a large I/O depth
(64) ? It can't be a coincidence that the virtio-scsi-raw results are
close to the bare-metal results for I/O depth 1.
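
For reference, something like the following fio job would pin those
variables down (the original job file was not posted, so the path and
numbers here are only an example):

  [randrw-25w-75r]
  ioengine=libaio
  direct=1
  rw=randrw
  rwmixread=75
  bs=4k
  iodepth=64
  numjobs=1
  runtime=60
  filename=/dev/sdb   ; example guest-side LUN, not from the original test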

Another question: which functionality does tcm_vhost provide that is not
yet provided by the SCSI emulation code in qemu + tcm_loop ?

Bart.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 14:32               ` Paolo Bonzini
@ 2012-07-05 21:00                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2012-07-05 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
	Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
	Stefan Hajnoczi

On Thu, Jul 05, 2012 at 04:32:31PM +0200, Paolo Bonzini wrote:
> Il 05/07/2012 15:53, Michael S. Tsirkin ha scritto:
> > On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> >> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> >>>
> >>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> >>> ------------------------------------------------------------------------------------
> >>> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> >>> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> >>
> >> This is impressive, but I think it's still not enough to justify the
> >> inclusion of tcm_vhost.  In my opinion, vhost-blk/vhost-scsi are mostly
> >> worthwhile as drivers for improvements to QEMU performance.  We want to
> >> add more fast paths to QEMU that let us move SCSI and virtio processing
> >> to separate threads, we have proof of concepts that this can be done,
> >> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
> > 
> > A general rant below:
> > 
> > OTOH if it works, and adds value, we really should consider including code.
> > To me, it does not make sense to reject code just because in theory
> > someone could write even better code.
> 
> It's not about writing better code.  It's about having two completely
> separate SCSI/block layers with completely different feature sets.

You mean the qemu one versus the kernel one? Both exist anyway :)

> > Code walks. Time to marker matters too.
> > Yes I realize more options increases support. But downstreams can make
> > their own decisions on whether to support some configurations:
> > add a configure option to disable it and that's enough.
> > 
> >> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> >> completely different devices that happen to speak the same SCSI
> >> transport.  Not only virtio-scsi-vhost must be configured outside QEMU
> > 
> > configuration outside QEMU is OK I think - real users use
> > management anyway. But maybe we can have helper scripts
> > like we have for tun?
> 
> We could add hooks for vhost-scsi in the SCSI devices and let them
> configure themselves.  I'm not sure it is a good idea.

This is exactly what virtio-net does.

> >> and doesn't support -device;
> > 
> > This needs to be fixed I think.
> 
> To be clear, it supports -device for the virtio-scsi HBA itself; it
> doesn't support using -drive/-device to set up the disks hanging off it.

Fixable, isn't it?

> >> it (obviously) presents different
> >> inquiry/vpd/mode data than virtio-scsi-qemu,
> > 
> > Why is this obvious and can't be fixed? Userspace virtio-scsi
> > is pretty flexible - can't it supply matching inquiry/vpd/mode data
> > so that switching is transparent to the guest?
> 
> It cannot support anyway the whole feature set unless you want to port
> thousands of lines from the kernel to QEMU (well, perhaps we'll get
> there but it's far.  And dually, the in-kernel target of course does not
> support qcow2 and friends though perhaps you could imagine some hack
> based on NBD.
> 
> Paolo

Exactly. The kernel also gains functionality all the time.

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 14:06               ` Anthony Liguori
  2012-07-05 14:40                 ` Michael S. Tsirkin
@ 2012-07-06  3:01                 ` Nicholas A. Bellinger
  2012-07-06  5:43                   ` SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6] James Bottomley
  2012-07-06  3:01                 ` [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
  2 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-06  3:01 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Michael S. Tsirkin, Paolo Bonzini, target-devel, linux-scsi,
	lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	ksummit-2012-discuss

On Thu, 2012-07-05 at 09:06 -0500, Anthony Liguori wrote:
> On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote:
> > On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> >> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> >>>
> >>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> >>> ------------------------------------------------------------------------------------
> >>> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> >>> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> >>
> >> This is impressive, but I think it's still not enough to justify the
> >> inclusion of tcm_vhost.
> 
> We have demonstrated better results at much higher IOP rates with virtio-blk in 
> userspace so while these results are nice, there's no reason to believe we can't 
> do this in userspace.
> 

So I'm pretty sure this discrepancy is attributable to the small-block
random I/O bottleneck currently present for all Linux/SCSI core LLDs,
regardless of physical or virtual storage fabric.

The SCSI-wide host-lock-less conversion that happened in the .38 code back
in 2010, and subsequently having LLDs like virtio-scsi converted to run in
host-lock-less mode, have helped to some extent..  But it's still not
enough..

Another example where we've been able to prove this bottleneck recently
is with the following target setup:

*) Intel Romley production machines with 128 GB of DDR-3 memory
*) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2)
*) Mellanox PCI-express Gen3 HCA running at 56 Gb/sec
*) Infiniband SRP Target backported to RHEL 6.2 + latest OFED

In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 +
iomemory_vsl export we end up avoiding SCSI core bottleneck on the
target machine, just as with the tcm_vhost example here for host kernel
side processing with vhost.
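
(For completeness, that attribute is just a configfs write; a rough sketch
with made-up HBA/device names -- in practice rtslib/targetcli does this
for you:)

  mkdir -p /sys/kernel/config/target/core/iblock_0/fioa
  echo "udev_path=/dev/fioa" > \
      /sys/kernel/config/target/core/iblock_0/fioa/control
  echo 1 > /sys/kernel/config/target/core/iblock_0/fioa/enable
  echo 1 > /sys/kernel/config/target/core/iblock_0/fioa/attrib/emulate_write_cache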

Using a Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP
(OFED) initiator connected to four ib_srpt LUNs, we've observed that
MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs.
~215K IOPS with heavy random 4k WRITE iometer / fio tests.  Note this is
with an optimized queue_depth ib_srp client w/ the noop I/O scheduler, but
it is still lacking the host_lock-less patches on RHEL 6.2 OFED..

This bottleneck has been mentioned by various people (including myself)
on linux-scsi over the last 18 months, and I've proposed that it be
discussed at KS-2012 so we can start making some forward progress:

http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/000098.html

> >> In my opinion, vhost-blk/vhost-scsi are mostly
> >> worthwhile as drivers for improvements to QEMU performance.  We want to
> >> add more fast paths to QEMU that let us move SCSI and virtio processing
> >> to separate threads, we have proof of concepts that this can be done,
> >> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
> >
> > A general rant below:
> >
> > OTOH if it works, and adds value, we really should consider including code.
> 
> Users want something that has lots of features and performs really, really well. 
>   They want everything.
> 
> Having one device type that is "fast" but has no features and another that is 
> "not fast" but has a lot of features forces the user to make a bad choice.  No 
> one wins in the end.
> 
> virtio-scsi is brand new.  It's not as if we've had any significant time to make 
> virtio-scsi-qemu faster.  In fact, tcm_vhost existed before virtio-scsi-qemu did 
> if I understand correctly.
> 

So based upon the data above, I'm going to make a prediction that MSFT
guests connected with SCSI miniport <-> tcm_vhost will outperform Linux
guests with virtio-scsi (w/ <= 3.5 host-lock-less) <-> tcm_vhost
connected to the same raw block flash iomemory_vsl backends.

Of course that depends upon how fast virtio-scsi drivers get written for
MSFT guests vs. us fixing the long-term performance bottleneck in our
SCSI subsystem.  ;)

(Ksummit-2012 discuss CC'ed for the latter)

> > To me, it does not make sense to reject code just because in theory
> > someone could write even better code.
> 
> There is no theory.  We have proof points with virtio-blk.
> 
> > Code walks. Time to marker matters too.
> 
> But guest/user facing decisions cannot be easily unmade and making the wrong 
> technical choices because of premature concerns of "time to market" just result 
> in a long term mess.
> 
> There is no technical reason why tcm_vhost is going to be faster than doing it 
> in userspace.  We can demonstrate this with virtio-blk.  This isn't a 
> theoretical argument.
> 
> > Yes I realize more options increases support. But downstreams can make
> > their own decisions on whether to support some configurations:
> > add a configure option to disable it and that's enough.
> >
> >> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> >> completely different devices that happen to speak the same SCSI
> >> transport.  Not only virtio-scsi-vhost must be configured outside QEMU
> >
> > configuration outside QEMU is OK I think - real users use
> > management anyway. But maybe we can have helper scripts
> > like we have for tun?
> 
> Asking a user to write a helper script is pretty awful...
> 

It's easy for anyone with basic python knowledge to use the rtslib packages
in the downstream distros to connect to tcm_vhost endpoint LUNs right
now.

All you need is the following vhost.spec, and tcm_vhost works out of the
box for rtslib and targetcli/rtsadmin without any modification to
existing userspace packages:

root@tifa:~# cat /var/target/fabric/vhost.spec 
# WARNING: This is a draft specfile supplied for testing only.

# The fabric module feature set
features = nexus

# Use naa WWNs.
wwn_type = naa

# Non-standard module naming scheme
kernel_module = tcm_vhost

# The configfs group
configfs_group = vhost
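
For example, a targetcli session along these lines is all it takes (the
WWPN and backstore names are made up, and the exact prompts/paths may
differ slightly between targetcli versions):

  /> /backstores/iblock create name=fioa dev=/dev/fioa
  /> /vhost create naa.600140554cf3a18e
  /> /vhost/naa.600140554cf3a18e/tpgt1/luns create /backstores/iblock/fioa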

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05  9:31         ` Michael S. Tsirkin
@ 2012-07-06  3:13           ` Nicholas A. Bellinger
  0 siblings, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-06  3:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, lf-virt, Anthony Liguori,
	target-devel, linux-scsi, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig

On Thu, 2012-07-05 at 12:31 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 04, 2012 at 07:01:05PM -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2012-07-04 at 18:05 +0300, Michael S. Tsirkin wrote:

<SNIP>

> > > I was talking about 4/6 first of all.
> > 
> > So yeah, this code is still considered RFC at this point for-3.6, but
> > I'd like to get this into target-pending/for-next in next week for more
> > feedback and start collecting signoffs for the necessary pieces that
> > effect existing vhost code.
> > 
> > By that time the cmwq conversion of tcm_vhost should be in place as
> > well..
> 
> I'll try to give some feedback but I think we do need
> to see the qemu patches - they weren't posted yet, were they?
> This driver has some userspace interface and once
> that is merged it has to be supported.
> So I think we need the buy-in from the qemu side at the principal level.
> 

<nod>

Stefan posted the QEMU vhost-scsi patches a few times, but I think it's
been a while since the last round of review.  For the recent
developments with tcm_vhost, I've been using Zhi's QEMU tree here:

https://github.com/wuzhy/qemu/tree/vhost-scsi

Other than a few printfs I added to help me understand how it works, no
functional changes have been made to work with target-pending/tcm_vhost.

Thank you,

--nab

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05 13:53             ` Michael S. Tsirkin
                                 ` (2 preceding siblings ...)
  2012-07-05 14:32               ` Paolo Bonzini
@ 2012-07-06  3:38               ` Nicholas A. Bellinger
  2012-07-06  5:39                 ` Paolo Bonzini
  3 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-06  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
	Anthony Liguori, target-devel, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig, Stefan Hajnoczi

On Thu, 2012-07-05 at 16:53 +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> > Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> > > 
> > > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> > > ------------------------------------------------------------------------------------
> > > 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> > > 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> > 
> > This is impressive, but I think it's still not enough to justify the
> > inclusion of tcm_vhost.  In my opinion, vhost-blk/vhost-scsi are mostly
> > worthwhile as drivers for improvements to QEMU performance.  We want to
> > add more fast paths to QEMU that let us move SCSI and virtio processing
> > to separate threads, we have proof of concepts that this can be done,
> > and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
> 
> A general rant below:
> 
> OTOH if it works, and adds value, we really should consider including code.
> To me, it does not make sense to reject code just because in theory
> someone could write even better code. Code walks. Time to marker matters too.
> Yes I realize more options increases support. But downstreams can make
> their own decisions on whether to support some configurations:
> add a configure option to disable it and that's enough.
> 

+1 for mst here.

I think that type of sentiment deserves a toast at KS/LC in August.  ;)

> > In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> > completely different devices that happen to speak the same SCSI
> > transport.  Not only virtio-scsi-vhost must be configured outside QEMU
> 
> configuration outside QEMU is OK I think - real users use
> management anyway. But maybe we can have helper scripts
> like we have for tun?
> 
> > and doesn't support -device;
> 
> This needs to be fixed I think.
> 
> > it (obviously) presents different
> > inquiry/vpd/mode data than virtio-scsi-qemu,
> 
> Why is this obvious and can't be fixed? Userspace virtio-scsi
> is pretty flexible - can't it supply matching inquiry/vpd/mode data
> so that switching is transparent to the guest?
> 

So I imagine that inquiry/vpd/mode data can easily be set via configfs
attribs under /sys/kernel/config/target/core/$HBA/$DEV/[wwn,attrib]/ to
match whatever the guest wants (or expects) to see.
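
As a rough illustration (the iblock_0/mydev path below is just a
placeholder for whatever backstore is configured; emulate_write_cache is
one attrib that already came up in this thread), poking one of these
from userspace is trivial:

#include <stdio.h>

/* Illustrative only: "iblock_0" and "mydev" are placeholder names; the
 * real path depends on how the backstore was created under
 * /sys/kernel/config/target/core/$HBA/$DEV/attrib/.
 */
static int set_attrib(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	return set_attrib("/sys/kernel/config/target/core/iblock_0/mydev"
			  "/attrib/emulate_write_cache", "1");
}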

However, beyond basic SCSI WWN related bits, I would avoid trying to
match complex SCSI target state between the in-kernel patch and QEMU
SCSI.  We've had this topic come up numerous times over the years for
other fabric modules (namely iscsi-target), and it usually ends up with
a long email thread re-hashing the history of failures until Linus starts
yelling at the person pushing a complex kernel <-> user split.

The part where I start to get nervous is where you get into the cluster
+ multipath features.  We have methods in TCM core that rebuild the
exact state of these bits from external file metadata and the running
configfs layout.  This is used by physical node failover + re-takeover
to ensure the SCSI client sees exactly the same SCSI state.

Trying to propagate this type of complexity up the stack is where I
think you go overboard.  KISS, and let fabric-independent configfs
(leaning on the vfs) do the hard work of tracking these types of SCSI
relationships.

> > so that it is not possible to migrate one to the other.
> 
> Migration between different backend types does not seem all that useful.
> The general rule is you need identical flags on both sides to allow
> migration, and it is not clear how valuable it is to relax this
> somewhat.
> 

I really need to learn more about how QEMU live migration works wrt
storage before saying how this may (or may not) work.

We certainly have no problems doing physical machine failover with
target_core_mod for iscsi-target, and ATM I don't see why the QEMU
userspace process driving the real-time configfs configuration of the
storage fabric would not work..

> > I don't think vhost-scsi is particularly useful for virtualization,
> > honestly.  However, if it is useful for development, testing or
> > benchmarking of lio itself (does this make any sense? :)) that could be
> > by itself a good reason to include it.
> > 
> > Paolo
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-06  3:38               ` Nicholas A. Bellinger
@ 2012-07-06  5:39                 ` Paolo Bonzini
  0 siblings, 0 replies; 57+ messages in thread
From: Paolo Bonzini @ 2012-07-06  5:39 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel,
	Michael S. Tsirkin, lf-virt, Anthony Liguori, target-devel,
	Zhi Yong Wu, Christoph Hellwig, Stefan Hajnoczi

Il 06/07/2012 05:38, Nicholas A. Bellinger ha scritto:
> So I imagine that setting inquiry/vpd/mode via configfs attribs to match
> whatever the guest wants to see (or expects to see) can be enabled
> via /sys/kernel/config/target/core/$HBA/$DEV/[wwn,attrib]/ easily to
> whatever is required.
> 
> However, beyond basic SCSI WWN related bits, I would avoid trying to
> match complex SCSI target state between the in-kernel patch and QEMU
> SCSI.

Agreed.  It should just be the bare minimum to make stable /dev/disk
paths, well, stable between the two backends.

>>> so that it is not possible to migrate one to the other.
>>
>> Migration between different backend types does not seem all that useful.
>> The general rule is you need identical flags on both sides to allow
>> migration, and it is not clear how valuable it is to relax this
>> somewhat.
> 
> I really need to learn more about how QEMU Live migration works wrt to
> storage before saying how this may (or may not) work.

vhost-scsi live migration should be easy to fix.  You need some ioctl or
eventfd mechanism to communicate to userspace that there is no pending
I/O, but you need that anyway also for other operations (as simple as
stopping the VM: QEMU guarantees that the "stop" monitor command returns
only when there is no outstanding I/O).
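
Just to illustrate the shape of it (the VHOST_SCSI_SET_QUIESCE_FD ioctl
below is made up for the sake of the sketch; only the eventfd usage is
real API): QEMU would hand an eventfd to the kernel side and block on it
until all in-flight commands have completed:

#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Sketch only: VHOST_SCSI_SET_QUIESCE_FD does not exist today, it just
 * stands for whatever mechanism ends up telling the vhost driver
 * "signal this eventfd once no I/O is outstanding".
 */
static int wait_for_quiesce(int vhost_fd)
{
	uint64_t done;
	int efd = eventfd(0, 0);

	if (efd < 0)
		return -1;

	/* ioctl(vhost_fd, VHOST_SCSI_SET_QUIESCE_FD, &efd); */
	(void)vhost_fd;

	/* Blocks until the kernel side signals completion of all I/O. */
	if (read(efd, &done, sizeof(done)) != sizeof(done)) {
		close(efd);
		return -1;
	}
	return close(efd);
}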

What worries me most is: 1) the amount of functionality that is lost
with vhost-scsi, especially the new live operations that we're adding to
QEMU; 2) whether any hook we introduce in the QEMU block layer will
cause problems down the road when we set out to fix the existing
virtio-blk/virtio-scsi-qemu performance problems.  This is the reason
why I'm reluctant to merge the QEMU bits.  The kernel bits are
self-contained and much less problematic.

It may well be that _the same_ (or very similar) hooks will be needed by
both tcm_vhost and high-performance userspace virtio backends.  This
would of course remove the objection.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
  2012-07-06  3:01                 ` Nicholas A. Bellinger
@ 2012-07-06  5:43                   ` James Bottomley
  2012-07-06  9:13                     ` Nicholas A. Bellinger
                                       ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: James Bottomley @ 2012-07-06  5:43 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Anthony Liguori, Michael S. Tsirkin, Paolo Bonzini, target-devel,
	linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	ksummit-2012-discuss

On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote:

> So I'm pretty sure this discrepancy is attributed to the small block
> random I/O bottleneck currently present for all Linux/SCSI core LLDs
> regardless of physical or virtual storage fabric.
> 
> The SCSI wide host-lock less conversion that happened in .38 code back
> in 2010, and subsequently having LLDs like virtio-scsi convert to run in
> host-lock-less mode have helped to some extent..  But it's still not
> enough..
> 
> Another example where we've been able to prove this bottleneck recently
> is with the following target setup:
> 
> *) Intel Romley production machines with 128 GB of DDR-3 memory
> *) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2)
> *) Mellanox PCI-exress Gen3 HCA running at 56 gb/sec 
> *) Infiniband SRP Target backported to RHEL 6.2 + latest OFED
> 
> In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 +
> iomemory_vsl export we end up avoiding SCSI core bottleneck on the
> target machine, just as with the tcm_vhost example here for host kernel
> side processing with vhost.
> 
> Using Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP
> (OFED) Initiator connected to four ib_srpt LUNs, we've observed that
> MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs.
> ~215K with heavy random 4k WRITE iometer / fio tests.  Note this with an
> optimized queue_depth ib_srp client w/ noop I/O schedulering, but is
> still lacking the host_lock-less patches on RHEL 6.2 OFED..
> 
> This bottleneck has been mentioned by various people (including myself)
> on linux-scsi the last 18 months, and I've proposed that that it be
> discussed at KS-2012 so we can start making some forward progress:

Well, no, it hasn't.  You randomly drop things like this into unrelated
email (I suppose that is a mention in strict English construction) but
it's not really enough to get anyone to pay attention since they mostly
stopped reading at the top, if they got that far: most people just go by
subject when wading through threads initially.

But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32
kernel, which is now nearly three years old) is 25% slower than W2k8R2
on infiniband isn't really going to get anyone excited either
(particularly when you mention OFED, which usually means a stack
replacement on Linux anyway).

What people might pay attention to is evidence that there's a problem in
3.5-rc6 (without any OFED crap).  If you're not going to bother
investigating, it has to be in an environment they can reproduce (so
ordinary hardware, not infiniband) otherwise it gets ignored as an
esoteric hardware issue.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
  2012-07-06  5:43                   ` SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6] James Bottomley
@ 2012-07-06  9:13                     ` Nicholas A. Bellinger
  2012-07-06 13:49                       ` James Bottomley
  2012-07-06 20:30                     ` [Ksummit-2012-discuss] " Christoph Lameter
  2012-07-06 20:30                     ` Christoph Lameter
  2 siblings, 1 reply; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-06  9:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Anthony Liguori, kvm-devel, linux-scsi,
	Michael S. Tsirkin, lf-virt, Anthony Liguori, target-devel,
	ksummit-2012-discuss, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig, Stefan Hajnoczi

On Fri, 2012-07-06 at 09:43 +0400, James Bottomley wrote:
> On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote:
> 
> > So I'm pretty sure this discrepancy is attributed to the small block
> > random I/O bottleneck currently present for all Linux/SCSI core LLDs
> > regardless of physical or virtual storage fabric.
> > 
> > The SCSI wide host-lock less conversion that happened in .38 code back
> > in 2010, and subsequently having LLDs like virtio-scsi convert to run in
> > host-lock-less mode have helped to some extent..  But it's still not
> > enough..
> > 
> > Another example where we've been able to prove this bottleneck recently
> > is with the following target setup:
> > 
> > *) Intel Romley production machines with 128 GB of DDR-3 memory
> > *) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2)
> > *) Mellanox PCI-exress Gen3 HCA running at 56 gb/sec 
> > *) Infiniband SRP Target backported to RHEL 6.2 + latest OFED
> > 
> > In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 +
> > iomemory_vsl export we end up avoiding SCSI core bottleneck on the
> > target machine, just as with the tcm_vhost example here for host kernel
> > side processing with vhost.
> > 
> > Using Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP
> > (OFED) Initiator connected to four ib_srpt LUNs, we've observed that
> > MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs.
> > ~215K with heavy random 4k WRITE iometer / fio tests.  Note this with an
> > optimized queue_depth ib_srp client w/ noop I/O schedulering, but is
> > still lacking the host_lock-less patches on RHEL 6.2 OFED..
> > 
> > This bottleneck has been mentioned by various people (including myself)
> > on linux-scsi the last 18 months, and I've proposed that that it be
> > discussed at KS-2012 so we can start making some forward progress:
> 
> Well, no, it hasn't.  You randomly drop things like this into unrelated
> email (I suppose that is a mention in strict English construction) but
> it's not really enough to get anyone to pay attention since they mostly
> stopped reading at the top, if they got that far: most people just go by
> subject when wading through threads initially.
> 

It most certainly has been made clear to me, numerous times from many
people in the Linux/SCSI community that there is a bottleneck for small
block random I/O in SCSI core vs. raw Linux/Block, as well as vs. non
Linux based SCSI subsystems.

My apologies if mentioning this issue last year at LC 2011 to you
privately did not take a tone of a more serious nature, or that
proposing a topic for LSF-2012 this year was not a clear enough
indication of a problem with SCSI small block random I/O performance.

> But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32
> kernel, which is now nearly three years old) is 25% slower than W2k8R2
> on infiniband isn't really going to get anyone excited either
> (particularly when you mention OFED, which usually means a stack
> replacement on Linux anyway).
> 

The specific issue was first raised for .38, where we were able to get
most of the interesting high performance LLDs converted to using
internal locking methods so that host_lock did not have to be obtained
during each ->queuecommand() I/O dispatch, right..?

This has helped a good deal for large multi-lun scsi_host configs that
are now running in host-lock less mode, but there is still a large
discrepancy between single LUN and raw struct block_device access, even
with LLD host_lock less mode enabled.

Now I think the virtio-blk client performance is demonstrating this
issue pretty vividly, along with this week's tcm_vhost IBLOCK raw block
flash benchmarks that demonstrate some other yet-to-be-determined
limitations for virtio-scsi-raw vs. tcm_vhost for this particular fio
randrw workload.

> What people might pay attention to is evidence that there's a problem in
> 3.5-rc6 (without any OFED crap).  If you're not going to bother
> investigating, it has to be in an environment they can reproduce (so
> ordinary hardware, not infiniband) otherwise it gets ignored as an
> esoteric hardware issue.
> 

It's really quite simple for anyone to demonstrate the bottleneck
locally on any machine using tcm_loop with raw block flash.  Take a
struct block_device backend (like a Fusion IO /dev/fio*), wrap it with
IBLOCK, and export locally accessible SCSI LUNs via tcm_loop..

Using fio, there is a significant drop in randrw 4k performance for
tcm_loop <-> IBLOCK vs. raw struct block device backends.  And no, it's
not some type of target IBLOCK or tcm_loop bottleneck; it's a per SCSI
LUN limitation for small block random I/Os, on the order of ~75K for each
SCSI LUN.

If anyone has actually gone faster than this with any single SCSI
LUN on any storage fabric, I would be interested in hearing about your
setup.

Thanks,

--nab

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
  2012-07-06  9:13                     ` Nicholas A. Bellinger
@ 2012-07-06 13:49                       ` James Bottomley
  2012-07-06 18:21                         ` Nicholas A. Bellinger
  2012-07-06 18:21                         ` Nicholas A. Bellinger
  0 siblings, 2 replies; 57+ messages in thread
From: James Bottomley @ 2012-07-06 13:49 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Anthony Liguori, Michael S. Tsirkin, Paolo Bonzini, target-devel,
	linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	ksummit-2012-discuss

On Fri, 2012-07-06 at 02:13 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2012-07-06 at 09:43 +0400, James Bottomley wrote:
> > On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote:
> > 
> > > So I'm pretty sure this discrepancy is attributed to the small block
> > > random I/O bottleneck currently present for all Linux/SCSI core LLDs
> > > regardless of physical or virtual storage fabric.
> > > 
> > > The SCSI wide host-lock less conversion that happened in .38 code back
> > > in 2010, and subsequently having LLDs like virtio-scsi convert to run in
> > > host-lock-less mode have helped to some extent..  But it's still not
> > > enough..
> > > 
> > > Another example where we've been able to prove this bottleneck recently
> > > is with the following target setup:
> > > 
> > > *) Intel Romley production machines with 128 GB of DDR-3 memory
> > > *) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2)
> > > *) Mellanox PCI-exress Gen3 HCA running at 56 gb/sec 
> > > *) Infiniband SRP Target backported to RHEL 6.2 + latest OFED
> > > 
> > > In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 +
> > > iomemory_vsl export we end up avoiding SCSI core bottleneck on the
> > > target machine, just as with the tcm_vhost example here for host kernel
> > > side processing with vhost.
> > > 
> > > Using Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP
> > > (OFED) Initiator connected to four ib_srpt LUNs, we've observed that
> > > MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs.
> > > ~215K with heavy random 4k WRITE iometer / fio tests.  Note this with an
> > > optimized queue_depth ib_srp client w/ noop I/O schedulering, but is
> > > still lacking the host_lock-less patches on RHEL 6.2 OFED..
> > > 
> > > This bottleneck has been mentioned by various people (including myself)
> > > on linux-scsi the last 18 months, and I've proposed that that it be
> > > discussed at KS-2012 so we can start making some forward progress:
> > 
> > Well, no, it hasn't.  You randomly drop things like this into unrelated
> > email (I suppose that is a mention in strict English construction) but
> > it's not really enough to get anyone to pay attention since they mostly
> > stopped reading at the top, if they got that far: most people just go by
> > subject when wading through threads initially.
> > 
> 
> It most certainly has been made clear to me, numerous times from many
> people in the Linux/SCSI community that there is a bottleneck for small
> block random I/O in SCSI core vs. raw Linux/Block, as well as vs. non
> Linux based SCSI subsystems.
> 
> My apologies if mentioning this issue last year at LC 2011 to you
> privately did not take a tone of a more serious nature, or that
> proposing a topic for LSF-2012 this year was not a clear enough
> indication of a problem with SCSI small block random I/O performance.
> 
> > But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32
> > kernel, which is now nearly three years old) is 25% slower than W2k8R2
> > on infiniband isn't really going to get anyone excited either
> > (particularly when you mention OFED, which usually means a stack
> > replacement on Linux anyway).
> > 
> 
> The specific issue was first raised for .38 where we where able to get
> most of the interesting high performance LLDs converted to using
> internal locking methods so that host_lock did not have to be obtained
> during each ->queuecommand() I/O dispatch, right..?
> 
> This has helped a good deal for large multi-lun scsi_host configs that
> are now running in host-lock less mode, but there is still a large
> discrepancy single LUN vs. raw struct block_device access even with LLD
> host_lock less mode enabled.
> 
> Now I think the virtio-blk client performance is demonstrating this
> issue pretty vividly, along with this week's tcm_vhost IBLOCK raw block
> flash benchmarks that is demonstrate some other yet-to-be determined
> limitations for virtio-scsi-raw vs. tcm_vhost for this particular fio
> randrw workload.
> 
> > What people might pay attention to is evidence that there's a problem in
> > 3.5-rc6 (without any OFED crap).  If you're not going to bother
> > investigating, it has to be in an environment they can reproduce (so
> > ordinary hardware, not infiniband) otherwise it gets ignored as an
> > esoteric hardware issue.
> > 
> 
> It's really quite simple for anyone to demonstrate the bottleneck
> locally on any machine using tcm_loop with raw block flash.  Take a
> struct block_device backend (like a Fusion IO /dev/fio*) and using
> IBLOCK and export locally accessible SCSI LUNs via tcm_loop..
> 
> Using FIO there is a significant drop for randrw 4k performance between
> tcm_loop <-> IBLOCK vs. raw struct block device backends.  And no, it's
> not some type of target IBLOCK or tcm_loop bottleneck, it's a per SCSI
> LUN limitation for small block random I/Os on the order of ~75K for each
> SCSI LUN.

Here, you're saying that the end-to-end SCSI stack tops out at
around 75k iops, which is reasonably respectable if you don't employ any
mitigation like queue steering and interrupt polling ... what were the
mitigation techniques in the test you employed, by the way?

But previously, you ascribed a performance drop of around 75% on
virtio-scsi (topping out around 15-20k iops) to this same problem ...
that doesn't really seem likely.

Here's the rough ranges of concern:

10K iops: standard arrays
100K iops: modern expensive fast flash drives on 6Gb links
1M iops: PCIe NVMexpress like devices

SCSI should do arrays with no problem at all, so I'd be really concerned
that it can't make 0-20k iops.  If you push the system and fine tune it,
SCSI can just about get to 100k iops.  1M iops is still a stretch goal
for pure block drivers.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
  2012-07-06 13:49                       ` James Bottomley
@ 2012-07-06 18:21                         ` Nicholas A. Bellinger
  2012-07-06 18:21                         ` Nicholas A. Bellinger
  1 sibling, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-06 18:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Anthony Liguori, Michael S. Tsirkin, Paolo Bonzini, target-devel,
	linux-scsi, lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke,
	ksummit-2012-discuss

On Fri, 2012-07-06 at 17:49 +0400, James Bottomley wrote:
> On Fri, 2012-07-06 at 02:13 -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2012-07-06 at 09:43 +0400, James Bottomley wrote:
> > > On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote:
> > > 

<SNIP>

> > > > This bottleneck has been mentioned by various people (including myself)
> > > > on linux-scsi the last 18 months, and I've proposed that that it be
> > > > discussed at KS-2012 so we can start making some forward progress:
> > > 
> > > Well, no, it hasn't.  You randomly drop things like this into unrelated
> > > email (I suppose that is a mention in strict English construction) but
> > > it's not really enough to get anyone to pay attention since they mostly
> > > stopped reading at the top, if they got that far: most people just go by
> > > subject when wading through threads initially.
> > > 
> > 
> > It most certainly has been made clear to me, numerous times from many
> > people in the Linux/SCSI community that there is a bottleneck for small
> > block random I/O in SCSI core vs. raw Linux/Block, as well as vs. non
> > Linux based SCSI subsystems.
> > 
> > My apologies if mentioning this issue last year at LC 2011 to you
> > privately did not take a tone of a more serious nature, or that
> > proposing a topic for LSF-2012 this year was not a clear enough
> > indication of a problem with SCSI small block random I/O performance.
> > 
> > > But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32
> > > kernel, which is now nearly three years old) is 25% slower than W2k8R2
> > > on infiniband isn't really going to get anyone excited either
> > > (particularly when you mention OFED, which usually means a stack
> > > replacement on Linux anyway).
> > > 
> > 
> > The specific issue was first raised for .38 where we where able to get
> > most of the interesting high performance LLDs converted to using
> > internal locking methods so that host_lock did not have to be obtained
> > during each ->queuecommand() I/O dispatch, right..?
> > 
> > This has helped a good deal for large multi-lun scsi_host configs that
> > are now running in host-lock less mode, but there is still a large
> > discrepancy single LUN vs. raw struct block_device access even with LLD
> > host_lock less mode enabled.
> > 
> > Now I think the virtio-blk client performance is demonstrating this
> > issue pretty vividly, along with this week's tcm_vhost IBLOCK raw block
> > flash benchmarks that is demonstrate some other yet-to-be determined
> > limitations for virtio-scsi-raw vs. tcm_vhost for this particular fio
> > randrw workload.
> > 
> > > What people might pay attention to is evidence that there's a problem in
> > > 3.5-rc6 (without any OFED crap).  If you're not going to bother
> > > investigating, it has to be in an environment they can reproduce (so
> > > ordinary hardware, not infiniband) otherwise it gets ignored as an
> > > esoteric hardware issue.
> > > 
> > 
> > It's really quite simple for anyone to demonstrate the bottleneck
> > locally on any machine using tcm_loop with raw block flash.  Take a
> > struct block_device backend (like a Fusion IO /dev/fio*) and using
> > IBLOCK and export locally accessible SCSI LUNs via tcm_loop..
> > 
> > Using FIO there is a significant drop for randrw 4k performance between
> > tcm_loop <-> IBLOCK vs. raw struct block device backends.  And no, it's
> > not some type of target IBLOCK or tcm_loop bottleneck, it's a per SCSI
> > LUN limitation for small block random I/Os on the order of ~75K for each
> > SCSI LUN.
> 
> Here, you're saying here that the end to end SCSI stack tops out at
> around 75k iops, which is reasonably respectable if you don't employ any
> mitigation like queue steering and interrupt polling ... what were the
> mitigation techniques in the test you employed by the way?
> 

~75K per SCSI LUN in a multi-lun per host setup is being optimistic btw.
On the other side of the coin, the same pure block device can easily go
~200K per backend.

For the simplest case with tcm_loop, a struct scsi_cmnd is queued via
cmwq to execute in process context and submit the backend I/O.  Once
completed from IBLOCK, the I/O is run through a target completion wq and
completed back to SCSI.

There is no fancy queue steering or interrupt polling going on (at least
not in tcm_loop) because it's a simple virtual SCSI LLD similar to
scsi_debug.
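
Roughly, the cmwq pattern described above looks like this (not the
actual tcm_loop code, and demo_cmd/se_cmd are just stand-ins for the
real per-command state):

#include <linux/workqueue.h>
#include <linux/module.h>

/* Illustrative stand-in for the real per-command state. */
struct demo_cmd {
	struct work_struct work;
	void *se_cmd;		/* would point at the real target cmd */
};

static struct workqueue_struct *demo_wq;

static void demo_submit_work(struct work_struct *work)
{
	struct demo_cmd *cmd = container_of(work, struct demo_cmd, work);

	/* Process context here, so the backend submission (e.g. IBLOCK
	 * bio dispatch) is free to sleep. */
	(void)cmd;
}

/* Called from ->queuecommand(): defer the actual submission to cmwq. */
static void demo_queue_cmd(struct demo_cmd *cmd)
{
	INIT_WORK(&cmd->work, demo_submit_work);
	queue_work(demo_wq, &cmd->work);
}

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo_cmwq", WQ_MEM_RECLAIM, 0);
	return demo_wq ? 0 : -ENOMEM;
}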

> But previously, you ascribed a performance drop of around 75% on
> virtio-scsi (topping out around 15-20k iops) to this same problem ...
> that doesn't really seem likely.
> 

No.  I ascribed the performance difference between virtio-scsi+tcm_vhost
and bare-metal raw block flash to this bottleneck in Linux/SCSI.

It's obvious that virtio-scsi-raw going through QEMU SCSI / block has
some other shortcomings.

> Here's the rough ranges of concern:
> 
> 10K iops: standard arrays
> 100K iops: modern expensive fast flash drives on 6Gb links
> 1M iops: PCIe NVMexpress like devices
> 
> SCSI should do arrays with no problem at all, so I'd be really concerned
> that it can't make 0-20k iops.  If you push the system and fine tune it,
> SCSI can just about get to 100k iops.  1M iops is still a stretch goal
> for pure block drivers.
> 

1M IOPs is not a stretch for pure block drivers anymore on commodity
hardware.  5 Fusion-IO HBAs + Romley HW can easily go 1M random 4k IOPs
using a pure block driver.

The point is that it would currently take at least 2x the number of SCSI
LUNs in order to even get close to 1M IOPs with a single LLD driver.
And from the feedback from everyone I've talked to, no one has been able
to make Linux/SCSI go 1M IOPs with any kernel.

--nab


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Ksummit-2012-discuss] SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
  2012-07-06  5:43                   ` SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6] James Bottomley
  2012-07-06  9:13                     ` Nicholas A. Bellinger
  2012-07-06 20:30                     ` [Ksummit-2012-discuss] " Christoph Lameter
@ 2012-07-06 20:30                     ` Christoph Lameter
  2012-07-06 22:06                       ` Nicholas A. Bellinger
  2012-07-06 22:06                       ` Nicholas A. Bellinger
  2 siblings, 2 replies; 57+ messages in thread
From: Christoph Lameter @ 2012-07-06 20:30 UTC (permalink / raw)
  To: James Bottomley
  Cc: Nicholas A. Bellinger, Jens Axboe, Anthony Liguori, kvm-devel,
	linux-scsi, lf-virt, Anthony Liguori, target-devel,
	ksummit-2012-discuss, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig, Stefan Hajnoczi

On Fri, 6 Jul 2012, James Bottomley wrote:

> What people might pay attention to is evidence that there's a problem in
> 3.5-rc6 (without any OFED crap).  If you're not going to bother
> investigating, it has to be in an environment they can reproduce (so
> ordinary hardware, not infiniband) otherwise it gets ignored as an
> esoteric hardware issue.

The OFED stuff in the meantime is part of 3.5-rc6.  Infiniband has been
supported for a long time, and it's a very important technology given the
problematic nature of ethernet at high network speeds.

OFED crap exists for those running RHEL5/6. The new enterprise distros are
based on the 3.2 kernel which has pretty good Infiniband support
out of the box.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Ksummit-2012-discuss] SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
  2012-07-06 20:30                     ` Christoph Lameter
  2012-07-06 22:06                       ` Nicholas A. Bellinger
@ 2012-07-06 22:06                       ` Nicholas A. Bellinger
  1 sibling, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-06 22:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, Jens Axboe, Anthony Liguori, kvm-devel,
	linux-scsi, lf-virt, Anthony Liguori, target-devel,
	ksummit-2012-discuss, Paolo Bonzini, Zhi Yong Wu,
	Christoph Hellwig, Stefan Hajnoczi

On Fri, 2012-07-06 at 15:30 -0500, Christoph Lameter wrote:
> On Fri, 6 Jul 2012, James Bottomley wrote:
> 
> > What people might pay attention to is evidence that there's a problem in
> > 3.5-rc6 (without any OFED crap).  If you're not going to bother
> > investigating, it has to be in an environment they can reproduce (so
> > ordinary hardware, not infiniband) otherwise it gets ignored as an
> > esoteric hardware issue.
> 
> The OFED stuff in the meantime is part of 3.5-rc6. Infiniband has been
> supported for a long time and its a very important technology given the
> problematic nature of ethernet at high network speeds.
> 
> OFED crap exists for those running RHEL5/6. The new enterprise distros are
> based on the 3.2 kernel which has pretty good Infiniband support
> out of the box.
> 

So I don't think the HCAs or Infiniband fabric was the limiting factor
for small block random I/O in the RHEL 6.2 w/ OFED vs. Windows Server
2008 R2 w/ OFED setup mentioned earlier.

I've seen both FC and iSCSI fabrics demonstrate the same type of random
small block I/O performance anomalies with Linux/SCSI clients too.  The
v3.x Linux/SCSI clients are certainly better in the multi-lun per host
small block random I/O case, but single LUN performance is (still)
lacking compared to everything else.

Also RHEL 6.2 does have the scsi-host-lock less bits in place now, but
it's been more a matter of converting OFED ib_srp code to run in
host-lock less mode to realize extra gains for multi-lun per host.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
  2012-07-05  1:52         ` Nicholas A. Bellinger
                             ` (2 preceding siblings ...)
  2012-07-05 17:53           ` Bart Van Assche
@ 2012-07-10  0:29           ` Nicholas A. Bellinger
  2012-07-10  0:29           ` Nicholas A. Bellinger
  4 siblings, 0 replies; 57+ messages in thread
From: Nicholas A. Bellinger @ 2012-07-10  0:29 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Michael S. Tsirkin, Paolo Bonzini, target-devel, linux-scsi,
	lf-virt, kvm-devel, Stefan Hajnoczi, Zhi Yong Wu,
	Anthony Liguori, Christoph Hellwig, Jens Axboe, Hannes Reinecke

Hi folks,

On Wed, 2012-07-04 at 18:52 -0700, Nicholas A. Bellinger wrote:
> 
> To give an idea of how things are looking on the performance side, here
> some initial numbers for small block (4k) mixed random IOPs using the
> following fio test setup:

<SNIP>

> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> ------------------------------------------------------------------------------------
> 25 Write / 75 Read  |      ~15K       |         ~45K          |         ~70K
> 75 Write / 25 Read  |      ~20K       |         ~55K          |         ~60K
> 
> 

After checking the original benchmarks here again, I realized that for
virtio-scsi+tcm_vhost the results were actually switched..

So this should have been: heavier READ case (25 / 75) == 55K, and
heavier WRITE case (75 / 25) == 45K.

> In the first case, virtio-scsi+tcm_vhost is out performing by 3x
> compared to virtio-scsi-raw using QEMU SCSI emulation with the same raw
> flash backend device.  For the second case heavier WRITE case, tcm_vhost
> is nearing full bare-metal utilization (~55K vs. ~60K).
> 
> Also converting tcm_vhost to use proper cmwq process context I/O
> submission will help to get even closer to bare metal speeds for both
> work-loads.
> 

Here are initial follow-up virtio-scsi randrw 4k benchmarks with
tcm_vhost recently converted to run backend I/O dispatch via modern cmwq
primitives (kworkerd).

fio randrw 4k workload | virtio-scsi+tcm_vhost+cmwq
---------------------------------------------------
  25 Write / 75 Read   |          ~60K
  75 Write / 25 Read   |          ~45K

So aside from the minor performance improvement for the 25 / 75
workload, the other main improvement is lower CPU usage with the
iomemory_vsl backends.  This is attributed to cmwq providing process
context on the same core as the vhost thread pulling items off the vq,
which ends up being on the order of 1/3 less host CPU usage (for both
workloads), primarily from positive cache effects.

This patch is now available in target-pending/tcm_vhost, and I'll be
respinning the initial merge series into for-next-merge over the next
few days, along with another round of list review.

Please let us know if you have any concerns.

Thanks!

--nab


^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2012-07-10  0:29 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-04  4:24 [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
2012-07-04  4:24 ` [PATCH 1/6] vhost: Separate vhost-net features from vhost features Nicholas A. Bellinger
2012-07-04  4:41   ` Asias He
2012-07-04  4:24 ` [PATCH 2/6] vhost: make vhost work queue visible Nicholas A. Bellinger
2012-07-04  4:24 ` Nicholas A. Bellinger
2012-07-04  4:24 ` [PATCH 3/6] vhost: Add vhost_scsi specific defines Nicholas A. Bellinger
2012-07-04  4:24 ` Nicholas A. Bellinger
2012-07-04  4:24 ` [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver Nicholas A. Bellinger
2012-07-05 17:47   ` Bart Van Assche
2012-07-05 17:47   ` Bart Van Assche
2012-07-05 17:59     ` Bart Van Assche
2012-07-05 17:59     ` Bart Van Assche
2012-07-04  4:24 ` Nicholas A. Bellinger
2012-07-04  4:24 ` [PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning Nicholas A. Bellinger
2012-07-04  4:24 ` Nicholas A. Bellinger
2012-07-04 14:50   ` Paolo Bonzini
2012-07-04  4:24 ` [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs Nicholas A. Bellinger
2012-07-04 14:50   ` Paolo Bonzini
2012-07-05  2:05     ` Nicholas A. Bellinger
2012-07-05  6:42       ` Paolo Bonzini
2012-07-04  4:24 ` Nicholas A. Bellinger
2012-07-04 14:02 ` [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Michael S. Tsirkin
2012-07-04 14:52   ` Paolo Bonzini
2012-07-04 15:05     ` Michael S. Tsirkin
2012-07-04 22:12       ` Anthony Liguori
2012-07-05  1:52         ` Nicholas A. Bellinger
2012-07-05 10:22           ` Paolo Bonzini
2012-07-05 13:53             ` Michael S. Tsirkin
2012-07-05 14:06               ` Anthony Liguori
2012-07-05 14:40                 ` Michael S. Tsirkin
2012-07-05 14:47                   ` Paolo Bonzini
2012-07-05 17:26                     ` Michael S. Tsirkin
2012-07-06  3:01                 ` Nicholas A. Bellinger
2012-07-06  5:43                   ` SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6] James Bottomley
2012-07-06  9:13                     ` Nicholas A. Bellinger
2012-07-06 13:49                       ` James Bottomley
2012-07-06 18:21                         ` Nicholas A. Bellinger
2012-07-06 18:21                         ` Nicholas A. Bellinger
2012-07-06 20:30                     ` [Ksummit-2012-discuss] " Christoph Lameter
2012-07-06 20:30                     ` Christoph Lameter
2012-07-06 22:06                       ` Nicholas A. Bellinger
2012-07-06 22:06                       ` Nicholas A. Bellinger
2012-07-06  3:01                 ` [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6 Nicholas A. Bellinger
2012-07-05 14:06               ` Anthony Liguori
2012-07-05 14:32               ` Paolo Bonzini
2012-07-05 21:00                 ` Michael S. Tsirkin
2012-07-06  3:38               ` Nicholas A. Bellinger
2012-07-06  5:39                 ` Paolo Bonzini
2012-07-05 17:53           ` Bart Van Assche
2012-07-05 17:53           ` Bart Van Assche
2012-07-05 19:57             ` Bart Van Assche
2012-07-10  0:29           ` Nicholas A. Bellinger
2012-07-10  0:29           ` Nicholas A. Bellinger
2012-07-05  2:01       ` Nicholas A. Bellinger
2012-07-05  2:01       ` Nicholas A. Bellinger
2012-07-05  9:31         ` Michael S. Tsirkin
2012-07-06  3:13           ` Nicholas A. Bellinger
