* [PATCH 00/22] New virtio PCI layout
@ 2013-03-21  8:29 Rusty Russell
  2013-03-21  8:29 ` [PATCH 01/22] virtio_config: introduce size-based accessors Rusty Russell
                   ` (21 more replies)
  0 siblings, 22 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

I've reworked this series again, incorporating some comments from HPA.
I've tried to keep the new patches separate, so you can see the changes
since we last discussed this (and so it's easy to back them out if we
decide the approach is insane).

I haven't even looked at the QEMU side, so this is completely untested.

Comments gratefully received!
Rusty.

Michael S. Tsirkin (1):
  pci: add pci_iomap_range

Rusty Russell (21):
  virtio_config: introduce size-based accessors.
  virtio_config: use size-based accessors.
  virtio_config: make transports implement accessors.
  virtio: use u32, not bitmap for struct virtio_device's features
  virtio: add support for 64 bit features.
  virtio: move vring structure into struct virtqueue.
  virtio-pci: define layout for virtio vendor-specific capabilities.
  virtio_pci: move old defines to legacy, introduce new structure.
  virtio_pci: use _LEGACY_ defines in virtio_pci_legacy.c
  virtio_pci: don't use the legacy driver if we find the new PCI
    capabilities.
  virtio_pci: allow duplicate capabilities.
  virtio_pci: new, capability-aware driver.
  virtio_pci: layout changes as per hpa's suggestions.
  virtio_pci: use little endian for config space.
  virtio_pci: use separate notification offsets for each vq.
  virtio_pci_legacy: cleanup struct virtio_pci_vq_info
  virtio_pci: share structure between legacy and modern.
  virtio_pci: share interrupt/notify handlers between legacy and
    modern.
  virtio_pci: share virtqueue setup/teardown between modern and legacy
    driver.
  virtio_pci: simplify common helpers.
  virtio_pci: fix finalize_features in modern driver.

 drivers/block/virtio_blk.c             |   77 ++--
 drivers/char/virtio_console.c          |   17 +-
 drivers/lguest/lguest_device.c         |   89 +++-
 drivers/net/caif/caif_virtio.c         |   25 +-
 drivers/net/virtio_net.c               |   28 +-
 drivers/remoteproc/remoteproc_virtio.c |    8 +-
 drivers/s390/kvm/kvm_virtio.c          |   88 +++-
 drivers/s390/kvm/virtio_ccw.c          |   39 +-
 drivers/scsi/virtio_scsi.c             |   12 +-
 drivers/virtio/Kconfig                 |   12 +
 drivers/virtio/Makefile                |    3 +-
 drivers/virtio/virtio.c                |   18 +-
 drivers/virtio/virtio_balloon.c        |   10 +-
 drivers/virtio/virtio_mmio.c           |   55 ++-
 drivers/virtio/virtio_pci-common.c     |  395 ++++++++++++++++
 drivers/virtio/virtio_pci-common.h     |  121 +++++
 drivers/virtio/virtio_pci.c            |  777 ++++++++++++--------------------
 drivers/virtio/virtio_pci_legacy.c     |  481 ++++++++++++++++++++
 drivers/virtio/virtio_ring.c           |  116 ++---
 include/asm-generic/pci_iomap.h        |    5 +
 include/linux/virtio.h                 |   11 +-
 include/linux/virtio_config.h          |  205 +++++++--
 include/linux/virtio_pci.h             |   35 ++
 include/uapi/linux/virtio_config.h     |    2 +
 include/uapi/linux/virtio_pci.h        |  111 ++++-
 lib/pci_iomap.c                        |   46 +-
 net/9p/trans_virtio.c                  |    9 +-
 tools/virtio/linux/virtio.h            |   22 +-
 tools/virtio/linux/virtio_config.h     |    2 +-
 tools/virtio/virtio_test.c             |    5 +-
 tools/virtio/vringh_test.c             |   16 +-
 31 files changed, 2018 insertions(+), 822 deletions(-)
 create mode 100644 drivers/virtio/virtio_pci-common.c
 create mode 100644 drivers/virtio/virtio_pci-common.h
 create mode 100644 drivers/virtio/virtio_pci_legacy.c
 create mode 100644 include/linux/virtio_pci.h

-- 
1.7.10.4


* [PATCH 01/22] virtio_config: introduce size-based accessors.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 02/22] virtio_config: use " Rusty Russell
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

This lets the transport do endian conversion if necessary, and insulates
the drivers from that change.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 include/linux/virtio_config.h |  161 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 134 insertions(+), 27 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 29b9104..e8f8f71 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -96,33 +96,6 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 	return test_bit(fbit, vdev->features);
 }
 
-/**
- * virtio_config_val - look for a feature and get a virtio config entry.
- * @vdev: the virtio device
- * @fbit: the feature bit
- * @offset: the type to search for.
- * @v: a pointer to the value to fill in.
- *
- * The return value is -ENOENT if the feature doesn't exist.  Otherwise
- * the config value is copied into whatever is pointed to by v. */
-#define virtio_config_val(vdev, fbit, offset, v) \
-	virtio_config_buf((vdev), (fbit), (offset), (v), sizeof(*v))
-
-#define virtio_config_val_len(vdev, fbit, offset, v, len) \
-	virtio_config_buf((vdev), (fbit), (offset), (v), (len))
-
-static inline int virtio_config_buf(struct virtio_device *vdev,
-				    unsigned int fbit,
-				    unsigned int offset,
-				    void *buf, unsigned len)
-{
-	if (!virtio_has_feature(vdev, fbit))
-		return -ENOENT;
-
-	vdev->config->get(vdev, offset, buf, len);
-	return 0;
-}
-
 static inline
 struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev,
 					vq_callback_t *c, const char *n)
@@ -162,5 +135,139 @@ int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
 	return 0;
 }
 
+/* Config space accessors. */
+#define virtio_cread(vdev, structname, member, ptr)			\
+	do {								\
+		/* Must match the member's type, and be integer */	\
+		if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+			(*ptr) = 1;					\
+									\
+		switch (sizeof(*ptr)) {					\
+		case 1:							\
+			*(ptr) = virtio_cread8(vdev,			\
+					       offsetof(structname, member)); \
+			break;						\
+		case 2:							\
+			*(ptr) = virtio_cread16(vdev,			\
+						offsetof(structname, member)); \
+			break;						\
+		case 4:							\
+			*(ptr) = virtio_cread32(vdev,			\
+						offsetof(structname, member)); \
+			break;						\
+		case 8:							\
+			*(ptr) = virtio_cread64(vdev,			\
+						offsetof(structname, member)); \
+			break;						\
+		default:						\
+			BUG();						\
+		}							\
+	} while (0)
+
+/* Config space accessors. */
+#define virtio_cwrite(vdev, structname, member, ptr)			\
+	do {								\
+		/* Must match the member's type, and be integer */	\
+		if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+			BUG_ON((*ptr) == 1);				\
+									\
+		switch (sizeof(*ptr)) {					\
+		case 1:							\
+			virtio_cwrite8(vdev,				\
+				       offsetof(structname, member),	\
+				       *(ptr));				\
+			break;						\
+		case 2:							\
+			virtio_cwrite16(vdev,				\
+					offsetof(structname, member),	\
+					*(ptr));			\
+			break;						\
+		case 4:							\
+			virtio_cwrite32(vdev,				\
+					offsetof(structname, member),	\
+					*(ptr));			\
+			break;						\
+		case 8:							\
+			virtio_cwrite64(vdev,				\
+					offsetof(structname, member),	\
+					*(ptr));			\
+			break;						\
+		default:						\
+			BUG();						\
+		}							\
+	} while (0)
+
+static inline u8 virtio_cread8(struct virtio_device *vdev, unsigned int offset)
+{
+	u8 ret;
+	vdev->config->get(vdev, offset, &ret, sizeof(ret));
+	return ret;
+}
+
+static inline void virtio_cread_bytes(struct virtio_device *vdev,
+				      unsigned int offset,
+				      void *buf, size_t len)
+{
+	vdev->config->get(vdev, offset, buf, len);
+}
+
+static inline void virtio_cwrite8(struct virtio_device *vdev,
+				  unsigned int offset, u8 val)
+{
+	vdev->config->set(vdev, offset, &val, sizeof(val));
+}
+
+static inline u16 virtio_cread16(struct virtio_device *vdev,
+				 unsigned int offset)
+{
+	u16 ret;
+	vdev->config->get(vdev, offset, &ret, sizeof(ret));
+	return ret;
+}
+
+static inline void virtio_cwrite16(struct virtio_device *vdev,
+				   unsigned int offset, u16 val)
+{
+	vdev->config->set(vdev, offset, &val, sizeof(val));
+}
+
+static inline u32 virtio_cread32(struct virtio_device *vdev,
+				 unsigned int offset)
+{
+	u32 ret;
+	vdev->config->get(vdev, offset, &ret, sizeof(ret));
+	return ret;
+}
+
+static inline void virtio_cwrite32(struct virtio_device *vdev,
+				   unsigned int offset, u32 val)
+{
+	vdev->config->set(vdev, offset, &val, sizeof(val));
+}
+
+static inline u64 virtio_cread64(struct virtio_device *vdev,
+				 unsigned int offset)
+{
+	u64 ret;
+	vdev->config->get(vdev, offset, &ret, sizeof(ret));
+	return ret;
+}
+
+static inline void virtio_cwrite64(struct virtio_device *vdev,
+				   unsigned int offset, u64 val)
+{
+	vdev->config->set(vdev, offset, &val, sizeof(val));
+}
+
+/* Conditional config space accessors. */
+#define virtio_cread_feature(vdev, fbit, structname, member, ptr)	\
+	({								\
+		int _r = 0;						\
+		if (!virtio_has_feature(vdev, fbit))			\
+			_r = -ENOENT;					\
+		else							\
+			virtio_cread((vdev), structname, member, ptr);	\
+		_r;							\
+	})
 
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.7.10.4


* [PATCH 02/22] virtio_config: use size-based accessors.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
  2013-03-21  8:29 ` [PATCH 01/22] virtio_config: introduce size-based accessors Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 03/22] virtio_config: make transports implement accessors Rusty Russell
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

This lets the transport do endian conversion if necessary, and insulates
the drivers from that change.

Most drivers can use the simple helpers virtio_cread() and virtio_cwrite().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/block/virtio_blk.c      |   77 +++++++++++++++++----------------------
 drivers/char/virtio_console.c   |   15 +++-----
 drivers/net/caif/caif_virtio.c  |   23 ++++++------
 drivers/net/virtio_net.c        |   28 ++++++++------
 drivers/scsi/virtio_scsi.c      |   12 ++----
 drivers/virtio/virtio_balloon.c |   10 ++---
 net/9p/trans_virtio.c           |    9 ++---
 7 files changed, 79 insertions(+), 95 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6472395..1b1bbd3 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -456,18 +456,15 @@ static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
 {
 	struct virtio_blk *vblk = bd->bd_disk->private_data;
-	struct virtio_blk_geometry vgeo;
-	int err;
 
 	/* see if the host passed in geometry config */
-	err = virtio_config_val(vblk->vdev, VIRTIO_BLK_F_GEOMETRY,
-				offsetof(struct virtio_blk_config, geometry),
-				&vgeo);
-
-	if (!err) {
-		geo->heads = vgeo.heads;
-		geo->sectors = vgeo.sectors;
-		geo->cylinders = vgeo.cylinders;
+	if (virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_GEOMETRY)) {
+		virtio_cread(vblk->vdev, struct virtio_blk_config,
+			     geometry.cylinders, &geo->cylinders);
+		virtio_cread(vblk->vdev, struct virtio_blk_config,
+			     geometry.heads, &geo->heads);
+		virtio_cread(vblk->vdev, struct virtio_blk_config,
+			     geometry.sectors, &geo->sectors);
 	} else {
 		/* some standard values, similar to sd */
 		geo->heads = 1 << 6;
@@ -529,8 +526,7 @@ static void virtblk_config_changed_work(struct work_struct *work)
 		goto done;
 
 	/* Host must always specify the capacity. */
-	vdev->config->get(vdev, offsetof(struct virtio_blk_config, capacity),
-			  &capacity, sizeof(capacity));
+	virtio_cread(vdev, struct virtio_blk_config, capacity, &capacity);
 
 	/* If capacity is too big, truncate with warning. */
 	if ((sector_t)capacity != capacity) {
@@ -608,9 +604,9 @@ static int virtblk_get_cache_mode(struct virtio_device *vdev)
 	u8 writeback;
 	int err;
 
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_CONFIG_WCE,
-				offsetof(struct virtio_blk_config, wce),
-				&writeback);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE,
+				   struct virtio_blk_config, wce,
+				   &writeback);
 	if (err)
 		writeback = virtio_has_feature(vdev, VIRTIO_BLK_F_WCE);
 
@@ -642,7 +638,6 @@ virtblk_cache_type_store(struct device *dev, struct device_attribute *attr,
 	struct virtio_blk *vblk = disk->private_data;
 	struct virtio_device *vdev = vblk->vdev;
 	int i;
-	u8 writeback;
 
 	BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
 	for (i = ARRAY_SIZE(virtblk_cache_types); --i >= 0; )
@@ -652,11 +647,7 @@ virtblk_cache_type_store(struct device *dev, struct device_attribute *attr,
 	if (i < 0)
 		return -EINVAL;
 
-	writeback = i;
-	vdev->config->set(vdev,
-			  offsetof(struct virtio_blk_config, wce),
-			  &writeback, sizeof(writeback));
-
+	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
 	virtblk_update_cache_mode(vdev);
 	return count;
 }
@@ -699,9 +690,9 @@ static int virtblk_probe(struct virtio_device *vdev)
 	index = err;
 
 	/* We need to know how many segments before we allocate. */
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
-				offsetof(struct virtio_blk_config, seg_max),
-				&sg_elems);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SEG_MAX,
+				   struct virtio_blk_config, seg_max,
+				   &sg_elems);
 
 	/* We need at least one SG element, whatever they say. */
 	if (err || !sg_elems)
@@ -772,8 +763,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 		set_disk_ro(vblk->disk, 1);
 
 	/* Host must always specify the capacity. */
-	vdev->config->get(vdev, offsetof(struct virtio_blk_config, capacity),
-			  &cap, sizeof(cap));
+	virtio_cread(vdev, struct virtio_blk_config, capacity, &cap);
 
 	/* If capacity is too big, truncate with warning. */
 	if ((sector_t)cap != cap) {
@@ -794,46 +784,45 @@ static int virtblk_probe(struct virtio_device *vdev)
 
 	/* Host can optionally specify maximum segment size and number of
 	 * segments. */
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_SIZE_MAX,
-				offsetof(struct virtio_blk_config, size_max),
-				&v);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
+				   struct virtio_blk_config, size_max, &v);
 	if (!err)
 		blk_queue_max_segment_size(q, v);
 	else
 		blk_queue_max_segment_size(q, -1U);
 
 	/* Host can optionally specify the block size of the device */
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_BLK_SIZE,
-				offsetof(struct virtio_blk_config, blk_size),
-				&blk_size);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
+				   struct virtio_blk_config, blk_size,
+				   &blk_size);
 	if (!err)
 		blk_queue_logical_block_size(q, blk_size);
 	else
 		blk_size = queue_logical_block_size(q);
 
 	/* Use topology information if available */
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_TOPOLOGY,
-			offsetof(struct virtio_blk_config, physical_block_exp),
-			&physical_block_exp);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
+				   struct virtio_blk_config, physical_block_exp,
+				   &physical_block_exp);
 	if (!err && physical_block_exp)
 		blk_queue_physical_block_size(q,
 				blk_size * (1 << physical_block_exp));
 
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_TOPOLOGY,
-			offsetof(struct virtio_blk_config, alignment_offset),
-			&alignment_offset);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
+				   struct virtio_blk_config, alignment_offset,
+				   &alignment_offset);
 	if (!err && alignment_offset)
 		blk_queue_alignment_offset(q, blk_size * alignment_offset);
 
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_TOPOLOGY,
-			offsetof(struct virtio_blk_config, min_io_size),
-			&min_io_size);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
+				   struct virtio_blk_config, min_io_size,
+				   &min_io_size);
 	if (!err && min_io_size)
 		blk_queue_io_min(q, blk_size * min_io_size);
 
-	err = virtio_config_val(vdev, VIRTIO_BLK_F_TOPOLOGY,
-			offsetof(struct virtio_blk_config, opt_io_size),
-			&opt_io_size);
+	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
+				   struct virtio_blk_config, opt_io_size,
+				   &opt_io_size);
 	if (!err && opt_io_size)
 		blk_queue_io_opt(q, blk_size * opt_io_size);
 
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e6ba6b7..1735c38 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1800,12 +1800,8 @@ static void config_intr(struct virtio_device *vdev)
 		struct port *port;
 		u16 rows, cols;
 
-		vdev->config->get(vdev,
-				  offsetof(struct virtio_console_config, cols),
-				  &cols, sizeof(u16));
-		vdev->config->get(vdev,
-				  offsetof(struct virtio_console_config, rows),
-				  &rows, sizeof(u16));
+		virtio_cread(vdev, struct virtio_console_config, cols, &cols);
+		virtio_cread(vdev, struct virtio_console_config, rows, &rows);
 
 		port = find_port_by_id(portdev, 0);
 		set_console_size(port, rows, cols);
@@ -1977,10 +1973,9 @@ static int virtcons_probe(struct virtio_device *vdev)
 
 	/* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
 	if (!is_rproc_serial(vdev) &&
-	    virtio_config_val(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
-				  offsetof(struct virtio_console_config,
-					   max_nr_ports),
-				  &portdev->config.max_nr_ports) == 0) {
+	    virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
+				 struct virtio_console_config, max_nr_ports,
+				 &portdev->config.max_nr_ports) == 0) {
 		multiport = true;
 	}
 
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index 8308dee..ef602e3 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -682,18 +682,19 @@ static int cfv_probe(struct virtio_device *vdev)
 		goto err;
 
 	/* Get the CAIF configuration from virtio config space, if available */
-#define GET_VIRTIO_CONFIG_OPS(_v, _var, _f) \
-	((_v)->config->get(_v, offsetof(struct virtio_caif_transf_config, _f), \
-			   &_var, \
-			   FIELD_SIZEOF(struct virtio_caif_transf_config, _f)))
-
 	if (vdev->config->get) {
-		GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_hr, headroom);
-		GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_hr, headroom);
-		GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_tr, tailroom);
-		GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_tr, tailroom);
-		GET_VIRTIO_CONFIG_OPS(vdev, cfv->mtu, mtu);
-		GET_VIRTIO_CONFIG_OPS(vdev, cfv->mru, mtu);
+		virtio_cread(vdev, struct virtio_caif_transf_config, headroom,
+			     &cfv->tx_hr);
+		virtio_cread(vdev, struct virtio_caif_transf_config, headroom,
+			     &cfv->rx_hr);
+		virtio_cread(vdev, struct virtio_caif_transf_config, tailroom,
+			     &cfv->tx_tr);
+		virtio_cread(vdev, struct virtio_caif_transf_config, tailroom,
+			     &cfv->rx_tr);
+		virtio_cread(vdev, struct virtio_caif_transf_config, mtu,
+			     &cfv->mtu);
+		virtio_cread(vdev, struct virtio_caif_transf_config, mtu,
+			     &cfv->mru);
 	} else {
 		cfv->tx_hr = CFV_DEF_HEADROOM;
 		cfv->rx_hr = CFV_DEF_HEADROOM;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index be70487..61c1592 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -829,8 +829,13 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
 			return -EINVAL;
 		}
 	} else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
-		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
-				  addr->sa_data, dev->addr_len);
+		unsigned int i;
+
+		/* Naturally, this has an atomicity problem. */
+		for (i = 0; i < dev->addr_len; i++)
+			virtio_cwrite8(vdev,
+				       offsetof(struct virtio_net_config, mac) +
+				       i, addr->sa_data[i]);
 	}
 
 	eth_commit_mac_addr_change(dev, p);
@@ -1239,9 +1244,8 @@ static void virtnet_config_changed_work(struct work_struct *work)
 	if (!vi->config_enable)
 		goto done;
 
-	if (virtio_config_val(vi->vdev, VIRTIO_NET_F_STATUS,
-			      offsetof(struct virtio_net_config, status),
-			      &v) < 0)
+	if (virtio_cread_feature(vi->vdev, VIRTIO_NET_F_STATUS,
+				 struct virtio_net_config, status, &v) < 0)
 		goto done;
 
 	if (v & VIRTIO_NET_S_ANNOUNCE) {
@@ -1463,9 +1467,9 @@ static int virtnet_probe(struct virtio_device *vdev)
 	u16 max_queue_pairs;
 
 	/* Find if host supports multiqueue virtio_net device */
-	err = virtio_config_val(vdev, VIRTIO_NET_F_MQ,
-				offsetof(struct virtio_net_config,
-				max_virtqueue_pairs), &max_queue_pairs);
+	err = virtio_cread_feature(vdev, VIRTIO_NET_F_MQ,
+				   struct virtio_net_config,
+				   max_virtqueue_pairs, &max_queue_pairs);
 
 	/* We need at least 2 queues */
 	if (err || max_queue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
@@ -1513,9 +1517,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 	}
 
 	/* Configuration may specify what MAC to use.  Otherwise random. */
-	if (virtio_config_val_len(vdev, VIRTIO_NET_F_MAC,
-				  offsetof(struct virtio_net_config, mac),
-				  dev->dev_addr, dev->addr_len) < 0)
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
+		virtio_cread_bytes(vdev,
+				   offsetof(struct virtio_net_config, mac),
+				   dev->dev_addr, dev->addr_len);
+	else
 		eth_hw_addr_random(dev);
 
 	/* Set up our device-specific information */
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index f679b8c..214d397 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -547,19 +547,15 @@ static struct scsi_host_template virtscsi_host_template = {
 #define virtscsi_config_get(vdev, fld) \
 	({ \
 		typeof(((struct virtio_scsi_config *)0)->fld) __val; \
-		vdev->config->get(vdev, \
-				  offsetof(struct virtio_scsi_config, fld), \
-				  &__val, sizeof(__val)); \
+		virtio_cread(vdev, struct virtio_scsi_config, fld, &__val); \
 		__val; \
 	})
 
 #define virtscsi_config_set(vdev, fld, val) \
-	(void)({ \
+	do { \
 		typeof(((struct virtio_scsi_config *)0)->fld) __val = (val); \
-		vdev->config->set(vdev, \
-				  offsetof(struct virtio_scsi_config, fld), \
-				  &__val, sizeof(__val)); \
-	})
+		virtio_cwrite(vdev, struct virtio_scsi_config, fld, &__val); \
+	} while (0)
 
 static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
 			     struct virtqueue *vq)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8dab163..bbab952 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -273,9 +273,8 @@ static inline s64 towards_target(struct virtio_balloon *vb)
 	__le32 v;
 	s64 target;
 
-	vb->vdev->config->get(vb->vdev,
-			      offsetof(struct virtio_balloon_config, num_pages),
-			      &v, sizeof(v));
+	virtio_cread(vb->vdev, struct virtio_balloon_config, num_pages, &v);
+
 	target = le32_to_cpu(v);
 	return target - vb->num_pages;
 }
@@ -284,9 +283,8 @@ static void update_balloon_size(struct virtio_balloon *vb)
 {
 	__le32 actual = cpu_to_le32(vb->num_pages);
 
-	vb->vdev->config->set(vb->vdev,
-			      offsetof(struct virtio_balloon_config, actual),
-			      &actual, sizeof(actual));
+	virtio_cwrite(vb->vdev, struct virtio_balloon_config, actual,
+		      &actual);
 }
 
 static int balloon(void *_vballoon)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index de2e950..d843d5d 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -514,9 +514,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
 
 	chan->inuse = false;
 	if (virtio_has_feature(vdev, VIRTIO_9P_MOUNT_TAG)) {
-		vdev->config->get(vdev,
-				offsetof(struct virtio_9p_config, tag_len),
-				&tag_len, sizeof(tag_len));
+		virtio_cread(vdev, struct virtio_9p_config, tag_len, &tag_len);
 	} else {
 		err = -EINVAL;
 		goto out_free_vq;
@@ -526,8 +524,9 @@ static int p9_virtio_probe(struct virtio_device *vdev)
 		err = -ENOMEM;
 		goto out_free_vq;
 	}
-	vdev->config->get(vdev, offsetof(struct virtio_9p_config, tag),
-			tag, tag_len);
+
+	virtio_cread_bytes(vdev, offsetof(struct virtio_9p_config, tag),
+			   tag, tag_len);
 	chan->tag = tag;
 	chan->tag_len = tag_len;
 	err = sysfs_create_file(&(vdev->dev.kobj), &dev_attr_mount_tag.attr);
-- 
1.7.10.4


* [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
  2013-03-21  8:29 ` [PATCH 01/22] virtio_config: introduce size-based accessors Rusty Russell
  2013-03-21  8:29 ` [PATCH 02/22] virtio_config: use " Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  9:09   ` Cornelia Huck
                     ` (2 more replies)
  2013-03-21  8:29 ` [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features Rusty Russell
                   ` (18 subsequent siblings)
  21 siblings, 3 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: Pawel Moll, Brian Swetland, Christian Borntraeger

All transports just pass through at the moment.

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Brian Swetland <swetland@google.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/lguest/lguest_device.c |   79 ++++++++++++++++++++++++++++++++++------
 drivers/net/caif/caif_virtio.c |    2 +-
 drivers/s390/kvm/kvm_virtio.c  |   78 +++++++++++++++++++++++++++++++++------
 drivers/s390/kvm/virtio_ccw.c  |   39 +++++++++++++++++++-
 drivers/virtio/virtio_mmio.c   |   35 +++++++++++++++++-
 drivers/virtio/virtio_pci.c    |   39 +++++++++++++++++---
 include/linux/virtio_config.h  |   70 +++++++++++++++++++++--------------
 7 files changed, 283 insertions(+), 59 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index b3256ff..8554d41 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -153,25 +153,76 @@ static void lg_finalize_features(struct virtio_device *vdev)
 }
 
 /* Once they've found a field, getting a copy of it is easy. */
-static void lg_get(struct virtio_device *vdev, unsigned int offset,
-		   void *buf, unsigned len)
+static u8 lg_get8(struct virtio_device *vdev, unsigned int offset)
 {
 	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
 
 	/* Check they didn't ask for more than the length of the config! */
-	BUG_ON(offset + len > desc->config_len);
-	memcpy(buf, lg_config(desc) + offset, len);
+	BUG_ON(offset + sizeof(u8) > desc->config_len);
+	return *(u8 *)(lg_config(desc) + offset);
 }
 
-/* Setting the contents is also trivial. */
-static void lg_set(struct virtio_device *vdev, unsigned int offset,
-		   const void *buf, unsigned len)
+static void lg_set8(struct virtio_device *vdev, unsigned int offset, u8 val)
 {
 	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
 
 	/* Check they didn't ask for more than the length of the config! */
-	BUG_ON(offset + len > desc->config_len);
-	memcpy(lg_config(desc) + offset, buf, len);
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u8 *)(lg_config(desc) + offset) = val;
+}
+
+static u16 lg_get16(struct virtio_device *vdev, unsigned int offset)
+{
+	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u16) > desc->config_len);
+	return *(u16 *)(lg_config(desc) + offset);
+}
+
+static void lg_set16(struct virtio_device *vdev, unsigned int offset, u16 val)
+{
+	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u16 *)(lg_config(desc) + offset) = val;
+}
+
+static u32 lg_get32(struct virtio_device *vdev, unsigned int offset)
+{
+	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u32) > desc->config_len);
+	return *(u32 *)(lg_config(desc) + offset);
+}
+
+static void lg_set32(struct virtio_device *vdev, unsigned int offset, u32 val)
+{
+	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u32 *)(lg_config(desc) + offset) = val;
+}
+
+static u64 lg_get64(struct virtio_device *vdev, unsigned int offset)
+{
+	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u64) > desc->config_len);
+	return *(u64 *)(lg_config(desc) + offset);
+}
+
+static void lg_set64(struct virtio_device *vdev, unsigned int offset, u64 val)
+{
+	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u64 *)(lg_config(desc) + offset) = val;
 }
 
 /*
@@ -399,8 +450,14 @@ static const char *lg_bus_name(struct virtio_device *vdev)
 static const struct virtio_config_ops lguest_config_ops = {
 	.get_features = lg_get_features,
 	.finalize_features = lg_finalize_features,
-	.get = lg_get,
-	.set = lg_set,
+	.get8 = lg_get8,
+	.set8 = lg_set8,
+	.get16 = lg_get16,
+	.set16 = lg_set16,
+	.get32 = lg_get32,
+	.set32 = lg_set32,
+	.get64 = lg_get64,
+	.set64 = lg_set64,
 	.get_status = lg_get_status,
 	.set_status = lg_set_status,
 	.reset = lg_reset,
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index ef602e3..0f9bae0 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -682,7 +682,7 @@ static int cfv_probe(struct virtio_device *vdev)
 		goto err;
 
 	/* Get the CAIF configuration from virtio config space, if available */
-	if (vdev->config->get) {
+	if (vdev->config->get8) {
 		virtio_cread(vdev, struct virtio_caif_transf_config, headroom,
 			     &cfv->tx_hr);
 		virtio_cread(vdev, struct virtio_caif_transf_config, headroom,
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 6711e65..dcf35b1 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -112,26 +112,82 @@ static void kvm_finalize_features(struct virtio_device *vdev)
 }
 
 /*
- * Reading and writing elements in config space
+ * Reading and writing elements in config space.  Host and guest are always
+ * big-endian, so no conversion necessary.
  */
-static void kvm_get(struct virtio_device *vdev, unsigned int offset,
-		   void *buf, unsigned len)
+static u8 kvm_get8(struct virtio_device *vdev, unsigned int offset)
 {
-	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
 
-	BUG_ON(offset + len > desc->config_len);
-	memcpy(buf, kvm_vq_configspace(desc) + offset, len);
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u8) > desc->config_len);
+	return *(u8 *)(kvm_vq_configspace(desc) + offset);
 }
 
-static void kvm_set(struct virtio_device *vdev, unsigned int offset,
-		   const void *buf, unsigned len)
+static void kvm_set8(struct virtio_device *vdev, unsigned int offset, u8 val)
 {
-	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u8 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+static u16 kvm_get16(struct virtio_device *vdev, unsigned int offset)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u16) > desc->config_len);
+	return *(u16 *)(kvm_vq_configspace(desc) + offset);
+}
+
+static void kvm_set16(struct virtio_device *vdev, unsigned int offset, u16 val)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u16 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+static u32 kvm_get32(struct virtio_device *vdev, unsigned int offset)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
 
-	BUG_ON(offset + len > desc->config_len);
-	memcpy(kvm_vq_configspace(desc) + offset, buf, len);
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u32) > desc->config_len);
+	return *(u32 *)(kvm_vq_configspace(desc) + offset);
 }
 
+static void kvm_set32(struct virtio_device *vdev, unsigned int offset, u32 val)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u32 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+static u64 kvm_get64(struct virtio_device *vdev, unsigned int offset)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u64) > desc->config_len);
+	return *(u64 *)(kvm_vq_configspace(desc) + offset);
+}
+
+static void kvm_set64(struct virtio_device *vdev, unsigned int offset, u64 val)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u64 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+
 /*
  * The operations to get and set the status word just access
  * the status field of the device descriptor. set_status will also
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 2029b6c..3652473 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -472,6 +472,7 @@ out_free:
 	kfree(ccw);
 }
 
+/* We don't need to do endian conversion, as it's always big endian like us */
 static void virtio_ccw_get_config(struct virtio_device *vdev,
 				  unsigned int offset, void *buf, unsigned len)
 {
@@ -505,6 +506,21 @@ out_free:
 	kfree(ccw);
 }
 
+
+#define VIRTIO_CCW_GET_CONFIGx(bits)					\
+static u##bits virtio_ccw_get_config##bits(struct virtio_device *vdev,	\
+					   unsigned int offset)		\
+{									\
+	u##bits v;							\
+	virtio_ccw_get_config(vdev, offset, &v, sizeof(v));		\
+	return v;							\
+}
+
+VIRTIO_CCW_GET_CONFIGx(8)
+VIRTIO_CCW_GET_CONFIGx(16)
+VIRTIO_CCW_GET_CONFIGx(32)
+VIRTIO_CCW_GET_CONFIGx(64)
+
 static void virtio_ccw_set_config(struct virtio_device *vdev,
 				  unsigned int offset, const void *buf,
 				  unsigned len)
@@ -535,6 +551,19 @@ out_free:
 	kfree(ccw);
 }
 
+#define VIRTIO_CCW_SET_CONFIGx(bits)					\
+static void virtio_ccw_set_config##bits(struct virtio_device *vdev,	\
+					unsigned int offset,		\
+					u##bits v)			\
+{									\
+	virtio_ccw_set_config(vdev, offset, &v, sizeof(v));		\
+}
+
+VIRTIO_CCW_SET_CONFIGx(8)
+VIRTIO_CCW_SET_CONFIGx(16)
+VIRTIO_CCW_SET_CONFIGx(32)
+VIRTIO_CCW_SET_CONFIGx(64)
+
 static u8 virtio_ccw_get_status(struct virtio_device *vdev)
 {
 	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
@@ -564,8 +593,14 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
 static struct virtio_config_ops virtio_ccw_config_ops = {
 	.get_features = virtio_ccw_get_features,
 	.finalize_features = virtio_ccw_finalize_features,
-	.get = virtio_ccw_get_config,
-	.set = virtio_ccw_set_config,
+	.get8 = virtio_ccw_get_config8,
+	.set8 = virtio_ccw_set_config8,
+	.get16 = virtio_ccw_get_config16,
+	.set16 = virtio_ccw_set_config16,
+	.get32 = virtio_ccw_get_config32,
+	.set32 = virtio_ccw_set_config32,
+	.get64 = virtio_ccw_get_config64,
+	.set64 = virtio_ccw_set_config64,
 	.get_status = virtio_ccw_get_status,
 	.set_status = virtio_ccw_set_status,
 	.reset = virtio_ccw_reset,
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 1ba0d68..ad7f38f 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -178,6 +178,19 @@ static void vm_get(struct virtio_device *vdev, unsigned offset,
 		ptr[i] = readb(vm_dev->base + VIRTIO_MMIO_CONFIG + offset + i);
 }
 
+#define VM_GETx(bits)							\
+static u##bits vm_get##bits(struct virtio_device *vdev, unsigned int offset) \
+{									\
+	u##bits v;							\
+	vm_get(vdev, offset, &v, sizeof(v));				\
+	return v;							\
+}
+
+VM_GETx(8)
+VM_GETx(16)
+VM_GETx(32)
+VM_GETx(64)
+
 static void vm_set(struct virtio_device *vdev, unsigned offset,
 		   const void *buf, unsigned len)
 {
@@ -189,6 +202,18 @@ static void vm_set(struct virtio_device *vdev, unsigned offset,
 		writeb(ptr[i], vm_dev->base + VIRTIO_MMIO_CONFIG + offset + i);
 }
 
+#define VM_SETx(bits)							\
+static void vm_set##bits(struct virtio_device *vdev, unsigned int offset, \
+			 u##bits v)					\
+{									\
+	vm_set(vdev, offset, &v, sizeof(v));				\
+}
+
+VM_SETx(8)
+VM_SETx(16)
+VM_SETx(32)
+VM_SETx(64)
+
 static u8 vm_get_status(struct virtio_device *vdev)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
@@ -424,8 +449,14 @@ static const char *vm_bus_name(struct virtio_device *vdev)
 }
 
 static const struct virtio_config_ops virtio_mmio_config_ops = {
-	.get		= vm_get,
-	.set		= vm_set,
+	.get8		= vm_get8,
+	.set8		= vm_set8,
+	.get16		= vm_get16,
+	.set16		= vm_set16,
+	.get32		= vm_get32,
+	.set32		= vm_set32,
+	.get64		= vm_get64,
+	.set64		= vm_set64,
 	.get_status	= vm_get_status,
 	.set_status	= vm_set_status,
 	.reset		= vm_reset,
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a7ce730..96a988b 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -127,7 +127,7 @@ static void vp_finalize_features(struct virtio_device *vdev)
 	iowrite32(vdev->features[0], vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
 }
 
-/* virtio config->get() implementation */
+/* Device config access: we use guest endian, as per spec. */
 static void vp_get(struct virtio_device *vdev, unsigned offset,
 		   void *buf, unsigned len)
 {
@@ -141,8 +141,19 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
 		ptr[i] = ioread8(ioaddr + i);
 }
 
-/* the config->set() implementation.  it's symmetric to the config->get()
- * implementation */
+#define VP_GETx(bits)							\
+static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
+{									\
+	u##bits v;							\
+	vp_get(vdev, offset, &v, sizeof(v));				\
+	return v;							\
+}
+
+VP_GETx(8)
+VP_GETx(16)
+VP_GETx(32)
+VP_GETx(64)
+
 static void vp_set(struct virtio_device *vdev, unsigned offset,
 		   const void *buf, unsigned len)
 {
@@ -156,6 +167,18 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
 		iowrite8(ptr[i], ioaddr + i);
 }
 
+#define VP_SETx(bits)							\
+static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
+			 u##bits v)					\
+{									\
+	vp_set(vdev, offset, &v, sizeof(v));				\
+}
+
+VP_SETx(8)
+VP_SETx(16)
+VP_SETx(32)
+VP_SETx(64)
+
 /* config->{get,set}_status() implementations */
 static u8 vp_get_status(struct virtio_device *vdev)
 {
@@ -653,8 +676,14 @@ static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
 }
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
-	.get		= vp_get,
-	.set		= vp_set,
+	.get8		= vp_get8,
+	.set8		= vp_set8,
+	.get16		= vp_get16,
+	.set16		= vp_set16,
+	.get32		= vp_get32,
+	.set32		= vp_set32,
+	.get64		= vp_get64,
+	.set64		= vp_set64,
 	.get_status	= vp_get_status,
 	.set_status	= vp_set_status,
 	.reset		= vp_reset,
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index e8f8f71..73841ee 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -8,16 +8,30 @@
 
 /**
  * virtio_config_ops - operations for configuring a virtio device
- * @get: read the value of a configuration field
+ * @get8: read a byte from a configuration field
  *	vdev: the virtio_device
  *	offset: the offset of the configuration field
- *	buf: the buffer to write the field value into.
- *	len: the length of the buffer
- * @set: write the value of a configuration field
+ * @set8: write a byte to a configuration field
+ *	vdev: the virtio_device
+ *	offset: the offset of the configuration field
+ * @get16: read a u16 from a configuration field
+ *	vdev: the virtio_device
+ *	offset: the offset of the configuration field
+ * @set16: write a u16 to a configuration field
+ *	vdev: the virtio_device
+ *	offset: the offset of the configuration field
+ * @get32: read a u32 from a configuration field
+ *	vdev: the virtio_device
+ *	offset: the offset of the configuration field
+ * @set32: write a u32 to a configuration field
+ *	vdev: the virtio_device
+ *	offset: the offset of the configuration field
+ * @get64: read a u64 from a configuration field
+ *	vdev: the virtio_device
+ *	offset: the offset of the configuration field
+ * @set64: write a u64 to a configuration field
  *	vdev: the virtio_device
  *	offset: the offset of the configuration field
- *	buf: the buffer to read the field value from.
- *	len: the length of the buffer
  * @get_status: read the status byte
  *	vdev: the virtio_device
  *	Returns the status byte
@@ -54,10 +68,14 @@
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
-	void (*get)(struct virtio_device *vdev, unsigned offset,
-		    void *buf, unsigned len);
-	void (*set)(struct virtio_device *vdev, unsigned offset,
-		    const void *buf, unsigned len);
+	u8 (*get8)(struct virtio_device *vdev, unsigned offset);
+	void (*set8)(struct virtio_device *vdev, unsigned offset, u8 val);
+	u16 (*get16)(struct virtio_device *vdev, unsigned offset);
+	void (*set16)(struct virtio_device *vdev, unsigned offset, u16 val);
+	u32 (*get32)(struct virtio_device *vdev, unsigned offset);
+	void (*set32)(struct virtio_device *vdev, unsigned offset, u32 val);
+	u64 (*get64)(struct virtio_device *vdev, unsigned offset);
+	void (*set64)(struct virtio_device *vdev, unsigned offset, u64 val);
 	u8 (*get_status)(struct virtio_device *vdev);
 	void (*set_status)(struct virtio_device *vdev, u8 status);
 	void (*reset)(struct virtio_device *vdev);
@@ -199,64 +217,62 @@ int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
 
 static inline u8 virtio_cread8(struct virtio_device *vdev, unsigned int offset)
 {
-	u8 ret;
-	vdev->config->get(vdev, offset, &ret, sizeof(ret));
-	return ret;
+	return vdev->config->get8(vdev, offset);
 }
 
 static inline void virtio_cread_bytes(struct virtio_device *vdev,
 				      unsigned int offset,
 				      void *buf, size_t len)
 {
-	vdev->config->get(vdev, offset, buf, len);
+	u8 *dst = buf;
+	while (len) {
+		*dst = vdev->config->get8(vdev, offset);
+		dst++;
+		offset++;
+		len--;
+	}
 }
 
 static inline void virtio_cwrite8(struct virtio_device *vdev,
 				  unsigned int offset, u8 val)
 {
-	vdev->config->set(vdev, offset, &val, sizeof(val));
+	vdev->config->set8(vdev, offset, val);
 }
 
 static inline u16 virtio_cread16(struct virtio_device *vdev,
 				 unsigned int offset)
 {
-	u16 ret;
-	vdev->config->get(vdev, offset, &ret, sizeof(ret));
-	return ret;
+	return vdev->config->get16(vdev, offset);
 }
 
 static inline void virtio_cwrite16(struct virtio_device *vdev,
 				   unsigned int offset, u16 val)
 {
-	vdev->config->set(vdev, offset, &val, sizeof(val));
+	vdev->config->set16(vdev, offset, val);
 }
 
 static inline u32 virtio_cread32(struct virtio_device *vdev,
 				 unsigned int offset)
 {
-	u32 ret;
-	vdev->config->get(vdev, offset, &ret, sizeof(ret));
-	return ret;
+	return vdev->config->get32(vdev, offset);
 }
 
 static inline void virtio_cwrite32(struct virtio_device *vdev,
 				   unsigned int offset, u32 val)
 {
-	vdev->config->set(vdev, offset, &val, sizeof(val));
+	vdev->config->set32(vdev, offset, val);
 }
 
 static inline u64 virtio_cread64(struct virtio_device *vdev,
 				 unsigned int offset)
 {
-	u64 ret;
-	vdev->config->get(vdev, offset, &ret, sizeof(ret));
-	return ret;
+	return vdev->config->get64(vdev, offset);
 }
 
 static inline void virtio_cwrite64(struct virtio_device *vdev,
 				   unsigned int offset, u64 val)
 {
-	vdev->config->set(vdev, offset, &val, sizeof(val));
+	vdev->config->set64(vdev, offset, val);
 }
 
 /* Conditional config space accessors. */
-- 
1.7.10.4


* [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (2 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 03/22] virtio_config: make transports implement accessors Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21 10:00   ` Cornelia Huck
  2013-03-21  8:29 ` [PATCH 05/22] virtio: add support for 64 bit features Rusty Russell
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: Pawel Moll, Brian Swetland, Christian Borntraeger

It seemed like a good idea, but it's actually a pain when we get more
than 32 feature bits.  Just change it to a u32 for now.

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Brian Swetland <swetland@google.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/virtio_console.c          |    2 +-
 drivers/lguest/lguest_device.c         |    2 +-
 drivers/remoteproc/remoteproc_virtio.c |    2 +-
 drivers/s390/kvm/kvm_virtio.c          |    2 +-
 drivers/virtio/virtio.c                |   10 +++++-----
 drivers/virtio/virtio_mmio.c           |    8 ++------
 drivers/virtio/virtio_pci.c            |    3 +--
 drivers/virtio/virtio_ring.c           |    2 +-
 include/linux/virtio.h                 |    3 +--
 include/linux/virtio_config.h          |    2 +-
 tools/virtio/linux/virtio.h            |   22 +---------------------
 tools/virtio/linux/virtio_config.h     |    2 +-
 tools/virtio/virtio_test.c             |    5 ++---
 tools/virtio/vringh_test.c             |   16 ++++++++--------
 14 files changed, 27 insertions(+), 54 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 1735c38..9d9f717 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -351,7 +351,7 @@ static inline bool use_multiport(struct ports_device *portdev)
 	 */
 	if (!portdev->vdev)
 		return 0;
-	return portdev->vdev->features[0] & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+	return portdev->vdev->features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static DEFINE_SPINLOCK(dma_bufs_lock);
diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 8554d41..48bd2ad 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -144,7 +144,7 @@ static void lg_finalize_features(struct virtio_device *vdev)
 	memset(out_features, 0, desc->feature_len);
 	bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
 	for (i = 0; i < bits; i++) {
-		if (test_bit(i, vdev->features))
+		if (vdev->features & (1 << i))
 			out_features[i / 8] |= (1 << (i % 8));
 	}
 
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index afed9b7..fb8faee 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -219,7 +219,7 @@ static void rproc_virtio_finalize_features(struct virtio_device *vdev)
 	 * fixed as part of a small resource table overhaul and then an
 	 * extension of the virtio resource entries.
 	 */
-	rvdev->gfeatures = vdev->features[0];
+	rvdev->gfeatures = vdev->features;
 }
 
 static const struct virtio_config_ops rproc_virtio_config_ops = {
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index dcf35b1..2553319 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -106,7 +106,7 @@ static void kvm_finalize_features(struct virtio_device *vdev)
 	memset(out_features, 0, desc->feature_len);
 	bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
 	for (i = 0; i < bits; i++) {
-		if (test_bit(i, vdev->features))
+		if (vdev->features & (1 << i))
 			out_features[i / 8] |= (1 << (i % 8));
 	}
 }
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..dd33d84 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -41,9 +41,9 @@ static ssize_t features_show(struct device *_d,
 
 	/* We actually represent this as a bitstring, as it could be
 	 * arbitrary length in future. */
-	for (i = 0; i < ARRAY_SIZE(dev->features)*BITS_PER_LONG; i++)
+	for (i = 0; i < sizeof(dev->features)*8; i++)
 		len += sprintf(buf+len, "%c",
-			       test_bit(i, dev->features) ? '1' : '0');
+			       dev->features & (1ULL << i) ? '1' : '0');
 	len += sprintf(buf+len, "\n");
 	return len;
 }
@@ -120,18 +120,18 @@ static int virtio_dev_probe(struct device *_d)
 	device_features = dev->config->get_features(dev);
 
 	/* Features supported by both device and driver into dev->features. */
-	memset(dev->features, 0, sizeof(dev->features));
+	dev->features = 0;
 	for (i = 0; i < drv->feature_table_size; i++) {
 		unsigned int f = drv->feature_table[i];
 		BUG_ON(f >= 32);
 		if (device_features & (1 << f))
-			set_bit(f, dev->features);
+			dev->features |= (1 << f);
 	}
 
 	/* Transport features always preserved to pass to finalize_features. */
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++)
 		if (device_features & (1 << i))
-			set_bit(i, dev->features);
+			dev->features |= (1 << i);
 
 	dev->config->finalize_features(dev);
 
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index ad7f38f..d933150 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -155,16 +155,12 @@ static u32 vm_get_features(struct virtio_device *vdev)
 static void vm_finalize_features(struct virtio_device *vdev)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
-	int i;
 
 	/* Give virtio_ring a chance to accept features. */
 	vring_transport_features(vdev);
 
-	for (i = 0; i < ARRAY_SIZE(vdev->features); i++) {
-		writel(i, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
-		writel(vdev->features[i],
-				vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
-	}
+	writel(0, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
+	writel(vdev->features, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
 }
 
 static void vm_get(struct virtio_device *vdev, unsigned offset,
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 96a988b..55bd65f 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -123,8 +123,7 @@ static void vp_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	/* We only support 32 feature bits. */
-	BUILD_BUG_ON(ARRAY_SIZE(vdev->features) != 1);
-	iowrite32(vdev->features[0], vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
+	iowrite32(vdev->features, vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
 }
 
 /* Device config access: we use guest endian, as per spec. */
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5217baf..82afdd8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -814,7 +814,7 @@ void vring_transport_features(struct virtio_device *vdev)
 			break;
 		default:
 			/* We don't understand this bit. */
-			clear_bit(i, vdev->features);
+			vdev->features &= ~(1 << i);
 		}
 	}
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 833f17b..80f55a0 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -100,8 +100,7 @@ struct virtio_device {
 	const struct virtio_config_ops *config;
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
-	/* Note that this is a Linux set_bit-style bitmap. */
-	unsigned long features[1];
+	u32 features;
 	void *priv;
 };
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 73841ee..c250625 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -111,7 +111,7 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 	if (fbit < VIRTIO_TRANSPORT_F_START)
 		virtio_check_driver_offered_feature(vdev, fbit);
 
-	return test_bit(fbit, vdev->features);
+	return vdev->features & (1 << fbit);
 }
 
 static inline
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 6df181a..02a38e8 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -6,31 +6,11 @@
 /* TODO: empty stubs for now. Broken but enough for virtio_ring.c */
 #define list_add_tail(a, b) do {} while (0)
 #define list_del(a) do {} while (0)
-
-#define BIT_WORD(nr)		((nr) / BITS_PER_LONG)
-#define BITS_PER_BYTE		8
-#define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE)
-#define BIT_MASK(nr)		(1UL << ((nr) % BITS_PER_LONG))
-
-/* TODO: Not atomic as it should be:
- * we don't use this for anything important. */
-static inline void clear_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = BIT_MASK(nr);
-	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
-	*p &= ~mask;
-}
-
-static inline int test_bit(int nr, const volatile unsigned long *addr)
-{
-        return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
 /* end of stubs */
 
 struct virtio_device {
 	void *dev;
-	unsigned long features[1];
+	u32 features;
 };
 
 struct virtqueue {
diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
index 5049967..1f1636b 100644
--- a/tools/virtio/linux/virtio_config.h
+++ b/tools/virtio/linux/virtio_config.h
@@ -2,5 +2,5 @@
 #define VIRTIO_TRANSPORT_F_END		32
 
 #define virtio_has_feature(dev, feature) \
-	test_bit((feature), (dev)->features)
+	((dev)->features & (1 << (feature)))
 
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index 814ae80..59f0706 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -59,7 +59,7 @@ void vhost_vq_setup(struct vdev_info *dev, struct vq_info *info)
 {
 	struct vhost_vring_state state = { .index = info->idx };
 	struct vhost_vring_file file = { .index = info->idx };
-	unsigned long long features = dev->vdev.features[0];
+	unsigned long long features = dev->vdev.features;
 	struct vhost_vring_addr addr = {
 		.index = info->idx,
 		.desc_user_addr = (uint64_t)(unsigned long)info->vring.desc,
@@ -112,8 +112,7 @@ static void vdev_info_init(struct vdev_info* dev, unsigned long long features)
 {
 	int r;
 	memset(dev, 0, sizeof *dev);
-	dev->vdev.features[0] = features;
-	dev->vdev.features[1] = features >> 32;
+	dev->vdev.features = features;
 	dev->buf_size = 1024;
 	dev->buf = malloc(dev->buf_size);
 	assert(dev->buf);
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
index 88fe02a..d71fc05 100644
--- a/tools/virtio/vringh_test.c
+++ b/tools/virtio/vringh_test.c
@@ -299,7 +299,7 @@ static int parallel_test(unsigned long features,
 		close(to_guest[1]);
 		close(to_host[0]);
 
-		gvdev.vdev.features[0] = features;
+		gvdev.vdev.features = features;
 		gvdev.to_host_fd = to_host[1];
 		gvdev.notifies = 0;
 
@@ -444,13 +444,13 @@ int main(int argc, char *argv[])
 	bool fast_vringh = false, parallel = false;
 
 	getrange = getrange_iov;
-	vdev.features[0] = 0;
+	vdev.features = 0;
 
 	while (argv[1]) {
 		if (strcmp(argv[1], "--indirect") == 0)
-			vdev.features[0] |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+			vdev.features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
 		else if (strcmp(argv[1], "--eventidx") == 0)
-			vdev.features[0] |= (1 << VIRTIO_RING_F_EVENT_IDX);
+			vdev.features |= (1 << VIRTIO_RING_F_EVENT_IDX);
 		else if (strcmp(argv[1], "--slow-range") == 0)
 			getrange = getrange_slow;
 		else if (strcmp(argv[1], "--fast-vringh") == 0)
@@ -463,7 +463,7 @@ int main(int argc, char *argv[])
 	}
 
 	if (parallel)
-		return parallel_test(vdev.features[0], getrange, fast_vringh);
+		return parallel_test(vdev.features, getrange, fast_vringh);
 
 	if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
 		abort();
@@ -478,7 +478,7 @@ int main(int argc, char *argv[])
 
 	/* Set up host side. */
 	vring_init(&vrh.vring, RINGSIZE, __user_addr_min, ALIGN);
-	vringh_init_user(&vrh, vdev.features[0], RINGSIZE, true,
+	vringh_init_user(&vrh, vdev.features, RINGSIZE, true,
 			 vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
 
 	/* No descriptor to get yet... */
@@ -646,13 +646,13 @@ int main(int argc, char *argv[])
 	}
 
 	/* Test weird (but legal!) indirect. */
-	if (vdev.features[0] & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
+	if (vdev.features & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
 		char *data = __user_addr_max - USER_MEM/4;
 		struct vring_desc *d = __user_addr_max - USER_MEM/2;
 		struct vring vring;
 
 		/* Force creation of direct, which we modify. */
-		vdev.features[0] &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
+		vdev.features &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
 		vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
 					 __user_addr_min,
 					 never_notify_host,
-- 
1.7.10.4


* [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (3 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21 10:06   ` Cornelia Huck
  2013-04-02 17:09   ` Pawel Moll
  2013-03-21  8:29 ` [PATCH 06/22] virtio: move vring structure into struct virtqueue Rusty Russell
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: Pawel Moll, Brian Swetland, Christian Borntraeger

Change the u32 to a u64, and make sure to use 1ULL everywhere!

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Brian Swetland <swetland@google.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/virtio_console.c          |    2 +-
 drivers/lguest/lguest_device.c         |   10 +++++-----
 drivers/remoteproc/remoteproc_virtio.c |    6 +++++-
 drivers/s390/kvm/kvm_virtio.c          |   10 +++++-----
 drivers/virtio/virtio.c                |   12 ++++++------
 drivers/virtio/virtio_mmio.c           |   14 +++++++++-----
 drivers/virtio/virtio_pci.c            |    5 ++---
 drivers/virtio/virtio_ring.c           |    2 +-
 include/linux/virtio.h                 |    2 +-
 include/linux/virtio_config.h          |    8 ++++----
 tools/virtio/linux/virtio.h            |    2 +-
 tools/virtio/linux/virtio_config.h     |    2 +-
 12 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 9d9f717..1a1e5da 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -351,7 +351,7 @@ static inline bool use_multiport(struct ports_device *portdev)
 	 */
 	if (!portdev->vdev)
 		return 0;
-	return portdev->vdev->features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+	return portdev->vdev->features & (1ULL << VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static DEFINE_SPINLOCK(dma_bufs_lock);
diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 48bd2ad..c045f7e 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -94,17 +94,17 @@ static unsigned desc_size(const struct lguest_device_desc *desc)
 }
 
 /* This gets the device's feature bits. */
-static u32 lg_get_features(struct virtio_device *vdev)
+static u64 lg_get_features(struct virtio_device *vdev)
 {
 	unsigned int i;
-	u32 features = 0;
+	u64 features = 0;
 	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
 	u8 *in_features = lg_features(desc);
 
 	/* We do this the slow but generic way. */
-	for (i = 0; i < min(desc->feature_len * 8, 32); i++)
+	for (i = 0; i < min(desc->feature_len * 8, 64); i++)
 		if (in_features[i / 8] & (1 << (i % 8)))
-			features |= (1 << i);
+			features |= (1ULL << i);
 
 	return features;
 }
@@ -144,7 +144,7 @@ static void lg_finalize_features(struct virtio_device *vdev)
 	memset(out_features, 0, desc->feature_len);
 	bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
 	for (i = 0; i < bits; i++) {
-		if (vdev->features & (1 << i))
+		if (vdev->features & (1ULL << i))
 			out_features[i / 8] |= (1 << (i % 8));
 	}
 
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index fb8faee..aefbaae 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -196,7 +196,7 @@ static void rproc_virtio_reset(struct virtio_device *vdev)
 }
 
 /* provide the vdev features as retrieved from the firmware */
-static u32 rproc_virtio_get_features(struct virtio_device *vdev)
+static u64 rproc_virtio_get_features(struct virtio_device *vdev)
 {
 	struct rproc_vdev *rvdev = vdev_to_rvdev(vdev);
 
@@ -210,6 +210,9 @@ static void rproc_virtio_finalize_features(struct virtio_device *vdev)
 	/* Give virtio_ring a chance to accept features */
 	vring_transport_features(vdev);
 
+	/* Make sure we don't have any features > 32 bits! */
+	BUG_ON((u32)vdev->features != vdev->features);
+
 	/*
 	 * Remember the finalized features of our vdev, and provide it
 	 * to the remote processor once it is powered on.
@@ -220,6 +223,7 @@ static void rproc_virtio_finalize_features(struct virtio_device *vdev)
 	 * extension of the virtio resource entries.
 	 */
 	rvdev->gfeatures = vdev->features;
+
 }
 
 static const struct virtio_config_ops rproc_virtio_config_ops = {
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 2553319..05d7b4d 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -80,16 +80,16 @@ static unsigned desc_size(const struct kvm_device_desc *desc)
 }
 
 /* This gets the device's feature bits. */
-static u32 kvm_get_features(struct virtio_device *vdev)
+static u64 kvm_get_features(struct virtio_device *vdev)
 {
 	unsigned int i;
-	u32 features = 0;
+	u64 features = 0;
 	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
 	u8 *in_features = kvm_vq_features(desc);
 
-	for (i = 0; i < min(desc->feature_len * 8, 32); i++)
+	for (i = 0; i < min(desc->feature_len * 8, 64); i++)
 		if (in_features[i / 8] & (1 << (i % 8)))
-			features |= (1 << i);
+			features |= (1ULL << i);
 	return features;
 }
 
@@ -106,7 +106,7 @@ static void kvm_finalize_features(struct virtio_device *vdev)
 	memset(out_features, 0, desc->feature_len);
 	bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
 	for (i = 0; i < bits; i++) {
-		if (vdev->features & (1 << i))
+		if (vdev->features & (1ULL << i))
 			out_features[i / 8] |= (1 << (i % 8));
 	}
 }
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index dd33d84..e342692 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -111,7 +111,7 @@ static int virtio_dev_probe(struct device *_d)
 	int err, i;
 	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
-	u32 device_features;
+	u64 device_features;
 
 	/* We have a driver! */
 	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
@@ -123,15 +123,15 @@ static int virtio_dev_probe(struct device *_d)
 	dev->features = 0;
 	for (i = 0; i < drv->feature_table_size; i++) {
 		unsigned int f = drv->feature_table[i];
-		BUG_ON(f >= 32);
-		if (device_features & (1 << f))
-			dev->features |= (1 << f);
+		BUG_ON(f >= 64);
+		if (device_features & (1ULL << f))
+			dev->features |= (1ULL << f);
 	}
 
 	/* Transport features always preserved to pass to finalize_features. */
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++)
-		if (device_features & (1 << i))
-			dev->features |= (1 << i);
+		if (device_features & (1ULL << i))
+			dev->features |= (1ULL << i);
 
 	dev->config->finalize_features(dev);
 
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index d933150..84ef5fc 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -142,14 +142,16 @@ struct virtio_mmio_vq_info {
 
 /* Configuration interface */
 
-static u32 vm_get_features(struct virtio_device *vdev)
+static u64 vm_get_features(struct virtio_device *vdev)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+	u64 features;
 
-	/* TODO: Features > 32 bits */
 	writel(0, vm_dev->base + VIRTIO_MMIO_HOST_FEATURES_SEL);
-
-	return readl(vm_dev->base + VIRTIO_MMIO_HOST_FEATURES);
+	features = readl(vm_dev->base + VIRTIO_MMIO_HOST_FEATURES);
+	writel(1, vm_dev->base + VIRTIO_MMIO_HOST_FEATURES_SEL);
+	features |= ((u64)readl(vm_dev->base + VIRTIO_MMIO_HOST_FEATURES) << 32);
+	return features;
 }
 
 static void vm_finalize_features(struct virtio_device *vdev)
@@ -160,7 +162,9 @@ static void vm_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	writel(0, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
-	writel(vdev->features, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
+	writel((u32)vdev->features, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
+	writel(1, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
+	writel(vdev->features >> 32, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
 }
 
 static void vm_get(struct virtio_device *vdev, unsigned offset,
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 55bd65f..8d081ea 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -105,12 +105,11 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
 }
 
 /* virtio config->get_features() implementation */
-static u32 vp_get_features(struct virtio_device *vdev)
+static u64 vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	/* When someone needs more than 32 feature bits, we'll need to
-	 * steal a bit to indicate that the rest are somewhere else. */
+	/* We only support 32 feature bits. */
 	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
 }
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 82afdd8..ba58e29 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -814,7 +814,7 @@ void vring_transport_features(struct virtio_device *vdev)
 			break;
 		default:
 			/* We don't understand this bit. */
-			vdev->features &= ~(1 << i);
+			vdev->features &= ~(1ULL << i);
 		}
 	}
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 80f55a0..a05f7c7 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -100,7 +100,7 @@ struct virtio_device {
 	const struct virtio_config_ops *config;
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
-	u32 features;
+	u64 features;
 	void *priv;
 };
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index c250625..9dfe116 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -84,7 +84,7 @@ struct virtio_config_ops {
 			vq_callback_t *callbacks[],
 			const char *names[]);
 	void (*del_vqs)(struct virtio_device *);
-	u32 (*get_features)(struct virtio_device *vdev);
+	u64 (*get_features)(struct virtio_device *vdev);
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
 	int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
@@ -104,14 +104,14 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 {
 	/* Did you forget to fix assumptions on max features? */
 	if (__builtin_constant_p(fbit))
-		BUILD_BUG_ON(fbit >= 32);
+		BUILD_BUG_ON(fbit >= 64);
 	else
-		BUG_ON(fbit >= 32);
+		BUG_ON(fbit >= 64);
 
 	if (fbit < VIRTIO_TRANSPORT_F_START)
 		virtio_check_driver_offered_feature(vdev, fbit);
 
-	return vdev->features & (1 << fbit);
+	return vdev->features & (1ULL << fbit);
 }
 
 static inline
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 02a38e8..358706d 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -10,7 +10,7 @@
 
 struct virtio_device {
 	void *dev;
-	u32 features;
+	u64 features;
 };
 
 struct virtqueue {
diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
index 1f1636b..a254c2b 100644
--- a/tools/virtio/linux/virtio_config.h
+++ b/tools/virtio/linux/virtio_config.h
@@ -2,5 +2,5 @@
 #define VIRTIO_TRANSPORT_F_END		32
 
 #define virtio_has_feature(dev, feature) \
-	((dev)->features & (1 << feature))
+	((dev)->features & (1ULL << feature))
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 06/22] virtio: move vring structure into struct virtqueue.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (4 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 05/22] virtio: add support for 64 bit features Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 07/22] pci: add pci_iomap_range Rusty Russell
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

Back in 2010 (7c5e9ed0c84e7d70d887878574590638d5572659) MST removed
the abstraction between virtio and the virtio ring, dropping the various
ops pointers, and we haven't really missed it.

Now we hoist the struct vring out from the private struct
vring_virtqueue into the struct virtqueue: we've already demonstrated
that it's useful to be able to see the ring size, and the new virtio
pci layout wants to know the location of each part of the ring.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |  114 +++++++++++++++++-------------------------
 include/linux/virtio.h       |    8 ++-
 2 files changed, 54 insertions(+), 68 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ba58e29..7f9d4e9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -53,13 +53,9 @@
 #define END_USE(vq)
 #endif
 
-struct vring_virtqueue
-{
+struct vring_virtqueue {
 	struct virtqueue vq;
 
-	/* Actual memory layout for this queue */
-	struct vring vring;
-
 	/* Can we use weak barriers? */
 	bool weak_barriers;
 
@@ -171,12 +167,12 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
 
 	/* Use a single buffer which doesn't continue */
 	head = vq->free_head;
-	vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
-	vq->vring.desc[head].addr = virt_to_phys(desc);
-	vq->vring.desc[head].len = i * sizeof(struct vring_desc);
+	vq->vq.vring.desc[head].flags = VRING_DESC_F_INDIRECT;
+	vq->vq.vring.desc[head].addr = virt_to_phys(desc);
+	vq->vq.vring.desc[head].len = i * sizeof(struct vring_desc);
 
 	/* Update free pointer */
-	vq->free_head = vq->vring.desc[head].next;
+	vq->free_head = vq->vq.vring.desc[head].next;
 
 	return head;
 }
@@ -226,7 +222,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 			goto add_head;
 	}
 
-	BUG_ON(total_sg > vq->vring.num);
+	BUG_ON(total_sg > vq->vq.vring.num);
 	BUG_ON(total_sg == 0);
 
 	if (vq->vq.num_free < total_sg) {
@@ -247,24 +243,24 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	head = i = vq->free_head;
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = next(sg, &total_out)) {
-			vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
-			vq->vring.desc[i].addr = sg_phys(sg);
-			vq->vring.desc[i].len = sg->length;
+			vq->vq.vring.desc[i].flags = VRING_DESC_F_NEXT;
+			vq->vq.vring.desc[i].addr = sg_phys(sg);
+			vq->vq.vring.desc[i].len = sg->length;
 			prev = i;
-			i = vq->vring.desc[i].next;
+			i = vq->vq.vring.desc[i].next;
 		}
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = next(sg, &total_in)) {
-			vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-			vq->vring.desc[i].addr = sg_phys(sg);
-			vq->vring.desc[i].len = sg->length;
+			vq->vq.vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
+			vq->vq.vring.desc[i].addr = sg_phys(sg);
+			vq->vq.vring.desc[i].len = sg->length;
 			prev = i;
-			i = vq->vring.desc[i].next;
+			i = vq->vq.vring.desc[i].next;
 		}
 	}
 	/* Last one doesn't continue. */
-	vq->vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
+	vq->vq.vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
 
 	/* Update free pointer */
 	vq->free_head = i;
@@ -275,13 +271,13 @@ add_head:
 
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
-	avail = (vq->vring.avail->idx & (vq->vring.num-1));
-	vq->vring.avail->ring[avail] = head;
+	avail = (vq->vq.vring.avail->idx & (vq->vq.vring.num-1));
+	vq->vq.vring.avail->ring[avail] = head;
 
 	/* Descriptors and available array need to be set before we expose the
 	 * new available array entries. */
 	virtio_wmb(vq->weak_barriers);
-	vq->vring.avail->idx++;
+	vq->vq.vring.avail->idx++;
 	vq->num_added++;
 
 	/* This is very unlikely, but theoretically possible.  Kick
@@ -431,8 +427,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 	 * event. */
 	virtio_mb(vq->weak_barriers);
 
-	old = vq->vring.avail->idx - vq->num_added;
-	new = vq->vring.avail->idx;
+	old = vq->vq.vring.avail->idx - vq->num_added;
+	new = vq->vq.vring.avail->idx;
 	vq->num_added = 0;
 
 #ifdef DEBUG
@@ -444,10 +440,10 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 #endif
 
 	if (vq->event) {
-		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
+		needs_kick = vring_need_event(vring_avail_event(&vq->vq.vring),
 					      new, old);
 	} else {
-		needs_kick = !(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY);
+		needs_kick = !(vq->vq.vring.used->flags&VRING_USED_F_NO_NOTIFY);
 	}
 	END_USE(vq);
 	return needs_kick;
@@ -497,15 +493,15 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 	i = head;
 
 	/* Free the indirect table */
-	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
-		kfree(phys_to_virt(vq->vring.desc[i].addr));
+	if (vq->vq.vring.desc[i].flags & VRING_DESC_F_INDIRECT)
+		kfree(phys_to_virt(vq->vq.vring.desc[i].addr));
 
-	while (vq->vring.desc[i].flags & VRING_DESC_F_NEXT) {
-		i = vq->vring.desc[i].next;
+	while (vq->vq.vring.desc[i].flags & VRING_DESC_F_NEXT) {
+		i = vq->vq.vring.desc[i].next;
 		vq->vq.num_free++;
 	}
 
-	vq->vring.desc[i].next = vq->free_head;
+	vq->vq.vring.desc[i].next = vq->free_head;
 	vq->free_head = head;
 	/* Plus final descriptor */
 	vq->vq.num_free++;
@@ -513,7 +509,7 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
-	return vq->last_used_idx != vq->vring.used->idx;
+	return vq->last_used_idx != vq->vq.vring.used->idx;
 }
 
 /**
@@ -555,11 +551,11 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	/* Only get used array entries after they have been exposed by host. */
 	virtio_rmb(vq->weak_barriers);
 
-	last_used = (vq->last_used_idx & (vq->vring.num - 1));
-	i = vq->vring.used->ring[last_used].id;
-	*len = vq->vring.used->ring[last_used].len;
+	last_used = (vq->last_used_idx & (vq->vq.vring.num - 1));
+	i = vq->vq.vring.used->ring[last_used].id;
+	*len = vq->vq.vring.used->ring[last_used].len;
 
-	if (unlikely(i >= vq->vring.num)) {
+	if (unlikely(i >= vq->vq.vring.num)) {
 		BAD_RING(vq, "id %u out of range\n", i);
 		return NULL;
 	}
@@ -575,8 +571,8 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
 	 * the read in the next get_buf call. */
-	if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
-		vring_used_event(&vq->vring) = vq->last_used_idx;
+	if (!(vq->vq.vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
+		vring_used_event(&vq->vq.vring) = vq->last_used_idx;
 		virtio_mb(vq->weak_barriers);
 	}
 
@@ -602,7 +598,7 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+	vq->vq.vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 
@@ -628,8 +624,8 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
 	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
 	 * either clear the flags bit or point the event index at the next
 	 * entry. Always do both to keep code simple. */
-	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
-	vring_used_event(&vq->vring) = vq->last_used_idx;
+	vq->vq.vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vq.vring) = vq->last_used_idx;
 	virtio_mb(vq->weak_barriers);
 	if (unlikely(more_used(vq))) {
 		END_USE(vq);
@@ -666,12 +662,12 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
 	 * either clear the flags bit or point the event index at the next
 	 * entry. Always do both to keep code simple. */
-	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vq->vq.vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	/* TODO: tune this threshold */
-	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
-	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
+	bufs = (u16)(vq->vq.vring.avail->idx - vq->last_used_idx) * 3 / 4;
+	vring_used_event(&vq->vq.vring) = vq->last_used_idx + bufs;
 	virtio_mb(vq->weak_barriers);
-	if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
+	if (unlikely((u16)(vq->vq.vring.used->idx-vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;
 	}
@@ -697,18 +693,18 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 
 	START_USE(vq);
 
-	for (i = 0; i < vq->vring.num; i++) {
+	for (i = 0; i < vq->vq.vring.num; i++) {
 		if (!vq->data[i])
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->data[i];
 		detach_buf(vq, i);
-		vq->vring.avail->idx--;
+		vq->vq.vring.avail->idx--;
 		END_USE(vq);
 		return buf;
 	}
 	/* That should have freed everything. */
-	BUG_ON(vq->vq.num_free != vq->vring.num);
+	BUG_ON(vq->vq.num_free != vq->vq.vring.num);
 
 	END_USE(vq);
 	return NULL;
@@ -758,7 +754,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 	if (!vq)
 		return NULL;
 
-	vring_init(&vq->vring, num, pages, vring_align);
+	vring_init(&vq->vq.vring, num, pages, vring_align);
 	vq->vq.callback = callback;
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
@@ -780,12 +776,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)
-		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+		vq->vq.vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
 
 	/* Put everything in free lists. */
 	vq->free_head = 0;
 	for (i = 0; i < num-1; i++) {
-		vq->vring.desc[i].next = i+1;
+		vq->vq.vring.desc[i].next = i+1;
 		vq->data[i] = NULL;
 	}
 	vq->data[i] = NULL;
@@ -820,20 +816,4 @@ void vring_transport_features(struct virtio_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vring_transport_features);
 
-/**
- * virtqueue_get_vring_size - return the size of the virtqueue's vring
- * @vq: the struct virtqueue containing the vring of interest.
- *
- * Returns the size of the vring.  This is mainly used for boasting to
- * userspace.  Unlike other operations, this need not be serialized.
- */
-unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
-{
-
-	struct vring_virtqueue *vq = to_vvq(_vq);
-
-	return vq->vring.num;
-}
-EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
-
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a05f7c7..09883f5 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -9,6 +9,7 @@
 #include <linux/mod_devicetable.h>
 #include <linux/gfp.h>
 #include <linux/vringh.h>
+#include <uapi/linux/virtio_ring.h>
 
 /**
  * virtqueue - a queue to register buffers for sending or receiving.
@@ -19,6 +20,7 @@
  * @priv: a pointer for the virtqueue implementation to use.
  * @index: the zero-based ordinal number for this queue.
  * @num_free: number of elements we expect to be able to fit.
+ * @vring: the layout of the virtio ring.
  *
  * A note on @num_free: with indirect buffers, each buffer needs one
  * element in the queue, otherwise a buffer will need one element per
@@ -31,6 +33,7 @@ struct virtqueue {
 	struct virtio_device *vdev;
 	unsigned int index;
 	unsigned int num_free;
+	struct vring vring;
 	void *priv;
 };
 
@@ -74,7 +77,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
-unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
+static inline unsigned int virtqueue_get_vring_size(struct virtqueue *vq)
+{
+	return vq->vring.num;
+}
 
 /* FIXME: Obsolete accessor, but required for virtio_net merge. */
 static inline unsigned int virtqueue_get_queue_index(struct virtqueue *vq)
-- 
1.7.10.4


* [PATCH 07/22] pci: add pci_iomap_range
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (5 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 06/22] virtio: move vring structure into struct virtqueue Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 08/22] virtio-pci: define layout for virtio vendor-specific capabilities Rusty Russell
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S Tsirkin

From: Michael S Tsirkin <mst@redhat.com>

Virtio drivers should map only the part of the BAR they need, not
necessarily all of it.  They also need a non-cacheable mapping even for
prefetchable BARs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 include/asm-generic/pci_iomap.h |    5 +++++
 lib/pci_iomap.c                 |   46 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index ce37349..8777331 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -15,6 +15,11 @@ struct pci_dev;
 #ifdef CONFIG_PCI
 /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
+				     unsigned offset,
+				     unsigned long minlen,
+				     unsigned long maxlen,
+				     bool force_nocache);
 /* Create a virtual mapping cookie for a port on a given PCI device.
  * Do not call this directly, it exists to make it easier for architectures
  * to override */
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index 0d83ea8..1ac4def 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -10,33 +10,47 @@
 
 #ifdef CONFIG_PCI
 /**
- * pci_iomap - create a virtual mapping cookie for a PCI BAR
+ * pci_iomap_range - create a virtual mapping cookie for a PCI BAR
  * @dev: PCI device that owns the BAR
  * @bar: BAR number
- * @maxlen: length of the memory to map
+ * @offset: map memory at the given offset in BAR
+ * @minlen: min length of the memory to map
+ * @maxlen: max length of the memory to map
  *
  * Using this function you will get a __iomem address to your device BAR.
  * You can access it using ioread*() and iowrite*(). These functions hide
  * the details if this is a MMIO or PIO address space and will just do what
  * you expect from them in the correct way.
  *
+ * @minlen specifies the minimum length to map. We check that the BAR is
+ * large enough.
  * @maxlen specifies the maximum length to map. If you want to get access to
- * the complete BAR without checking for its length first, pass %0 here.
+ * the complete BAR from offset to the end, pass %0 here.
+ * @force_nocache makes the mapping noncacheable even if the BAR
+ * is prefetchable. It has no effect otherwise.
  * */
-void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
+void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
+			      unsigned offset,
+			      unsigned long minlen,
+			      unsigned long maxlen,
+			      bool force_nocache)
 {
 	resource_size_t start = pci_resource_start(dev, bar);
 	resource_size_t len = pci_resource_len(dev, bar);
 	unsigned long flags = pci_resource_flags(dev, bar);
 
-	if (!len || !start)
+	if (len <= offset || !start)
+		return NULL;
+	len -= offset;
+	start += offset;
+	if (len < minlen)
 		return NULL;
 	if (maxlen && len > maxlen)
 		len = maxlen;
 	if (flags & IORESOURCE_IO)
 		return __pci_ioport_map(dev, start, len);
 	if (flags & IORESOURCE_MEM) {
-		if (flags & IORESOURCE_CACHEABLE)
+		if (!force_nocache && (flags & IORESOURCE_CACHEABLE))
 			return ioremap(start, len);
 		return ioremap_nocache(start, len);
 	}
@@ -44,5 +58,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 	return NULL;
 }
 
+/**
+ * pci_iomap - create a virtual mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @maxlen: length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR without checking for its length first, pass %0 here.
+ * */
+void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+	return pci_iomap_range(dev, bar, 0, 0, maxlen, false);
+}
+
 EXPORT_SYMBOL(pci_iomap);
+EXPORT_SYMBOL(pci_iomap_range);
 #endif /* CONFIG_PCI */
-- 
1.7.10.4


* [PATCH 08/22] virtio-pci: define layout for virtio vendor-specific capabilities.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (6 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 07/22] pci: add pci_iomap_range Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 09/22] virtio_pci: move old defines to legacy, introduce new structure Rusty Russell
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

Based on a patch by Michael S. Tsirkin <mst@redhat.com>, but I found it
hard to follow, so I changed it to use structures, which are more
self-documenting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 include/uapi/linux/virtio_pci.h |   41 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index ea66f3f..b0e7c91 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -92,4 +92,45 @@
 /* The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. */
 #define VIRTIO_PCI_VRING_ALIGN		4096
+
+/* IDs for different capabilities.  Must all exist. */
+/* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR access */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	u8 cap_next;	/* Generic PCI field: next ptr. */
+	u8 cfg_type;	/* One of the VIRTIO_PCI_CAP_*_CFG. */
+/* FIXME: Should we use a bir, instead of raw bar number? */
+	u8 bar;		/* Where to find it. */
+	__le32 offset;	/* Offset within bar. */
+	__le32 length;	/* Length. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	__le32 device_feature_select;	/* read-write */
+	__le32 device_feature;		/* read-only */
+	__le32 guest_feature_select;	/* read-write */
+	__le32 guest_feature;		/* read-only */
+	__le16 msix_config;		/* read-write */
+	__u8 device_status;		/* read-write */
+	__u8 unused;
+
+	/* About a specific virtqueue. */
+	__le16 queue_select;	/* read-write */
+	__le16 queue_align;	/* read-write, power of 2. */
+	__le16 queue_size;	/* read-write, power of 2. */
+	__le16 queue_msix_vector;/* read-write */
+	__le64 queue_address;	/* read-write: 0xFFFFFFFFFFFFFFFF == DNE. */
+};
 #endif
-- 
1.7.10.4


* [PATCH 09/22] virtio_pci: move old defines to legacy, introduce new structure.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (7 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 08/22] virtio-pci: define layout for virtio vendor-specific capabilities Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 10/22] virtio_pci: use _LEGACY_ defines in virtio_pci_legacy.c Rusty Russell
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

We don't *remove* the old defines unless VIRTIO_PCI_NO_LEGACY is defined,
but they get a friendly #warning about the change.

Note that the new config option has no prompt; we always enable it for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/Kconfig             |   12 +
 drivers/virtio/Makefile            |    2 +-
 drivers/virtio/virtio_pci.c        |  858 ------------------------------------
 drivers/virtio/virtio_pci_legacy.c |  858 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_pci.h    |   61 ++-
 5 files changed, 911 insertions(+), 880 deletions(-)
 delete mode 100644 drivers/virtio/virtio_pci.c
 create mode 100644 drivers/virtio/virtio_pci_legacy.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index c6683f2..65de1b2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -22,6 +22,18 @@ config VIRTIO_PCI
 
 	  If unsure, say M.
 
+config VIRTIO_PCI_LEGACY
+	bool
+	default y
+	depends on VIRTIO_PCI
+	---help---
+	  The old BAR0 virtio pci layout was deprecated early 2013.
+
+	  So look out into your driveway.  Do you have a flying car?  If
+	  so, you can happily disable this option and virtio will not
+	  break.  Otherwise, leave it set.  Unless you're testing what
+	  life will be like in The Future.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..23834f5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
-obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
+obj-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
deleted file mode 100644
index 8d081ea..0000000
--- a/drivers/virtio/virtio_pci.c
+++ /dev/null
@@ -1,858 +0,0 @@
-/*
- * Virtio PCI driver
- *
- * This module allows virtio devices to be used over a virtual PCI device.
- * This can be used with QEMU based VMMs like KVM or Xen.
- *
- * Copyright IBM Corp. 2007
- *
- * Authors:
- *  Anthony Liguori  <aliguori@us.ibm.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include <linux/module.h>
-#include <linux/list.h>
-#include <linux/pci.h>
-#include <linux/slab.h>
-#include <linux/interrupt.h>
-#include <linux/virtio.h>
-#include <linux/virtio_config.h>
-#include <linux/virtio_ring.h>
-#include <linux/virtio_pci.h>
-#include <linux/highmem.h>
-#include <linux/spinlock.h>
-
-MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
-MODULE_DESCRIPTION("virtio-pci");
-MODULE_LICENSE("GPL");
-MODULE_VERSION("1");
-
-/* Our device structure */
-struct virtio_pci_device
-{
-	struct virtio_device vdev;
-	struct pci_dev *pci_dev;
-
-	/* the IO mapping for the PCI config space */
-	void __iomem *ioaddr;
-
-	/* a list of queues so we can dispatch IRQs */
-	spinlock_t lock;
-	struct list_head virtqueues;
-
-	/* MSI-X support */
-	int msix_enabled;
-	int intx_enabled;
-	struct msix_entry *msix_entries;
-	cpumask_var_t *msix_affinity_masks;
-	/* Name strings for interrupts. This size should be enough,
-	 * and I'm too lazy to allocate each name separately. */
-	char (*msix_names)[256];
-	/* Number of available vectors */
-	unsigned msix_vectors;
-	/* Vectors allocated, excluding per-vq vectors if any */
-	unsigned msix_used_vectors;
-
-	/* Status saved during hibernate/restore */
-	u8 saved_status;
-
-	/* Whether we have vector per vq */
-	bool per_vq_vectors;
-};
-
-/* Constants for MSI-X */
-/* Use first vector for configuration changes, second and the rest for
- * virtqueues Thus, we need at least 2 vectors for MSI. */
-enum {
-	VP_MSIX_CONFIG_VECTOR = 0,
-	VP_MSIX_VQ_VECTOR = 1,
-};
-
-struct virtio_pci_vq_info
-{
-	/* the actual virtqueue */
-	struct virtqueue *vq;
-
-	/* the number of entries in the queue */
-	int num;
-
-	/* the virtual address of the ring queue */
-	void *queue;
-
-	/* the list node for the virtqueues list */
-	struct list_head node;
-
-	/* MSI-X vector (or none) */
-	unsigned msix_vector;
-};
-
-/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
-static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
-	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
-	{ 0 }
-};
-
-MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
-
-/* Convert a generic virtio device to our structure */
-static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
-{
-	return container_of(vdev, struct virtio_pci_device, vdev);
-}
-
-/* virtio config->get_features() implementation */
-static u64 vp_get_features(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-
-	/* We only support 32 feature bits. */
-	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
-}
-
-/* virtio config->finalize_features() implementation */
-static void vp_finalize_features(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-
-	/* Give virtio_ring a chance to accept features. */
-	vring_transport_features(vdev);
-
-	/* We only support 32 feature bits. */
-	iowrite32(vdev->features, vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
-}
-
-/* Device config access: we use guest endian, as per spec. */
-static void vp_get(struct virtio_device *vdev, unsigned offset,
-		   void *buf, unsigned len)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr +
-				VIRTIO_PCI_CONFIG(vp_dev) + offset;
-	u8 *ptr = buf;
-	int i;
-
-	for (i = 0; i < len; i++)
-		ptr[i] = ioread8(ioaddr + i);
-}
-
-#define VP_GETx(bits)							\
-static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
-{									\
-	u##bits v;							\
-	vp_get(vdev, offset, &v, sizeof(v));				\
-	return v;							\
-}
-
-VP_GETx(8)
-VP_GETx(16)
-VP_GETx(32)
-VP_GETx(64)
-
-static void vp_set(struct virtio_device *vdev, unsigned offset,
-		   const void *buf, unsigned len)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr +
-				VIRTIO_PCI_CONFIG(vp_dev) + offset;
-	const u8 *ptr = buf;
-	int i;
-
-	for (i = 0; i < len; i++)
-		iowrite8(ptr[i], ioaddr + i);
-}
-
-#define VP_SETx(bits)							\
-static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
-			 u##bits v)					\
-{									\
-	vp_set(vdev, offset, &v, sizeof(v));				\
-}
-
-VP_SETx(8)
-VP_SETx(16)
-VP_SETx(32)
-VP_SETx(64)
-
-/* config->{get,set}_status() implementations */
-static u8 vp_get_status(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	return ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-}
-
-static void vp_set_status(struct virtio_device *vdev, u8 status)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	/* We should never be setting status to 0. */
-	BUG_ON(status == 0);
-	iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-}
-
-/* wait for pending irq handlers */
-static void vp_synchronize_vectors(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	int i;
-
-	if (vp_dev->intx_enabled)
-		synchronize_irq(vp_dev->pci_dev->irq);
-
-	for (i = 0; i < vp_dev->msix_vectors; ++i)
-		synchronize_irq(vp_dev->msix_entries[i].vector);
-}
-
-static void vp_reset(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	/* 0 status means a reset. */
-	iowrite8(0, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-	/* Flush out the status write, and flush in device writes,
-	 * including MSi-X interrupts, if any. */
-	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
-}
-
-/* the notify function used when creating a virt queue */
-static void vp_notify(struct virtqueue *vq)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
-
-	/* we write the queue's selector into the notification register to
-	 * signal the other end */
-	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
-}
-
-/* Handle a configuration change: Tell driver if it wants to know. */
-static irqreturn_t vp_config_changed(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_driver *drv;
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	if (drv && drv->config_changed)
-		drv->config_changed(&vp_dev->vdev);
-	return IRQ_HANDLED;
-}
-
-/* Notify all virtqueues on an interrupt. */
-static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_pci_vq_info *info;
-	irqreturn_t ret = IRQ_NONE;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vp_dev->lock, flags);
-	list_for_each_entry(info, &vp_dev->virtqueues, node) {
-		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-			ret = IRQ_HANDLED;
-	}
-	spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-	return ret;
-}
-
-/* A small wrapper to also acknowledge the interrupt when it's handled.
- * I really need an EIO hook for the vring so I can ack the interrupt once we
- * know that we'll be handling the IRQ but before we invoke the callback since
- * the callback may notify the host which results in the host attempting to
- * raise an interrupt that we would then mask once we acknowledged the
- * interrupt. */
-static irqreturn_t vp_interrupt(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	u8 isr;
-
-	/* reading the ISR has the effect of also clearing it so it's very
-	 * important to save off the value. */
-	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
-
-	/* It's definitely not us if the ISR was not high */
-	if (!isr)
-		return IRQ_NONE;
-
-	/* Configuration change?  Tell driver if it wants to know. */
-	if (isr & VIRTIO_PCI_ISR_CONFIG)
-		vp_config_changed(irq, opaque);
-
-	return vp_vring_interrupt(irq, opaque);
-}
-
-static void vp_free_vectors(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	int i;
-
-	if (vp_dev->intx_enabled) {
-		free_irq(vp_dev->pci_dev->irq, vp_dev);
-		vp_dev->intx_enabled = 0;
-	}
-
-	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
-		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
-
-	for (i = 0; i < vp_dev->msix_vectors; i++)
-		if (vp_dev->msix_affinity_masks[i])
-			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
-
-	if (vp_dev->msix_enabled) {
-		/* Disable the vector used for configuration */
-		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
-		/* Flush the write out to device */
-		ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
-
-		pci_disable_msix(vp_dev->pci_dev);
-		vp_dev->msix_enabled = 0;
-		vp_dev->msix_vectors = 0;
-	}
-
-	vp_dev->msix_used_vectors = 0;
-	kfree(vp_dev->msix_names);
-	vp_dev->msix_names = NULL;
-	kfree(vp_dev->msix_entries);
-	vp_dev->msix_entries = NULL;
-	kfree(vp_dev->msix_affinity_masks);
-	vp_dev->msix_affinity_masks = NULL;
-}
-
-static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
-				   bool per_vq_vectors)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	const char *name = dev_name(&vp_dev->vdev.dev);
-	unsigned i, v;
-	int err = -ENOMEM;
-
-	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
-				       GFP_KERNEL);
-	if (!vp_dev->msix_entries)
-		goto error;
-	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
-				     GFP_KERNEL);
-	if (!vp_dev->msix_names)
-		goto error;
-	vp_dev->msix_affinity_masks
-		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
-			  GFP_KERNEL);
-	if (!vp_dev->msix_affinity_masks)
-		goto error;
-	for (i = 0; i < nvectors; ++i)
-		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
-					GFP_KERNEL))
-			goto error;
-
-	for (i = 0; i < nvectors; ++i)
-		vp_dev->msix_entries[i].entry = i;
-
-	/* pci_enable_msix returns positive if we can't get this many. */
-	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
-	if (err > 0)
-		err = -ENOSPC;
-	if (err)
-		goto error;
-	vp_dev->msix_vectors = nvectors;
-	vp_dev->msix_enabled = 1;
-
-	/* Set the vector used for configuration */
-	v = vp_dev->msix_used_vectors;
-	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
-		 "%s-config", name);
-	err = request_irq(vp_dev->msix_entries[v].vector,
-			  vp_config_changed, 0, vp_dev->msix_names[v],
-			  vp_dev);
-	if (err)
-		goto error;
-	++vp_dev->msix_used_vectors;
-
-	iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
-	/* Verify we had enough resources to assign the vector */
-	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
-	if (v == VIRTIO_MSI_NO_VECTOR) {
-		err = -EBUSY;
-		goto error;
-	}
-
-	if (!per_vq_vectors) {
-		/* Shared vector for all VQs */
-		v = vp_dev->msix_used_vectors;
-		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
-			 "%s-virtqueues", name);
-		err = request_irq(vp_dev->msix_entries[v].vector,
-				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
-				  vp_dev);
-		if (err)
-			goto error;
-		++vp_dev->msix_used_vectors;
-	}
-	return 0;
-error:
-	vp_free_vectors(vdev);
-	return err;
-}
-
-static int vp_request_intx(struct virtio_device *vdev)
-{
-	int err;
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-
-	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
-			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
-	if (!err)
-		vp_dev->intx_enabled = 1;
-	return err;
-}
-
-static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
-				  void (*callback)(struct virtqueue *vq),
-				  const char *name,
-				  u16 msix_vec)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtio_pci_vq_info *info;
-	struct virtqueue *vq;
-	unsigned long flags, size;
-	u16 num;
-	int err;
-
-	/* Select the queue we're interested in */
-	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
-
-	/* Check if queue is either not available or already active. */
-	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);
-	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
-		return ERR_PTR(-ENOENT);
-
-	/* allocate and fill out our structure the represents an active
-	 * queue */
-	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
-	if (!info)
-		return ERR_PTR(-ENOMEM);
-
-	info->num = num;
-	info->msix_vector = msix_vec;
-
-	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
-	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
-	if (info->queue == NULL) {
-		err = -ENOMEM;
-		goto out_info;
-	}
-
-	/* activate the queue */
-	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
-		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
-
-	/* create the vring */
-	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
-				 true, info->queue, vp_notify, callback, name);
-	if (!vq) {
-		err = -ENOMEM;
-		goto out_activate_queue;
-	}
-
-	vq->priv = info;
-	info->vq = vq;
-
-	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
-		iowrite16(msix_vec, vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
-		msix_vec = ioread16(vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
-		if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
-			err = -EBUSY;
-			goto out_assign;
-		}
-	}
-
-	if (callback) {
-		spin_lock_irqsave(&vp_dev->lock, flags);
-		list_add(&info->node, &vp_dev->virtqueues);
-		spin_unlock_irqrestore(&vp_dev->lock, flags);
-	} else {
-		INIT_LIST_HEAD(&info->node);
-	}
-
-	return vq;
-
-out_assign:
-	vring_del_virtqueue(vq);
-out_activate_queue:
-	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
-	free_pages_exact(info->queue, size);
-out_info:
-	kfree(info);
-	return ERR_PTR(err);
-}
-
-static void vp_del_vq(struct virtqueue *vq)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
-	struct virtio_pci_vq_info *info = vq->priv;
-	unsigned long flags, size;
-
-	spin_lock_irqsave(&vp_dev->lock, flags);
-	list_del(&info->node);
-	spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
-
-	if (vp_dev->msix_enabled) {
-		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
-		/* Flush the write out to device */
-		ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
-	}
-
-	vring_del_virtqueue(vq);
-
-	/* Select and deactivate the queue */
-	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
-
-	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
-	free_pages_exact(info->queue, size);
-	kfree(info);
-}
-
-/* the config->del_vqs() implementation */
-static void vp_del_vqs(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtqueue *vq, *n;
-	struct virtio_pci_vq_info *info;
-
-	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
-		info = vq->priv;
-		if (vp_dev->per_vq_vectors &&
-			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
-			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
-				 vq);
-		vp_del_vq(vq);
-	}
-	vp_dev->per_vq_vectors = false;
-
-	vp_free_vectors(vdev);
-}
-
-static int vp_try_to_find_vqs(struct virtio_device *vdev, unsigned nvqs,
-			      struct virtqueue *vqs[],
-			      vq_callback_t *callbacks[],
-			      const char *names[],
-			      bool use_msix,
-			      bool per_vq_vectors)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	u16 msix_vec;
-	int i, err, nvectors, allocated_vectors;
-
-	if (!use_msix) {
-		/* Old style: one normal interrupt for change and all vqs. */
-		err = vp_request_intx(vdev);
-		if (err)
-			goto error_request;
-	} else {
-		if (per_vq_vectors) {
-			/* Best option: one for change interrupt, one per vq. */
-			nvectors = 1;
-			for (i = 0; i < nvqs; ++i)
-				if (callbacks[i])
-					++nvectors;
-		} else {
-			/* Second best: one for change, shared for all vqs. */
-			nvectors = 2;
-		}
-
-		err = vp_request_msix_vectors(vdev, nvectors, per_vq_vectors);
-		if (err)
-			goto error_request;
-	}
-
-	vp_dev->per_vq_vectors = per_vq_vectors;
-	allocated_vectors = vp_dev->msix_used_vectors;
-	for (i = 0; i < nvqs; ++i) {
-		if (!names[i]) {
-			vqs[i] = NULL;
-			continue;
-		} else if (!callbacks[i] || !vp_dev->msix_enabled)
-			msix_vec = VIRTIO_MSI_NO_VECTOR;
-		else if (vp_dev->per_vq_vectors)
-			msix_vec = allocated_vectors++;
-		else
-			msix_vec = VP_MSIX_VQ_VECTOR;
-		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i], msix_vec);
-		if (IS_ERR(vqs[i])) {
-			err = PTR_ERR(vqs[i]);
-			goto error_find;
-		}
-
-		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
-			continue;
-
-		/* allocate per-vq irq if available and necessary */
-		snprintf(vp_dev->msix_names[msix_vec],
-			 sizeof *vp_dev->msix_names,
-			 "%s-%s",
-			 dev_name(&vp_dev->vdev.dev), names[i]);
-		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
-				  vring_interrupt, 0,
-				  vp_dev->msix_names[msix_vec],
-				  vqs[i]);
-		if (err) {
-			vp_del_vq(vqs[i]);
-			goto error_find;
-		}
-	}
-	return 0;
-
-error_find:
-	vp_del_vqs(vdev);
-
-error_request:
-	return err;
-}
-
-/* the config->find_vqs() implementation */
-static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
-		       struct virtqueue *vqs[],
-		       vq_callback_t *callbacks[],
-		       const char *names[])
-{
-	int err;
-
-	/* Try MSI-X with one vector per queue. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names, true, true);
-	if (!err)
-		return 0;
-	/* Fallback: MSI-X with one vector for config, one shared for queues. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
-				 true, false);
-	if (!err)
-		return 0;
-	/* Finally fall back to regular interrupts. */
-	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
-				  false, false);
-}
-
-static const char *vp_bus_name(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-
-	return pci_name(vp_dev->pci_dev);
-}
-
-/* Setup the affinity for a virtqueue:
- * - force the affinity for per vq vector
- * - OR over all affinities for shared MSI
- * - ignore the affinity request if we're using INTX
- */
-static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
-{
-	struct virtio_device *vdev = vq->vdev;
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtio_pci_vq_info *info = vq->priv;
-	struct cpumask *mask;
-	unsigned int irq;
-
-	if (!vq->callback)
-		return -EINVAL;
-
-	if (vp_dev->msix_enabled) {
-		mask = vp_dev->msix_affinity_masks[info->msix_vector];
-		irq = vp_dev->msix_entries[info->msix_vector].vector;
-		if (cpu == -1)
-			irq_set_affinity_hint(irq, NULL);
-		else {
-			cpumask_set_cpu(cpu, mask);
-			irq_set_affinity_hint(irq, mask);
-		}
-	}
-	return 0;
-}
-
-static const struct virtio_config_ops virtio_pci_config_ops = {
-	.get8		= vp_get8,
-	.set8		= vp_set8,
-	.get16		= vp_get16,
-	.set16		= vp_set16,
-	.get32		= vp_get32,
-	.set32		= vp_set32,
-	.get64		= vp_get64,
-	.set64		= vp_set64,
-	.get_status	= vp_get_status,
-	.set_status	= vp_set_status,
-	.reset		= vp_reset,
-	.find_vqs	= vp_find_vqs,
-	.del_vqs	= vp_del_vqs,
-	.get_features	= vp_get_features,
-	.finalize_features = vp_finalize_features,
-	.bus_name	= vp_bus_name,
-	.set_vq_affinity = vp_set_vq_affinity,
-};
-
-static void virtio_pci_release_dev(struct device *_d)
-{
-	/*
-	 * No need for a release method as we allocate/free
-	 * all devices together with the pci devices.
-	 * Provide an empty one to avoid getting a warning from core.
-	 */
-}
-
-/* the PCI probing function */
-static int virtio_pci_probe(struct pci_dev *pci_dev,
-			    const struct pci_device_id *id)
-{
-	struct virtio_pci_device *vp_dev;
-	int err;
-
-	/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
-	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
-		return -ENODEV;
-
-	if (pci_dev->revision != VIRTIO_PCI_ABI_VERSION) {
-		printk(KERN_ERR "virtio_pci: expected ABI version %d, got %d\n",
-		       VIRTIO_PCI_ABI_VERSION, pci_dev->revision);
-		return -ENODEV;
-	}
-
-	/* allocate our structure and fill it out */
-	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
-	if (vp_dev == NULL)
-		return -ENOMEM;
-
-	vp_dev->vdev.dev.parent = &pci_dev->dev;
-	vp_dev->vdev.dev.release = virtio_pci_release_dev;
-	vp_dev->vdev.config = &virtio_pci_config_ops;
-	vp_dev->pci_dev = pci_dev;
-	INIT_LIST_HEAD(&vp_dev->virtqueues);
-	spin_lock_init(&vp_dev->lock);
-
-	/* Disable MSI/MSIX to bring device to a known good state. */
-	pci_msi_off(pci_dev);
-
-	/* enable the device */
-	err = pci_enable_device(pci_dev);
-	if (err)
-		goto out;
-
-	err = pci_request_regions(pci_dev, "virtio-pci");
-	if (err)
-		goto out_enable_device;
-
-	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
-	if (vp_dev->ioaddr == NULL) {
-		err = -ENOMEM;
-		goto out_req_regions;
-	}
-
-	pci_set_drvdata(pci_dev, vp_dev);
-	pci_set_master(pci_dev);
-
-	/* we use the subsystem vendor/device id as the virtio vendor/device
-	 * id.  this allows us to use the same PCI vendor/device id for all
-	 * virtio devices and to identify the particular virtio driver by
-	 * the subsystem ids */
-	vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
-	vp_dev->vdev.id.device = pci_dev->subsystem_device;
-
-	/* finally register the virtio device */
-	err = register_virtio_device(&vp_dev->vdev);
-	if (err)
-		goto out_set_drvdata;
-
-	return 0;
-
-out_set_drvdata:
-	pci_set_drvdata(pci_dev, NULL);
-	pci_iounmap(pci_dev, vp_dev->ioaddr);
-out_req_regions:
-	pci_release_regions(pci_dev);
-out_enable_device:
-	pci_disable_device(pci_dev);
-out:
-	kfree(vp_dev);
-	return err;
-}
-
-static void virtio_pci_remove(struct pci_dev *pci_dev)
-{
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-
-	unregister_virtio_device(&vp_dev->vdev);
-
-	vp_del_vqs(&vp_dev->vdev);
-	pci_set_drvdata(pci_dev, NULL);
-	pci_iounmap(pci_dev, vp_dev->ioaddr);
-	pci_release_regions(pci_dev);
-	pci_disable_device(pci_dev);
-	kfree(vp_dev);
-}
-
-#ifdef CONFIG_PM
-static int virtio_pci_freeze(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = 0;
-	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
-	if (drv && drv->freeze)
-		ret = drv->freeze(&vp_dev->vdev);
-
-	if (!ret)
-		pci_disable_device(pci_dev);
-	return ret;
-}
-
-static int virtio_pci_restore(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = pci_enable_device(pci_dev);
-	if (ret)
-		return ret;
-
-	pci_set_master(pci_dev);
-	vp_finalize_features(&vp_dev->vdev);
-
-	if (drv && drv->restore)
-		ret = drv->restore(&vp_dev->vdev);
-
-	/* Finally, tell the device we're all set */
-	if (!ret)
-		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
-
-	return ret;
-}
-
-static const struct dev_pm_ops virtio_pci_pm_ops = {
-	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
-};
-#endif
-
-static struct pci_driver virtio_pci_driver = {
-	.name		= "virtio-pci",
-	.id_table	= virtio_pci_id_table,
-	.probe		= virtio_pci_probe,
-	.remove		= virtio_pci_remove,
-#ifdef CONFIG_PM
-	.driver.pm	= &virtio_pci_pm_ops,
-#endif
-};
-
-module_pci_driver(virtio_pci_driver);
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
new file mode 100644
index 0000000..77baa7c
--- /dev/null
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -0,0 +1,858 @@
+/*
+ * Virtio PCI driver (legacy mode)
+ *
+ * This module allows virtio devices to be used over a virtual PCI device.
+ * This can be used with QEMU based VMMs like KVM or Xen.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors:
+ *  Anthony Liguori  <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+#include <linux/virtio_pci.h>
+#include <linux/highmem.h>
+#include <linux/spinlock.h>
+
+MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
+MODULE_DESCRIPTION("virtio-pci-legacy");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+/* Our device structure */
+struct virtio_pci_device
+{
+	struct virtio_device vdev;
+	struct pci_dev *pci_dev;
+
+	/* the IO mapping for the PCI config space */
+	void __iomem *ioaddr;
+
+	/* a list of queues so we can dispatch IRQs */
+	spinlock_t lock;
+	struct list_head virtqueues;
+
+	/* MSI-X support */
+	int msix_enabled;
+	int intx_enabled;
+	struct msix_entry *msix_entries;
+	cpumask_var_t *msix_affinity_masks;
+	/* Name strings for interrupts. This size should be enough,
+	 * and I'm too lazy to allocate each name separately. */
+	char (*msix_names)[256];
+	/* Number of available vectors */
+	unsigned msix_vectors;
+	/* Vectors allocated, excluding per-vq vectors if any */
+	unsigned msix_used_vectors;
+
+	/* Status saved during hibernate/restore */
+	u8 saved_status;
+
+	/* Whether we have a vector per vq */
+	bool per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues.  Thus, we need at least 2 vectors for MSI-X. */
+enum {
+	VP_MSIX_CONFIG_VECTOR = 0,
+	VP_MSIX_VQ_VECTOR = 1,
+};
+
+struct virtio_pci_vq_info
+{
+	/* the actual virtqueue */
+	struct virtqueue *vq;
+
+	/* the number of entries in the queue */
+	int num;
+
+	/* the virtual address of the ring queue */
+	void *queue;
+
+	/* the list node for the virtqueues list */
+	struct list_head node;
+
+	/* MSI-X vector (or none) */
+	unsigned msix_vector;
+};
+
+/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
+static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
+	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
+	{ 0 }
+};
+
+MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
+
+/* Convert a generic virtio device to our structure */
+static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
+{
+	return container_of(vdev, struct virtio_pci_device, vdev);
+}
+
+/* virtio config->get_features() implementation */
+static u64 vp_get_features(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	/* We only support 32 feature bits. */
+	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
+}
+
+/* virtio config->finalize_features() implementation */
+static void vp_finalize_features(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	/* Give virtio_ring a chance to accept features. */
+	vring_transport_features(vdev);
+
+	/* We only support 32 feature bits. */
+	iowrite32(vdev->features, vp_dev->ioaddr + VIRTIO_PCI_GUEST_FEATURES);
+}
+
+/* Device config access: we use guest endian, as per spec. */
+static void vp_get(struct virtio_device *vdev, unsigned offset,
+		   void *buf, unsigned len)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	void __iomem *ioaddr = vp_dev->ioaddr +
+				VIRTIO_PCI_CONFIG(vp_dev) + offset;
+	u8 *ptr = buf;
+	int i;
+
+	for (i = 0; i < len; i++)
+		ptr[i] = ioread8(ioaddr + i);
+}
+
+#define VP_GETx(bits)							\
+static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
+{									\
+	u##bits v;							\
+	vp_get(vdev, offset, &v, sizeof(v));				\
+	return v;							\
+}
+
+VP_GETx(8)
+VP_GETx(16)
+VP_GETx(32)
+VP_GETx(64)
+
+static void vp_set(struct virtio_device *vdev, unsigned offset,
+		   const void *buf, unsigned len)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	void __iomem *ioaddr = vp_dev->ioaddr +
+				VIRTIO_PCI_CONFIG(vp_dev) + offset;
+	const u8 *ptr = buf;
+	int i;
+
+	for (i = 0; i < len; i++)
+		iowrite8(ptr[i], ioaddr + i);
+}
+
+#define VP_SETx(bits)							\
+static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
+			 u##bits v)					\
+{									\
+	vp_set(vdev, offset, &v, sizeof(v));				\
+}
+
+VP_SETx(8)
+VP_SETx(16)
+VP_SETx(32)
+VP_SETx(64)
+
+/* config->{get,set}_status() implementations */
+static u8 vp_get_status(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	return ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+}
+
+static void vp_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	/* We should never be setting status to 0. */
+	BUG_ON(status == 0);
+	iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+}
+
+/* wait for pending irq handlers */
+static void vp_synchronize_vectors(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled)
+		synchronize_irq(vp_dev->pci_dev->irq);
+
+	for (i = 0; i < vp_dev->msix_vectors; ++i)
+		synchronize_irq(vp_dev->msix_entries[i].vector);
+}
+
+static void vp_reset(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	/* 0 status means a reset. */
+	iowrite8(0, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+	/* Flush out the status write, and flush in device writes,
+	 * including MSI-X interrupts, if any. */
+	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+	/* Flush pending VQ/configuration callbacks. */
+	vp_synchronize_vectors(vdev);
+}
+
+/* the notify function used when creating a virtqueue */
+static void vp_notify(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+
+	/* we write the queue's selector into the notification register to
+	 * signal the other end */
+	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+}
+
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_driver *drv;
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	if (drv && drv->config_changed)
+		drv->config_changed(&vp_dev->vdev);
+	return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	return ret;
+}
+
+/* A small wrapper to also acknowledge the interrupt when it's handled.
+ * I really need an EIO hook for the vring so I can ack the interrupt once we
+ * know that we'll be handling the IRQ but before we invoke the callback since
+ * the callback may notify the host which results in the host attempting to
+ * raise an interrupt that we would then mask once we acknowledged the
+ * interrupt. */
+static irqreturn_t vp_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	u8 isr;
+
+	/* reading the ISR has the effect of also clearing it so it's very
+	 * important to save off the value. */
+	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+
+	/* It's definitely not us if the ISR was not high */
+	if (!isr)
+		return IRQ_NONE;
+
+	/* Configuration change?  Tell driver if it wants to know. */
+	if (isr & VIRTIO_PCI_ISR_CONFIG)
+		vp_config_changed(irq, opaque);
+
+	return vp_vring_interrupt(irq, opaque);
+}
+
+static void vp_free_vectors(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled) {
+		free_irq(vp_dev->pci_dev->irq, vp_dev);
+		vp_dev->intx_enabled = 0;
+	}
+
+	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
+		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+	for (i = 0; i < vp_dev->msix_vectors; i++)
+		if (vp_dev->msix_affinity_masks[i])
+			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
+
+	if (vp_dev->msix_enabled) {
+		/* Disable the vector used for configuration */
+		iowrite16(VIRTIO_MSI_NO_VECTOR,
+			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+		/* Flush the write out to device */
+		ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+		pci_disable_msix(vp_dev->pci_dev);
+		vp_dev->msix_enabled = 0;
+		vp_dev->msix_vectors = 0;
+	}
+
+	vp_dev->msix_used_vectors = 0;
+	kfree(vp_dev->msix_names);
+	vp_dev->msix_names = NULL;
+	kfree(vp_dev->msix_entries);
+	vp_dev->msix_entries = NULL;
+	kfree(vp_dev->msix_affinity_masks);
+	vp_dev->msix_affinity_masks = NULL;
+}
+
+static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
+				   bool per_vq_vectors)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	const char *name = dev_name(&vp_dev->vdev.dev);
+	unsigned i, v;
+	int err = -ENOMEM;
+
+	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
+				       GFP_KERNEL);
+	if (!vp_dev->msix_entries)
+		goto error;
+	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
+				     GFP_KERNEL);
+	if (!vp_dev->msix_names)
+		goto error;
+	vp_dev->msix_affinity_masks
+		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
+			  GFP_KERNEL);
+	if (!vp_dev->msix_affinity_masks)
+		goto error;
+	for (i = 0; i < nvectors; ++i)
+		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
+					GFP_KERNEL))
+			goto error;
+
+	for (i = 0; i < nvectors; ++i)
+		vp_dev->msix_entries[i].entry = i;
+
+	/* pci_enable_msix returns positive if we can't get this many. */
+	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
+	if (err > 0)
+		err = -ENOSPC;
+	if (err)
+		goto error;
+	vp_dev->msix_vectors = nvectors;
+	vp_dev->msix_enabled = 1;
+
+	/* Set the vector used for configuration */
+	v = vp_dev->msix_used_vectors;
+	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
+		 "%s-config", name);
+	err = request_irq(vp_dev->msix_entries[v].vector,
+			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  vp_dev);
+	if (err)
+		goto error;
+	++vp_dev->msix_used_vectors;
+
+	iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+	/* Verify we had enough resources to assign the vector */
+	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+	if (v == VIRTIO_MSI_NO_VECTOR) {
+		err = -EBUSY;
+		goto error;
+	}
+
+	if (!per_vq_vectors) {
+		/* Shared vector for all VQs */
+		v = vp_dev->msix_used_vectors;
+		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
+			 "%s-virtqueues", name);
+		err = request_irq(vp_dev->msix_entries[v].vector,
+				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
+				  vp_dev);
+		if (err)
+			goto error;
+		++vp_dev->msix_used_vectors;
+	}
+	return 0;
+error:
+	vp_free_vectors(vdev);
+	return err;
+}
+
+static int vp_request_intx(struct virtio_device *vdev)
+{
+	int err;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
+			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
+	if (!err)
+		vp_dev->intx_enabled = 1;
+	return err;
+}
+
+static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
+				  void (*callback)(struct virtqueue *vq),
+				  const char *name,
+				  u16 msix_vec)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info;
+	struct virtqueue *vq;
+	unsigned long flags, size;
+	u16 num;
+	int err;
+
+	/* Select the queue we're interested in */
+	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+
+	/* Check if queue is either not available or already active. */
+	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);
+	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
+		return ERR_PTR(-ENOENT);
+
+	/* allocate and fill out our structure that represents an active
+	 * queue */
+	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	info->num = num;
+	info->msix_vector = msix_vec;
+
+	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
+	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
+	if (info->queue == NULL) {
+		err = -ENOMEM;
+		goto out_info;
+	}
+
+	/* activate the queue */
+	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
+		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+
+	/* create the vring */
+	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
+				 true, info->queue, vp_notify, callback, name);
+	if (!vq) {
+		err = -ENOMEM;
+		goto out_activate_queue;
+	}
+
+	vq->priv = info;
+	info->vq = vq;
+
+	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
+		iowrite16(msix_vec, vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+		msix_vec = ioread16(vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+		if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
+			err = -EBUSY;
+			goto out_assign;
+		}
+	}
+
+	if (callback) {
+		spin_lock_irqsave(&vp_dev->lock, flags);
+		list_add(&info->node, &vp_dev->virtqueues);
+		spin_unlock_irqrestore(&vp_dev->lock, flags);
+	} else {
+		INIT_LIST_HEAD(&info->node);
+	}
+
+	return vq;
+
+out_assign:
+	vring_del_virtqueue(vq);
+out_activate_queue:
+	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+	free_pages_exact(info->queue, size);
+out_info:
+	kfree(info);
+	return ERR_PTR(err);
+}
+
+static void vp_del_vq(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+	unsigned long flags, size;
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_del(&info->node);
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+
+	if (vp_dev->msix_enabled) {
+		iowrite16(VIRTIO_MSI_NO_VECTOR,
+			  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+		/* Flush the write out to device */
+		ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+	}
+
+	vring_del_virtqueue(vq);
+
+	/* Select and deactivate the queue */
+	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+
+	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
+	free_pages_exact(info->queue, size);
+	kfree(info);
+}
+
+/* the config->del_vqs() implementation */
+static void vp_del_vqs(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtqueue *vq, *n;
+	struct virtio_pci_vq_info *info;
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+		info = vq->priv;
+		if (vp_dev->per_vq_vectors &&
+			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
+			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
+				 vq);
+		vp_del_vq(vq);
+	}
+	vp_dev->per_vq_vectors = false;
+
+	vp_free_vectors(vdev);
+}
+
+static int vp_try_to_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+			      struct virtqueue *vqs[],
+			      vq_callback_t *callbacks[],
+			      const char *names[],
+			      bool use_msix,
+			      bool per_vq_vectors)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	u16 msix_vec;
+	int i, err, nvectors, allocated_vectors;
+
+	if (!use_msix) {
+		/* Old style: one normal interrupt for change and all vqs. */
+		err = vp_request_intx(vdev);
+		if (err)
+			goto error_request;
+	} else {
+		if (per_vq_vectors) {
+			/* Best option: one for change interrupt, one per vq. */
+			nvectors = 1;
+			for (i = 0; i < nvqs; ++i)
+				if (callbacks[i])
+					++nvectors;
+		} else {
+			/* Second best: one for change, shared for all vqs. */
+			nvectors = 2;
+		}
+
+		err = vp_request_msix_vectors(vdev, nvectors, per_vq_vectors);
+		if (err)
+			goto error_request;
+	}
+
+	vp_dev->per_vq_vectors = per_vq_vectors;
+	allocated_vectors = vp_dev->msix_used_vectors;
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		} else if (!callbacks[i] || !vp_dev->msix_enabled)
+			msix_vec = VIRTIO_MSI_NO_VECTOR;
+		else if (vp_dev->per_vq_vectors)
+			msix_vec = allocated_vectors++;
+		else
+			msix_vec = VP_MSIX_VQ_VECTOR;
+		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i], msix_vec);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto error_find;
+		}
+
+		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
+			continue;
+
+		/* allocate per-vq irq if available and necessary */
+		snprintf(vp_dev->msix_names[msix_vec],
+			 sizeof *vp_dev->msix_names,
+			 "%s-%s",
+			 dev_name(&vp_dev->vdev.dev), names[i]);
+		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
+				  vring_interrupt, 0,
+				  vp_dev->msix_names[msix_vec],
+				  vqs[i]);
+		if (err) {
+			vp_del_vq(vqs[i]);
+			goto error_find;
+		}
+	}
+	return 0;
+
+error_find:
+	vp_del_vqs(vdev);
+
+error_request:
+	return err;
+}
+
+/* the config->find_vqs() implementation */
+static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+		       struct virtqueue *vqs[],
+		       vq_callback_t *callbacks[],
+		       const char *names[])
+{
+	int err;
+
+	/* Try MSI-X with one vector per queue. */
+	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names, true, true);
+	if (!err)
+		return 0;
+	/* Fallback: MSI-X with one vector for config, one shared for queues. */
+	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				 true, false);
+	if (!err)
+		return 0;
+	/* Finally fall back to regular interrupts. */
+	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				  false, false);
+}
+
+static const char *vp_bus_name(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	return pci_name(vp_dev->pci_dev);
+}
+
+/* Setup the affinity for a virtqueue:
+ * - force the affinity for per vq vector
+ * - OR over all affinities for shared MSI
+ * - ignore the affinity request if we're using INTX
+ */
+static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
+{
+	struct virtio_device *vdev = vq->vdev;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+	struct cpumask *mask;
+	unsigned int irq;
+
+	if (!vq->callback)
+		return -EINVAL;
+
+	if (vp_dev->msix_enabled) {
+		mask = vp_dev->msix_affinity_masks[info->msix_vector];
+		irq = vp_dev->msix_entries[info->msix_vector].vector;
+		if (cpu == -1)
+			irq_set_affinity_hint(irq, NULL);
+		else {
+			cpumask_set_cpu(cpu, mask);
+			irq_set_affinity_hint(irq, mask);
+		}
+	}
+	return 0;
+}
+
+static const struct virtio_config_ops virtio_pci_config_ops = {
+	.get8		= vp_get8,
+	.set8		= vp_set8,
+	.get16		= vp_get16,
+	.set16		= vp_set16,
+	.get32		= vp_get32,
+	.set32		= vp_set32,
+	.get64		= vp_get64,
+	.set64		= vp_set64,
+	.get_status	= vp_get_status,
+	.set_status	= vp_set_status,
+	.reset		= vp_reset,
+	.find_vqs	= vp_find_vqs,
+	.del_vqs	= vp_del_vqs,
+	.get_features	= vp_get_features,
+	.finalize_features = vp_finalize_features,
+	.bus_name	= vp_bus_name,
+	.set_vq_affinity = vp_set_vq_affinity,
+};
+
+static void virtio_pci_release_dev(struct device *_d)
+{
+	/*
+	 * No need for a release method as we allocate/free
+	 * all devices together with the pci devices.
+	 * Provide an empty one to avoid getting a warning from core.
+	 */
+}
+
+/* the PCI probing function */
+static int virtio_pci_probe(struct pci_dev *pci_dev,
+			    const struct pci_device_id *id)
+{
+	struct virtio_pci_device *vp_dev;
+	int err;
+
+	/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
+	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
+		return -ENODEV;
+
+	if (pci_dev->revision != VIRTIO_PCI_ABI_VERSION) {
+		printk(KERN_ERR "virtio_pci_legacy: expected ABI version %d, got %d\n",
+		       VIRTIO_PCI_ABI_VERSION, pci_dev->revision);
+		return -ENODEV;
+	}
+
+	/* allocate our structure and fill it out */
+	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
+	if (vp_dev == NULL)
+		return -ENOMEM;
+
+	vp_dev->vdev.dev.parent = &pci_dev->dev;
+	vp_dev->vdev.dev.release = virtio_pci_release_dev;
+	vp_dev->vdev.config = &virtio_pci_config_ops;
+	vp_dev->pci_dev = pci_dev;
+	INIT_LIST_HEAD(&vp_dev->virtqueues);
+	spin_lock_init(&vp_dev->lock);
+
+	/* Disable MSI/MSIX to bring device to a known good state. */
+	pci_msi_off(pci_dev);
+
+	/* enable the device */
+	err = pci_enable_device(pci_dev);
+	if (err)
+		goto out;
+
+	err = pci_request_regions(pci_dev, "virtio-pci-legacy");
+	if (err)
+		goto out_enable_device;
+
+	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
+	if (vp_dev->ioaddr == NULL) {
+		err = -ENOMEM;
+		goto out_req_regions;
+	}
+
+	pci_set_drvdata(pci_dev, vp_dev);
+	pci_set_master(pci_dev);
+
+	/* we use the subsystem vendor/device id as the virtio vendor/device
+	 * id.  this allows us to use the same PCI vendor/device id for all
+	 * virtio devices and to identify the particular virtio driver by
+	 * the subsystem ids */
+	vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
+	vp_dev->vdev.id.device = pci_dev->subsystem_device;
+
+	/* finally register the virtio device */
+	err = register_virtio_device(&vp_dev->vdev);
+	if (err)
+		goto out_set_drvdata;
+
+	return 0;
+
+out_set_drvdata:
+	pci_set_drvdata(pci_dev, NULL);
+	pci_iounmap(pci_dev, vp_dev->ioaddr);
+out_req_regions:
+	pci_release_regions(pci_dev);
+out_enable_device:
+	pci_disable_device(pci_dev);
+out:
+	kfree(vp_dev);
+	return err;
+}
+
+static void virtio_pci_remove(struct pci_dev *pci_dev)
+{
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+
+	unregister_virtio_device(&vp_dev->vdev);
+
+	vp_del_vqs(&vp_dev->vdev);
+	pci_set_drvdata(pci_dev, NULL);
+	pci_iounmap(pci_dev, vp_dev->ioaddr);
+	pci_release_regions(pci_dev);
+	pci_disable_device(pci_dev);
+	kfree(vp_dev);
+}
+
+#ifdef CONFIG_PM
+static int virtio_pci_freeze(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = 0;
+	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
+	if (drv && drv->freeze)
+		ret = drv->freeze(&vp_dev->vdev);
+
+	if (!ret)
+		pci_disable_device(pci_dev);
+	return ret;
+}
+
+static int virtio_pci_restore(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = pci_enable_device(pci_dev);
+	if (ret)
+		return ret;
+
+	pci_set_master(pci_dev);
+	vp_finalize_features(&vp_dev->vdev);
+
+	if (drv && drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	/* Finally, tell the device we're all set */
+	if (!ret)
+		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
+
+	return ret;
+}
+
+static const struct dev_pm_ops virtio_pci_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
+};
+#endif
+
+static struct pci_driver virtio_pci_driver_legacy = {
+	.name		= "virtio-pci-legacy",
+	.id_table	= virtio_pci_id_table,
+	.probe		= virtio_pci_probe,
+	.remove		= virtio_pci_remove,
+#ifdef CONFIG_PM
+	.driver.pm	= &virtio_pci_pm_ops,
+#endif
+};
+
+module_pci_driver(virtio_pci_driver_legacy);
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index b0e7c91..9eb6373 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -42,56 +42,75 @@
 #include <linux/virtio_config.h>
 
 /* A 32-bit r/o bitmask of the features supported by the host */
-#define VIRTIO_PCI_HOST_FEATURES	0
+#define VIRTIO_PCI_LEGACY_HOST_FEATURES		0
 
 /* A 32-bit r/w bitmask of features activated by the guest */
-#define VIRTIO_PCI_GUEST_FEATURES	4
+#define VIRTIO_PCI_LEGACY_GUEST_FEATURES	4
 
 /* A 32-bit r/w PFN for the currently selected queue */
-#define VIRTIO_PCI_QUEUE_PFN		8
+#define VIRTIO_PCI_LEGACY_QUEUE_PFN		8
 
 /* A 16-bit r/o queue size for the currently selected queue */
-#define VIRTIO_PCI_QUEUE_NUM		12
+#define VIRTIO_PCI_LEGACY_QUEUE_NUM		12
 
 /* A 16-bit r/w queue selector */
-#define VIRTIO_PCI_QUEUE_SEL		14
+#define VIRTIO_PCI_LEGACY_QUEUE_SEL		14
 
 /* A 16-bit r/w queue notifier */
-#define VIRTIO_PCI_QUEUE_NOTIFY		16
+#define VIRTIO_PCI_LEGACY_QUEUE_NOTIFY		16
 
 /* An 8-bit device status register.  */
-#define VIRTIO_PCI_STATUS		18
+#define VIRTIO_PCI_LEGACY_STATUS		18
 
 /* An 8-bit r/o interrupt status register.  Reading the value will return the
  * current contents of the ISR and will also clear it.  This is effectively
  * a read-and-acknowledge. */
-#define VIRTIO_PCI_ISR			19
-
-/* The bit of the ISR which indicates a device configuration change. */
-#define VIRTIO_PCI_ISR_CONFIG		0x2
+#define VIRTIO_PCI_LEGACY_ISR			19
 
 /* MSI-X registers: only enabled if MSI-X is enabled. */
 /* A 16-bit vector for configuration changes. */
-#define VIRTIO_MSI_CONFIG_VECTOR        20
+#define VIRTIO_MSI_LEGACY_CONFIG_VECTOR        20
 /* A 16-bit vector for selected queue notifications. */
-#define VIRTIO_MSI_QUEUE_VECTOR         22
-/* Vector value used to disable MSI for queue */
-#define VIRTIO_MSI_NO_VECTOR            0xffff
+#define VIRTIO_MSI_LEGACY_QUEUE_VECTOR         22
 
 /* The remaining space is defined by each driver as the per-driver
  * configuration space */
-#define VIRTIO_PCI_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
-
-/* Virtio ABI version, this must match exactly */
-#define VIRTIO_PCI_ABI_VERSION		0
+#define VIRTIO_PCI_LEGACY_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
 
 /* How many bits to shift physical queue address written to QUEUE_PFN.
  * 12 is historical, and due to x86 page size. */
-#define VIRTIO_PCI_QUEUE_ADDR_SHIFT	12
+#define VIRTIO_PCI_LEGACY_QUEUE_ADDR_SHIFT	12
 
 /* The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. */
-#define VIRTIO_PCI_VRING_ALIGN		4096
+#define VIRTIO_PCI_LEGACY_VRING_ALIGN		4096
+
+#ifndef VIRTIO_PCI_NO_LEGACY
+/* Don't break compile of old userspace code.  These will go away. */
+#warning "Please support virtio_pci non-legacy mode!"
+#define VIRTIO_PCI_HOST_FEATURES VIRTIO_PCI_LEGACY_HOST_FEATURES
+#define VIRTIO_PCI_GUEST_FEATURES VIRTIO_PCI_LEGACY_GUEST_FEATURES
+#define VIRTIO_PCI_QUEUE_PFN VIRTIO_PCI_LEGACY_QUEUE_PFN
+#define VIRTIO_PCI_QUEUE_NUM VIRTIO_PCI_LEGACY_QUEUE_NUM
+#define VIRTIO_PCI_QUEUE_SEL VIRTIO_PCI_LEGACY_QUEUE_SEL
+#define VIRTIO_PCI_QUEUE_NOTIFY VIRTIO_PCI_LEGACY_QUEUE_NOTIFY
+#define VIRTIO_PCI_STATUS VIRTIO_PCI_LEGACY_STATUS
+#define VIRTIO_PCI_ISR VIRTIO_PCI_LEGACY_ISR
+#define VIRTIO_MSI_CONFIG_VECTOR VIRTIO_MSI_LEGACY_CONFIG_VECTOR
+#define VIRTIO_MSI_QUEUE_VECTOR VIRTIO_MSI_LEGACY_QUEUE_VECTOR
+#define VIRTIO_PCI_CONFIG(dev) VIRTIO_PCI_LEGACY_CONFIG(dev)
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT VIRTIO_PCI_LEGACY_QUEUE_ADDR_SHIFT
+#define VIRTIO_PCI_VRING_ALIGN VIRTIO_PCI_LEGACY_VRING_ALIGN
+#endif /* !VIRTIO_PCI_NO_LEGACY */
+
+/* Virtio ABI version, this must match exactly */
+#define VIRTIO_PCI_ABI_VERSION		0
+
+/* Vector value used to disable MSI for queue */
+#define VIRTIO_MSI_NO_VECTOR            0xffff
+
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG		0x2
 
 /* IDs for different capabilities.  Must all exist. */
 /* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 10/22] virtio_pci: use _LEGACY_ defines in virtio_pci_legacy.c
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (8 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 09/22] virtio_pci: move old defines to legacy, introduce new structure Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 11/22] virtio_pci: don't use the legacy driver if we find the new PCI capabilities Rusty Russell
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

include/linux/virtio_pci.h turns off the compat defines, and we use it
rather than including include/uapi/linux/virtio_pci.h directly.

This makes it obvious if we use legacy defines elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci_legacy.c |   62 +++++++++++++++++++-----------------
 include/linux/virtio_pci.h         |    7 ++++
 include/uapi/linux/virtio_pci.h    |    6 ++--
 3 files changed, 43 insertions(+), 32 deletions(-)
 create mode 100644 include/linux/virtio_pci.h

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 77baa7c..c75eb39 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -110,7 +110,7 @@ static u64 vp_get_features(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
 	/* We only support 32 feature bits. */
-	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES);
+	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_HOST_FEATURES);
 }
 
 /* virtio config->finalize_features() implementation */
@@ -122,7 +122,8 @@ static void vp_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	/* We only support 32 feature bits. */
-	iowrite32(vdev->features, vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
+	iowrite32(vdev->features,
+		  vp_dev->ioaddr + VIRTIO_PCI_LEGACY_GUEST_FEATURES);
 }
 
 /* Device config access: we use guest endian, as per spec. */
@@ -131,7 +132,7 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	void __iomem *ioaddr = vp_dev->ioaddr +
-				VIRTIO_PCI_CONFIG(vp_dev) + offset;
+				VIRTIO_PCI_LEGACY_CONFIG(vp_dev) + offset;
 	u8 *ptr = buf;
 	int i;
 
@@ -157,7 +158,7 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	void __iomem *ioaddr = vp_dev->ioaddr +
-				VIRTIO_PCI_CONFIG(vp_dev) + offset;
+				VIRTIO_PCI_LEGACY_CONFIG(vp_dev) + offset;
 	const u8 *ptr = buf;
 	int i;
 
@@ -181,7 +182,7 @@ VP_SETx(64)
 static u8 vp_get_status(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	return ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+	return ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
 }
 
 static void vp_set_status(struct virtio_device *vdev, u8 status)
@@ -189,7 +190,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	/* We should never be setting status to 0. */
 	BUG_ON(status == 0);
-	iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+	iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
 }
 
 /* wait for pending irq handlers */
@@ -209,10 +210,10 @@ static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	/* 0 status means a reset. */
-	iowrite8(0, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+	iowrite8(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
 	/* Flush out the status write, and flush in device writes,
 	 * including MSI-X interrupts, if any. */
-	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
+	ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
@@ -224,7 +225,7 @@ static void vp_notify(struct virtqueue *vq)
 
 	/* we write the queue's selector into the notification register to
 	 * signal the other end */
-	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_NOTIFY);
 }
 
 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -271,7 +272,7 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 
 	/* reading the ISR has the effect of also clearing it so it's very
 	 * important to save off the value. */
-	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_ISR);
 
 	/* It's definitely not us if the ISR was not high */
 	if (!isr)
@@ -304,9 +305,9 @@ static void vp_free_vectors(struct virtio_device *vdev)
 	if (vp_dev->msix_enabled) {
 		/* Disable the vector used for configuration */
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+			  vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 		/* Flush the write out to device */
-		ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+		ioread16(vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 
 		pci_disable_msix(vp_dev->pci_dev);
 		vp_dev->msix_enabled = 0;
@@ -371,9 +372,9 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		goto error;
 	++vp_dev->msix_used_vectors;
 
-	iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+	iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 	/* Verify we had enough resources to assign the vector */
-	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 	if (v == VIRTIO_MSI_NO_VECTOR) {
 		err = -EBUSY;
 		goto error;
@@ -422,11 +423,11 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	int err;
 
 	/* Select the queue we're interested in */
-	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_SEL);
 
 	/* Check if queue is either not available or already active. */
-	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);
-	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
+	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_NUM);
+	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN))
 		return ERR_PTR(-ENOENT);
 
 	/* allocate and fill out our structure that represents an active
@@ -438,7 +439,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	info->num = num;
 	info->msix_vector = msix_vec;
 
-	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
+	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_LEGACY_VRING_ALIGN));
 	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
 	if (info->queue == NULL) {
 		err = -ENOMEM;
@@ -446,11 +447,12 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	}
 
 	/* activate the queue */
-	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
-		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+	iowrite32(virt_to_phys(info->queue)>>VIRTIO_PCI_LEGACY_QUEUE_ADDR_SHIFT,
+		  vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 
 	/* create the vring */
-	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
+	vq = vring_new_virtqueue(index, info->num,
+				 VIRTIO_PCI_LEGACY_VRING_ALIGN, vdev,
 				 true, info->queue, vp_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
@@ -461,8 +463,10 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	info->vq = vq;
 
 	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
-		iowrite16(msix_vec, vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
-		msix_vec = ioread16(vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+		iowrite16(msix_vec,
+			  vp_dev->ioaddr + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
+		msix_vec = ioread16(vp_dev->ioaddr
+				    + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
 		if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
 			err = -EBUSY;
 			goto out_assign;
@@ -482,7 +486,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 out_assign:
 	vring_del_virtqueue(vq);
 out_activate_queue:
-	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 	free_pages_exact(info->queue, size);
 out_info:
 	kfree(info);
@@ -499,21 +503,21 @@ static void vp_del_vq(struct virtqueue *vq)
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_SEL);
 
 	if (vp_dev->msix_enabled) {
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+			  vp_dev->ioaddr + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
 		/* Flush the write out to device */
-		ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+		ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_ISR);
 	}
 
 	vring_del_virtqueue(vq);
 
 	/* Select and deactivate the queue */
-	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 
-	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
+	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_LEGACY_VRING_ALIGN));
 	free_pages_exact(info->queue, size);
 	kfree(info);
 }
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
new file mode 100644
index 0000000..af5b9ab
--- /dev/null
+++ b/include/linux/virtio_pci.h
@@ -0,0 +1,7 @@
+#ifndef _LINUX_VIRTIO_PCI_H
+#define _LINUX_VIRTIO_PCI_H
+
+#define VIRTIO_PCI_NO_LEGACY
+#include <uapi/linux/virtio_pci.h>
+
+#endif /* _LINUX_VIRTIO_PCI_H */
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 9eb6373..0d12828 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -36,8 +36,8 @@
  * SUCH DAMAGE.
  */
 
-#ifndef _LINUX_VIRTIO_PCI_H
-#define _LINUX_VIRTIO_PCI_H
+#ifndef _UAPI_LINUX_VIRTIO_PCI_H
+#define _UAPI_LINUX_VIRTIO_PCI_H
 
 #include <linux/virtio_config.h>
 
@@ -152,4 +152,4 @@ struct virtio_pci_common_cfg {
 	__le16 queue_msix_vector;/* read-write */
 	__le64 queue_address;	/* read-write: 0xFFFFFFFFFFFFFFFF == DNE. */
 };
-#endif
+#endif /* _UAPI_LINUX_VIRTIO_PCI_H */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 11/22] virtio_pci: don't use the legacy driver if we find the new PCI capabilities.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (9 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 10/22] virtio_pci: use _LEGACY_ defines in virtio_pci_legacy.c Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 12/22] virtio_pci: allow duplicate capabilities Rusty Russell
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

With a module option to override.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci_legacy.c |   20 +++++++++++++++++++-
 include/linux/virtio_pci.h         |   16 ++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index c75eb39..501fa79 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -26,6 +26,10 @@
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
 
+static bool force_nonlegacy;
+module_param(force_nonlegacy, bool, 0644);
+MODULE_PARM_DESC(force_nonlegacy, "Take over non-legacy virtio devices too");
+
 MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
 MODULE_DESCRIPTION("virtio-pci-legacy");
 MODULE_LICENSE("GPL");
@@ -711,7 +715,7 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 			    const struct pci_device_id *id)
 {
 	struct virtio_pci_device *vp_dev;
-	int err;
+	int err, cap;
 
 	/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
 	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
@@ -723,6 +727,20 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 		return -ENODEV;
 	}
 
+	/* We leave modern virtio-pci for the modern driver. */
+	cap = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG);
+	if (cap) {
+		if (force_nonlegacy)
+			dev_info(&pci_dev->dev,
+				 "virtio_pci_legacy: forcing legacy mode!\n");
+		else {
+			dev_info(&pci_dev->dev,
+				 "virtio_pci_legacy: leaving to"
+				 " non-legacy driver\n");
+			return -ENODEV;
+		}
+	}
+
 	/* allocate our structure and fill it out */
 	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
 	if (vp_dev == NULL)
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
index af5b9ab..2714160 100644
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -4,4 +4,20 @@
 #define VIRTIO_PCI_NO_LEGACY
 #include <uapi/linux/virtio_pci.h>
 
+/* Returns offset of the capability, or 0. */
+static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type)
+{
+	int pos;
+
+	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+	     pos > 0;
+	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+		u8 type;
+		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
+							 cfg_type), &type);
+		if (type == cfg_type)
+			return pos;
+	}
+	return 0;
+}
 #endif /* _LINUX_VIRTIO_PCI_H */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (10 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 11/22] virtio_pci: don't use the legacy driver if we find the new PCI capabilities Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21 10:28   ` Michael S. Tsirkin
  2013-03-21  8:29 ` [PATCH 13/22] virtio_pci: new, capability-aware driver Rusty Russell
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: H. Peter Anvin

Another HPA suggestion: that the device be allowed to offer duplicate
capabilities, particularly so it can offer a mem and an I/O bar and let
the guest decide (Linux guest probably doesn't care?).

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci_legacy.c |    3 ++-
 include/linux/virtio_pci.h         |   20 ++++++++++++++++----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 501fa79..c7aadcb 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -728,7 +728,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	}
 
 	/* We leave modern virtio-pci for the modern driver. */
-	cap = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG);
+	cap = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
+					 IORESOURCE_IO|IORESOURCE_MEM);
 	if (cap) {
 		if (force_nonlegacy)
 			dev_info(&pci_dev->dev,
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
index 2714160..6d2816b 100644
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -4,18 +4,30 @@
 #define VIRTIO_PCI_NO_LEGACY
 #include <uapi/linux/virtio_pci.h>
 
-/* Returns offset of the capability, or 0. */
-static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type)
+/**
+ * virtio_pci_find_capability - walk capabilities to find device info.
+ * @dev: the pci device
+ * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
+ * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
+ *
+ * Returns offset of the capability, or 0.
+ */
+static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
+					     u32 ioresource_types)
 {
 	int pos;
 
 	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
 	     pos > 0;
 	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
-		u8 type;
+		u8 type, bar;
 		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
 							 cfg_type), &type);
-		if (type == cfg_type)
+		if (type != cfg_type)
+			continue;
+		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
+							 bar), &bar);
+		if (pci_resource_flags(dev, bar) & ioresource_types)
 			return pos;
 	}
 	return 0;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 13/22] virtio_pci: new, capability-aware driver.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (11 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 12/22] virtio_pci: allow duplicate capabilities Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21 10:24   ` Michael S. Tsirkin
  2013-03-21  8:29 ` [PATCH 14/22] virtio_pci: layout changes as per hpa's suggestions Rusty Russell
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

Differences:
1) Uses 4 PCI capabilities to demarcate the common, ISR, notify and device-specific areas.
2) Guest sets queue size, using host-provided maximum.
3) Guest sets queue alignment, rather than ABI-defined 4096.
4) More than 32 feature bits (a lot more!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/Makefile     |    1 +
 drivers/virtio/virtio_pci.c |  979 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 980 insertions(+)
 create mode 100644 drivers/virtio/virtio_pci.c

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 23834f5..eec0a42 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
+obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
new file mode 100644
index 0000000..b86b99c
--- /dev/null
+++ b/drivers/virtio/virtio_pci.c
@@ -0,0 +1,979 @@
+/*
+ * Virtio PCI driver
+ *
+ * This module allows virtio devices to be used over a virtual PCI
+ * device.  Copyright 2011, Rusty Russell IBM Corporation, but based
+ * on the older virtio_pci_legacy.c, which was Copyright IBM
+ * Corp. 2007.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#define VIRTIO_PCI_NO_LEGACY
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+#include <linux/virtio_pci.h>
+#include <linux/highmem.h>
+#include <linux/spinlock.h>
+
+MODULE_AUTHOR("Rusty Russell <rusty@rustcorp.com.au>");
+MODULE_DESCRIPTION("virtio-pci");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("2");
+
+/* Our device structure */
+struct virtio_pci_device {
+	struct virtio_device vdev;
+	struct pci_dev *pci_dev;
+
+	/* The IO mapping for the PCI config space */
+	struct virtio_pci_common_cfg __iomem *common;
+	/* Where to read and clear interrupt */
+	u8 __iomem *isr;
+	/* Write the virtqueue index here to notify device of activity. */
+	__le16 __iomem *notify;
+	/* Device-specific data. */
+	void __iomem *device;
+
+	/* a list of queues so we can dispatch IRQs */
+	spinlock_t lock;
+	struct list_head virtqueues;
+
+	/* MSI-X support */
+	int msix_enabled;
+	int intx_enabled;
+	struct msix_entry *msix_entries;
+	cpumask_var_t *msix_affinity_masks;
+	/* Name strings for interrupts. This size should be enough,
+	 * and I'm too lazy to allocate each name separately. */
+	char (*msix_names)[256];
+	/* Number of available vectors */
+	unsigned msix_vectors;
+	/* Vectors allocated, excluding per-vq vectors if any */
+	unsigned msix_used_vectors;
+
+	/* Status saved during hibernate/restore */
+	u8 saved_status;
+
+	/* Whether we have vector per vq */
+	bool per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues.  Thus, we need at least 2 vectors for MSI-X. */
+enum {
+	VP_MSIX_CONFIG_VECTOR = 0,
+	VP_MSIX_VQ_VECTOR = 1,
+};
+
+struct virtio_pci_vq_info {
+	/* the actual virtqueue */
+	struct virtqueue *vq;
+
+	/* the pages used for the queue. */
+	void *queue;
+
+	/* the list node for the virtqueues list */
+	struct list_head node;
+
+	/* MSI-X vector (or none) */
+	unsigned msix_vector;
+};
+
+/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
+static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
+	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
+	{ 0 }
+};
+
+MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
+
+/* Convert a generic virtio device to our structure */
+static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
+{
+	return container_of(vdev, struct virtio_pci_device, vdev);
+}
+
+/* There is no iowrite64.  We use two 32-bit ops. */
+static void iowrite64(u64 val, __le64 __iomem *addr)
+{
+	iowrite32((u32)val, (__le32 __iomem *)addr);
+	iowrite32(val >> 32, (__le32 __iomem *)addr + 1);
+}
+
+/* There is no ioread64.  We use two 32-bit ops. */
+static u64 ioread64(__le64 __iomem *addr)
+{
+	return ioread32(addr) | ((u64)ioread32((__le32 __iomem *)addr + 1) << 32);
+}
+
+static u64 vp_get_features(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	u64 features;
+
+	iowrite32(0, &vp_dev->common->device_feature_select);
+	features = ioread32(&vp_dev->common->device_feature);
+	iowrite32(1, &vp_dev->common->device_feature_select);
+	features |= ((u64)ioread32(&vp_dev->common->device_feature) << 32);
+	return features;
+}
+
+static void vp_finalize_features(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	/* Give virtio_ring a chance to accept features. */
+	vring_transport_features(vdev);
+
+	iowrite32(0, &vp_dev->common->guest_feature_select);
+	iowrite32((u32)vdev->features, &vp_dev->common->guest_feature);
+	iowrite32(1, &vp_dev->common->guest_feature_select);
+	iowrite32(vdev->features >> 32, &vp_dev->common->guest_feature);
+}
+
+/* virtio config->get() implementation */
+static void vp_get(struct virtio_device *vdev, unsigned offset,
+		   void *buf, unsigned len)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	void __iomem *ioaddr = vp_dev->device + offset;
+	u8 *ptr = buf;
+	int i;
+
+	for (i = 0; i < len; i++)
+		ptr[i] = ioread8(ioaddr + i);
+}
+
+#define VP_GETx(bits)							\
+static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
+{									\
+	u##bits v;							\
+	vp_get(vdev, offset, &v, sizeof(v));				\
+	return v;							\
+}
+
+VP_GETx(8)
+VP_GETx(16)
+VP_GETx(32)
+VP_GETx(64)
+
+/* the config->set() implementation.  it's symmetric to the config->get()
+ * implementation */
+static void vp_set(struct virtio_device *vdev, unsigned offset,
+		   const void *buf, unsigned len)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	void __iomem *ioaddr = vp_dev->device + offset;
+	const u8 *ptr = buf;
+	int i;
+
+	for (i = 0; i < len; i++)
+		iowrite8(ptr[i], ioaddr + i);
+}
+
+#define VP_SETx(bits)							\
+static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
+			 u##bits v)					\
+{									\
+	vp_set(vdev, offset, &v, sizeof(v));				\
+}
+
+VP_SETx(8)
+VP_SETx(16)
+VP_SETx(32)
+VP_SETx(64)
+
+/* config->{get,set}_status() implementations */
+static u8 vp_get_status(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	return ioread8(&vp_dev->common->device_status);
+}
+
+static void vp_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	/* We should never be setting status to 0. */
+	BUG_ON(status == 0);
+	iowrite8(status, &vp_dev->common->device_status);
+}
+
+/* wait for pending irq handlers */
+static void vp_synchronize_vectors(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled)
+		synchronize_irq(vp_dev->pci_dev->irq);
+
+	for (i = 0; i < vp_dev->msix_vectors; ++i)
+		synchronize_irq(vp_dev->msix_entries[i].vector);
+}
+
+static void vp_reset(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	/* 0 status means a reset. */
+	iowrite8(0, &vp_dev->common->device_status);
+	/* Flush out the status write, and flush in device writes,
+	 * including MSI-X interrupts, if any. */
+	ioread8(&vp_dev->common->device_status);
+	/* Flush pending VQ/configuration callbacks. */
+	vp_synchronize_vectors(vdev);
+}
+
+/* the notify function used when creating a virt queue */
+static void vp_notify(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+
+	/* we write the queue's selector into the notification register to
+	 * signal the other end */
+	iowrite16(vq->index, vp_dev->notify);
+}
+
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_driver *drv;
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	if (drv->config_changed)
+		drv->config_changed(&vp_dev->vdev);
+	return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	return ret;
+}
+
+/* A small wrapper to also acknowledge the interrupt when it's handled.
+ * I really need an EIO hook for the vring so I can ack the interrupt once we
+ * know that we'll be handling the IRQ but before we invoke the callback since
+ * the callback may notify the host which results in the host attempting to
+ * raise an interrupt that we would then mask once we acknowledged the
+ * interrupt. */
+static irqreturn_t vp_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	u8 isr;
+
+	/* reading the ISR has the effect of also clearing it so it's very
+	 * important to save off the value. */
+	isr = ioread8(vp_dev->isr);
+
+	/* It's definitely not us if the ISR was not high */
+	if (!isr)
+		return IRQ_NONE;
+
+	/* Configuration change?  Tell driver if it wants to know. */
+	if (isr & VIRTIO_PCI_ISR_CONFIG)
+		vp_config_changed(irq, opaque);
+
+	return vp_vring_interrupt(irq, opaque);
+}
+
+static void vp_free_vectors(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled) {
+		free_irq(vp_dev->pci_dev->irq, vp_dev);
+		vp_dev->intx_enabled = 0;
+	}
+
+	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
+		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+	for (i = 0; i < vp_dev->msix_vectors; i++)
+		if (vp_dev->msix_affinity_masks[i])
+			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
+
+	if (vp_dev->msix_enabled) {
+		/* Disable the vector used for configuration */
+		iowrite16(VIRTIO_MSI_NO_VECTOR, &vp_dev->common->msix_config);
+		/* Flush the write out to device */
+		ioread16(&vp_dev->common->msix_config);
+
+		pci_disable_msix(vp_dev->pci_dev);
+		vp_dev->msix_enabled = 0;
+		vp_dev->msix_vectors = 0;
+	}
+
+	vp_dev->msix_used_vectors = 0;
+	kfree(vp_dev->msix_names);
+	vp_dev->msix_names = NULL;
+	kfree(vp_dev->msix_entries);
+	vp_dev->msix_entries = NULL;
+	kfree(vp_dev->msix_affinity_masks);
+	vp_dev->msix_affinity_masks = NULL;
+}
+
+static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
+				   bool per_vq_vectors)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	const char *name = dev_name(&vp_dev->vdev.dev);
+	unsigned i, v;
+	int err = -ENOMEM;
+
+	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
+				       GFP_KERNEL);
+	if (!vp_dev->msix_entries)
+		goto error;
+	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
+				     GFP_KERNEL);
+	if (!vp_dev->msix_names)
+		goto error;
+	vp_dev->msix_affinity_masks
+		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
+			  GFP_KERNEL);
+	if (!vp_dev->msix_affinity_masks)
+		goto error;
+	for (i = 0; i < nvectors; ++i)
+		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
+					GFP_KERNEL))
+			goto error;
+
+	for (i = 0; i < nvectors; ++i)
+		vp_dev->msix_entries[i].entry = i;
+
+	/* pci_enable_msix returns positive if we can't get this many. */
+	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
+	if (err > 0)
+		err = -ENOSPC;
+	if (err)
+		goto error;
+	vp_dev->msix_vectors = nvectors;
+	vp_dev->msix_enabled = 1;
+
+	/* Set the vector used for configuration */
+	v = vp_dev->msix_used_vectors;
+	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
+		 "%s-config", name);
+	err = request_irq(vp_dev->msix_entries[v].vector,
+			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  vp_dev);
+	if (err)
+		goto error;
+	++vp_dev->msix_used_vectors;
+
+	iowrite16(v, &vp_dev->common->msix_config);
+	/* Verify we had enough resources to assign the vector */
+	v = ioread16(&vp_dev->common->msix_config);
+	if (v == VIRTIO_MSI_NO_VECTOR) {
+		err = -EBUSY;
+		goto error;
+	}
+
+	if (!per_vq_vectors) {
+		/* Shared vector for all VQs */
+		v = vp_dev->msix_used_vectors;
+		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
+			 "%s-virtqueues", name);
+		err = request_irq(vp_dev->msix_entries[v].vector,
+				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
+				  vp_dev);
+		if (err)
+			goto error;
+		++vp_dev->msix_used_vectors;
+	}
+	return 0;
+error:
+	vp_free_vectors(vdev);
+	return err;
+}
+
+static int vp_request_intx(struct virtio_device *vdev)
+{
+	int err;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
+			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
+	if (!err)
+		vp_dev->intx_enabled = 1;
+	return err;
+}
+
+static size_t vring_pci_size(u16 num)
+{
+	/* We only need a cacheline separation. */
+	return PAGE_ALIGN(vring_size(num, SMP_CACHE_BYTES));
+}
+
+static void *alloc_virtqueue_pages(u16 *num)
+{
+	void *pages;
+
+	/* 1024 entries uses about 32k */
+	if (*num > 1024)
+		*num = 1024;
+
+	for (; *num; *num /= 2) {
+		pages = alloc_pages_exact(vring_pci_size(*num),
+					  GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN);
+		if (pages)
+			return pages;
+	}
+	return NULL;
+}
+
+static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
+				  void (*callback)(struct virtqueue *vq),
+				  const char *name,
+				  u16 msix_vec)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info;
+	struct virtqueue *vq;
+	u16 num;
+	int err;
+
+	/* Select the queue we're interested in */
+	iowrite16(index, &vp_dev->common->queue_select);
+
+	switch (ioread64(&vp_dev->common->queue_address)) {
+	case 0xFFFFFFFFFFFFFFFFULL:
+		return ERR_PTR(-ENOENT);
+	case 0:
+		/* Uninitialized.  Excellent. */
+		break;
+	default:
+		/* We've already set this up? */
+		return ERR_PTR(-EBUSY);
+	}
+
+	/* Maximum size must be a power of 2. */
+	num = ioread16(&vp_dev->common->queue_size);
+	if (num & (num - 1)) {
+		dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u\n", num);
+		return ERR_PTR(-EINVAL);
+	}
+
+	/* allocate and fill out our structure that represents an active
+	 * queue */
+	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	info->msix_vector = msix_vec;
+
+	info->queue = alloc_virtqueue_pages(&num);
+	if (info->queue == NULL) {
+		err = -ENOMEM;
+		goto out_info;
+	}
+
+	/* create the vring */
+	vq = vring_new_virtqueue(index, num, SMP_CACHE_BYTES, vdev,
+				 true, info->queue, vp_notify, callback, name);
+	if (!vq) {
+		err = -ENOMEM;
+		goto out_alloc_pages;
+	}
+
+	vq->priv = info;
+	info->vq = vq;
+
+	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
+		iowrite16(msix_vec, &vp_dev->common->queue_msix_vector);
+		msix_vec = ioread16(&vp_dev->common->queue_msix_vector);
+		if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
+			err = -EBUSY;
+			goto out_new_virtqueue;
+		}
+	}
+
+	if (callback) {
+		unsigned long flags;
+		spin_lock_irqsave(&vp_dev->lock, flags);
+		list_add(&info->node, &vp_dev->virtqueues);
+		spin_unlock_irqrestore(&vp_dev->lock, flags);
+	} else {
+		INIT_LIST_HEAD(&info->node);
+	}
+
+	/* Activate the queue. */
+	iowrite64(virt_to_phys(info->queue), &vp_dev->common->queue_address);
+	iowrite16(SMP_CACHE_BYTES, &vp_dev->common->queue_align);
+	iowrite16(num, &vp_dev->common->queue_size);
+
+	return vq;
+
+out_new_virtqueue:
+	vring_del_virtqueue(vq);
+out_alloc_pages:
+	free_pages_exact(info->queue, vring_pci_size(num));
+out_info:
+	kfree(info);
+	return ERR_PTR(err);
+}
+
+static void vp_del_vq(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+	unsigned long flags, size = vring_pci_size(vq->vring.num);
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_del(&info->node);
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	/* Select and deactivate the queue */
+	iowrite16(vq->index, &vp_dev->common->queue_select);
+
+	if (vp_dev->msix_enabled) {
+		iowrite16(VIRTIO_MSI_NO_VECTOR,
+			  &vp_dev->common->queue_msix_vector);
+		/* Flush the write out to device */
+		ioread16(&vp_dev->common->queue_msix_vector);
+	}
+
+	vring_del_virtqueue(vq);
+
+	/* This is for our own benefit, not the device's! */
+	iowrite64(0, &vp_dev->common->queue_address);
+	iowrite16(0, &vp_dev->common->queue_size);
+	iowrite16(0, &vp_dev->common->queue_align);
+
+	free_pages_exact(info->queue, size);
+	kfree(info);
+}
+
+/* the config->del_vqs() implementation */
+static void vp_del_vqs(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtqueue *vq, *n;
+	struct virtio_pci_vq_info *info;
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+		info = vq->priv;
+		if (vp_dev->per_vq_vectors &&
+			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
+			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
+				 vq);
+		vp_del_vq(vq);
+	}
+	vp_dev->per_vq_vectors = false;
+
+	vp_free_vectors(vdev);
+}
+
+static int vp_try_to_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+			      struct virtqueue *vqs[],
+			      vq_callback_t *callbacks[],
+			      const char *names[],
+			      bool use_msix,
+			      bool per_vq_vectors)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	u16 msix_vec;
+	int i, err, nvectors, allocated_vectors;
+
+	if (!use_msix) {
+		/* Old style: one normal interrupt for change and all vqs. */
+		err = vp_request_intx(vdev);
+		if (err)
+			goto error_request;
+	} else {
+		if (per_vq_vectors) {
+			/* Best option: one for change interrupt, one per vq. */
+			nvectors = 1;
+			for (i = 0; i < nvqs; ++i)
+				if (callbacks[i])
+					++nvectors;
+		} else {
+			/* Second best: one for change, shared for all vqs. */
+			nvectors = 2;
+		}
+
+		err = vp_request_msix_vectors(vdev, nvectors, per_vq_vectors);
+		if (err)
+			goto error_request;
+	}
+
+	vp_dev->per_vq_vectors = per_vq_vectors;
+	allocated_vectors = vp_dev->msix_used_vectors;
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		} else if (!callbacks[i] || !vp_dev->msix_enabled)
+			msix_vec = VIRTIO_MSI_NO_VECTOR;
+		else if (vp_dev->per_vq_vectors)
+			msix_vec = allocated_vectors++;
+		else
+			msix_vec = VP_MSIX_VQ_VECTOR;
+		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i], msix_vec);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto error_find;
+		}
+
+		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
+			continue;
+
+		/* allocate per-vq irq if available and necessary */
+		snprintf(vp_dev->msix_names[msix_vec],
+			 sizeof *vp_dev->msix_names,
+			 "%s-%s",
+			 dev_name(&vp_dev->vdev.dev), names[i]);
+		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
+				  vring_interrupt, 0,
+				  vp_dev->msix_names[msix_vec],
+				  vqs[i]);
+		if (err) {
+			vp_del_vq(vqs[i]);
+			goto error_find;
+		}
+	}
+	return 0;
+
+error_find:
+	vp_del_vqs(vdev);
+
+error_request:
+	return err;
+}
+
+/* the config->find_vqs() implementation */
+static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+		       struct virtqueue *vqs[],
+		       vq_callback_t *callbacks[],
+		       const char *names[])
+{
+	int err;
+
+	/* Try MSI-X with one vector per queue. */
+	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names, true, true);
+	if (!err)
+		return 0;
+	/* Fallback: MSI-X with one vector for config, one shared for queues. */
+	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				 true, false);
+	if (!err)
+		return 0;
+	/* Finally fall back to regular interrupts. */
+	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				  false, false);
+}
+
+static const char *vp_bus_name(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	return pci_name(vp_dev->pci_dev);
+}
+
+/* Setup the affinity for a virtqueue:
+ * - force the affinity for per vq vector
+ * - OR over all affinities for shared MSI
+ * - ignore the affinity request if we're using INTX
+ */
+static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
+{
+	struct virtio_device *vdev = vq->vdev;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+	struct cpumask *mask;
+	unsigned int irq;
+
+	if (!vq->callback)
+		return -EINVAL;
+
+	if (vp_dev->msix_enabled) {
+		mask = vp_dev->msix_affinity_masks[info->msix_vector];
+		irq = vp_dev->msix_entries[info->msix_vector].vector;
+		if (cpu == -1)
+			irq_set_affinity_hint(irq, NULL);
+		else {
+			cpumask_set_cpu(cpu, mask);
+			irq_set_affinity_hint(irq, mask);
+		}
+	}
+	return 0;
+}
+
+static const struct virtio_config_ops virtio_pci_config_ops = {
+	.get8		= vp_get8,
+	.set8		= vp_set8,
+	.get16		= vp_get16,
+	.set16		= vp_set16,
+	.get32		= vp_get32,
+	.set32		= vp_set32,
+	.get64		= vp_get64,
+	.set64		= vp_set64,
+	.get_status	= vp_get_status,
+	.set_status	= vp_set_status,
+	.reset		= vp_reset,
+	.find_vqs	= vp_find_vqs,
+	.del_vqs	= vp_del_vqs,
+	.get_features	= vp_get_features,
+	.finalize_features = vp_finalize_features,
+	.bus_name	= vp_bus_name,
+	.set_vq_affinity = vp_set_vq_affinity,
+};
+
+static void virtio_pci_release_dev(struct device *_d)
+{
+	/*
+	 * No need for a release method as we allocate/free
+	 * all devices together with the pci devices.
+	 * Provide an empty one to avoid getting a warning from core.
+	 */
+}
+
+static void __iomem *map_capability(struct pci_dev *dev, int off, size_t expect)
+{
+	u8 bar;
+	u32 offset, length;
+	void __iomem *p;
+
+	pci_read_config_byte(dev, off + offsetof(struct virtio_pci_cap, bar),
+			     &bar);
+	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, offset),
+			     &offset);
+	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
+			     &length);
+
+	if (length < expect) {
+		dev_err(&dev->dev,
+			"virtio_pci: small capability len %u (%zu expected)\n",
+			length, expect);
+		return NULL;
+	}
+
+	/* We want an uncacheable mapping, even if the bar is cacheable. */
+	p = pci_iomap_range(dev, bar, offset, length, PAGE_SIZE, true);
+	if (!p)
+		dev_err(&dev->dev,
+			"virtio_pci: unable to map virtio %u@%u on bar %i\n",
+			length, offset, bar);
+	return p;
+}
+
+
+/* the PCI probing function */
+static int virtio_pci_probe(struct pci_dev *pci_dev,
+			    const struct pci_device_id *id)
+{
+	struct virtio_pci_device *vp_dev;
+	int err, common, isr, notify, device;
+
+	/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
+	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
+		return -ENODEV;
+
+	if (pci_dev->revision != VIRTIO_PCI_ABI_VERSION) {
+		printk(KERN_ERR "virtio_pci: expected ABI version %d, got %d\n",
+		       VIRTIO_PCI_ABI_VERSION, pci_dev->revision);
+		return -ENODEV;
+	}
+
+	/* check for a common config: if not, use legacy mode (bar 0). */
+	common = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
+					    IORESOURCE_IO|IORESOURCE_MEM);
+	if (!common) {
+		dev_info(&pci_dev->dev,
+			 "virtio_pci: leaving for legacy driver\n");
+		return -ENODEV;
+	}
+
+	/* If common is there, these should be too... */
+	isr = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_ISR_CFG,
+					 IORESOURCE_IO|IORESOURCE_MEM);
+	notify = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_NOTIFY_CFG,
+					    IORESOURCE_IO|IORESOURCE_MEM);
+	device = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_DEVICE_CFG,
+					    IORESOURCE_IO|IORESOURCE_MEM);
+	if (!isr || !notify || !device) {
+		dev_err(&pci_dev->dev,
+			"virtio_pci: missing capabilities %i/%i/%i/%i\n",
+			common, isr, notify, device);
+		return -EINVAL;
+	}
+
+	/* allocate our structure and fill it out */
+	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
+	if (vp_dev == NULL)
+		return -ENOMEM;
+
+	vp_dev->vdev.dev.parent = &pci_dev->dev;
+	vp_dev->vdev.dev.release = virtio_pci_release_dev;
+	vp_dev->vdev.config = &virtio_pci_config_ops;
+	vp_dev->pci_dev = pci_dev;
+	INIT_LIST_HEAD(&vp_dev->virtqueues);
+	spin_lock_init(&vp_dev->lock);
+
+	/* Disable MSI/MSIX to bring device to a known good state. */
+	pci_msi_off(pci_dev);
+
+	/* enable the device */
+	err = pci_enable_device(pci_dev);
+	if (err)
+		goto out;
+
+	err = pci_request_regions(pci_dev, "virtio-pci");
+	if (err)
+		goto out_enable_device;
+
+	err = -EINVAL;
+	vp_dev->common = map_capability(pci_dev, common,
+					sizeof(struct virtio_pci_common_cfg));
+	if (!vp_dev->common)
+		goto out_req_regions;
+	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8));
+	if (!vp_dev->isr)
+		goto out_map_common;
+	vp_dev->notify = map_capability(pci_dev, notify, sizeof(u16));
+	if (!vp_dev->notify)
+		goto out_map_isr;
+	vp_dev->device = map_capability(pci_dev, device, 0);
+	if (!vp_dev->device)
+		goto out_map_notify;
+
+	pci_set_drvdata(pci_dev, vp_dev);
+	pci_set_master(pci_dev);
+
+	/* we use the subsystem vendor/device id as the virtio vendor/device
+	 * id.  this allows us to use the same PCI vendor/device id for all
+	 * virtio devices and to identify the particular virtio driver by
+	 * the subsystem ids */
+	vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
+	vp_dev->vdev.id.device = pci_dev->subsystem_device;
+
+	/* finally register the virtio device */
+	err = register_virtio_device(&vp_dev->vdev);
+	if (err)
+		goto out_set_drvdata;
+
+	return 0;
+
+out_set_drvdata:
+	pci_set_drvdata(pci_dev, NULL);
+	pci_iounmap(pci_dev, vp_dev->device);
+out_map_notify:
+	pci_iounmap(pci_dev, vp_dev->notify);
+out_map_isr:
+	pci_iounmap(pci_dev, vp_dev->isr);
+out_map_common:
+	pci_iounmap(pci_dev, vp_dev->common);
+out_req_regions:
+	pci_release_regions(pci_dev);
+out_enable_device:
+	pci_disable_device(pci_dev);
+out:
+	kfree(vp_dev);
+	return err;
+}
+
+static void virtio_pci_remove(struct pci_dev *pci_dev)
+{
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+
+	unregister_virtio_device(&vp_dev->vdev);
+
+	vp_del_vqs(&vp_dev->vdev);
+	pci_set_drvdata(pci_dev, NULL);
+	pci_iounmap(pci_dev, vp_dev->device);
+	pci_iounmap(pci_dev, vp_dev->notify);
+	pci_iounmap(pci_dev, vp_dev->isr);
+	pci_iounmap(pci_dev, vp_dev->common);
+	pci_release_regions(pci_dev);
+	pci_disable_device(pci_dev);
+	kfree(vp_dev);
+}
+
+#ifdef CONFIG_PM
+static int virtio_pci_freeze(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = 0;
+	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
+	if (drv && drv->freeze)
+		ret = drv->freeze(&vp_dev->vdev);
+
+	if (!ret)
+		pci_disable_device(pci_dev);
+	return ret;
+}
+
+static int virtio_pci_restore(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = pci_enable_device(pci_dev);
+	if (ret)
+		return ret;
+
+	pci_set_master(pci_dev);
+	vp_finalize_features(&vp_dev->vdev);
+
+	if (drv && drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	/* Finally, tell the device we're all set */
+	if (!ret)
+		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
+
+	return ret;
+}
+
+static const struct dev_pm_ops virtio_pci_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
+};
+#endif
+
+static struct pci_driver virtio_pci_driver = {
+	.name		= "virtio-pci",
+	.id_table	= virtio_pci_id_table,
+	.probe		= virtio_pci_probe,
+	.remove		= virtio_pci_remove,
+#ifdef CONFIG_PM
+	.driver.pm	= &virtio_pci_pm_ops,
+#endif
+};
+
+module_pci_driver(virtio_pci_driver);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 14/22] virtio_pci: layout changes as per hpa's suggestions.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (12 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 13/22] virtio_pci: new, capability-aware driver Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 15/22] virtio_pci: use little endian for config space Rusty Russell
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: H. Peter Anvin

1) Drop the FIXME reference to BIR, just use BAR.
2) Make queue addresses explicit: desc/avail/used.
3) Add an explicit number of queues.
4) Have explicit queue_enable, which can double as a method to disable
   all activity on a queue (write 0, read until 0).

I also noticed that the 64-bit queue_address was at offset 28 (0x1C),
which is unusual, so add more padding to take the first 64-bit value
to offset 32 (0x20).

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci.c     |   48 +++++++++++++++++++++++++++++++--------
 include/uapi/linux/virtio_pci.h |   11 +++++----
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index b86b99c..0169531 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -457,13 +457,15 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	u16 num;
 	int err;
 
+	if (index >= ioread16(&vp_dev->common->num_queues))
+		return ERR_PTR(-ENOENT);
+
 	/* Select the queue we're interested in */
 	iowrite16(index, &vp_dev->common->queue_select);
 
-	switch (ioread64(&vp_dev->common->queue_address)) {
-	case 0xFFFFFFFFFFFFFFFFULL:
-		return ERR_PTR(-ENOENT);
-	case 0:
+	/* Sanity check */
+	switch (ioread64(&vp_dev->common->queue_desc)) {
+	case 0:
 		/* Uninitialized.  Excellent. */
 		break;
 	default:
@@ -522,9 +523,11 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	}
 
 	/* Activate the queue. */
-	iowrite64(virt_to_phys(info->queue), &vp_dev->common->queue_address);
-	iowrite16(SMP_CACHE_BYTES, &vp_dev->common->queue_align);
 	iowrite16(num, &vp_dev->common->queue_size);
+	iowrite64(virt_to_phys(vq->vring.desc), &vp_dev->common->queue_desc);
+	iowrite64(virt_to_phys(vq->vring.avail), &vp_dev->common->queue_avail);
+	iowrite64(virt_to_phys(vq->vring.used), &vp_dev->common->queue_used);
+	iowrite16(1, &vp_dev->common->queue_enable);
 
 	return vq;
 
@@ -537,6 +540,29 @@ out_info:
 	return ERR_PTR(err);
 }
 
+static void vp_vq_disable(struct virtio_pci_device *vp_dev,
+			  struct virtqueue *vq)
+{
+	unsigned long end;
+
+	/* Select the queue */
+	iowrite16(vq->index, &vp_dev->common->queue_select);
+
+	/* Disable it */
+	iowrite16(0, &vp_dev->common->queue_enable);
+
+	/* It's almost certainly synchronous, but just in case. */
+	end = jiffies + HZ/2;
+	while (ioread16(&vp_dev->common->queue_enable) != 0) {
+		if (time_after(jiffies, end)) {
+			dev_warn(&vp_dev->pci_dev->dev,
+				 "virtio_pci: disable ignored\n");
+			break;
+		}
+		cpu_relax();
+	}
+}
+
 static void vp_del_vq(struct virtqueue *vq)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
@@ -547,7 +573,10 @@ static void vp_del_vq(struct virtqueue *vq)
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-	/* Select and deactivate the queue */
+	/* It should be quiescent, but disable first just in case. */
+	vp_vq_disable(vp_dev, vq);
+
+	/* Select the queue */
 	iowrite16(vq->index, &vp_dev->common->queue_select);
 
 	if (vp_dev->msix_enabled) {
@@ -560,9 +589,10 @@ static void vp_del_vq(struct virtqueue *vq)
 	vring_del_virtqueue(vq);
 
 	/* This is for our own benefit, not the device's! */
-	iowrite64(0, &vp_dev->common->queue_address);
 	iowrite16(0, &vp_dev->common->queue_size);
-	iowrite16(0, &vp_dev->common->queue_align);
+	iowrite64(0, &vp_dev->common->queue_desc);
+	iowrite64(0, &vp_dev->common->queue_avail);
+	iowrite64(0, &vp_dev->common->queue_used);
 
 	free_pages_exact(info->queue, size);
 	kfree(info);
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 0d12828..b334cd9 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -128,7 +128,6 @@ struct virtio_pci_cap {
 	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
 	u8 cap_next;	/* Generic PCI field: next ptr. */
 	u8 cfg_type;	/* One of the VIRTIO_PCI_CAP_*_CFG. */
-/* FIXME: Should we use a bir, instead of raw bar number? */
 	u8 bar;		/* Where to find it. */
 	__le32 offset;	/* Offset within bar. */
 	__le32 length;	/* Length. */
@@ -142,14 +141,18 @@ struct virtio_pci_common_cfg {
 	__le32 guest_feature_select;	/* read-write */
 	__le32 guest_feature;		/* read-only */
 	__le16 msix_config;		/* read-write */
+	__le16 num_queues;		/* read-only */
 	__u8 device_status;		/* read-write */
-	__u8 unused;
+	__u8 unused1;
+	__le16 unused2;
 
 	/* About a specific virtqueue. */
 	__le16 queue_select;	/* read-write */
-	__le16 queue_align;	/* read-write, power of 2. */
 	__le16 queue_size;	/* read-write, power of 2. */
 	__le16 queue_msix_vector;/* read-write */
-	__le64 queue_address;	/* read-write: 0xFFFFFFFFFFFFFFFF == DNE. */
+	__le16 queue_enable;	/* read-write */
+	__le64 queue_desc;	/* read-write */
+	__le64 queue_avail;	/* read-write */
+	__le64 queue_used;	/* read-write */
 };
 #endif /* _UAPI_LINUX_VIRTIO_PCI_H */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 15/22] virtio_pci: use little endian for config space.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (13 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 14/22] virtio_pci: layout changes as per hpa's suggestions Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 16/22] virtio_pci: use separate notification offsets for each vq Rusty Russell
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

Previously, it was defined as "guest-endian".  This was always confusing
for PCI, for which everything else is defined as little endian.

The ring itself is unchanged; this affects only the per-device config info.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci.c |   77 +++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 0169531..f252afe 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -140,57 +140,62 @@ static void vp_finalize_features(struct virtio_device *vdev)
 	iowrite32(vdev->features >> 32, &vp_dev->common->guest_feature);
 }
 
-/* virtio config->get() implementation */
-static void vp_get(struct virtio_device *vdev, unsigned offset,
-		   void *buf, unsigned len)
+/* virtio config is little-endian for virtio_pci (vs guest-endian for legacy) */
+static u8 vp_get8(struct virtio_device *vdev, unsigned offset)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->device + offset;
-	u8 *ptr = buf;
-	int i;
 
-	for (i = 0; i < len; i++)
-		ptr[i] = ioread8(ioaddr + i);
+	return ioread8(vp_dev->device + offset);
 }
 
-#define VP_GETx(bits)							\
-static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
-{									\
-	u##bits v;							\
-	vp_get(vdev, offset, &v, sizeof(v));				\
-	return v;							\
+static void vp_set8(struct virtio_device *vdev, unsigned offset, u8 val)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	iowrite8(val, vp_dev->device + offset);
 }
 
-VP_GETx(8)
-VP_GETx(16)
-VP_GETx(32)
-VP_GETx(64)
+static u16 vp_get16(struct virtio_device *vdev, unsigned offset)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-/* the config->set() implementation.  it's symmetric to the config->get()
- * implementation */
-static void vp_set(struct virtio_device *vdev, unsigned offset,
-		   const void *buf, unsigned len)
+	return ioread16(vp_dev->device + offset);
+}
+
+static void vp_set16(struct virtio_device *vdev, unsigned offset, u16 val)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	iowrite16(val, vp_dev->device + offset);
+}
+
+static u32 vp_get32(struct virtio_device *vdev, unsigned offset)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->device + offset;
-	const u8 *ptr = buf;
-	int i;
 
-	for (i = 0; i < len; i++)
-		iowrite8(ptr[i], ioaddr + i);
+	return ioread32(vp_dev->device + offset);
 }
 
-#define VP_SETx(bits)							\
-static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
-			 u##bits v)					\
-{									\
-	vp_set(vdev, offset, &v, sizeof(v));				\
+static void vp_set32(struct virtio_device *vdev, unsigned offset, u32 val)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	iowrite32(val, vp_dev->device + offset);
 }
 
-VP_SETx(8)
-VP_SETx(16)
-VP_SETx(32)
-VP_SETx(64)
+static u64 vp_get64(struct virtio_device *vdev, unsigned offset)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	return ioread64(vp_dev->device + offset);
+}
+
+static void vp_set64(struct virtio_device *vdev, unsigned offset, u64 val)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	iowrite64(val, vp_dev->device + offset);
+}
 
 /* config->{get,set}_status() implementations */
 static u8 vp_get_status(struct virtio_device *vdev)
-- 
1.7.10.4


* [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (14 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 15/22] virtio_pci: use little endian for config space Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21 10:13   ` Michael S. Tsirkin
  2013-03-21  8:29 ` [PATCH 17/22] virtio_pci_legacy: cleanup struct virtio_pci_vq_info Rusty Russell
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization; +Cc: Michael S. Tsirkin

(MST, is this what you were thinking?)

This repurposes the previously unused field as "queue_notify".  It contains
the offset into the notification area given by the VIRTIO_PCI_CAP_NOTIFY_CFG
header.

(A device can still make them all overlap if it wants: since the queue
index is written, it can still distinguish notifications for different
queues.)

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci.c     |   61 +++++++++++++++++++++++++++------------
 include/uapi/linux/virtio_pci.h |    2 +-
 2 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index f252afe..d492361 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -37,11 +37,15 @@ struct virtio_pci_device {
 	struct virtio_pci_common_cfg __iomem *common;
 	/* Where to read and clear interrupt */
 	u8 __iomem *isr;
-	/* Write the virtqueue index here to notify device of activity. */
-	__le16 __iomem *notify;
+	/* Write the vq index here to notify device of activity. */
+	void __iomem *notify_base;
 	/* Device-specific data. */
 	void __iomem *device;
 
+	/* So we can sanity-check accesses. */
+	size_t notify_len;
+	size_t device_len;
+
 	/* a list of queues so we can dispatch IRQs */
 	spinlock_t lock;
 	struct list_head virtqueues;
@@ -84,6 +88,9 @@ struct virtio_pci_vq_info {
 	/* the list node for the virtqueues list */
 	struct list_head node;
 
+	/* Notify area for this vq. */
+	u16 __iomem *notify;
+
 	/* MSI-X vector (or none) */
 	unsigned msix_vector;
 };
@@ -240,11 +247,11 @@ static void vp_reset(struct virtio_device *vdev)
 /* the notify function used when creating a virt queue */
 static void vp_notify(struct virtqueue *vq)
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
 
-	/* we write the queue's selector into the notification register to
-	 * signal the other end */
-	iowrite16(vq->index, vp_dev->notify);
+	/* we write the queue selector into the notification register
+	 * to signal the other end */
+	iowrite16(vq->index, info->notify);
 }
 
 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -460,7 +467,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	struct virtio_pci_vq_info *info;
 	struct virtqueue *vq;
 	u16 num;
-	int err;
+	int err, off;
 
 	if (index >= ioread16(&vp_dev->common->num_queues))
 		return ERR_PTR(-ENOENT);
@@ -492,6 +499,17 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	info->msix_vector = msix_vec;
 
+	/* get offset of notification byte for this virtqueue */
+	off = ioread16(&vp_dev->common->queue_notify);
+	if (off + sizeof(u16) > vp_dev->notify_len) {
+		dev_warn(&vp_dev->pci_dev->dev,
+			 "bad notification offset %u for queue %u (region %zu bytes)\n",
+			 off, index, vp_dev->notify_len);
+		err = -EINVAL;
+		goto out_info;
+	}
+	info->notify = vp_dev->notify_base + off;
+
 	info->queue = alloc_virtqueue_pages(&num);
 	if (info->queue == NULL) {
 		err = -ENOMEM;
@@ -787,7 +805,8 @@ static void virtio_pci_release_dev(struct device *_d)
 	 */
 }
 
-static void __iomem *map_capability(struct pci_dev *dev, int off, size_t expect)
+static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
+				    size_t *len)
 {
 	u8 bar;
 	u32 offset, length;
@@ -800,13 +819,16 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t expect)
 	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
 			     &length);
 
-	if (length < expect) {
+	if (length < minlen) {
 		dev_err(&dev->dev,
-			"virtio_pci: small capability len %u (%u expected)\n",
-			length, expect);
+			"virtio_pci: small capability len %u (%zu expected)\n",
+			length, minlen);
 		return NULL;
 	}
 
+	if (len)
+		*len = length;
+
 	/* We want uncachable mapping, even if bar is cachable. */
 	p = pci_iomap_range(dev, bar, offset, length, PAGE_SIZE, true);
 	if (!p)
@@ -883,16 +905,19 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 
 	err = -EINVAL;
 	vp_dev->common = map_capability(pci_dev, common,
-					sizeof(struct virtio_pci_common_cfg));
+					sizeof(struct virtio_pci_common_cfg),
+					NULL);
 	if (!vp_dev->common)
 		goto out_req_regions;
-	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8));
+	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), NULL);
 	if (!vp_dev->isr)
 		goto out_map_common;
-	vp_dev->notify = map_capability(pci_dev, notify, sizeof(u16));
-	if (!vp_dev->notify)
+	vp_dev->notify_base = map_capability(pci_dev, notify, sizeof(u8),
+					     &vp_dev->notify_len);
+	if (!vp_dev->notify_base)
 		goto out_map_isr;
-	vp_dev->device = map_capability(pci_dev, device, 0);
+	vp_dev->device = map_capability(pci_dev, device, 0,
+					&vp_dev->device_len);
 	if (!vp_dev->device)
 		goto out_map_notify;
 
@@ -917,7 +942,7 @@ out_set_drvdata:
 	pci_set_drvdata(pci_dev, NULL);
 	pci_iounmap(pci_dev, vp_dev->device);
 out_map_notify:
-	pci_iounmap(pci_dev, vp_dev->notify);
+	pci_iounmap(pci_dev, vp_dev->notify_base);
 out_map_isr:
 	pci_iounmap(pci_dev, vp_dev->isr);
 out_map_common:
@@ -940,7 +965,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 	vp_del_vqs(&vp_dev->vdev);
 	pci_set_drvdata(pci_dev, NULL);
 	pci_iounmap(pci_dev, vp_dev->device);
-	pci_iounmap(pci_dev, vp_dev->notify);
+	pci_iounmap(pci_dev, vp_dev->notify_base);
 	pci_iounmap(pci_dev, vp_dev->isr);
 	pci_iounmap(pci_dev, vp_dev->common);
 	pci_release_regions(pci_dev);
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index b334cd9..23b90cb 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -144,13 +144,13 @@ struct virtio_pci_common_cfg {
 	__le16 num_queues;		/* read-only */
 	__u8 device_status;		/* read-write */
 	__u8 unused1;
-	__le16 unused2;
 
 	/* About a specific virtqueue. */
 	__le16 queue_select;	/* read-write */
 	__le16 queue_size;	/* read-write, power of 2. */
 	__le16 queue_msix_vector;/* read-write */
 	__le16 queue_enable;	/* read-write */
+	__le16 queue_notify;	/* read-only */
 	__le64 queue_desc;	/* read-write */
 	__le64 queue_avail;	/* read-write */
 	__le64 queue_used;	/* read-write */
-- 
1.7.10.4


* [PATCH 17/22] virtio_pci_legacy: cleanup struct virtio_pci_vq_info
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (15 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 16/22] virtio_pci: use separate notification offsets for each vq Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 18/22] virtio_pci: share structure between legacy and modern Rusty Russell
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

By removing the redundant num field and adding a pointer to the
notification area, struct virtio_pci_vq_info now exactly matches the
virtio_pci one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci_legacy.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index c7aadcb..429f593 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -81,15 +81,15 @@ struct virtio_pci_vq_info
 	/* the actual virtqueue */
 	struct virtqueue *vq;
 
-	/* the number of entries in the queue */
-	int num;
-
 	/* the virtual address of the ring queue */
 	void *queue;
 
 	/* the list node for the virtqueues list */
 	struct list_head node;
 
+	/* Notify area for this vq. */
+	u16 __iomem *notify;
+
 	/* MSI-X vector (or none) */
 	unsigned msix_vector;
 };
@@ -225,11 +225,11 @@ static void vp_reset(struct virtio_device *vdev)
 /* the notify function used when creating a virt queue */
 static void vp_notify(struct virtqueue *vq)
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
 
-	/* we write the queue's selector into the notification register to
-	 * signal the other end */
-	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_NOTIFY);
+	/* we write the queue selector into the notification register
+	 * to signal the other end */
+	iowrite16(vq->index, info->notify);
 }
 
 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -440,7 +440,6 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	if (!info)
 		return ERR_PTR(-ENOMEM);
 
-	info->num = num;
 	info->msix_vector = msix_vec;
 
 	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_LEGACY_VRING_ALIGN));
@@ -455,7 +454,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 		  vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 
 	/* create the vring */
-	vq = vring_new_virtqueue(index, info->num,
+	vq = vring_new_virtqueue(index, num,
 				 VIRTIO_PCI_LEGACY_VRING_ALIGN, vdev,
 				 true, info->queue, vp_notify, callback, name);
 	if (!vq) {
@@ -465,6 +464,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	vq->priv = info;
 	info->vq = vq;
+	info->notify = vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_NOTIFY;
 
 	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
 		iowrite16(msix_vec,
@@ -521,7 +521,8 @@ static void vp_del_vq(struct virtqueue *vq)
 	/* Select and deactivate the queue */
 	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 
-	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_LEGACY_VRING_ALIGN));
+	size = PAGE_ALIGN(vring_size(vq->vring.num,
+				     VIRTIO_PCI_LEGACY_VRING_ALIGN));
 	free_pages_exact(info->queue, size);
 	kfree(info);
 }
-- 
1.7.10.4


* [PATCH 18/22] virtio_pci: share structure between legacy and modern.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (16 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 17/22] virtio_pci_legacy: cleanup struct virtio_pci_vq_info Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 19/22] virtio_pci: share interrupt/notify handlers " Rusty Russell
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

They're almost identical: we add a "legacy" ioregion (what was "ioaddr"
in the legacy driver) and move the structure out to virtio_pci-common.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci-common.h |   76 +++++++++++++++++++++
 drivers/virtio/virtio_pci.c        |   68 +------------------
 drivers/virtio/virtio_pci_legacy.c |  128 +++++++++++-------------------------
 3 files changed, 116 insertions(+), 156 deletions(-)
 create mode 100644 drivers/virtio/virtio_pci-common.h

diff --git a/drivers/virtio/virtio_pci-common.h b/drivers/virtio/virtio_pci-common.h
new file mode 100644
index 0000000..8ff8c92
--- /dev/null
+++ b/drivers/virtio/virtio_pci-common.h
@@ -0,0 +1,76 @@
+#include <linux/pci.h>
+#include <linux/virtio_pci.h>
+
+/* Our device structure: shared by virtio_pci and virtio_pci_legacy. */
+struct virtio_pci_device {
+	struct virtio_device vdev;
+	struct pci_dev *pci_dev;
+
+	/* The IO mapping for the PCI config space (non-legacy mode) */
+	struct virtio_pci_common_cfg __iomem *common;
+	/* Device-specific data (non-legacy mode)  */
+	void __iomem *device;
+	/* Base of vq notifications (non-legacy mode). */
+	void __iomem *notify_base;
+
+	/* In legacy mode, these two point to within ->legacy. */
+	/* Where to read and clear interrupt */
+	u8 __iomem *isr;
+
+	/* So we can sanity-check accesses. */
+	size_t notify_len;
+	size_t device_len;
+
+	/* a list of queues so we can dispatch IRQs */
+	spinlock_t lock;
+	struct list_head virtqueues;
+
+	/* MSI-X support */
+	int msix_enabled;
+	int intx_enabled;
+	struct msix_entry *msix_entries;
+	cpumask_var_t *msix_affinity_masks;
+	/* Name strings for interrupts. This size should be enough,
+	 * and I'm too lazy to allocate each name separately. */
+	char (*msix_names)[256];
+	/* Number of available vectors */
+	unsigned msix_vectors;
+	/* Vectors allocated, excluding per-vq vectors if any */
+	unsigned msix_used_vectors;
+
+	/* Status saved during hibernate/restore */
+	u8 saved_status;
+
+	/* Whether we have vector per vq */
+	bool per_vq_vectors;
+
+#ifdef CONFIG_VIRTIO_PCI_LEGACY
+	/* Instead of common and device, legacy uses this: */
+	void __iomem *legacy;
+#endif
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues.  Thus, we need at least 2 vectors for MSI. */
+enum {
+	VP_MSIX_CONFIG_VECTOR = 0,
+	VP_MSIX_VQ_VECTOR = 1,
+};
+
+struct virtio_pci_vq_info {
+	/* the actual virtqueue */
+	struct virtqueue *vq;
+
+	/* the pages used for the queue. */
+	void *queue;
+
+	/* the list node for the virtqueues list */
+	struct list_head node;
+
+	/* Notify area for this vq. */
+	u16 __iomem *notify;
+
+	/* MSI-X vector (or none) */
+	unsigned msix_vector;
+};
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index d492361..340ab2e 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -22,79 +22,13 @@
 #include <linux/virtio_pci.h>
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
+#include "virtio_pci-common.h"
 
 MODULE_AUTHOR("Rusty Russell <rusty@rustcorp.com.au>");
 MODULE_DESCRIPTION("virtio-pci");
 MODULE_LICENSE("GPL");
 MODULE_VERSION("2");
 
-/* Our device structure */
-struct virtio_pci_device {
-	struct virtio_device vdev;
-	struct pci_dev *pci_dev;
-
-	/* The IO mapping for the PCI config space */
-	struct virtio_pci_common_cfg __iomem *common;
-	/* Where to read and clear interrupt */
-	u8 __iomem *isr;
-	/* Write the vq index here to notify device of activity. */
-	void __iomem *notify_base;
-	/* Device-specific data. */
-	void __iomem *device;
-
-	/* So we can sanity-check accesses. */
-	size_t notify_len;
-	size_t device_len;
-
-	/* a list of queues so we can dispatch IRQs */
-	spinlock_t lock;
-	struct list_head virtqueues;
-
-	/* MSI-X support */
-	int msix_enabled;
-	int intx_enabled;
-	struct msix_entry *msix_entries;
-	cpumask_var_t *msix_affinity_masks;
-	/* Name strings for interrupts. This size should be enough,
-	 * and I'm too lazy to allocate each name separately. */
-	char (*msix_names)[256];
-	/* Number of available vectors */
-	unsigned msix_vectors;
-	/* Vectors allocated, excluding per-vq vectors if any */
-	unsigned msix_used_vectors;
-
-	/* Status saved during hibernate/restore */
-	u8 saved_status;
-
-	/* Whether we have vector per vq */
-	bool per_vq_vectors;
-};
-
-/* Constants for MSI-X */
-/* Use first vector for configuration changes, second and the rest for
- * virtqueues Thus, we need at least 2 vectors for MSI. */
-enum {
-	VP_MSIX_CONFIG_VECTOR = 0,
-	VP_MSIX_VQ_VECTOR = 1,
-};
-
-struct virtio_pci_vq_info {
-	/* the actual virtqueue */
-	struct virtqueue *vq;
-
-	/* the pages used for the queue. */
-	void *queue;
-
-	/* the list node for the virtqueues list */
-	struct list_head node;
-
-	/* Notify area for this vq. */
-	u16 __iomem *notify;
-
-	/* MSI-X vector (or none) */
-	unsigned msix_vector;
-};
-
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
 static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
 	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 429f593..4c51965 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -25,6 +25,7 @@
 #include <linux/virtio_pci.h>
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
+#include "virtio_pci-common.h"
 
 static bool force_nonlegacy;
 module_param(force_nonlegacy, bool, 0644);
@@ -35,65 +36,6 @@ MODULE_DESCRIPTION("virtio-pci-legacy");
 MODULE_LICENSE("GPL");
 MODULE_VERSION("1");
 
-/* Our device structure */
-struct virtio_pci_device
-{
-	struct virtio_device vdev;
-	struct pci_dev *pci_dev;
-
-	/* the IO mapping for the PCI config space */
-	void __iomem *ioaddr;
-
-	/* a list of queues so we can dispatch IRQs */
-	spinlock_t lock;
-	struct list_head virtqueues;
-
-	/* MSI-X support */
-	int msix_enabled;
-	int intx_enabled;
-	struct msix_entry *msix_entries;
-	cpumask_var_t *msix_affinity_masks;
-	/* Name strings for interrupts. This size should be enough,
-	 * and I'm too lazy to allocate each name separately. */
-	char (*msix_names)[256];
-	/* Number of available vectors */
-	unsigned msix_vectors;
-	/* Vectors allocated, excluding per-vq vectors if any */
-	unsigned msix_used_vectors;
-
-	/* Status saved during hibernate/restore */
-	u8 saved_status;
-
-	/* Whether we have vector per vq */
-	bool per_vq_vectors;
-};
-
-/* Constants for MSI-X */
-/* Use first vector for configuration changes, second and the rest for
- * virtqueues Thus, we need at least 2 vectors for MSI. */
-enum {
-	VP_MSIX_CONFIG_VECTOR = 0,
-	VP_MSIX_VQ_VECTOR = 1,
-};
-
-struct virtio_pci_vq_info
-{
-	/* the actual virtqueue */
-	struct virtqueue *vq;
-
-	/* the virtual address of the ring queue */
-	void *queue;
-
-	/* the list node for the virtqueues list */
-	struct list_head node;
-
-	/* Notify area for this vq. */
-	u16 __iomem *notify;
-
-	/* MSI-X vector (or none) */
-	unsigned msix_vector;
-};
-
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
 static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
 	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
@@ -114,7 +56,7 @@ static u64 vp_get_features(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
 	/* We only support 32 feature bits. */
-	return ioread32(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_HOST_FEATURES);
+	return ioread32(vp_dev->legacy + VIRTIO_PCI_LEGACY_HOST_FEATURES);
 }
 
 /* virtio config->finalize_features() implementation */
@@ -127,7 +69,7 @@ static void vp_finalize_features(struct virtio_device *vdev)
 
 	/* We only support 32 feature bits. */
 	iowrite32(vdev->features,
-		  vp_dev->ioaddr + VIRTIO_PCI_LEGACY_GUEST_FEATURES);
+		  vp_dev->legacy + VIRTIO_PCI_LEGACY_GUEST_FEATURES);
 }
 
 /* Device config access: we use guest endian, as per spec. */
@@ -135,13 +77,13 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
 		   void *buf, unsigned len)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr +
+	void __iomem *legacy = vp_dev->legacy +
 				VIRTIO_PCI_LEGACY_CONFIG(vp_dev) + offset;
 	u8 *ptr = buf;
 	int i;
 
 	for (i = 0; i < len; i++)
-		ptr[i] = ioread8(ioaddr + i);
+		ptr[i] = ioread8(legacy + i);
 }
 
 #define VP_GETx(bits)							\
@@ -161,7 +103,7 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
 		   const void *buf, unsigned len)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr +
+	void __iomem *ioaddr = vp_dev->legacy +
 				VIRTIO_PCI_LEGACY_CONFIG(vp_dev) + offset;
 	const u8 *ptr = buf;
 	int i;
@@ -186,7 +128,7 @@ VP_SETx(64)
 static u8 vp_get_status(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	return ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
+	return ioread8(vp_dev->legacy + VIRTIO_PCI_LEGACY_STATUS);
 }
 
 static void vp_set_status(struct virtio_device *vdev, u8 status)
@@ -194,7 +136,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	/* We should never be setting status to 0. */
 	BUG_ON(status == 0);
-	iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
+	iowrite8(status, vp_dev->legacy + VIRTIO_PCI_LEGACY_STATUS);
 }
 
 /* wait for pending irq handlers */
@@ -214,10 +156,10 @@ static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	/* 0 status means a reset. */
-	iowrite8(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
+	iowrite8(0, vp_dev->legacy + VIRTIO_PCI_LEGACY_STATUS);
 	/* Flush out the status write, and flush in device writes,
 	 * including MSi-X interrupts, if any. */
-	ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_STATUS);
+	ioread8(vp_dev->legacy + VIRTIO_PCI_LEGACY_STATUS);
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
@@ -276,7 +218,7 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 
 	/* reading the ISR has the effect of also clearing it so it's very
 	 * important to save off the value. */
-	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_ISR);
+	isr = ioread8(vp_dev->legacy + VIRTIO_PCI_LEGACY_ISR);
 
 	/* It's definitely not us if the ISR was not high */
 	if (!isr)
@@ -309,9 +251,9 @@ static void vp_free_vectors(struct virtio_device *vdev)
 	if (vp_dev->msix_enabled) {
 		/* Disable the vector used for configuration */
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
+			  vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 		/* Flush the write out to device */
-		ioread16(vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
+		ioread16(vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 
 		pci_disable_msix(vp_dev->pci_dev);
 		vp_dev->msix_enabled = 0;
@@ -376,9 +318,9 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		goto error;
 	++vp_dev->msix_used_vectors;
 
-	iowrite16(v, vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
+	iowrite16(v, vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 	/* Verify we had enough resources to assign the vector */
-	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
+	v = ioread16(vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
 	if (v == VIRTIO_MSI_NO_VECTOR) {
 		err = -EBUSY;
 		goto error;
@@ -427,11 +369,11 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	int err;
 
 	/* Select the queue we're interested in */
-	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_SEL);
+	iowrite16(index, vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_SEL);
 
 	/* Check if queue is either not available or already active. */
-	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_NUM);
-	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN))
+	num = ioread16(vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_NUM);
+	if (!num || ioread32(vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_PFN))
 		return ERR_PTR(-ENOENT);
 
 	/* allocate and fill out our structure that represents an active
@@ -451,7 +393,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	/* activate the queue */
 	iowrite32(virt_to_phys(info->queue)>>VIRTIO_PCI_LEGACY_QUEUE_ADDR_SHIFT,
-		  vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
+		  vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 
 	/* create the vring */
 	vq = vring_new_virtqueue(index, num,
@@ -464,12 +406,12 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	vq->priv = info;
 	info->vq = vq;
-	info->notify = vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_NOTIFY;
+	info->notify = vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_NOTIFY;
 
 	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
 		iowrite16(msix_vec,
-			  vp_dev->ioaddr + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
-		msix_vec = ioread16(vp_dev->ioaddr
+			  vp_dev->legacy + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
+		msix_vec = ioread16(vp_dev->legacy
 				    + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
 		if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
 			err = -EBUSY;
@@ -490,7 +432,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 out_assign:
 	vring_del_virtqueue(vq);
 out_activate_queue:
-	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
+	iowrite32(0, vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 	free_pages_exact(info->queue, size);
 out_info:
 	kfree(info);
@@ -507,19 +449,19 @@ static void vp_del_vq(struct virtqueue *vq)
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-	iowrite16(vq->index, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_SEL);
+	iowrite16(vq->index, vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_SEL);
 
 	if (vp_dev->msix_enabled) {
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->ioaddr + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
+			  vp_dev->legacy + VIRTIO_MSI_LEGACY_QUEUE_VECTOR);
 		/* Flush the write out to device */
-		ioread8(vp_dev->ioaddr + VIRTIO_PCI_LEGACY_ISR);
+		ioread8(vp_dev->legacy + VIRTIO_PCI_LEGACY_ISR);
 	}
 
 	vring_del_virtqueue(vq);
 
 	/* Select and deactivate the queue */
-	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_LEGACY_QUEUE_PFN);
+	iowrite32(0, vp_dev->legacy + VIRTIO_PCI_LEGACY_QUEUE_PFN);
 
 	size = PAGE_ALIGN(vring_size(vq->vring.num,
 				     VIRTIO_PCI_LEGACY_VRING_ALIGN));
@@ -767,12 +709,20 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto out_enable_device;
 
-	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
-	if (vp_dev->ioaddr == NULL) {
+	vp_dev->legacy = pci_iomap(pci_dev, 0, 0);
+	if (vp_dev->legacy == NULL) {
 		err = -ENOMEM;
 		goto out_req_regions;
 	}
 
+	/* Not used for legacy virtio PCI */
+	vp_dev->common = NULL;
+	vp_dev->device = NULL;
+	vp_dev->notify_base = NULL;
+	vp_dev->notify_len = sizeof(u16);
+	/* Device config len actually depends on MSI-X: may overestimate */
+	vp_dev->device_len = pci_resource_len(pci_dev, 0) - 20;
+
 	pci_set_drvdata(pci_dev, vp_dev);
 	pci_set_master(pci_dev);
 
@@ -792,7 +742,7 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 
 out_set_drvdata:
 	pci_set_drvdata(pci_dev, NULL);
-	pci_iounmap(pci_dev, vp_dev->ioaddr);
+	pci_iounmap(pci_dev, vp_dev->legacy);
 out_req_regions:
 	pci_release_regions(pci_dev);
 out_enable_device:
@@ -810,7 +760,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 
 	vp_del_vqs(&vp_dev->vdev);
 	pci_set_drvdata(pci_dev, NULL);
-	pci_iounmap(pci_dev, vp_dev->ioaddr);
+	pci_iounmap(pci_dev, vp_dev->legacy);
 	pci_release_regions(pci_dev);
 	pci_disable_device(pci_dev);
 	kfree(vp_dev);
-- 
1.7.10.4


* [PATCH 19/22] virtio_pci: share interrupt/notify handlers between legacy and modern.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (17 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 18/22] virtio_pci: share structure between legacy and modern Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 20/22] virtio_pci: share virtqueue setup/teardown between modern and legacy driver Rusty Russell
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

If we make the legacy driver set up the ->isr pointer in the
struct virtio_pci_device structure, we can use that in common code
(the register offsets differ between the two layouts, but the
semantics haven't changed).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/Makefile            |    4 +-
 drivers/virtio/virtio_pci-common.c |   80 +++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_pci-common.h |   34 ++++++++++++++
 drivers/virtio/virtio_pci.c        |   84 +++-------------------------------
 drivers/virtio/virtio_pci_legacy.c |   87 ++++--------------------------------
 5 files changed, 131 insertions(+), 158 deletions(-)
 create mode 100644 drivers/virtio/virtio_pci-common.c

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index eec0a42..0f23411 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
-obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
-obj-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
+obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o virtio_pci-common.o
+obj-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o virtio_pci-common.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
diff --git a/drivers/virtio/virtio_pci-common.c b/drivers/virtio/virtio_pci-common.c
new file mode 100644
index 0000000..f6588c2
--- /dev/null
+++ b/drivers/virtio/virtio_pci-common.c
@@ -0,0 +1,80 @@
+/*
+ * Virtio PCI driver - common code for legacy and non-legacy.
+ *
+ * Copyright 2011, Rusty Russell IBM Corporation, but based on the
+ * older virtio_pci_legacy.c, which was Copyright IBM Corp. 2007.  But
+ * most of the interrupt setup code was written by Michael S. Tsirkin.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#define VIRTIO_PCI_NO_LEGACY
+#include "virtio_pci-common.h"
+#include <linux/virtio_ring.h>
+
+/* the notify function used when creating a virt queue */
+void virtio_pci_notify(struct virtqueue *vq)
+{
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	/* we write the queue's selector into the notification register to
+	 * signal the other end */
+	iowrite16(vq->index, info->notify);
+}
+
+/* Handle a configuration change: Tell driver if it wants to know. */
+irqreturn_t virtio_pci_config_changed(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_driver *drv;
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	if (drv->config_changed)
+		drv->config_changed(&vp_dev->vdev);
+	return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+irqreturn_t virtio_pci_vring_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	return ret;
+}
+
+/* A small wrapper to also acknowledge the interrupt when it's handled.
+ * I really need an EIO hook for the vring so I can ack the interrupt once we
+ * know that we'll be handling the IRQ but before we invoke the callback since
+ * the callback may notify the host which results in the host attempting to
+ * raise an interrupt that we would then mask once we acknowledged the
+ * interrupt. */
+irqreturn_t virtio_pci_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	u8 isr;
+
+	/* reading the ISR has the effect of also clearing it so it's very
+	 * important to save off the value. */
+	isr = ioread8(vp_dev->isr);
+
+	/* It's definitely not us if the ISR was not high */
+	if (!isr)
+		return IRQ_NONE;
+
+	/* Configuration change?  Tell driver if it wants to know. */
+	if (isr & VIRTIO_PCI_ISR_CONFIG)
+		virtio_pci_config_changed(irq, opaque);
+
+	return virtio_pci_vring_interrupt(irq, opaque);
+}
diff --git a/drivers/virtio/virtio_pci-common.h b/drivers/virtio/virtio_pci-common.h
index 8ff8c92..7dbc244 100644
--- a/drivers/virtio/virtio_pci-common.h
+++ b/drivers/virtio/virtio_pci-common.h
@@ -50,6 +50,12 @@ struct virtio_pci_device {
 #endif
 };
 
+/* Convert a generic virtio device to our structure */
+static inline struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
+{
+	return container_of(vdev, struct virtio_pci_device, vdev);
+}
+
 /* Constants for MSI-X */
 /* Use first vector for configuration changes, second and the rest for
  * virtqueues Thus, we need at least 2 vectors for MSI. */
@@ -74,3 +80,31 @@ struct virtio_pci_vq_info {
 	/* MSI-X vector (or none) */
 	unsigned msix_vector;
 };
+
+/* the notify function used when creating a virt queue */
+void virtio_pci_notify(struct virtqueue *vq);
+/* Handle a configuration change: Tell driver if it wants to know. */
+irqreturn_t virtio_pci_config_changed(int irq, void *opaque);
+/* Notify all virtqueues on an interrupt. */
+irqreturn_t virtio_pci_vring_interrupt(int irq, void *opaque);
+/* Acknowledge, check for config or vq interrupt. */
+irqreturn_t virtio_pci_interrupt(int irq, void *opaque);
+
+/* Core of a config->find_vqs() implementation */
+int virtio_pci_find_vqs(struct virtio_pci_device *vp_dev,
+			__le16 __iomem *msix_config,
+			struct virtqueue *(setup_vq)(struct virtio_pci_device *,
+						     unsigned,
+						     void (*)(struct virtqueue*),
+						     const char *,
+						     u16 msix_vec),
+			void (*del_vq)(struct virtqueue *vq),
+			unsigned nvqs,
+			struct virtqueue *vqs[],
+			vq_callback_t *callbacks[],
+			const char *names[]);
+
+/* the core of a config->del_vqs() implementation */
+void virtio_pci_del_vqs(struct virtio_pci_device *vp_dev,
+			__le16 __iomem *msix_config,
+			void (*del_vq)(struct virtqueue *vq));
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 340ab2e..f720421 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -37,12 +37,6 @@ static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
 
 MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
 
-/* Convert a generic virtio device to our structure */
-static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
-{
-	return container_of(vdev, struct virtio_pci_device, vdev);
-}
-
 /* There is no iowrite64.  We use two 32-bit ops. */
 static void iowrite64(u64 val, const __le64 *addr)
 {
@@ -178,73 +172,6 @@ static void vp_reset(struct virtio_device *vdev)
 	vp_synchronize_vectors(vdev);
 }
 
-/* the notify function used when creating a virt queue */
-static void vp_notify(struct virtqueue *vq)
-{
-	struct virtio_pci_vq_info *info = vq->priv;
-
-	/* we write the queue selector into the notification register
-	 * to signal the other end */
-	iowrite16(vq->index, info->notify);
-}
-
-/* Handle a configuration change: Tell driver if it wants to know. */
-static irqreturn_t vp_config_changed(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_driver *drv;
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	if (drv->config_changed)
-		drv->config_changed(&vp_dev->vdev);
-	return IRQ_HANDLED;
-}
-
-/* Notify all virtqueues on an interrupt. */
-static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_pci_vq_info *info;
-	irqreturn_t ret = IRQ_NONE;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vp_dev->lock, flags);
-	list_for_each_entry(info, &vp_dev->virtqueues, node) {
-		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-			ret = IRQ_HANDLED;
-	}
-	spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-	return ret;
-}
-
-/* A small wrapper to also acknowledge the interrupt when it's handled.
- * I really need an EIO hook for the vring so I can ack the interrupt once we
- * know that we'll be handling the IRQ but before we invoke the callback since
- * the callback may notify the host which results in the host attempting to
- * raise an interrupt that we would then mask once we acknowledged the
- * interrupt. */
-static irqreturn_t vp_interrupt(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	u8 isr;
-
-	/* reading the ISR has the effect of also clearing it so it's very
-	 * important to save off the value. */
-	isr = ioread8(vp_dev->isr);
-
-	/* It's definitely not us if the ISR was not high */
-	if (!isr)
-		return IRQ_NONE;
-
-	/* Configuration change?  Tell driver if it wants to know. */
-	if (isr & VIRTIO_PCI_ISR_CONFIG)
-		vp_config_changed(irq, opaque);
-
-	return vp_vring_interrupt(irq, opaque);
-}
-
 static void vp_free_vectors(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -325,7 +252,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 		 "%s-config", name);
 	err = request_irq(vp_dev->msix_entries[v].vector,
-			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  virtio_pci_config_changed, 0, vp_dev->msix_names[v],
 			  vp_dev);
 	if (err)
 		goto error;
@@ -345,8 +272,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 			 "%s-virtqueues", name);
 		err = request_irq(vp_dev->msix_entries[v].vector,
-				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
-				  vp_dev);
+				  virtio_pci_vring_interrupt, 0,
+				  vp_dev->msix_names[v], vp_dev);
 		if (err)
 			goto error;
 		++vp_dev->msix_used_vectors;
@@ -362,7 +289,7 @@ static int vp_request_intx(struct virtio_device *vdev)
 	int err;
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
+	err = request_irq(vp_dev->pci_dev->irq, virtio_pci_interrupt,
 			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
 	if (!err)
 		vp_dev->intx_enabled = 1;
@@ -452,7 +379,8 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	/* create the vring */
 	vq = vring_new_virtqueue(index, num, SMP_CACHE_BYTES, vdev,
-				 true, info->queue, vp_notify, callback, name);
+				 true, info->queue, virtio_pci_notify,
+				 callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto out_alloc_pages;
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 4c51965..0c604c7 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -44,12 +44,6 @@ static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
 
 MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
 
-/* Convert a generic virtio device to our structure */
-static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
-{
-	return container_of(vdev, struct virtio_pci_device, vdev);
-}
-
 /* virtio config->get_features() implementation */
 static u64 vp_get_features(struct virtio_device *vdev)
 {
@@ -164,73 +158,6 @@ static void vp_reset(struct virtio_device *vdev)
 	vp_synchronize_vectors(vdev);
 }
 
-/* the notify function used when creating a virt queue */
-static void vp_notify(struct virtqueue *vq)
-{
-	struct virtio_pci_vq_info *info = vq->priv;
-
-	/* we write the queue selector into the notification register
-	 * to signal the other end */
-	iowrite16(vq->index, info->notify);
-}
-
-/* Handle a configuration change: Tell driver if it wants to know. */
-static irqreturn_t vp_config_changed(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_driver *drv;
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	if (drv && drv->config_changed)
-		drv->config_changed(&vp_dev->vdev);
-	return IRQ_HANDLED;
-}
-
-/* Notify all virtqueues on an interrupt. */
-static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_pci_vq_info *info;
-	irqreturn_t ret = IRQ_NONE;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vp_dev->lock, flags);
-	list_for_each_entry(info, &vp_dev->virtqueues, node) {
-		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-			ret = IRQ_HANDLED;
-	}
-	spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-	return ret;
-}
-
-/* A small wrapper to also acknowledge the interrupt when it's handled.
- * I really need an EIO hook for the vring so I can ack the interrupt once we
- * know that we'll be handling the IRQ but before we invoke the callback since
- * the callback may notify the host which results in the host attempting to
- * raise an interrupt that we would then mask once we acknowledged the
- * interrupt. */
-static irqreturn_t vp_interrupt(int irq, void *opaque)
-{
-	struct virtio_pci_device *vp_dev = opaque;
-	u8 isr;
-
-	/* reading the ISR has the effect of also clearing it so it's very
-	 * important to save off the value. */
-	isr = ioread8(vp_dev->legacy + VIRTIO_PCI_LEGACY_ISR);
-
-	/* It's definitely not us if the ISR was not high */
-	if (!isr)
-		return IRQ_NONE;
-
-	/* Configuration change?  Tell driver if it wants to know. */
-	if (isr & VIRTIO_PCI_ISR_CONFIG)
-		vp_config_changed(irq, opaque);
-
-	return vp_vring_interrupt(irq, opaque);
-}
-
 static void vp_free_vectors(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -312,7 +239,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 		 "%s-config", name);
 	err = request_irq(vp_dev->msix_entries[v].vector,
-			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  virtio_pci_config_changed, 0, vp_dev->msix_names[v],
 			  vp_dev);
 	if (err)
 		goto error;
@@ -332,8 +259,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 			 "%s-virtqueues", name);
 		err = request_irq(vp_dev->msix_entries[v].vector,
-				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
-				  vp_dev);
+				  virtio_pci_vring_interrupt, 0,
+				  vp_dev->msix_names[v], vp_dev);
 		if (err)
 			goto error;
 		++vp_dev->msix_used_vectors;
@@ -349,7 +276,7 @@ static int vp_request_intx(struct virtio_device *vdev)
 	int err;
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
+	err = request_irq(vp_dev->pci_dev->irq, virtio_pci_interrupt,
 			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
 	if (!err)
 		vp_dev->intx_enabled = 1;
@@ -398,7 +325,8 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	/* create the vring */
 	vq = vring_new_virtqueue(index, num,
 				 VIRTIO_PCI_LEGACY_VRING_ALIGN, vdev,
-				 true, info->queue, vp_notify, callback, name);
+				 true, info->queue, virtio_pci_notify,
+				 callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto out_activate_queue;
@@ -723,6 +651,9 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	/* Device config len actually depends on MSI-X: may overestimate */
 	vp_dev->device_len = pci_resource_len(pci_dev, 0) - 20;
 
+	/* Setting this lets us share interrupt handlers with virtio_pci */
+	vp_dev->isr = vp_dev->legacy + VIRTIO_PCI_LEGACY_ISR;
+
 	pci_set_drvdata(pci_dev, vp_dev);
 	pci_set_master(pci_dev);
 
-- 
1.7.10.4


* [PATCH 20/22] virtio_pci: share virtqueue setup/teardown between modern and legacy driver.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (18 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 19/22] virtio_pci: share interrupt/notify handlers " Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 21/22] virtio_pci: simplify common helpers Rusty Russell
  2013-03-21  8:29 ` [PATCH 22/22] virtio_pci: fix finalize_features in modern driver Rusty Russell
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

There's a great deal of generic work in setting up and disabling
interrupts, particularly with MSI-X.  So we move most of that work
out to helpers which take the location of the msix_config register,
plus setup_vq and del_vq functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci-common.c |  349 ++++++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_pci-common.h |   32 ++--
 drivers/virtio/virtio_pci.c        |  330 +---------------------------------
 drivers/virtio/virtio_pci_legacy.c |  340 ++---------------------------------
 4 files changed, 396 insertions(+), 655 deletions(-)

diff --git a/drivers/virtio/virtio_pci-common.c b/drivers/virtio/virtio_pci-common.c
index f6588c2..837d34b 100644
--- a/drivers/virtio/virtio_pci-common.c
+++ b/drivers/virtio/virtio_pci-common.c
@@ -11,6 +11,7 @@
 #define VIRTIO_PCI_NO_LEGACY
 #include "virtio_pci-common.h"
 #include <linux/virtio_ring.h>
+#include <linux/interrupt.h>
 
 /* the notify function used when creating a virt queue */
 void virtio_pci_notify(struct virtqueue *vq)
@@ -78,3 +79,351 @@ irqreturn_t virtio_pci_interrupt(int irq, void *opaque)
 
 	return virtio_pci_vring_interrupt(irq, opaque);
 }
+
+/* wait for pending irq handlers */
+void virtio_pci_synchronize_vectors(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled)
+		synchronize_irq(vp_dev->pci_dev->irq);
+
+	for (i = 0; i < vp_dev->msix_vectors; ++i)
+		synchronize_irq(vp_dev->msix_entries[i].vector);
+}
+
+static void vp_free_vectors(struct virtio_device *vdev,
+			    __le16 __iomem *msix_config)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled) {
+		free_irq(vp_dev->pci_dev->irq, vp_dev);
+		vp_dev->intx_enabled = 0;
+	}
+
+	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
+		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+	for (i = 0; i < vp_dev->msix_vectors; i++)
+		if (vp_dev->msix_affinity_masks[i])
+			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
+
+	if (vp_dev->msix_enabled) {
+		/* Disable the vector used for configuration */
+		iowrite16(VIRTIO_MSI_NO_VECTOR, msix_config);
+		/* Flush the write out to device */
+		ioread16(msix_config);
+
+		pci_disable_msix(vp_dev->pci_dev);
+		vp_dev->msix_enabled = 0;
+		vp_dev->msix_vectors = 0;
+	}
+
+	vp_dev->msix_used_vectors = 0;
+	kfree(vp_dev->msix_names);
+	vp_dev->msix_names = NULL;
+	kfree(vp_dev->msix_entries);
+	vp_dev->msix_entries = NULL;
+	kfree(vp_dev->msix_affinity_masks);
+	vp_dev->msix_affinity_masks = NULL;
+}
+
+static int vp_request_msix_vectors(struct virtio_device *vdev,
+				   int nvectors,
+				   __le16 __iomem *msix_config,
+				   bool per_vq_vectors)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	const char *name = dev_name(&vp_dev->vdev.dev);
+	unsigned i, v;
+	int err = -ENOMEM;
+
+	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
+				       GFP_KERNEL);
+	if (!vp_dev->msix_entries)
+		goto error;
+	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
+				     GFP_KERNEL);
+	if (!vp_dev->msix_names)
+		goto error;
+	vp_dev->msix_affinity_masks
+		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
+			  GFP_KERNEL);
+	if (!vp_dev->msix_affinity_masks)
+		goto error;
+	for (i = 0; i < nvectors; ++i)
+		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
+					GFP_KERNEL))
+			goto error;
+
+	for (i = 0; i < nvectors; ++i)
+		vp_dev->msix_entries[i].entry = i;
+
+	/* pci_enable_msix returns positive if we can't get this many. */
+	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
+	if (err > 0)
+		err = -ENOSPC;
+	if (err)
+		goto error;
+	vp_dev->msix_vectors = nvectors;
+	vp_dev->msix_enabled = 1;
+
+	/* Set the vector used for configuration */
+	v = vp_dev->msix_used_vectors;
+	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
+		 "%s-config", name);
+	err = request_irq(vp_dev->msix_entries[v].vector,
+			  virtio_pci_config_changed, 0, vp_dev->msix_names[v],
+			  vp_dev);
+	if (err)
+		goto error;
+	++vp_dev->msix_used_vectors;
+
+	iowrite16(v, msix_config);
+	/* Verify we had enough resources to assign the vector */
+	v = ioread16(msix_config);
+	if (v == VIRTIO_MSI_NO_VECTOR) {
+		err = -EBUSY;
+		goto error;
+	}
+
+	if (!per_vq_vectors) {
+		/* Shared vector for all VQs */
+		v = vp_dev->msix_used_vectors;
+		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
+			 "%s-virtqueues", name);
+		err = request_irq(vp_dev->msix_entries[v].vector,
+				  virtio_pci_vring_interrupt, 0,
+				  vp_dev->msix_names[v], vp_dev);
+		if (err)
+			goto error;
+		++vp_dev->msix_used_vectors;
+	}
+	return 0;
+error:
+	vp_free_vectors(vdev, msix_config);
+	return err;
+}
+
+static int vp_request_intx(struct virtio_device *vdev)
+{
+	int err;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	err = request_irq(vp_dev->pci_dev->irq, virtio_pci_interrupt,
+			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
+	if (!err)
+		vp_dev->intx_enabled = 1;
+	return err;
+}
+
+static int vp_try_to_find_vqs(struct virtio_device *vdev,
+			      unsigned nvqs,
+			      struct virtqueue *vqs[],
+			      vq_callback_t *callbacks[],
+			      const char *names[],
+			      bool use_msix,
+			      bool per_vq_vectors,
+			      __le16 __iomem *msix_config,
+			      virtio_pci_setup_vq_fn *setup_vq,
+			      void (*del_vq)(struct virtqueue *vq))
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	u16 msix_vec;
+	int i, err, nvectors, allocated_vectors;
+
+	if (!use_msix) {
+		/* Old style: one normal interrupt for change and all vqs. */
+		err = vp_request_intx(vdev);
+		if (err)
+			goto error_request;
+	} else {
+		if (per_vq_vectors) {
+			/* Best option: one for change interrupt, one per vq. */
+			nvectors = 1;
+			for (i = 0; i < nvqs; ++i)
+				if (callbacks[i])
+					++nvectors;
+		} else {
+			/* Second best: one for change, shared for all vqs. */
+			nvectors = 2;
+		}
+
+		err = vp_request_msix_vectors(vdev, nvectors, 
+					      msix_config, per_vq_vectors);
+		if (err)
+			goto error_request;
+	}
+
+	vp_dev->per_vq_vectors = per_vq_vectors;
+	allocated_vectors = vp_dev->msix_used_vectors;
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		} else if (!callbacks[i] || !vp_dev->msix_enabled)
+			msix_vec = VIRTIO_MSI_NO_VECTOR;
+		else if (vp_dev->per_vq_vectors)
+			msix_vec = allocated_vectors++;
+		else
+			msix_vec = VP_MSIX_VQ_VECTOR;
+		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i],
+				  msix_vec);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto error_find;
+		}
+
+		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
+			continue;
+
+		/* allocate per-vq irq if available and necessary */
+		snprintf(vp_dev->msix_names[msix_vec],
+			 sizeof *vp_dev->msix_names,
+			 "%s-%s",
+			 dev_name(&vp_dev->vdev.dev), names[i]);
+		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
+				  vring_interrupt, 0,
+				  vp_dev->msix_names[msix_vec],
+				  vqs[i]);
+		if (err) {
+			del_vq(vqs[i]);
+			goto error_find;
+		}
+	}
+	return 0;
+
+error_find:
+	virtio_pci_del_vqs(vdev, msix_config, del_vq);
+
+error_request:
+	return err;
+}
+
+int virtio_pci_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+			struct virtqueue *vqs[],
+			vq_callback_t *callbacks[],
+			const char *names[],
+			__le16 __iomem *msix_config,
+			virtio_pci_setup_vq_fn *setup_vq,
+			void (*del_vq)(struct virtqueue *vq))
+{
+	int err;
+
+	/* Try MSI-X with one vector per queue. */
+	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				 true, true, msix_config, setup_vq, del_vq);
+	if (!err)
+		return 0;
+	/* Fallback: MSI-X with one vector for config, one shared for queues. */
+	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				 true, false, msix_config, setup_vq, del_vq);
+	if (!err)
+		return 0;
+	/* Finally fall back to regular interrupts. */
+	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				  false, false, msix_config, setup_vq, del_vq);
+}
+
+void virtio_pci_del_vqs(struct virtio_device *vdev,
+			__le16 __iomem *msix_config,
+			void (*del_vq)(struct virtqueue *vq))
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtqueue *vq, *n;
+	struct virtio_pci_vq_info *info;
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+		info = vq->priv;
+		if (vp_dev->per_vq_vectors &&
+			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
+			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
+				 vq);
+		del_vq(vq);
+	}
+	vp_dev->per_vq_vectors = false;
+
+	vp_free_vectors(vdev, msix_config);
+}
+
+/* Setup the affinity for a virtqueue:
+ * - force the affinity for per vq vector
+ * - OR over all affinities for shared MSI
+ * - ignore the affinity request if we're using INTX
+ */
+int virtio_pci_set_vq_affinity(struct virtqueue *vq, int cpu)
+{
+	struct virtio_device *vdev = vq->vdev;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+	struct cpumask *mask;
+	unsigned int irq;
+
+	if (!vq->callback)
+		return -EINVAL;
+
+	if (vp_dev->msix_enabled) {
+		mask = vp_dev->msix_affinity_masks[info->msix_vector];
+		irq = vp_dev->msix_entries[info->msix_vector].vector;
+		if (cpu == -1)
+			irq_set_affinity_hint(irq, NULL);
+		else {
+			cpumask_set_cpu(cpu, mask);
+			irq_set_affinity_hint(irq, mask);
+		}
+	}
+	return 0;
+}
+
+#ifdef CONFIG_PM
+int virtio_pci_freeze(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = 0;
+	vp_dev->saved_status = vp_dev->vdev.config->get_status(&vp_dev->vdev);
+	if (drv && drv->freeze)
+		ret = drv->freeze(&vp_dev->vdev);
+
+	if (!ret)
+		pci_disable_device(pci_dev);
+	return ret;
+}
+
+int virtio_pci_restore(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = pci_enable_device(pci_dev);
+	if (ret)
+		return ret;
+
+	pci_set_master(pci_dev);
+	vp_dev->vdev.config->finalize_features(&vp_dev->vdev);
+
+	if (drv && drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	/* Finally, tell the device we're all set */
+	if (!ret)
+		vp_dev->vdev.config->set_status(&vp_dev->vdev,
+						vp_dev->saved_status);
+
+	return ret;
+}
+#endif
diff --git a/drivers/virtio/virtio_pci-common.h b/drivers/virtio/virtio_pci-common.h
index 7dbc244..2c4d890 100644
--- a/drivers/virtio/virtio_pci-common.h
+++ b/drivers/virtio/virtio_pci-common.h
@@ -90,21 +90,33 @@ irqreturn_t virtio_pci_vring_interrupt(int irq, void *opaque);
 /* Acknowledge, check for config or vq interrupt. */
 irqreturn_t virtio_pci_interrupt(int irq, void *opaque);
 
+typedef struct virtqueue *virtio_pci_setup_vq_fn(struct virtio_device *,
+						 unsigned index,
+						 void (*callback)
+							(struct virtqueue *),
+						 const char *name,
+						 u16 msix_vec);
+
 /* Core of a config->find_vqs() implementation */
-int virtio_pci_find_vqs(struct virtio_pci_device *vp_dev,
-			__le16 __iomem *msix_config,
-			struct virtqueue *(setup_vq)(struct virtio_pci_device *,
-						     unsigned,
-						     void (*)(struct virtqueue*),
-						     const char *,
-						     u16 msix_vec),
-			void (*del_vq)(struct virtqueue *vq),
+int virtio_pci_find_vqs(struct virtio_device *vdev,
 			unsigned nvqs,
 			struct virtqueue *vqs[],
 			vq_callback_t *callbacks[],
-			const char *names[]);
+			const char *names[],
+			__le16 __iomem *msix_config,
+			virtio_pci_setup_vq_fn *setup_vq,
+			void (*del_vq)(struct virtqueue *vq));
 
 /* the core of a config->del_vqs() implementation */
-void virtio_pci_del_vqs(struct virtio_pci_device *vp_dev,
+void virtio_pci_del_vqs(struct virtio_device *vdev,
 			__le16 __iomem *msix_config,
 			void (*del_vq)(struct virtqueue *vq));
+
+void virtio_pci_synchronize_vectors(struct virtio_device *vdev);
+
+int virtio_pci_set_vq_affinity(struct virtqueue *vq, int cpu);
+
+#ifdef CONFIG_PM
+int virtio_pci_freeze(struct device *dev);
+int virtio_pci_restore(struct device *dev);
+#endif
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index f720421..937fae7 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -147,19 +147,6 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	iowrite8(status, &vp_dev->common->device_status);
 }
 
-/* wait for pending irq handlers */
-static void vp_synchronize_vectors(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	int i;
-
-	if (vp_dev->intx_enabled)
-		synchronize_irq(vp_dev->pci_dev->irq);
-
-	for (i = 0; i < vp_dev->msix_vectors; ++i)
-		synchronize_irq(vp_dev->msix_entries[i].vector);
-}
-
 static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -169,131 +156,7 @@ static void vp_reset(struct virtio_device *vdev)
 	 * including MSI-X interrupts, if any. */
 	ioread8(&vp_dev->common->device_status);
 	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
-}
-
-static void vp_free_vectors(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	int i;
-
-	if (vp_dev->intx_enabled) {
-		free_irq(vp_dev->pci_dev->irq, vp_dev);
-		vp_dev->intx_enabled = 0;
-	}
-
-	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
-		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
-
-	for (i = 0; i < vp_dev->msix_vectors; i++)
-		if (vp_dev->msix_affinity_masks[i])
-			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
-
-	if (vp_dev->msix_enabled) {
-		/* Disable the vector used for configuration */
-		iowrite16(VIRTIO_MSI_NO_VECTOR, &vp_dev->common->msix_config);
-		/* Flush the write out to device */
-		ioread16(&vp_dev->common->msix_config);
-
-		pci_disable_msix(vp_dev->pci_dev);
-		vp_dev->msix_enabled = 0;
-		vp_dev->msix_vectors = 0;
-	}
-
-	vp_dev->msix_used_vectors = 0;
-	kfree(vp_dev->msix_names);
-	vp_dev->msix_names = NULL;
-	kfree(vp_dev->msix_entries);
-	vp_dev->msix_entries = NULL;
-	kfree(vp_dev->msix_affinity_masks);
-	vp_dev->msix_affinity_masks = NULL;
-}
-
-static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
-				   bool per_vq_vectors)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	const char *name = dev_name(&vp_dev->vdev.dev);
-	unsigned i, v;
-	int err = -ENOMEM;
-
-	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
-				       GFP_KERNEL);
-	if (!vp_dev->msix_entries)
-		goto error;
-	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
-				     GFP_KERNEL);
-	if (!vp_dev->msix_names)
-		goto error;
-	vp_dev->msix_affinity_masks
-		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
-			  GFP_KERNEL);
-	if (!vp_dev->msix_affinity_masks)
-		goto error;
-	for (i = 0; i < nvectors; ++i)
-		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
-					GFP_KERNEL))
-			goto error;
-
-	for (i = 0; i < nvectors; ++i)
-		vp_dev->msix_entries[i].entry = i;
-
-	/* pci_enable_msix returns positive if we can't get this many. */
-	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
-	if (err > 0)
-		err = -ENOSPC;
-	if (err)
-		goto error;
-	vp_dev->msix_vectors = nvectors;
-	vp_dev->msix_enabled = 1;
-
-	/* Set the vector used for configuration */
-	v = vp_dev->msix_used_vectors;
-	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
-		 "%s-config", name);
-	err = request_irq(vp_dev->msix_entries[v].vector,
-			  virtio_pci_config_changed, 0, vp_dev->msix_names[v],
-			  vp_dev);
-	if (err)
-		goto error;
-	++vp_dev->msix_used_vectors;
-
-	iowrite16(v, &vp_dev->common->msix_config);
-	/* Verify we had enough resources to assign the vector */
-	v = ioread16(&vp_dev->common->msix_config);
-	if (v == VIRTIO_MSI_NO_VECTOR) {
-		err = -EBUSY;
-		goto error;
-	}
-
-	if (!per_vq_vectors) {
-		/* Shared vector for all VQs */
-		v = vp_dev->msix_used_vectors;
-		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
-			 "%s-virtqueues", name);
-		err = request_irq(vp_dev->msix_entries[v].vector,
-				  virtio_pci_vring_interrupt, 0,
-				  vp_dev->msix_names[v], vp_dev);
-		if (err)
-			goto error;
-		++vp_dev->msix_used_vectors;
-	}
-	return 0;
-error:
-	vp_free_vectors(vdev);
-	return err;
-}
-
-static int vp_request_intx(struct virtio_device *vdev)
-{
-	int err;
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-
-	err = request_irq(vp_dev->pci_dev->irq, virtio_pci_interrupt,
-			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
-	if (!err)
-		vp_dev->intx_enabled = 1;
-	return err;
+	virtio_pci_synchronize_vectors(vdev);
 }
 
 static size_t vring_pci_size(u16 num)
@@ -448,7 +311,7 @@ static void vp_vq_disable(struct virtio_pci_device *vp_dev,
 	}
 }
 
-static void vp_del_vq(struct virtqueue *vq)
+static void del_vq(struct virtqueue *vq)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
 	struct virtio_pci_vq_info *info = vq->priv;
@@ -487,97 +350,7 @@ static void vp_del_vq(struct virtqueue *vq)
 static void vp_del_vqs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtqueue *vq, *n;
-	struct virtio_pci_vq_info *info;
-
-	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
-		info = vq->priv;
-		if (vp_dev->per_vq_vectors &&
-			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
-			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
-				 vq);
-		vp_del_vq(vq);
-	}
-	vp_dev->per_vq_vectors = false;
-
-	vp_free_vectors(vdev);
-}
-
-static int vp_try_to_find_vqs(struct virtio_device *vdev, unsigned nvqs,
-			      struct virtqueue *vqs[],
-			      vq_callback_t *callbacks[],
-			      const char *names[],
-			      bool use_msix,
-			      bool per_vq_vectors)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	u16 msix_vec;
-	int i, err, nvectors, allocated_vectors;
-
-	if (!use_msix) {
-		/* Old style: one normal interrupt for change and all vqs. */
-		err = vp_request_intx(vdev);
-		if (err)
-			goto error_request;
-	} else {
-		if (per_vq_vectors) {
-			/* Best option: one for change interrupt, one per vq. */
-			nvectors = 1;
-			for (i = 0; i < nvqs; ++i)
-				if (callbacks[i])
-					++nvectors;
-		} else {
-			/* Second best: one for change, shared for all vqs. */
-			nvectors = 2;
-		}
-
-		err = vp_request_msix_vectors(vdev, nvectors, per_vq_vectors);
-		if (err)
-			goto error_request;
-	}
-
-	vp_dev->per_vq_vectors = per_vq_vectors;
-	allocated_vectors = vp_dev->msix_used_vectors;
-	for (i = 0; i < nvqs; ++i) {
-		if (!names[i]) {
-			vqs[i] = NULL;
-			continue;
-		} else if (!callbacks[i] || !vp_dev->msix_enabled)
-			msix_vec = VIRTIO_MSI_NO_VECTOR;
-		else if (vp_dev->per_vq_vectors)
-			msix_vec = allocated_vectors++;
-		else
-			msix_vec = VP_MSIX_VQ_VECTOR;
-		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i], msix_vec);
-		if (IS_ERR(vqs[i])) {
-			err = PTR_ERR(vqs[i]);
-			goto error_find;
-		}
-
-		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
-			continue;
-
-		/* allocate per-vq irq if available and necessary */
-		snprintf(vp_dev->msix_names[msix_vec],
-			 sizeof *vp_dev->msix_names,
-			 "%s-%s",
-			 dev_name(&vp_dev->vdev.dev), names[i]);
-		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
-				  vring_interrupt, 0,
-				  vp_dev->msix_names[msix_vec],
-				  vqs[i]);
-		if (err) {
-			vp_del_vq(vqs[i]);
-			goto error_find;
-		}
-	}
-	return 0;
-
-error_find:
-	vp_del_vqs(vdev);
-
-error_request:
-	return err;
+	virtio_pci_del_vqs(vdev, &vp_dev->common->msix_config, del_vq);
 }
 
 /* the config->find_vqs() implementation */
@@ -586,56 +359,18 @@ static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		       vq_callback_t *callbacks[],
 		       const char *names[])
 {
-	int err;
-
-	/* Try MSI-X with one vector per queue. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names, true, true);
-	if (!err)
-		return 0;
-	/* Fallback: MSI-X with one vector for config, one shared for queues. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
-				 true, false);
-	if (!err)
-		return 0;
-	/* Finally fall back to regular interrupts. */
-	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
-				  false, false);
-}
-
-static const char *vp_bus_name(struct virtio_device *vdev)
-{
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	return pci_name(vp_dev->pci_dev);
+	return virtio_pci_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				   &vp_dev->common->msix_config,
+				   setup_vq, del_vq);
 }
 
-/* Setup the affinity for a virtqueue:
- * - force the affinity for per vq vector
- * - OR over all affinities for shared MSI
- * - ignore the affinity request if we're using INTX
- */
-static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
+static const char *vp_bus_name(struct virtio_device *vdev)
 {
-	struct virtio_device *vdev = vq->vdev;
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtio_pci_vq_info *info = vq->priv;
-	struct cpumask *mask;
-	unsigned int irq;
 
-	if (!vq->callback)
-		return -EINVAL;
-
-	if (vp_dev->msix_enabled) {
-		mask = vp_dev->msix_affinity_masks[info->msix_vector];
-		irq = vp_dev->msix_entries[info->msix_vector].vector;
-		if (cpu == -1)
-			irq_set_affinity_hint(irq, NULL);
-		else {
-			cpumask_set_cpu(cpu, mask);
-			irq_set_affinity_hint(irq, mask);
-		}
-	}
-	return 0;
+	return pci_name(vp_dev->pci_dev);
 }
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -655,7 +390,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
-	.set_vq_affinity = vp_set_vq_affinity,
+	.set_vq_affinity = virtio_pci_set_vq_affinity,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
@@ -836,53 +571,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 }
 
 #ifdef CONFIG_PM
-static int virtio_pci_freeze(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = 0;
-	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
-	if (drv && drv->freeze)
-		ret = drv->freeze(&vp_dev->vdev);
-
-	if (!ret)
-		pci_disable_device(pci_dev);
-	return ret;
-}
-
-static int virtio_pci_restore(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = pci_enable_device(pci_dev);
-	if (ret)
-		return ret;
-
-	pci_set_master(pci_dev);
-	vp_finalize_features(&vp_dev->vdev);
-
-	if (drv && drv->restore)
-		ret = drv->restore(&vp_dev->vdev);
-
-	/* Finally, tell the device we're all set */
-	if (!ret)
-		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
-
-	return ret;
-}
-
 static const struct dev_pm_ops virtio_pci_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
 };
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 0c604c7..5ab05c3 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -133,19 +133,6 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	iowrite8(status, vp_dev->legacy + VIRTIO_PCI_LEGACY_STATUS);
 }
 
-/* wait for pending irq handlers */
-static void vp_synchronize_vectors(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	int i;
-
-	if (vp_dev->intx_enabled)
-		synchronize_irq(vp_dev->pci_dev->irq);
-
-	for (i = 0; i < vp_dev->msix_vectors; ++i)
-		synchronize_irq(vp_dev->msix_entries[i].vector);
-}
-
 static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -155,138 +142,14 @@ static void vp_reset(struct virtio_device *vdev)
 	 * including MSI-X interrupts, if any. */
 	ioread8(vp_dev->legacy + VIRTIO_PCI_LEGACY_STATUS);
 	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
-}
-
-static void vp_free_vectors(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	int i;
-
-	if (vp_dev->intx_enabled) {
-		free_irq(vp_dev->pci_dev->irq, vp_dev);
-		vp_dev->intx_enabled = 0;
-	}
-
-	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
-		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
-
-	for (i = 0; i < vp_dev->msix_vectors; i++)
-		if (vp_dev->msix_affinity_masks[i])
-			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
-
-	if (vp_dev->msix_enabled) {
-		/* Disable the vector used for configuration */
-		iowrite16(VIRTIO_MSI_NO_VECTOR,
-			  vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
-		/* Flush the write out to device */
-		ioread16(vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
-
-		pci_disable_msix(vp_dev->pci_dev);
-		vp_dev->msix_enabled = 0;
-		vp_dev->msix_vectors = 0;
-	}
-
-	vp_dev->msix_used_vectors = 0;
-	kfree(vp_dev->msix_names);
-	vp_dev->msix_names = NULL;
-	kfree(vp_dev->msix_entries);
-	vp_dev->msix_entries = NULL;
-	kfree(vp_dev->msix_affinity_masks);
-	vp_dev->msix_affinity_masks = NULL;
+	virtio_pci_synchronize_vectors(vdev);
 }
 
-static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
-				   bool per_vq_vectors)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	const char *name = dev_name(&vp_dev->vdev.dev);
-	unsigned i, v;
-	int err = -ENOMEM;
-
-	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
-				       GFP_KERNEL);
-	if (!vp_dev->msix_entries)
-		goto error;
-	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
-				     GFP_KERNEL);
-	if (!vp_dev->msix_names)
-		goto error;
-	vp_dev->msix_affinity_masks
-		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
-			  GFP_KERNEL);
-	if (!vp_dev->msix_affinity_masks)
-		goto error;
-	for (i = 0; i < nvectors; ++i)
-		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
-					GFP_KERNEL))
-			goto error;
-
-	for (i = 0; i < nvectors; ++i)
-		vp_dev->msix_entries[i].entry = i;
-
-	/* pci_enable_msix returns positive if we can't get this many. */
-	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
-	if (err > 0)
-		err = -ENOSPC;
-	if (err)
-		goto error;
-	vp_dev->msix_vectors = nvectors;
-	vp_dev->msix_enabled = 1;
-
-	/* Set the vector used for configuration */
-	v = vp_dev->msix_used_vectors;
-	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
-		 "%s-config", name);
-	err = request_irq(vp_dev->msix_entries[v].vector,
-			  virtio_pci_config_changed, 0, vp_dev->msix_names[v],
-			  vp_dev);
-	if (err)
-		goto error;
-	++vp_dev->msix_used_vectors;
-
-	iowrite16(v, vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
-	/* Verify we had enough resources to assign the vector */
-	v = ioread16(vp_dev->legacy + VIRTIO_MSI_LEGACY_CONFIG_VECTOR);
-	if (v == VIRTIO_MSI_NO_VECTOR) {
-		err = -EBUSY;
-		goto error;
-	}
-
-	if (!per_vq_vectors) {
-		/* Shared vector for all VQs */
-		v = vp_dev->msix_used_vectors;
-		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
-			 "%s-virtqueues", name);
-		err = request_irq(vp_dev->msix_entries[v].vector,
-				  virtio_pci_vring_interrupt, 0,
-				  vp_dev->msix_names[v], vp_dev);
-		if (err)
-			goto error;
-		++vp_dev->msix_used_vectors;
-	}
-	return 0;
-error:
-	vp_free_vectors(vdev);
-	return err;
-}
-
-static int vp_request_intx(struct virtio_device *vdev)
-{
-	int err;
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-
-	err = request_irq(vp_dev->pci_dev->irq, virtio_pci_interrupt,
-			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
-	if (!err)
-		vp_dev->intx_enabled = 1;
-	return err;
-}
-
-static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
-				  void (*callback)(struct virtqueue *vq),
-				  const char *name,
-				  u16 msix_vec)
+static struct virtqueue *setup_legacy_vq(struct virtio_device *vdev,
+					 unsigned index,
+					 void (*callback)(struct virtqueue *vq),
+					 const char *name,
+					 u16 msix_vec)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtio_pci_vq_info *info;
@@ -367,7 +230,7 @@ out_info:
 	return ERR_PTR(err);
 }
 
-static void vp_del_vq(struct virtqueue *vq)
+static void del_legacy_vq(struct virtqueue *vq)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
 	struct virtio_pci_vq_info *info = vq->priv;
@@ -401,97 +264,10 @@ static void vp_del_vq(struct virtqueue *vq)
 static void vp_del_vqs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtqueue *vq, *n;
-	struct virtio_pci_vq_info *info;
-
-	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
-		info = vq->priv;
-		if (vp_dev->per_vq_vectors &&
-			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
-			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
-				 vq);
-		vp_del_vq(vq);
-	}
-	vp_dev->per_vq_vectors = false;
-
-	vp_free_vectors(vdev);
-}
-
-static int vp_try_to_find_vqs(struct virtio_device *vdev, unsigned nvqs,
-			      struct virtqueue *vqs[],
-			      vq_callback_t *callbacks[],
-			      const char *names[],
-			      bool use_msix,
-			      bool per_vq_vectors)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	u16 msix_vec;
-	int i, err, nvectors, allocated_vectors;
-
-	if (!use_msix) {
-		/* Old style: one normal interrupt for change and all vqs. */
-		err = vp_request_intx(vdev);
-		if (err)
-			goto error_request;
-	} else {
-		if (per_vq_vectors) {
-			/* Best option: one for change interrupt, one per vq. */
-			nvectors = 1;
-			for (i = 0; i < nvqs; ++i)
-				if (callbacks[i])
-					++nvectors;
-		} else {
-			/* Second best: one for change, shared for all vqs. */
-			nvectors = 2;
-		}
-
-		err = vp_request_msix_vectors(vdev, nvectors, per_vq_vectors);
-		if (err)
-			goto error_request;
-	}
-
-	vp_dev->per_vq_vectors = per_vq_vectors;
-	allocated_vectors = vp_dev->msix_used_vectors;
-	for (i = 0; i < nvqs; ++i) {
-		if (!names[i]) {
-			vqs[i] = NULL;
-			continue;
-		} else if (!callbacks[i] || !vp_dev->msix_enabled)
-			msix_vec = VIRTIO_MSI_NO_VECTOR;
-		else if (vp_dev->per_vq_vectors)
-			msix_vec = allocated_vectors++;
-		else
-			msix_vec = VP_MSIX_VQ_VECTOR;
-		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i], msix_vec);
-		if (IS_ERR(vqs[i])) {
-			err = PTR_ERR(vqs[i]);
-			goto error_find;
-		}
-
-		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
-			continue;
-
-		/* allocate per-vq irq if available and necessary */
-		snprintf(vp_dev->msix_names[msix_vec],
-			 sizeof *vp_dev->msix_names,
-			 "%s-%s",
-			 dev_name(&vp_dev->vdev.dev), names[i]);
-		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
-				  vring_interrupt, 0,
-				  vp_dev->msix_names[msix_vec],
-				  vqs[i]);
-		if (err) {
-			vp_del_vq(vqs[i]);
-			goto error_find;
-		}
-	}
-	return 0;
 
-error_find:
-	vp_del_vqs(vdev);
-
-error_request:
-	return err;
+	virtio_pci_del_vqs(vdev, vp_dev->legacy +
+			   VIRTIO_MSI_LEGACY_CONFIG_VECTOR,
+			   del_legacy_vq);
 }
 
 /* the config->find_vqs() implementation */
@@ -500,20 +276,12 @@ static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		       vq_callback_t *callbacks[],
 		       const char *names[])
 {
-	int err;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	/* Try MSI-X with one vector per queue. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names, true, true);
-	if (!err)
-		return 0;
-	/* Fallback: MSI-X with one vector for config, one shared for queues. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
-				 true, false);
-	if (!err)
-		return 0;
-	/* Finally fall back to regular interrupts. */
-	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
-				  false, false);
+	return virtio_pci_find_vqs(vdev, nvqs, vqs, callbacks, names,
+				   vp_dev->legacy +
+				   VIRTIO_MSI_LEGACY_CONFIG_VECTOR,
+				   setup_legacy_vq, del_legacy_vq);
 }
 
 static const char *vp_bus_name(struct virtio_device *vdev)
@@ -523,35 +291,6 @@ static const char *vp_bus_name(struct virtio_device *vdev)
 	return pci_name(vp_dev->pci_dev);
 }
 
-/* Setup the affinity for a virtqueue:
- * - force the affinity for per vq vector
- * - OR over all affinities for shared MSI
- * - ignore the affinity request if we're using INTX
- */
-static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
-{
-	struct virtio_device *vdev = vq->vdev;
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtio_pci_vq_info *info = vq->priv;
-	struct cpumask *mask;
-	unsigned int irq;
-
-	if (!vq->callback)
-		return -EINVAL;
-
-	if (vp_dev->msix_enabled) {
-		mask = vp_dev->msix_affinity_masks[info->msix_vector];
-		irq = vp_dev->msix_entries[info->msix_vector].vector;
-		if (cpu == -1)
-			irq_set_affinity_hint(irq, NULL);
-		else {
-			cpumask_set_cpu(cpu, mask);
-			irq_set_affinity_hint(irq, mask);
-		}
-	}
-	return 0;
-}
-
 static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get8		= vp_get8,
 	.set8		= vp_set8,
@@ -569,7 +308,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
-	.set_vq_affinity = vp_set_vq_affinity,
+	.set_vq_affinity = virtio_pci_set_vq_affinity,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
@@ -698,53 +437,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 }
 
 #ifdef CONFIG_PM
-static int virtio_pci_freeze(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = 0;
-	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
-	if (drv && drv->freeze)
-		ret = drv->freeze(&vp_dev->vdev);
-
-	if (!ret)
-		pci_disable_device(pci_dev);
-	return ret;
-}
-
-static int virtio_pci_restore(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = pci_enable_device(pci_dev);
-	if (ret)
-		return ret;
-
-	pci_set_master(pci_dev);
-	vp_finalize_features(&vp_dev->vdev);
-
-	if (drv && drv->restore)
-		ret = drv->restore(&vp_dev->vdev);
-
-	/* Finally, tell the device we're all set */
-	if (!ret)
-		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
-
-	return ret;
-}
-
 static const struct dev_pm_ops virtio_pci_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
 };
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 21/22] virtio_pci: simplify common helpers.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (19 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 20/22] virtio_pci: share virtqueue setup/teardown between modern and legacy driver Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  2013-03-21  8:29 ` [PATCH 22/22] virtio_pci: fix finalize_features in modern driver Rusty Russell
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

Our helpers can take a struct virtio_pci_device directly, rather than
converting from a struct virtio_device every time.  They couldn't do
this while they were called directly from the common virtio code, but
now that we wrap them anyway, this simplifies things.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci-common.c |   40 +++++++++++++++---------------------
 drivers/virtio/virtio_pci-common.h |    6 +++---
 drivers/virtio/virtio_pci.c        |   10 ++++-----
 drivers/virtio/virtio_pci_legacy.c |    9 ++++----
 4 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_pci-common.c b/drivers/virtio/virtio_pci-common.c
index 837d34b..d4e33ad 100644
--- a/drivers/virtio/virtio_pci-common.c
+++ b/drivers/virtio/virtio_pci-common.c
@@ -93,10 +93,9 @@ void virtio_pci_synchronize_vectors(struct virtio_device *vdev)
 		synchronize_irq(vp_dev->msix_entries[i].vector);
 }
 
-static void vp_free_vectors(struct virtio_device *vdev,
+static void vp_free_vectors(struct virtio_pci_device *vp_dev,
 			    __le16 __iomem *msix_config)
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
 	if (vp_dev->intx_enabled) {
@@ -131,12 +130,11 @@ static void vp_free_vectors(struct virtio_device *vdev,
 	vp_dev->msix_affinity_masks = NULL;
 }
 
-static int vp_request_msix_vectors(struct virtio_device *vdev,
+static int vp_request_msix_vectors(struct virtio_pci_device *vp_dev,
 				   int nvectors,
 				   __le16 __iomem *msix_config,
 				   bool per_vq_vectors)
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	const char *name = dev_name(&vp_dev->vdev.dev);
 	unsigned i, v;
 	int err = -ENOMEM;
@@ -204,23 +202,22 @@ static int vp_request_msix_vectors(struct virtio_device *vdev,
 	}
 	return 0;
 error:
-	vp_free_vectors(vdev, msix_config);
+	vp_free_vectors(vp_dev, msix_config);
 	return err;
 }
 
-static int vp_request_intx(struct virtio_device *vdev)
+static int vp_request_intx(struct virtio_pci_device *vp_dev)
 {
 	int err;
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
 	err = request_irq(vp_dev->pci_dev->irq, virtio_pci_interrupt,
-			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
+			  IRQF_SHARED, dev_name(&vp_dev->vdev.dev), vp_dev);
 	if (!err)
 		vp_dev->intx_enabled = 1;
 	return err;
 }
 
-static int vp_try_to_find_vqs(struct virtio_device *vdev,
+static int vp_try_to_find_vqs(struct virtio_pci_device *vp_dev,
 			      unsigned nvqs,
 			      struct virtqueue *vqs[],
 			      vq_callback_t *callbacks[],
@@ -231,13 +228,12 @@ static int vp_try_to_find_vqs(struct virtio_device *vdev,
 			      virtio_pci_setup_vq_fn *setup_vq,
 			      void (*del_vq)(struct virtqueue *vq))
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	u16 msix_vec;
 	int i, err, nvectors, allocated_vectors;
 
 	if (!use_msix) {
 		/* Old style: one normal interrupt for change and all vqs. */
-		err = vp_request_intx(vdev);
+		err = vp_request_intx(vp_dev);
 		if (err)
 			goto error_request;
 	} else {
@@ -252,7 +248,7 @@ static int vp_try_to_find_vqs(struct virtio_device *vdev,
 			nvectors = 2;
 		}
 
-		err = vp_request_msix_vectors(vdev, nvectors, 
+		err = vp_request_msix_vectors(vp_dev, nvectors, 
 					      msix_config, per_vq_vectors);
 		if (err)
 			goto error_request;
@@ -270,8 +266,7 @@ static int vp_try_to_find_vqs(struct virtio_device *vdev,
 			msix_vec = allocated_vectors++;
 		else
 			msix_vec = VP_MSIX_VQ_VECTOR;
-		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i],
-				  msix_vec);
+		vqs[i] = setup_vq(vp_dev, i, callbacks[i], names[i], msix_vec);
 		if (IS_ERR(vqs[i])) {
 			err = PTR_ERR(vqs[i]);
 			goto error_find;
@@ -297,13 +292,13 @@ static int vp_try_to_find_vqs(struct virtio_device *vdev,
 	return 0;
 
 error_find:
-	virtio_pci_del_vqs(vdev, msix_config, del_vq);
+	virtio_pci_del_vqs(vp_dev, msix_config, del_vq);
 
 error_request:
 	return err;
 }
 
-int virtio_pci_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+int virtio_pci_find_vqs(struct virtio_pci_device *vp_dev, unsigned nvqs,
 			struct virtqueue *vqs[],
 			vq_callback_t *callbacks[],
 			const char *names[],
@@ -314,29 +309,28 @@ int virtio_pci_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 	int err;
 
 	/* Try MSI-X with one vector per queue. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+	err = vp_try_to_find_vqs(vp_dev, nvqs, vqs, callbacks, names,
 				 true, true, msix_config, setup_vq, del_vq);
 	if (!err)
 		return 0;
 	/* Fallback: MSI-X with one vector for config, one shared for queues. */
-	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+	err = vp_try_to_find_vqs(vp_dev, nvqs, vqs, callbacks, names,
 				 true, false, msix_config, setup_vq, del_vq);
 	if (!err)
 		return 0;
 	/* Finally fall back to regular interrupts. */
-	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
+	return vp_try_to_find_vqs(vp_dev, nvqs, vqs, callbacks, names,
 				  false, false, msix_config, setup_vq, del_vq);
 }
 
-void virtio_pci_del_vqs(struct virtio_device *vdev,
+void virtio_pci_del_vqs(struct virtio_pci_device *vp_dev,
 			__le16 __iomem *msix_config,
 			void (*del_vq)(struct virtqueue *vq))
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtqueue *vq, *n;
 	struct virtio_pci_vq_info *info;
 
-	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+	list_for_each_entry_safe(vq, n, &vp_dev->vdev.vqs, list) {
 		info = vq->priv;
 		if (vp_dev->per_vq_vectors &&
 			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
@@ -346,7 +340,7 @@ void virtio_pci_del_vqs(struct virtio_device *vdev,
 	}
 	vp_dev->per_vq_vectors = false;
 
-	vp_free_vectors(vdev, msix_config);
+	vp_free_vectors(vp_dev, msix_config);
 }
 
 /* Setup the affinity for a virtqueue:
diff --git a/drivers/virtio/virtio_pci-common.h b/drivers/virtio/virtio_pci-common.h
index 2c4d890..146c3be 100644
--- a/drivers/virtio/virtio_pci-common.h
+++ b/drivers/virtio/virtio_pci-common.h
@@ -90,7 +90,7 @@ irqreturn_t virtio_pci_vring_interrupt(int irq, void *opaque);
 /* Acknowledge, check for config or vq interrupt. */
 irqreturn_t virtio_pci_interrupt(int irq, void *opaque);
 
-typedef struct virtqueue *virtio_pci_setup_vq_fn(struct virtio_device *,
+typedef struct virtqueue *virtio_pci_setup_vq_fn(struct virtio_pci_device *,
 						 unsigned index,
 						 void (*callback)
 							(struct virtqueue *),
@@ -98,7 +98,7 @@ typedef struct virtqueue *virtio_pci_setup_vq_fn(struct virtio_device *,
 						 u16 msix_vec);
 
 /* Core of a config->find_vqs() implementation */
-int virtio_pci_find_vqs(struct virtio_device *vdev,
+int virtio_pci_find_vqs(struct virtio_pci_device *vp_dev,
 			unsigned nvqs,
 			struct virtqueue *vqs[],
 			vq_callback_t *callbacks[],
@@ -108,7 +108,7 @@ int virtio_pci_find_vqs(struct virtio_device *vdev,
 			void (*del_vq)(struct virtqueue *vq));
 
 /* the core of a config->del_vqs() implementation */
-void virtio_pci_del_vqs(struct virtio_device *vdev,
+void virtio_pci_del_vqs(struct virtio_pci_device *vp_dev,
 			__le16 __iomem *msix_config,
 			void (*del_vq)(struct virtqueue *vq));
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 937fae7..97d9b54 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -182,12 +182,12 @@ static void *alloc_virtqueue_pages(u16 *num)
 	return NULL;
 }
 
-static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
+static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
+				  unsigned index,
 				  void (*callback)(struct virtqueue *vq),
 				  const char *name,
 				  u16 msix_vec)
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtio_pci_vq_info *info;
 	struct virtqueue *vq;
 	u16 num;
@@ -241,7 +241,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	}
 
 	/* create the vring */
-	vq = vring_new_virtqueue(index, num, SMP_CACHE_BYTES, vdev,
+	vq = vring_new_virtqueue(index, num, SMP_CACHE_BYTES, &vp_dev->vdev,
 				 true, info->queue, virtio_pci_notify,
 				 callback, name);
 	if (!vq) {
@@ -350,7 +350,7 @@ static void del_vq(struct virtqueue *vq)
 static void vp_del_vqs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	virtio_pci_del_vqs(vdev, &vp_dev->common->msix_config, del_vq);
+	virtio_pci_del_vqs(vp_dev, &vp_dev->common->msix_config, del_vq);
 }
 
 /* the config->find_vqs() implementation */
@@ -361,7 +361,7 @@ static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	return virtio_pci_find_vqs(vdev, nvqs, vqs, callbacks, names,
+	return virtio_pci_find_vqs(vp_dev, nvqs, vqs, callbacks, names,
 				   &vp_dev->common->msix_config,
 				   setup_vq, del_vq);
 }
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 5ab05c3..f78a858 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -145,13 +145,12 @@ static void vp_reset(struct virtio_device *vdev)
 	virtio_pci_synchronize_vectors(vdev);
 }
 
-static struct virtqueue *setup_legacy_vq(struct virtio_device *vdev,
+static struct virtqueue *setup_legacy_vq(struct virtio_pci_device *vp_dev,
 					 unsigned index,
 					 void (*callback)(struct virtqueue *vq),
 					 const char *name,
 					 u16 msix_vec)
 {
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtio_pci_vq_info *info;
 	struct virtqueue *vq;
 	unsigned long flags, size;
@@ -187,7 +186,7 @@ static struct virtqueue *setup_legacy_vq(struct virtio_device *vdev,
 
 	/* create the vring */
 	vq = vring_new_virtqueue(index, num,
-				 VIRTIO_PCI_LEGACY_VRING_ALIGN, vdev,
+				 VIRTIO_PCI_LEGACY_VRING_ALIGN, &vp_dev->vdev,
 				 true, info->queue, virtio_pci_notify,
 				 callback, name);
 	if (!vq) {
@@ -265,7 +264,7 @@ static void vp_del_vqs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	virtio_pci_del_vqs(vdev, vp_dev->legacy +
+	virtio_pci_del_vqs(vp_dev, vp_dev->legacy +
 			   VIRTIO_MSI_LEGACY_CONFIG_VECTOR,
 			   del_legacy_vq);
 }
@@ -278,7 +277,7 @@ static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	return virtio_pci_find_vqs(vdev, nvqs, vqs, callbacks, names,
+	return virtio_pci_find_vqs(vp_dev, nvqs, vqs, callbacks, names,
 				   vp_dev->legacy +
 				   VIRTIO_MSI_LEGACY_CONFIG_VECTOR,
 				   setup_legacy_vq, del_legacy_vq);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 22/22] virtio_pci: fix finalize_features in modern driver.
  2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
                   ` (20 preceding siblings ...)
  2013-03-21  8:29 ` [PATCH 21/22] virtio_pci: simplify common helpers Rusty Russell
@ 2013-03-21  8:29 ` Rusty Russell
  21 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-21  8:29 UTC (permalink / raw)
  To: virtualization

Because we have a potentially unbounded number of feature bits, it's hard for the
device to know when features are finalized.

This adds a new status bit, VIRTIO_CONFIG_S_FEATURES_DONE, which is only
set by the modern virtio_pci driver at the moment.
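
To illustrate the intended handshake (a standalone sketch with toy names, not
code from this patch): once the driver has written its feature bits, it sets
FEATURES_DONE in the status byte, so the device can treat the bits it has seen
so far as stable even if more feature words exist:

```c
#include <assert.h>
#include <stdint.h>

/* Status bits as defined in include/uapi/linux/virtio_config.h */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE   1
#define VIRTIO_CONFIG_S_DRIVER        2
#define VIRTIO_CONFIG_S_DRIVER_OK     4
#define VIRTIO_CONFIG_S_FEATURES_DONE 8

/* Toy device model: status byte plus the guest-written feature word. */
struct toy_device {
	uint8_t status;
	uint64_t guest_features;
};

static int toy_features_finalized(const struct toy_device *d)
{
	return (d->status & VIRTIO_CONFIG_S_FEATURES_DONE) != 0;
}

/* Write the features, then "lock them in" via the new status bit,
 * mirroring what vp_finalize_features() does in the patch below. */
static void toy_finalize_features(struct toy_device *d, uint64_t features)
{
	d->guest_features = features;
	d->status |= VIRTIO_CONFIG_S_FEATURES_DONE;
}
```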

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_pci-common.c |   28 --------------
 drivers/virtio/virtio_pci-common.h |    1 -
 drivers/virtio/virtio_pci.c        |   73 ++++++++++++++++++++++++++----------
 drivers/virtio/virtio_pci_legacy.c |   27 +++++++++++++
 include/uapi/linux/virtio_config.h |    2 +
 5 files changed, 83 insertions(+), 48 deletions(-)

diff --git a/drivers/virtio/virtio_pci-common.c b/drivers/virtio/virtio_pci-common.c
index d4e33ad..d2c78c8 100644
--- a/drivers/virtio/virtio_pci-common.c
+++ b/drivers/virtio/virtio_pci-common.c
@@ -392,32 +392,4 @@ int virtio_pci_freeze(struct device *dev)
 		pci_disable_device(pci_dev);
 	return ret;
 }
-
-int virtio_pci_restore(struct device *dev)
-{
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-	struct virtio_driver *drv;
-	int ret;
-
-	drv = container_of(vp_dev->vdev.dev.driver,
-			   struct virtio_driver, driver);
-
-	ret = pci_enable_device(pci_dev);
-	if (ret)
-		return ret;
-
-	pci_set_master(pci_dev);
-	vp_dev->vdev.config->finalize_features(&vp_dev->vdev);
-
-	if (drv && drv->restore)
-		ret = drv->restore(&vp_dev->vdev);
-
-	/* Finally, tell the device we're all set */
-	if (!ret)
-		vp_dev->vdev.config->set_status(&vp_dev->vdev,
-						vp_dev->saved_status);
-
-	return ret;
-}
 #endif
diff --git a/drivers/virtio/virtio_pci-common.h b/drivers/virtio/virtio_pci-common.h
index 146c3be..dac434e 100644
--- a/drivers/virtio/virtio_pci-common.h
+++ b/drivers/virtio/virtio_pci-common.h
@@ -118,5 +118,4 @@ int virtio_pci_set_vq_affinity(struct virtqueue *vq, int cpu);
 
 #ifdef CONFIG_PM
 int virtio_pci_freeze(struct device *dev);
-int virtio_pci_restore(struct device *dev);
 #endif
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 97d9b54..4614a15 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -50,6 +50,21 @@ static u64 ioread64(__le64 *addr)
 	return ioread32(addr) | ((u64)ioread32((__le32 *)addr + 1) << 32);
 }
 
+/* config->{get,set}_status() implementations */
+static u8 vp_get_status(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	return ioread8(&vp_dev->common->device_status);
+}
+
+static void vp_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	/* We should never be setting status to 0. */
+	BUG_ON(status == 0);
+	iowrite8(status, &vp_dev->common->device_status);
+}
+
 static u64 vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -62,19 +77,27 @@ static u64 vp_get_features(struct virtio_device *vdev)
 	return features;
 }
 
-static void vp_finalize_features(struct virtio_device *vdev)
+static void write_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
-	/* Give virtio_ring a chance to accept features. */
-	vring_transport_features(vdev);
-
 	iowrite32(0, &vp_dev->common->guest_feature_select);
 	iowrite32((u32)vdev->features, &vp_dev->common->guest_feature);
 	iowrite32(1, &vp_dev->common->guest_feature_select);
 	iowrite32(vdev->features >> 32, &vp_dev->common->guest_feature);
 }
 
+static void vp_finalize_features(struct virtio_device *vdev)
+{
+	/* Give virtio_ring a chance to accept features. */
+	vring_transport_features(vdev);
+
+	write_features(vdev);
+
+	/* Update status to lock it in. */
+	vp_set_status(vdev, vp_get_status(vdev)|VIRTIO_CONFIG_S_FEATURES_DONE);
+}
+
 /* virtio config is little-endian for virtio_pci (vs guest-endian for legacy) */
 static u8 vp_get8(struct virtio_device *vdev, unsigned offset)
 {
@@ -132,21 +155,6 @@ static void vp_set64(struct virtio_device *vdev, unsigned offset, u64 val)
 	iowrite64(val, vp_dev->device + offset);
 }
 
-/* config->{get,set}_status() implementations */
-static u8 vp_get_status(struct virtio_device *vdev)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	return ioread8(&vp_dev->common->device_status);
-}
-
-static void vp_set_status(struct virtio_device *vdev, u8 status)
-{
-	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	/* We should never be setting status to 0. */
-	BUG_ON(status == 0);
-	iowrite8(status, &vp_dev->common->device_status);
-}
-
 static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -571,6 +579,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 }
 
 #ifdef CONFIG_PM
+static int virtio_pci_restore(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = pci_enable_device(pci_dev);
+	if (ret)
+		return ret;
+
+	pci_set_master(pci_dev);
+	write_features(&vp_dev->vdev);
+
+	if (drv && drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	/* Finally, tell the device we're all set */
+	if (!ret)
+		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
+
+	return ret;
+}
+
 static const struct dev_pm_ops virtio_pci_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
 };
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index f78a858..cfff009 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -436,6 +436,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 }
 
 #ifdef CONFIG_PM
+static int virtio_pci_restore(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = pci_enable_device(pci_dev);
+	if (ret)
+		return ret;
+
+	pci_set_master(pci_dev);
+	vp_finalize_features(&vp_dev->vdev);
+
+	if (drv && drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	/* Finally, tell the device we're all set */
+	if (!ret)
+		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
+
+	return ret;
+}
+
 static const struct dev_pm_ops virtio_pci_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
 };
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index b7cda39..83848b7 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -36,6 +36,8 @@
 #define VIRTIO_CONFIG_S_ACKNOWLEDGE	1
 /* We have found a driver for the device. */
 #define VIRTIO_CONFIG_S_DRIVER		2
+/* Features are finalized (only for new virtio_pci) */
+#define VIRTIO_CONFIG_S_FEATURES_DONE	8
 /* Driver has used its parts of the config, and is happy */
 #define VIRTIO_CONFIG_S_DRIVER_OK	4
 /* We've given up on this device. */
-- 
1.7.10.4


* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-21  8:29 ` [PATCH 03/22] virtio_config: make transports implement accessors Rusty Russell
@ 2013-03-21  9:09   ` Cornelia Huck
  2013-03-22  0:31     ` Rusty Russell
  2013-03-22 14:43   ` Sjur Brændeland
  2013-04-02 17:16   ` Pawel Moll
  2 siblings, 1 reply; 94+ messages in thread
From: Cornelia Huck @ 2013-03-21  9:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

On Thu, 21 Mar 2013 18:59:24 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> All transports just pass through at the moment.
> 
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Brian Swetland <swetland@google.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Pawel Moll <pawel.moll@arm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/lguest/lguest_device.c |   79 ++++++++++++++++++++++++++++++++++------
>  drivers/net/caif/caif_virtio.c |    2 +-
>  drivers/s390/kvm/kvm_virtio.c  |   78 +++++++++++++++++++++++++++++++++------
>  drivers/s390/kvm/virtio_ccw.c  |   39 +++++++++++++++++++-
>  drivers/virtio/virtio_mmio.c   |   35 +++++++++++++++++-
>  drivers/virtio/virtio_pci.c    |   39 +++++++++++++++++---
>  include/linux/virtio_config.h  |   70 +++++++++++++++++++++--------------
>  7 files changed, 283 insertions(+), 59 deletions(-)
> 

> diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
> index 6711e65..dcf35b1 100644
> --- a/drivers/s390/kvm/kvm_virtio.c
> +++ b/drivers/s390/kvm/kvm_virtio.c
> @@ -112,26 +112,82 @@ static void kvm_finalize_features(struct virtio_device *vdev)
>  }
> 
>  /*
> - * Reading and writing elements in config space
> + * Reading and writing elements in config space.  Host and guest are always
> + * big-endian, so no conversion necessary.
>   */
> -static void kvm_get(struct virtio_device *vdev, unsigned int offset,
> -		   void *buf, unsigned len)
> +static u8 kvm_get8(struct virtio_device *vdev, unsigned int offset)
>  {
> -	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
               ^^^^^^^^^^^^^^^^^^
 
This looks weird?

> 
> -	BUG_ON(offset + len > desc->config_len);
> -	memcpy(buf, kvm_vq_configspace(desc) + offset, len);
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u8) > desc->config_len);
> +	return *(u8 *)(kvm_vq_configspace(desc) + offset);
>  }
> 
> -static void kvm_set(struct virtio_device *vdev, unsigned int offset,
> -		   const void *buf, unsigned len)
> +static void kvm_set8(struct virtio_device *vdev, unsigned int offset, u8 val)
>  {
> -	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u8 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +static u16 kvm_get16(struct virtio_device *vdev, unsigned int offset)
> +{
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u16) > desc->config_len);
> +	return *(u16 *)(kvm_vq_configspace(desc) + offset);
> +}
> +
> +static void kvm_set16(struct virtio_device *vdev, unsigned int offset, u16 val)
> +{
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u16 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +static u32 kvm_get32(struct virtio_device *vdev, unsigned int offset)
> +{
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> 
> -	BUG_ON(offset + len > desc->config_len);
> -	memcpy(kvm_vq_configspace(desc) + offset, buf, len);
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u32) > desc->config_len);
> +	return *(u32 *)(kvm_vq_configspace(desc) + offset);
>  }
> 
> +static void kvm_set32(struct virtio_device *vdev, unsigned int offset, u32 val)
> +{
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u32 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +static u64 kvm_get64(struct virtio_device *vdev, unsigned int offset)
> +{
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u64) > desc->config_len);
> +	return *(u64 *)(kvm_vq_configspace(desc) + offset);
> +}
> +
> +static void kvm_set64(struct virtio_device *vdev, unsigned int offset, u64 val)
> +{
> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u64 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +

The new functions don't seem to be hooked up anywhere?

>  /*
>   * The operations to get and set the status word just access
>   * the status field of the device descriptor. set_status will also
> diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
> index 2029b6c..3652473 100644
> --- a/drivers/s390/kvm/virtio_ccw.c
> +++ b/drivers/s390/kvm/virtio_ccw.c
> @@ -472,6 +472,7 @@ out_free:
>  	kfree(ccw);
>  }
> 
> +/* We don't need to do endian conversion, as it's always big endian like us */
>  static void virtio_ccw_get_config(struct virtio_device *vdev,
>  				  unsigned int offset, void *buf, unsigned len)
>  {
> @@ -505,6 +506,21 @@ out_free:
>  	kfree(ccw);
>  }
> 
> +
> +#define VIRTIO_CCW_GET_CONFIGx(bits)					\
> +static u##bits virtio_ccw_get_config##bits(struct virtio_device *vdev,	\
> +					   unsigned int offset)		\
> +{									\
> +	u##bits v;							\
> +	virtio_ccw_get_config(vdev, offset, &v, sizeof(v));		\
> +	return v;							\
> +}
> +
> +VIRTIO_CCW_GET_CONFIGx(8)
> +VIRTIO_CCW_GET_CONFIGx(16)
> +VIRTIO_CCW_GET_CONFIGx(32)
> +VIRTIO_CCW_GET_CONFIGx(64)
> +
>  static void virtio_ccw_set_config(struct virtio_device *vdev,
>  				  unsigned int offset, const void *buf,
>  				  unsigned len)
> @@ -535,6 +551,19 @@ out_free:
>  	kfree(ccw);
>  }
> 
> +#define VIRTIO_CCW_SET_CONFIGx(bits)					\
> +static void virtio_ccw_set_config##bits(struct virtio_device *vdev,	\
> +					unsigned int offset,		\
> +					u##bits v)			\
> +{									\
> +	virtio_ccw_set_config(vdev, offset, &v, sizeof(v));		\
> +}
> +
> +VIRTIO_CCW_SET_CONFIGx(8)
> +VIRTIO_CCW_SET_CONFIGx(16)
> +VIRTIO_CCW_SET_CONFIGx(32)
> +VIRTIO_CCW_SET_CONFIGx(64)
> +
>  static u8 virtio_ccw_get_status(struct virtio_device *vdev)
>  {
>  	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> @@ -564,8 +593,14 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
>  static struct virtio_config_ops virtio_ccw_config_ops = {
>  	.get_features = virtio_ccw_get_features,
>  	.finalize_features = virtio_ccw_finalize_features,
> -	.get = virtio_ccw_get_config,
> -	.set = virtio_ccw_set_config,
> +	.get8 = virtio_ccw_get_config8,
> +	.set8 = virtio_ccw_set_config8,
> +	.get16 = virtio_ccw_get_config16,
> +	.set16 = virtio_ccw_set_config16,
> +	.get32 = virtio_ccw_get_config32,
> +	.set32 = virtio_ccw_set_config32,
> +	.get64 = virtio_ccw_get_config64,
> +	.set64 = virtio_ccw_set_config64,
>  	.get_status = virtio_ccw_get_status,
>  	.set_status = virtio_ccw_set_status,
>  	.reset = virtio_ccw_reset,

virtio-ccw looks sane at first glance.


* Re: [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features
  2013-03-21  8:29 ` [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features Rusty Russell
@ 2013-03-21 10:00   ` Cornelia Huck
  2013-03-22  0:48     ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: Cornelia Huck @ 2013-03-21 10:00 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

On Thu, 21 Mar 2013 18:59:25 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> It seemed like a good idea, but it's actually a pain when we get more
> than 32 feature bits.  Just change it to a u32 for now.
> 
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Brian Swetland <swetland@google.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Pawel Moll <pawel.moll@arm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/char/virtio_console.c          |    2 +-
>  drivers/lguest/lguest_device.c         |    2 +-
>  drivers/remoteproc/remoteproc_virtio.c |    2 +-
>  drivers/s390/kvm/kvm_virtio.c          |    2 +-
>  drivers/virtio/virtio.c                |   10 +++++-----
>  drivers/virtio/virtio_mmio.c           |    8 ++------
>  drivers/virtio/virtio_pci.c            |    3 +--
>  drivers/virtio/virtio_ring.c           |    2 +-
>  include/linux/virtio.h                 |    3 +--
>  include/linux/virtio_config.h          |    2 +-
>  tools/virtio/linux/virtio.h            |   22 +---------------------
>  tools/virtio/linux/virtio_config.h     |    2 +-
>  tools/virtio/virtio_test.c             |    5 ++---
>  tools/virtio/vringh_test.c             |   16 ++++++++--------
>  14 files changed, 27 insertions(+), 54 deletions(-)

I didn't try this patch, but wouldn't virtio_ccw need something like
the change below as well?

--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -440,7 +440,6 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
 {
 	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
 	struct virtio_feature_desc *features;
-	int i;
 	struct ccw1 *ccw;
 
 	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
@@ -454,19 +453,15 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
 	/* Give virtio_ring a chance to accept features. */
 	vring_transport_features(vdev);
 
-	for (i = 0; i < sizeof(*vdev->features) / sizeof(features->features);
-	     i++) {
-		int highbits = i % 2 ? 32 : 0;
-		features->index = i;
-		features->features = cpu_to_le32(vdev->features[i / 2]
-						 >> highbits);
-		/* Write the feature bits to the host. */
-		ccw->cmd_code = CCW_CMD_WRITE_FEAT;
-		ccw->flags = 0;
-		ccw->count = sizeof(*features);
-		ccw->cda = (__u32)(unsigned long)features;
-		ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_FEAT);
-	}
+	features->index = 0;
+	features->features = cpu_to_le32(vdev->features);
+	/* Write the feature bits to the host. */
+	ccw->cmd_code = CCW_CMD_WRITE_FEAT;
+	ccw->flags = 0;
+	ccw->count = sizeof(*features);
+	ccw->cda = (__u32)(unsigned long)features;
+	ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_FEAT);
+
 out_free:
 	kfree(features);
 	kfree(ccw);


* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-21  8:29 ` [PATCH 05/22] virtio: add support for 64 bit features Rusty Russell
@ 2013-03-21 10:06   ` Cornelia Huck
  2013-03-22  0:50     ` Rusty Russell
  2013-03-22 14:50     ` Sjur Brændeland
  2013-04-02 17:09   ` Pawel Moll
  1 sibling, 2 replies; 94+ messages in thread
From: Cornelia Huck @ 2013-03-21 10:06 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

On Thu, 21 Mar 2013 18:59:26 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> Change the u32 to a u64, and make sure to use 1ULL everywhere!
> 
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Brian Swetland <swetland@google.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Pawel Moll <pawel.moll@arm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/char/virtio_console.c          |    2 +-
>  drivers/lguest/lguest_device.c         |   10 +++++-----
>  drivers/remoteproc/remoteproc_virtio.c |    6 +++++-
>  drivers/s390/kvm/kvm_virtio.c          |   10 +++++-----
>  drivers/virtio/virtio.c                |   12 ++++++------
>  drivers/virtio/virtio_mmio.c           |   14 +++++++++-----
>  drivers/virtio/virtio_pci.c            |    5 ++---
>  drivers/virtio/virtio_ring.c           |    2 +-
>  include/linux/virtio.h                 |    2 +-
>  include/linux/virtio_config.h          |    8 ++++----
>  tools/virtio/linux/virtio.h            |    2 +-
>  tools/virtio/linux/virtio_config.h     |    2 +-
>  12 files changed, 41 insertions(+), 34 deletions(-)
> 

And a not-even-compiled change for virtio_ccw as well:

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index eb0616b..2ca6dc5 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -454,8 +454,17 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	features->index = 0;
-	features->features = cpu_to_le32(vdev->features);
-	/* Write the feature bits to the host. */
+	features->features = cpu_to_le32((u32)vdev->features);
+	/* Write the first half of the feature bits to the host. */
+	ccw->cmd_code = CCW_CMD_WRITE_FEAT;
+	ccw->flags = 0;
+	ccw->count = sizeof(*features);
+	ccw->cda = (__u32)(unsigned long)features;
+	ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_FEAT);
+
+	features->index = 1;
+	features->features = cpu_to_le32(vdev->features >> 32);
+	/* Write the second half of the feature bits to the host. */
 	ccw->cmd_code = CCW_CMD_WRITE_FEAT;
 	ccw->flags = 0;
 	ccw->count = sizeof(*features);

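The index-0/index-1 split in the change above can be shown in isolation (a toy
helper for illustration, not the ccw code itself): each 32-bit feature word is
simply a shifted slice of the 64-bit feature value.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 32-bit feature word written to the host as feature
 * "index" 0 (low half) or 1 (high half) of a 64-bit feature value. */
static uint32_t feature_word(uint64_t features, unsigned int index)
{
	return index ? (uint32_t)(features >> 32) : (uint32_t)features;
}
```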

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-21  8:29 ` [PATCH 16/22] virtio_pci: use separate notification offsets for each vq Rusty Russell
@ 2013-03-21 10:13   ` Michael S. Tsirkin
  2013-03-21 10:35     ` Michael S. Tsirkin
  2013-03-22  2:52     ` Rusty Russell
  0 siblings, 2 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 10:13 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization

On Thu, Mar 21, 2013 at 06:59:37PM +1030, Rusty Russell wrote:
> (MST, is this what you were thinking?)

Almost.

Three points:

1. This is still an offset within a BAR, so for KVM we are still forced to use
an I/O BAR.  I would like an option for the hypervisor to simply say "do I/O
to this fixed address for this VQ".  Then virtio can avoid using I/O BARs
completely.

2. For a real virtio device, the offset is only 16 bits.  Using a 32-bit
offset in a memory BAR, giving each VQ a separate 4K page, would allow
privilege separation where e.g. RXVQ/TXVQ are passed through to
hardware but CVQ is handled by the hypervisor.

3. Last thing: point (1) applies to ISR reads as well.

So I had in mind a structure like:

	struct vq_notify {
		u32 offset;
		u16 data;
		u16 flags;
	}

enum vq_notify_flags {
	VQ_NOTIFY_BAR0,
	VQ_NOTIFY_BAR1,
	VQ_NOTIFY_BAR2,
	VQ_NOTIFY_BAR3,
	VQ_NOTIFY_BAR4,
	VQ_NOTIFY_BAR5,
	VQ_NOTIFY_FIXED_IOPORT,
}

Then, to notify a vq, we write the given data at the given offset,
or to the given port for VQ_NOTIFY_FIXED_IOPORT.

Only point 1 is really important for me though, I can be
flexible on the rest of it.
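
A rough sketch of how a driver might dispatch on such a structure (hypothetical
code following the field and flag names proposed above; nothing here is an
existing API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the structure proposed above. */
struct vq_notify {
	uint32_t offset;
	uint16_t data;
	uint16_t flags;
};

enum vq_notify_flags {
	VQ_NOTIFY_BAR0 = 0,
	VQ_NOTIFY_BAR1,
	VQ_NOTIFY_BAR2,
	VQ_NOTIFY_BAR3,
	VQ_NOTIFY_BAR4,
	VQ_NOTIFY_BAR5,
	VQ_NOTIFY_FIXED_IOPORT,
};

/* Toy address computation: given the mapped BAR base addresses, return
 * where the notification write (of n->data) should land.  Returns NULL
 * for the ioport case, where the caller must use an I/O port write. */
static uint8_t *notify_target(const struct vq_notify *n, uint8_t *bars[6])
{
	if (n->flags == VQ_NOTIFY_FIXED_IOPORT)
		return NULL;
	return bars[n->flags] + n->offset;
}
```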


> This uses the previously-unused field for "queue_notify".  This contains
> the offset from the notification area given by the VIRTIO_PCI_CAP_NOTIFY_CFG
> header.
> 
> (A device can still make them all overlap if it wants, since the queue
> index is written: it can still distinguish different notifications).
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/virtio/virtio_pci.c     |   61 +++++++++++++++++++++++++++------------
>  include/uapi/linux/virtio_pci.h |    2 +-
>  2 files changed, 44 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index f252afe..d492361 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -37,11 +37,15 @@ struct virtio_pci_device {
>  	struct virtio_pci_common_cfg __iomem *common;
>  	/* Where to read and clear interrupt */
>  	u8 __iomem *isr;
> -	/* Write the virtqueue index here to notify device of activity. */
> -	__le16 __iomem *notify;
> +	/* Write the vq index here to notify device of activity. */
> +	void __iomem *notify_base;
>  	/* Device-specific data. */
>  	void __iomem *device;
>  
> +	/* So we can sanity-check accesses. */
> +	size_t notify_len;
> +	size_t device_len;
> +
>  	/* a list of queues so we can dispatch IRQs */
>  	spinlock_t lock;
>  	struct list_head virtqueues;
> @@ -84,6 +88,9 @@ struct virtio_pci_vq_info {
>  	/* the list node for the virtqueues list */
>  	struct list_head node;
>  
> +	/* Notify area for this vq. */
> +	u16 __iomem *notify;
> +
>  	/* MSI-X vector (or none) */
>  	unsigned msix_vector;
>  };
> @@ -240,11 +247,11 @@ static void vp_reset(struct virtio_device *vdev)
>  /* the notify function used when creating a virt queue */
>  static void vp_notify(struct virtqueue *vq)
>  {
> -	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> +	struct virtio_pci_vq_info *info = vq->priv;
>  
> -	/* we write the queue's selector into the notification register to
> -	 * signal the other end */
> -	iowrite16(vq->index, vp_dev->notify);
> +	/* we write the queue selector into the notification register
> +	 * to signal the other end */
> +	iowrite16(vq->index, info->notify);
>  }
>  
>  /* Handle a configuration change: Tell driver if it wants to know. */
> @@ -460,7 +467,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  	struct virtio_pci_vq_info *info;
>  	struct virtqueue *vq;
>  	u16 num;
> -	int err;
> +	int err, off;
>  
>  	if (index >= ioread16(&vp_dev->common->num_queues))
>  		return ERR_PTR(-ENOENT);
> @@ -492,6 +499,17 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  
>  	info->msix_vector = msix_vec;
>  
> +	/* get offset of notification byte for this virtqueue */
> +	off = ioread16(&vp_dev->common->queue_notify);
> +	if (off > vp_dev->notify_len) {
> +		dev_warn(&vp_dev->pci_dev->dev,
> +			 "bad notification offset %u for queue %u (> %u)",
> +			 off, index, vp_dev->notify_len);
> +		err = -EINVAL;
> +		goto out_info;
> +	}
> +	info->notify = vp_dev->notify_base + off;
> +
>  	info->queue = alloc_virtqueue_pages(&num);
>  	if (info->queue == NULL) {
>  		err = -ENOMEM;
> @@ -787,7 +805,8 @@ static void virtio_pci_release_dev(struct device *_d)
>  	 */
>  }
>  
> -static void __iomem *map_capability(struct pci_dev *dev, int off, size_t expect)
> +static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
> +				    size_t *len)
>  {
>  	u8 bar;
>  	u32 offset, length;
> @@ -800,13 +819,16 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t expect)
>  	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
>  			     &length);
>  
> -	if (length < expect) {
> +	if (length < minlen) {
>  		dev_err(&dev->dev,
> -			"virtio_pci: small capability len %u (%u expected)\n",
> -			length, expect);
> +			"virtio_pci: small capability len %u (%zu expected)\n",
> +			length, minlen);
>  		return NULL;
>  	}
>  
> +	if (len)
> +		*len = length;
> +
>  	/* We want uncachable mapping, even if bar is cachable. */
>  	p = pci_iomap_range(dev, bar, offset, length, PAGE_SIZE, true);
>  	if (!p)
> @@ -883,16 +905,19 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>  
>  	err = -EINVAL;
>  	vp_dev->common = map_capability(pci_dev, common,
> -					sizeof(struct virtio_pci_common_cfg));
> +					sizeof(struct virtio_pci_common_cfg),
> +					NULL);
>  	if (!vp_dev->common)
>  		goto out_req_regions;
> -	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8));
> +	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), NULL);
>  	if (!vp_dev->isr)
>  		goto out_map_common;
> -	vp_dev->notify = map_capability(pci_dev, notify, sizeof(u16));
> -	if (!vp_dev->notify)
> +	vp_dev->notify_base = map_capability(pci_dev, notify, sizeof(u8),
> +					     &vp_dev->notify_len);
> +	if (!vp_dev->notify_len)
>  		goto out_map_isr;
> -	vp_dev->device = map_capability(pci_dev, device, 0);
> +	vp_dev->device = map_capability(pci_dev, device, 0,
> +					&vp_dev->device_len);
>  	if (!vp_dev->device)
>  		goto out_map_notify;
>  
> @@ -917,7 +942,7 @@ out_set_drvdata:
>  	pci_set_drvdata(pci_dev, NULL);
>  	pci_iounmap(pci_dev, vp_dev->device);
>  out_map_notify:
> -	pci_iounmap(pci_dev, vp_dev->notify);
> +	pci_iounmap(pci_dev, vp_dev->notify_base);
>  out_map_isr:
>  	pci_iounmap(pci_dev, vp_dev->isr);
>  out_map_common:
> @@ -940,7 +965,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>  	vp_del_vqs(&vp_dev->vdev);
>  	pci_set_drvdata(pci_dev, NULL);
>  	pci_iounmap(pci_dev, vp_dev->device);
> -	pci_iounmap(pci_dev, vp_dev->notify);
> +	pci_iounmap(pci_dev, vp_dev->notify_base);
>  	pci_iounmap(pci_dev, vp_dev->isr);
>  	pci_iounmap(pci_dev, vp_dev->common);
>  	pci_release_regions(pci_dev);
> diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
> index b334cd9..23b90cb 100644
> --- a/include/uapi/linux/virtio_pci.h
> +++ b/include/uapi/linux/virtio_pci.h
> @@ -144,13 +144,13 @@ struct virtio_pci_common_cfg {
>  	__le16 num_queues;		/* read-only */
>  	__u8 device_status;		/* read-write */
>  	__u8 unused1;
> -	__le16 unused2;
>  
>  	/* About a specific virtqueue. */
>  	__le16 queue_select;	/* read-write */
>  	__le16 queue_size;	/* read-write, power of 2. */
>  	__le16 queue_msix_vector;/* read-write */
>  	__le16 queue_enable;	/* read-write */
> +	__le16 queue_notify;	/* read-only */
>  	__le64 queue_desc;	/* read-write */
>  	__le64 queue_avail;	/* read-write */
>  	__le64 queue_used;	/* read-write */
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [PATCH 13/22] virtio_pci: new, capability-aware driver.
  2013-03-21  8:29 ` [PATCH 13/22] virtio_pci: new, capability-aware driver Rusty Russell
@ 2013-03-21 10:24   ` Michael S. Tsirkin
  2013-03-22  1:02     ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 10:24 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization

On Thu, Mar 21, 2013 at 06:59:34PM +1030, Rusty Russell wrote:
> Differences:
> 1) Uses 4 pci capabilities to demark common, irq, notify and dev-specific areas.
> 2) Guest sets queue size, using host-provided maximum.
> 3) Guest sets queue alignment, rather than ABI-defined 4096.
> 4) More than 32 feature bits (a lot more!).
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/virtio/Makefile     |    1 +
>  drivers/virtio/virtio_pci.c |  979 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 980 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pci.c
> 
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 23834f5..eec0a42 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -1,4 +1,5 @@
>  obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
> +obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  obj-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> new file mode 100644
> index 0000000..b86b99c
> --- /dev/null
> +++ b/drivers/virtio/virtio_pci.c
> @@ -0,0 +1,979 @@
> +/*
> + * Virtio PCI driver
> + *
> + * This module allows virtio devices to be used over a virtual PCI
> + * device.  Copyright 2011, Rusty Russell IBM Corporation, but based
> + * on the older virtio_pci_legacy.c, which was Copyright IBM
> + * Corp. 2007.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +#define VIRTIO_PCI_NO_LEGACY
> +#include <linux/module.h>
> +#include <linux/list.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +#include <linux/interrupt.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/highmem.h>
> +#include <linux/spinlock.h>
> +
> +MODULE_AUTHOR("Rusty Russell <rusty@rustcorp.com.au>");
> +MODULE_DESCRIPTION("virtio-pci");
> +MODULE_LICENSE("GPL");
> +MODULE_VERSION("2");
> +
> +/* Our device structure */
> +struct virtio_pci_device {
> +	struct virtio_device vdev;
> +	struct pci_dev *pci_dev;
> +
> +	/* The IO mapping for the PCI config space */
> +	struct virtio_pci_common_cfg __iomem *common;
> +	/* Where to read and clear interrupt */
> +	u8 __iomem *isr;
> +	/* Write the virtqueue index here to notify device of activity. */
> +	__le16 __iomem *notify;
> +	/* Device-specific data. */
> +	void __iomem *device;
> +
> +	/* a list of queues so we can dispatch IRQs */
> +	spinlock_t lock;
> +	struct list_head virtqueues;
> +
> +	/* MSI-X support */
> +	int msix_enabled;
> +	int intx_enabled;
> +	struct msix_entry *msix_entries;
> +	cpumask_var_t *msix_affinity_masks;
> +	/* Name strings for interrupts. This size should be enough,
> +	 * and I'm too lazy to allocate each name separately. */
> +	char (*msix_names)[256];
> +	/* Number of available vectors */
> +	unsigned msix_vectors;
> +	/* Vectors allocated, excluding per-vq vectors if any */
> +	unsigned msix_used_vectors;
> +
> +	/* Status saved during hibernate/restore */
> +	u8 saved_status;
> +
> +	/* Whether we have vector per vq */
> +	bool per_vq_vectors;
> +};
> +
> +/* Constants for MSI-X */
> +/* Use the first vector for configuration changes, the second and the rest
> + * for virtqueues.  Thus, we need at least 2 vectors for MSI. */
> +enum {
> +	VP_MSIX_CONFIG_VECTOR = 0,
> +	VP_MSIX_VQ_VECTOR = 1,
> +};

In the future, I have a plan to allow one vector only.  To make this
work without exits on the data-path VQs, we could make the hypervisor
set a bit in guest memory whenever it wants to signal a configuration
change.  The guest would then execute a config write that makes the
hypervisor clear this bit.

I guess this can wait; we are putting too much stuff into this
new layout patchset already.

> +
> +struct virtio_pci_vq_info {
> +	/* the actual virtqueue */
> +	struct virtqueue *vq;
> +
> +	/* the pages used for the queue. */
> +	void *queue;
> +
> +	/* the list node for the virtqueues list */
> +	struct list_head node;
> +
> +	/* MSI-X vector (or none) */
> +	unsigned msix_vector;
> +};
> +
> +/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
> +static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
> +	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
> +	{ 0 }
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
> +
> +/* Convert a generic virtio device to our structure */
> +static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
> +{
> +	return container_of(vdev, struct virtio_pci_device, vdev);
> +}
> +
> +/* There is no iowrite64.  We use two 32-bit ops. */
> +static void iowrite64(u64 val, const __le64 *addr)
> +{
> +	iowrite32((u32)val, (__le32 *)addr);
> +	iowrite32(val >> 32, (__le32 *)addr + 1);
> +}
> +
> +/* There is no ioread64.  We use two 32-bit ops. */
> +static u64 ioread64(__le64 *addr)
> +{
> +	return ioread32(addr) | ((u64)ioread32((__le32 *)addr + 1) << 32);
> +}
> +
> +static u64 vp_get_features(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	u64 features;
> +
> +	iowrite32(0, &vp_dev->common->device_feature_select);
> +	features = ioread32(&vp_dev->common->device_feature);
> +	iowrite32(1, &vp_dev->common->device_feature_select);
> +	features |= ((u64)ioread32(&vp_dev->common->device_feature) << 32);
> +	return features;
> +}
> +
> +static void vp_finalize_features(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +
> +	/* Give virtio_ring a chance to accept features. */
> +	vring_transport_features(vdev);
> +
> +	iowrite32(0, &vp_dev->common->guest_feature_select);
> +	iowrite32((u32)vdev->features, &vp_dev->common->guest_feature);
> +	iowrite32(1, &vp_dev->common->guest_feature_select);
> +	iowrite32(vdev->features >> 32, &vp_dev->common->guest_feature);
> +}
> +
> +/* virtio config->get() implementation */
> +static void vp_get(struct virtio_device *vdev, unsigned offset,
> +		   void *buf, unsigned len)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	void __iomem *ioaddr = vp_dev->device + offset;
> +	u8 *ptr = buf;
> +	int i;
> +
> +	for (i = 0; i < len; i++)
> +		ptr[i] = ioread8(ioaddr + i);
> +}
> +
> +#define VP_GETx(bits)							\
> +static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
> +{									\
> +	u##bits v;							\
> +	vp_get(vdev, offset, &v, sizeof(v));				\
> +	return v;							\
> +}
> +
> +VP_GETx(8)
> +VP_GETx(16)
> +VP_GETx(32)
> +VP_GETx(64)
> +
> +/* the config->set() implementation.  it's symmetric to the config->get()
> + * implementation */
> +static void vp_set(struct virtio_device *vdev, unsigned offset,
> +		   const void *buf, unsigned len)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	void __iomem *ioaddr = vp_dev->device + offset;
> +	const u8 *ptr = buf;
> +	int i;
> +
> +	for (i = 0; i < len; i++)
> +		iowrite8(ptr[i], ioaddr + i);
> +}
> +
> +#define VP_SETx(bits)							\
> +static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
> +			 u##bits v)					\
> +{									\
> +	vp_set(vdev, offset, &v, sizeof(v));				\
> +}
> +
> +VP_SETx(8)
> +VP_SETx(16)
> +VP_SETx(32)
> +VP_SETx(64)
> +
> +/* config->{get,set}_status() implementations */
> +static u8 vp_get_status(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	return ioread8(&vp_dev->common->device_status);
> +}
> +
> +static void vp_set_status(struct virtio_device *vdev, u8 status)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	/* We should never be setting status to 0. */
> +	BUG_ON(status == 0);
> +	iowrite8(status, &vp_dev->common->device_status);
> +}
> +
> +/* wait for pending irq handlers */
> +static void vp_synchronize_vectors(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	int i;
> +
> +	if (vp_dev->intx_enabled)
> +		synchronize_irq(vp_dev->pci_dev->irq);
> +
> +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> +		synchronize_irq(vp_dev->msix_entries[i].vector);
> +}
> +
> +static void vp_reset(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	/* 0 status means a reset. */
> +	iowrite8(0, &vp_dev->common->device_status);
> +	/* Flush out the status write, and flush in device writes,
> +	 * including MSi-X interrupts, if any. */

MSI-X ?

> +	ioread8(&vp_dev->common->device_status);
> +	/* Flush pending VQ/configuration callbacks. */
> +	vp_synchronize_vectors(vdev);
> +}
> +
> +/* the notify function used when creating a virt queue */
> +static void vp_notify(struct virtqueue *vq)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> +
> +	/* we write the queue's selector into the notification register to
> +	 * signal the other end */
> +	iowrite16(vq->index, vp_dev->notify);
> +}
> +
> +/* Handle a configuration change: Tell driver if it wants to know. */
> +static irqreturn_t vp_config_changed(int irq, void *opaque)
> +{
> +	struct virtio_pci_device *vp_dev = opaque;
> +	struct virtio_driver *drv;
> +	drv = container_of(vp_dev->vdev.dev.driver,
> +			   struct virtio_driver, driver);
> +
> +	if (drv->config_changed)
> +		drv->config_changed(&vp_dev->vdev);
> +	return IRQ_HANDLED;
> +}
> +
> +/* Notify all virtqueues on an interrupt. */
> +static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
> +{
> +	struct virtio_pci_device *vp_dev = opaque;
> +	struct virtio_pci_vq_info *info;
> +	irqreturn_t ret = IRQ_NONE;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&vp_dev->lock, flags);
> +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> +		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
> +			ret = IRQ_HANDLED;
> +	}
> +	spin_unlock_irqrestore(&vp_dev->lock, flags);
> +
> +	return ret;
> +}
> +
> +/* A small wrapper to also acknowledge the interrupt when it's handled.
> + * I really need an EIO hook for the vring so I can ack the interrupt once we
> + * know that we'll be handling the IRQ but before we invoke the callback since
> + * the callback may notify the host which results in the host attempting to
> + * raise an interrupt that we would then mask once we acknowledged the
> + * interrupt. */
> +static irqreturn_t vp_interrupt(int irq, void *opaque)
> +{
> +	struct virtio_pci_device *vp_dev = opaque;
> +	u8 isr;
> +
> +	/* reading the ISR has the effect of also clearing it so it's very
> +	 * important to save off the value. */
> +	isr = ioread8(vp_dev->isr);
> +
> +	/* It's definitely not us if the ISR was not high */
> +	if (!isr)
> +		return IRQ_NONE;
> +
> +	/* Configuration change?  Tell driver if it wants to know. */
> +	if (isr & VIRTIO_PCI_ISR_CONFIG)
> +		vp_config_changed(irq, opaque);
> +
> +	return vp_vring_interrupt(irq, opaque);
> +}
> +
> +static void vp_free_vectors(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	int i;
> +
> +	if (vp_dev->intx_enabled) {
> +		free_irq(vp_dev->pci_dev->irq, vp_dev);
> +		vp_dev->intx_enabled = 0;
> +	}
> +
> +	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
> +		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
> +
> +	for (i = 0; i < vp_dev->msix_vectors; i++)
> +		if (vp_dev->msix_affinity_masks[i])
> +			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
> +
> +	if (vp_dev->msix_enabled) {
> +		/* Disable the vector used for configuration */
> +		iowrite16(VIRTIO_MSI_NO_VECTOR, &vp_dev->common->msix_config);
> +		/* Flush the write out to device */
> +		ioread16(&vp_dev->common->msix_config);
> +
> +		pci_disable_msix(vp_dev->pci_dev);
> +		vp_dev->msix_enabled = 0;
> +		vp_dev->msix_vectors = 0;
> +	}
> +
> +	vp_dev->msix_used_vectors = 0;
> +	kfree(vp_dev->msix_names);
> +	vp_dev->msix_names = NULL;
> +	kfree(vp_dev->msix_entries);
> +	vp_dev->msix_entries = NULL;
> +	kfree(vp_dev->msix_affinity_masks);
> +	vp_dev->msix_affinity_masks = NULL;
> +}
> +
> +static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
> +				   bool per_vq_vectors)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	const char *name = dev_name(&vp_dev->vdev.dev);
> +	unsigned i, v;
> +	int err = -ENOMEM;
> +
> +	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
> +				       GFP_KERNEL);
> +	if (!vp_dev->msix_entries)
> +		goto error;
> +	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
> +				     GFP_KERNEL);
> +	if (!vp_dev->msix_names)
> +		goto error;
> +	vp_dev->msix_affinity_masks
> +		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
> +			  GFP_KERNEL);
> +	if (!vp_dev->msix_affinity_masks)
> +		goto error;
> +	for (i = 0; i < nvectors; ++i)
> +		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
> +					GFP_KERNEL))
> +			goto error;
> +
> +	for (i = 0; i < nvectors; ++i)
> +		vp_dev->msix_entries[i].entry = i;
> +
> +	/* pci_enable_msix returns positive if we can't get this many. */
> +	err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries, nvectors);
> +	if (err > 0)
> +		err = -ENOSPC;
> +	if (err)
> +		goto error;
> +	vp_dev->msix_vectors = nvectors;
> +	vp_dev->msix_enabled = 1;
> +
> +	/* Set the vector used for configuration */
> +	v = vp_dev->msix_used_vectors;
> +	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> +		 "%s-config", name);
> +	err = request_irq(vp_dev->msix_entries[v].vector,
> +			  vp_config_changed, 0, vp_dev->msix_names[v],
> +			  vp_dev);
> +	if (err)
> +		goto error;
> +	++vp_dev->msix_used_vectors;
> +
> +	iowrite16(v, &vp_dev->common->msix_config);
> +	/* Verify we had enough resources to assign the vector */
> +	v = ioread16(&vp_dev->common->msix_config);
> +	if (v == VIRTIO_MSI_NO_VECTOR) {
> +		err = -EBUSY;
> +		goto error;
> +	}
> +
> +	if (!per_vq_vectors) {
> +		/* Shared vector for all VQs */
> +		v = vp_dev->msix_used_vectors;
> +		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
> +			 "%s-virtqueues", name);
> +		err = request_irq(vp_dev->msix_entries[v].vector,
> +				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
> +				  vp_dev);
> +		if (err)
> +			goto error;
> +		++vp_dev->msix_used_vectors;
> +	}
> +	return 0;
> +error:
> +	vp_free_vectors(vdev);
> +	return err;
> +}
> +
> +static int vp_request_intx(struct virtio_device *vdev)
> +{
> +	int err;
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +
> +	err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
> +			  IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
> +	if (!err)
> +		vp_dev->intx_enabled = 1;
> +	return err;
> +}
> +
> +static size_t vring_pci_size(u16 num)
> +{
> +	/* We only need a cacheline separation. */
> +	return PAGE_ALIGN(vring_size(num, SMP_CACHE_BYTES));
> +}
> +
> +static void *alloc_virtqueue_pages(u16 *num)
> +{
> +	void *pages;
> +
> +	/* 1024 entries uses about 32k */
> +	if (*num > 1024)
> +		*num = 1024;
> +
> +	for (; *num; *num /= 2) {
> +		pages = alloc_pages_exact(vring_pci_size(*num),
> +					  GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN);
> +		if (pages)
> +			return pages;
> +	}
> +	return NULL;
> +}
> +
> +static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
> +				  void (*callback)(struct virtqueue *vq),
> +				  const char *name,
> +				  u16 msix_vec)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	struct virtio_pci_vq_info *info;
> +	struct virtqueue *vq;
> +	u16 num;
> +	int err;
> +
> +	/* Select the queue we're interested in */
> +	iowrite16(index, &vp_dev->common->queue_select);
> +
> +	switch (ioread64(&vp_dev->common->queue_address)) {
> +	case 0xFFFFFFFFFFFFFFFFULL:
> +		return ERR_PTR(-ENOENT);
> +	case 0:
> +		/* Uninitialized.  Excellent. */
> +		break;
> +	default:
> +		/* We've already set this up? */
> +		return ERR_PTR(-EBUSY);
> +	}
> +
> +	/* Maximum size must be a power of 2. */
> +	num = ioread16(&vp_dev->common->queue_size);
> +	if (num & (num - 1)) {
> +		dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", num);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	/* allocate and fill out our structure that represents an active
> +	 * queue */
> +	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
> +	if (!info)
> +		return ERR_PTR(-ENOMEM);
> +
> +	info->msix_vector = msix_vec;
> +
> +	info->queue = alloc_virtqueue_pages(&num);
> +	if (info->queue == NULL) {
> +		err = -ENOMEM;
> +		goto out_info;
> +	}
> +
> +	/* create the vring */
> +	vq = vring_new_virtqueue(index, num, SMP_CACHE_BYTES, vdev,
> +				 true, info->queue, vp_notify, callback, name);
> +	if (!vq) {
> +		err = -ENOMEM;
> +		goto out_alloc_pages;
> +	}
> +
> +	vq->priv = info;
> +	info->vq = vq;
> +
> +	if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
> +		iowrite16(msix_vec, &vp_dev->common->queue_msix_vector);
> +		msix_vec = ioread16(&vp_dev->common->queue_msix_vector);
> +		if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
> +			err = -EBUSY;
> +			goto out_new_virtqueue;
> +		}
> +	}
> +
> +	if (callback) {
> +		unsigned long flags;
> +		spin_lock_irqsave(&vp_dev->lock, flags);
> +		list_add(&info->node, &vp_dev->virtqueues);
> +		spin_unlock_irqrestore(&vp_dev->lock, flags);
> +	} else {
> +		INIT_LIST_HEAD(&info->node);
> +	}
> +
> +	/* Activate the queue. */
> +	iowrite64(virt_to_phys(info->queue), &vp_dev->common->queue_address);
> +	iowrite16(SMP_CACHE_BYTES, &vp_dev->common->queue_align);
> +	iowrite16(num, &vp_dev->common->queue_size);
> +
> +	return vq;
> +
> +out_new_virtqueue:
> +	vring_del_virtqueue(vq);
> +out_alloc_pages:
> +	free_pages_exact(info->queue, vring_pci_size(num));
> +out_info:
> +	kfree(info);
> +	return ERR_PTR(err);
> +}
> +
> +static void vp_del_vq(struct virtqueue *vq)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
> +	struct virtio_pci_vq_info *info = vq->priv;
> +	unsigned long flags, size = vring_pci_size(vq->vring.num);
> +
> +	spin_lock_irqsave(&vp_dev->lock, flags);
> +	list_del(&info->node);
> +	spin_unlock_irqrestore(&vp_dev->lock, flags);
> +
> +	/* Select and deactivate the queue */
> +	iowrite16(vq->index, &vp_dev->common->queue_select);
> +
> +	if (vp_dev->msix_enabled) {
> +		iowrite16(VIRTIO_MSI_NO_VECTOR,
> +			  &vp_dev->common->queue_msix_vector);
> +		/* Flush the write out to device */
> +		ioread16(&vp_dev->common->queue_msix_vector);
> +	}
> +
> +	vring_del_virtqueue(vq);
> +
> +	/* This is for our own benefit, not the device's! */
> +	iowrite64(0, &vp_dev->common->queue_address);
> +	iowrite16(0, &vp_dev->common->queue_size);
> +	iowrite16(0, &vp_dev->common->queue_align);
> +
> +	free_pages_exact(info->queue, size);
> +	kfree(info);
> +}
> +
> +/* the config->del_vqs() implementation */
> +static void vp_del_vqs(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	struct virtqueue *vq, *n;
> +	struct virtio_pci_vq_info *info;
> +
> +	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
> +		info = vq->priv;
> +		if (vp_dev->per_vq_vectors &&
> +			info->msix_vector != VIRTIO_MSI_NO_VECTOR)
> +			free_irq(vp_dev->msix_entries[info->msix_vector].vector,
> +				 vq);
> +		vp_del_vq(vq);
> +	}
> +	vp_dev->per_vq_vectors = false;
> +
> +	vp_free_vectors(vdev);
> +}
> +
> +static int vp_try_to_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> +			      struct virtqueue *vqs[],
> +			      vq_callback_t *callbacks[],
> +			      const char *names[],
> +			      bool use_msix,
> +			      bool per_vq_vectors)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	u16 msix_vec;
> +	int i, err, nvectors, allocated_vectors;
> +
> +	if (!use_msix) {
> +		/* Old style: one normal interrupt for change and all vqs. */
> +		err = vp_request_intx(vdev);
> +		if (err)
> +			goto error_request;
> +	} else {
> +		if (per_vq_vectors) {
> +			/* Best option: one for change interrupt, one per vq. */
> +			nvectors = 1;
> +			for (i = 0; i < nvqs; ++i)
> +				if (callbacks[i])
> +					++nvectors;
> +		} else {
> +			/* Second best: one for change, shared for all vqs. */
> +			nvectors = 2;
> +		}
> +
> +		err = vp_request_msix_vectors(vdev, nvectors, per_vq_vectors);
> +		if (err)
> +			goto error_request;
> +	}
> +
> +	vp_dev->per_vq_vectors = per_vq_vectors;
> +	allocated_vectors = vp_dev->msix_used_vectors;
> +	for (i = 0; i < nvqs; ++i) {
> +		if (!names[i]) {
> +			vqs[i] = NULL;
> +			continue;
> +		} else if (!callbacks[i] || !vp_dev->msix_enabled)
> +			msix_vec = VIRTIO_MSI_NO_VECTOR;
> +		else if (vp_dev->per_vq_vectors)
> +			msix_vec = allocated_vectors++;
> +		else
> +			msix_vec = VP_MSIX_VQ_VECTOR;
> +		vqs[i] = setup_vq(vdev, i, callbacks[i], names[i], msix_vec);
> +		if (IS_ERR(vqs[i])) {
> +			err = PTR_ERR(vqs[i]);
> +			goto error_find;
> +		}
> +
> +		if (!vp_dev->per_vq_vectors || msix_vec == VIRTIO_MSI_NO_VECTOR)
> +			continue;
> +
> +		/* allocate per-vq irq if available and necessary */
> +		snprintf(vp_dev->msix_names[msix_vec],
> +			 sizeof *vp_dev->msix_names,
> +			 "%s-%s",
> +			 dev_name(&vp_dev->vdev.dev), names[i]);
> +		err = request_irq(vp_dev->msix_entries[msix_vec].vector,
> +				  vring_interrupt, 0,
> +				  vp_dev->msix_names[msix_vec],
> +				  vqs[i]);
> +		if (err) {
> +			vp_del_vq(vqs[i]);
> +			goto error_find;
> +		}
> +	}
> +	return 0;
> +
> +error_find:
> +	vp_del_vqs(vdev);
> +
> +error_request:
> +	return err;
> +}
> +
> +/* the config->find_vqs() implementation */
> +static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> +		       struct virtqueue *vqs[],
> +		       vq_callback_t *callbacks[],
> +		       const char *names[])
> +{
> +	int err;
> +
> +	/* Try MSI-X with one vector per queue. */
> +	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names, true, true);
> +	if (!err)
> +		return 0;
> +	/* Fallback: MSI-X with one vector for config, one shared for queues. */
> +	err = vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
> +				 true, false);
> +	if (!err)
> +		return 0;
> +	/* Finally fall back to regular interrupts. */
> +	return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
> +				  false, false);
> +}
> +
> +static const char *vp_bus_name(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +
> +	return pci_name(vp_dev->pci_dev);
> +}
> +
> +/* Setup the affinity for a virtqueue:
> + * - force the affinity for per vq vector
> + * - OR over all affinities for shared MSI
> + * - ignore the affinity request if we're using INTX
> + */
> +static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
> +{
> +	struct virtio_device *vdev = vq->vdev;
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	struct virtio_pci_vq_info *info = vq->priv;
> +	struct cpumask *mask;
> +	unsigned int irq;
> +
> +	if (!vq->callback)
> +		return -EINVAL;
> +
> +	if (vp_dev->msix_enabled) {
> +		mask = vp_dev->msix_affinity_masks[info->msix_vector];
> +		irq = vp_dev->msix_entries[info->msix_vector].vector;
> +		if (cpu == -1)
> +			irq_set_affinity_hint(irq, NULL);
> +		else {
> +			cpumask_set_cpu(cpu, mask);
> +			irq_set_affinity_hint(irq, mask);
> +		}
> +	}
> +	return 0;
> +}
> +
> +static const struct virtio_config_ops virtio_pci_config_ops = {
> +	.get8		= vp_get8,
> +	.set8		= vp_set8,
> +	.get16		= vp_get16,
> +	.set16		= vp_set16,
> +	.get32		= vp_get32,
> +	.set32		= vp_set32,
> +	.get64		= vp_get64,
> +	.set64		= vp_set64,
> +	.get_status	= vp_get_status,
> +	.set_status	= vp_set_status,
> +	.reset		= vp_reset,
> +	.find_vqs	= vp_find_vqs,
> +	.del_vqs	= vp_del_vqs,
> +	.get_features	= vp_get_features,
> +	.finalize_features = vp_finalize_features,
> +	.bus_name	= vp_bus_name,
> +	.set_vq_affinity = vp_set_vq_affinity,
> +};
> +
> +static void virtio_pci_release_dev(struct device *_d)
> +{
> +	/*
> +	 * No need for a release method as we allocate/free
> +	 * all devices together with the pci devices.
> +	 * Provide an empty one to avoid getting a warning from core.
> +	 */
> +}
> +
> +static void __iomem *map_capability(struct pci_dev *dev, int off, size_t expect)
> +{
> +	u8 bar;
> +	u32 offset, length;
> +	void __iomem *p;
> +
> +	pci_read_config_byte(dev, off + offsetof(struct virtio_pci_cap, bar),
> +			     &bar);
> +	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, offset),
> +			     &offset);
> +	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
> +			     &length);
> +
> +	if (length < expect) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: small capability len %u (%zu expected)\n",
> +			length, expect);
> +		return NULL;
> +	}
> +
> +	/* We want uncachable mapping, even if bar is cachable. */
> +	p = pci_iomap_range(dev, bar, offset, length, PAGE_SIZE, true);
> +	if (!p)
> +		dev_err(&dev->dev,
> +			"virtio_pci: unable to map virtio %u@%u on bar %i\n",
> +			length, offset, bar);
> +	return p;
> +}
> +
> +
> +/* the PCI probing function */
> +static int virtio_pci_probe(struct pci_dev *pci_dev,
> +			    const struct pci_device_id *id)
> +{
> +	struct virtio_pci_device *vp_dev;
> +	int err, common, isr, notify, device;
> +
> +	/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
> +	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
> +		return -ENODEV;
> +
> +	if (pci_dev->revision != VIRTIO_PCI_ABI_VERSION) {
> +		printk(KERN_ERR "virtio_pci: expected ABI version %d, got %d\n",
> +		       VIRTIO_PCI_ABI_VERSION, pci_dev->revision);
> +		return -ENODEV;
> +	}
> +
> +	/* check for a common config: if not, use legacy mode (bar 0). */
> +	common = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
> +					    IORESOURCE_IO|IORESOURCE_MEM);
> +	if (!common) {
> +		dev_info(&pci_dev->dev,
> +			 "virtio_pci: leaving for legacy driver\n");
> +		return -ENODEV;
> +	}
> +
> +	/* If common is there, these should be too... */
> +	isr = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_ISR_CFG,
> +					 IORESOURCE_IO|IORESOURCE_MEM);
> +	notify = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_NOTIFY_CFG,
> +					    IORESOURCE_IO|IORESOURCE_MEM);
> +	device = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_DEVICE_CFG,
> +					    IORESOURCE_IO|IORESOURCE_MEM);
> +	if (!isr || !notify || !device) {
> +		dev_err(&pci_dev->dev,
> +			"virtio_pci: missing capabilities %i/%i/%i/%i\n",
> +			common, isr, notify, device);
> +		return -EINVAL;
> +	}
> +
> +	/* allocate our structure and fill it out */
> +	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
> +	if (vp_dev == NULL)
> +		return -ENOMEM;
> +
> +	vp_dev->vdev.dev.parent = &pci_dev->dev;
> +	vp_dev->vdev.dev.release = virtio_pci_release_dev;
> +	vp_dev->vdev.config = &virtio_pci_config_ops;
> +	vp_dev->pci_dev = pci_dev;
> +	INIT_LIST_HEAD(&vp_dev->virtqueues);
> +	spin_lock_init(&vp_dev->lock);
> +
> +	/* Disable MSI/MSIX to bring device to a known good state. */
> +	pci_msi_off(pci_dev);
> +
> +	/* enable the device */
> +	err = pci_enable_device(pci_dev);
> +	if (err)
> +		goto out;
> +
> +	err = pci_request_regions(pci_dev, "virtio-pci");
> +	if (err)
> +		goto out_enable_device;
> +
> +	err = -EINVAL;
> +	vp_dev->common = map_capability(pci_dev, common,
> +					sizeof(struct virtio_pci_common_cfg));
> +	if (!vp_dev->common)
> +		goto out_req_regions;
> +	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8));
> +	if (!vp_dev->isr)
> +		goto out_map_common;
> +	vp_dev->notify = map_capability(pci_dev, notify, sizeof(u16));
> +	if (!vp_dev->notify)
> +		goto out_map_isr;
> +	vp_dev->device = map_capability(pci_dev, device, 0);
> +	if (!vp_dev->device)
> +		goto out_map_notify;
> +
> +	pci_set_drvdata(pci_dev, vp_dev);
> +	pci_set_master(pci_dev);
> +
> +	/* we use the subsystem vendor/device id as the virtio vendor/device
> +	 * id.  this allows us to use the same PCI vendor/device id for all
> +	 * virtio devices and to identify the particular virtio driver by
> +	 * the subsystem ids */
> +	vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
> +	vp_dev->vdev.id.device = pci_dev->subsystem_device;
> +
> +	/* finally register the virtio device */
> +	err = register_virtio_device(&vp_dev->vdev);
> +	if (err)
> +		goto out_set_drvdata;
> +
> +	return 0;
> +
> +out_set_drvdata:
> +	pci_set_drvdata(pci_dev, NULL);
> +	pci_iounmap(pci_dev, vp_dev->device);
> +out_map_notify:
> +	pci_iounmap(pci_dev, vp_dev->notify);
> +out_map_isr:
> +	pci_iounmap(pci_dev, vp_dev->isr);
> +out_map_common:
> +	pci_iounmap(pci_dev, vp_dev->common);
> +out_req_regions:
> +	pci_release_regions(pci_dev);
> +out_enable_device:
> +	pci_disable_device(pci_dev);
> +out:
> +	kfree(vp_dev);
> +	return err;
> +}
> +
> +static void virtio_pci_remove(struct pci_dev *pci_dev)
> +{
> +	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> +
> +	unregister_virtio_device(&vp_dev->vdev);
> +
> +	vp_del_vqs(&vp_dev->vdev);
> +	pci_set_drvdata(pci_dev, NULL);
> +	pci_iounmap(pci_dev, vp_dev->device);
> +	pci_iounmap(pci_dev, vp_dev->notify);
> +	pci_iounmap(pci_dev, vp_dev->isr);
> +	pci_iounmap(pci_dev, vp_dev->common);
> +	pci_release_regions(pci_dev);
> +	pci_disable_device(pci_dev);
> +	kfree(vp_dev);
> +}
> +
> +#ifdef CONFIG_PM
> +static int virtio_pci_freeze(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> +	struct virtio_driver *drv;
> +	int ret;
> +
> +	drv = container_of(vp_dev->vdev.dev.driver,
> +			   struct virtio_driver, driver);
> +
> +	ret = 0;
> +	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
> +	if (drv && drv->freeze)
> +		ret = drv->freeze(&vp_dev->vdev);
> +
> +	if (!ret)
> +		pci_disable_device(pci_dev);
> +	return ret;
> +}
> +
> +static int virtio_pci_restore(struct device *dev)
> +{
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> +	struct virtio_driver *drv;
> +	int ret;
> +
> +	drv = container_of(vp_dev->vdev.dev.driver,
> +			   struct virtio_driver, driver);
> +
> +	ret = pci_enable_device(pci_dev);
> +	if (ret)
> +		return ret;
> +
> +	pci_set_master(pci_dev);
> +	vp_finalize_features(&vp_dev->vdev);
> +
> +	if (drv && drv->restore)
> +		ret = drv->restore(&vp_dev->vdev);
> +
> +	/* Finally, tell the device we're all set */
> +	if (!ret)
> +		vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
> +
> +	return ret;
> +}
> +
> +static const struct dev_pm_ops virtio_pci_pm_ops = {
> +	SET_SYSTEM_SLEEP_PM_OPS(virtio_pci_freeze, virtio_pci_restore)
> +};
> +#endif
> +
> +static struct pci_driver virtio_pci_driver = {
> +	.name		= "virtio-pci",
> +	.id_table	= virtio_pci_id_table,
> +	.probe		= virtio_pci_probe,
> +	.remove		= virtio_pci_remove,
> +#ifdef CONFIG_PM
> + 	.driver.pm	= &virtio_pci_pm_ops,
> +#endif
> +};
> +
> +module_pci_driver(virtio_pci_driver);
> -- 
> 1.7.10.4
> 


* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21  8:29 ` [PATCH 12/22] virtio_pci: allow duplicate capabilities Rusty Russell
@ 2013-03-21 10:28   ` Michael S. Tsirkin
  2013-03-21 14:26     ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 10:28 UTC (permalink / raw)
  To: Rusty Russell; +Cc: H. Peter Anvin, virtualization

On Thu, Mar 21, 2013 at 06:59:33PM +1030, Rusty Russell wrote:
> Another HPA suggestion: that the device be allowed to offer duplicate
> capabilities, particularly so it can offer a mem and an I/O bar and let
> the guest decide (Linux guest probably doesn't care?).
> 
> Cc: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

I think the guest is exactly the wrong place to decide;
it really does not know whether it's running on a
hypervisor with fast IO or fast memory.
Also, as long as we have an IO BAR, we have problems allocating it.
So I think we don't need this; see my suggestion
about fixed IO addresses instead.


> ---
>  drivers/virtio/virtio_pci_legacy.c |    3 ++-
>  include/linux/virtio_pci.h         |   20 ++++++++++++++++----
>  2 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 501fa79..c7aadcb 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -728,7 +728,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>  	}
>  
>  	/* We leave modern virtio-pci for the modern driver. */
> -	cap = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG);
> +	cap = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
> +					 IORESOURCE_IO|IORESOURCE_MEM);
>  	if (cap) {
>  		if (force_nonlegacy)
>  			dev_info(&pci_dev->dev,
> diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
> index 2714160..6d2816b 100644
> --- a/include/linux/virtio_pci.h
> +++ b/include/linux/virtio_pci.h
> @@ -4,18 +4,30 @@
>  #define VIRTIO_PCI_NO_LEGACY
>  #include <uapi/linux/virtio_pci.h>
>  
> -/* Returns offset of the capability, or 0. */
> -static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type)
> +/**
> + * virtio_pci_find_capability - walk capabilities to find device info.
> + * @dev: the pci device
> + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> + * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
> + *
> + * Returns offset of the capability, or 0.
> + */
> +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
> +					     u32 ioresource_types)
>  {
>  	int pos;
>  
>  	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
>  	     pos > 0;
>  	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> -		u8 type;
> +		u8 type, bar;
>  		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
>  							 cfg_type), &type);
> -		if (type == cfg_type)
> +		if (type != cfg_type)
> +			continue;
> +		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> +							 bar), &bar);
> +		if (pci_resource_flags(dev, bar) & ioresource_types)
>  			return pos;
>  	}
>  	return 0;
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-21 10:13   ` Michael S. Tsirkin
@ 2013-03-21 10:35     ` Michael S. Tsirkin
  2013-03-22  2:52     ` Rusty Russell
  1 sibling, 0 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 10:35 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization

On Thu, Mar 21, 2013 at 12:13:00PM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 06:59:37PM +1030, Rusty Russell wrote:
> > (MST, is this what you were thinking?)
> 
> Almost.
> 
> Three points:
> 
> 1. this is still an offset in BAR so for KVM we are still forced to use
> an IO BAR.  I would like an option for hypervisor to simply say "Do IO
> to this fixed address for this VQ". Then virtio can avoid using IO BARs
> completely.
> 
> 2. For a real virtio device, offset is only 16 bit; using a 32 bit
> offset in a memory BAR giving each VQ a separate 4K page would allow
> privilege separation where e.g. RXVQ/TXVQ are passed through to
> hardware but CVQ is handled by the hypervisor.
> 
> 3. Last thing: (1) applies to ISR reads as well.
> 
> So I had in mind a structure like:
> 
> 	struct vq_notify {
> 		u32 offset;
> 		u16 data;
> 		u16 flags;
> 	}
> 
> enum vq_notify_flags {
> 	VQ_NOTIFY_BAR0,
> 	VQ_NOTIFY_BAR1,
> 	VQ_NOTIFY_BAR2,
> 	VQ_NOTIFY_BAR3,
> 	VQ_NOTIFY_BAR4,
> 	VQ_NOTIFY_BAR5,
> 	VQ_NOTIFY_FIXED_IOPORT,
> }
> 
> And then to notify a vq we write a given data at given offset
> or into a given port for VQ_NOTIFY_FIXED_IOPORT.
> 
> Only point 1 is really important for me though, I can be
> flexible on the rest of it.

So the minimal change on top of this patch, would be adding a FIXED
option to BIR and reporting data and not just offset for queue_notify
(so it can include device info if we share same address between
devices).

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 10:28   ` Michael S. Tsirkin
@ 2013-03-21 14:26     ` H. Peter Anvin
  2013-03-21 14:43       ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 14:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 03:28 AM, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 06:59:33PM +1030, Rusty Russell wrote:
>> Another HPA suggestion: that the device be allowed to offer duplicate
>> capabilities, particularly so it can offer a mem and an I/O bar and let
>> the guest decide (Linux guest probably doesn't care?).
>>
>> Cc: H. Peter Anvin <hpa@zytor.com>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> I think guests is exactly the wrong place to decide,
> it really does not know whether it's running on a
> hypervisor with fast IO or fast memory.
> Also, as long as we have an IO BAR, we have problems allocating it.
> So I think we don't need this, see my suggestion
> about fixed IO addresses instead.
> 

The reason to support this is that a guest written to only handle one or
the other doesn't prevent the hypervisor from offering the other to
guests.  We probably want to specify that if the guest doesn't care, it
should use the first one offered by the host.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 14:26     ` H. Peter Anvin
@ 2013-03-21 14:43       ` Michael S. Tsirkin
  2013-03-21 14:45         ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 14:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 07:26:18AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 03:28 AM, Michael S. Tsirkin wrote:
> > On Thu, Mar 21, 2013 at 06:59:33PM +1030, Rusty Russell wrote:
> >> Another HPA suggestion: that the device be allowed to offer duplicate
> >> capabilities, particularly so it can offer a mem and an I/O bar and let
> >> the guest decide (Linux guest probably doesn't care?).
> >>
> >> Cc: H. Peter Anvin <hpa@zytor.com>
> >> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > 
> > I think guests is exactly the wrong place to decide,
> > it really does not know whether it's running on a
> > hypervisor with fast IO or fast memory.
> > Also, as long as we have an IO BAR, we have problems allocating it.
> > So I think we don't need this, see my suggestion
> > about fixed IO addresses instead.
> > 
> 
> The reason to support this is that a guest written to only handle one or
> the other doesn't prevent the hypervisor from offering the other to
> guests.  We probably want to specify that if the guest doesn't care, it
> should use the first one offered by the host.
> 
> 	-hpa


What are the configurations where having many ways is helpful?
Any examples?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 14:43       ` Michael S. Tsirkin
@ 2013-03-21 14:45         ` H. Peter Anvin
  2013-03-21 15:19           ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 14:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 07:43 AM, Michael S. Tsirkin wrote:
>>
>> The reason to support this is that a guest written to only handle one or
>> the other doesn't prevent the hypervisor from offering the other to
>> guests.  We probably want to specify that if the guest doesn't care, it
>> should use the first one offered by the host.
>>
> 
> What are the configurations where having many ways is helpful?
> Any examples?
> 

In BIOS (e.g. SeaBIOS), using MMIO is very difficult, so your boot ROM
probably wants to use I/O space even if MMIO is available.

If you are preferring I/O space because it is faster (vmexit), then you
still want to allow MMIO if the resource is not available.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 14:45         ` H. Peter Anvin
@ 2013-03-21 15:19           ` Michael S. Tsirkin
  2013-03-21 15:26             ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 15:19 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 07:45:57AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 07:43 AM, Michael S. Tsirkin wrote:
> >>
> >> The reason to support this is that a guest written to only handle one or
> >> the other doesn't prevent the hypervisor from offering the other to
> >> guests.  We probably want to specify that if the guest doesn't care, it
> >> should use the first one offered by the host.
> >>
> > 
> > What are the configurations where having many ways is helpful?
> > Any examples?
> > 
> 
> In BIOS (e.g. SeaBIOS), using MMIO is very difficult, so your boot ROM
> probably wants to use I/O space even if MMIO is available.

Is this a real concern?  Modern cards seem to supply PXE ROMs even
though they have no IO BARs.

> If you are preferring I/O space because it is faster (vmexit), then you
> still want to allow MMIO if the resource is not available.
> 
> 	-hpa

Problem is, BIOS and OS normally assume failure to allocate
any resources means card won't function and disable it.
So it does not seem to be worth it to have such a
device specific failover ability.

> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 15:19           ` Michael S. Tsirkin
@ 2013-03-21 15:26             ` H. Peter Anvin
  2013-03-21 15:58               ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 15:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 08:19 AM, Michael S. Tsirkin wrote:
>>
>> In BIOS (e.g. SeaBIOS), using MMIO is very difficult, so your boot ROM
>> probably wants to use I/O space even if MMIO is available.
> 
> Is this a real concern?  Modern cards seem to supply PXE ROMs even
> though they have no IO BARs.
> 

Most of them do really ugly hacks in hardware (like putting in a "back
door" in config space) to make that possible.

> 
> Problem is, BIOS and OS normally assume failure to allocate
> any resources means card won't function and disable it.
> So it does not seem to be worth it to have such a
> device specific failover ability.
> 

That is a violation of the PCIe spec; the PCIe spec specifically states
that failure to allocate an I/O BAR should still allow the device to
function.  So we shouldn't rule it out going forward.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 15:26             ` H. Peter Anvin
@ 2013-03-21 15:58               ` Michael S. Tsirkin
  2013-03-21 16:04                 ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 15:58 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 08:26:52AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 08:19 AM, Michael S. Tsirkin wrote:
> >>
> >> In BIOS (e.g. SeaBIOS), using MMIO is very difficult, so your boot ROM
> >> probably wants to use I/O space even if MMIO is available.
> > 
> > Is this a real concern?  Modern cards seem to supply PXE ROMs even
> > though they have no IO BARs.
> > 
> 
> Most of them do really ugly hacks in hardware (like putting in a "back
> door" in config space) to make that possible.

A config space register that lets us access
registers within a BAR actually sounds pretty reasonable.
Way better than an I/O BAR.

> > 
> > Problem is, BIOS and OS normally assume failure to allocate
> > any resources means card won't function and disable it.
> > So it does not seem to be worth it to have such a
> > device specific failover ability.
> > 
> 
> That is a violation of the PCIe spec; the PCIe spec specifically states
> that failure to allocate an I/O BAR should still allow the device to
> function.

Where does it say this?

Also, if as you say BIOS is not prepared to handle MMIO,
and the device is needed for boot, where does this leave us?

>  So we shouldn't rule it out going forward.
> 
> 	-hpa

I'm not against this as such I just think we need some
other solution for BIOS.

> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 15:58               ` Michael S. Tsirkin
@ 2013-03-21 16:04                 ` H. Peter Anvin
  2013-03-21 16:11                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 08:58 AM, Michael S. Tsirkin wrote:
>>
>> Most of them do really ugly hacks in hardware (like putting in a "back
>> door" in config space) to make that possible.
> 
> config space register that let us access
> registers within BAR actually sounds pretty reasonable.
> Way better than an I/O BAR.
> 

It is really, really nasty, not to mention slow.

>>>
>>> Problem is, BIOS and OS normally assume failure to allocate
>>> any resources means card won't function and disable it.
>>> So it does not seem to be worth it to have such a
>>> device specific failover ability.
>>>
>>
>> That is a violation of the PCIe spec; the PCIe spec specifically states
>> that failure to allocate an I/O BAR should still allow the device to
>> function.
> 
> Where does it say this?

In PCI Express 1.1 base, it is section 1.3.2.2, third bullet.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 16:04                 ` H. Peter Anvin
@ 2013-03-21 16:11                   ` Michael S. Tsirkin
  2013-03-21 16:15                     ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 16:11 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 09:04:48AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 08:58 AM, Michael S. Tsirkin wrote:
> >>
> >> Most of them do really ugly hacks in hardware (like putting in a "back
> >> door" in config space) to make that possible.
> > 
> > config space register that let us access
> > registers within BAR actually sounds pretty reasonable.
> > Way better than an I/O BAR.
> > 
> 
> It is really, really, nasty, not to mention slow.

Almost everything we do is through DMA, except a single write
to start transmit and a single read to clear interrupts. So all it means
is we do 2 io writes or reads per packet instead of 1.  Seems harmless
enough. A bit slower than native but should be good enough for
BIOS.  Needs no resources at all.  Why nasty? What's not to like?

> >>> Problem is, BIOS and OS normally assume failure to allocate
> >>> any resources means card won't function and disable it.
> >>> So it does not seem to be worth it to have such a
> >>> device specific failover ability.
> >>>
> >>
> >> That is a violation of the PCIe spec; the PCIe spec specifically states
> >> that failure to allocate an I/O BAR should still allow the device to
> >> function.
> > 
> > Where does it say this?
> 
> In PCI Express 1.1 base, it is section 1.3.2.2, third bullet.
> 
> 	-hpa

Thanks. Same place in latest 3.0:
	A PCI Express Endpoint must not depend on operating system allocation of
	I/O resources claimed through BAR(s).
	A PCI Express Endpoint must not generate I/O Requests.
of course this only applies to express :)

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 16:11                   ` Michael S. Tsirkin
@ 2013-03-21 16:15                     ` H. Peter Anvin
  2013-03-21 16:26                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 16:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 09:11 AM, Michael S. Tsirkin wrote:
>>
>> It is really, really, nasty, not to mention slow.
> 
> Almost everything we do is through DMA, except a single write
> to start transmit and a single read to clear interrupts. So all it means
> is we do 2 io writes or reads per packet instead of 1.  Seems harmless
> enough. A bit slower than native but should be good enough for
> BIOS.  Needs no resources at all.  Why nasty? What's not to like?
> 

Corner cases galore... including the statefulness and non-atomicity of
config space writes (MMCONFIG is obviously not an option here.)  It
requires a minimum of four operations to do it safely.

> 
> Thanks. Same place in latest 3.0:
> 	A PCI Express Endpoint must not depend on operating system allocation of
> 	I/O resources claimed through BAR(s).
> 	A PCI Express Endpoint must not generate I/O Requests.
> of course this only applies to express :)
> 

And it does... but it has implications for the OS resource manager that
if Linux violates, we need to fix it.  We should not fail a device in
generic code because an I/O BAR allocation fails.  The device driver may
opt to fail the allocation.

(Note that having an I/O BAR is not *generating* an I/O request.)

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 16:15                     ` H. Peter Anvin
@ 2013-03-21 16:26                       ` Michael S. Tsirkin
  2013-03-21 16:32                         ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 16:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 09:15:15AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 09:11 AM, Michael S. Tsirkin wrote:
> >>
> >> It is really, really, nasty, not to mention slow.
> > 
> > Almost everything we do is through DMA, except a single write
> > to start transmit and a single read to clear interrupts. So all it means
> > is we do 2 io writes or reads per packet instead of 1.  Seems harmless
> > enough. A bit slower than native but should be good enough for
> > BIOS.  Needs no resources at all.  Why nasty? What's not to like?
> > 
> 
> Corner cases galore... including the statefulness and non-atomicity of
> config space writes (MMCONFIG is obviously not an option here.)  It
> requires a minimum of four operations to do it safely.

Even 4 operations per kick is harmless for the BIOS: each one exits to
the host anyway.  It might be an issue for cards that need you to put
the whole packet in NIC memory, but virtio rings make this a non-issue
- we might not be able to hit 10Gb/s but should be able to do 1Gb/s
easily.

Now I am really thinking we need such a config cycle backdoor
for the BIOS.

> > 
> > Thanks. Same place in latest 3.0:
> > 	A PCI Express Endpoint must not depend on operating system allocation of
> > 	I/O resources claimed through BAR(s).
> > 	A PCI Express Endpoint must not generate I/O Requests.
> > of course this only applies to express :)
> > 
> 
> And it does... but it has implications for the OS resource manager that
> if Linux violates, we need to fix it.  We should not fail a device in
> generic code because an I/O BAR allocation fails.  The device driver may
> opt to fail the allocation.
> 
> (Note that having an I/O BAR is not *generating* an I/O request.)
> 
> 	-hpa

Right. So if I read this literally, I should be able to boot
from the device even if it does not have an I/O BAR,
and BIOS really should not assume it has an I/O BAR option,
and if as you suggest it can't use MMIO, what is left?
config cycles.


So coming back to the issue that started it all,
BIOS will be able to boot without I/O BAR, no good
reasons to have any capabilities pointing at I/O BARs,
so no need for duplicate capabilities?

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 16:26                       ` Michael S. Tsirkin
@ 2013-03-21 16:32                         ` H. Peter Anvin
  2013-03-21 17:07                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 16:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 09:26 AM, Michael S. Tsirkin wrote:
>>>
>>> Thanks. Same place in latest 3.0:
>>> 	A PCI Express Endpoint must not depend on operating system allocation of
>>> 	I/O resources claimed through BAR(s).
>>> 	A PCI Express Endpoint must not generate I/O Requests.
>>> of course this only applies to express :)
>>>
>>
>> And it does... but it has implications for the OS resource manager that
>> if Linux violates, we need to fix it.  We should not fail a device in
>> generic code because an I/O BAR allocation fails.  The device driver may
>> opt to fail the allocation.
>>
>> (Note that having an I/O BAR is not *generating* an I/O request.)
> 
> Right. So if I read this literally, I should be able to boot
> from the device even if it does not have an I/O BAR,
> and BIOS really should not assume it has an I/O BAR option,
> and if as you suggest it can't use MMIO, what is left?
> config cycles.
> 
> So coming back to the issue that started it all,
> BIOS will be able to boot without I/O BAR, no good
> reasons to have any capabilities pointing at I/O BARs,
> so no need for duplicate capabilities?
> 

First of all, you appear to be deliberately overinterpreting -- the BIOS
is the resource manager here, so it can obviously make sure the I/O
resource is available to the boot device.

The performance argument, though, which is the more important one, still
remains, so your conclusion is invalid.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 16:32                         ` H. Peter Anvin
@ 2013-03-21 17:07                           ` Michael S. Tsirkin
  2013-03-21 17:09                             ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 17:07 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 09:32:06AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 09:26 AM, Michael S. Tsirkin wrote:
> >>>
> >>> Thanks. Same place in latest 3.0:
> >>> 	A PCI Express Endpoint must not depend on operating system allocation of
> >>> 	I/O resources claimed through BAR(s).
> >>> 	A PCI Express Endpoint must not generate I/O Requests.
> >>> of course this only applies to express :)
> >>>
> >>
> >> And it does... but it has implications for the OS resource manager that
> >> if Linux violates, we need to fix it.  We should not fail a device in
> >> generic code because an I/O BAR allocation fails.  The device driver may
> >> opt to fail the allocation.
> >>
> >> (Note that having an I/O BAR is not *generating* an I/O request.)
> > 
> > Right. So if I read this literally, I should be able to boot
> > from the device even if it does not have an I/O BAR,
> > and BIOS really should not assume it has an I/O BAR option,
> > and if as you suggest it can't use MMIO, what is left?
> > config cycles.
> > 
> > So coming back to the issue that started it all,
> > BIOS will be able to boot without I/O BAR, no good
> > reasons to have any capabilities pointing at I/O BARs,
> > so no need for duplicate capabilities?
> > 
> 
> First of all, you appear to be deliberately overinterpreting -- the BIOS
> is the resource manager here, so it can obviously make sure the I/O
> resource is available to the boot device.

Assuming there's only one, but that's wrong: we might need serial for
output, net for downloading stuff, maybe more.

> The performance argument, though, which is the more important one, still
> remains, so your conclusion is invalid.
> 
> 	-hpa

I just think it does not apply to BIOS so much.  A bigger issue for
BIOS virtio performance is that it normally does not implement
batching at all: to keep things simple and reduce memory usage it uses
a small number (often 1) of outstanding buffers per queue.

For example, here's the BIOS driver for virtio-blk.


    if (write)
        vring_add_buf(vq, sg, 2, 1, 0, 0);
    else
        vring_add_buf(vq, sg, 1, 2, 0, 0);
    vring_kick(GET_GLOBAL(vdrive_g->ioaddr), vq, 1);

    /* Wait for reply */
    while (!vring_more_used(vq))
        usleep(5);

    /* Reclaim virtqueue element */
    vring_get_buf(vq, NULL);

    /* Clear interrupt status register.  Avoid leaving interrupts stuck if
     * VRING_AVAIL_F_NO_INTERRUPT was ignored and interrupts were
     * raised.
     */
    vp_get_isr(GET_GLOBAL(vdrive_g->ioaddr));

Does it look like we should spend time optimizing vring_kick?

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 17:07                           ` Michael S. Tsirkin
@ 2013-03-21 17:09                             ` H. Peter Anvin
  2013-03-21 17:13                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 17:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 10:07 AM, Michael S. Tsirkin wrote:
> 
> I just think it does not apply to BIOS so much.  A bigger issue for BIOS
> virtio performance is that it normally does not implement batching at
> all, to keep it simple and reduce memory usage it uses a small number
> (often 1) of outstanding buffers per queue.
> 

You asked for examples, so I gave one.  There is no doubt that BIOS can
make use of this if available.

What you describe above is of course a typical problem with BIOS.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 17:09                             ` H. Peter Anvin
@ 2013-03-21 17:13                               ` Michael S. Tsirkin
  2013-03-21 17:49                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 17:13 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 10:09:02AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 10:07 AM, Michael S. Tsirkin wrote:
> > 
> > I just think it does not apply to BIOS so much.  A bigger issue for BIOS
> > virtio performance is that it normally does not implement batching at
> > all, to keep it simple and reduce memory usage it uses a small number
> > (often 1) of outstanding buffers per queue.
> > 
> 
> You asked for examples, so I gave one.  There is no doubt that BIOS can
> make use of this if available.
> 
> What you describe above is of course a typical problem with BIOS.
> 
> 	-hpa

Right, thanks.
Also thanks for the spec quote, I really think we should add the config
cycle access as a guaranteed feature.  It's easy and I expect all BIOS
drivers to use it.

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 17:13                               ` Michael S. Tsirkin
@ 2013-03-21 17:49                                 ` Michael S. Tsirkin
  2013-03-21 17:54                                   ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 17:49 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 07:13:19PM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 10:09:02AM -0700, H. Peter Anvin wrote:
> > On 03/21/2013 10:07 AM, Michael S. Tsirkin wrote:
> > > 
> > > I just think it does not apply to BIOS so much.  A bigger issue for BIOS
> > > virtio performance is that it normally does not implement batching at
> > > all, to keep it simple and reduce memory usage it uses a small number
> > > (often 1) of outstanding buffers per queue.
> > > 
> > 
> > You asked for examples, so I gave one.  There is no doubt that BIOS can
> > make use of this if available.
> > 
> > What you describe above is of course a typical problem with BIOS.
> > 
> > 	-hpa
> 
> Right, thanks.
> Also thanks for the spec quote, I really think we should add the config
> cycle access as a guaranteed feature.  It's easy and I expect all BIOS
> drivers to use it.

Just to clarify, I expect BIOS to use it *for config access*.
Notification will support IO anyway because it's
faster on KVM, so BIOS can use it directly.

> -- 
> MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 17:49                                 ` Michael S. Tsirkin
@ 2013-03-21 17:54                                   ` H. Peter Anvin
  2013-03-21 18:01                                     ` Michael S. Tsirkin
  2013-03-22  0:57                                     ` Rusty Russell
  0 siblings, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-21 17:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/21/2013 10:49 AM, Michael S. Tsirkin wrote:
> 
> Just to clarify, I expect BIOS to use it *for config access*.
> Notification will support IO anyway because it's
> faster on KVM, so BIOS can use it directly.
> 

Ah, yes, of course.

Quite frankly, I don't see any reason to support *anything else* for
configuration, does anyone else?

I thought we were talking about the doorbell/notification/kicker
register.  For I/O space especially it is highly desirable if that can
be in a minimal BAR (4 bytes).

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 17:54                                   ` H. Peter Anvin
@ 2013-03-21 18:01                                     ` Michael S. Tsirkin
  2013-03-22  0:57                                     ` Rusty Russell
  1 sibling, 0 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 18:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Thu, Mar 21, 2013 at 10:54:31AM -0700, H. Peter Anvin wrote:
> On 03/21/2013 10:49 AM, Michael S. Tsirkin wrote:
> > 
> > Just to clarify, I expect BIOS to use it *for config access*.
> > Notification will support IO anyway because it's
> > faster on KVM, so BIOS can use it directly.
> > 
> 
> Ah, yes, of course.
> 
> Quite frankly, I don't see any reason to support *anything else* for
> configuration, does anyone else?
> 
> I thought we were talking about the doorbell/notification/kicker
> register.  For I/O space especially it is highly desirable if that can
> be in a minimal BAR (4 bytes).
> 
> 	-hpa


Not necessarily a BAR. We have trouble allocating IO BARs.  What I want
to allow is a hypervisor that allocates a fixed address/data pair,
reserves it in ACPI and reports it to the driver through a
vendor-specific capability.

You likely didn't see this discussion; it was only sent to the kvm
and virtualization lists. The subject is
'virtio PCI on KVM without IO BARs'.


-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-21  9:09   ` Cornelia Huck
@ 2013-03-22  0:31     ` Rusty Russell
  2013-03-22  9:13       ` Cornelia Huck
  0 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-22  0:31 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

Cornelia Huck <cornelia.huck@de.ibm.com> writes:
> On Thu, 21 Mar 2013 18:59:24 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
...
>> diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
>> index 6711e65..dcf35b1 100644
>> --- a/drivers/s390/kvm/kvm_virtio.c
>> +++ b/drivers/s390/kvm/kvm_virtio.c
>> @@ -112,26 +112,82 @@ static void kvm_finalize_features(struct virtio_device *vdev)
>>  }
>> 
>>  /*
>> - * Reading and writing elements in config space
>> + * Reading and writing elements in config space.  Host and guest are always
>> + * big-endian, so no conversion necessary.
>>   */
>> -static void kvm_get(struct virtio_device *vdev, unsigned int offset,
>> -		   void *buf, unsigned len)
>> +static u8 kvm_get8(struct virtio_device *vdev, unsigned int offset)
>>  {
>> -	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
>> +	struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
>                ^^^^^^^^^^^^^^^^^^
>  
> This looks weird?

Nice understatement.  I guess you know where I cut & pasted from...

Here is the updated version.

Thanks,
Rusty.

diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 6711e65..9b8c527 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -112,24 +112,79 @@ static void kvm_finalize_features(struct virtio_device *vdev)
 }
 
 /*
- * Reading and writing elements in config space
+ * Reading and writing elements in config space.  Host and guest are always
+ * big-endian, so no conversion necessary.
  */
-static void kvm_get(struct virtio_device *vdev, unsigned int offset,
-		   void *buf, unsigned len)
+static u8 kvm_get8(struct virtio_device *vdev, unsigned int offset)
 {
 	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
 
-	BUG_ON(offset + len > desc->config_len);
-	memcpy(buf, kvm_vq_configspace(desc) + offset, len);
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u8) > desc->config_len);
+	return *(u8 *)(kvm_vq_configspace(desc) + offset);
 }
 
-static void kvm_set(struct virtio_device *vdev, unsigned int offset,
-		   const void *buf, unsigned len)
+static void kvm_set8(struct virtio_device *vdev, unsigned int offset, u8 val)
 {
 	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
 
-	BUG_ON(offset + len > desc->config_len);
-	memcpy(kvm_vq_configspace(desc) + offset, buf, len);
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u8 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+static u16 kvm_get16(struct virtio_device *vdev, unsigned int offset)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u16) > desc->config_len);
+	return *(u16 *)(kvm_vq_configspace(desc) + offset);
+}
+
+static void kvm_set16(struct virtio_device *vdev, unsigned int offset, u16 val)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u16 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+static u32 kvm_get32(struct virtio_device *vdev, unsigned int offset)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u32) > desc->config_len);
+	return *(u32 *)(kvm_vq_configspace(desc) + offset);
+}
+
+static void kvm_set32(struct virtio_device *vdev, unsigned int offset, u32 val)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u32 *)(kvm_vq_configspace(desc) + offset) = val;
+}
+
+static u64 kvm_get64(struct virtio_device *vdev, unsigned int offset)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(u64) > desc->config_len);
+	return *(u64 *)(kvm_vq_configspace(desc) + offset);
+}
+
+static void kvm_set64(struct virtio_device *vdev, unsigned int offset, u64 val)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+
+	/* Check they didn't ask for more than the length of the config! */
+	BUG_ON(offset + sizeof(val) > desc->config_len);
+	*(u64 *)(kvm_vq_configspace(desc) + offset) = val;
 }
 
 /*
@@ -278,8 +333,14 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
 static const struct virtio_config_ops kvm_vq_configspace_ops = {
 	.get_features = kvm_get_features,
 	.finalize_features = kvm_finalize_features,
-	.get = kvm_get,
-	.set = kvm_set,
+	.get8 = kvm_get8,
+	.set8 = kvm_set8,
+	.get16 = kvm_get16,
+	.set16 = kvm_set16,
+	.get32 = kvm_get32,
+	.set32 = kvm_set32,
+	.get64 = kvm_get64,
+	.set64 = kvm_set64,
 	.get_status = kvm_get_status,
 	.set_status = kvm_set_status,
 	.reset = kvm_reset,

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features
  2013-03-21 10:00   ` Cornelia Huck
@ 2013-03-22  0:48     ` Rusty Russell
  0 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-22  0:48 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

Cornelia Huck <cornelia.huck@de.ibm.com> writes:
> On Thu, 21 Mar 2013 18:59:25 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
>
>> It seemed like a good idea, but it's actually a pain when we get more
>> than 32 feature bits.  Just change it to a u32 for now.
>> 
...
> I didn't try this patch, but wouldn't virtio_ccw need something like
> the change below as well?

Ah, yes, when I refreshed this pre-ccw patch, I didn't re-do the grep.

And thanks, this made me look at lguest, too.  Which is actually fine,
but now has a misleading comment.

Here's the net change on top of this patch (I've rolled it into the
patch in my git tree).

Thanks,
Rusty.

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 48bd2ad..6203562 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -137,9 +137,9 @@ static void lg_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	/*
-	 * The vdev->feature array is a Linux bitmask: this isn't the same as a
-	 * the simple array of bits used by lguest devices for features.  So we
-	 * do this slow, manual conversion which is completely general.
+	 * Since lguest is currently x86-only, we're little-endian.  That
+	 * means we could just memcpy.  But it's not time critical, and in
+	 * case someone copies this code, we do it the slow, obvious way.
 	 */
 	memset(out_features, 0, desc->feature_len);
 	bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 3652473..8c564bf 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -440,7 +440,6 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
 {
 	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
 	struct virtio_feature_desc *features;
-	int i;
 	struct ccw1 *ccw;
 
 	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
@@ -454,19 +453,15 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
 	/* Give virtio_ring a chance to accept features. */
 	vring_transport_features(vdev);
 
-	for (i = 0; i < sizeof(*vdev->features) / sizeof(features->features);
-	     i++) {
-		int highbits = i % 2 ? 32 : 0;
-		features->index = i;
-		features->features = cpu_to_le32(vdev->features[i / 2]
-						 >> highbits);
-		/* Write the feature bits to the host. */
-		ccw->cmd_code = CCW_CMD_WRITE_FEAT;
-		ccw->flags = 0;
-		ccw->count = sizeof(*features);
-		ccw->cda = (__u32)(unsigned long)features;
-		ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_FEAT);
-	}
+	features->index = 0;
+	features->features = cpu_to_le32(vdev->features);
+	/* Write the feature bits to the host. */
+	ccw->cmd_code = CCW_CMD_WRITE_FEAT;
+	ccw->flags = 0;
+	ccw->count = sizeof(*features);
+	ccw->cda = (__u32)(unsigned long)features;
+	ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_FEAT);
+
 out_free:
 	kfree(features);
 	kfree(ccw);

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-21 10:06   ` Cornelia Huck
@ 2013-03-22  0:50     ` Rusty Russell
  2013-03-22  9:15       ` Cornelia Huck
  2013-03-22 14:50     ` Sjur Brændeland
  1 sibling, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-22  0:50 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

Cornelia Huck <cornelia.huck@de.ibm.com> writes:
> On Thu, 21 Mar 2013 18:59:26 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
>
>> Change the u32 to a u64, and make sure to use 1ULL everywhere!
>
> And a not-even-compiled change for virtio_ccw as well:

Thanks, applied that too.

BTW, this will all be in my virtio-pci-new-layout branch on
git.kernel.org once I've processed all this feedback.

Thanks,
Rusty.
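[As an aside on the "make sure to use 1ULL everywhere!" warning quoted above: `1 << fbit` shifts a plain int, which is undefined behaviour for bit numbers >= 31, so once the feature word becomes a u64 every bit test has to widen the constant first. A minimal illustration; the `demo_` name is mine, not from the patch:]

```c
#include <assert.h>
#include <stdint.h>

/* Test a feature bit in a 64-bit feature word.  The shift must be done
 * on a 64-bit constant: "1 << fbit" would shift a plain int, which is
 * undefined behaviour for fbit >= 31 and silently wrong for high bits. */
static inline int demo_has_feature(uint64_t features, unsigned int fbit)
{
	return (features & (1ULL << fbit)) != 0;
}
```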

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-21 17:54                                   ` H. Peter Anvin
  2013-03-21 18:01                                     ` Michael S. Tsirkin
@ 2013-03-22  0:57                                     ` Rusty Russell
  2013-03-22  3:17                                       ` H. Peter Anvin
  2013-03-24 13:14                                       ` Michael S. Tsirkin
  1 sibling, 2 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-22  0:57 UTC (permalink / raw)
  To: H. Peter Anvin, Michael S. Tsirkin; +Cc: virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:
> On 03/21/2013 10:49 AM, Michael S. Tsirkin wrote:
>> 
>> Just to clarify, I expect BIOS to use it *for config access*.
>> Notification will support IO anyway because it's
>> faster on KVM, so BIOS can use it directly.
>> 
>
> Ah, yes, of course.
>
> Quite frankly, I don't see any reason to support *anything else* for
> configuration, does anyone else?
>
> I thought we were talking about the doorbell/notification/kicker
> register.  For I/O space especially it is highly desirable if that can
> be in a minimal BAR (4 bytes).

The takeaway from this seems to be that:
1) For the notification, device can supply both.
2) Since only device will know which is faster, driver should use the
   first one it finds which it supports.

My patch implied that the ISR, device config and notification could be
either.  I think only the notification makes sense, as noted here.

Have I got this right?
Rusty.
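[To make point (2) concrete, here is a sketch of the capability walk a driver would do: scan the standard PCI capability list and take the first vendor-specific capability of the wanted type it understands. The type-byte-at-offset-3 layout and the function name are illustrative assumptions, not taken from the series; the chain walk itself is standard PCI:]

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CAP_ID_VNDR 0x09	/* PCI vendor-specific capability ID */

/* Walk a (mock) 256-byte PCI config space and return the offset of the
 * first vendor-specific capability whose type byte matches 'type';
 * 0 means not found.  With duplicate capabilities allowed, "first
 * supported match wins" is exactly the policy described above. */
static uint8_t demo_find_vndr_cap(const uint8_t cfg[256], uint8_t type)
{
	uint8_t pos = cfg[0x34];	/* standard capabilities pointer */
	int guard = 64;			/* bail out of malformed loops */

	while (pos && guard--) {
		if (cfg[pos] == DEMO_CAP_ID_VNDR && cfg[pos + 3] == type)
			return pos;
		pos = cfg[pos + 1];	/* cap_next */
	}
	return 0;
}
```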

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 13/22] virtio_pci: new, capability-aware driver.
  2013-03-21 10:24   ` Michael S. Tsirkin
@ 2013-03-22  1:02     ` Rusty Russell
  2013-03-24 13:08       ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-22  1:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Mar 21, 2013 at 06:59:34PM +1030, Rusty Russell wrote:
>> Differences:
>> 1) Uses 4 pci capabilities to demark common, irq, notify and dev-specific areas.
>> 2) Guest sets queue size, using host-provided maximum.
>> 3) Guest sets queue alignment, rather than ABI-defined 4096.
>> 4) More than 32 feature bits (a lot more!).
...
>> +/* Constants for MSI-X */
>> +/* Use first vector for configuration changes, second and the rest for
>> + * virtqueues Thus, we need at least 2 vectors for MSI. */
>> +enum {
>> +	VP_MSIX_CONFIG_VECTOR = 0,
>> +	VP_MSIX_VQ_VECTOR = 1,
>> +};
>
> In the future, I have a plan to allow one vector only.  To make this
> work without exits for data path VQ, we could make hypervisor set a bit
> in guest memory whenever it wants to signal a configuration change.
> Guest will execute a config write that will make the hypervisor clear
> this register.
>
> I guess this can wait, we are putting too much stuff into this
> new layout patchset already.

Yeah, trying not to boil the ocean... and I'm not sure that reinventing
MSI-X manually is a good idea anyway.

>> +static void vp_reset(struct virtio_device *vdev)
>> +{
>> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> +	/* 0 status means a reset. */
>> +	iowrite8(0, &vp_dev->common->device_status);
>> +	/* Flush out the status write, and flush in device writes,
>> +	 * including MSi-X interrupts, if any. */
>
> MSI-X ?

Thanks, fixed.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-21 10:13   ` Michael S. Tsirkin
  2013-03-21 10:35     ` Michael S. Tsirkin
@ 2013-03-22  2:52     ` Rusty Russell
  2013-03-24 14:38       ` Michael S. Tsirkin
  2013-03-24 20:19       ` Michael S. Tsirkin
  1 sibling, 2 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-22  2:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: hpa, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Mar 21, 2013 at 06:59:37PM +1030, Rusty Russell wrote:
>> (MST, is this what you were thinking?)
>
> Almost.
>
> Three points:
>
> 1. this is still an offset in BAR so for KVM we are still forced to use
> an IO BAR. 

Right, because memory BAR accesses are slow as per your 'Subject: virtio
PCI on KVM without IO BARs' post.

> I would like an option for hypervisor to simply say "Do IO
> to this fixed address for this VQ". Then virtio can avoid using IO BARs
> completely.

It could be done.  AFAICT, this would be an x86-ism, though, which is a
little nasty.

> 2.  for a real virtio device, offset is only 16 bit, using a 32 bit
> offset in a memory BAR giving each VQ a separate 4K page would allow
> privilege separation where e.g. RXVQ/TXVQ are passed through to
> hardware but CVQ is handled by the hypervisor.

Hmm, u16 fits nicely :)  Unless you need priv separation between different
vqs, you could have control vq at 0, and start rx/tx from 4096.

(Actually, since the notification base is already an offset into a bar,
you could arrange that at 4094, so control is at 0, vqs start at 1).

> 3. last thing - (1) applies to ISR reads as well.

I've been assuming that optimizing ISR access was pointless with MSI-X,
so keeping that simple.

> So the minimal change on top of this patch would be adding a FIXED
> option to BIR and reporting data and not just offset for queue_notify
> (so it can include device info if we share same address between
> devices).

The first is easy, since we have a 'u8 bar': 255 could mean FIXED.

I wonder why you only want a u16 for data, when a u32 would be more
flexible?  If we have to enlarge things anyway...

How's this?

Cheers,
Rusty.

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 23b90cb..9a59138 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -123,6 +123,9 @@
 /* Device specific configuration */
 #define VIRTIO_PCI_CAP_DEVICE_CFG	4
 
+/* Not really a bar: this means notification via outl */
+#define VIRTIO_PCI_BAR_FIXED_IO		255
+
 /* This is the PCI capability header: */
 struct virtio_pci_cap {
 	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
@@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
 	__le16 queue_size;	/* read-write, power of 2. */
 	__le16 queue_msix_vector;/* read-write */
 	__le16 queue_enable;	/* read-write */
-	__le16 queue_notify;	/* read-only */
+	__le16 unused2;
+	__le32 queue_notify_val;/* read-only */
+	__le32 queue_notify_off;/* read-only */
 	__le64 queue_desc;	/* read-write */
 	__le64 queue_avail;	/* read-write */
 	__le64 queue_used;	/* read-write */
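[A sketch of the driver-side arithmetic this layout implies (names are illustrative, not from the patch): a queue's doorbell address is just the notify capability's BAR offset plus the device-supplied per-queue notify offset. Placing the capability offset at 4094, as suggested above, keeps the control queue on page 0 while offsets of 2 + 4096*(n-1) land each data queue at the start of its own 4K page:]

```c
#include <assert.h>
#include <stdint.h>

/* Doorbell address for one virtqueue: BAR base, plus the notify
 * capability's offset into that BAR, plus the per-queue notify
 * offset the device reports. */
static uint64_t demo_vq_notify_addr(uint64_t bar_base, uint32_t cap_offset,
				    uint32_t queue_notify_off)
{
	return bar_base + (uint64_t)cap_offset + queue_notify_off;
}
```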

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-22  0:57                                     ` Rusty Russell
@ 2013-03-22  3:17                                       ` H. Peter Anvin
  2013-03-24 13:14                                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-22  3:17 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin; +Cc: virtualization

Can we just leave the configuration in config space, or is that undesirable?

Rusty Russell <rusty@rustcorp.com.au> wrote:

>"H. Peter Anvin" <hpa@zytor.com> writes:
>> On 03/21/2013 10:49 AM, Michael S. Tsirkin wrote:
>>> 
>>> Just to clarify, I expect BIOS to use it *for config access*.
>>> Notification will support IO anyway because it's
>>> faster on KVM, so BIOS can use it directly.
>>> 
>>
>> Ah, yes, of course.
>>
>> Quite frankly, I don't see any reason to support *anything else* for
>> configuration, does anyone else?
>>
>> I thought we were talking about the doorbell/notification/kicker
>> register.  For I/O space especially it is highly desirable if that
>can
>> be in a minimal BAR (4 bytes).
>
>The takeaway from this seems to be that:
>1) For the notification, device can supply both.
>2) Since only device will know which is faster, driver should use the
>   first one it finds which it supports.
>
>My patch implied that the ISR, device config and notification could be
>either.  I think only the notification makes sense, as noted here.
>
>Have I got this right?
>Rusty.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-22  0:31     ` Rusty Russell
@ 2013-03-22  9:13       ` Cornelia Huck
  0 siblings, 0 replies; 94+ messages in thread
From: Cornelia Huck @ 2013-03-22  9:13 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

On Fri, 22 Mar 2013 11:01:20 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:


> Nice understatement.  I guess you know where I cut & pasted from...
> 
> Here is the updated version.

Looks sane to me.

> 
> Thanks,
> Rusty.
> 
> diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
> index 6711e65..9b8c527 100644
> --- a/drivers/s390/kvm/kvm_virtio.c
> +++ b/drivers/s390/kvm/kvm_virtio.c
> @@ -112,24 +112,79 @@ static void kvm_finalize_features(struct virtio_device *vdev)
>  }
> 
>  /*
> - * Reading and writing elements in config space
> + * Reading and writing elements in config space.  Host and guest are always
> + * big-endian, so no conversion necessary.
>   */
> -static void kvm_get(struct virtio_device *vdev, unsigned int offset,
> -		   void *buf, unsigned len)
> +static u8 kvm_get8(struct virtio_device *vdev, unsigned int offset)
>  {
>  	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> 
> -	BUG_ON(offset + len > desc->config_len);
> -	memcpy(buf, kvm_vq_configspace(desc) + offset, len);
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u8) > desc->config_len);
> +	return *(u8 *)(kvm_vq_configspace(desc) + offset);
>  }
> 
> -static void kvm_set(struct virtio_device *vdev, unsigned int offset,
> -		   const void *buf, unsigned len)
> +static void kvm_set8(struct virtio_device *vdev, unsigned int offset, u8 val)
>  {
>  	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> 
> -	BUG_ON(offset + len > desc->config_len);
> -	memcpy(kvm_vq_configspace(desc) + offset, buf, len);
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u8 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +static u16 kvm_get16(struct virtio_device *vdev, unsigned int offset)
> +{
> +	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u16) > desc->config_len);
> +	return *(u16 *)(kvm_vq_configspace(desc) + offset);
> +}
> +
> +static void kvm_set16(struct virtio_device *vdev, unsigned int offset, u16 val)
> +{
> +	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u16 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +static u32 kvm_get32(struct virtio_device *vdev, unsigned int offset)
> +{
> +	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u32) > desc->config_len);
> +	return *(u32 *)(kvm_vq_configspace(desc) + offset);
> +}
> +
> +static void kvm_set32(struct virtio_device *vdev, unsigned int offset, u32 val)
> +{
> +	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u32 *)(kvm_vq_configspace(desc) + offset) = val;
> +}
> +
> +static u64 kvm_get64(struct virtio_device *vdev, unsigned int offset)
> +{
> +	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(u64) > desc->config_len);
> +	return *(u64 *)(kvm_vq_configspace(desc) + offset);
> +}
> +
> +static void kvm_set64(struct virtio_device *vdev, unsigned int offset, u64 val)
> +{
> +	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
> +
> +	/* Check they didn't ask for more than the length of the config! */
> +	BUG_ON(offset + sizeof(val) > desc->config_len);
> +	*(u64 *)(kvm_vq_configspace(desc) + offset) = val;
>  }
> 
>  /*
> @@ -278,8 +333,14 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
>  static const struct virtio_config_ops kvm_vq_configspace_ops = {
>  	.get_features = kvm_get_features,
>  	.finalize_features = kvm_finalize_features,
> -	.get = kvm_get,
> -	.set = kvm_set,
> +	.get8 = kvm_get8,
> +	.set8 = kvm_set8,
> +	.get16 = kvm_get16,
> +	.set16 = kvm_set16,
> +	.get32 = kvm_get32,
> +	.set32 = kvm_set32,
> +	.get64 = kvm_get64,
> +	.set64 = kvm_set64,
>  	.get_status = kvm_get_status,
>  	.set_status = kvm_set_status,
>  	.reset = kvm_reset,
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-22  0:50     ` Rusty Russell
@ 2013-03-22  9:15       ` Cornelia Huck
  0 siblings, 0 replies; 94+ messages in thread
From: Cornelia Huck @ 2013-03-22  9:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Brian Swetland, Christian Borntraeger, Pawel Moll, virtualization

On Fri, 22 Mar 2013 11:20:05 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> Cornelia Huck <cornelia.huck@de.ibm.com> writes:
> > On Thu, 21 Mar 2013 18:59:26 +1030
> > Rusty Russell <rusty@rustcorp.com.au> wrote:
> >
> >> Change the u32 to a u64, and make sure to use 1ULL everywhere!
> >
> > And a not-even-compiled change for virtio_ccw as well:
> 
> Thanks, applied that too.
> 
> BTW, this will all be in my virtio-pci-new-layout branch on
> git.kernel.org once I've processed all this feedback.

I'll give it a run then once your branch is ready.

> 
> Thanks,
> Rusty.
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-21  8:29 ` [PATCH 03/22] virtio_config: make transports implement accessors Rusty Russell
  2013-03-21  9:09   ` Cornelia Huck
@ 2013-03-22 14:43   ` Sjur Brændeland
  2013-03-24  4:24     ` Rusty Russell
  2013-04-02 17:16   ` Pawel Moll
  2 siblings, 1 reply; 94+ messages in thread
From: Sjur Brændeland @ 2013-03-22 14:43 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Pawel Moll, Brian Swetland, Linus Walleij, Erwan YVIN,
	virtualization, Christian Borntraeger

On Thu, Mar 21, 2013 at 9:29 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> All transports just pass through at the moment.
>
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Brian Swetland <swetland@google.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Pawel Moll <pawel.moll@arm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/lguest/lguest_device.c |   79 ++++++++++++++++++++++++++++++++++------
>  drivers/net/caif/caif_virtio.c |    2 +-
>  drivers/s390/kvm/kvm_virtio.c  |   78 +++++++++++++++++++++++++++++++++------
>  drivers/s390/kvm/virtio_ccw.c  |   39 +++++++++++++++++++-
>  drivers/virtio/virtio_mmio.c   |   35 +++++++++++++++++-
>  drivers/virtio/virtio_pci.c    |   39 +++++++++++++++++---
>  include/linux/virtio_config.h  |   70 +++++++++++++++++++++--------------
>  7 files changed, 283 insertions(+), 59 deletions(-)
>

> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -127,7 +127,7 @@ static void vp_finalize_features(struct virtio_device *vdev)
>         iowrite32(vdev->features[0], vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
>  }
>
> -/* virtio config->get() implementation */
> +/* Device config access: we use guest endian, as per spec. */
>  static void vp_get(struct virtio_device *vdev, unsigned offset,
>                    void *buf, unsigned len)
>  {
> @@ -141,8 +141,19 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
>                 ptr[i] = ioread8(ioaddr + i);
>  }
>
> -/* the config->set() implementation.  it's symmetric to the config->get()
> - * implementation */
> +#define VP_GETx(bits)                                                  \
> +static u##bits vp_get##bits(struct virtio_device *vdev, unsigned int offset) \
> +{                                                                      \
> +       u##bits v;                                                      \
> +       vp_get(vdev, offset, &v, sizeof(v));                            \
> +       return v;                                                       \
> +}
> +
> +VP_GETx(8)
> +VP_GETx(16)
> +VP_GETx(32)
> +VP_GETx(64)
> +
>  static void vp_set(struct virtio_device *vdev, unsigned offset,
>                    const void *buf, unsigned len)
>  {
> @@ -156,6 +167,18 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
>                 iowrite8(ptr[i], ioaddr + i);
>  }
>
> +#define VP_SETx(bits)                                                  \
> +static void vp_set##bits(struct virtio_device *vdev, unsigned int offset, \
> +                        u##bits v)                                     \
> +{                                                                      \
> +       vp_set(vdev, offset, &v, sizeof(v));                            \
> +}
> +
> +VP_SETx(8)
> +VP_SETx(16)
> +VP_SETx(32)
> +VP_SETx(64)
> +
>  /* config->{get,set}_status() implementations */
>  static u8 vp_get_status(struct virtio_device *vdev)
>  {
> @@ -653,8 +676,14 @@ static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
>  }
>
>  static const struct virtio_config_ops virtio_pci_config_ops = {
> -       .get            = vp_get,
> -       .set            = vp_set,
> +       .get8           = vp_get8,
> +       .set8           = vp_set8,
> +       .get16          = vp_get16,
> +       .set16          = vp_set16,
> +       .get32          = vp_get32,
> +       .set32          = vp_set32,
> +       .get64          = vp_get64,
> +       .set64          = vp_set64,
>         .get_status     = vp_get_status,
>         .set_status     = vp_set_status,
>         .reset          = vp_reset,

Would it be possible to make this simpler and less verbose somehow?
At least three virtio devices: virtio_pci_legacy.c, virtio_mmio.c and
soon remoteproc_virtio.c will duplicate variants of the code above.

What if set8/get8 were mandatory, the 16/32/64 variants were optional,
and virtio_creadX()/virtio_cwriteX() did the magic to make things work?

Regards,
Sjur
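[One possible shape for that suggestion, sketched with hypothetical names (this is not the interface the series defines): make get8 the only mandatory op and let a common helper assemble wider little-endian values from repeated byte reads when a transport omits the 16/32/64-bit accessors:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transport ops with only get8 mandatory. */
struct demo_config_ops {
	uint8_t (*get8)(void *dev, unsigned int offset);
	/* optional get16/get32/get64 would go here */
};

/* Generic config read: compose a little-endian value of 'len' bytes
 * (1, 2, 4 or 8) from repeated single-byte accesses. */
static uint64_t demo_cread(const struct demo_config_ops *ops, void *dev,
			   unsigned int offset, size_t len)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < len; i++)
		v |= (uint64_t)ops->get8(dev, offset + i) << (8 * i);
	return v;
}

/* Trivial backend for demonstration: 'dev' is a raw byte array. */
static uint8_t demo_mem_get8(void *dev, unsigned int offset)
{
	return ((uint8_t *)dev)[offset];
}
```

Endianness would still have to be handled per transport (legacy PCI config space is guest-endian, ccw is big-endian), so a real helper of this shape would also need a byte-swap hook.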

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-21 10:06   ` Cornelia Huck
  2013-03-22  0:50     ` Rusty Russell
@ 2013-03-22 14:50     ` Sjur Brændeland
  2013-03-22 20:12       ` Ohad Ben-Cohen
  1 sibling, 1 reply; 94+ messages in thread
From: Sjur Brændeland @ 2013-03-22 14:50 UTC (permalink / raw)
  To: Rusty Russell, Ohad Ben-Cohen
  Cc: Pawel Moll, Brian Swetland, Linus Walleij, Erwan YVIN,
	virtualization, Christian Borntraeger

On Thu, Mar 21, 2013 at 11:06 AM, Cornelia Huck
<cornelia.huck@de.ibm.com> wrote:
> On Thu, 21 Mar 2013 18:59:26 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
>
>> Change the u32 to a u64, and make sure to use 1ULL everywhere!
>>
>> Cc: Ohad Ben-Cohen <ohad@wizery.com>
>> Cc: Brian Swetland <swetland@google.com>
>> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
>> Cc: Pawel Moll <pawel.moll@arm.com>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>> ---
>>  drivers/char/virtio_console.c          |    2 +-
>>  drivers/lguest/lguest_device.c         |   10 +++++-----
>>  drivers/remoteproc/remoteproc_virtio.c |    6 +++++-
>>  drivers/s390/kvm/kvm_virtio.c          |   10 +++++-----
>>  drivers/virtio/virtio.c                |   12 ++++++------
>>  drivers/virtio/virtio_mmio.c           |   14 +++++++++-----
>>  drivers/virtio/virtio_pci.c            |    5 ++---
>>  drivers/virtio/virtio_ring.c           |    2 +-
>>  include/linux/virtio.h                 |    2 +-
>>  include/linux/virtio_config.h          |    8 ++++----
>>  tools/virtio/linux/virtio.h            |    2 +-
>>  tools/virtio/linux/virtio_config.h     |    2 +-
>>  12 files changed, 41 insertions(+), 34 deletions(-)

I guess you would need to update the feature bits in remoteproc as well?
e.g. something like:

diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index faf3332..148a503 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -296,8 +296,8 @@ struct fw_rsc_vdev_vring {
 struct fw_rsc_vdev {
        u32 id;
        u32 notifyid;
-       u32 dfeatures;
-       u32 gfeatures;
+       u64 dfeatures;
+       u64 gfeatures;
        u32 config_len;
        u8 status;
        u8 num_of_vrings;
@@ -470,8 +470,8 @@ struct rproc_vdev {
        struct rproc *rproc;
        struct virtio_device vdev;
        struct rproc_vring vring[RVDEV_NUM_VRINGS];
-       unsigned long dfeatures;
-       unsigned long gfeatures;
+       u64 dfeatures;
+       u64 gfeatures;
 };

Thanks,
Sjur

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-22 14:50     ` Sjur Brændeland
@ 2013-03-22 20:12       ` Ohad Ben-Cohen
  2013-03-25  8:30         ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: Ohad Ben-Cohen @ 2013-03-22 20:12 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Pawel Moll, Brian Swetland, Linus Walleij, Erwan YVIN,
	virtualization, Christian Borntraeger

On Fri, Mar 22, 2013 at 4:50 PM, Sjur Brændeland <sjurbren@gmail.com> wrote:
> I guess you would need to update the feature bits in remoteproc as well?
> e.g. something like:
>
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index faf3332..148a503 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -296,8 +296,8 @@ struct fw_rsc_vdev_vring {
>  struct fw_rsc_vdev {
>         u32 id;
>         u32 notifyid;
> -       u32 dfeatures;
> -       u32 gfeatures;
> +       u64 dfeatures;
> +       u64 gfeatures;
>         u32 config_len;
>         u8 status;
>         u8 num_of_vrings;

We will break existing firmware if we do that.

Initially we thought it's a good idea to announce that remoteproc's
binary interface isn't stable so we could keep changing it, but at
this point changing the binary interface means pain for too many
people.

I'm thinking that at this stage any changes to the binary interface
will have to bump up the binary version so we can still support older
images, despite our "unstable" policy.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-22 14:43   ` Sjur Brændeland
@ 2013-03-24  4:24     ` Rusty Russell
  2013-04-03 15:58       ` Sjur Brændeland
  0 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-24  4:24 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Pawel Moll, Brian Swetland, Linus Walleij, Erwan YVIN,
	virtualization, Christian Borntraeger

Sjur Brændeland <sjurbren@gmail.com> writes:
> On Thu, Mar 21, 2013 at 9:29 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> @@ -653,8 +676,14 @@ static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
>>  }
>>
>>  static const struct virtio_config_ops virtio_pci_config_ops = {
>> -       .get            = vp_get,
>> -       .set            = vp_set,
>> +       .get8           = vp_get8,
>> +       .set8           = vp_set8,
>> +       .get16          = vp_get16,
>> +       .set16          = vp_set16,
>> +       .get32          = vp_get32,
>> +       .set32          = vp_set32,
>> +       .get64          = vp_get64,
>> +       .set64          = vp_set64,
>>         .get_status     = vp_get_status,
>>         .set_status     = vp_set_status,
>>         .reset          = vp_reset,
>
> Would it be possible to make this simpler and less verbose somehow?
> At least three virtio devices: virtio_pci_legacy.c, virtio_mmio.c and
> soon remoteproc_virtio.c will duplicate variants of the code above.
>
> What if set8/get8 was mandatory, and the 16,32,64 variants where optional,
> and then virtio_creadX() virtio_cwriteX did the magic to make things work?

But this is a case where simpler and less verbose are opposites.  These
patches are really straightforward.  No one can forget to write or assign
an accessor and still have it work on x86 while failing on BE machines.

So I like explicit accessors, set by the backend.  But it doesn't have
to be quite this ugly.

How's this (completely untested!)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index e342692..6ffa542 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -219,6 +219,75 @@ out:
 }
 EXPORT_SYMBOL_GPL(register_virtio_device);
 
+static void noconv_get(struct virtio_device *vdev, unsigned offset,
+		       size_t len, void *p)
+{
+	u8 *buf = p;
+	while (len) {
+		*buf = vdev->config->get8(vdev, offset);
+		buf++;
+		offset++;
+		len--;
+	}
+}
+
+u16 virtio_config_get_noconv16(struct virtio_device *vdev, unsigned offset)
+{
+	u16 v;
+	noconv_get(vdev, offset, sizeof(v), &v);
+	return v;
+}
+EXPORT_SYMBOL_GPL(virtio_config_get_noconv16);
+
+u32 virtio_config_get_noconv32(struct virtio_device *vdev, unsigned offset)
+{
+	u32 v;
+	noconv_get(vdev, offset, sizeof(v), &v);
+	return v;
+}
+EXPORT_SYMBOL_GPL(virtio_config_get_noconv32);
+
+u64 virtio_config_get_noconv64(struct virtio_device *vdev, unsigned offset)
+{
+	u64 v;
+	noconv_get(vdev, offset, sizeof(v), &v);
+	return v;
+}
+EXPORT_SYMBOL_GPL(virtio_config_get_noconv64);
+
+static void noconv_set(struct virtio_device *vdev, unsigned offset,
+		       size_t len, const void *p)
+{
+	const u8 *buf = p;
+	while (len) {
+		vdev->config->set8(vdev, offset, *buf);
+		buf++;
+		offset++;
+		len--;
+	}
+}
+
+void virtio_config_set_noconv16(struct virtio_device *vdev,
+				unsigned offset, u16 v)
+{
+	noconv_set(vdev, offset, sizeof(v), &v);
+}
+EXPORT_SYMBOL_GPL(virtio_config_set_noconv16);
+
+void virtio_config_set_noconv32(struct virtio_device *vdev,
+				unsigned offset, u32 v)
+{
+	noconv_set(vdev, offset, sizeof(v), &v);
+}
+EXPORT_SYMBOL_GPL(virtio_config_set_noconv32);
+
+void virtio_config_set_noconv64(struct virtio_device *vdev,
+				unsigned offset, u64 v)
+{
+	noconv_set(vdev, offset, sizeof(v), &v);
+}
+EXPORT_SYMBOL_GPL(virtio_config_set_noconv64);
+
 void unregister_virtio_device(struct virtio_device *dev)
 {
 	int index = dev->index; /* save for after device release */
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 9dfe116..9a9aaa0 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -286,4 +286,23 @@ static inline void virtio_cwrite64(struct virtio_device *vdev,
 		_r;							\
 	})
 
+/* Helpers for non-endian converting transports. */
+u16 virtio_config_get_noconv16(struct virtio_device *vdev, unsigned offset);
+u32 virtio_config_get_noconv32(struct virtio_device *vdev, unsigned offset);
+u64 virtio_config_get_noconv64(struct virtio_device *vdev, unsigned offset);
+void virtio_config_set_noconv16(struct virtio_device *vdev,
+				unsigned offset, u16 v);
+void virtio_config_set_noconv32(struct virtio_device *vdev,
+				unsigned offset, u32 v);
+void virtio_config_set_noconv64(struct virtio_device *vdev,
+				unsigned offset, u64 v);
+
+#define VIRTIO_CONFIG_OPS_NOCONV		\
+	.get16 = virtio_config_get_noconv16,	\
+	.set16 = virtio_config_set_noconv16,	\
+	.get32 = virtio_config_get_noconv32,	\
+	.set32 = virtio_config_set_noconv32,	\
+	.get64 = virtio_config_get_noconv64,	\
+	.set64 = virtio_config_set_noconv64
+
 #endif /* _LINUX_VIRTIO_CONFIG_H */
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 13/22] virtio_pci: new, capability-aware driver.
  2013-03-22  1:02     ` Rusty Russell
@ 2013-03-24 13:08       ` Michael S. Tsirkin
  0 siblings, 0 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-24 13:08 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization

On Fri, Mar 22, 2013 at 11:32:33AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Thu, Mar 21, 2013 at 06:59:34PM +1030, Rusty Russell wrote:
> >> Differences:
> >> 1) Uses 4 pci capabilities to demark common, irq, notify and dev-specific areas.
> >> 2) Guest sets queue size, using host-provided maximum.
> >> 3) Guest sets queue alignment, rather than ABI-defined 4096.
> >> 4) More than 32 feature bits (a lot more!).
> ...
> >> +/* Constants for MSI-X */
> >> +/* Use first vector for configuration changes, second and the rest for
> >> + * virtqueues Thus, we need at least 2 vectors for MSI. */
> >> +enum {
> >> +	VP_MSIX_CONFIG_VECTOR = 0,
> >> +	VP_MSIX_VQ_VECTOR = 1,
> >> +};
> >
> > In the future, I have a plan to allow one vector only.  To make this
> > work without exits for data path VQ, we could make hypervisor set a bit
> > in guest memory whenever it wants to signal a configuration change.
> > Guest will execute a config write that will make the hypervisor clear
> > this register.
> >
> > I guess this can wait, we are putting too much stuff into this
> > new layout patchset already.
> 
> Yeah, trying not to boil the ocean... and I'm not sure that reinventing
> MSI-X manually is a good idea anyway.

I'm not sure this is reinventing MSI-X but I agree let's do things
gradually.

> >> +static void vp_reset(struct virtio_device *vdev)
> >> +{
> >> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >> +	/* 0 status means a reset. */
> >> +	iowrite8(0, &vp_dev->common->device_status);
> >> +	/* Flush out the status write, and flush in device writes,
> >> +	 * including MSi-X interrupts, if any. */
> >
> > MSI-X ?
> 
> Thanks, fixed.
> 
> Cheers,
> Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-22  0:57                                     ` Rusty Russell
  2013-03-22  3:17                                       ` H. Peter Anvin
@ 2013-03-24 13:14                                       ` Michael S. Tsirkin
  2013-03-24 23:23                                         ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-24 13:14 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, H. Peter Anvin

On Fri, Mar 22, 2013 at 11:27:59AM +1030, Rusty Russell wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> > On 03/21/2013 10:49 AM, Michael S. Tsirkin wrote:
> >> 
> >> Just to clarify, I expect BIOS to use it *for config access*.
> >> Notification will support IO anyway because it's
> >> faster on KVM, so BIOS can use it directly.
> >> 
> >
> > Ah, yes, of course.
> >
> > Quite frankly, I don't see any reason to support *anything else* for
> > configuration, does anyone else?
> >
> > I thought we were talking about the doorbell/notification/kicker
> > register.  For I/O space especially it is highly desirable if that can
> > be in a minimal BAR (4 bytes).
> 
> The takeaway from this seems to be that:
> 1) For the notification, device can supply both.
> 2) Since only device will know which is faster, driver should use the
>    first one it finds which it supports.
> 
> My patch implied that the ISR, device config and notification could be
> either.  I think only the notification makes sense, as noted here.
> 
> Have I got this right?
> Rusty.

Peter is also saying we need a way to
do configuration/ISR without memory accesses.

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-22  2:52     ` Rusty Russell
@ 2013-03-24 14:38       ` Michael S. Tsirkin
  2013-03-24 20:19       ` Michael S. Tsirkin
  1 sibling, 0 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-24 14:38 UTC (permalink / raw)
  To: Rusty Russell; +Cc: hpa, virtualization

On Fri, Mar 22, 2013 at 01:22:57PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Thu, Mar 21, 2013 at 06:59:37PM +1030, Rusty Russell wrote:
> >> (MST, is this what you were thinking?)
> >
> > Almost.
> >
> > Three points:
> >
> > 1. this is still an offset in BAR so for KVM we are still forced to use
> > an IO BAR. 
> 
> Right, because memory bar accesses are slow as per your 'Subject: virtio
> PCI on KVM without IO BARs' post.
> 
> > I would like an option for hypervisor to simply say "Do IO
> > to this fixed address for this VQ". Then virtio can avoid using IO BARs
> > completely.
> 
> It could be done.  AFAICT, this would be an x86-ism, though, which is a
> little nasty.
> 
> > 2.  for a real virtio device, offset is only 16 bit, using a 32 bit
> > offset in a memory BAR giving each VQ a separate 4K page would allow
> > priveledge separation where e.g. RXVQ/TXVQ are passed through to
> > hardware but CVQ is handled by the hypervisor.
> 
> Hmm, u16 fits nicely :)  Unless you need priv separation between different
> vqs, you could have control vq at 0, and start rx/tx from 4096.
> 
> (Actually, since the notification base is already an offset into a bar,
> you could arrange that at 4094, so control is at 0, vqs start at 1).
> 
> > 3. last thing - (1) applies to ISR reads as well.
> 
> I've been assuming that optimizing ISR access was pointless with MSI-X,
> so keeping that simple.
> 
> > So the minimal change on top of this patch, would be adding a FIXED
> > option to BIR and reporting data and not just offset for queue_notify
> > (so it can include device info if we share same address between
> > devices).
> 
> The first is easy, since we have a 'u8 bar': 255 could mean FIXED.
> 
> I wonder why you only want a u16 for data, when a u32 would be more
> flexible?  If we have to enlarge things anyway...
> 
> How's this?
> 
> Cheers,
> Rusty.
> 
> diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
> index 23b90cb..9a59138 100644
> --- a/include/uapi/linux/virtio_pci.h
> +++ b/include/uapi/linux/virtio_pci.h
> @@ -123,6 +123,9 @@
>  /* Device specific configuration */
>  #define VIRTIO_PCI_CAP_DEVICE_CFG	4
>  
> +/* Not really a bar: this means notification via outl */
> +#define VIRTIO_PCI_BAR_FIXED_IO		255
> +
>  /* This is the PCI capability header: */
>  struct virtio_pci_cap {
>  	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
> @@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
>  	__le16 queue_size;	/* read-write, power of 2. */
>  	__le16 queue_msix_vector;/* read-write */
>  	__le16 queue_enable;	/* read-write */
> -	__le16 queue_notify;	/* read-only */
> +	__le16 unused2;
> +	__le32 queue_notify_val;/* read-only */
> +	__le32 queue_notify_off;/* read-only */
>  	__le64 queue_desc;	/* read-write */
>  	__le64 queue_avail;	/* read-write */
>  	__le64 queue_used;	/* read-write */


HPA has convinced me that it's not worth worrying about: let's use IO
for notification if available, and setups with > 16 devices where
it's not available can use memory which is slower but works.

Maybe reconsider hypercalls down the line.

So let's not overengineer and drop this patch for now until we have
some real users.

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-22  2:52     ` Rusty Russell
  2013-03-24 14:38       ` Michael S. Tsirkin
@ 2013-03-24 20:19       ` Michael S. Tsirkin
  2013-03-24 23:27         ` H. Peter Anvin
  2013-03-25 10:00         ` Rusty Russell
  1 sibling, 2 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-24 20:19 UTC (permalink / raw)
  To: Rusty Russell; +Cc: hpa, virtualization

On Fri, Mar 22, 2013 at 01:22:57PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Thu, Mar 21, 2013 at 06:59:37PM +1030, Rusty Russell wrote:
> >> (MST, is this what you were thinking?)
> >
> > Almost.
> >
> > Three points:
> >
> > 1. this is still an offset in BAR so for KVM we are still forced to use
> > an IO BAR. 
> 
> Right, because memory bar accesses are slow as per your 'Subject: virtio
> PCI on KVM without IO BARs' post.
> 
> > I would like an option for hypervisor to simply say "Do IO
> > to this fixed address for this VQ". Then virtio can avoid using IO BARs
> > completely.
> 
> It could be done.  AFAICT, this would be an x86-ism, though, which is a
> little nasty.

Okay, talked to HPA and he suggests a useful extension of my (or
rather Gleb's) earlier idea
(which was accessing mmio from special asm code which puts the value in
a known predefined register):
if we make each queue use a different address, then we avoid
the need to emulate the instruction (because we get GPA in the VMCS),
and the value can just be ignored.

There's still some overhead (CPU simply seems to take a bit more
time to handle an EPT violation than an IO access)
and we need to actually add such code in kvm in host kernel,
but it sure looks nice since unlike my idea it does not
need anything special in the guest, and it will just work
for a physical virtio device if such ever surfaces.

> 
> > 2.  for a real virtio device, offset is only 16 bit, using a 32 bit
> > offset in a memory BAR giving each VQ a separate 4K page would allow
> > priveledge separation where e.g. RXVQ/TXVQ are passed through to
> > hardware but CVQ is handled by the hypervisor.
> 
> Hmm, u16 fits nicely :)  Unless you need priv separation between different
> vqs, you could have control vq at 0, and start rx/tx from 4096.
> 
> (Actually, since the notification base is already an offset into a bar,
> you could arrange that at 4094, so control is at 0, vqs start at 1).
> 
> > 3. last thing - (1) applies to ISR reads as well.
> 
> I've been assuming that optimizing ISR access was pointless with MSI-X,
> so keeping that simple.
> 
> > So the minimal change on top of this patch, would be adding a FIXED
> > option to BIR and reporting data and not just offset for queue_notify
> > (so it can include device info if we share same address between
> > devices).
> 
> The first is easy, since we have a 'u8 bar': 255 could mean FIXED.
> 
> I wonder why you only want a u16 for data, when a u32 would be more
> flexible?  If we have to enlarge things anyway...
> 
> How's this?
> 
> Cheers,
> Rusty.
> 
> diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
> index 23b90cb..9a59138 100644
> --- a/include/uapi/linux/virtio_pci.h
> +++ b/include/uapi/linux/virtio_pci.h
> @@ -123,6 +123,9 @@
> >  /* Device specific configuration */
>  #define VIRTIO_PCI_CAP_DEVICE_CFG	4
>  
> +/* Not really a bar: this means notification via outl */
> +#define VIRTIO_PCI_BAR_FIXED_IO		255
> +
>  /* This is the PCI capability header: */
>  struct virtio_pci_cap {
>  	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
> @@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
>  	__le16 queue_size;	/* read-write, power of 2. */
>  	__le16 queue_msix_vector;/* read-write */
>  	__le16 queue_enable;	/* read-write */
> -	__le16 queue_notify;	/* read-only */
> +	__le16 unused2;
> +	__le32 queue_notify_val;/* read-only */
> +	__le32 queue_notify_off;/* read-only */
>  	__le64 queue_desc;	/* read-write */
>  	__le64 queue_avail;	/* read-write */
>  	__le64 queue_used;	/* read-write */

So how exactly do the offsets mesh with the dual capability?  For IO we
want to use the same address and get queue from the data, for memory we
want a per queue address ...

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-24 13:14                                       ` Michael S. Tsirkin
@ 2013-03-24 23:23                                         ` H. Peter Anvin
  2013-03-25  6:53                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-24 23:23 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/24/2013 06:14 AM, Michael S. Tsirkin wrote:
> 
> Peter is also saying we need a way to
> do configuration/ISR without memory accesses.
> 

I'm not 100% sure what you mean with ISR here (mostly because virtio
only has a limited fraction of my attention right now.)

For configuration, are we dealing with too much data to just put it in
configuration space?

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-24 20:19       ` Michael S. Tsirkin
@ 2013-03-24 23:27         ` H. Peter Anvin
  2013-03-25  7:05           ` Michael S. Tsirkin
  2013-03-25 10:00         ` Rusty Russell
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-24 23:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/24/2013 01:19 PM, Michael S. Tsirkin wrote:
>>  struct virtio_pci_cap {
>>  	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
>> @@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
>>  	__le16 queue_size;	/* read-write, power of 2. */
>>  	__le16 queue_msix_vector;/* read-write */
>>  	__le16 queue_enable;	/* read-write */
>> -	__le16 queue_notify;	/* read-only */
>> +	__le16 unused2;
>> +	__le32 queue_notify_val;/* read-only */
>> +	__le32 queue_notify_off;/* read-only */
>>  	__le64 queue_desc;	/* read-write */
>>  	__le64 queue_avail;	/* read-write */
>>  	__le64 queue_used;	/* read-write */
> 
> So how exactly do the offsets mesh with the dual capability?  For IO we
> want to use the same address and get queue from the data, for memory we
> want a per queue address ...
> 

How about having a read-only field which is an "address increment per trigger"?

The guest would be required to always write the queue number as the
data, however, the host would not be required to interpret it if the
address increment is nonzero?

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-24 23:23                                         ` H. Peter Anvin
@ 2013-03-25  6:53                                           ` Michael S. Tsirkin
  2013-03-25  6:54                                             ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-25  6:53 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Sun, Mar 24, 2013 at 04:23:57PM -0700, H. Peter Anvin wrote:
> On 03/24/2013 06:14 AM, Michael S. Tsirkin wrote:
> > 
> > Peter is also saying we need a way to
> > do configuration/ISR without memory accesses.
> > 
> 
> I'm not 100% sure what you mean with ISR here (mostly because virtio
> only has a limited fraction of my attention right now.)

It's a virtio register used to clear interrupts for when INTA# is used.

> For configuration, are we dealing with too much data to just put it in
> configuration space?
> 
> 	-hpa

All of it? Yes, too big: for virtio-blk it's about 64 bytes already.

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-25  6:53                                           ` Michael S. Tsirkin
@ 2013-03-25  6:54                                             ` H. Peter Anvin
  2013-03-25 10:03                                               ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-25  6:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

That might be pushing it, fitting into the 192-byte nonstandard area with everything else that might have to go there...

"Michael S. Tsirkin" <mst@redhat.com> wrote:

>On Sun, Mar 24, 2013 at 04:23:57PM -0700, H. Peter Anvin wrote:
>> On 03/24/2013 06:14 AM, Michael S. Tsirkin wrote:
>> > 
>> > Peter is also saying we need a way to
>> > do configuration/ISR without memory accesses.
>> > 
>> 
>> I'm not 100% sure what you mean with ISR here (mostly because virtio
>> only has a limited fraction of my attention right now.)
>
>It's a virtio register used to clear interrupts for when INTA# is used.
>
>> For configuration, are we dealing with too much data to just put it
>in
>> configuration space?
>> 
>> 	-hpa
>
>All of it? Yes, too big: for virtio-blk it's about 64 bytes already.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-24 23:27         ` H. Peter Anvin
@ 2013-03-25  7:05           ` Michael S. Tsirkin
  0 siblings, 0 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-25  7:05 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Sun, Mar 24, 2013 at 04:27:15PM -0700, H. Peter Anvin wrote:
> On 03/24/2013 01:19 PM, Michael S. Tsirkin wrote:
> >>  struct virtio_pci_cap {
> >>  	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
> >> @@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
> >>  	__le16 queue_size;	/* read-write, power of 2. */
> >>  	__le16 queue_msix_vector;/* read-write */
> >>  	__le16 queue_enable;	/* read-write */
> >> -	__le16 queue_notify;	/* read-only */
> >> +	__le16 unused2;
> >> +	__le32 queue_notify_val;/* read-only */
> >> +	__le32 queue_notify_off;/* read-only */
> >>  	__le64 queue_desc;	/* read-write */
> >>  	__le64 queue_avail;	/* read-write */
> >>  	__le64 queue_used;	/* read-write */
> > 
> > So how exactly do the offsets mesh with the dual capability?  For IO we
> > want to use the same address and get queue from the data, for memory we
> > want a per queue address ...
> > 
> 
> How about having a readonly field which is "address increment per trigger"?
> 
> The guest would be required to always write the queue number as the
> data, however, the host would not be required to interpret it if the
> address increment is nonzero?
> 
> 	-hpa

Not sure what increment means here. The interface that Rusty proposes
reports a queue offset for each queue. My concern was that we probably
can't afford offsets for IO since the address space is so restricted.
Maybe rename memory_offset/memory_val and have it only apply to memory?


-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-22 20:12       ` Ohad Ben-Cohen
@ 2013-03-25  8:30         ` Rusty Russell
  0 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-25  8:30 UTC (permalink / raw)
  To: Ohad Ben-Cohen, Sjur Brændeland
  Cc: Pawel Moll, Brian Swetland, Linus Walleij, Erwan YVIN,
	virtualization, Christian Borntraeger

Ohad Ben-Cohen <ohad@wizery.com> writes:
> On Fri, Mar 22, 2013 at 4:50 PM, Sjur Brændeland <sjurbren@gmail.com> wrote:
>> I guess you would need to update the feature bits in remoteproc as well?
>> e.g. something like:
>>
>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>> index faf3332..148a503 100644
>> --- a/include/linux/remoteproc.h
>> +++ b/include/linux/remoteproc.h
>> @@ -296,8 +296,8 @@ struct fw_rsc_vdev_vring {
>>  struct fw_rsc_vdev {
>>         u32 id;
>>         u32 notifyid;
>> -       u32 dfeatures;
>> -       u32 gfeatures;
>> +       u64 dfeatures;
>> +       u64 gfeatures;
>>         u32 config_len;
>>         u8 status;
>>         u8 num_of_vrings;
>
> We will break existing firmware if we do that.
>
> Initially we thought it's a good idea to announce that remoteproc's
> binary interface isn't stable so we could keep changing it, but at
> this point changing the binary interface means pain for too many
> people.
>
> I'm thinking that at this stage any changes to the binary interface
> will have to bump up the binary version so we can still support older
> images, despite our "unstable" policy.

Yeah, that's why I left it alone.

Obviously, existing firmware won't set features >= 32 anyway, but you
may need to come up with a method for future extensions.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-24 20:19       ` Michael S. Tsirkin
  2013-03-24 23:27         ` H. Peter Anvin
@ 2013-03-25 10:00         ` Rusty Russell
  2013-03-26 19:39           ` Michael S. Tsirkin
  1 sibling, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-25 10:00 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: hpa, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Fri, Mar 22, 2013 at 01:22:57PM +1030, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > I would like an option for hypervisor to simply say "Do IO
>> > to this fixed address for this VQ". Then virtio can avoid using IO BARs
>> > completely.
>> 
>> It could be done.  AFAICT, this would be an x86-ism, though, which is a
>> little nasty.
>
> Okay, talked to HPA and he suggests a useful extension of my
> or rather Gleb's earlier idea
> (which was accessing mmio from special asm code which puts the value in
> a known predefined register):
> if we make each queue use a different address, then we avoid
> the need to emulate the instruction (because we get GPA in the VMCS),
> and the value can just be ignored.

I had the same thought, but obviously lost it when I re-parsed your
message.

> There's still some overhead (CPU simply seems to take a bit more
> time to handle an EPT violation than an IO access)
> and we need to actually add such code in kvm in host kernel,
> but it sure looks nice since unlike my idea it does not
> need anything special in the guest, and it will just work
> for a physical virtio device if such ever surfaces.

I think a physical virtio device would be a bit weird, but it's a nice
sanity check.

But if we do this, let's drop back to the simpler layout suggested in
the original patch (a u16 offset, and you write the vq index there).

>> @@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
>>  	__le16 queue_size;	/* read-write, power of 2. */
>>  	__le16 queue_msix_vector;/* read-write */
>>  	__le16 queue_enable;	/* read-write */
>> -	__le16 queue_notify;	/* read-only */
>> +	__le16 unused2;
>> +	__le32 queue_notify_val;/* read-only */
>> +	__le32 queue_notify_off;/* read-only */
>>  	__le64 queue_desc;	/* read-write */
>>  	__le64 queue_avail;	/* read-write */
>>  	__le64 queue_used;	/* read-write */
>
> So how exactly do the offsets mesh with the dual capability?  For IO we
> want to use the same address and get queue from the data, for memory we
> want a per queue address ...

Let's go back a level.  Do we still need I/O bars at all now?  Or can we
say "if you want hundreds of vqs, use mem bars"?

hpa wanted the option to have either, but do we still want that?

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/22] virtio_pci: allow duplicate capabilities.
  2013-03-25  6:54                                             ` H. Peter Anvin
@ 2013-03-25 10:03                                               ` Rusty Russell
  0 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-25 10:03 UTC (permalink / raw)
  To: H. Peter Anvin, Michael S. Tsirkin; +Cc: virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:
> That might be pushing it, fitting into the 192-byte nonstandard area with everything else that might have to go there...

Yeah, and it only ever grows, since we add new fields at the end.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-25 10:00         ` Rusty Russell
@ 2013-03-26 19:39           ` Michael S. Tsirkin
  2013-03-27  0:07             ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-26 19:39 UTC (permalink / raw)
  To: Rusty Russell; +Cc: hpa, virtualization

On Mon, Mar 25, 2013 at 08:30:28PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Fri, Mar 22, 2013 at 01:22:57PM +1030, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> > I would like an option for hypervisor to simply say "Do IO
> >> > to this fixed address for this VQ". Then virtio can avoid using IO BARs
> >> > completely.
> >> 
> >> It could be done.  AFAICT, this would be an x86-ism, though, which is a
> >> little nasty.
> >
> > Okay, talked to HPA and he suggests a useful extension of my
> > or rather Gleb's earlier idea
> > (which was accessing mmio from special asm code which puts the value in
> > a known predefined register):
> > if we make each queue use a different address, then we avoid
> > the need to emulate the instruction (because we get GPA in the VMCS),
> > and the value can just be ignored.
> 
> I had the same thought, but obviously lost it when I re-parsed your
> message.

I will try to implement this in KVM, and benchmark. Then we'll see.

> > There's still some overhead (CPU simply seems to take a bit more
> > time to handle an EPT violation than an IO access)
> > and we need to actually add such code in kvm in host kernel,
> > but it sure looks nice since unlike my idea it does not
> > need anything special in the guest, and it will just work
> > for a physical virtio device if such ever surfaces.
> 
> I think a physical virtio device would be a bit weird, but it's a nice
> sanity check.
> 
> But if we do this, let's drop back to the simpler layout suggested in
> the original patch (a u16 offset, and you write the vq index there).
> >> @@ -150,7 +153,9 @@ struct virtio_pci_common_cfg {
> >>  	__le16 queue_size;	/* read-write, power of 2. */
> >>  	__le16 queue_msix_vector;/* read-write */
> >>  	__le16 queue_enable;	/* read-write */
> >> -	__le16 queue_notify;	/* read-only */
> >> +	__le16 unused2;
> >> +	__le32 queue_notify_val;/* read-only */
> >> +	__le32 queue_notify_off;/* read-only */
> >>  	__le64 queue_desc;	/* read-write */
> >>  	__le64 queue_avail;	/* read-write */
> >>  	__le64 queue_used;	/* read-write */
> >
> > So how exactly do the offsets mesh with the dual capability?  For IO we
> > want to use the same address and get queue from the data, for memory we
> > want a per queue address ...
> 
> Let's go back a level.  Do we still need I/O bars at all now?  Or can we
> say "if you want hundreds of vqs, use mem bars"?
> 
> hpa wanted the option to have either, but do we still want that?
> 
> Cheers,
> Rusty.

hpa says having both is required for BIOS, not just for speed with KVM.

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-26 19:39           ` Michael S. Tsirkin
@ 2013-03-27  0:07             ` Rusty Russell
  2013-03-27  0:22               ` H. Peter Anvin
  2013-03-27 11:25               ` Michael S. Tsirkin
  0 siblings, 2 replies; 94+ messages in thread
From: Rusty Russell @ 2013-03-27  0:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: hpa, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Mon, Mar 25, 2013 at 08:30:28PM +1030, Rusty Russell wrote:
>> Let's go back a level.  Do we still need I/O bars at all now?  Or can we
>> say "if you want hundreds of vqs, use mem bars"?
>> 
>> hpa wanted the option to have either, but do we still want that?
>
> hpa says having both is required for BIOS, not just for speed with KVM.

OK so the offset must not be applied to the I/O bar as you suggested.

Since AFAICT I/O bars are deprecated, should we insist that there be a
memory bar, and the I/O bar is optional?  Or just leave it entirely
undefined, and say there can be either or both?

I dislike the idea of BIOS code which assumed an I/O bar and thus won't
work with a compliant device which doesn't provide one.  I'd prefer all
compliant drivers to work with all compliant devices.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-27  0:07             ` Rusty Russell
@ 2013-03-27  0:22               ` H. Peter Anvin
  2013-03-27  2:31                 ` H. Peter Anvin
  2013-03-27 11:25               ` Michael S. Tsirkin
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-27  0:22 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin; +Cc: virtualization

I would say let it be undefined... in most cases the host will know what device(s) will matter; e.g. if the guest is ppc no point in providing an I/O BAR.

Rusty Russell <rusty@rustcorp.com.au> wrote:

>"Michael S. Tsirkin" <mst@redhat.com> writes:
>> On Mon, Mar 25, 2013 at 08:30:28PM +1030, Rusty Russell wrote:
>>> Let's go back a level.  Do we still need I/O bars at all now?  Or
>can we
>>> say "if you want hundreds of vqs, use mem bars"?
>>> 
>>> hpa wanted the option to have either, but do we still want that?
>>
>> hpa says having both is required for BIOS, not just for speed with
>KVM.
>
>OK so the offset must not be applied to the I/O bar as you suggested.
>
>Since AFAICT I/O bars are deprecated, should we insist that there be a
>memory bar, and the I/O bar is optional?  Or just leave it entirely
>undefined, and say there can be either or both?
>
>I dislike the idea of BIOS code which assumed an I/O bar and thus won't
>work with a compliant device which doesn't provide one.  I'd prefer all
>compliant drivers to work with all compliant devices.
>
>Cheers,
>Rusty.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-27  0:22               ` H. Peter Anvin
@ 2013-03-27  2:31                 ` H. Peter Anvin
  2013-03-27 11:26                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-27  2:31 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin; +Cc: virtualization

On 03/26/2013 05:22 PM, H. Peter Anvin wrote:
> I would say let it be undefined... in most cases the host will know what device(s) will matter; e.g. if the guest is ppc no point in providing an I/O BAR.

For pluggable physical devices, though, both should be provided.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-27  0:07             ` Rusty Russell
  2013-03-27  0:22               ` H. Peter Anvin
@ 2013-03-27 11:25               ` Michael S. Tsirkin
  2013-03-28  4:50                 ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-27 11:25 UTC (permalink / raw)
  To: Rusty Russell; +Cc: hpa, virtualization

On Wed, Mar 27, 2013 at 10:37:20AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Mon, Mar 25, 2013 at 08:30:28PM +1030, Rusty Russell wrote:
> >> Let's go back a level.  Do we still need I/O bars at all now?  Or can we
> >> say "if you want hundreds of vqs, use mem bars"?
> >> 
> >> hpa wanted the option to have either, but do we still want that?
> >
> > hpa says having both is required for BIOS, not just for speed with KVM.
> 
> OK so the offset must not be applied to the I/O bar as you suggested.

Aha. Yes, good idea.  As for how large the offsets are,
I am guessing we should either just say offset is vqn * X and data is
vqn, or give hypervisors full flexibility with 32 bit offset and
arbitrary data.
16 bit offsets seem neither here nor there ...
Not a strong preference.

> Since AFAICT I/O bars are deprecated, should we insist that there be a
> memory bar, and the I/O bar is optional?  Or just leave it entirely
> undefined, and say there can be either or both?

I would make the memory bar required and the I/O bar optional.
Again not a strong preference.

> I dislike the idea of BIOS code which assumed an I/O bar and thus won't
> work with a compliant device which doesn't provide one.  I'd prefer all
> compliant drivers to work with all compliant devices.
> 
> Cheers,
> Rusty.

In any case, the only thing we would want in the IO BAR is the
notification.  So we should add a way to control device
configuration through PCI configuration. An offset/data pair
will do the trick.

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-27  2:31                 ` H. Peter Anvin
@ 2013-03-27 11:26                   ` Michael S. Tsirkin
  2013-03-27 14:21                     ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-03-27 11:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Tue, Mar 26, 2013 at 07:31:31PM -0700, H. Peter Anvin wrote:
> On 03/26/2013 05:22 PM, H. Peter Anvin wrote:
> > I would say let it be undefined... in most cases the host will know what device(s) will matter; e.g. if the guest is ppc no point in providing an I/O BAR.
> 
> For pluggable physical devices, though, both should be provided.
> 
> 	-hpa

Yes, but what Rusty asked is whether we should make the memory BAR
optional.

> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-27 11:26                   ` Michael S. Tsirkin
@ 2013-03-27 14:21                     ` H. Peter Anvin
  0 siblings, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-27 14:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

We probably shouldn't, given the requirement for PCIe devices to be able to operate without configured I/O BARs.

On the other hand what about a host which can only virtualize I/O for some reason... not sure if that is a realistic scenario.

"Michael S. Tsirkin" <mst@redhat.com> wrote:

>On Tue, Mar 26, 2013 at 07:31:31PM -0700, H. Peter Anvin wrote:
>> On 03/26/2013 05:22 PM, H. Peter Anvin wrote:
>> > I would say let it be undefined... in most cases the host will know
>what device(s) will matter; e.g. if the guest is ppc no point in
>providing an I/O BAR.
>> 
>> For pluggable physical devices, though, both should be provided.
>> 
>> 	-hpa
>
>Yes but what Rusty asked, is whether we should make the memory BAR
>optional.
>
>> -- 
>> H. Peter Anvin, Intel Open Source Technology Center
>> I work for Intel.  I don't speak on their behalf.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-27 11:25               ` Michael S. Tsirkin
@ 2013-03-28  4:50                 ` H. Peter Anvin
  2013-03-30  3:19                   ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-03-28  4:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

On 03/27/2013 04:25 AM, Michael S. Tsirkin wrote:
> 
> Aha. Yes, good idea.  As for how large the offsets are,
> I am guessing we should either just say offset is vqn * X and data is
> vqn, or give hypervisors full flexibility with 32 bit offset and
> arbitrary data.
> 16 bit offsets seem neither here nor there ...

Shift count?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-28  4:50                 ` H. Peter Anvin
@ 2013-03-30  3:19                   ` Rusty Russell
  2013-04-02 22:51                     ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-03-30  3:19 UTC (permalink / raw)
  To: H. Peter Anvin, Michael S. Tsirkin; +Cc: virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:
> On 03/27/2013 04:25 AM, Michael S. Tsirkin wrote:
>> 
>> Aha. Yes, good idea.  As for how large the offsets are,
>> I am guessing we should either just say offset is vqn * X and data is
>> vqn, or give hypervisors full flexibility with 32 bit offset and
>> arbitrary data.
>> 16 bit offsets seem neither here nor there ...
>
> Shift count?

You can only have 2^16 vqs per device.  Is it verboten to write 16-bit
values to odd offsets?  If so, we've just dropped it to 2^15 before you
have some decoding to do.  Hard to care...

I dislike saying "multiply offset by 2" because implementations will get
it wrong.  That's because 0 will work either way, and that's going to be
the common case.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/22] virtio: add support for 64 bit features.
  2013-03-21  8:29 ` [PATCH 05/22] virtio: add support for 64 bit features Rusty Russell
  2013-03-21 10:06   ` Cornelia Huck
@ 2013-04-02 17:09   ` Pawel Moll
  1 sibling, 0 replies; 94+ messages in thread
From: Pawel Moll @ 2013-04-02 17:09 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Brian Swetland, Christian Borntraeger, virtualization

On Thu, 2013-03-21 at 08:29 +0000, Rusty Russell wrote:
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index d933150..84ef5fc 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
[...]
>  static void vm_finalize_features(struct virtio_device *vdev)
> @@ -160,7 +162,9 @@ static void vm_finalize_features(struct virtio_device *vdev)
>  	vring_transport_features(vdev);
>  
>  	writel(0, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
> -	writel(vdev->features, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
> +	writel((u32)vdev->features, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
> +	writel(1, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
> +	writel(vdev->features >> 32, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);

Maybe (u32)(vdev->features >> 32), just to keep the lines consistent?

I'm just being fussy... ;-)

>  }

Otherwise perfectly fine with me (together with "04/22 virtio: use u32,
not bitmap for struct virtio_device's features")

Acked-by: Pawel Moll <pawel.moll@arm.com>

Pawel

^ permalink raw reply	[flat|nested] 94+ messages in thread
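The sel/val pattern in the hunk above pushes a 64-bit feature word through a 32-bit register pair: select word 0, write the low half, select word 1, write the high half. A minimal user-space sketch of that pattern (the `sel`/`val` variables stand in for the two MMIO registers; the register encoding here is illustrative, not the real virtio-mmio layout):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the GUEST_FEATURES_SEL / GUEST_FEATURES registers. */
static uint32_t sel, val[2];

/* Toy writel(): reg 0 selects which 32-bit word, reg 1 writes it. */
static void writel(uint32_t v, int reg)
{
	if (reg == 0)
		sel = v;
	else
		val[sel] = v;
}

/* Mirrors the patched vm_finalize_features(): two sel/write pairs. */
static void write_features(uint64_t features)
{
	writel(0, 0);
	writel((uint32_t)features, 1);
	writel(1, 0);
	writel((uint32_t)(features >> 32), 1);
}
```

Pawel's nit applies here too: the explicit `(uint32_t)` cast on the high half keeps both write lines symmetric.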

* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-21  8:29 ` [PATCH 03/22] virtio_config: make transports implement accessors Rusty Russell
  2013-03-21  9:09   ` Cornelia Huck
  2013-03-22 14:43   ` Sjur Brændeland
@ 2013-04-02 17:16   ` Pawel Moll
  2 siblings, 0 replies; 94+ messages in thread
From: Pawel Moll @ 2013-04-02 17:16 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Brian Swetland, Christian Borntraeger, virtualization

On Thu, 2013-03-21 at 08:29 +0000, Rusty Russell wrote:
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 1ba0d68..ad7f38f 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -178,6 +178,19 @@ static void vm_get(struct virtio_device *vdev, unsigned offset,
>                 ptr[i] = readb(vm_dev->base + VIRTIO_MMIO_CONFIG + offset + i);
>  }
> 
> +#define VM_GETx(bits)                                                  \
> +static u##bits vm_get##bits(struct virtio_device *vdev, unsigned int offset) \
> +{                                                                      \
> +       u##bits v;                                                      \
> +       vm_get(vdev, offset, &v, sizeof(v));                            \
> +       return v;                                                       \
> +}
> +
> +VM_GETx(8)
> +VM_GETx(16)
> +VM_GETx(32)
> +VM_GETx(64)
> +
>  static void vm_set(struct virtio_device *vdev, unsigned offset,
>                    const void *buf, unsigned len)
>  {
> @@ -189,6 +202,18 @@ static void vm_set(struct virtio_device *vdev, unsigned offset,
>                 writeb(ptr[i], vm_dev->base + VIRTIO_MMIO_CONFIG + offset + i);
>  }
> 
> +#define VM_SETx(bits)                                                  \
> +static void vm_set##bits(struct virtio_device *vdev, unsigned int offset, \
> +                        u##bits v)                                     \
> +{                                                                      \
> +       vm_set(vdev, offset, &v, sizeof(v));                            \
> +}
> +
> +VM_SETx(8)
> +VM_SETx(16)
> +VM_SETx(32)
> +VM_SETx(64)
> +
>  static u8 vm_get_status(struct virtio_device *vdev)
>  {
>         struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> @@ -424,8 +449,14 @@ static const char *vm_bus_name(struct virtio_device *vdev)
>  }
> 
>  static const struct virtio_config_ops virtio_mmio_config_ops = {
> -       .get            = vm_get,
> -       .set            = vm_set,
> +       .get8           = vm_get8,
> +       .set8           = vm_set8,
> +       .get16          = vm_get16,
> +       .set16          = vm_set16,
> +       .get32          = vm_get32,
> +       .set32          = vm_set32,
> +       .get64          = vm_get64,
> +       .set64          = vm_set64,
>         .get_status     = vm_get_status,
>         .set_status     = vm_set_status,
>         .reset          = vm_reset,

The idea is by all means fine with me. I wouldn't write it like this,
but I see you're already toying with other alternatives. And I'll have
to make it LE only anyway, so I guess the implementation details don't
really matter right now.

Paweł


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 94+ messages in thread
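The VM_GETx macro in the patch above generates one sized wrapper per width around a byte-wise get. A user-space sketch of the same pattern, with a plain array standing in for the device's MMIO config window (the `readb()` loop of the real driver is reduced to a `memcpy()` here; names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the VIRTIO_MMIO_CONFIG window. */
static uint8_t config[16];

/* Generic byte-wise accessor, like vm_get() in the patch. */
static void vm_get(unsigned offset, void *buf, unsigned len)
{
	memcpy(buf, config + offset, len);
}

/* Token-pasting generates uint8_t/uint16_t/uint32_t wrappers. */
#define VM_GETx(bits)						\
static uint##bits##_t vm_get##bits(unsigned offset)		\
{								\
	uint##bits##_t v;					\
	vm_get(offset, &v, sizeof(v));				\
	return v;						\
}

VM_GETx(8)
VM_GETx(16)
VM_GETx(32)
```

Each expansion is just a fixed-size read through the generic accessor, which is why the transports can share the shape even when the underlying access width differs.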

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-03-30  3:19                   ` Rusty Russell
@ 2013-04-02 22:51                     ` H. Peter Anvin
  2013-04-03  6:10                       ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-04-02 22:51 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, Michael S. Tsirkin

On 03/29/2013 08:19 PM, Rusty Russell wrote:
>>
>> Shift count?
> 
> You can only have 2^16 vqs per device.  Is it verboten to write 16-bit
> values to odd offsets?  If so, we've just dropped it to 2^15 before you
> have to do some decoding to do.  Hard to care...
> 
> I dislike saying "multiply offset by 2" because implementations will get
> it wrong.  That's because 0 will work either way, and that's going to be
> the common case.
> 

The main reason to use a shift count is that it lets the guest driver
assume that the spacing is a power of two, requiring only shift, as
opposed to an arbitrary number, requiring a multiply.  It seems unlikely
that there would be a legitimate reason for a non-power-of-two spacing
between the VQ notifiers.

The other reason is that if a particular host implementation needs
separate pages for each notifier, that can be a pretty large number.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-02 22:51                     ` H. Peter Anvin
@ 2013-04-03  6:10                       ` Rusty Russell
  2013-04-03 11:22                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-04-03  6:10 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization, Michael S. Tsirkin

"H. Peter Anvin" <hpa@zytor.com> writes:
> On 03/29/2013 08:19 PM, Rusty Russell wrote:
>>>
>>> Shift count?
>> 
>> You can only have 2^16 vqs per device.  Is it verboten to write 16-bit
>> values to odd offsets?  If so, we've just dropped it to 2^15 before you
>> have to do some decoding to do.  Hard to care...
>> 
>> I dislike saying "multiply offset by 2" because implementations will get
>> it wrong.  That's because 0 will work either way, and that's going to be
>> the common case.
>> 
>
> The main reason to use a shift count is that it lets the guest driver
> assume that the spacing is a power of two, requiring only shift, as
> opposed to an arbitrary number, requiring a multiply.  It seems unlikely
> that there would be a legitimate reason for a non-power-of-two spacing
> between the VQ notifiers.
>
> The other reason is that if a particular host implementation needs
> separate pages for each notifier, that can be a pretty large number.

Ah, sorry, we're talking across each other a bit.

Current proposal is a 16 bit 'offset' field in the queue data for each
queue, ie.
        addr = dev->notify_base + vq->notify_off;

You propose a per-device 'shift' field:
        addr = dev->notify_base + (vq->index << dev->notify_shift);

Which allows greater offsets, but insists on a unique offset per queue.
Might be a fair trade-off...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 94+ messages in thread
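The two address computations Rusty contrasts above can be sketched side by side. This is a hypothetical user-space rendering of the pseudocode in the mail (the struct fields mirror the mail's names, not any real driver structure):

```c
#include <assert.h>
#include <stdint.h>

struct dev {
	uint8_t *notify_base;	/* start of the notification region */
	unsigned notify_shift;	/* per-device shift (hpa's proposal) */
};

struct vq {
	uint16_t index;		/* queue number */
	uint16_t notify_off;	/* per-queue offset (current proposal) */
};

/* Current proposal: each queue carries its own 16-bit offset. */
static uint8_t *notify_addr_offset(const struct dev *d, const struct vq *q)
{
	return d->notify_base + q->notify_off;
}

/* hpa's proposal: one per-device shift applied to the queue index,
 * forcing power-of-two spacing and a unique address per queue. */
static uint8_t *notify_addr_shift(const struct dev *d, const struct vq *q)
{
	return d->notify_base + ((uint32_t)q->index << d->notify_shift);
}
```

The offset form lets the host alias several queues to one address (useful for an I/O BAR, where the data identifies the queue); the shift form allows much larger spacing, e.g. a page per queue.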

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-03  6:10                       ` Rusty Russell
@ 2013-04-03 11:22                         ` Michael S. Tsirkin
  2013-04-03 14:10                           ` H. Peter Anvin
  2013-04-04  5:48                           ` Rusty Russell
  0 siblings, 2 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-04-03 11:22 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, H. Peter Anvin

On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> > On 03/29/2013 08:19 PM, Rusty Russell wrote:
> >>>
> >>> Shift count?
> >> 
> >> You can only have 2^16 vqs per device.  Is it verboten to write 16-bit
> >> values to odd offsets?  If so, we've just dropped it to 2^15 before you
> >> have to do some decoding to do.  Hard to care...
> >> 
> >> I dislike saying "multiply offset by 2" because implementations will get
> >> it wrong.  That's because 0 will work either way, and that's going to be
> >> the common case.
> >> 
> >
> > The main reason to use a shift count is that it lets the guest driver
> > assume that the spacing is a power of two, requiring only shift, as
> > opposed to an arbitrary number, requiring a multiply.  It seems unlikely
> > that there would be a legitimate reason for a non-power-of-two spacing
> > between the VQ notifiers.
> >
> > The other reason is that if a particular host implementation needs
> > separate pages for each notifier, that can be a pretty large number.
> 
> Ah, sorry, we're talking across each other a bit.
> 
> Current proposal is a 16 bit 'offset' field in the queue data for each
> queue, ie.
>         addr = dev->notify_base + vq->notify_off;
> 
> You propose a per-device 'shift' field:
>         addr = dev->notify_base + (vq->index << dev->notify_shift);
> 
> Which allows greater offsets, but insists on a unique offset per queue.
> Might be a fair trade-off...
> 
> Cheers,
> Rusty.

Or even
         addr = dev->notify_base + (vq->notify_off << dev->notify_shift);

since notify_base is per capability, shift can be per capability too.
And for IO we can allow it to be 32 to mean "always use base".

This is a bit more elegant than just saying "no offsets for IO".

-- 
MST

^ permalink raw reply	[flat|nested] 94+ messages in thread
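MST's combined scheme, a per-queue offset scaled by a per-capability shift, can be sketched as follows. The shift-of-32 sentinel ("always use base") is handled explicitly, since an actual 32-bit shift would be undefined in C; the function name and the sentinel handling are illustrative assumptions, not the real driver:

```c
#include <assert.h>
#include <stdint.h>

/* Combined scheme: addr = notify_base + (notify_off << notify_shift).
 * A shift of 32 (or more) collapses every queue onto notify_base,
 * the "always use base" case proposed for the I/O BAR. */
static uint8_t *notify_addr(uint8_t *notify_base, uint32_t notify_off,
			    unsigned notify_shift)
{
	if (notify_shift >= 32)
		return notify_base;
	return notify_base + ((uint64_t)notify_off << notify_shift);
}
```

With shift 0 this degenerates to the plain per-queue offset; with offset equal to the queue index it degenerates to hpa's per-device shift, so it subsumes both earlier proposals.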

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-03 11:22                         ` Michael S. Tsirkin
@ 2013-04-03 14:10                           ` H. Peter Anvin
  2013-04-03 14:35                             ` Michael S. Tsirkin
  2013-04-04  5:48                           ` Rusty Russell
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-04-03 14:10 UTC (permalink / raw)
  To: Michael S. Tsirkin, Rusty Russell; +Cc: virtualization

0 should probably mean no shift; that way we explicitly prohibit odd offsets, which is a good thing, too.

"Michael S. Tsirkin" <mst@redhat.com> wrote:

>On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>> > On 03/29/2013 08:19 PM, Rusty Russell wrote:
>> >>>
>> >>> Shift count?
>> >> 
>> >> You can only have 2^16 vqs per device.  Is it verboten to write
>16-bit
>> >> values to odd offsets?  If so, we've just dropped it to 2^15
>before you
>> >> have to do some decoding to do.  Hard to care...
>> >> 
>> >> I dislike saying "multiply offset by 2" because implementations
>will get
>> >> it wrong.  That's because 0 will work either way, and that's going
>to be
>> >> the common case.
>> >> 
>> >
>> > The main reason to use a shift count is that it lets the guest
>driver
>> > assume that the spacing is a power of two, requiring only shift, as
>> > opposed to an arbitrary number, requiring a multiply.  It seems
>unlikely
>> > that there would be a legitimate reason for a non-power-of-two
>spacing
>> > between the VQ notifiers.
>> >
>> > The other reason is that if a particular host implementation needs
>> > separate pages for each notifier, that can be a pretty large
>number.
>> 
>> Ah, sorry, we're talking across each other a bit.
>> 
>> Current proposal is a 16 bit 'offset' field in the queue data for
>each
>> queue, ie.
>>         addr = dev->notify_base + vq->notify_off;
>> 
>> You propose a per-device 'shift' field:
>>         addr = dev->notify_base + (vq->index << dev->notify_shift);
>> 
>> Which allows greater offsets, but insists on a unique offset per
>queue.
>> Might be a fair trade-off...
>> 
>> Cheers,
>> Rusty.
>
>Or even
>       addr = dev->notify_base + (vq->notify_off << dev->notify_shift);
>
>since notify_base is per capability, shift can be per capability too.
>And for IO we can allow it to be 32 to mean "always use base".
>
>This is a bit more elegant than just saying "no offsets for IO".

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-03 14:35                             ` Michael S. Tsirkin
@ 2013-04-03 14:35                               ` H. Peter Anvin
  2013-04-03 17:02                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2013-04-03 14:35 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization

I mean no offset.

"Michael S. Tsirkin" <mst@redhat.com> wrote:

>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> >On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
>> >> "H. Peter Anvin" <hpa@zytor.com> writes:
>> >> > On 03/29/2013 08:19 PM, Rusty Russell wrote:
>> >> >>>
>> >> >>> Shift count?
>> >> >> 
>> >> >> You can only have 2^16 vqs per device.  Is it verboten to write
>> >16-bit
>> >> >> values to odd offsets?  If so, we've just dropped it to 2^15
>> >before you
>> >> >> have to do some decoding to do.  Hard to care...
>> >> >> 
>> >> >> I dislike saying "multiply offset by 2" because implementations
>> >will get
>> >> >> it wrong.  That's because 0 will work either way, and that's
>going
>> >to be
>> >> >> the common case.
>> >> >> 
>> >> >
>> >> > The main reason to use a shift count is that it lets the guest
>> >driver
>> >> > assume that the spacing is a power of two, requiring only shift,
>as
>> >> > opposed to an arbitrary number, requiring a multiply.  It seems
>> >unlikely
>> >> > that there would be a legitimate reason for a non-power-of-two
>> >spacing
>> >> > between the VQ notifiers.
>> >> >
>> >> > The other reason is that if a particular host implementation
>needs
>> >> > separate pages for each notifier, that can be a pretty large
>> >number.
>> >> 
>> >> Ah, sorry, we're talking across each other a bit.
>> >> 
>> >> Current proposal is a 16 bit 'offset' field in the queue data for
>> >each
>> >> queue, ie.
>> >>         addr = dev->notify_base + vq->notify_off;
>> >> 
>> >> You propose a per-device 'shift' field:
>> >>         addr = dev->notify_base + (vq->index <<
>dev->notify_shift);
>> >> 
>> >> Which allows greater offsets, but insists on a unique offset per
>> >queue.
>> >> Might be a fair trade-off...
>> >> 
>> >> Cheers,
>> >> Rusty.
>> >
>> >Or even
>> >       addr = dev->notify_base + (vq->notify_off <<
>dev->notify_shift);
>> >
>> >since notify_base is per capability, shift can be per capability
>too.
>> >And for IO we can allow it to be 32 to mean "always use base".
>> >
>> >This is a bit more elegant than just saying "no offsets for IO".
>> 
>
>On Wed, Apr 03, 2013 at 07:10:42AM -0700, H. Peter Anvin wrote:
>> 0 should probably mean no shift;
>
>Sure. Note no shift is not same as "no offset".
>
>> that way we explicitly prohibit odd offsets, which is a good thing,
>too.
>
>Odd offsets?
>
>> -- 
>> Sent from my mobile phone. Please excuse brevity and lack of
>formatting.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-03 14:10                           ` H. Peter Anvin
@ 2013-04-03 14:35                             ` Michael S. Tsirkin
  2013-04-03 14:35                               ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-04-03 14:35 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> >On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
> >> "H. Peter Anvin" <hpa@zytor.com> writes:
> >> > On 03/29/2013 08:19 PM, Rusty Russell wrote:
> >> >>>
> >> >>> Shift count?
> >> >> 
> >> >> You can only have 2^16 vqs per device.  Is it verboten to write
> >16-bit
> >> >> values to odd offsets?  If so, we've just dropped it to 2^15
> >before you
> >> >> have to do some decoding to do.  Hard to care...
> >> >> 
> >> >> I dislike saying "multiply offset by 2" because implementations
> >will get
> >> >> it wrong.  That's because 0 will work either way, and that's going
> >to be
> >> >> the common case.
> >> >> 
> >> >
> >> > The main reason to use a shift count is that it lets the guest
> >driver
> >> > assume that the spacing is a power of two, requiring only shift, as
> >> > opposed to an arbitrary number, requiring a multiply.  It seems
> >unlikely
> >> > that there would be a legitimate reason for a non-power-of-two
> >spacing
> >> > between the VQ notifiers.
> >> >
> >> > The other reason is that if a particular host implementation needs
> >> > separate pages for each notifier, that can be a pretty large
> >number.
> >> 
> >> Ah, sorry, we're talking across each other a bit.
> >> 
> >> Current proposal is a 16 bit 'offset' field in the queue data for
> >each
> >> queue, ie.
> >>         addr = dev->notify_base + vq->notify_off;
> >> 
> >> You propose a per-device 'shift' field:
> >>         addr = dev->notify_base + (vq->index << dev->notify_shift);
> >> 
> >> Which allows greater offsets, but insists on a unique offset per
> >queue.
> >> Might be a fair trade-off...
> >> 
> >> Cheers,
> >> Rusty.
> >
> >Or even
> >       addr = dev->notify_base + (vq->notify_off << dev->notify_shift);
> >
> >since notify_base is per capability, shift can be per capability too.
> >And for IO we can allow it to be 32 to mean "always use base".
> >
> >This is a bit more elegant than just saying "no offsets for IO".
> 

On Wed, Apr 03, 2013 at 07:10:42AM -0700, H. Peter Anvin wrote:
> 0 should probably mean no shift;

Sure. Note no shift is not same as "no offset".

> that way we explicitly prohibit odd offsets, which is a good thing, too.

Odd offsets?

> -- 
> Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 03/22] virtio_config: make transports implement accessors.
  2013-03-24  4:24     ` Rusty Russell
@ 2013-04-03 15:58       ` Sjur Brændeland
  0 siblings, 0 replies; 94+ messages in thread
From: Sjur Brændeland @ 2013-04-03 15:58 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Pawel Moll, Brian Swetland, Linus Walleij, Erwan YVIN,
	virtualization, Christian Borntraeger

Hi Rusty,

On Sun, Mar 24, 2013 at 5:24 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>
> Sjur Brændeland <sjurbren@gmail.com> writes:
> > On Thu, Mar 21, 2013 at 9:29 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> > Would it be possible to make this simpler and less verbose somehow?
> > At least three virtio devices: virtio_pci_legacy.c, virtio_mmio.c and
> > soon remoteproc_virtio.c will duplicate variants of the code above.
> >
> > What if set8/get8 were mandatory, and the 16,32,64 variants were optional,
> > and then virtio_creadX() virtio_cwriteX did the magic to make things work?
>
> But this is a case where simpler and less verbose are opposites.  These
> patches are really straightforward.  No one can forget to write or assign
> an accessor and still have it work on x86 but fail on BE machines.
>
> So I like explicit accessors, set by the backend.  But it doesn't have
> to be quite this ugly.
>
> How's this (completely untested!)

Looks really good to me. This way we avoid copying code around between
virtio devices.

BTW: If you are planning to merge this for 3.10 we will get some
compile issues in linux-next when merging with my latest remoteproc patches.
But with the changes below it should be easy enough to fix.

Thanks,
Sjur

>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index e342692..6ffa542 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -219,6 +219,75 @@ out:
>  }
>  EXPORT_SYMBOL_GPL(register_virtio_device);
>
> +static void noconv_get(struct virtio_device *vdev, unsigned offset,
> +                      size_t len, void *p)
> +{
> +       u8 *buf = p;
> +       while (len) {
> +               *buf = vdev->config->get8(vdev, offset);
> +               buf++;
> +               offset++;
> +               len--;
> +       }
> +}
> +
> +u16 virtio_config_get_noconv16(struct virtio_device *vdev, unsigned offset)
> +{
> +       u16 v;
> +       noconv_get(vdev, offset, sizeof(v), &v);
> +       return v;
> +}
> +EXPORT_SYMBOL_GPL(virtio_config_get_noconv16);
> +
> +u32 virtio_config_get_noconv32(struct virtio_device *vdev, unsigned offset)
> +{
> +       u32 v;
> +       noconv_get(vdev, offset, sizeof(v), &v);
> +       return v;
> +}
> +EXPORT_SYMBOL_GPL(virtio_config_get_noconv32);
> +
> +u64 virtio_config_get_noconv64(struct virtio_device *vdev, unsigned offset)
> +{
> +       u64 v;
> +       noconv_get(vdev, offset, sizeof(v), &v);
> +       return v;
> +}
> +EXPORT_SYMBOL_GPL(virtio_config_get_noconv64);
> +
> +static void noconv_set(struct virtio_device *vdev, unsigned offset,
> +                      size_t len, const void *p)
> +{
> +       const u8 *buf = p;
> +       while (len) {
> +               vdev->config->set8(vdev, offset, *buf);
> +               buf++;
> +               offset++;
> +               len--;
> +       }
> +}
> +
> +void virtio_config_set_noconv16(struct virtio_device *vdev,
> +                               unsigned offset, u16 v)
> +{
> +       noconv_set(vdev, offset, sizeof(v), &v);
> +}
> +EXPORT_SYMBOL_GPL(virtio_config_set_noconv16);
> +
> +void virtio_config_set_noconv32(struct virtio_device *vdev,
> +                               unsigned offset, u32 v)
> +{
> +       noconv_set(vdev, offset, sizeof(v), &v);
> +}
> +EXPORT_SYMBOL_GPL(virtio_config_set_noconv32);
> +
> +void virtio_config_set_noconv64(struct virtio_device *vdev,
> +                               unsigned offset, u64 v)
> +{
> +       noconv_set(vdev, offset, sizeof(v), &v);
> +}
> +EXPORT_SYMBOL_GPL(virtio_config_set_noconv64);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>         int index = dev->index; /* save for after device release */
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 9dfe116..9a9aaa0 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -286,4 +286,23 @@ static inline void virtio_cwrite64(struct virtio_device *vdev,
>                 _r;                                                     \
>         })
>
> +/* Helpers for non-endian converting transports. */
> +u16 virtio_config_get_noconv16(struct virtio_device *vdev, unsigned offset);
> +u32 virtio_config_get_noconv32(struct virtio_device *vdev, unsigned offset);
> +u64 virtio_config_get_noconv64(struct virtio_device *vdev, unsigned offset);
> +void virtio_config_set_noconv16(struct virtio_device *vdev,
> +                               unsigned offset, u16 v);
> +void virtio_config_set_noconv32(struct virtio_device *vdev,
> +                               unsigned offset, u32 v);
> +void virtio_config_set_noconv64(struct virtio_device *vdev,
> +                               unsigned offset, u64 v);
> +
> +#define VIRTIO_CONFIG_OPS_NOCONV               \
> +       .get16 = virtio_config_get_noconv16,    \
> +       .set16 = virtio_config_set_noconv16,    \
> +       .get32 = virtio_config_get_noconv32,    \
> +       .set32 = virtio_config_set_noconv32,    \
> +       .get64 = virtio_config_get_noconv64,    \
> +       .set64 = virtio_config_set_noconv64
> +
>  #endif /* _LINUX_VIRTIO_CONFIG_H */
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 94+ messages in thread
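The helper pattern settled on above, multi-byte config access built from
byte-at-a-time get8/set8 calls with no endian conversion, can be modelled
outside the kernel. A minimal sketch, assuming a flat byte array in place of
a real transport; the `toy_` names are illustrative and not part of the
kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for a transport's get8/set8 accessors: a flat byte
 * array models the device config space.  Illustrative only. */
static uint8_t config_space[16];

static uint8_t toy_get8(unsigned offset) { return config_space[offset]; }
static void toy_set8(unsigned offset, uint8_t v) { config_space[offset] = v; }

/* Like noconv_get() above: copy len bytes one get8 at a time.  No
 * endian conversion happens, so values keep the device's byte order. */
static void toy_noconv_get(unsigned offset, size_t len, void *p)
{
	uint8_t *buf = p;
	while (len--)
		*buf++ = toy_get8(offset++);
}

static void toy_noconv_set(unsigned offset, size_t len, const void *p)
{
	const uint8_t *buf = p;
	while (len--)
		toy_set8(offset++, *buf++);
}
```

A wider value written through toy_noconv_set() reads back unchanged through
toy_noconv_get(), which is the whole point: the byte-wise loop is
size-agnostic, so one pair of helpers serves the 16-, 32- and 64-bit cases.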

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-03 14:35                               ` H. Peter Anvin
@ 2013-04-03 17:02                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-04-03 17:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: virtualization

On Wed, Apr 03, 2013 at 07:35:31AM -0700, H. Peter Anvin wrote:
> I mean no offset.

I see. Fine with me.

> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> >> >On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
> >> >> "H. Peter Anvin" <hpa@zytor.com> writes:
> >> >> > On 03/29/2013 08:19 PM, Rusty Russell wrote:
> >> >> >>>
> >> >> >>> Shift count?
> >> >> >> 
> >> >> >> You can only have 2^16 vqs per device.  Is it verboten to write
> >> >16-bit
> >> >> >> values to odd offsets?  If so, we've just dropped it to 2^15
> >> >before you
> >> >> >> have to do some decoding to do.  Hard to care...
> >> >> >> 
> >> >> >> I dislike saying "multiply offset by 2" because implementations
> >> >will get
> >> >> >> it wrong.  That's because 0 will work either way, and that's
> >going
> >> >to be
> >> >> >> the common case.
> >> >> >> 
> >> >> >
> >> >> > The main reason to use a shift count is that it lets the guest
> >> >driver
> >> >> > assume that the spacing is a power of two, requiring only shift,
> >as
> >> >> > opposed to an arbitrary number, requiring a multiply.  It seems
> >> >unlikely
> >> >> > that there would be a legitimate reason for a non-power-of-two
> >> >spacing
> >> >> > between the VQ notifiers.
> >> >> >
> >> >> > The other reason is that if a particular host implementation needs
> >> >> > separate pages for each notifier, that can be a pretty large number.
> >> >> 
> >> >> Ah, sorry, we're talking across each other a bit.
> >> >> 
> >> >> Current proposal is a 16 bit 'offset' field in the queue data for each
> >> >> queue, ie.
> >> >>         addr = dev->notify_base + vq->notify_off;
> >> >> 
> >> >> You propose a per-device 'shift' field:
> >> >>         addr = dev->notify_base + (vq->index << dev->notify_shift);
> >> >> 
> >> >> Which allows greater offsets, but insists on a unique offset per queue.
> >> >> Might be a fair trade-off...
> >> >> 
> >> >> Cheers,
> >> >> Rusty.
> >> >
> >> >Or even
> >> >       addr = dev->notify_base + (vq->notify_off << dev->notify_shift);
> >> >
> >> >since notify_base is per capability, shift can be per capability too.
> >> >And for IO we can allow it to be 32 to mean "always use base".
> >> >
> >> >This is a bit more elegant than just saying "no offsets for IO".
> >> 
> >
> >On Wed, Apr 03, 2013 at 07:10:42AM -0700, H. Peter Anvin wrote:
> >> 0 should probably mean no shift;
> >
> >Sure. Note that no shift is not the same as "no offset".
> >
> >> that way we explicitly prohibit odd offsets, which is a good thing, too.
> >
> >Odd offsets?
> >
> >> -- 
> >> Sent from my mobile phone. Please excuse brevity and lack of formatting.
> 
> -- 
> Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-03 11:22                         ` Michael S. Tsirkin
  2013-04-03 14:10                           ` H. Peter Anvin
@ 2013-04-04  5:48                           ` Rusty Russell
  2013-04-04  8:25                             ` Michael S. Tsirkin
  1 sibling, 1 reply; 94+ messages in thread
From: Rusty Russell @ 2013-04-04  5:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, H. Peter Anvin

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
>> Current proposal is a 16 bit 'offset' field in the queue data for each
>> queue, ie.
>>         addr = dev->notify_base + vq->notify_off;
>> 
>> You propose a per-device 'shift' field:
>>         addr = dev->notify_base + (vq->index << dev->notify_shift);
>> 
>> Which allows greater offsets, but insists on a unique offset per queue.
>> Might be a fair trade-off...
>> 
>> Cheers,
>> Rusty.
>
> Or even
>          addr = dev->notify_base + (vq->notify_off << dev->notify_shift);
>
> since notify_base is per capability, shift can be per capability too.
> And for IO we can allow it to be 32 to mean "always use base".
>
> This is a bit more elegant than just saying "no offsets for IO".

Yes, I shied away from this because it makes the capabilities different
sizes, but per capability is elegant.  Except it really needs to be a
multiplier, not a shift, since we want a "0".  And magic numbers are
horrible.

Since the multiply can be done at device init time, I don't think it's a
big issue.

The results looks something like this...

Cheers,
Rusty.

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index c917e3a..f2ce171 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -46,8 +46,8 @@ struct virtio_pci_device {
 	size_t notify_len;
 	size_t device_len;
 
-	/* We use the queue_notify_moff only for MEM bars. */
-	bool notify_use_offset;
+	/* We use the queue_notify_off only for MEM bars. */
+	u32 notify_offset_multiplier;
 
 	/* a list of queues so we can dispatch IRQs */
 	spinlock_t lock;
@@ -469,7 +469,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtio_pci_vq_info *info;
 	struct virtqueue *vq;
-	u16 num;
+	u16 num, off;
 	int err;
 
 	if (index >= ioread16(&vp_dev->common->num_queues))
@@ -502,19 +502,16 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	info->msix_vector = msix_vec;
 
-	if (vp_dev->notify_use_offset) {
-		/* get offset of notification byte for this virtqueue */
-		u16 off = ioread16(&vp_dev->common->queue_notify_moff);
-		if (off > vp_dev->notify_len) {
-			dev_warn(&vp_dev->pci_dev->dev,
-				 "bad notification offset %u for queue %u (> %u)",
-				 off, index, vp_dev->notify_len);
-			err = -EINVAL;
-			goto out_info;
-		}
-		info->notify = vp_dev->notify_base + off;
-	} else
-		info->notify = vp_dev->notify_base;
+	/* get offset of notification byte for this virtqueue */
+	off = ioread16(&vp_dev->common->queue_notify_off);
+	if (off * vp_dev->notify_offset_multiplier > vp_dev->notify_len) {
+		dev_warn(&vp_dev->pci_dev->dev,
+			 "bad notification offset %u for queue %u (> %u)",
+			 off, index, vp_dev->notify_len);
+		err = -EINVAL;
+		goto out_info;
+	}
+	info->notify = vp_dev->notify_base + off * vp_dev->notify_offset_multiplier;
 
 	info->queue = alloc_virtqueue_pages(&num);
 	if (info->queue == NULL) {
@@ -812,7 +809,7 @@ static void virtio_pci_release_dev(struct device *_d)
 }
 
 static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
-				    size_t *len, bool *is_mem)
+				    size_t *len)
 {
 	u8 bar;
 	u32 offset, length;
@@ -834,8 +831,6 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
 
 	if (len)
 		*len = length;
-	if (is_mem)
-		*is_mem = pci_resource_flags(dev, bar) & IORESOURCE_MEM;
 
 	/* We want uncachable mapping, even if bar is cachable. */
 	p = pci_iomap_range(dev, bar, offset, length, PAGE_SIZE, true);
@@ -914,19 +909,23 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	err = -EINVAL;
 	vp_dev->common = map_capability(pci_dev, common,
 					sizeof(struct virtio_pci_common_cfg),
-					NULL, NULL);
+					NULL);
 	if (!vp_dev->common)
 		goto out_req_regions;
-	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), NULL, NULL);
+	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), NULL);
 	if (!vp_dev->isr)
 		goto out_map_common;
+
+	/* Read notify_off_multiplier from config space. */
+	pci_read_config_dword(pci_dev,
+			      notify + offsetof(struct virtio_pci_notify_cap,
+						notify_off_multiplier),
+			      &vp_dev->notify_offset_multiplier);
 	vp_dev->notify_base = map_capability(pci_dev, notify, sizeof(u8),
-					     &vp_dev->notify_len,
-					     &vp_dev->notify_use_offset);
+					     &vp_dev->notify_len);
 	if (!vp_dev->notify_len)
 		goto out_map_isr;
-	vp_dev->device = map_capability(pci_dev, device, 0,
-					&vp_dev->device_len, NULL);
+	vp_dev->device = map_capability(pci_dev, device, 0, &vp_dev->device_len);
 	if (!vp_dev->device)
 		goto out_map_notify;
 
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 942135a..3e61d55 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -133,6 +133,11 @@ struct virtio_pci_cap {
 	__le32 length;	/* Length. */
 };
 
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	__le32 notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
 /* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
 struct virtio_pci_common_cfg {
 	/* About the whole device. */
@@ -146,13 +151,13 @@ struct virtio_pci_common_cfg {
 	__u8 unused1;
 
 	/* About a specific virtqueue. */
-	__le16 queue_select;	/* read-write */
-	__le16 queue_size;	/* read-write, power of 2. */
-	__le16 queue_msix_vector;/* read-write */
-	__le16 queue_enable;	/* read-write */
-	__le16 queue_notify_moff; /* read-only */
-	__le64 queue_desc;	/* read-write */
-	__le64 queue_avail;	/* read-write */
-	__le64 queue_used;	/* read-write */
+	__le16 queue_select;		/* read-write */
+	__le16 queue_size;		/* read-write, power of 2. */
+	__le16 queue_msix_vector;	/* read-write */
+	__le16 queue_enable;		/* read-write */
+	__le16 queue_notify_off;	/* read-only */
+	__le64 queue_desc;		/* read-write */
+	__le64 queue_avail;		/* read-write */
+	__le64 queue_used;		/* read-write */
 };
 #endif /* _UAPI_LINUX_VIRTIO_PCI_H */

^ permalink raw reply related	[flat|nested] 94+ messages in thread
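The computation the patch above arrives at can be sketched as a stand-alone
model, assuming a per-capability multiplier read from the notify capability.
All names (`toy_notify`, `toy_vq_notify_addr`) are illustrative, not the
driver's; the u64 promotion and the +2 word-size slack anticipate the
overflow and word-write points raised later in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the notify capability mapping. */
struct toy_notify {
	uint8_t *base;		/* mapped notify capability region */
	size_t len;		/* length of that region */
	uint32_t multiplier;	/* notify_off_multiplier from config space */
};

/* Doorbell address for one queue, or NULL if the device-advertised
 * offset lands outside the mapped region.  Promoting to u64 means a
 * 16-bit offset times a 32-bit multiplier cannot wrap. */
static uint8_t *toy_vq_notify_addr(const struct toy_notify *n,
				   uint16_t queue_notify_off)
{
	uint64_t off = (uint64_t)queue_notify_off * n->multiplier;

	if (off + 2 > n->len)	/* +2: a 16-bit word is written here */
		return NULL;
	return n->base + off;
}
```

With a multiplier of 0, every queue's offset collapses to the capability
base, recovering the shared-doorbell layout wanted for small IO BARs.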

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-04  5:48                           ` Rusty Russell
@ 2013-04-04  8:25                             ` Michael S. Tsirkin
  2013-04-05  1:25                               ` Rusty Russell
  0 siblings, 1 reply; 94+ messages in thread
From: Michael S. Tsirkin @ 2013-04-04  8:25 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, H. Peter Anvin

On Thu, Apr 04, 2013 at 04:18:03PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Wed, Apr 03, 2013 at 04:40:29PM +1030, Rusty Russell wrote:
> >> Current proposal is a 16 bit 'offset' field in the queue data for each
> >> queue, ie.
> >>         addr = dev->notify_base + vq->notify_off;
> >> 
> >> You propose a per-device 'shift' field:
> >>         addr = dev->notify_base + (vq->index << dev->notify_shift);
> >> 
> >> Which allows greater offsets, but insists on a unique offset per queue.
> >> Might be a fair trade-off...
> >> 
> >> Cheers,
> >> Rusty.
> >
> > Or even
> >          addr = dev->notify_base + (vq->notify_off << dev->notify_shift);
> >
> > since notify_base is per capability, shift can be per capability too.
> > And for IO we can allow it to be 32 to mean "always use base".
> >
> > This is a bit more elegant than just saying "no offsets for IO".
> 
> Yes, I shied away from this because it makes the capabilities different
> sizes, but per capability is elegant.  Except it really needs to be a
> multiplier, not a shift, since we want a "0".  And magic numbers are
> horrible.
> Since the multiply can be done at device init time, I don't think it's a
> big issue.
> 
> The results looks something like this...
> 
> Cheers,
> Rusty.

Looks good, I'm implementing the wildcard MMIO to check that
this actually works as fast as PIO.

By the way, Gleb pointed out that on older hosts MMIO will
always be slower since we need to do a shadow page walk to
translate virtual to physical address.
Hopefully not a big concern, and after all we are still
keeping PIO around for use by BIOS ...

> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index c917e3a..f2ce171 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -46,8 +46,8 @@ struct virtio_pci_device {
>  	size_t notify_len;
>  	size_t device_len;
>  
> -	/* We use the queue_notify_moff only for MEM bars. */
> -	bool notify_use_offset;
> +	/* We use the queue_notify_off only for MEM bars. */
> +	u32 notify_offset_multiplier;
>  
>  	/* a list of queues so we can dispatch IRQs */
>  	spinlock_t lock;
> @@ -469,7 +469,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	struct virtio_pci_vq_info *info;
>  	struct virtqueue *vq;
> -	u16 num;
> +	u16 num, off;
>  	int err;
>  
>  	if (index >= ioread16(&vp_dev->common->num_queues))
> @@ -502,19 +502,16 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  
>  	info->msix_vector = msix_vec;
>  
> -	if (vp_dev->notify_use_offset) {
> -		/* get offset of notification byte for this virtqueue */
> -		u16 off = ioread16(&vp_dev->common->queue_notify_moff);
> -		if (off > vp_dev->notify_len) {
> -			dev_warn(&vp_dev->pci_dev->dev,
> -				 "bad notification offset %u for queue %u (> %u)",
> -				 off, index, vp_dev->notify_len);
> -			err = -EINVAL;
> -			goto out_info;
> -		}
> -		info->notify = vp_dev->notify_base + off;
> -	} else
> -		info->notify = vp_dev->notify_base;
> +	/* get offset of notification byte for this virtqueue */
> +	off = ioread16(&vp_dev->common->queue_notify_off);
> +	if (off * vp_dev->notify_offset_multiplier > vp_dev->notify_len) {

maybe check there's no overflow too?
if (off > UINT32_MAX / vp_dev->notify_offset_multiplier)
	return -EINVAL;

> +		dev_warn(&vp_dev->pci_dev->dev,
> +			 "bad notification offset %u for queue %u (> %u)",
> +			 off, index, vp_dev->notify_len);
> +		err = -EINVAL;
> +		goto out_info;
> +	}

I don't know if you want to limit this to "0 or power of two",
if yes you'd do
	if (vp_dev->notify_offset_multiplier & (vp_dev->notify_offset_multiplier - 1))
		return -EINVAL;

> +	info->notify = vp_dev->notify_base + off * vp_dev->notify_offset_multiplier;
>  
>  	info->queue = alloc_virtqueue_pages(&num);
>  	if (info->queue == NULL) {
> @@ -812,7 +809,7 @@ static void virtio_pci_release_dev(struct device *_d)
>  }
>  
>  static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
> -				    size_t *len, bool *is_mem)
> +				    size_t *len)
>  {
>  	u8 bar;
>  	u32 offset, length;
> @@ -834,8 +831,6 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
>  
>  	if (len)
>  		*len = length;
> -	if (is_mem)
> -		*is_mem = pci_resource_flags(dev, bar) & IORESOURCE_MEM;
>  
>  	/* We want uncachable mapping, even if bar is cachable. */
>  	p = pci_iomap_range(dev, bar, offset, length, PAGE_SIZE, true);
> @@ -914,19 +909,23 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>  	err = -EINVAL;
>  	vp_dev->common = map_capability(pci_dev, common,
>  					sizeof(struct virtio_pci_common_cfg),
> -					NULL, NULL);
> +					NULL);
>  	if (!vp_dev->common)
>  		goto out_req_regions;
> -	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), NULL, NULL);
> +	vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), NULL);
>  	if (!vp_dev->isr)
>  		goto out_map_common;
> +
> +	/* Read notify_off_multiplier from config space. */
> +	pci_read_config_dword(pci_dev,
> +			      notify + offsetof(struct virtio_pci_notify_cap,
> +						notify_off_multiplier),
> +			      &vp_dev->notify_offset_multiplier);
>  	vp_dev->notify_base = map_capability(pci_dev, notify, sizeof(u8),
> -					     &vp_dev->notify_len,
> -					     &vp_dev->notify_use_offset);
> +					     &vp_dev->notify_len);
>  	if (!vp_dev->notify_len)
>  		goto out_map_isr;
> -	vp_dev->device = map_capability(pci_dev, device, 0,
> -					&vp_dev->device_len, NULL);
> +	vp_dev->device = map_capability(pci_dev, device, 0, &vp_dev->device_len);
>  	if (!vp_dev->device)
>  		goto out_map_notify;
>  
> diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
> index 942135a..3e61d55 100644
> --- a/include/uapi/linux/virtio_pci.h
> +++ b/include/uapi/linux/virtio_pci.h
> @@ -133,6 +133,11 @@ struct virtio_pci_cap {
>  	__le32 length;	/* Length. */
>  };
>  
> +struct virtio_pci_notify_cap {
> +	struct virtio_pci_cap cap;
> +	__le32 notify_off_multiplier;	/* Multiplier for queue_notify_off. */
> +};
> +
>  /* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
>  struct virtio_pci_common_cfg {
>  	/* About the whole device. */
> @@ -146,13 +151,13 @@ struct virtio_pci_common_cfg {
>  	__u8 unused1;
>  
>  	/* About a specific virtqueue. */
> -	__le16 queue_select;	/* read-write */
> -	__le16 queue_size;	/* read-write, power of 2. */
> -	__le16 queue_msix_vector;/* read-write */
> -	__le16 queue_enable;	/* read-write */
> -	__le16 queue_notify_moff; /* read-only */
> -	__le64 queue_desc;	/* read-write */
> -	__le64 queue_avail;	/* read-write */
> -	__le64 queue_used;	/* read-write */
> +	__le16 queue_select;		/* read-write */
> +	__le16 queue_size;		/* read-write, power of 2. */
> +	__le16 queue_msix_vector;	/* read-write */
> +	__le16 queue_enable;		/* read-write */
> +	__le16 queue_notify_off;	/* read-only */
> +	__le64 queue_desc;		/* read-write */
> +	__le64 queue_avail;		/* read-write */
> +	__le64 queue_used;		/* read-write */
>  };
>  #endif /* _UAPI_LINUX_VIRTIO_PCI_H */
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread
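The power-of-two check suggested above relies on a standard bit trick:
x & (x - 1) clears the lowest set bit, so the result is 0 exactly when x has
at most one bit set. A small sketch (the function name is illustrative)
showing that the test accepts 0 as well, matching the "0 or power of two"
rule being debated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for 0 and for every power of two; false otherwise.  For x = 0,
 * unsigned wraparound gives 0 & 0xffffffff == 0, so 0 passes too. */
static bool toy_zero_or_pow2(uint32_t x)
{
	return (x & (x - 1)) == 0;
}
```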

* Re: [PATCH 16/22] virtio_pci: use separate notification offsets for each vq.
  2013-04-04  8:25                             ` Michael S. Tsirkin
@ 2013-04-05  1:25                               ` Rusty Russell
  0 siblings, 0 replies; 94+ messages in thread
From: Rusty Russell @ 2013-04-05  1:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, H. Peter Anvin

"Michael S. Tsirkin" <mst@redhat.com> writes:
> By the way, Gleb pointed out that on older hosts MMIO will
> always be slower since we need to do a shadow page walk to
> translate virtual to physical address.
> Hopefully not a big concern, and after all we are still
> keeping PIO around for use by BIOS ...

Yeah, slow hosts will be slow :)

>> +	/* get offset of notification byte for this virtqueue */
>> +	off = ioread16(&vp_dev->common->queue_notify_off);
>> +	if (off * vp_dev->notify_offset_multiplier > vp_dev->notify_len) {
>
> maybe check there's no overflow too?
> if (off > UINT32_MAX / vp_dev->notify_offset_multiplier)
> 	return -EINVAL;

I think it's clearer to just do:

        if ((u64)off * vp_dev->notify_offset_multiplier > vp_dev->notify_len) {

Since off is 16 bits and notify_offset_multiplier is 32, this catches
overflow.

>> +		dev_warn(&vp_dev->pci_dev->dev,
>> +			 "bad notification offset %u for queue %u (> %u)",
>> +			 off, index, vp_dev->notify_len);
>> +		err = -EINVAL;
>> +		goto out_info;
>> +	}
>
> I don't know if you want to limit this to "0 or power of two",
> if yes you'd do
> 	if (vp_dev->notify_offset_multiplier & (vp_dev->notify_offset_multiplier - 1))
> 		return -EINVAL;
>
>> +	info->notify = vp_dev->notify_base + off * vp_dev->notify_offset_multiplier;
>>  
>>  	info->queue = alloc_virtqueue_pages(&num);
>>  	if (info->queue == NULL) {

I don't see a reason to restrict it.  It's not like you'll be
calculating this value more than once even in a micro implementation...

Also, let's add 2, since we write a word in there...

Cheers,
Rusty.

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 3d0318d..1c3591a 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -502,12 +502,14 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	info->msix_vector = msix_vec;
 
-	/* get offset of notification byte for this virtqueue */
+	/* get offset of notification word for this vq (shouldn't wrap) */
 	off = ioread16(&vp_dev->common->queue_notify_off);
-	if (off * vp_dev->notify_offset_multiplier + 2 > vp_dev->notify_len) {
+	if ((u64)off * vp_dev->notify_offset_multiplier + 2
+	    > vp_dev->notify_len) {
 		dev_warn(&vp_dev->pci_dev->dev,
-			 "bad notification offset %u for queue %u (> %u)",
-			 off, index, vp_dev->notify_len);
+			 "bad notification offset %u (x %u) for queue %u > %u",
+			 off, vp_dev->notify_offset_multiplier, 
+			 index, vp_dev->notify_len);
 		err = -EINVAL;
 		goto out_info;
 	}

^ permalink raw reply related	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2013-04-05  1:25 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-21  8:29 [PATCH 00/22] New virtio PCI layout Rusty Russell
2013-03-21  8:29 ` [PATCH 01/22] virtio_config: introduce size-based accessors Rusty Russell
2013-03-21  8:29 ` [PATCH 02/22] virtio_config: use " Rusty Russell
2013-03-21  8:29 ` [PATCH 03/22] virtio_config: make transports implement accessors Rusty Russell
2013-03-21  9:09   ` Cornelia Huck
2013-03-22  0:31     ` Rusty Russell
2013-03-22  9:13       ` Cornelia Huck
2013-03-22 14:43   ` Sjur Brændeland
2013-03-24  4:24     ` Rusty Russell
2013-04-03 15:58       ` Sjur Brændeland
2013-04-02 17:16   ` Pawel Moll
2013-03-21  8:29 ` [PATCH 04/22] virtio: use u32, not bitmap for struct virtio_device's features Rusty Russell
2013-03-21 10:00   ` Cornelia Huck
2013-03-22  0:48     ` Rusty Russell
2013-03-21  8:29 ` [PATCH 05/22] virtio: add support for 64 bit features Rusty Russell
2013-03-21 10:06   ` Cornelia Huck
2013-03-22  0:50     ` Rusty Russell
2013-03-22  9:15       ` Cornelia Huck
2013-03-22 14:50     ` Sjur Brændeland
2013-03-22 20:12       ` Ohad Ben-Cohen
2013-03-25  8:30         ` Rusty Russell
2013-04-02 17:09   ` Pawel Moll
2013-03-21  8:29 ` [PATCH 06/22] virtio: move vring structure into struct virtqueue Rusty Russell
2013-03-21  8:29 ` [PATCH 07/22] pci: add pci_iomap_range Rusty Russell
2013-03-21  8:29 ` [PATCH 08/22] virtio-pci: define layout for virtio vendor-specific capabilities Rusty Russell
2013-03-21  8:29 ` [PATCH 09/22] virtio_pci: move old defines to legacy, introduce new structure Rusty Russell
2013-03-21  8:29 ` [PATCH 10/22] virtio_pci: use _LEGACY_ defines in virtio_pci_legacy.c Rusty Russell
2013-03-21  8:29 ` [PATCH 11/22] virtio_pci: don't use the legacy driver if we find the new PCI capabilities Rusty Russell
2013-03-21  8:29 ` [PATCH 12/22] virtio_pci: allow duplicate capabilities Rusty Russell
2013-03-21 10:28   ` Michael S. Tsirkin
2013-03-21 14:26     ` H. Peter Anvin
2013-03-21 14:43       ` Michael S. Tsirkin
2013-03-21 14:45         ` H. Peter Anvin
2013-03-21 15:19           ` Michael S. Tsirkin
2013-03-21 15:26             ` H. Peter Anvin
2013-03-21 15:58               ` Michael S. Tsirkin
2013-03-21 16:04                 ` H. Peter Anvin
2013-03-21 16:11                   ` Michael S. Tsirkin
2013-03-21 16:15                     ` H. Peter Anvin
2013-03-21 16:26                       ` Michael S. Tsirkin
2013-03-21 16:32                         ` H. Peter Anvin
2013-03-21 17:07                           ` Michael S. Tsirkin
2013-03-21 17:09                             ` H. Peter Anvin
2013-03-21 17:13                               ` Michael S. Tsirkin
2013-03-21 17:49                                 ` Michael S. Tsirkin
2013-03-21 17:54                                   ` H. Peter Anvin
2013-03-21 18:01                                     ` Michael S. Tsirkin
2013-03-22  0:57                                     ` Rusty Russell
2013-03-22  3:17                                       ` H. Peter Anvin
2013-03-24 13:14                                       ` Michael S. Tsirkin
2013-03-24 23:23                                         ` H. Peter Anvin
2013-03-25  6:53                                           ` Michael S. Tsirkin
2013-03-25  6:54                                             ` H. Peter Anvin
2013-03-25 10:03                                               ` Rusty Russell
2013-03-21  8:29 ` [PATCH 13/22] virtio_pci: new, capability-aware driver Rusty Russell
2013-03-21 10:24   ` Michael S. Tsirkin
2013-03-22  1:02     ` Rusty Russell
2013-03-24 13:08       ` Michael S. Tsirkin
2013-03-21  8:29 ` [PATCH 14/22] virtio_pci: layout changes as per hpa's suggestions Rusty Russell
2013-03-21  8:29 ` [PATCH 15/22] virtio_pci: use little endian for config space Rusty Russell
2013-03-21  8:29 ` [PATCH 16/22] virtio_pci: use separate notification offsets for each vq Rusty Russell
2013-03-21 10:13   ` Michael S. Tsirkin
2013-03-21 10:35     ` Michael S. Tsirkin
2013-03-22  2:52     ` Rusty Russell
2013-03-24 14:38       ` Michael S. Tsirkin
2013-03-24 20:19       ` Michael S. Tsirkin
2013-03-24 23:27         ` H. Peter Anvin
2013-03-25  7:05           ` Michael S. Tsirkin
2013-03-25 10:00         ` Rusty Russell
2013-03-26 19:39           ` Michael S. Tsirkin
2013-03-27  0:07             ` Rusty Russell
2013-03-27  0:22               ` H. Peter Anvin
2013-03-27  2:31                 ` H. Peter Anvin
2013-03-27 11:26                   ` Michael S. Tsirkin
2013-03-27 14:21                     ` H. Peter Anvin
2013-03-27 11:25               ` Michael S. Tsirkin
2013-03-28  4:50                 ` H. Peter Anvin
2013-03-30  3:19                   ` Rusty Russell
2013-04-02 22:51                     ` H. Peter Anvin
2013-04-03  6:10                       ` Rusty Russell
2013-04-03 11:22                         ` Michael S. Tsirkin
2013-04-03 14:10                           ` H. Peter Anvin
2013-04-03 14:35                             ` Michael S. Tsirkin
2013-04-03 14:35                               ` H. Peter Anvin
2013-04-03 17:02                                 ` Michael S. Tsirkin
2013-04-04  5:48                           ` Rusty Russell
2013-04-04  8:25                             ` Michael S. Tsirkin
2013-04-05  1:25                               ` Rusty Russell
2013-03-21  8:29 ` [PATCH 17/22] virtio_pci_legacy: cleanup struct virtio_pci_vq_info Rusty Russell
2013-03-21  8:29 ` [PATCH 18/22] virtio_pci: share structure between legacy and modern Rusty Russell
2013-03-21  8:29 ` [PATCH 19/22] virtio_pci: share interrupt/notify handlers " Rusty Russell
2013-03-21  8:29 ` [PATCH 20/22] virtio_pci: share virtqueue setup/teardown between modern and legacy driver Rusty Russell
2013-03-21  8:29 ` [PATCH 21/22] virtio_pci: simplify common helpers Rusty Russell
2013-03-21  8:29 ` [PATCH 22/22] virtio_pci: fix finalize_features in modern driver Rusty Russell
