* [PATCH V6 0/9] rework on the IRQ hardening of virtio
@ 2022-05-27  6:01 ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo

Hi All:

This is a rework of the IRQ hardening for virtio. The previous attempt,
implemented by the following commits, has been reverted:

9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")

The reason is that it depends on IRQF_NO_AUTOEN, which may conflict
with the assumptions of the affinity-managed IRQs used by some
virtio drivers. What's more, the hardening was only done for virtio-pci
and not for the other transports.

In this rework, I try to implement a general virtio solution that
borrows the idea of the INTX hardening: reuse the per-virtqueue
boolean vq->broken and toggle it in virtio_device_ready() and
virtio_reset_device(). We can then simply reuse the existing check in
vring_interrupt() and return early if the driver is not ready.
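
The idea in the paragraph above can be modelled in plain userspace C.
This is only a sketch under stated assumptions, not the code from the
series: struct toy_vq, struct toy_dev and every toy_*() function are
hypothetical stand-ins, and the real helpers also synchronize with
in-flight callbacks and talk to the transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_vq {
	bool broken;		/* models vq->broken */
	int callbacks_run;	/* how often the callback actually ran */
};

struct toy_dev {
	struct toy_vq *vqs;
	size_t nvqs;
};

static void toy_set_broken(struct toy_dev *dev, bool broken)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		dev->vqs[i].broken = broken;
}

/* Models virtio_reset_device(): refuse callbacks from here on. */
static void toy_reset_device(struct toy_dev *dev)
{
	toy_set_broken(dev, true);
	/* The real code would now wait for running callbacks and
	 * reset the device through the transport. */
}

/* Models virtio_device_ready(): allow callbacks again. */
static void toy_device_ready(struct toy_dev *dev)
{
	toy_set_broken(dev, false);
	/* The real code would now set DRIVER_OK. */
}

/* Models vring_interrupt(): return early while the driver is not ready. */
static bool toy_vring_interrupt(struct toy_vq *vq)
{
	if (vq->broken)
		return false;	/* interrupt ignored */
	vq->callbacks_run++;
	return true;
}
```

In this model an interrupt that arrives before toy_device_ready() or
after toy_reset_device() never reaches the callback, which is the
property the hardening is after.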

Note that I have only compile-tested the ccw and MMIO transports.

Please review.

Changes since V5:

- Various tweaks on the comments

Changes since V4:

- use spin_lock_irq()/spin_unlock_irq() to synchronize with
  vring_interrupt() for ccw
- use spin_lock()/spin_unlock() to protect vring_interrupt() for non
  airq
- add comment to explain the ordering implications of set_status() for
  PCI, ccw and MMIO
- various tweaks on the comments and changelogs

Changes since V3:

- Rename synchronize_vqs() to synchronize_cbs()
- tweak the comment for synchronize_cbs()
- switch to use a dedicated helper __virtio_unbreak_device() and
  document that it should only be used for probing
- switch to use rwlock to synchronize the non airq for ccw

Changes since V2:

- add ccw and MMIO support
- rename synchronize_vqs() to synchronize_cbs()
- switch to re-use vq->broken instead of introducing new device
  attributes for the future virtqueue reset support
  - remove unnecessary READ_ONCE()/WRITE_ONCE()
  - a new patch to remove device triggerable BUG_ON()
  - more tweaks on the comments

Changes since v1:

- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditionally
- Tweak the barrier/ordering facilities used in the code
- Rename irq_soft_enabled to driver_ready
- Avoid unnecessary IRQ synchronization (e.g. during boot)

Jason Wang (8):
  virtio: use virtio_reset_device() when possible
  virtio: introduce config op to synchronize vring callbacks
  virtio-pci: implement synchronize_cbs()
  virtio-mmio: implement synchronize_cbs()
  virtio-ccw: implement synchronize_cbs()
  virtio: allow to unbreak virtqueue
  virtio: harden vring IRQ
  virtio: use WARN_ON() to warning illegal status value

Stefano Garzarella (1):
  virtio: use virtio_device_ready() in virtio_device_restore()

 drivers/s390/virtio/virtio_ccw.c       | 34 +++++++++++++++++++
 drivers/virtio/virtio.c                | 24 +++++++++----
 drivers/virtio/virtio_mmio.c           | 13 +++++++
 drivers/virtio/virtio_pci_legacy.c     |  1 +
 drivers/virtio/virtio_pci_modern.c     |  2 ++
 drivers/virtio/virtio_pci_modern_dev.c |  5 +++
 drivers/virtio/virtio_ring.c           | 33 +++++++++++++++---
 include/linux/virtio.h                 |  1 +
 include/linux/virtio_config.h          | 47 +++++++++++++++++++++++++-
 9 files changed, 148 insertions(+), 12 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH V6 1/9] virtio: use virtio_device_ready() in virtio_device_restore()
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

From: Stefano Garzarella <sgarzare@redhat.com>

This allows us to extend virtio_device_ready() without duplicating
code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ce424c16997d..938e975029d4 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
 			goto err;
 	}
 
-	/* Finally, tell the device we're all set */
-	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
+	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
+		virtio_device_ready(dev);
 
 	virtio_config_enable(dev);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 2/9] virtio: use virtio_reset_device() when possible
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

This allows us to do common extension without duplicating code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 938e975029d4..aa1eb5132767 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
 
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up.  This also tests that code path a little. */
-	dev->config->reset(dev);
+	virtio_reset_device(dev);
 
 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
 
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up. */
-	dev->config->reset(dev);
+	virtio_reset_device(dev);
 
 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 3/9] virtio: introduce config op to synchronize vring callbacks
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

This patch introduces a new virtio config op to synchronize with the
vring callbacks. A transport-specific method is required to make sure
that writes before this function are visible to any vring_interrupt()
that is called after the function returns. For transports that do not
provide synchronize_cbs(), fall back to synchronize_rcu(), which
implicitly synchronizes with IRQ handlers.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index b341dd62aa4d..25be018810a7 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -57,6 +57,11 @@ struct virtio_shm_region {
  *		include a NULL entry for vqs unused by driver
  *	Returns 0 on success or error status
  * @del_vqs: free virtqueues found by find_vqs().
+ * @synchronize_cbs: synchronize with the virtqueue callbacks (optional)
+ *      The function guarantees that all memory operations on the
+ *      queue before it are visible to the vring_interrupt() that is
+ *      called after it.
+ *      vdev: the virtio_device
  * @get_features: get the array of feature bits for this device.
  *	vdev: the virtio_device
  *	Returns the first 64 feature bits (all we currently need).
@@ -89,6 +94,7 @@ struct virtio_config_ops {
 			const char * const names[], const bool *ctx,
 			struct irq_affinity *desc);
 	void (*del_vqs)(struct virtio_device *);
+	void (*synchronize_cbs)(struct virtio_device *);
 	u64 (*get_features)(struct virtio_device *vdev);
 	int (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
@@ -217,6 +223,25 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs,
 				      desc);
 }
 
+/**
+ * virtio_synchronize_cbs - synchronize with virtqueue callbacks
+ * @vdev: the device
+ */
+static inline
+void virtio_synchronize_cbs(struct virtio_device *dev)
+{
+	if (dev->config->synchronize_cbs) {
+		dev->config->synchronize_cbs(dev);
+	} else {
+		/*
+		 * A best effort fallback to synchronize with
+		 * interrupts, preemption and softirq disabled
+		 * regions. See comment above synchronize_rcu().
+		 */
+		synchronize_rcu();
+	}
+}
+
 /**
  * virtio_device_ready - enable vq use in probe function
  * @vdev: the device
-- 
2.25.1
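
The "optional op with a fallback" dispatch used by
virtio_synchronize_cbs() can be illustrated outside the kernel. This is
a sketch under stated assumptions: struct sync_dev, struct sync_ops and
the toy_*() functions are hypothetical, and the counters merely stand in
for the real transport method and for synchronize_rcu().

```c
#include <assert.h>
#include <stddef.h>

struct sync_dev;

/* Hypothetical ops table with one optional member. */
struct sync_ops {
	void (*synchronize_cbs)(struct sync_dev *dev);	/* may be NULL */
};

struct sync_dev {
	const struct sync_ops *config;
	int transport_syncs;	/* times the transport method ran */
	int fallback_syncs;	/* times the generic fallback ran */
};

/* Stands in for a transport method such as vp_synchronize_vectors(). */
static void toy_transport_sync(struct sync_dev *dev)
{
	dev->transport_syncs++;
}

/* Stands in for the synchronize_rcu() fallback. */
static void toy_fallback_sync(struct sync_dev *dev)
{
	dev->fallback_syncs++;
}

/* Same shape as virtio_synchronize_cbs(): prefer the transport op. */
static void toy_synchronize_cbs(struct sync_dev *dev)
{
	if (dev->config->synchronize_cbs)
		dev->config->synchronize_cbs(dev);
	else
		toy_fallback_sync(dev);
}

static const struct sync_ops toy_ops_with_sync = {
	.synchronize_cbs = toy_transport_sync,
};

static const struct sync_ops toy_ops_without_sync = {
	.synchronize_cbs = NULL,
};
```

Keeping the op optional means a transport that has not been converted
still gets some synchronization, at the price of the weaker
"best effort" guarantee described in the comment in the patch.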


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 4/9] virtio-pci: implement synchronize_cbs()
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

We can simply reuse vp_synchronize_vectors() for .synchronize_cbs().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_legacy.c | 1 +
 drivers/virtio/virtio_pci_modern.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 7fe4caa4b519..a5e5721145c7 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -192,6 +192,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.reset		= vp_reset,
 	.find_vqs	= vp_find_vqs,
 	.del_vqs	= vp_del_vqs,
+	.synchronize_cbs = vp_synchronize_vectors,
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 4acb34409f0b..623906b4996c 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -394,6 +394,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.reset		= vp_reset,
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
+	.synchronize_cbs = vp_synchronize_vectors,
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
@@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.reset		= vp_reset,
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
+	.synchronize_cbs = vp_synchronize_vectors,
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 5/9] virtio-mmio: implement synchronize_cbs()
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

Simply synchronize the platform irq that is used by us.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_mmio.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 839684d672af..c9699a59f93c 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -345,6 +345,13 @@ static void vm_del_vqs(struct virtio_device *vdev)
 	free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
 }
 
+static void vm_synchronize_cbs(struct virtio_device *vdev)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	synchronize_irq(platform_get_irq(vm_dev->pdev, 0));
+}
+
 static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int index,
 				  void (*callback)(struct virtqueue *vq),
 				  const char *name, bool ctx)
@@ -541,6 +548,7 @@ static const struct virtio_config_ops virtio_mmio_config_ops = {
 	.finalize_features = vm_finalize_features,
 	.bus_name	= vm_bus_name,
 	.get_shm_region = vm_get_shm_region,
+	.synchronize_cbs = vm_synchronize_cbs,
 };
 
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 6/9] virtio-ccw: implement synchronize_cbs()
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

This patch tries to implement the synchronize_cbs() for ccw. For the
vring_interrupt() that is called via virtio_airq_handler(), the
synchronization is simply done via the airq_info's lock. For the
vring_interrupt() that is called via virtio_ccw_int_handler(), a per
device rwlock is introduced and used in the synchronization method.
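
Why an empty write-lock/unlock pair works as a synchronization point
can be shown with a toy lock. This is a userspace sketch, assuming the
interrupt handlers hold the same lock on the read side while they call
vring_interrupt(); struct toy_rwlock and the toy_*() functions are
hypothetical and built on C11 atomics, not on the kernel's rwlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy try-lock: 0 = free, N > 0 = N readers, -1 = one writer. */
struct toy_rwlock {
	atomic_int state;
};

/* Handler side: enter the read section unless a writer holds the lock. */
static bool toy_read_trylock(struct toy_rwlock *l)
{
	int s = atomic_load(&l->state);

	while (s >= 0) {
		/* On failure, s is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
			return true;
	}
	return false;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	atomic_fetch_sub(&l->state, 1);
}

/* Writer side: succeeds only when no reader and no writer is inside. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	atomic_store(&l->state, 0);
}

/*
 * One attempt at the "lock, then immediately unlock" barrier. It
 * succeeds only when no handler is in flight; a blocking version
 * would retry until it does.
 */
static bool toy_try_synchronize_cbs(struct toy_rwlock *l)
{
	if (!toy_write_trylock(l))
		return false;
	toy_write_unlock(l);
	return true;
}
```

The write side protects no data of its own. Its only job is that it
cannot be taken until every handler that entered earlier has left.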

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/s390/virtio/virtio_ccw.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index d35e7a3f7067..c188e4f20ca3 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -62,6 +62,7 @@ struct virtio_ccw_device {
 	unsigned int revision; /* Transport revision */
 	wait_queue_head_t wait_q;
 	spinlock_t lock;
+	rwlock_t irq_lock;
 	struct mutex io_lock; /* Serializes I/O requests */
 	struct list_head virtqueues;
 	bool is_thinint;
@@ -984,6 +985,30 @@ static const char *virtio_ccw_bus_name(struct virtio_device *vdev)
 	return dev_name(&vcdev->cdev->dev);
 }
 
+static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
+{
+	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
+	struct airq_info *info = vcdev->airq_info;
+
+	if (info) {
+		/*
+		 * This device uses adapter interrupts: synchronize with
+		 * vring_interrupt() called by virtio_airq_handler()
+		 * via the indicator area lock.
+		 */
+		write_lock_irq(&info->lock);
+		write_unlock_irq(&info->lock);
+	} else {
+		/* This device uses classic interrupts: synchronize
+		 * with vring_interrupt() called by
+		 * virtio_ccw_int_handler() via the per-device
+		 * irq_lock
+		 */
+		write_lock_irq(&vcdev->irq_lock);
+		write_unlock_irq(&vcdev->irq_lock);
+	}
+}
+
 static const struct virtio_config_ops virtio_ccw_config_ops = {
 	.get_features = virtio_ccw_get_features,
 	.finalize_features = virtio_ccw_finalize_features,
@@ -995,6 +1020,7 @@ static const struct virtio_config_ops virtio_ccw_config_ops = {
 	.find_vqs = virtio_ccw_find_vqs,
 	.del_vqs = virtio_ccw_del_vqs,
 	.bus_name = virtio_ccw_bus_name,
+	.synchronize_cbs = virtio_ccw_synchronize_cbs,
 };
 
 
@@ -1106,6 +1132,8 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
 			vcdev->err = -EIO;
 	}
 	virtio_ccw_check_activity(vcdev, activity);
+	/* Interrupts are disabled here */
+	read_lock(&vcdev->irq_lock);
 	for_each_set_bit(i, indicators(vcdev),
 			 sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
 		/* The bit clear must happen before the vring kick. */
@@ -1114,6 +1142,7 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
 		vq = virtio_ccw_vq_by_ind(vcdev, i);
 		vring_interrupt(0, vq);
 	}
+	read_unlock(&vcdev->irq_lock);
 	if (test_bit(0, indicators2(vcdev))) {
 		virtio_config_changed(&vcdev->vdev);
 		clear_bit(0, indicators2(vcdev));
@@ -1284,6 +1313,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
 	init_waitqueue_head(&vcdev->wait_q);
 	INIT_LIST_HEAD(&vcdev->virtqueues);
 	spin_lock_init(&vcdev->lock);
+	rwlock_init(&vcdev->irq_lock);
 	mutex_init(&vcdev->io_lock);
 
 	spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 7/9] virtio: allow to unbreak virtqueue
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

This patch introduces __virtio_unbreak_device() to unbreak the
virtqueues of a device.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
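Editorial note, not part of the patch: a minimal user-space sketch of the
paired write and read on the broken flag. It assumes relaxed C11 atomics
as an approximation of WRITE_ONCE()/READ_ONCE() (no tearing and no
compiler elision, but no ordering guarantee either). The names fake_vq,
fake_vdev, NR_VQS and the fake_* helpers are hypothetical, and the list
walk under vqs_list_lock is replaced by a fixed array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VQS 4	/* hypothetical queue count for the sketch */

struct fake_vq {
	atomic_bool broken;	/* models vq->broken */
};

struct fake_vdev {
	struct fake_vq vqs[NR_VQS];
};

/* Start every queue in the broken state. */
static void fake_vdev_init(struct fake_vdev *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < NR_VQS; i++)
		atomic_init(&dev->vqs[i].broken, true);
}

/* Models __virtio_unbreak_device(): clear the flag on every queue. */
static void fake_unbreak_device(struct fake_vdev *dev)
{
	for (size_t i = 0; i < NR_VQS; i++)
		atomic_store_explicit(&dev->vqs[i].broken, false,
				      memory_order_relaxed);
}

/* Models virtqueue_is_broken(): the paired read side. */
static bool fake_vq_is_broken(struct fake_vq *vq)
{
	return atomic_load_explicit(&vq->broken, memory_order_relaxed);
}
```

Because the store is relaxed, the caller still has to provide ordering
against interrupt handlers by other means, which is what the comment on
"appropriate locks" in the patch refers to.
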
 drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++++
 include/linux/virtio.h       |  1 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9d0bae4293be..9c231e1fded7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2395,6 +2395,28 @@ void virtio_break_device(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_break_device);
 
+/*
+ * This should allow the device to be used by the driver. You may
+ * need to grab appropriate locks to flush the write to
+ * vq->broken. This should only be used in specific cases (e.g.
+ * probing and restoring). This function should only be called by the
+ * core, not directly by the driver.
+ */
+void __virtio_unbreak_device(struct virtio_device *dev)
+{
+	struct virtqueue *_vq;
+
+	spin_lock(&dev->vqs_list_lock);
+	list_for_each_entry(_vq, &dev->vqs, list) {
+		struct vring_virtqueue *vq = to_vvq(_vq);
+
+		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
+		WRITE_ONCE(vq->broken, false);
+	}
+	spin_unlock(&dev->vqs_list_lock);
+}
+EXPORT_SYMBOL_GPL(__virtio_unbreak_device);
+
 dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5464f398912a..d8fdf170637c 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev);
 bool is_virtio_device(struct device *dev);
 
 void virtio_break_device(struct virtio_device *dev);
+void __virtio_unbreak_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
 #ifdef CONFIG_PM_SLEEP
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 8/9] virtio: harden vring IRQ
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: lulu, paulmck, linux-s390, peterz, maz, cohuck,
	Peter Oberparleiter, pasic, eperezma, Vineeth Vijayan, tglx

This is a rework of the previous IRQ hardening for virtio-pci, which
was reverted after several drawbacks were found:

1) it relied on IRQF_NO_AUTOEN, which is not friendly to the
   affinity-managed IRQs used by some devices such as virtio-blk
2) it was done only for the PCI transport

vq->broken is re-used in this patch to implement the IRQ hardening.
vq->broken is set to true during both initialization and reset, and
set to false in virtio_device_ready(). vring_interrupt() can then
check the flag and return early when vq->broken is true. In that case
it now returns IRQ_NONE so that the interrupt core is aware of the
invalid interrupt and can prevent an IRQ storm.

The reason for using a per-queue variable instead of a per-device one
is that we may need it for per-queue reset hardening in the future.

Note that the hardening is only done for the vring interrupt, since
the config interrupt hardening is already done in commit 22b7050a024d7
("virtio: defer config changed notifications"). The method used for
the config interrupt can't be reused by the vring interrupt handler
because it uses a spinlock for synchronization, which is expensive.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/s390/virtio/virtio_ccw.c       |  4 ++++
 drivers/virtio/virtio.c                | 15 ++++++++++++---
 drivers/virtio/virtio_mmio.c           |  5 +++++
 drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
 drivers/virtio/virtio_ring.c           | 11 +++++++----
 include/linux/virtio_config.h          | 20 ++++++++++++++++++++
 6 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index c188e4f20ca3..97e51c34e6cf 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
 	ccw->flags = 0;
 	ccw->count = sizeof(status);
 	ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
+	/* We use ssch for setting the status which is a serializing
+	 * instruction that guarantees the memory writes have
+	 * completed before ssch.
+	 */
 	ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
 	/* Write failed? We assume status is unchanged. */
 	if (ret)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index aa1eb5132767..95fac4c97c8b 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
  * */
 void virtio_reset_device(struct virtio_device *dev)
 {
+	/*
+	 * The below virtio_synchronize_cbs() guarantees that any
+	 * interrupt for this line arriving after
+	 * virtio_synchronize_cbs() has completed is guaranteed to see
+	 * vq->broken as true.
+	 */
+	virtio_break_device(dev);
+	virtio_synchronize_cbs(dev);
+
 	dev->config->reset(dev);
 }
 EXPORT_SYMBOL_GPL(virtio_reset_device);
@@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
 	dev->config_enabled = false;
 	dev->config_change_pending = false;
 
+	INIT_LIST_HEAD(&dev->vqs);
+	spin_lock_init(&dev->vqs_list_lock);
+
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up.  This also tests that code path a little. */
 	virtio_reset_device(dev);
@@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
-	INIT_LIST_HEAD(&dev->vqs);
-	spin_lock_init(&dev->vqs_list_lock);
-
 	/*
 	 * device_add() causes the bus infrastructure to look for a matching
 	 * driver.
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index c9699a59f93c..f9a36bc7ac27 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
 	/* We should never be setting status to 0. */
 	BUG_ON(status == 0);
 
+	/*
+	 * Per memory-barriers.txt, wmb() is not needed to guarantee
+	 * that the cache coherent memory writes have completed
+	 * before writing to the MMIO region.
+	 */
 	writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
 }
 
diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 4093f9cca7a6..a0fa14f28a7f 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
 {
 	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
 
+	/*
+	 * Per memory-barriers.txt, wmb() is not needed to guarantee
+	 * that the cache coherent memory writes have completed
+	 * before writing to the MMIO region.
+	 */
 	vp_iowrite8(status, &cfg->device_status);
 }
 EXPORT_SYMBOL_GPL(vp_modern_set_status);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9c231e1fded7..13a7348cedff 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->we_own_ring = true;
 	vq->notify = notify;
 	vq->weak_barriers = weak_barriers;
-	vq->broken = false;
+	vq->broken = true;
 	vq->last_used_idx = 0;
 	vq->event_triggered = false;
 	vq->num_added = 0;
@@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 		return IRQ_NONE;
 	}
 
-	if (unlikely(vq->broken))
-		return IRQ_HANDLED;
+	if (unlikely(vq->broken)) {
+		dev_warn_once(&vq->vq.vdev->dev,
+			      "virtio vring IRQ raised before DRIVER_OK");
+		return IRQ_NONE;
+	}
 
 	/* Just a hint for performance: so it's ok that this can be racy! */
 	if (vq->event)
@@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->we_own_ring = false;
 	vq->notify = notify;
 	vq->weak_barriers = weak_barriers;
-	vq->broken = false;
+	vq->broken = true;
 	vq->last_used_idx = 0;
 	vq->event_triggered = false;
 	vq->num_added = 0;
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 25be018810a7..d4edfd7d91bb 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
 	unsigned status = dev->config->get_status(dev);
 
 	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
+
+	/*
+	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
+	 * will see the driver specific setup if it sees vq->broken
+	 * as false (even if the notifications come before DRIVER_OK).
+	 */
+	virtio_synchronize_cbs(dev);
+	__virtio_unbreak_device(dev);
+	/*
+	 * The transport should ensure the visibility of vq->broken
+	 * before setting DRIVER_OK. See the comments for the transport
+	 * specific set_status() method.
+	 *
+	 * A well behaved device will only notify a virtqueue after
+	 * DRIVER_OK, this means the device should "see" the coherent
+	 * memory write that set vq->broken as false which is done by
+	 * the driver when it sees DRIVER_OK, then the following
+	 * driver's vring_interrupt() will see vq->broken as false so
+	 * we won't lose any notification.
+	 */
 	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
 }
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH V6 9/9] virtio: use WARN_ON() to warning illegal status value
  2022-05-27  6:01 ` Jason Wang
@ 2022-05-27  6:01   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-27  6:01 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

We used to use BUG_ON() in virtio_device_ready() to detect an illegal
status value. This seems sub-optimal since the value is under the
control of the device. Switch to WARN_ON() instead.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio_config.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index d4edfd7d91bb..9a36051ceb76 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev)
 {
 	unsigned status = dev->config->get_status(dev);
 
-	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
+	WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
 
 	/*
 	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 1/9] virtio: use virtio_device_ready() in virtio_device_restore()
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:29     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, Vineeth Vijayan, Peter Oberparleiter, linux-s390, mst,
	jasowang, virtualization, linux-kernel

On Fri, 27 May 2022 14:01:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> This will allow us to extend virtio_device_ready() without
> duplicating code.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/virtio/virtio.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index ce424c16997d..938e975029d4 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
>  			goto err;
>  	}
>
> -	/* Finally, tell the device we're all set */
> -	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
> +	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> +		virtio_device_ready(dev);
>
>  	virtio_config_enable(dev);
>
> --
> 2.25.1
>
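
The effect of the hunk quoted above can be modelled in user space. This is a minimal sketch under stated assumptions, not the kernel code: struct model_dev, model_device_ready() and model_restore_tail() are hypothetical names, and only the status bit values come from the virtio specification.

```c
#include <assert.h>
#include <stdint.h>

/* Status bits as defined by the virtio specification. */
#define MODEL_S_ACKNOWLEDGE 1u
#define MODEL_S_DRIVER      2u
#define MODEL_S_DRIVER_OK   4u

struct model_dev {
	uint8_t status;   /* models the device status register */
	int ready_calls;  /* counts model_device_ready() invocations */
};

/* Models virtio_device_ready(): must not run if DRIVER_OK is already set. */
static void model_device_ready(struct model_dev *dev)
{
	assert(!(dev->status & MODEL_S_DRIVER_OK));
	dev->ready_calls++;
	dev->status |= MODEL_S_DRIVER_OK;
}

/*
 * Models the tail of the restore path: the driver's restore hook may be
 * absent (NULL) or may already have marked the device ready itself.
 */
static void model_restore_tail(struct model_dev *dev,
			       void (*driver_restore)(struct model_dev *))
{
	if (driver_restore)
		driver_restore(dev);

	/* Only set DRIVER_OK here if the driver has not done so. */
	if (!(dev->status & MODEL_S_DRIVER_OK))
		model_device_ready(dev);
}
```

Routing both cases through one helper means any later extension of that helper applies to the restore path as well, which is the point made in the commit message.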

* Re: [PATCH V6 2/9] virtio: use virtio_reset_device() when possible
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:30     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, Vineeth Vijayan, Peter Oberparleiter, linux-s390, mst,
	jasowang, virtualization, linux-kernel

On Fri, 27 May 2022 14:01:13 +0800, Jason Wang <jasowang@redhat.com> wrote:
> This allows us to add common extensions without duplicating code.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/virtio/virtio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 938e975029d4..aa1eb5132767 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
>
>  	/* We always start by resetting the device, in case a previous
>  	 * driver messed it up.  This also tests that code path a little. */
> -	dev->config->reset(dev);
> +	virtio_reset_device(dev);
>
>  	/* Acknowledge that we've seen the device. */
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> @@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
>
>  	/* We always start by resetting the device, in case a previous
>  	 * driver messed it up. */
> -	dev->config->reset(dev);
> +	virtio_reset_device(dev);
>
>  	/* Acknowledge that we've seen the device. */
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> --
> 2.25.1
>

* Re: [PATCH V6 3/9] virtio: introduce config op to synchronize vring callbacks
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:30     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-s390, lulu, paulmck, mst, peterz, maz, cohuck,
	Peter Oberparleiter, virtualization, pasic, eperezma,
	Vineeth Vijayan, tglx, linux-kernel

On Fri, 27 May 2022 14:01:14 +0800, Jason Wang <jasowang@redhat.com> wrote:
> This patch introduces a new virtio config op to synchronize with vring
> callbacks. A transport-specific method is required to make sure that
> writes made before this function are visible to any vring_interrupt()
> that is called after this function returns. For transports that
> don't provide synchronize_cbs(), fall back to synchronize_rcu(), which
> synchronizes with IRQs implicitly.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index b341dd62aa4d..25be018810a7 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -57,6 +57,11 @@ struct virtio_shm_region {
>   *		include a NULL entry for vqs unused by driver
>   *	Returns 0 on success or error status
>   * @del_vqs: free virtqueues found by find_vqs().
> + * @synchronize_cbs: synchronize with the virtqueue callbacks (optional)
> + *      The function guarantees that all memory operations on the
> + *      queue before it are visible to the vring_interrupt() that is
> + *      called after it.
> + *      vdev: the virtio_device
>   * @get_features: get the array of feature bits for this device.
>   *	vdev: the virtio_device
>   *	Returns the first 64 feature bits (all we currently need).
> @@ -89,6 +94,7 @@ struct virtio_config_ops {
>  			const char * const names[], const bool *ctx,
>  			struct irq_affinity *desc);
>  	void (*del_vqs)(struct virtio_device *);
> +	void (*synchronize_cbs)(struct virtio_device *);
>  	u64 (*get_features)(struct virtio_device *vdev);
>  	int (*finalize_features)(struct virtio_device *vdev);
>  	const char *(*bus_name)(struct virtio_device *vdev);
> @@ -217,6 +223,25 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs,
>  				      desc);
>  }
>
> +/**
> + * virtio_synchronize_cbs - synchronize with virtqueue callbacks
> + * @vdev: the device
> + */
> +static inline
> +void virtio_synchronize_cbs(struct virtio_device *dev)
> +{
> +	if (dev->config->synchronize_cbs) {
> +		dev->config->synchronize_cbs(dev);
> +	} else {
> +		/*
> +		 * A best effort fallback to synchronize with
> +		 * interrupts, preemption and softirq disabled
> +		 * regions. See comment above synchronize_rcu().
> +		 */
> +		synchronize_rcu();
> +	}
> +}
> +
>  /**
>   * virtio_device_ready - enable vq use in probe function
>   * @vdev: the device
> --
> 2.25.1
>
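
The dispatch pattern in the quoted helper (use the transport hook when present, otherwise fall back to a generic barrier) can be sketched in user space as follows. All names here are hypothetical, and model_fallback_sync() merely stands in for synchronize_rcu(); it provides no real synchronization.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;

struct model_config_ops {
	/* Optional: a transport may leave this NULL. */
	void (*synchronize_cbs)(struct model_dev *dev);
};

struct model_dev {
	const struct model_config_ops *config;
	int transport_syncs;  /* times the transport hook ran */
	int fallback_syncs;   /* times the generic fallback ran */
};

/* Stand-in for synchronize_rcu() in this model. */
static void model_fallback_sync(struct model_dev *dev)
{
	dev->fallback_syncs++;
}

/* Stand-in for a transport hook such as the PCI or MMIO one. */
static void model_transport_sync(struct model_dev *dev)
{
	dev->transport_syncs++;
}

static void model_synchronize_cbs(struct model_dev *dev)
{
	assert(dev != NULL && dev->config != NULL);

	if (dev->config->synchronize_cbs)
		dev->config->synchronize_cbs(dev);
	else
		model_fallback_sync(dev);
}
```

Making the op optional lets transports adopt it one at a time, as the later patches in this series do for PCI and MMIO.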

* Re: [PATCH V6 4/9] virtio-pci: implement synchronize_cbs()
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:31     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, Vineeth Vijayan, Peter Oberparleiter, linux-s390, mst,
	jasowang, virtualization, linux-kernel

On Fri, 27 May 2022 14:01:15 +0800, Jason Wang <jasowang@redhat.com> wrote:
> We can simply reuse vp_synchronize_vectors() for .synchronize_cbs().
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/virtio/virtio_pci_legacy.c | 1 +
>  drivers/virtio/virtio_pci_modern.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 7fe4caa4b519..a5e5721145c7 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -192,6 +192,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
>  	.reset		= vp_reset,
>  	.find_vqs	= vp_find_vqs,
>  	.del_vqs	= vp_del_vqs,
> +	.synchronize_cbs = vp_synchronize_vectors,
>  	.get_features	= vp_get_features,
>  	.finalize_features = vp_finalize_features,
>  	.bus_name	= vp_bus_name,
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index 4acb34409f0b..623906b4996c 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -394,6 +394,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>  	.reset		= vp_reset,
>  	.find_vqs	= vp_modern_find_vqs,
>  	.del_vqs	= vp_del_vqs,
> +	.synchronize_cbs = vp_synchronize_vectors,
>  	.get_features	= vp_get_features,
>  	.finalize_features = vp_finalize_features,
>  	.bus_name	= vp_bus_name,
> @@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
>  	.reset		= vp_reset,
>  	.find_vqs	= vp_modern_find_vqs,
>  	.del_vqs	= vp_del_vqs,
> +	.synchronize_cbs = vp_synchronize_vectors,
>  	.get_features	= vp_get_features,
>  	.finalize_features = vp_finalize_features,
>  	.bus_name	= vp_bus_name,
> --
> 2.25.1
>

* Re: [PATCH V6 5/9] virtio-mmio: implement synchronize_cbs()
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:32     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, Vineeth Vijayan, Peter Oberparleiter, linux-s390, mst,
	jasowang, virtualization, linux-kernel

On Fri, 27 May 2022 14:01:16 +0800, Jason Wang <jasowang@redhat.com> wrote:
> Simply synchronize the platform IRQ that we use.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/virtio/virtio_mmio.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 839684d672af..c9699a59f93c 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -345,6 +345,13 @@ static void vm_del_vqs(struct virtio_device *vdev)
>  	free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
>  }
>
> +static void vm_synchronize_cbs(struct virtio_device *vdev)
> +{
> +	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> +	synchronize_irq(platform_get_irq(vm_dev->pdev, 0));
> +}
> +
>  static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int index,
>  				  void (*callback)(struct virtqueue *vq),
>  				  const char *name, bool ctx)
> @@ -541,6 +548,7 @@ static const struct virtio_config_ops virtio_mmio_config_ops = {
>  	.finalize_features = vm_finalize_features,
>  	.bus_name	= vm_bus_name,
>  	.get_shm_region = vm_get_shm_region,
> +	.synchronize_cbs = vm_synchronize_cbs,
>  };
>
>
> --
> 2.25.1
>

* Re: [PATCH V6 7/9] virtio: allow to unbreak virtqueue
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:33     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, Vineeth Vijayan, Peter Oberparleiter, linux-s390, mst,
	jasowang, virtualization, linux-kernel

On Fri, 27 May 2022 14:01:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> This patch introduces __virtio_unbreak_device(), which allows the
> core to unbreak the virtqueues.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++++
>  include/linux/virtio.h       |  1 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9d0bae4293be..9c231e1fded7 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2395,6 +2395,28 @@ void virtio_break_device(struct virtio_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(virtio_break_device);
>
> +/*
> + * This should allow the device to be used by the driver. You may
> + * need to grab appropriate locks to flush the write to
> + * vq->broken. This should only be used in some specific case e.g
> + * (probing and restoring). This function should only be called by the
> + * core, not directly by the driver.
> + */
> +void __virtio_unbreak_device(struct virtio_device *dev)
> +{
> +	struct virtqueue *_vq;
> +
> +	spin_lock(&dev->vqs_list_lock);
> +	list_for_each_entry(_vq, &dev->vqs, list) {
> +		struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
> +		WRITE_ONCE(vq->broken, false);
> +	}
> +	spin_unlock(&dev->vqs_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(__virtio_unbreak_device);
> +
>  dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 5464f398912a..d8fdf170637c 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev);
>  bool is_virtio_device(struct device *dev);
>
>  void virtio_break_device(struct virtio_device *dev);
> +void __virtio_unbreak_device(struct virtio_device *dev);
>
>  void virtio_config_changed(struct virtio_device *dev);
>  #ifdef CONFIG_PM_SLEEP
> --
> 2.25.1
>
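
The per-queue flag handling in the quoted function can be modelled in user space as below. This is only a sketch: the struct and function names are hypothetical, relaxed C11 atomics stand in for WRITE_ONCE()/READ_ONCE(), and the vqs_list_lock taken by the real code is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_vq {
	atomic_bool broken;     /* models vq->broken */
	struct model_vq *next;  /* next queue of the same device */
};

struct model_dev {
	struct model_vq *vqs;   /* singly linked list of queues */
};

/* Models virtqueue_is_broken(): a plain, untorn read of the flag. */
static bool model_vq_is_broken(struct model_vq *vq)
{
	return atomic_load_explicit(&vq->broken, memory_order_relaxed);
}

/*
 * Walk every queue of the device and set its flag. Passing false models
 * __virtio_unbreak_device(), passing true models virtio_break_device().
 */
static void model_set_broken(struct model_dev *dev, bool broken)
{
	assert(dev != NULL);

	for (struct model_vq *vq = dev->vqs; vq != NULL; vq = vq->next)
		atomic_store_explicit(&vq->broken, broken,
				      memory_order_relaxed);
}
```

As the quoted comment says, the store alone does not make the new value visible to a concurrent interrupt handler; the caller still needs appropriate locking or synchronization around it.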

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 7/9] virtio: allow to unbreak virtqueue
@ 2022-05-27  7:33     ` Xuan Zhuo
  0 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-s390, lulu, paulmck, mst, peterz, maz, cohuck,
	Peter Oberparleiter, virtualization, pasic, eperezma,
	Vineeth Vijayan, tglx, linux-kernel

On Fri, 27 May 2022 14:01:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> This patch allows the new introduced __virtio_break_device() to
> unbreak the virtqueue.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++++
>  include/linux/virtio.h       |  1 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9d0bae4293be..9c231e1fded7 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2395,6 +2395,28 @@ void virtio_break_device(struct virtio_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(virtio_break_device);
>
> +/*
> + * This should allow the device to be used by the driver. You may
> + * need to grab appropriate locks to flush the write to
> + * vq->broken. This should only be used in some specific case e.g
> + * (probing and restoring). This function should only be called by the
> + * core, not directly by the driver.
> + */
> +void __virtio_unbreak_device(struct virtio_device *dev)
> +{
> +	struct virtqueue *_vq;
> +
> +	spin_lock(&dev->vqs_list_lock);
> +	list_for_each_entry(_vq, &dev->vqs, list) {
> +		struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
> +		WRITE_ONCE(vq->broken, false);
> +	}
> +	spin_unlock(&dev->vqs_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(__virtio_unbreak_device);
> +
>  dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 5464f398912a..d8fdf170637c 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev);
>  bool is_virtio_device(struct device *dev);
>
>  void virtio_break_device(struct virtio_device *dev);
> +void __virtio_unbreak_device(struct virtio_device *dev);
>
>  void virtio_config_changed(struct virtio_device *dev);
>  #ifdef CONFIG_PM_SLEEP
> --
> 2.25.1
>
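
For readers following the thread, here is a minimal user-space sketch of the idea behind __virtio_unbreak_device(): walk every queue of a device and clear its broken flag with a plain, untorn store that pairs with a plain load on the reader side. It is only a model under stated assumptions. The ub_* names are hypothetical, an array stands in for the dev->vqs list, the vqs_list_lock is omitted, and C11 relaxed atomics are used as a rough analogue of WRITE_ONCE()/READ_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vring_virtqueue: only the flag. */
struct ub_vq {
	atomic_bool broken;
};

/* Hypothetical stand-in for struct virtio_device: an array, not a list. */
struct ub_dev {
	struct ub_vq *vqs;
	size_t nvqs;
};

/* Every queue starts out broken, as in the hardened series. */
static inline void ub_dev_init(struct ub_dev *dev)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		atomic_init(&dev->vqs[i].broken, true);
}

/* Reader side: rough analogue of READ_ONCE(vq->broken). */
static inline bool ub_vq_is_broken(struct ub_vq *vq)
{
	return atomic_load_explicit(&vq->broken, memory_order_relaxed);
}

/* Writer side: rough analogue of the WRITE_ONCE() loop in the patch. */
static inline void ub_unbreak_device(struct ub_dev *dev)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		atomic_store_explicit(&dev->vqs[i].broken, false,
				      memory_order_relaxed);
}
```

A relaxed store gives no ordering against other memory accesses, which matches the patch comment: the caller still has to provide whatever locking or synchronization makes the write visible at the right time.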
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:49     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, Vineeth Vijayan, Peter Oberparleiter, linux-s390, mst,
	jasowang, virtualization, linux-kernel

On Fri, 27 May 2022 14:01:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
> This is a rework of the previous IRQ hardening for virtio-pci, which
> was reverted after several drawbacks were found:
>
> 1) it used IRQF_NO_AUTOEN, which is not friendly to the affinity
>    managed IRQs used by some devices such as virtio-blk
> 2) it was done only for the PCI transport
>
> The vq->broken flag is re-used in this patch to implement the IRQ
> hardening. vq->broken is set to true during both initialization and
> reset, and set to false in virtio_device_ready(). vring_interrupt()
> can then check the flag and return early when vq->broken is true. In
> that case it now returns IRQ_NONE, so that the interrupt core is aware
> of the invalid interrupt and can prevent an IRQ storm.
>
> The reason for using a per-queue variable instead of a per-device one
> is that we may need it for per-queue reset hardening in the future.
>
> Note that the hardening is only done for the vring interrupt, since
> the config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). The method used for
> the config interrupt can't be reused by the vring interrupt handler,
> because it synchronizes with a spinlock, which is expensive.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
>  drivers/virtio/virtio.c                | 15 ++++++++++++---
>  drivers/virtio/virtio_mmio.c           |  5 +++++
>  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
>  drivers/virtio/virtio_ring.c           | 11 +++++++----
>  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
>  6 files changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index c188e4f20ca3..97e51c34e6cf 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
>  	ccw->flags = 0;
>  	ccw->count = sizeof(status);
>  	ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> +	/* We use ssch for setting the status which is a serializing
> +	 * instruction that guarantees the memory writes have
> +	 * completed before ssch.
> +	 */
>  	ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
>  	/* Write failed? We assume status is unchanged. */
>  	if (ret)
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index aa1eb5132767..95fac4c97c8b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
>   * */
>  void virtio_reset_device(struct virtio_device *dev)
>  {
> +	/*
> +	 * The below virtio_synchronize_cbs() guarantees that any
> +	 * interrupt for this line arriving after
> +	 * virtio_synchronize_cbs() has completed will see
> +	 * vq->broken as true.
> +	 */
> +	virtio_break_device(dev);
> +	virtio_synchronize_cbs(dev);
> +
>  	dev->config->reset(dev);
>  }
>  EXPORT_SYMBOL_GPL(virtio_reset_device);
> @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
>  	dev->config_enabled = false;
>  	dev->config_change_pending = false;
>
> +	INIT_LIST_HEAD(&dev->vqs);
> +	spin_lock_init(&dev->vqs_list_lock);
> +
>  	/* We always start by resetting the device, in case a previous
>  	 * driver messed it up.  This also tests that code path a little. */
>  	virtio_reset_device(dev);
> @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
>  	/* Acknowledge that we've seen the device. */
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>
> -	INIT_LIST_HEAD(&dev->vqs);
> -	spin_lock_init(&dev->vqs_list_lock);
> -
>  	/*
>  	 * device_add() causes the bus infrastructure to look for a matching
>  	 * driver.
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index c9699a59f93c..f9a36bc7ac27 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
>  	/* We should never be setting status to 0. */
>  	BUG_ON(status == 0);
>
> +	/*
> +	 * Per memory-barriers.txt, wmb() is not needed to guarantee
> +	 * that the cache coherent memory writes have completed
> +	 * before writing to the MMIO region.
> +	 */
>  	writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
>  }
>
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 4093f9cca7a6..a0fa14f28a7f 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
>  {
>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>
> +	/*
> +	 * Per memory-barriers.txt, wmb() is not needed to guarantee
> +	 * that the cache coherent memory writes have completed
> +	 * before writing to the MMIO region.
> +	 */
>  	vp_iowrite8(status, &cfg->device_status);
>  }
>  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9c231e1fded7..13a7348cedff 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>  	vq->we_own_ring = true;
>  	vq->notify = notify;
>  	vq->weak_barriers = weak_barriers;
> -	vq->broken = false;
> +	vq->broken = true;
>  	vq->last_used_idx = 0;
>  	vq->event_triggered = false;
>  	vq->num_added = 0;
> @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>  		return IRQ_NONE;
>  	}
>
> -	if (unlikely(vq->broken))
> -		return IRQ_HANDLED;
> +	if (unlikely(vq->broken)) {
> +		dev_warn_once(&vq->vq.vdev->dev,
> +			      "virtio vring IRQ raised before DRIVER_OK");
> +		return IRQ_NONE;
> +	}
>
>  	/* Just a hint for performance: so it's ok that this can be racy! */
>  	if (vq->event)
> @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->we_own_ring = false;
>  	vq->notify = notify;
>  	vq->weak_barriers = weak_barriers;
> -	vq->broken = false;
> +	vq->broken = true;
>  	vq->last_used_idx = 0;
>  	vq->event_triggered = false;
>  	vq->num_added = 0;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 25be018810a7..d4edfd7d91bb 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
>  	unsigned status = dev->config->get_status(dev);
>
>  	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +
> +	/*
> +	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
> +	 * will see the driver specific setup if it sees vq->broken
> +	 * as false (even if the notifications come before DRIVER_OK).
> +	 */
> +	virtio_synchronize_cbs(dev);
> +	__virtio_unbreak_device(dev);
> +	/*
> +	 * The transport should ensure the visibility of vq->broken
> +	 * before setting DRIVER_OK. See the comments for the transport
> +	 * specific set_status() method.
> +	 *
> +	 * A well behaved device will only notify a virtqueue after
> +	 * DRIVER_OK, this means the device should "see" the coherent
> +	 * memory write that set vq->broken as false which is done by
> +	 * the driver when it sees DRIVER_OK, then the following
> +	 * driver's vring_interrupt() will see vq->broken as false so
> +	 * we won't lose any notification.
> +	 */
>  	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
>  }
>
> --
> 2.25.1
>
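
The lifecycle described in the commit message above can be sketched in user space as a small state machine: a queue is created broken, becomes usable at "device ready", and is broken again on reset, while the interrupt handler returns IRQ_NONE for anything that arrives while the queue is broken. This is only a model under stated assumptions. All hv_* names and the enum are hypothetical stand-ins rather than the kernel API, and the synchronization steps (virtio_synchronize_cbs() and the transport ordering) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's irqreturn_t values. */
enum hv_irqreturn { HV_IRQ_NONE = 0, HV_IRQ_HANDLED = 1 };

/* Hypothetical stand-in for struct vring_virtqueue. */
struct hv_vq {
	bool broken;            /* true until the driver is ready */
	unsigned int callbacks; /* callbacks delivered to the driver */
};

/* A new queue starts out broken, as in the patched vring creation paths. */
static inline void hv_vq_init(struct hv_vq *vq)
{
	vq->broken = true;
	vq->callbacks = 0;
}

/* Models the unbreak step of virtio_device_ready(). */
static inline void hv_device_ready(struct hv_vq *vq)
{
	vq->broken = false;
}

/* Models the break step of virtio_reset_device(). */
static inline void hv_reset_device(struct hv_vq *vq)
{
	vq->broken = true;
}

/* Models the early return added to vring_interrupt(). */
static inline enum hv_irqreturn hv_vring_interrupt(struct hv_vq *vq)
{
	if (vq->broken)
		return HV_IRQ_NONE; /* reported as unhandled to the IRQ core */

	vq->callbacks++;
	return HV_IRQ_HANDLED;
}
```

Returning the "not handled" value instead of the "handled" one is what lets the interrupt core count such interrupts as spurious, which is the storm protection the commit message refers to.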

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 9/9] virtio: use WARN_ON() to warning illegal status value
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27  7:49     ` Xuan Zhuo
  -1 siblings, 0 replies; 106+ messages in thread
From: Xuan Zhuo @ 2022-05-27  7:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-s390, lulu, paulmck, mst, peterz, maz, cohuck,
	Peter Oberparleiter, virtualization, pasic, eperezma,
	Vineeth Vijayan, tglx, linux-kernel

On Fri, 27 May 2022 14:01:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> We used to use BUG_ON() in virtio_device_ready() to detect an illegal
> status value. This seems sub-optimal since the value is under the
> control of the device. Switch to WARN_ON() instead.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  include/linux/virtio_config.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index d4edfd7d91bb..9a36051ceb76 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev)
>  {
>  	unsigned status = dev->config->get_status(dev);
>
> -	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +	WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
>
>  	/*
>  	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
> --
> 2.25.1
>
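
To make the behavioural difference concrete, here is a small user-space sketch of the WARN_ON() style of check: the condition is reported and returned to the caller, and execution continues, whereas a BUG_ON() style check would stop right there. This is an illustration under stated assumptions. The wr_* names are hypothetical, the warning goes to stderr rather than the kernel log, and only the DRIVER_OK bit value (4) is taken from the virtio status definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same bit value as VIRTIO_CONFIG_S_DRIVER_OK. */
#define WR_S_DRIVER_OK 4u

/* Reports the condition and hands it back, in the spirit of WARN_ON(). */
static inline bool wr_warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/*
 * Models virtio_device_ready() after the patch: an unexpected DRIVER_OK
 * is reported, then the function carries on and sets the bit anyway.
 * The warnings counter exists only so the behaviour can be observed.
 */
static inline unsigned int wr_device_ready(unsigned int status,
					   unsigned int *warnings)
{
	if (wr_warn_on((status & WR_S_DRIVER_OK) != 0,
		       "DRIVER_OK already set"))
		(*warnings)++;

	return status | WR_S_DRIVER_OK;
}
```

Whether continuing is the right choice for a device that reports an illegal status is exactly what the follow-up discussion in this thread is about; the sketch only shows the mechanics.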

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 2/9] virtio: use virtio_reset_device() when possible
  2022-05-27  6:01   ` Jason Wang
  (?)
  (?)
@ 2022-05-27  8:52   ` Eugenio Perez Martin
  -1 siblings, 0 replies; 106+ messages in thread
From: Eugenio Perez Martin @ 2022-05-27  8:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael Tsirkin, virtualization, linux-kernel, tglx, peterz,
	paulmck, maz, Halil Pasic, Cornelia Huck, Cindy Lu,
	Stefano Garzarella, xuanzhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390

On Fri, May 27, 2022 at 8:01 AM Jason Wang <jasowang@redhat.com> wrote:
>
> This allows us to do common extension without duplicating code.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  drivers/virtio/virtio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 938e975029d4..aa1eb5132767 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
>
>         /* We always start by resetting the device, in case a previous
>          * driver messed it up.  This also tests that code path a little. */
> -       dev->config->reset(dev);
> +       virtio_reset_device(dev);
>
>         /* Acknowledge that we've seen the device. */
>         virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> @@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
>
>         /* We always start by resetting the device, in case a previous
>          * driver messed it up. */
> -       dev->config->reset(dev);
> +       virtio_reset_device(dev);
>
>         /* Acknowledge that we've seen the device. */
>         virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 2/9] virtio: use virtio_reset_device() when possible
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27 10:34     ` Stefano Garzarella
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefano Garzarella @ 2022-05-27 10:34 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, tglx, peterz, paulmck, maz,
	pasic, cohuck, eperezma, lulu, xuanzhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390

On Fri, May 27, 2022 at 02:01:13PM +0800, Jason Wang wrote:
>This allows us to do common extension without duplicating code.
>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Paul E. McKenney" <paulmck@kernel.org>
>Cc: Marc Zyngier <maz@kernel.org>
>Cc: Halil Pasic <pasic@linux.ibm.com>
>Cc: Cornelia Huck <cohuck@redhat.com>
>Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
>Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
>Cc: linux-s390@vger.kernel.org
>Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> drivers/virtio/virtio.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Maybe I had already reviewed it :-), anyway:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 9/9] virtio: use WARN_ON() to warning illegal status value
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27 10:35     ` Stefano Garzarella
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefano Garzarella @ 2022-05-27 10:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, tglx, peterz, paulmck, maz,
	pasic, cohuck, eperezma, lulu, xuanzhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390

On Fri, May 27, 2022 at 02:01:20PM +0800, Jason Wang wrote:
>We used to use BUG_ON() in virtio_device_ready() to detect illegal
>status value, this seems sub-optimal since the value is under the
>control of the device. Switch to use WARN_ON() instead.
>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Paul E. McKenney" <paulmck@kernel.org>
>Cc: Marc Zyngier <maz@kernel.org>
>Cc: Halil Pasic <pasic@linux.ibm.com>
>Cc: Cornelia Huck <cohuck@redhat.com>
>Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
>Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
>Cc: linux-s390@vger.kernel.org
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> include/linux/virtio_config.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 3/9] virtio: introduce config op to synchronize vring callbacks
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27 10:36     ` Stefano Garzarella
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefano Garzarella @ 2022-05-27 10:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, tglx, peterz, paulmck, maz,
	pasic, cohuck, eperezma, lulu, xuanzhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390

On Fri, May 27, 2022 at 02:01:14PM +0800, Jason Wang wrote:
>This patch introduces a new virtio config op to synchronize vring
>callbacks. A transport-specific method is required to make sure that
>writes made before this function are visible to any vring_interrupt()
>called after this function returns. For transports that do not
>provide synchronize_cbs(), synchronize_rcu(), which implicitly
>synchronizes with IRQs, is used as a fallback.
>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: "Paul E. McKenney" <paulmck@kernel.org>
>Cc: Marc Zyngier <maz@kernel.org>
>Cc: Halil Pasic <pasic@linux.ibm.com>
>Cc: Cornelia Huck <cohuck@redhat.com>
>Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
>Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
>Cc: linux-s390@vger.kernel.org
>Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
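
The "optional config op with a generic fallback" pattern described in the quoted commit message can be sketched in user space as follows. This is a model under stated assumptions, not the kernel helper: the cb_* names are hypothetical, and the fallback merely counts calls where the real code would wait in synchronize_rcu().

```c
#include <assert.h>
#include <stddef.h>

struct cb_dev;

/* Hypothetical stand-in for struct virtio_config_ops. */
struct cb_config_ops {
	/* Optional: NULL when the transport has no specific method. */
	void (*synchronize_cbs)(struct cb_dev *dev);
};

/* Hypothetical stand-in for struct virtio_device, with call counters. */
struct cb_dev {
	const struct cb_config_ops *config;
	unsigned int transport_syncs;
	unsigned int fallback_syncs;
};

/* Stand-in for synchronize_rcu(); here it only records the call. */
static inline void cb_fallback_sync(struct cb_dev *dev)
{
	dev->fallback_syncs++;
}

/* Example transport-specific method; here it only records the call. */
static inline void cb_transport_sync(struct cb_dev *dev)
{
	dev->transport_syncs++;
}

/* Prefer the transport's method, fall back to the generic one. */
static inline void cb_synchronize_cbs(struct cb_dev *dev)
{
	if (dev->config->synchronize_cbs)
		dev->config->synchronize_cbs(dev);
	else
		cb_fallback_sync(dev);
}
```

Keeping the op optional means transports that have not been converted still get a best-effort barrier against running interrupt handlers, while transports with a cheaper or stricter method can supply their own.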


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 9/9] virtio: use WARN_ON() to warning illegal status value
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-27 10:50     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-05-27 10:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, tglx, peterz, paulmck, maz, pasic,
	cohuck, eperezma, lulu, sgarzare, xuanzhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390

At a minimum, I don't see why it's part of the series. Host can always
crash the guest if it wants to ...
The point of BUG_ON is device or driver is already corrupted so we
should not try to drive it.  If you still want this in pls come up with
a better commit log explaining the why.

On Fri, May 27, 2022 at 02:01:20PM +0800, Jason Wang wrote:
> We used to use BUG_ON() in virtio_device_ready() to detect illegal

not really, BUG_ON just crashes the kernel.  we detect by checking
status.

> status value, this seems sub-optimal since the value is under the
> control of the device. Switch to use WARN_ON() instead.

some people use crash on warn so ...

> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>

> ---
>  include/linux/virtio_config.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index d4edfd7d91bb..9a36051ceb76 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev)
>  {
>  	unsigned status = dev->config->get_status(dev);
>  
> -	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +	WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
>  

we lose debuggability as guest will try to continue.
if we are doing this let us print a helpful message and dump a lot of
state right here.

>  	/*
>  	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 9/9] virtio: use WARN_ON() to warning illegal status value
  2022-05-27 10:50     ` Michael S. Tsirkin
@ 2022-05-30  3:48       ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-05-30  3:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Fri, May 27, 2022 at 6:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> At a minimum, I don't see why it's part of the series. Host can always
> crash the guest if it wants to ...

Probably not with some recent technology. In those cases, a fault will
be generated if the hypervisor tries to access the memory that is
private to the guest.

> The point of BUG_ON is device or driver is already corrupted so we
> should not try to drive it.  If you still want this in pls come up with
> a better commit log explaining the why.

A question here: should we always use BUG_ON for a buggy/malicious hypervisor?

The interrupt hardening logic in this series tries to let the guest
survive, and so does this patch.

>
> On Fri, May 27, 2022 at 02:01:20PM +0800, Jason Wang wrote:
> > We used to use BUG_ON() in virtio_device_ready() to detect illegal
>
> not really, BUG_ON just crashes the kernel.  we detect by checking
> status.

We need some kind of notification; otherwise there's no way for the user
to know about this unexpected value.

>
> > status value, this seems sub-optimal since the value is under the
> > control of the device. Switch to use WARN_ON() instead.
>
> some people use crash on warn so ...

Yes, but the policy is under the control of the user.

>
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Halil Pasic <pasic@linux.ibm.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
> > ---
> >  include/linux/virtio_config.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > index d4edfd7d91bb..9a36051ceb76 100644
> > --- a/include/linux/virtio_config.h
> > +++ b/include/linux/virtio_config.h
> > @@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev)
> >  {
> >       unsigned status = dev->config->get_status(dev);
> >
> > -     BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > +     WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> >
>
> we lose debuggability as guest will try to continue.
> if we are doing this let us print a helpful message and dump a lot of
> state right here.

I'm OK with dropping this patch from the series and revisiting it in the future.

Thanks

>
> >       /*
> >        * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > --
> > 2.25.1
>


^ permalink raw reply	[flat|nested] 106+ messages in thread
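
The trade-off discussed in this exchange can be modelled in user space as below. This is a hedged sketch only: the outcome enum and both check functions are hypothetical, nothing here crashes for real, and the panic_on_warn flag is this sketch's reading of the "crash on warn" remark. The value 4 for DRIVER_OK follows the virtio device status field definition.

```c
#include <stdbool.h>
#include <stdio.h>

/* Mirrors VIRTIO_CONFIG_S_DRIVER_OK; redefined so the sketch stands alone. */
#define MODEL_S_DRIVER_OK 4u

/* What happens to the guest after the check (hypothetical model). */
enum model_outcome {
	MODEL_CONTINUE,              /* status was legal */
	MODEL_CONTINUE_WITH_WARNING, /* warned, guest keeps running */
	MODEL_CRASH                  /* guest stops here */
};

/* BUG_ON-style policy: an illegal status always stops the guest. */
static enum model_outcome model_check_bug_on(unsigned int status)
{
	return (status & MODEL_S_DRIVER_OK) ? MODEL_CRASH : MODEL_CONTINUE;
}

/* WARN_ON-style policy: the user decides whether a warning is fatal. */
static enum model_outcome model_check_warn_on(unsigned int status,
					      bool panic_on_warn)
{
	if (!(status & MODEL_S_DRIVER_OK))
		return MODEL_CONTINUE;

	fprintf(stderr, "model: DRIVER_OK already set before device ready\n");
	return panic_on_warn ? MODEL_CRASH : MODEL_CONTINUE_WITH_WARNING;
}
```

With a status that already has DRIVER_OK set, the BUG_ON-style check always returns MODEL_CRASH. The WARN_ON-style check returns MODEL_CRASH only when the user has opted into treating warnings as fatal, so the decision moves from the code to the user.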

* Re: [PATCH V6 6/9] virtio-ccw: implement synchronize_cbs()
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-30 15:12     ` Cornelia Huck
  -1 siblings, 0 replies; 106+ messages in thread
From: Cornelia Huck @ 2022-05-30 15:12 UTC (permalink / raw)
  To: Jason Wang, mst, jasowang, virtualization, linux-kernel
  Cc: lulu, paulmck, linux-s390, peterz, maz, Peter Oberparleiter,
	pasic, eperezma, Vineeth Vijayan, tglx

On Fri, May 27 2022, Jason Wang <jasowang@redhat.com> wrote:

> This patch tries to implement the synchronize_cbs() for ccw. For the
> vring_interrupt() that is called via virtio_airq_handler(), the
> synchronization is simply done via the airq_info's lock. For the
> vring_interrupt() that is called via virtio_ccw_int_handler(), a per
> device rwlock is introduced and used in the synchronization method.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/s390/virtio/virtio_ccw.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 7/9] virtio: allow to unbreak virtqueue
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-30 15:15     ` Cornelia Huck
  -1 siblings, 0 replies; 106+ messages in thread
From: Cornelia Huck @ 2022-05-30 15:15 UTC (permalink / raw)
  To: Jason Wang, mst, jasowang, virtualization, linux-kernel
  Cc: lulu, paulmck, linux-s390, peterz, maz, Peter Oberparleiter,
	pasic, eperezma, Vineeth Vijayan, tglx

On Fri, May 27 2022, Jason Wang <jasowang@redhat.com> wrote:

> This patch allows the newly introduced __virtio_unbreak_device() to
> unbreak the virtqueue.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++++
>  include/linux/virtio.h       |  1 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9d0bae4293be..9c231e1fded7 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2395,6 +2395,28 @@ void virtio_break_device(struct virtio_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(virtio_break_device);
>  
> +/*
> + * This should allow the device to be used by the driver. You may
> + * need to grab appropriate locks to flush the write to
> + * vq->broken. This should only be used in some specific case e.g
> + * (probing and restoring). This function should only be called by the

Minor: "...some specific cases, e.g. probing and restoring."

But no need to respin from my side.

> + * core, not directly by the driver.
> + */

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-05-27  6:01   ` Jason Wang
@ 2022-05-30 15:18     ` Cornelia Huck
  -1 siblings, 0 replies; 106+ messages in thread
From: Cornelia Huck @ 2022-05-30 15:18 UTC (permalink / raw)
  To: Jason Wang, mst, jasowang, virtualization, linux-kernel
  Cc: lulu, paulmck, linux-s390, peterz, maz, Peter Oberparleiter,
	pasic, eperezma, Vineeth Vijayan, tglx

On Fri, May 27 2022, Jason Wang <jasowang@redhat.com> wrote:

> This is a rework of the previous IRQ hardening for virtio-pci, which
> was reverted after several drawbacks were found:
>
> 1) it tries to use IRQF_NO_AUTOEN, which is not friendly to the affinity
>    managed IRQs used by some devices such as virtio-blk
> 2) it is done only for the PCI transport
>
> vq->broken is re-used in this patch to implement the IRQ
> hardening. vq->broken is set to true during both initialization
> and reset, and set to false in
> virtio_device_ready(). Then vring_interrupt() can check it and return
> early when vq->broken is true. In this case, switch to returning IRQ_NONE
> to make the interrupt core aware of such invalid interrupts and prevent
> an IRQ storm.
>
> The reason for using a per-queue variable instead of a per-device one
> is that we may need it for per-queue reset hardening in the future.
>
> Note that the hardening is only done for the vring interrupt, since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method
> used by the config interrupt can't be reused by the vring interrupt
> handler because it uses a spinlock for synchronization, which is
> expensive.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
>  drivers/virtio/virtio.c                | 15 ++++++++++++---
>  drivers/virtio/virtio_mmio.c           |  5 +++++
>  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
>  drivers/virtio/virtio_ring.c           | 11 +++++++----
>  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
>  6 files changed, 53 insertions(+), 7 deletions(-)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
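
The vq->broken life cycle described in the commit message above can be sketched as a small user-space model. This is an assumption-laden illustration, not the patch itself: all model_* names are hypothetical, and the locking, callback synchronization, and memory-ordering concerns that the series handles are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return values named after the kernel's irqreturn_t constants. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical per-queue state; the flag is per queue, not per device. */
struct model_vq {
	bool broken;       /* true until the driver declares itself ready */
	int callbacks_run; /* how many interrupts reached the callback */
};

struct model_dev {
	struct model_vq *vqs;
	size_t nvqs;
};

/* New queues start out broken, as in the patch. */
static void model_vq_init(struct model_vq *vq)
{
	vq->broken = true;
	vq->callbacks_run = 0;
}

static void model_set_broken(struct model_dev *dev, bool broken)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		dev->vqs[i].broken = broken;
}

/* Models virtio_device_ready(): unbreak every queue. */
static void model_device_ready(struct model_dev *dev)
{
	model_set_broken(dev, false);
}

/* Models virtio_reset_device(): break every queue again. */
static void model_reset_device(struct model_dev *dev)
{
	model_set_broken(dev, true);
}

/* Models the early return in vring_interrupt(). */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE; /* reported as not ours */

	vq->callbacks_run++;
	return MODEL_IRQ_HANDLED;
}
```

An interrupt that arrives before model_device_ready() or after model_reset_device() returns MODEL_IRQ_NONE and never reaches the callback, which is the behaviour the commit message relies on to let the interrupt core notice spurious interrupts.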

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-05-27  6:01   ` Jason Wang
@ 2022-06-11  5:12     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-11  5:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, tglx, peterz, paulmck, maz, pasic,
	cohuck, eperezma, lulu, sgarzare, xuanzhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390

On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> This is a rework of the previous IRQ hardening for virtio-pci, which
> was reverted after several drawbacks were found:
> 
> 1) it tries to use IRQF_NO_AUTOEN, which is not friendly to the affinity
>    managed IRQs used by some devices such as virtio-blk
> 2) it is done only for the PCI transport
> 
> vq->broken is re-used in this patch to implement the IRQ
> hardening. vq->broken is set to true during both initialization
> and reset, and set to false in
> virtio_device_ready(). Then vring_interrupt() can check it and return
> early when vq->broken is true. In this case, switch to returning IRQ_NONE
> to make the interrupt core aware of such invalid interrupts and prevent
> an IRQ storm.
> 
> The reason for using a per-queue variable instead of a per-device one
> is that we may need it for per-queue reset hardening in the future.
> 
> Note that the hardening is only done for the vring interrupt, since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method
> used by the config interrupt can't be reused by the vring interrupt
> handler because it uses a spinlock for synchronization, which is
> expensive.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Jason, I am really concerned by all the fallout.
I propose adding a flag to suppress the hardening;
it would serve as a debugging aid and a workaround for
users if we find more buggy drivers.

suppress_interrupt_hardening?


> ---
>  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
>  drivers/virtio/virtio.c                | 15 ++++++++++++---
>  drivers/virtio/virtio_mmio.c           |  5 +++++
>  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
>  drivers/virtio/virtio_ring.c           | 11 +++++++----
>  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
>  6 files changed, 53 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index c188e4f20ca3..97e51c34e6cf 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
>  	ccw->flags = 0;
>  	ccw->count = sizeof(status);
>  	ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> +	/* We use ssch for setting the status which is a serializing
> +	 * instruction that guarantees the memory writes have
> +	 * completed before ssch.
> +	 */
>  	ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
>  	/* Write failed? We assume status is unchanged. */
>  	if (ret)
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index aa1eb5132767..95fac4c97c8b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
>   * */
>  void virtio_reset_device(struct virtio_device *dev)
>  {
> +	/*
> +	 * The below virtio_synchronize_cbs() guarantees that any
> +	 * interrupt for this line arriving after
> +	 * virtio_synchronize_cbs() has completed is guaranteed to see
> +	 * vq->broken as true.
> +	 */
> +	virtio_break_device(dev);

So make this conditional

> +	virtio_synchronize_cbs(dev);
> +
>  	dev->config->reset(dev);
>  }
>  EXPORT_SYMBOL_GPL(virtio_reset_device);
> @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
>  	dev->config_enabled = false;
>  	dev->config_change_pending = false;
>  
> +	INIT_LIST_HEAD(&dev->vqs);
> +	spin_lock_init(&dev->vqs_list_lock);
> +
>  	/* We always start by resetting the device, in case a previous
>  	 * driver messed it up.  This also tests that code path a little. */
>  	virtio_reset_device(dev);
> @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
>  	/* Acknowledge that we've seen the device. */
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>  
> -	INIT_LIST_HEAD(&dev->vqs);
> -	spin_lock_init(&dev->vqs_list_lock);
> -
>  	/*
>  	 * device_add() causes the bus infrastructure to look for a matching
>  	 * driver.
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index c9699a59f93c..f9a36bc7ac27 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
>  	/* We should never be setting status to 0. */
>  	BUG_ON(status == 0);
>  
> +	/*
> +	 * Per memory-barriers.txt, wmb() is not needed to guarantee
> +	 * that the cache coherent memory writes have completed
> +	 * before writing to the MMIO region.
> +	 */
>  	writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
>  }
>  
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 4093f9cca7a6..a0fa14f28a7f 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
>  {
>  	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>  
> +	/*
> +	 * Per memory-barriers.txt, wmb() is not needed to guarantee
> +	 * that the cache coherent memory writes have completed
> +	 * before writing to the MMIO region.
> +	 */
>  	vp_iowrite8(status, &cfg->device_status);
>  }
>  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9c231e1fded7..13a7348cedff 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>  	vq->we_own_ring = true;
>  	vq->notify = notify;
>  	vq->weak_barriers = weak_barriers;
> -	vq->broken = false;
> +	vq->broken = true;
>  	vq->last_used_idx = 0;
>  	vq->event_triggered = false;
>  	vq->num_added = 0;

and make this conditional

> @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>  		return IRQ_NONE;
>  	}
>  
> -	if (unlikely(vq->broken))
> -		return IRQ_HANDLED;
> +	if (unlikely(vq->broken)) {
> +		dev_warn_once(&vq->vq.vdev->dev,
> +			      "virtio vring IRQ raised before DRIVER_OK");
> +		return IRQ_NONE;
> +	}
>  
>  	/* Just a hint for performance: so it's ok that this can be racy! */
>  	if (vq->event)
> @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->we_own_ring = false;
>  	vq->notify = notify;
>  	vq->weak_barriers = weak_barriers;
> -	vq->broken = false;
> +	vq->broken = true;
>  	vq->last_used_idx = 0;
>  	vq->event_triggered = false;
>  	vq->num_added = 0;

and make this conditional

> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 25be018810a7..d4edfd7d91bb 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
>  	unsigned status = dev->config->get_status(dev);
>  
>  	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +
> +	/*
> +	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
> +	 * will see the driver specific setup if it sees vq->broken
> +	 * as false (even if the notifications come before DRIVER_OK).
> +	 */
> +	virtio_synchronize_cbs(dev);
> +	__virtio_unbreak_device(dev);
> +	/*
> +	 * The transport should ensure the visibility of vq->broken
> +	 * before setting DRIVER_OK. See the comments for the transport
> +	 * specific set_status() method.
> +	 *
> +	 * A well behaved device will only notify a virtqueue after
> +	 * DRIVER_OK, this means the device should "see" the coherent
> +	 * memory write that set vq->broken as false which is done by
> +	 * the driver when it sees DRIVER_OK, then the following
> +	 * driver's vring_interrupt() will see vq->broken as false so
> +	 * we won't lose any notification.
> +	 */
>  	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
>  }
>  
> -- 
> 2.25.1
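
The ordering that the virtio_device_ready() hunk above relies on can be sketched in user space. This is a minimal model, assuming C11 release/acquire atomics as a stand-in for the ordering that the set_status() comments attribute to writel(), vp_iowrite8() and ssch. Every model_* name is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same bit value as VIRTIO_CONFIG_S_DRIVER_OK. */
#define MODEL_S_DRIVER_OK 4u

/* Hypothetical model of one device with a single virtqueue. */
struct model_dev {
	bool vq_broken;     /* plain memory, like vq->broken */
	atomic_uint status; /* stands in for the device status register */
};

static void model_dev_init(struct model_dev *d)
{
	d->vq_broken = true; /* queues start out broken, as in the patch */
	atomic_init(&d->status, 0);
}

/* Driver side: clear vq_broken first, then publish DRIVER_OK. */
static void model_device_ready(struct model_dev *d)
{
	unsigned int status =
		atomic_load_explicit(&d->status, memory_order_relaxed);

	assert(!(status & MODEL_S_DRIVER_OK)); /* plays the role of BUG_ON() */
	d->vq_broken = false;
	/* Release store: the write to vq_broken is ordered before it. */
	atomic_store_explicit(&d->status, status | MODEL_S_DRIVER_OK,
			      memory_order_release);
}

/* Device side: a well-behaved device notifies only after DRIVER_OK. */
static bool model_device_may_notify(struct model_dev *d)
{
	return atomic_load_explicit(&d->status, memory_order_acquire) &
	       MODEL_S_DRIVER_OK;
}

/* Interrupt side: true if a notification sent now would be processed. */
static bool model_notification_processed(struct model_dev *d)
{
	if (!model_device_may_notify(d))
		return false;
	return !d->vq_broken;
}
```

In this model, any observer that sees DRIVER_OK through the acquire load also sees vq_broken as false, which is the "we won't lose any notification" property the comment describes. It says nothing about devices that notify before DRIVER_OK; those are the case the hardening is meant to catch.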


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-11  5:12     ` Michael S. Tsirkin
@ 2022-06-13  5:26       ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  5:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > This is a rework of the previous IRQ hardening for virtio-pci,
> > which was reverted after several drawbacks were found:
> >
> > 1) it relies on IRQF_NO_AUTOEN, which is not friendly to the affinity
> >    managed IRQs used by some devices such as virtio-blk
> > 2) it was done only for the PCI transport
> >
> > This patch re-uses vq->broken to implement the IRQ hardening.
> > vq->broken is set to true during both initialization and reset,
> > and set to false in virtio_device_ready(). vring_interrupt() can
> > then check vq->broken and return early when it is true. In that
> > case it now returns IRQ_NONE, so that the interrupt core is aware
> > of such invalid interrupts and can prevent an IRQ storm.
> >
> > A per-queue variable is used instead of a per-device one because
> > we may need it for per-queue reset hardening in the future.
> >
> > Note that the hardening is only done for the vring interrupt, since
> > config interrupt hardening was already done in commit 22b7050a024d7
> > ("virtio: defer config changed notifications"). The method used
> > for the config interrupt can't be reused by the vring interrupt
> > handler because it synchronizes with a spinlock, which is
> > expensive.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Halil Pasic <pasic@linux.ibm.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
>
> Jason, I am really concerned by all the fallout.
> I propose adding a flag to suppress the hardening -
> this will be a debugging aid and a work around for
> users if we find more buggy drivers.
>
> suppress_interrupt_hardening ?

I can post a patch, but I'm afraid that if we disable it by default,
users won't turn it on, so we would have no way to receive bug
reports. Otherwise we need a plan to enable it by default.

It's rc2 now; how about waiting for one or two more rcs? Or it may be
better to simply warn instead of disabling it by default.
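
To make the options in this exchange concrete, here is a small user-space model of the three behaviours being weighed: hardening off, warn only, and enforce. It is a sketch under stated assumptions: the three-valued knob, the `ready` field and all model_* names are hypothetical, and nothing like this knob exists in the patch as posted.

```c
#include <stdbool.h>

/* Hypothetical knob covering the options discussed in this thread. */
enum model_hardening {
	MODEL_HARDEN_OFF,     /* hardening suppressed */
	MODEL_HARDEN_WARN,    /* report early interrupts, still handle them */
	MODEL_HARDEN_ENFORCE, /* drop early interrupts, as the patch does */
};

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_vq {
	bool ready;             /* set once the driver signalled DRIVER_OK */
	unsigned int callbacks; /* callbacks delivered to the driver */
	unsigned int warnings;  /* early interrupts that were reported */
};

static enum model_irqreturn
model_vring_interrupt(struct model_vq *vq, enum model_hardening mode)
{
	if (!vq->ready && mode != MODEL_HARDEN_OFF) {
		/* Stands in for dev_warn_once() in the real handler. */
		vq->warnings++;
		if (mode == MODEL_HARDEN_ENFORCE)
			return MODEL_IRQ_NONE;
	}

	vq->callbacks++;
	return MODEL_IRQ_HANDLED;
}
```

The warn-only mode keeps buggy drivers working while still producing a report, which is the trade-off raised above: a knob that defaults to off generates no reports at all.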

Thanks

>
>
> > ---
> >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> >  drivers/virtio/virtio_mmio.c           |  5 +++++
> >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> >  6 files changed, 53 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > index c188e4f20ca3..97e51c34e6cf 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> >       ccw->flags = 0;
> >       ccw->count = sizeof(status);
> >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > +     /* We use ssch for setting the status which is a serializing
> > +      * instruction that guarantees the memory writes have
> > +      * completed before ssch.
> > +      */
> >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> >       /* Write failed? We assume status is unchanged. */
> >       if (ret)
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index aa1eb5132767..95fac4c97c8b 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> >   * */
> >  void virtio_reset_device(struct virtio_device *dev)
> >  {
> > +     /*
> > +      * The below virtio_synchronize_cbs() guarantees that any
> > +      * interrupt for this line arriving after
> > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > +      * vq->broken as true.
> > +      */
> > +     virtio_break_device(dev);
>
> So make this conditional
>
> > +     virtio_synchronize_cbs(dev);
> > +
> >       dev->config->reset(dev);
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> >       dev->config_enabled = false;
> >       dev->config_change_pending = false;
> >
> > +     INIT_LIST_HEAD(&dev->vqs);
> > +     spin_lock_init(&dev->vqs_list_lock);
> > +
> >       /* We always start by resetting the device, in case a previous
> >        * driver messed it up.  This also tests that code path a little. */
> >       virtio_reset_device(dev);
> > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> >       /* Acknowledge that we've seen the device. */
> >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> >
> > -     INIT_LIST_HEAD(&dev->vqs);
> > -     spin_lock_init(&dev->vqs_list_lock);
> > -
> >       /*
> >        * device_add() causes the bus infrastructure to look for a matching
> >        * driver.
> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > index c9699a59f93c..f9a36bc7ac27 100644
> > --- a/drivers/virtio/virtio_mmio.c
> > +++ b/drivers/virtio/virtio_mmio.c
> > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> >       /* We should never be setting status to 0. */
> >       BUG_ON(status == 0);
> >
> > +     /*
> > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > +      * that the the cache coherent memory writes have completed
> > +      * before writing to the MMIO region.
> > +      */
> >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> >  }
> >
> > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > index 4093f9cca7a6..a0fa14f28a7f 100644
> > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> >  {
> >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >
> > +     /*
> > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > +      * that the the cache coherent memory writes have completed
> > +      * before writing to the MMIO region.
> > +      */
> >       vp_iowrite8(status, &cfg->device_status);
> >  }
> >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 9c231e1fded7..13a7348cedff 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >       vq->we_own_ring = true;
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> > -     vq->broken = false;
> > +     vq->broken = true;
> >       vq->last_used_idx = 0;
> >       vq->event_triggered = false;
> >       vq->num_added = 0;
>
> and make this conditional
>
> > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >               return IRQ_NONE;
> >       }
> >
> > -     if (unlikely(vq->broken))
> > -             return IRQ_HANDLED;
> > +     if (unlikely(vq->broken)) {
> > +             dev_warn_once(&vq->vq.vdev->dev,
> > +                           "virtio vring IRQ raised before DRIVER_OK");
> > +             return IRQ_NONE;
> > +     }
> >
> >       /* Just a hint for performance: so it's ok that this can be racy! */
> >       if (vq->event)
> > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >       vq->we_own_ring = false;
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> > -     vq->broken = false;
> > +     vq->broken = true;
> >       vq->last_used_idx = 0;
> >       vq->event_triggered = false;
> >       vq->num_added = 0;
>
> and make this conditional
>
> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > index 25be018810a7..d4edfd7d91bb 100644
> > --- a/include/linux/virtio_config.h
> > +++ b/include/linux/virtio_config.h
> > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> >       unsigned status = dev->config->get_status(dev);
> >
> >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > +
> > +     /*
> > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > +      * will see the driver specific setup if it sees vq->broken
> > +      * as false (even if the notifications come before DRIVER_OK).
> > +      */
> > +     virtio_synchronize_cbs(dev);
> > +     __virtio_unbreak_device(dev);
> > +     /*
> > +      * The transport should ensure the visibility of vq->broken
> > +      * before setting DRIVER_OK. See the comments for the transport
> > +      * specific set_status() method.
> > +      *
> > +      * A well behaved device will only notify a virtqueue after
> > +      * DRIVER_OK, this means the device should "see" the coherenct
> > +      * memory write that set vq->broken as false which is done by
> > +      * the driver when it sees DRIVER_OK, then the following
> > +      * driver's vring_interrupt() will see vq->broken as false so
> > +      * we won't lose any notification.
> > +      */
> >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> >  }
> >
> > --
> > 2.25.1
>
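
The commit message quoted above says that returning IRQ_NONE lets the interrupt core notice invalid interrupts and prevent an IRQ storm. The model below sketches why the return value matters. It assumes the accounting used by the kernel's spurious interrupt detection, where a line is disabled once more than 99,900 of the last 100,000 interrupts went unhandled, and it leaves out the time-based reset that the real code applies. All names are hypothetical.

```c
#include <stdbool.h>

/* Thresholds assumed from the kernel's spurious IRQ detection. */
#define MODEL_WINDOW        100000u
#define MODEL_UNHANDLED_MAX  99900u

/* Hypothetical per-line accounting state. */
struct model_irq_line {
	unsigned int irq_count;      /* interrupts seen in this window */
	unsigned int irqs_unhandled; /* of those, how many got IRQ_NONE */
	bool disabled;               /* line shut off by the core */
};

/* Account for one interrupt; handled is false for IRQ_NONE. */
static void model_note_interrupt(struct model_irq_line *line, bool handled)
{
	if (line->disabled)
		return;
	if (!handled)
		line->irqs_unhandled++;
	if (++line->irq_count < MODEL_WINDOW)
		return;

	/* End of window: an almost entirely unhandled line is stuck. */
	if (line->irqs_unhandled > MODEL_UNHANDLED_MAX)
		line->disabled = true;
	line->irq_count = 0;
	line->irqs_unhandled = 0;
}

/* Feed n identical interrupts into the model. */
static void model_storm(struct model_irq_line *line, unsigned int n,
			bool handled)
{
	for (unsigned int i = 0; i < n; i++)
		model_note_interrupt(line, handled);
}
```

With the old IRQ_HANDLED return, a device that keeps raising interrupts on a broken queue would never trip this check. With IRQ_NONE the line is eventually disabled.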


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  5:26       ` Jason Wang
@ 2022-06-13  7:23         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  7:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > This is a rework of the previous IRQ hardening for virtio-pci, which
> > > was reverted after several drawbacks were found:
> > >
> > > 1) it relied on IRQF_NO_AUTOEN, which is not friendly to the affinity
> > >    managed IRQs used by some devices such as virtio-blk
> > > 2) it was done only for the PCI transport
> > >
> > > This patch re-uses vq->broken to implement the IRQ hardening.
> > > vq->broken is set to true during both initialization and reset, and
> > > set to false in virtio_device_ready(). vring_interrupt() can then
> > > check vq->broken and return early when it is true. In that case it
> > > now returns IRQ_NONE so that the interrupt core is aware of the
> > > invalid interrupt and can prevent an IRQ storm.
> > >
> > > A per queue variable is used instead of a per device one because we
> > > may need it for per queue reset hardening in the future.
> > >
> > > Note that the hardening is only done for the vring interrupt, since
> > > the config interrupt hardening is already done in commit 22b7050a024d7
> > > ("virtio: defer config changed notifications"). The method used for
> > > the config interrupt can't be reused by the vring interrupt handler
> > > because it synchronizes with a spinlock, which is expensive.
> > >
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > Cc: Marc Zyngier <maz@kernel.org>
> > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > Cc: linux-s390@vger.kernel.org
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >
> >
> > Jason, I am really concerned by all the fallout.
> > I propose adding a flag to suppress the hardening -
> > this will be a debugging aid and a work around for
> > users if we find more buggy drivers.
> >
> > suppress_interrupt_hardening ?
> 
> I can post a patch, but I'm afraid that if we disable it by default,
> users won't turn it on, so we would never receive any bug reports.
> Otherwise we need a plan to enable it by default.
> 
> It's rc2; how about waiting for one or two more rcs? Or it may be
> better to simply warn instead of disabling it by default.
> 
> Thanks

I meant more like a flag in struct virtio_driver.
For now, could you audit all drivers which don't call virtio_device_ready()?
I found 5 of these:

drivers/bluetooth/virtio_bt.c
drivers/gpu/drm/virtio/virtgpu_drv.c
drivers/i2c/busses/i2c-virtio.c
drivers/net/caif/caif_virtio.c
drivers/nvdimm/virtio_pmem.c
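
A rough userspace sketch of what a per-driver opt-out could look like,
assuming a boolean in the driver structure that makes the "broken" handling
conditional at queue creation and at reset. The field name follows the
suppress_interrupt_hardening suggestion earlier in the thread; the model_*
types and functions are hypothetical and are not the kernel definitions.

```c
#include <stdbool.h>

/* Userspace model only; none of these are the kernel definitions. */
struct model_driver {
	const char *name;
	bool suppress_interrupt_hardening;	/* hypothetical opt-out */
};

struct model_vq {
	const struct model_driver *drv;
	bool broken;				/* models vq->broken */
};

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Queue creation: start out broken only when hardening is active. */
void model_vq_init(struct model_vq *vq, const struct model_driver *drv)
{
	vq->drv = drv;
	vq->broken = !drv->suppress_interrupt_hardening;
}

/* Reset path: break the queue only when hardening is active. */
void model_vq_reset(struct model_vq *vq)
{
	if (!vq->drv->suppress_interrupt_hardening)
		vq->broken = true;
}

/* Models virtio_device_ready() clearing the flag. */
void model_vq_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models the vq->broken check in vring_interrupt(). */
enum model_irqreturn model_vring_interrupt(const struct model_vq *vq)
{
	return vq->broken ? MODEL_IRQ_NONE : MODEL_IRQ_HANDLED;
}
```

With the flag set, an interrupt that arrives before model_vq_ready() is
still handled, as it was before the hardening; with the flag clear it is
rejected with MODEL_IRQ_NONE until the driver declares itself ready.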




> >
> >
> > > ---
> > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > index c188e4f20ca3..97e51c34e6cf 100644
> > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > >       ccw->flags = 0;
> > >       ccw->count = sizeof(status);
> > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > +     /* We use ssch for setting the status which is a serializing
> > > +      * instruction that guarantees the memory writes have
> > > +      * completed before ssch.
> > > +      */
> > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > >       /* Write failed? We assume status is unchanged. */
> > >       if (ret)
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index aa1eb5132767..95fac4c97c8b 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > >   * */
> > >  void virtio_reset_device(struct virtio_device *dev)
> > >  {
> > > +     /*
> > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > +      * interrupt for this line arriving after
> > > +      * virtio_synchronize_cbs() has completed will see
> > > +      * vq->broken as true.
> > > +      */
> > > +     virtio_break_device(dev);
> >
> > So make this conditional
> >
> > > +     virtio_synchronize_cbs(dev);
> > > +
> > >       dev->config->reset(dev);
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > >       dev->config_enabled = false;
> > >       dev->config_change_pending = false;
> > >
> > > +     INIT_LIST_HEAD(&dev->vqs);
> > > +     spin_lock_init(&dev->vqs_list_lock);
> > > +
> > >       /* We always start by resetting the device, in case a previous
> > >        * driver messed it up.  This also tests that code path a little. */
> > >       virtio_reset_device(dev);
> > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > >       /* Acknowledge that we've seen the device. */
> > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > >
> > > -     INIT_LIST_HEAD(&dev->vqs);
> > > -     spin_lock_init(&dev->vqs_list_lock);
> > > -
> > >       /*
> > >        * device_add() causes the bus infrastructure to look for a matching
> > >        * driver.
> > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > index c9699a59f93c..f9a36bc7ac27 100644
> > > --- a/drivers/virtio/virtio_mmio.c
> > > +++ b/drivers/virtio/virtio_mmio.c
> > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > >       /* We should never be setting status to 0. */
> > >       BUG_ON(status == 0);
> > >
> > > +     /*
> > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > +      * that the cache coherent memory writes have completed
> > > +      * before writing to the MMIO region.
> > > +      */
> > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > >  }
> > >
> > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > >  {
> > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > >
> > > +     /*
> > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > +      * that the cache coherent memory writes have completed
> > > +      * before writing to the MMIO region.
> > > +      */
> > >       vp_iowrite8(status, &cfg->device_status);
> > >  }
> > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 9c231e1fded7..13a7348cedff 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > >       vq->we_own_ring = true;
> > >       vq->notify = notify;
> > >       vq->weak_barriers = weak_barriers;
> > > -     vq->broken = false;
> > > +     vq->broken = true;
> > >       vq->last_used_idx = 0;
> > >       vq->event_triggered = false;
> > >       vq->num_added = 0;
> >
> > and make this conditional
> >
> > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > >               return IRQ_NONE;
> > >       }
> > >
> > > -     if (unlikely(vq->broken))
> > > -             return IRQ_HANDLED;
> > > +     if (unlikely(vq->broken)) {
> > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > +             return IRQ_NONE;
> > > +     }
> > >
> > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > >       if (vq->event)
> > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > >       vq->we_own_ring = false;
> > >       vq->notify = notify;
> > >       vq->weak_barriers = weak_barriers;
> > > -     vq->broken = false;
> > > +     vq->broken = true;
> > >       vq->last_used_idx = 0;
> > >       vq->event_triggered = false;
> > >       vq->num_added = 0;
> >
> > and make this conditional
> >
> > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > index 25be018810a7..d4edfd7d91bb 100644
> > > --- a/include/linux/virtio_config.h
> > > +++ b/include/linux/virtio_config.h
> > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > >       unsigned status = dev->config->get_status(dev);
> > >
> > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > +
> > > +     /*
> > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > +      * will see the driver specific setup if it sees vq->broken
> > > +      * as false (even if the notifications come before DRIVER_OK).
> > > +      */
> > > +     virtio_synchronize_cbs(dev);
> > > +     __virtio_unbreak_device(dev);
> > > +     /*
> > > +      * The transport should ensure the visibility of vq->broken
> > > +      * before setting DRIVER_OK. See the comments for the transport
> > > +      * specific set_status() method.
> > > +      *
> > > +      * A well behaved device will only notify a virtqueue after
> > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > +      * memory write that set vq->broken as false which is done by
> > > +      * the driver when it sees DRIVER_OK, then the following
> > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > +      * we won't lose any notification.
> > > +      */
> > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > >  }
> > >
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  7:23         ` Michael S. Tsirkin
@ 2022-06-13  8:07           ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  8:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > This is a rework of the previous IRQ hardening for virtio-pci, which
> > > > was reverted after several drawbacks were found:
> > > >
> > > > 1) it relied on IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > >    managed IRQs used by some devices such as virtio-blk
> > > > 2) it was done only for the PCI transport
> > > >
> > > > This patch re-uses vq->broken to implement the IRQ hardening.
> > > > vq->broken is set to true during both initialization and reset, and
> > > > set to false in virtio_device_ready(). vring_interrupt() can then
> > > > check vq->broken and return early when it is true. In that case it
> > > > now returns IRQ_NONE so that the interrupt core is aware of the
> > > > invalid interrupt and can prevent an IRQ storm.
> > > >
> > > > A per queue variable is used instead of a per device one because we
> > > > may need it for per queue reset hardening in the future.
> > > >
> > > > Note that the hardening is only done for the vring interrupt, since
> > > > the config interrupt hardening is already done in commit 22b7050a024d7
> > > > ("virtio: defer config changed notifications"). The method used for
> > > > the config interrupt can't be reused by the vring interrupt handler
> > > > because it synchronizes with a spinlock, which is expensive.
> > > >
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > Cc: linux-s390@vger.kernel.org
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > >
> > > Jason, I am really concerned by all the fallout.
> > > I propose adding a flag to suppress the hardening -
> > > this will be a debugging aid and a work around for
> > > users if we find more buggy drivers.
> > >
> > > suppress_interrupt_hardening ?
> >
> > I can post a patch, but I'm afraid that if we disable it by default,
> > users won't turn it on, so we would never receive any bug reports.
> > Otherwise we need a plan to enable it by default.
> >
> > It's rc2; how about waiting for one or two more rcs? Or it may be
> > better to simply warn instead of disabling it by default.
> >
> > Thanks
>
> I meant more like a flag in struct virtio_driver.
> For now, could you audit all drivers which don't call virtio_device_ready()?
> I found 5 of these:
>
> drivers/bluetooth/virtio_bt.c

This driver seems to be fine; it doesn't use the device or vq in its probe().

> drivers/gpu/drm/virtio/virtgpu_drv.c

It calls virtio_device_ready() in virtio_gpu_init(), and the code looks
correct to me.

> drivers/i2c/busses/i2c-virtio.c
> drivers/net/caif/caif_virtio.c
> drivers/nvdimm/virtio_pmem.c

The above look fine, and we have three more:

arm_scmi: probe() doesn't use vq
mac80211_hwsim.c: doesn't use the vq (it only fills rx), but it kicks rx,
so it looks to me like we need a virtio_device_ready() before the kick.
virtio_rpmsg_bus.c: doesn't use vq

I will post a patch for mac80211_hwsim.c.
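
To illustrate why the order matters, here is a minimal userspace model. It
assumes the device answers a kick with an interrupt immediately, even before
DRIVER_OK; the model_* names are made up for illustration and this is not
the mac80211_hwsim code.

```c
#include <stdbool.h>

/* Userspace model only; nothing here is kernel code. */
struct model_dev {
	bool vq_broken;		/* models vq->broken */
	int rx_delivered;	/* interrupts the driver callback handled */
	int rx_dropped;		/* interrupts rejected while broken */
};

void model_probe_start(struct model_dev *d)
{
	d->vq_broken = true;	/* hardened queues start out broken */
	d->rx_delivered = 0;
	d->rx_dropped = 0;
}

/* Models virtio_device_ready() clearing vq->broken. */
void model_device_ready(struct model_dev *d)
{
	d->vq_broken = false;
}

/* Models the early return in vring_interrupt(). */
void model_vring_interrupt(struct model_dev *d)
{
	if (d->vq_broken) {
		d->rx_dropped++;
		return;
	}
	d->rx_delivered++;
}

/* Assumption: the device reacts to the kick right away. */
void model_kick(struct model_dev *d)
{
	model_vring_interrupt(d);
}

/* Runs a probe() in either order and returns the resulting counters. */
struct model_dev model_probe(bool ready_before_kick)
{
	struct model_dev d;

	model_probe_start(&d);
	if (ready_before_kick) {
		model_device_ready(&d);
		model_kick(&d);
	} else {
		model_kick(&d);
		model_device_ready(&d);
	}
	return d;
}
```

In this model, model_probe(false) ends with one dropped interrupt and none
delivered, while model_probe(true) delivers it.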

Thanks

>
>
>
>
> > >
> > >
> > > > ---
> > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > >       ccw->flags = 0;
> > > >       ccw->count = sizeof(status);
> > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > +     /* We use ssch for setting the status which is a serializing
> > > > +      * instruction that guarantees the memory writes have
> > > > +      * completed before ssch.
> > > > +      */
> > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > >       /* Write failed? We assume status is unchanged. */
> > > >       if (ret)
> > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > --- a/drivers/virtio/virtio.c
> > > > +++ b/drivers/virtio/virtio.c
> > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > >   * */
> > > >  void virtio_reset_device(struct virtio_device *dev)
> > > >  {
> > > > +     /*
> > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > +      * interrupt for this line arriving after
> > > > +      * virtio_synchronize_cbs() has completed will see
> > > > +      * vq->broken as true.
> > > > +      */
> > > > +     virtio_break_device(dev);
> > >
> > > So make this conditional
> > >
> > > > +     virtio_synchronize_cbs(dev);
> > > > +
> > > >       dev->config->reset(dev);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > >       dev->config_enabled = false;
> > > >       dev->config_change_pending = false;
> > > >
> > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > +
> > > >       /* We always start by resetting the device, in case a previous
> > > >        * driver messed it up.  This also tests that code path a little. */
> > > >       virtio_reset_device(dev);
> > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > >       /* Acknowledge that we've seen the device. */
> > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > >
> > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > -
> > > >       /*
> > > >        * device_add() causes the bus infrastructure to look for a matching
> > > >        * driver.
> > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > --- a/drivers/virtio/virtio_mmio.c
> > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > >       /* We should never be setting status to 0. */
> > > >       BUG_ON(status == 0);
> > > >
> > > > +     /*
> > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > +      * that the the cache coherent memory writes have completed
> > > > +      * before writing to the MMIO region.
> > > > +      */
> > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > >  }
> > > >
> > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > >  {
> > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > >
> > > > +     /*
> > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > +      * that the the cache coherent memory writes have completed
> > > > +      * before writing to the MMIO region.
> > > > +      */
> > > >       vp_iowrite8(status, &cfg->device_status);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 9c231e1fded7..13a7348cedff 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >       vq->we_own_ring = true;
> > > >       vq->notify = notify;
> > > >       vq->weak_barriers = weak_barriers;
> > > > -     vq->broken = false;
> > > > +     vq->broken = true;
> > > >       vq->last_used_idx = 0;
> > > >       vq->event_triggered = false;
> > > >       vq->num_added = 0;
> > >
> > > and make this conditional
> > >
> > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > >               return IRQ_NONE;
> > > >       }
> > > >
> > > > -     if (unlikely(vq->broken))
> > > > -             return IRQ_HANDLED;
> > > > +     if (unlikely(vq->broken)) {
> > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > +             return IRQ_NONE;
> > > > +     }
> > > >
> > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > >       if (vq->event)
> > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > >       vq->we_own_ring = false;
> > > >       vq->notify = notify;
> > > >       vq->weak_barriers = weak_barriers;
> > > > -     vq->broken = false;
> > > > +     vq->broken = true;
> > > >       vq->last_used_idx = 0;
> > > >       vq->event_triggered = false;
> > > >       vq->num_added = 0;
> > >
> > > and make this conditional
> > >
> > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > --- a/include/linux/virtio_config.h
> > > > +++ b/include/linux/virtio_config.h
> > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > >       unsigned status = dev->config->get_status(dev);
> > > >
> > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > +
> > > > +     /*
> > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > +      * will see the driver specific setup if it sees vq->broken
> > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > +      */
> > > > +     virtio_synchronize_cbs(dev);
> > > > +     __virtio_unbreak_device(dev);
> > > > +     /*
> > > > +      * The transport should ensure the visibility of vq->broken
> > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > +      * specific set_status() method.
> > > > +      *
> > > > +      * A well behaved device will only notify a virtqueue after
> > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > +      * memory write that set vq->broken as false which is done by
> > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > +      * we won't lose any notification.
> > > > +      */
> > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > >  }
> > > >
> > > > --
> > > > 2.25.1
> > >
>



* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  8:07           ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  8:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > This is a rework on the previous IRQ hardening that is done for
> > > > virtio-pci where several drawbacks were found and were reverted:
> > > >
> > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > >    that is used by some device such as virtio-blk
> > > > 2) done only for PCI transport
> > > >
> > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > hardening. The vq->broken is set to true during both initialization
> > > > and reset. And the vq->broken is set to false in
> > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > IRQ storm.
> > > >
> > > > The reason of using a per queue variable instead of a per device one
> > > > is that we may need it for per queue reset hardening in the future.
> > > >
> > > > Note that the hardening is only done for vring interrupt since the
> > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > ("virtio: defer config changed notifications"). But the method that is
> > > > used by config interrupt can't be reused by the vring interrupt
> > > > handler because it uses spinlock to do the synchronization which is
> > > > expensive.
> > > >
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > Cc: linux-s390@vger.kernel.org
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > >
> > > Jason, I am really concerned by all the fallout.
> > > I propose adding a flag to suppress the hardening -
> > > this will be a debugging aid and a work around for
> > > users if we find more buggy drivers.
> > >
> > > suppress_interrupt_hardening ?
> >
> > I can post a patch but I'm afraid if we disable it by default, it
> > won't be used by the users so there's no way for us to receive the bug
> > report. Or we need a plan to enable it by default.
> >
> > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > simply warn instead of disable it by default.
> >
> > Thanks
>
> I meant more like a flag in struct virtio_driver.
> For now, could you audit all drivers which don't call _ready?
> I found 5 of these:
>
> drivers/bluetooth/virtio_bt.c

This driver seems to be fine, it doesn't use the device/vq in its probe().

> drivers/gpu/drm/virtio/virtgpu_drv.c

It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
me like the code is correct.

> drivers/i2c/busses/i2c-virtio.c
> drivers/net/caif/caif_virtio.c
> drivers/nvdimm/virtio_pmem.c

The above look fine, and we have three more:

arm_scmi: probe() doesn't use the vq
mac80211_hwsim.c: probe() doesn't use the vq (it only fills rx), but it
kicks the rx vq, so it looks to me like we need a device_ready before
the kick.
virtio_rpmsg_bus.c: doesn't use the vq

I will post a patch for mac80211_hwsim.c.

Thanks
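
To make the mac80211_hwsim concern concrete, here is a minimal userspace
sketch of the gate this patch adds. It is a model, not kernel code: every
model_* name is hypothetical, and only the behaviour being imitated comes
from the patch quoted below (queues are created with vq->broken set,
vring_interrupt() returns IRQ_NONE while it is set, and
virtio_device_ready() clears it). The device is assumed to answer the kick
with exactly one interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: every model_* name is hypothetical. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_vq {
	bool broken;       /* set at creation and on reset, cleared on ready */
	int callbacks_run; /* number of interrupts that reached the driver */
};

/* Imitates vring_create_virtqueue*(): a new queue starts out broken. */
static void model_vq_init(struct model_vq *vq)
{
	assert(vq != NULL);
	vq->broken = true;
	vq->callbacks_run = 0;
}

/* Imitates the vq->broken check at the top of vring_interrupt(). */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE; /* reported as spurious, callback skipped */
	vq->callbacks_run++;
	return MODEL_IRQ_HANDLED;
}

/* Imitates the unbreak step of virtio_device_ready(). */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/*
 * Probe that kicks the rx queue and gets one completion interrupt back.
 * Returns how many callbacks the driver saw for that single interrupt.
 */
static int model_probe_and_kick(bool ready_before_kick)
{
	struct model_vq rx;

	model_vq_init(&rx);
	if (ready_before_kick)
		model_device_ready(&rx);
	(void)model_vring_interrupt(&rx); /* device answers the kick */
	if (!ready_before_kick)
		model_device_ready(&rx);  /* too late for that interrupt */
	return rx.callbacks_run;
}
```

In this model, model_probe_and_kick(false) sees no callback for the
interrupt that answers the kick, while model_probe_and_kick(true) sees one,
which is the reason for wanting a device_ready before the kick.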

>
>
>
>
> > >
> > >
> > > > ---
> > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > >       ccw->flags = 0;
> > > >       ccw->count = sizeof(status);
> > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > +     /* We use ssch for setting the status which is a serializing
> > > > +      * instruction that guarantees the memory writes have
> > > > +      * completed before ssch.
> > > > +      */
> > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > >       /* Write failed? We assume status is unchanged. */
> > > >       if (ret)
> > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > --- a/drivers/virtio/virtio.c
> > > > +++ b/drivers/virtio/virtio.c
> > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > >   * */
> > > >  void virtio_reset_device(struct virtio_device *dev)
> > > >  {
> > > > +     /*
> > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > +      * interrupt for this line arriving after
> > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > +      * vq->broken as true.
> > > > +      */
> > > > +     virtio_break_device(dev);
> > >
> > > So make this conditional
> > >
> > > > +     virtio_synchronize_cbs(dev);
> > > > +
> > > >       dev->config->reset(dev);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > >       dev->config_enabled = false;
> > > >       dev->config_change_pending = false;
> > > >
> > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > +
> > > >       /* We always start by resetting the device, in case a previous
> > > >        * driver messed it up.  This also tests that code path a little. */
> > > >       virtio_reset_device(dev);
> > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > >       /* Acknowledge that we've seen the device. */
> > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > >
> > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > -
> > > >       /*
> > > >        * device_add() causes the bus infrastructure to look for a matching
> > > >        * driver.
> > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > --- a/drivers/virtio/virtio_mmio.c
> > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > >       /* We should never be setting status to 0. */
> > > >       BUG_ON(status == 0);
> > > >
> > > > +     /*
> > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > +      * that the the cache coherent memory writes have completed
> > > > +      * before writing to the MMIO region.
> > > > +      */
> > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > >  }
> > > >
> > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > >  {
> > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > >
> > > > +     /*
> > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > +      * that the the cache coherent memory writes have completed
> > > > +      * before writing to the MMIO region.
> > > > +      */
> > > >       vp_iowrite8(status, &cfg->device_status);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 9c231e1fded7..13a7348cedff 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >       vq->we_own_ring = true;
> > > >       vq->notify = notify;
> > > >       vq->weak_barriers = weak_barriers;
> > > > -     vq->broken = false;
> > > > +     vq->broken = true;
> > > >       vq->last_used_idx = 0;
> > > >       vq->event_triggered = false;
> > > >       vq->num_added = 0;
> > >
> > > and make this conditional
> > >
> > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > >               return IRQ_NONE;
> > > >       }
> > > >
> > > > -     if (unlikely(vq->broken))
> > > > -             return IRQ_HANDLED;
> > > > +     if (unlikely(vq->broken)) {
> > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > +             return IRQ_NONE;
> > > > +     }
> > > >
> > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > >       if (vq->event)
> > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > >       vq->we_own_ring = false;
> > > >       vq->notify = notify;
> > > >       vq->weak_barriers = weak_barriers;
> > > > -     vq->broken = false;
> > > > +     vq->broken = true;
> > > >       vq->last_used_idx = 0;
> > > >       vq->event_triggered = false;
> > > >       vq->num_added = 0;
> > >
> > > and make this conditional
> > >
> > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > --- a/include/linux/virtio_config.h
> > > > +++ b/include/linux/virtio_config.h
> > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > >       unsigned status = dev->config->get_status(dev);
> > > >
> > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > +
> > > > +     /*
> > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > +      * will see the driver specific setup if it sees vq->broken
> > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > +      */
> > > > +     virtio_synchronize_cbs(dev);
> > > > +     __virtio_unbreak_device(dev);
> > > > +     /*
> > > > +      * The transport should ensure the visibility of vq->broken
> > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > +      * specific set_status() method.
> > > > +      *
> > > > +      * A well behaved device will only notify a virtqueue after
> > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > +      * memory write that set vq->broken as false which is done by
> > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > +      * we won't lose any notification.
> > > > +      */
> > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > >  }
> > > >
> > > > --
> > > > 2.25.1
> > >
>



* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  8:07           ` Jason Wang
@ 2022-06-13  8:19             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  8:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > >
> > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > >    that is used by some device such as virtio-blk
> > > > > 2) done only for PCI transport
> > > > >
> > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > hardening. The vq->broken is set to true during both initialization
> > > > > and reset. And the vq->broken is set to false in
> > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > IRQ storm.
> > > > >
> > > > > The reason of using a per queue variable instead of a per device one
> > > > > is that we may need it for per queue reset hardening in the future.
> > > > >
> > > > > Note that the hardening is only done for vring interrupt since the
> > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > handler because it uses spinlock to do the synchronization which is
> > > > > expensive.
> > > > >
> > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > Cc: linux-s390@vger.kernel.org
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > >
> > > > Jason, I am really concerned by all the fallout.
> > > > I propose adding a flag to suppress the hardening -
> > > > this will be a debugging aid and a work around for
> > > > users if we find more buggy drivers.
> > > >
> > > > suppress_interrupt_hardening ?
> > >
> > > I can post a patch but I'm afraid if we disable it by default, it
> > > won't be used by the users so there's no way for us to receive the bug
> > > report. Or we need a plan to enable it by default.
> > >
> > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > simply warn instead of disable it by default.
> > >
> > > Thanks
> >
> > I meant more like a flag in struct virtio_driver.
> > For now, could you audit all drivers which don't call _ready?
> > I found 5 of these:
> >
> > drivers/bluetooth/virtio_bt.c
> 
> This driver seems to be fine, it doesn't use the device/vq in its probe().


But it calls hci_register_dev(), which in turn queues all kinds of
work. Also, can Linux start using the device immediately after
it's registered?
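
The ordering question can be illustrated with a small userspace model. It
is only a sketch under stated assumptions: every model_* name is
hypothetical, the upper layer is assumed to submit a request as soon as
registration happens, and the device is assumed to complete it at once.
Whether virtio_bt and hci_register_dev() really behave this way is exactly
what is being asked here, so the model does not answer it.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every model_* name is hypothetical. */
struct model_dev {
	bool broken;     /* imitates vq->broken: set until device_ready */
	bool registered; /* upper layer may submit once this is set */
	int submitted;
	int completed;
};

/* Upper layer (think of an HCI core) submitting one request. */
static void model_submit(struct model_dev *d)
{
	assert(d->registered);
	d->submitted++;
	/* Assumed: the device completes at once and raises an interrupt. */
	if (!d->broken)
		d->completed++; /* interrupt reached the driver callback */
	/* else: the hardened handler returned early, completion is lost */
}

/* Imitates registering with a subsystem that starts work right away. */
static void model_register(struct model_dev *d)
{
	d->registered = true;
	model_submit(d); /* assumed: work is queued during registration */
}

/* Imitates the unbreak step of virtio_device_ready(). */
static void model_device_ready(struct model_dev *d)
{
	d->broken = false;
}

/* Returns the number of requests whose completion never arrived. */
static int model_lost_completions(bool ready_before_register)
{
	struct model_dev d = { .broken = true };

	if (ready_before_register)
		model_device_ready(&d);
	model_register(&d);
	if (!ready_before_register)
		model_device_ready(&d);
	return d.submitted - d.completed;
}
```

With registration first, the model loses the one completion that arrives
while the queue is still marked broken; with device_ready first, nothing
is lost.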


> > drivers/gpu/drm/virtio/virtgpu_drv.c
> 
> It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> me like the code is correct.

OK.

> > drivers/i2c/busses/i2c-virtio.c
> > drivers/net/caif/caif_virtio.c
> > drivers/nvdimm/virtio_pmem.c
> 
> The above looks fine and we have three more:
> 
> arm_scmi: probe() doesn't use vq
> mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> it looks to me we need a device_ready before the kick.
> virtio_rpmsg_bus.c: doesn't use vq
> 
> I will post a patch for mac80211_hwsim.c.
> Thanks

Same comment for all of the above: might Linux not start using the
device as soon as it's registered?

> >
> >
> >
> >
> > > >
> > > >
> > > > > ---
> > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > >       ccw->flags = 0;
> > > > >       ccw->count = sizeof(status);
> > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > +      * instruction that guarantees the memory writes have
> > > > > +      * completed before ssch.
> > > > > +      */
> > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > >       /* Write failed? We assume status is unchanged. */
> > > > >       if (ret)
> > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > --- a/drivers/virtio/virtio.c
> > > > > +++ b/drivers/virtio/virtio.c
> > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > >   * */
> > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > >  {
> > > > > +     /*
> > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > +      * interrupt for this line arriving after
> > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > +      * vq->broken as true.
> > > > > +      */
> > > > > +     virtio_break_device(dev);
> > > >
> > > > So make this conditional
> > > >
> > > > > +     virtio_synchronize_cbs(dev);
> > > > > +
> > > > >       dev->config->reset(dev);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > >       dev->config_enabled = false;
> > > > >       dev->config_change_pending = false;
> > > > >
> > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > +
> > > > >       /* We always start by resetting the device, in case a previous
> > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > >       virtio_reset_device(dev);
> > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > >       /* Acknowledge that we've seen the device. */
> > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > >
> > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > -
> > > > >       /*
> > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > >        * driver.
> > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > >       /* We should never be setting status to 0. */
> > > > >       BUG_ON(status == 0);
> > > > >
> > > > > +     /*
> > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > +      * that the the cache coherent memory writes have completed
> > > > > +      * before writing to the MMIO region.
> > > > > +      */
> > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > >  }
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > >  {
> > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > >
> > > > > +     /*
> > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > +      * that the the cache coherent memory writes have completed
> > > > > +      * before writing to the MMIO region.
> > > > > +      */
> > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > >       vq->we_own_ring = true;
> > > > >       vq->notify = notify;
> > > > >       vq->weak_barriers = weak_barriers;
> > > > > -     vq->broken = false;
> > > > > +     vq->broken = true;
> > > > >       vq->last_used_idx = 0;
> > > > >       vq->event_triggered = false;
> > > > >       vq->num_added = 0;
> > > >
> > > > and make this conditional
> > > >
> > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > >               return IRQ_NONE;
> > > > >       }
> > > > >
> > > > > -     if (unlikely(vq->broken))
> > > > > -             return IRQ_HANDLED;
> > > > > +     if (unlikely(vq->broken)) {
> > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > +             return IRQ_NONE;
> > > > > +     }
> > > > >
> > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > >       if (vq->event)
> > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > >       vq->we_own_ring = false;
> > > > >       vq->notify = notify;
> > > > >       vq->weak_barriers = weak_barriers;
> > > > > -     vq->broken = false;
> > > > > +     vq->broken = true;
> > > > >       vq->last_used_idx = 0;
> > > > >       vq->event_triggered = false;
> > > > >       vq->num_added = 0;
> > > >
> > > > and make this conditional
> > > >
> > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > --- a/include/linux/virtio_config.h
> > > > > +++ b/include/linux/virtio_config.h
> > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > >       unsigned status = dev->config->get_status(dev);
> > > > >
> > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > +
> > > > > +     /*
> > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > +      */
> > > > > +     virtio_synchronize_cbs(dev);
> > > > > +     __virtio_unbreak_device(dev);
> > > > > +     /*
> > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > +      * specific set_status() method.
> > > > > +      *
> > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > +      * memory write that set vq->broken as false which is done by
> > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > +      * we won't lose any notification.
> > > > > +      */
> > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > >  }
> > > > >
> > > > > --
> > > > > 2.25.1
> > > >
> >



* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  8:19             ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  8:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > >
> > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > >    that is used by some device such as virtio-blk
> > > > > 2) done only for PCI transport
> > > > >
> > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > hardening. The vq->broken is set to true during both initialization
> > > > > and reset. And the vq->broken is set to false in
> > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > IRQ storm.
> > > > >
> > > > > The reason of using a per queue variable instead of a per device one
> > > > > is that we may need it for per queue reset hardening in the future.
> > > > >
> > > > > Note that the hardening is only done for vring interrupt since the
> > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > handler because it uses spinlock to do the synchronization which is
> > > > > expensive.
> > > > >
> > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > Cc: linux-s390@vger.kernel.org
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > >
> > > > Jason, I am really concerned by all the fallout.
> > > > I propose adding a flag to suppress the hardening -
> > > > this will be a debugging aid and a work around for
> > > > users if we find more buggy drivers.
> > > >
> > > > suppress_interrupt_hardening ?
> > >
> > > I can post a patch but I'm afraid if we disable it by default, it
> > > won't be used by the users so there's no way for us to receive the bug
> > > report. Or we need a plan to enable it by default.
> > >
> > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > simply warn instead of disable it by default.
> > >
> > > Thanks
> >
> > I meant more like a flag in struct virtio_driver.
> > For now, could you audit all drivers which don't call _ready?
> > I found 5 of these:
> >
> > drivers/bluetooth/virtio_bt.c
> 
> This driver seems to be fine, it doesn't use the device/vq in its probe().


But it calls hci_register_dev(), which in turn queues all kinds of
work. Also, can Linux start using the device immediately after
it's registered?


> > drivers/gpu/drm/virtio/virtgpu_drv.c
> 
> It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> me like the code is correct.

OK.

> > drivers/i2c/busses/i2c-virtio.c
> > drivers/net/caif/caif_virtio.c
> > drivers/nvdimm/virtio_pmem.c
> 
> The above looks fine and we have three more:
> 
> arm_scmi: probe() doesn't use vq
> mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> it looks to me we need a device_ready before the kick.
> virtio_rpmsg_bus.c: doesn't use vq
> 
> I will post a patch for mac80211_hwsim.c.
> Thanks

Same comment for all of the above: might Linux not start using the
device as soon as it's registered?

> >
> >
> >
> >
> > > >
> > > >
> > > > > ---
> > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > >       ccw->flags = 0;
> > > > >       ccw->count = sizeof(status);
> > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > +      * instruction that guarantees the memory writes have
> > > > > +      * completed before ssch.
> > > > > +      */
> > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > >       /* Write failed? We assume status is unchanged. */
> > > > >       if (ret)
> > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > --- a/drivers/virtio/virtio.c
> > > > > +++ b/drivers/virtio/virtio.c
> > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > >   * */
> > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > >  {
> > > > > +     /*
> > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > +      * interrupt for this line arriving after
> > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > +      * vq->broken as true.
> > > > > +      */
> > > > > +     virtio_break_device(dev);
> > > >
> > > > So make this conditional
> > > >
> > > > > +     virtio_synchronize_cbs(dev);
> > > > > +
> > > > >       dev->config->reset(dev);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > >       dev->config_enabled = false;
> > > > >       dev->config_change_pending = false;
> > > > >
> > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > +
> > > > >       /* We always start by resetting the device, in case a previous
> > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > >       virtio_reset_device(dev);
> > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > >       /* Acknowledge that we've seen the device. */
> > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > >
> > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > -
> > > > >       /*
> > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > >        * driver.
> > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > >       /* We should never be setting status to 0. */
> > > > >       BUG_ON(status == 0);
> > > > >
> > > > > +     /*
> > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > +      * that the cache coherent memory writes have completed
> > > > > +      * before writing to the MMIO region.
> > > > > +      */
> > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > >  }
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > >  {
> > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > >
> > > > > +     /*
> > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > +      * that the cache coherent memory writes have completed
> > > > > +      * before writing to the MMIO region.
> > > > > +      */
> > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > >       vq->we_own_ring = true;
> > > > >       vq->notify = notify;
> > > > >       vq->weak_barriers = weak_barriers;
> > > > > -     vq->broken = false;
> > > > > +     vq->broken = true;
> > > > >       vq->last_used_idx = 0;
> > > > >       vq->event_triggered = false;
> > > > >       vq->num_added = 0;
> > > >
> > > > and make this conditional
> > > >
> > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > >               return IRQ_NONE;
> > > > >       }
> > > > >
> > > > > -     if (unlikely(vq->broken))
> > > > > -             return IRQ_HANDLED;
> > > > > +     if (unlikely(vq->broken)) {
> > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > +             return IRQ_NONE;
> > > > > +     }
> > > > >
> > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > >       if (vq->event)
> > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > >       vq->we_own_ring = false;
> > > > >       vq->notify = notify;
> > > > >       vq->weak_barriers = weak_barriers;
> > > > > -     vq->broken = false;
> > > > > +     vq->broken = true;
> > > > >       vq->last_used_idx = 0;
> > > > >       vq->event_triggered = false;
> > > > >       vq->num_added = 0;
> > > >
> > > > and make this conditional
> > > >
> > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > --- a/include/linux/virtio_config.h
> > > > > +++ b/include/linux/virtio_config.h
> > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > >       unsigned status = dev->config->get_status(dev);
> > > > >
> > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > +
> > > > > +     /*
> > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > +      */
> > > > > +     virtio_synchronize_cbs(dev);
> > > > > +     __virtio_unbreak_device(dev);
> > > > > +     /*
> > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > +      * specific set_status() method.
> > > > > +      *
> > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > +      * memory write that set vq->broken as false which is done by
> > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > +      * we won't lose any notification.
> > > > > +      */
> > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > >  }
> > > > >
> > > > > --
> > > > > 2.25.1
> > > >
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  8:19             ` Michael S. Tsirkin
@ 2022-06-13  8:51               ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  8:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > >
> > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > >    that is used by some device such as virtio-blk
> > > > > > 2) done only for PCI transport
> > > > > >
> > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > and reset. And the vq->broken is set to false in
> > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > IRQ storm.
> > > > > >
> > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > >
> > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > expensive.
> > > > > >
> > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >
> > > > >
> > > > > Jason, I am really concerned by all the fallout.
> > > > > I propose adding a flag to suppress the hardening -
> > > > > this will be a debugging aid and a work around for
> > > > > users if we find more buggy drivers.
> > > > >
> > > > > suppress_interrupt_hardening ?
> > > >
> > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > won't be used by the users so there's no way for us to receive the bug
> > > > report. Or we need a plan to enable it by default.
> > > >
> > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > simply warn instead of disable it by default.
> > > >
> > > > Thanks
> > >
> > > I meant more like a flag in struct virtio_driver.
> > > For now, could you audit all drivers which don't call _ready?
> > > I found 5 of these:
> > >
> > > drivers/bluetooth/virtio_bt.c
> >
> > This driver seems to be fine, it doesn't use the device/vq in its probe().
>
>
> But it calls hci_register_dev and that in turn queues all kind of
> work. Also, can linux start using the device immediately after
> it's registered?

I think the driver is allowed to queue buffers before DRIVER_OK. If
so, the only side effect with a well-behaved device is that the tx
interrupt is delayed until after DRIVER_OK. If not, we need to clarify
that in the spec and call virtio_device_ready() before subsystem
registration.
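
To illustrate the "queue before DRIVER_OK" case, here is a minimal toy
model in plain C. It is not kernel code: struct toy_dev and the toy_*
helpers are hypothetical names made up for this sketch, and it assumes
a well-behaved device that ignores the ring until DRIVER_OK is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a virtio API. */
struct toy_dev {
	bool driver_ok;	/* models the DRIVER_OK status bit */
	size_t avail;	/* buffers the driver has queued */
	size_t used;	/* buffers the device has completed */
	size_t irqs;	/* completion interrupts raised so far */
};

/* Driver side: queueing only publishes a buffer, nothing more. */
static void toy_queue_buf(struct toy_dev *d)
{
	d->avail++;
}

/* Device side: a well-behaved device leaves the ring alone until DRIVER_OK. */
static void toy_device_poll(struct toy_dev *d)
{
	if (!d->driver_ok)
		return;
	while (d->used < d->avail) {
		d->used++;
		d->irqs++;	/* one completion interrupt per buffer */
	}
}

/* Driver side: models setting DRIVER_OK, which must happen only once. */
static void toy_device_ready(struct toy_dev *d)
{
	assert(!d->driver_ok);	/* mirrors the BUG_ON() in virtio_device_ready() */
	d->driver_ok = true;
}
```

In this model, buffers queued early are not lost; their completion
interrupts simply arrive after toy_device_ready() has been called.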

>
>
> > > drivers/gpu/drm/virtio/virtgpu_drv.c
> >
> > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > me the code is correct.
>
> OK.
>
> > > drivers/i2c/busses/i2c-virtio.c
> > > drivers/net/caif/caif_virtio.c
> > > drivers/nvdimm/virtio_pmem.c
> >
> > The above looks fine and we have three more:
> >
> > arm_scmi: probe() doesn't use vq
> > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > it looks to me we need a device_ready before the kick.
> > virtio_rpmsg_bus.c: doesn't use vq
> >
> > I will post a patch for mac80211_hwsim.c.
> > Thanks
>
> Same comments for all of the above. Might linux not start using the
> device once it's registered?

It depends on the specific subsystem.

For a subsystem that can't use the device immediately, calling
virtio_device_ready() after the subsystem's registration should be
fine. E.g., in the networking subsystem, TX won't happen until
ndo_open() is called, so calling virtio_device_ready() after
register_netdev() seems to be fine.

For a subsystem that can use the device immediately, we are still fine
as long as the subsystem does not depend on the result of a request in
probe() to proceed, since those requests will be processed after
DRIVER_OK.

For the rest, we need to call virtio_device_ready() before registration.
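
The last case can be sketched with a toy model in plain C. This is not
kernel code: struct toy_vq and the toy_* functions are hypothetical
stand-ins for vq->broken, vring_interrupt() and a probe() whose
registration step needs a reply from the device. It assumes any
completion arriving while the queue is still marked broken is not
delivered to the driver callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a virtio API. */
enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

struct toy_vq {
	bool broken;		/* true after init/reset, cleared when ready */
	int callbacks_run;	/* how many times the driver callback ran */
};

static void toy_vq_init(struct toy_vq *vq)
{
	vq->broken = true;	/* hardened default, as in the patch */
	vq->callbacks_run = 0;
}

static void toy_vq_ready(struct toy_vq *vq)
{
	vq->broken = false;	/* models the unbreak in virtio_device_ready() */
}

/* Models the early return for a broken queue in vring_interrupt(). */
static enum toy_irqreturn toy_vring_interrupt(struct toy_vq *vq)
{
	if (vq->broken)
		return TOY_IRQ_NONE;
	vq->callbacks_run++;
	return TOY_IRQ_HANDLED;
}

/*
 * Models a probe() whose registration step sends a request and needs the
 * reply before it can proceed. Returns true if the reply was delivered.
 */
static bool toy_probe(struct toy_vq *vq, bool ready_before_register)
{
	bool got_reply;

	toy_vq_init(vq);
	if (ready_before_register)
		toy_vq_ready(vq);

	/* "Registration": the completion for the request arrives here. */
	got_reply = toy_vring_interrupt(vq) == TOY_IRQ_HANDLED;

	if (!ready_before_register)
		toy_vq_ready(vq);
	return got_reply;
}
```

With ready_before_register set, the reply reaches the callback; without
it, the completion is rejected while the queue is still broken, which
is why such drivers have to become ready before they register.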

Thanks

>
> > >
> > >
> > >
> > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > >       ccw->flags = 0;
> > > > > >       ccw->count = sizeof(status);
> > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > +      * instruction that guarantees the memory writes have
> > > > > > +      * completed before ssch.
> > > > > > +      */
> > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > >       if (ret)
> > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > --- a/drivers/virtio/virtio.c
> > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > >   * */
> > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > >  {
> > > > > > +     /*
> > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > +      * interrupt for this line arriving after
> > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > +      * vq->broken as true.
> > > > > > +      */
> > > > > > +     virtio_break_device(dev);
> > > > >
> > > > > So make this conditional
> > > > >
> > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > +
> > > > > >       dev->config->reset(dev);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > >       dev->config_enabled = false;
> > > > > >       dev->config_change_pending = false;
> > > > > >
> > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > +
> > > > > >       /* We always start by resetting the device, in case a previous
> > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > >       virtio_reset_device(dev);
> > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > >       /* Acknowledge that we've seen the device. */
> > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > >
> > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > -
> > > > > >       /*
> > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > >        * driver.
> > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > >       /* We should never be setting status to 0. */
> > > > > >       BUG_ON(status == 0);
> > > > > >
> > > > > > +     /*
> > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > +      * before writing to the MMIO region.
> > > > > > +      */
> > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > >  }
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > >  {
> > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > >
> > > > > > +     /*
> > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > +      * before writing to the MMIO region.
> > > > > > +      */
> > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > >       vq->we_own_ring = true;
> > > > > >       vq->notify = notify;
> > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > -     vq->broken = false;
> > > > > > +     vq->broken = true;
> > > > > >       vq->last_used_idx = 0;
> > > > > >       vq->event_triggered = false;
> > > > > >       vq->num_added = 0;
> > > > >
> > > > > and make this conditional
> > > > >
> > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > >               return IRQ_NONE;
> > > > > >       }
> > > > > >
> > > > > > -     if (unlikely(vq->broken))
> > > > > > -             return IRQ_HANDLED;
> > > > > > +     if (unlikely(vq->broken)) {
> > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > +             return IRQ_NONE;
> > > > > > +     }
> > > > > >
> > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > >       if (vq->event)
> > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > >       vq->we_own_ring = false;
> > > > > >       vq->notify = notify;
> > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > -     vq->broken = false;
> > > > > > +     vq->broken = true;
> > > > > >       vq->last_used_idx = 0;
> > > > > >       vq->event_triggered = false;
> > > > > >       vq->num_added = 0;
> > > > >
> > > > > and make this conditional
> > > > >
> > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > --- a/include/linux/virtio_config.h
> > > > > > +++ b/include/linux/virtio_config.h
> > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > >
> > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > +
> > > > > > +     /*
> > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > +      */
> > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > +     __virtio_unbreak_device(dev);
> > > > > > +     /*
> > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > +      * specific set_status() method.
> > > > > > +      *
> > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > +      * we won't lose any notification.
> > > > > > +      */
> > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > >  }
> > > > > >
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  8:51               ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  8:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > >
> > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > >    that is used by some device such as virtio-blk
> > > > > > 2) done only for PCI transport
> > > > > >
> > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > and reset. And the vq->broken is set to false in
> > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > IRQ storm.
> > > > > >
> > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > >
> > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > expensive.
> > > > > >
> > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >
> > > > >
> > > > > Jason, I am really concerned by all the fallout.
> > > > > I propose adding a flag to suppress the hardening -
> > > > > this will be a debugging aid and a work around for
> > > > > users if we find more buggy drivers.
> > > > >
> > > > > suppress_interrupt_hardening ?
> > > >
> > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > won't be used by the users so there's no way for us to receive the bug
> > > > report. Or we need a plan to enable it by default.
> > > >
> > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > simply warn instead of disable it by default.
> > > >
> > > > Thanks
> > >
> > > I meant more like a flag in struct virtio_driver.
> > > For now, could you audit all drivers which don't call _ready?
> > > I found 5 of these:
> > >
> > > drivers/bluetooth/virtio_bt.c
> >
> > This driver seems to be fine, it doesn't use the device/vq in its probe().
>
>
> But it calls hci_register_dev and that in turn queues all kind of
> work. Also, can linux start using the device immediately after
> it's registered?

I think the driver is allowed to queue buffers before DRIVER_OK. If
so, the only side effect with a well-behaved device is that the tx
interrupt is delayed until after DRIVER_OK. If not, we need to clarify
that in the spec and call virtio_device_ready() before subsystem
registration.

>
>
> > > drivers/gpu/drm/virtio/virtgpu_drv.c
> >
> > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > me the code is correct.
>
> OK.
>
> > > drivers/i2c/busses/i2c-virtio.c
> > > drivers/net/caif/caif_virtio.c
> > > drivers/nvdimm/virtio_pmem.c
> >
> > The above looks fine and we have three more:
> >
> > arm_scmi: probe() doesn't use vq
> > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > it looks to me we need a device_ready before the kick.
> > virtio_rpmsg_bus.c: doesn't use vq
> >
> > I will post a patch for mac80211_hwsim.c.
> > Thanks
>
> Same comments for all of the above. Might linux not start using the
> device once it's registered?

It depends on the specific subsystem.

For a subsystem that can't use the device immediately, calling
virtio_device_ready() after the subsystem's registration should be
fine. E.g., in the networking subsystem, TX won't happen until
ndo_open() is called, so calling virtio_device_ready() after
register_netdev() seems to be fine.

For a subsystem that can use the device immediately, we are still fine
as long as the subsystem does not depend on the result of a request in
probe() to proceed, since those requests will be processed after
DRIVER_OK.

For the rest, we need to call virtio_device_ready() before registration.

Thanks

>
> > >
> > >
> > >
> > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > >       ccw->flags = 0;
> > > > > >       ccw->count = sizeof(status);
> > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > +      * instruction that guarantees the memory writes have
> > > > > > +      * completed before ssch.
> > > > > > +      */
> > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > >       if (ret)
> > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > --- a/drivers/virtio/virtio.c
> > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > >   * */
> > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > >  {
> > > > > > +     /*
> > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > +      * interrupt for this line arriving after
> > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > +      * vq->broken as true.
> > > > > > +      */
> > > > > > +     virtio_break_device(dev);
> > > > >
> > > > > So make this conditional
> > > > >
> > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > +
> > > > > >       dev->config->reset(dev);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > >       dev->config_enabled = false;
> > > > > >       dev->config_change_pending = false;
> > > > > >
> > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > +
> > > > > >       /* We always start by resetting the device, in case a previous
> > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > >       virtio_reset_device(dev);
> > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > >       /* Acknowledge that we've seen the device. */
> > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > >
> > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > -
> > > > > >       /*
> > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > >        * driver.
> > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > >       /* We should never be setting status to 0. */
> > > > > >       BUG_ON(status == 0);
> > > > > >
> > > > > > +     /*
> > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > +      * before writing to the MMIO region.
> > > > > > +      */
> > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > >  }
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > >  {
> > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > >
> > > > > > +     /*
> > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > +      * before writing to the MMIO region.
> > > > > > +      */
> > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > >       vq->we_own_ring = true;
> > > > > >       vq->notify = notify;
> > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > -     vq->broken = false;
> > > > > > +     vq->broken = true;
> > > > > >       vq->last_used_idx = 0;
> > > > > >       vq->event_triggered = false;
> > > > > >       vq->num_added = 0;
> > > > >
> > > > > and make this conditional
> > > > >
> > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > >               return IRQ_NONE;
> > > > > >       }
> > > > > >
> > > > > > -     if (unlikely(vq->broken))
> > > > > > -             return IRQ_HANDLED;
> > > > > > +     if (unlikely(vq->broken)) {
> > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > +             return IRQ_NONE;
> > > > > > +     }
> > > > > >
> > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > >       if (vq->event)
> > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > >       vq->we_own_ring = false;
> > > > > >       vq->notify = notify;
> > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > -     vq->broken = false;
> > > > > > +     vq->broken = true;
> > > > > >       vq->last_used_idx = 0;
> > > > > >       vq->event_triggered = false;
> > > > > >       vq->num_added = 0;
> > > > >
> > > > > and make this conditional
> > > > >
> > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > --- a/include/linux/virtio_config.h
> > > > > > +++ b/include/linux/virtio_config.h
> > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > >
> > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > +
> > > > > > +     /*
> > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > +      */
> > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > +     __virtio_unbreak_device(dev);
> > > > > > +     /*
> > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > +      * specific set_status() method.
> > > > > > +      *
> > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > +      * we won't lose any notification.
> > > > > > +      */
> > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > >  }
> > > > > >
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>
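
The ordering argument in the quoted virtio_device_ready() comment can be
sketched in user space. This is only an analogy under stated assumptions:
struct model_dev and the model_* functions are hypothetical names, and a
C11 release store stands in for the transport's set_status() write. The
real patch also calls virtio_synchronize_cbs() before unbreaking the
queues, which this model leaves out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_S_DRIVER_OK 4u	/* same bit value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical user-space model; not the kernel's struct virtio_device. */
struct model_dev {
	int driver_state;	/* plain data the driver sets up in probe() */
	bool broken;		/* stand-in for vq->broken */
	atomic_uint status;	/* stand-in for the device status register */
};

void model_dev_init(struct model_dev *d)
{
	d->driver_state = 0;
	d->broken = true;	/* queues start out broken */
	atomic_init(&d->status, 0);
}

/* Driver side: finish setup, unbreak, then publish DRIVER_OK. */
void model_device_ready(struct model_dev *d, int state)
{
	d->driver_state = state;
	d->broken = false;
	/*
	 * Release store: models set_status() ordering the earlier plain
	 * memory writes before the status write becomes visible.
	 */
	atomic_fetch_or_explicit(&d->status, MODEL_S_DRIVER_OK,
				 memory_order_release);
}

/*
 * Device side: a well-behaved device only notifies after it has seen
 * DRIVER_OK. Returns true if the notification reached the driver.
 */
bool model_device_notify(struct model_dev *d, int *seen_state)
{
	unsigned int s = atomic_load_explicit(&d->status,
					      memory_order_acquire);

	if (!(s & MODEL_S_DRIVER_OK))
		return false;	/* device stays quiet before DRIVER_OK */
	if (d->broken)
		return false;	/* would be a lost notification */
	*seen_state = d->driver_state;
	return true;
}
```

In this model, a notification that follows an observed DRIVER_OK always
finds broken == false and the completed driver setup, which is the
property the comment claims for a well-behaved device.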

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  8:51               ` Jason Wang
@ 2022-06-13  8:59                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  8:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > This is a rework of the previous IRQ hardening for virtio-pci,
> > > > > > > > which was reverted after several drawbacks were found:
> > > > > > > >
> > > > > > > > 1) it relies on IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > > > > > >    managed IRQs used by some devices such as virtio-blk
> > > > > > > > 2) it was done only for the PCI transport
> > > > > > >
> > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > when vq->broken is true. In this case, switch to returning IRQ_NONE
> > > > > > > > so that the interrupt core is aware of such invalid interrupts and
> > > > > > > > can prevent an IRQ storm.
> > > > > > > >
> > > > > > > > The reason for using a per queue variable instead of a per device one
> > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > >
> > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > expensive.
> > > > > > >
> > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > >
> > > > > >
> > > > > > Jason, I am really concerned by all the fallout.
> > > > > > I propose adding a flag to suppress the hardening -
> > > > > > this will be a debugging aid and a work around for
> > > > > > users if we find more buggy drivers.
> > > > > >
> > > > > > suppress_interrupt_hardening ?
> > > > >
> > > > > > I can post a patch, but I'm afraid that if we disable it by default,
> > > > > > users won't use it, so there's no way for us to receive bug
> > > > > > reports. Otherwise we need a plan to enable it by default.
> > > > > >
> > > > > > It's rc2; how about waiting for one or two more rcs? Or it may be
> > > > > > better to simply warn instead of disabling it by default.
> > > > >
> > > > > Thanks
> > > >
> > > > I meant more like a flag in struct virtio_driver.
> > > > For now, could you audit all drivers which don't call _ready?
> > > > I found 5 of these:
> > > >
> > > > drivers/bluetooth/virtio_bt.c
> > >
> > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> >
> >
> > But it calls hci_register_dev(), and that in turn queues all kinds of
> > work. Also, can Linux start using the device immediately after
> > it's registered?
> 
> So I think the driver is allowed to queue before DRIVER_OK.

it's not allowed to kick

> If yes,
> the only side effect is the delay of the tx interrupt after DRIVER_OK
> for a well behaved device.

Your patches drop the interrupt, though; it won't just be delayed.

> If not, we need to clarify it in the spec
> and call virtio_device_ready() before subsystem registration.

Hmm, I don't get what we need to clarify.

> >
> >
> > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > >
> > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > me the code is correct.
> >
> > OK.
> >
> > > > drivers/i2c/busses/i2c-virtio.c
> > > > drivers/net/caif/caif_virtio.c
> > > > drivers/nvdimm/virtio_pmem.c
> > >
> > > The above looks fine and we have three more:
> > >
> > > arm_scmi: probe() doesn't use vq
> > > mac80211_hwsim.c: doesn't use vq (only fills rx), but it kicks the rx,
> > > so it looks to me like we need a virtio_device_ready() before the kick.
> > > virtio_rpmsg_bus.c: doesn't use vq
> > >
> > > I will post a patch for mac80211_hwsim.c.
> > > Thanks
> >
> > Same comments for all of the above. Might Linux not start using the
> > device once it's registered?
> 
> It depends on the specific subsystem.
> 
> For a subsystem that can't use the device immediately, calling
> virtio_device_ready() after the subsystem's registration should be
> fine. E.g. for the networking subsystem, TX won't happen until
> ndo_open() is called, so calling virtio_device_ready() after
> register_netdev() seems to be fine.

exactly

> For a subsystem that can use the device immediately, if the
> subsystem does not depend on the result of a request in the probe to
> proceed, we are still fine, since those requests will be processed
> after DRIVER_OK.

Well first won't driver code normally kick as well?
And without kick, won't everything just be blocked?


> For the rest we need to do virtio_device_ready() before registration.
> 
> Thanks

Then we can get an interrupt for an unregistered device.


> >
> > > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > >       ccw->flags = 0;
> > > > > > >       ccw->count = sizeof(status);
> > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > +      * completed before ssch.
> > > > > > > +      */
> > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > >       if (ret)
> > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > >   * */
> > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > >  {
> > > > > > > +     /*
> > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > +      * interrupt for this line arriving after
> > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > +      * vq->broken as true.
> > > > > > > +      */
> > > > > > > +     virtio_break_device(dev);
> > > > > >
> > > > > > So make this conditional
> > > > > >
> > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > +
> > > > > > >       dev->config->reset(dev);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > >       dev->config_enabled = false;
> > > > > > >       dev->config_change_pending = false;
> > > > > > >
> > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > +
> > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > >       virtio_reset_device(dev);
> > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > >
> > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > -
> > > > > > >       /*
> > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > >        * driver.
> > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > >       /* We should never be setting status to 0. */
> > > > > > >       BUG_ON(status == 0);
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > +      * before writing to the MMIO region.
> > > > > > > +      */
> > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > >  }
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > >  {
> > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > +      * before writing to the MMIO region.
> > > > > > > +      */
> > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > >       vq->we_own_ring = true;
> > > > > > >       vq->notify = notify;
> > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > -     vq->broken = false;
> > > > > > > +     vq->broken = true;
> > > > > > >       vq->last_used_idx = 0;
> > > > > > >       vq->event_triggered = false;
> > > > > > >       vq->num_added = 0;
> > > > > >
> > > > > > and make this conditional
> > > > > >
> > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > >               return IRQ_NONE;
> > > > > > >       }
> > > > > > >
> > > > > > > -     if (unlikely(vq->broken))
> > > > > > > -             return IRQ_HANDLED;
> > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > +             return IRQ_NONE;
> > > > > > > +     }
> > > > > > >
> > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > >       if (vq->event)
> > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > >       vq->we_own_ring = false;
> > > > > > >       vq->notify = notify;
> > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > -     vq->broken = false;
> > > > > > > +     vq->broken = true;
> > > > > > >       vq->last_used_idx = 0;
> > > > > > >       vq->event_triggered = false;
> > > > > > >       vq->num_added = 0;
> > > > > >
> > > > > > and make this conditional
> > > > > >
> > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > >
> > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > +
> > > > > > > +     /*
> > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > +      */
> > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > +     /*
> > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > +      * specific set_status() method.
> > > > > > > +      *
> > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > +      * we won't lose any notification.
> > > > > > > +      */
> > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > >  }
> > > > > > >
> > > > > > > --
> > > > > > > 2.25.1
> > > > > >
> > > >
> >
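
For readers following the disagreement above about whether an early
interrupt is "delayed" or "dropped", here is a minimal user-space sketch
of the vq->broken gating that the patch describes. It is not kernel
code: struct model_vq, the MODEL_IRQ_* values and the model_* functions
are hypothetical stand-ins, and the used-buffer check is reduced to a
simple counter.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's types or values. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_vq {
	bool broken;		/* true from creation until the driver is ready */
	int used_pending;	/* buffers the device has marked as used */
	int callbacks_run;	/* how many times the driver callback ran */
};

/* Queue creation: with the patch, queues start out broken. */
void model_vq_init(struct model_vq *vq)
{
	vq->broken = true;
	vq->used_pending = 0;
	vq->callbacks_run = 0;
}

/* Models the unbreak step done in virtio_device_ready(). */
void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models the break step done in virtio_reset_device(). */
void model_reset_device(struct model_vq *vq)
{
	vq->broken = true;
}

/* Models the early return in vring_interrupt() while the queue is broken. */
enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;	/* dropped: nothing is queued for replay */
	if (vq->used_pending == 0)
		return MODEL_IRQ_NONE;	/* no work for this queue */
	vq->callbacks_run++;
	vq->used_pending = 0;
	return MODEL_IRQ_HANDLED;
}
```

In this model, an interrupt that arrives while broken is set leaves
callbacks_run at zero, and becoming ready does not replay it. The
pending buffer is only picked up if another interrupt arrives later,
which is the distinction raised in the reply above.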


* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  8:59                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  8:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > This is a rework of the previous IRQ hardening for virtio-pci,
> > > > > > > > which was reverted after several drawbacks were found:
> > > > > > > >
> > > > > > > > 1) it relies on IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > > > > > >    managed IRQs used by some devices such as virtio-blk
> > > > > > > > 2) it was done only for the PCI transport
> > > > > > >
> > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > when vq->broken is true. In this case, switch to returning IRQ_NONE
> > > > > > > > so that the interrupt core is aware of such invalid interrupts and
> > > > > > > > can prevent an IRQ storm.
> > > > > > > >
> > > > > > > > The reason for using a per queue variable instead of a per device one
> > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > >
> > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > expensive.
> > > > > > >
> > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > >
> > > > > >
> > > > > > Jason, I am really concerned by all the fallout.
> > > > > > I propose adding a flag to suppress the hardening -
> > > > > > this will be a debugging aid and a work around for
> > > > > > users if we find more buggy drivers.
> > > > > >
> > > > > > suppress_interrupt_hardening ?
> > > > >
> > > > > > I can post a patch, but I'm afraid that if we disable it by default,
> > > > > > users won't use it, so there's no way for us to receive bug
> > > > > > reports. Otherwise we need a plan to enable it by default.
> > > > > >
> > > > > > It's rc2; how about waiting for one or two more rcs? Or it may be
> > > > > > better to simply warn instead of disabling it by default.
> > > > >
> > > > > Thanks
> > > >
> > > > I meant more like a flag in struct virtio_driver.
> > > > For now, could you audit all drivers which don't call _ready?
> > > > I found 5 of these:
> > > >
> > > > drivers/bluetooth/virtio_bt.c
> > >
> > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> >
> >
> > But it calls hci_register_dev(), and that in turn queues all kinds of
> > work. Also, can Linux start using the device immediately after
> > it's registered?
> 
> So I think the driver is allowed to queue before DRIVER_OK.

it's not allowed to kick

> If yes,
> the only side effect is the delay of the tx interrupt after DRIVER_OK
> for a well behaved device.

Your patches drop the interrupt, though; it won't just be delayed.

> If not, we need to clarify it in the spec
> and call virtio_device_ready() before subsystem registration.

Hmm, I don't get what we need to clarify.

> >
> >
> > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > >
> > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > me the code is correct.
> >
> > OK.
> >
> > > > drivers/i2c/busses/i2c-virtio.c
> > > > drivers/net/caif/caif_virtio.c
> > > > drivers/nvdimm/virtio_pmem.c
> > >
> > > The above looks fine and we have three more:
> > >
> > > arm_scmi: probe() doesn't use vq
> > > mac80211_hwsim.c: doesn't use vq (only fills rx), but it kicks the rx,
> > > so it looks to me like we need a virtio_device_ready() before the kick.
> > > virtio_rpmsg_bus.c: doesn't use vq
> > >
> > > I will post a patch for mac80211_hwsim.c.
> > > Thanks
> >
> > Same comments for all of the above. Might Linux not start using the
> > device once it's registered?
> 
> It depends on the specific subsystem.
> 
> For a subsystem that can't use the device immediately, calling
> virtio_device_ready() after the subsystem's registration should be
> fine. E.g. for the networking subsystem, TX won't happen until
> ndo_open() is called, so calling virtio_device_ready() after
> register_netdev() seems to be fine.

exactly

> For a subsystem that can use the device immediately, if the
> subsystem does not depend on the result of a request in the probe to
> proceed, we are still fine, since those requests will be processed
> after DRIVER_OK.

Well first won't driver code normally kick as well?
And without kick, won't everything just be blocked?


> For the rest we need to do virtio_device_ready() before registration.
> 
> Thanks

Then we can get an interrupt for an unregistered device.


> >
> > > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > >       ccw->flags = 0;
> > > > > > >       ccw->count = sizeof(status);
> > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > +      * completed before ssch.
> > > > > > > +      */
> > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > >       if (ret)
> > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > >   * */
> > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > >  {
> > > > > > > +     /*
> > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > +      * interrupt for this line arriving after
> > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > +      * vq->broken as true.
> > > > > > > +      */
> > > > > > > +     virtio_break_device(dev);
> > > > > >
> > > > > > So make this conditional
> > > > > >
> > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > +
> > > > > > >       dev->config->reset(dev);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > >       dev->config_enabled = false;
> > > > > > >       dev->config_change_pending = false;
> > > > > > >
> > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > +
> > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > >       virtio_reset_device(dev);
> > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > >
> > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > -
> > > > > > >       /*
> > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > >        * driver.
> > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > >       /* We should never be setting status to 0. */
> > > > > > >       BUG_ON(status == 0);
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > +      * before writing to the MMIO region.
> > > > > > > +      */
> > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > >  }
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > >  {
> > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > +      * before writing to the MMIO region.
> > > > > > > +      */
> > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > >       vq->we_own_ring = true;
> > > > > > >       vq->notify = notify;
> > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > -     vq->broken = false;
> > > > > > > +     vq->broken = true;
> > > > > > >       vq->last_used_idx = 0;
> > > > > > >       vq->event_triggered = false;
> > > > > > >       vq->num_added = 0;
> > > > > >
> > > > > > and make this conditional
> > > > > >
> > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > >               return IRQ_NONE;
> > > > > > >       }
> > > > > > >
> > > > > > > -     if (unlikely(vq->broken))
> > > > > > > -             return IRQ_HANDLED;
> > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > +             return IRQ_NONE;
> > > > > > > +     }
> > > > > > >
> > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > >       if (vq->event)
> > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > >       vq->we_own_ring = false;
> > > > > > >       vq->notify = notify;
> > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > -     vq->broken = false;
> > > > > > > +     vq->broken = true;
> > > > > > >       vq->last_used_idx = 0;
> > > > > > >       vq->event_triggered = false;
> > > > > > >       vq->num_added = 0;
> > > > > >
> > > > > > and make this conditional
> > > > > >
> > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > >
> > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > +
> > > > > > > +     /*
> > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > +      */
> > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > +     /*
> > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > +      * specific set_status() method.
> > > > > > > +      *
> > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > +      * we won't lose any notification.
> > > > > > > +      */
> > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > >  }
> > > > > > >
> > > > > > > --
> > > > > > > 2.25.1
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  8:59                 ` Michael S. Tsirkin
@ 2022-06-13  9:08                   ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  9:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > >
> > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > 2) done only for PCI transport
> > > > > > > >
> > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > IRQ storm.
> > > > > > > >
> > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > >
> > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > expensive.
> > > > > > > >
> > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > >
> > > > > > >
> > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > this will be a debugging aid and a work around for
> > > > > > > users if we find more buggy drivers.
> > > > > > >
> > > > > > > suppress_interrupt_hardening ?
> > > > > >
> > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > report. Or we need a plan to enable it by default.
> > > > > >
> > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > simply warn instead of disable it by default.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I meant more like a flag in struct virtio_driver.
> > > > > For now, could you audit all drivers which don't call _ready?
> > > > > I found 5 of these:
> > > > >
> > > > > drivers/bluetooth/virtio_bt.c
> > > >
> > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > >
> > >
> > > But it calls hci_register_dev and that in turn queues all kind of
> > > work. Also, can linux start using the device immediately after
> > > it's registered?
> >
> > So I think the driver is allowed to queue before DRIVER_OK.
>
> it's not allowed to kick

Yes.

>
> > If yes,
> > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > for a well behaved device.
>
> your patches drop the interrupt though, it won't be just delayed.

A well-behaved device can only trigger the interrupt after DRIVER_OK.

So for virtio bt, it works like:

1) the driver queues a buffer and kicks
2) the driver sets DRIVER_OK
3) the device starts to process the buffer
4) the device sends a notification

The only risk is that the virtqueue could be filled before DRIVER_OK,
or is there anything I missed?
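
To make the sequence above concrete, here is a minimal user-space sketch
(not kernel code) of the vq->broken gate as this patch describes it. All
model_* names are invented for illustration; only the value of DRIVER_OK
(4) and the IRQ_NONE/IRQ_HANDLED behaviour are taken from the patch and
the virtio headers. It assumes a single virtqueue and no concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };
#define MODEL_S_DRIVER_OK 0x4 /* same value as VIRTIO_CONFIG_S_DRIVER_OK */

struct model_vq {
	bool broken;          /* true from creation/reset until device_ready */
	unsigned int handled; /* interrupts that reached the callback */
	unsigned int dropped; /* interrupts rejected because vq was broken */
};

struct model_dev {
	unsigned char status;
	struct model_vq vq;
};

/* Mirrors the patch: a freshly created vq starts out broken. */
static void model_dev_init(struct model_dev *dev)
{
	dev->status = 0;
	dev->vq.broken = true;
	dev->vq.handled = 0;
	dev->vq.dropped = 0;
}

/* Mirrors the check added to vring_interrupt(). */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken) {
		vq->dropped++;
		return MODEL_IRQ_NONE; /* dropped, not queued for later */
	}
	vq->handled++;
	return MODEL_IRQ_HANDLED;
}

/* Mirrors virtio_device_ready(): unbreak first, then set DRIVER_OK. */
static void model_device_ready(struct model_dev *dev)
{
	assert(!(dev->status & MODEL_S_DRIVER_OK)); /* BUG_ON() in the kernel */
	dev->vq.broken = false;
	dev->status |= MODEL_S_DRIVER_OK;
}
```

In this model an interrupt raised before model_device_ready() is counted
as dropped and never replayed, which is the case a misbehaving device
would hit; an interrupt raised afterwards is handled normally.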

>
> > If not, we need to clarify it in the spec
> > and call virtio_device_ready() before subsystem registration.
>
> hmm, i don't get what we need to clarify

E.g. is the driver not allowed to kick before DRIVER_OK, or, after
DRIVER_OK, should the device only process the buffer after a kick that
comes after DRIVER_OK (I think not)?

>
> > >
> > >
> > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > >
> > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > me the code is correct.
> > >
> > > OK.
> > >
> > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > drivers/net/caif/caif_virtio.c
> > > > > drivers/nvdimm/virtio_pmem.c
> > > >
> > > > The above looks fine and we have three more:
> > > >
> > > > arm_scmi: probe() doesn't use vq
> > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > it looks to me we need a device_ready before the kick.
> > > > virtio_rpmsg_bus.c: doesn't use vq
> > > >
> > > > I will post a patch for mac80211_hwsim.c.
> > > > Thanks
> > >
> > > Same comments for all of the above. Might linux not start using the
> > > device once it's registered?
> >
> > It depends on the specific subsystem.
> >
> > For the subsystem that can't use the device immediately, calling
> > virtio_device_ready() after the subsystem's registration should be
> > fine. E.g for the networking subsystem, the TX won't happen if
> > ndo_open() is not called, calling virtio_device_ready() after
> > netdev_register() seems to be fine.
>
> exactly
>
> > For the subsystem that can use the device immediately, if the
> > subsystem does not depend on the result of a request in the probe to
> > proceed, we are still fine. Since those requests will be processed after
> > DRIVER_OK.
>
> Well first won't driver code normally kick as well?

Kick itself is not blocked.

> And without kick, won't everything just be blocked?

It depends on the subsystem. E.g. a driver can choose to use a callback
instead of polling the used buffer in the probe.

>
>
> > For the rest we need to do virtio_device_ready() before registration.
> >
> > Thanks
>
> Then we can get an interrupt for an unregistered device.

It depends on the device. For a device that doesn't have an rx queue
(or device-to-driver queue), we are fine:

E.g. in virtio-blk:

        virtio_device_ready(vdev);

        err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
        if (err)
                goto out_cleanup_disk;
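
As a rough illustration of why the ordering matters for the "rest" case,
here is a small user-space model (again not kernel code). It assumes a
well-behaved device that only processes buffers once DRIVER_OK is set,
and a hypothetical subsystem registration, model_register_sync(), that
needs one request answered before it returns. All model_* names are
invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_S_DRIVER_OK 0x4 /* same value as VIRTIO_CONFIG_S_DRIVER_OK */

struct model_dev {
	unsigned char status;
	bool vq_broken;         /* the vq->broken gate from the patch */
	unsigned int queued;    /* buffers queued by the driver, unprocessed */
	unsigned int completed; /* completions delivered to the driver */
};

/* Well-behaved device: processes and notifies only after DRIVER_OK. */
static void model_device_run(struct model_dev *dev)
{
	if (!(dev->status & MODEL_S_DRIVER_OK))
		return;
	while (dev->queued > 0) {
		dev->queued--;
		if (!dev->vq_broken) /* the vring_interrupt() check */
			dev->completed++;
	}
}

/* Models virtio_device_ready(): unbreak, set DRIVER_OK, device may run. */
static void model_device_ready(struct model_dev *dev)
{
	assert(!(dev->status & MODEL_S_DRIVER_OK));
	dev->vq_broken = false;
	dev->status |= MODEL_S_DRIVER_OK;
	model_device_run(dev);
}

/* Hypothetical registration that depends on a request result in probe. */
static bool model_register_sync(struct model_dev *dev)
{
	dev->queued++;         /* queue one request and kick */
	model_device_run(dev); /* device gets a chance to answer */
	return dev->completed > 0;
}

/* Returns whether registration saw its request completed. */
static bool model_probe(bool ready_before_register)
{
	struct model_dev dev = { .status = 0, .vq_broken = true };
	bool ok;

	if (ready_before_register)
		model_device_ready(&dev);
	ok = model_register_sync(&dev);
	if (!ready_before_register)
		model_device_ready(&dev); /* request is only processed now */
	return ok;
}
```

With model_probe(true) the registration sees its completion; with
model_probe(false) the request is only processed once DRIVER_OK is set
later, so a registration that waits for the result cannot make progress.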

Thanks

>
>
> > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > >       ccw->flags = 0;
> > > > > > > >       ccw->count = sizeof(status);
> > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > +      * completed before ssch.
> > > > > > > > +      */
> > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > >       if (ret)
> > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > >   * */
> > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > >  {
> > > > > > > > +     /*
> > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > +      * vq->broken as true.
> > > > > > > > +      */
> > > > > > > > +     virtio_break_device(dev);
> > > > > > >
> > > > > > > So make this conditional
> > > > > > >
> > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > +
> > > > > > > >       dev->config->reset(dev);
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > >       dev->config_enabled = false;
> > > > > > > >       dev->config_change_pending = false;
> > > > > > > >
> > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > +
> > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > >       virtio_reset_device(dev);
> > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > >
> > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > -
> > > > > > > >       /*
> > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > >        * driver.
> > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > >       BUG_ON(status == 0);
> > > > > > > >
> > > > > > > > +     /*
> > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > +      */
> > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > >  {
> > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > >
> > > > > > > > +     /*
> > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > +      */
> > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > >       vq->we_own_ring = true;
> > > > > > > >       vq->notify = notify;
> > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > -     vq->broken = false;
> > > > > > > > +     vq->broken = true;
> > > > > > > >       vq->last_used_idx = 0;
> > > > > > > >       vq->event_triggered = false;
> > > > > > > >       vq->num_added = 0;
> > > > > > >
> > > > > > > and make this conditional
> > > > > > >
> > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > >               return IRQ_NONE;
> > > > > > > >       }
> > > > > > > >
> > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > +             return IRQ_NONE;
> > > > > > > > +     }
> > > > > > > >
> > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > >       if (vq->event)
> > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > >       vq->we_own_ring = false;
> > > > > > > >       vq->notify = notify;
> > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > -     vq->broken = false;
> > > > > > > > +     vq->broken = true;
> > > > > > > >       vq->last_used_idx = 0;
> > > > > > > >       vq->event_triggered = false;
> > > > > > > >       vq->num_added = 0;
> > > > > > >
> > > > > > > and make this conditional
> > > > > > >
> > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > >
> > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > +
> > > > > > > > +     /*
> > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > +      */
> > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > +     /*
> > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > +      * specific set_status() method.
> > > > > > > > +      *
> > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > +      * we won't lose any notification.
> > > > > > > > +      */
> > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  9:08                   ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  9:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > >
> > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > 2) done only for PCI transport
> > > > > > > >
> > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > IRQ storm.
> > > > > > > >
> > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > >
> > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > expensive.
> > > > > > > >
> > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > >
> > > > > > >
> > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > this will be a debugging aid and a work around for
> > > > > > > users if we find more buggy drivers.
> > > > > > >
> > > > > > > suppress_interrupt_hardening ?
> > > > > >
> > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > report. Or we need a plan to enable it by default.
> > > > > >
> > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > simply warn instead of disable it by default.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I meant more like a flag in struct virtio_driver.
> > > > > For now, could you audit all drivers which don't call _ready?
> > > > > I found 5 of these:
> > > > >
> > > > > drivers/bluetooth/virtio_bt.c
> > > >
> > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > >
> > >
> > > But it calls hci_register_dev and that in turn queues all kind of
> > > work. Also, can linux start using the device immediately after
> > > it's registered?
> >
> > So I think the driver is allowed to queue before DRIVER_OK.
>
> it's not allowed to kick

Yes.

>
> > If yes,
> > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > for a well behaved device.
>
> your patches drop the interrupt though, it won't be just delayed.

A well-behaved device can only trigger the interrupt after DRIVER_OK.

So for virtio bt, it works like:

1) the driver queues a buffer and kicks
2) the driver sets DRIVER_OK
3) the device starts to process the buffer
4) the device sends a notification

The only risk is that the virtqueue could be filled before DRIVER_OK,
or is there anything I missed?

>
> > If not, we need to clarify it in the spec
> > and call virtio_device_ready() before subsystem registration.
>
> hmm, i don't get what we need to clarify

E.g. is the driver not allowed to kick before DRIVER_OK, or, after
DRIVER_OK, should the device only process the buffer after a kick that
comes after DRIVER_OK (I think not)?

>
> > >
> > >
> > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > >
> > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > me the code is correct.
> > >
> > > OK.
> > >
> > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > drivers/net/caif/caif_virtio.c
> > > > > drivers/nvdimm/virtio_pmem.c
> > > >
> > > > The above looks fine and we have three more:
> > > >
> > > > arm_scmi: probe() doesn't use vq
> > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > it looks to me we need a device_ready before the kick.
> > > > virtio_rpmsg_bus.c: doesn't use vq
> > > >
> > > > I will post a patch for mac80211_hwsim.c.
> > > > Thanks
> > >
> > > Same comments for all of the above. Might linux not start using the
> > > device once it's registered?
> >
> > It depends on the specific subsystem.
> >
> > For the subsystem that can't use the device immediately, calling
> > virtio_device_ready() after the subsystem's registration should be
> > fine. E.g for the networking subsystem, the TX won't happen if
> > ndo_open() is not called, calling virtio_device_ready() after
> > netdev_register() seems to be fine.
>
> exactly
>
> > For the subsystem that can use the device immediately, if the
> > subsystem does not depend on the result of a request in the probe to
> > proceed, we are still fine. Since those requests will be processed after
> > DRIVER_OK.
>
> Well first won't driver code normally kick as well?

Kick itself is not blocked.

> And without kick, won't everything just be blocked?

It depends on the subsystem. E.g. a driver can choose to use a callback
instead of polling the used buffer in the probe.

>
>
> > For the rest we need to do virtio_device_ready() before registration.
> >
> > Thanks
>
> Then we can get an interrupt for an unregistered device.

It depends on the device. For a device that doesn't have an rx queue
(or device-to-driver queue), we are fine:

E.g. in virtio-blk:

        virtio_device_ready(vdev);

        err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
        if (err)
                goto out_cleanup_disk;

Thanks

>
>
> > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > >       ccw->flags = 0;
> > > > > > > >       ccw->count = sizeof(status);
> > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > +      * completed before ssch.
> > > > > > > > +      */
> > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > >       if (ret)
> > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > >   * */
> > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > >  {
> > > > > > > > +     /*
> > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > +      * vq->broken as true.
> > > > > > > > +      */
> > > > > > > > +     virtio_break_device(dev);
> > > > > > >
> > > > > > > So make this conditional
> > > > > > >
> > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > +
> > > > > > > >       dev->config->reset(dev);
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > >       dev->config_enabled = false;
> > > > > > > >       dev->config_change_pending = false;
> > > > > > > >
> > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > +
> > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > >       virtio_reset_device(dev);
> > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > >
> > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > -
> > > > > > > >       /*
> > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > >        * driver.
> > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > >       BUG_ON(status == 0);
> > > > > > > >
> > > > > > > > +     /*
> > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > +      */
> > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > >  {
> > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > >
> > > > > > > > +     /*
> > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > +      */
> > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > >       vq->we_own_ring = true;
> > > > > > > >       vq->notify = notify;
> > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > -     vq->broken = false;
> > > > > > > > +     vq->broken = true;
> > > > > > > >       vq->last_used_idx = 0;
> > > > > > > >       vq->event_triggered = false;
> > > > > > > >       vq->num_added = 0;
> > > > > > >
> > > > > > > and make this conditional
> > > > > > >
> > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > >               return IRQ_NONE;
> > > > > > > >       }
> > > > > > > >
> > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > +             return IRQ_NONE;
> > > > > > > > +     }
> > > > > > > >
> > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > >       if (vq->event)
> > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > >       vq->we_own_ring = false;
> > > > > > > >       vq->notify = notify;
> > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > -     vq->broken = false;
> > > > > > > > +     vq->broken = true;
> > > > > > > >       vq->last_used_idx = 0;
> > > > > > > >       vq->event_triggered = false;
> > > > > > > >       vq->num_added = 0;
> > > > > > >
> > > > > > > and make this conditional
> > > > > > >
> > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > >
> > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > +
> > > > > > > > +     /*
> > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > +      */
> > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > +     /*
> > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > +      * specific set_status() method.
> > > > > > > > +      *
> > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > +      * we won't lose any notification.
> > > > > > > > +      */
> > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  9:08                   ` Jason Wang
@ 2022-06-13  9:14                     ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  9:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > >
> > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > 2) done only for PCI transport
> > > > > > > > >
> > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > IRQ storm.
> > > > > > > > >
> > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > >
> > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > expensive.
> > > > > > > > >
> > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > >
> > > > > > > >
> > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > users if we find more buggy drivers.
> > > > > > > >
> > > > > > > > suppress_interrupt_hardening ?
> > > > > > >
> > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > report. Or we need a plan to enable it by default.
> > > > > > >
> > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > simply warn instead of disable it by default.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > I found 5 of these:
> > > > > >
> > > > > > drivers/bluetooth/virtio_bt.c
> > > > >
> > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > >
> > > >
> > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > work. Also, can linux start using the device immediately after
> > > > it's registered?
> > >
> > > So I think the driver is allowed to queue before DRIVER_OK.
> >
> > it's not allowed to kick
>
> Yes.
>
> >
> > > If yes,
> > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > for a well behaved device.
> >
> > your patches drop the interrupt though, it won't be just delayed.
>
> For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
>
> So for virtio bt, it works like:
>
> 1) driver queue buffer and kick
> 2) driver set DRIVER_OK
> 3) device start to process the buffer
> 4) device send a notification
>
> The only risk is that the virtqueue could be filled before DRIVER_OK,
> or anything I missed?

btw, hci has an open and close method and we do rx refill in
hdev->open, so we're probably fine here.
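
As a rough illustration of the sequence discussed above, the following is
a minimal userspace sketch in C, not kernel code. It only models the
vq->broken gate: a queue starts out broken, the ready step clears the flag
before DRIVER_OK is set, and reset sets the flag again. Every model_* name
is hypothetical, and locking, memory ordering and virtio_synchronize_cbs()
are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Same bit value as VIRTIO_CONFIG_S_DRIVER_OK; the name is illustrative. */
#define MODEL_S_DRIVER_OK 0x4u

/* One device with a single virtqueue, reduced to the fields that matter. */
struct model_dev {
	unsigned int status;	/* device status bits */
	bool vq_broken;		/* models vq->broken */
	unsigned int callbacks;	/* interrupts that reached the driver callback */
};

/* A freshly created virtqueue starts out broken, as in the patch. */
static inline void model_dev_init(struct model_dev *dev)
{
	dev->status = 0;
	dev->vq_broken = true;
	dev->callbacks = 0;
}

/* Models virtio_device_ready(): unbreak first, then set DRIVER_OK. */
static inline void model_device_ready(struct model_dev *dev)
{
	/* Plays the role of BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK). */
	assert(!(dev->status & MODEL_S_DRIVER_OK));
	dev->vq_broken = false;
	dev->status |= MODEL_S_DRIVER_OK;
}

/* Models virtio_reset_device(): break the queue, then reset the status. */
static inline void model_reset_device(struct model_dev *dev)
{
	dev->vq_broken = true;
	dev->status = 0;
}

/* Models the early return in vring_interrupt(). */
static inline enum model_irqreturn model_vring_interrupt(struct model_dev *dev)
{
	if (dev->vq_broken)
		return MODEL_IRQ_NONE;	/* dropped, not deferred */
	dev->callbacks++;
	return MODEL_IRQ_HANDLED;
}
```

In this model an interrupt that arrives before model_device_ready() gets
MODEL_IRQ_NONE and never reaches the callback. That is why queueing a
buffer before DRIVER_OK is only safe when the device is well behaved and
holds its notification until after DRIVER_OK.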

Thanks

>
> >
> > > If not, we need to clarify it in the spec
> > > and call virtio_device_ready() before subsystem registration.
> >
> > hmm, i don't get what we need to clarify
>
> E.g the driver is not allowed to kick or after DRIVER_OK should the
> device only process the buffer after a kick after DRIVER_OK (I think
> no)?
>
> >
> > > >
> > > >
> > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > >
> > > > > It calles virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > me the code is correct.
> > > >
> > > > OK.
> > > >
> > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > drivers/net/caif/caif_virtio.c
> > > > > > drivers/nvdimm/virtio_pmem.c
> > > > >
> > > > > The above looks fine and we have three more:
> > > > >
> > > > > arm_scmi: probe() doesn't use vq
> > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > it looks to me we need a device_ready before the kick.
> > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > >
> > > > > I will post a patch for mac80211_hwsim.c.
> > > > > Thanks
> > > >
> > > > Same comments for all of the above. Might linux not start using the
> > > > device once it's registered?
> > >
> > > It depends on the specific subsystem.
> > >
> > > For the subsystem that can't use the device immediately, calling
> > > virtio_device_ready() after the subsystem's registration should be
> > > fine. E.g for the networking subsystem, the TX won't happen if
> > > ndo_open() is not called, calling virtio_device_ready() after
> > > netdev_register() seems to be fine.
> >
> > exactly
> >
> > > For the subsystem that can use the device immediately, if the
> > > subsystem does not depend on the result of a request in the probe to
> > > > proceed, we are still fine. Since those requests will be processed after
> > > DRIVER_OK.
> >
> > Well first won't driver code normally kick as well?
>
> Kick itself is not blocked.
>
> > And without kick, won't everything just be blocked?
>
> It depends on the subsystem. E.g driver can choose to use a callback
> instead of polling the used buffer in the probe.
>
> >
> >
> > > For the rest we need to do virtio_device_ready() before registration.
> > >
> > > Thanks
> >
> > Then we can get an interrupt for an unregistered device.
>
> It depends on the device. For the device that doesn't have an rx queue
> (or device to driver queue), we are fine:
>
> E.g in virtio-blk:
>
>         virtio_device_ready(vdev);
>
>         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
>         if (err)
>                 goto out_cleanup_disk;
>
> Thanks
>
> >
> >
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       ccw->flags = 0;
> > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > +      * completed before ssch.
> > > > > > > > > +      */
> > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > >       if (ret)
> > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > >   * */
> > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > >  {
> > > > > > > > > +     /*
> > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > +      * vq->broken as true.
> > > > > > > > > +      */
> > > > > > > > > +     virtio_break_device(dev);
> > > > > > > >
> > > > > > > > So make this conditional
> > > > > > > >
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +
> > > > > > > > >       dev->config->reset(dev);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       dev->config_enabled = false;
> > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > >
> > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > +
> > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > >
> > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > -
> > > > > > > > >       /*
> > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > >        * driver.
> > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > >  {
> > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > >               return IRQ_NONE;
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > +     }
> > > > > > > > >
> > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > >       if (vq->event)
> > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > >
> > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > +
> > > > > > > > > +     /*
> > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > +      */
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > +     /*
> > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > +      * specific set_status() method.
> > > > > > > > > +      *
> > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > +      */
> > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  9:14                     ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-13  9:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > >
> > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > 2) done only for PCI transport
> > > > > > > > >
> > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > IRQ storm.
> > > > > > > > >
> > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > >
> > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > expensive.
> > > > > > > > >
> > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > >
> > > > > > > >
> > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > users if we find more buggy drivers.
> > > > > > > >
> > > > > > > > suppress_interrupt_hardening ?
> > > > > > >
> > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > report. Or we need a plan to enable it by default.
> > > > > > >
> > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > simply warn instead of disable it by default.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > I found 5 of these:
> > > > > >
> > > > > > drivers/bluetooth/virtio_bt.c
> > > > >
> > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > >
> > > >
> > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > work. Also, can linux start using the device immediately after
> > > > it's registered?
> > >
> > > So I think the driver is allowed to queue before DRIVER_OK.
> >
> > it's not allowed to kick
>
> Yes.
>
> >
> > > If yes,
> > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > for a well behaved device.
> >
> > your patches drop the interrupt though, it won't be just delayed.
>
> For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
>
> So for virtio bt, it works like:
>
> 1) driver queue buffer and kick
> 2) driver set DRIVER_OK
> 3) device start to process the buffer
> 4) device send a notification
>
> The only risk is that the virtqueue could be filled before DRIVER_OK,
> or anything I missed?

btw, hci has an open and close method and we do rx refill in
hdev->open, so we're probably fine here.

Thanks

>
> >
> > > If not, we need to clarify it in the spec
> > > and call virtio_device_ready() before subsystem registration.
> >
> > hmm, i don't get what we need to clarify
>
> E.g the driver is not allowed to kick or after DRIVER_OK should the
> device only process the buffer after a kick after DRIVER_OK (I think
> no)?
>
> >
> > > >
> > > >
> > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > >
> > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > me the code is correct.
> > > >
> > > > OK.
> > > >
> > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > drivers/net/caif/caif_virtio.c
> > > > > > drivers/nvdimm/virtio_pmem.c
> > > > >
> > > > > The above looks fine and we have three more:
> > > > >
> > > > > arm_scmi: probe() doesn't use vq
> > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > it looks to me we need a device_ready before the kick.
> > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > >
> > > > > I will post a patch for mac80211_hwsim.c.
> > > > > Thanks
> > > >
> > > > Same comments for all of the above. Might linux not start using the
> > > > device once it's registered?
> > >
> > > It depends on the specific subsystem.
> > >
> > > For the subsystem that can't use the device immediately, calling
> > > virtio_device_ready() after the subsystem's registration should be
> > > fine. E.g for the networking subsystem, the TX won't happen if
> > > ndo_open() is not called, calling virtio_device_ready() after
> > > netdev_register() seems to be fine.
> >
> > exactly
> >
> > > For the subsystem that can use the device immediately, if the
> > > subsystem does not depend on the result of a request in the probe to
> > > > proceed, we are still fine. Since those requests will be processed after
> > > DRIVER_OK.
> >
> > Well first won't driver code normally kick as well?
>
> Kick itself is not blocked.
>
> > And without kick, won't everything just be blocked?
>
> It depends on the subsystem. E.g driver can choose to use a callback
> instead of polling the used buffer in the probe.
>
> >
> >
> > > For the rest we need to do virtio_device_ready() before registration.
> > >
> > > Thanks
> >
> > Then we can get an interrupt for an unregistered device.
>
> It depends on the device. For the device that doesn't have an rx queue
> (or device to driver queue), we are fine:
>
> E.g in virtio-blk:
>
>         virtio_device_ready(vdev);
>
>         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
>         if (err)
>                 goto out_cleanup_disk;
>
> Thanks
>
> >
> >
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       ccw->flags = 0;
> > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > +      * completed before ssch.
> > > > > > > > > +      */
> > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > >       if (ret)
> > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > >   * */
> > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > >  {
> > > > > > > > > +     /*
> > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > +      * vq->broken as true.
> > > > > > > > > +      */
> > > > > > > > > +     virtio_break_device(dev);
> > > > > > > >
> > > > > > > > So make this conditional
> > > > > > > >
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +
> > > > > > > > >       dev->config->reset(dev);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       dev->config_enabled = false;
> > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > >
> > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > +
> > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > >
> > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > -
> > > > > > > > >       /*
> > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > >        * driver.
> > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > >  {
> > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > >               return IRQ_NONE;
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > +     }
> > > > > > > > >
> > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > >       if (vq->event)
> > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > >
> > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > +
> > > > > > > > > +     /*
> > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > +      */
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > +     /*
> > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > +      * specific set_status() method.
> > > > > > > > > +      *
> > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > +      */
> > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >


* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  9:08                   ` Jason Wang
@ 2022-06-13  9:26                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  9:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 05:08:20PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > >
> > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > 2) done only for PCI transport
> > > > > > > > >
> > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > when vq->broken is true. In that case it now returns IRQ_NONE, which
> > > > > > > > > makes the interrupt core aware of such invalid interrupts and helps
> > > > > > > > > prevent an IRQ storm.
> > > > > > > > >
> > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > >
> > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > expensive.
> > > > > > > > >
> > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > >
> > > > > > > >
> > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > users if we find more buggy drivers.
> > > > > > > >
> > > > > > > > suppress_interrupt_hardening ?
> > > > > > >
> > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > report. Or we need a plan to enable it by default.
> > > > > > >
> > > > > > > It's rc2, how about waiting for 1 or 2 more rcs? Or it may be better if
> > > > > > > we simply warn instead of disabling it by default.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > I found 5 of these:
> > > > > >
> > > > > > drivers/bluetooth/virtio_bt.c
> > > > >
> > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > >
> > > >
> > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > work. Also, can linux start using the device immediately after
> > > > it's registered?
> > >
> > > So I think the driver is allowed to queue before DRIVER_OK.
> >
> > it's not allowed to kick
> 
> Yes.
> 
> >
> > > If yes,
> > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > for a well behaved device.
> >
> > your patches drop the interrupt though, it won't be just delayed.
> 
> For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> 
> So for virtio bt, it works like:
> 
> 1) driver queues a buffer and kicks
> 2) driver sets DRIVER_OK
> 3) device starts to process the buffer
> 4) device sends a notification
> 
> The only risk is that the virtqueue could be filled before DRIVER_OK,
> or anything I missed?
> 
> >
> > > If not, we need to clarify it in the spec
> > > and call virtio_device_ready() before subsystem registration.
> >
> > hmm, i don't get what we need to clarify
> 
> E.g. whether the driver is not allowed to kick before DRIVER_OK, or
> whether the device should only process a buffer after a kick that
> comes after DRIVER_OK (I think no)?

I am not sure I understand. Are you asking whether the device
must check vqs for buffers upon DRIVER_OK? I don't think so:
if the driver wants buffers processed, it must kick after DRIVER_OK.

And kicking before DRIVER_OK is out of spec.
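
To make the ordering concrete, here is a minimal userspace sketch in
plain C (not kernel code). Every model_* name is hypothetical, and the
device behaviour is an assumption for illustration only: it ignores
kicks that arrive before DRIVER_OK and does not scan the vqs when
DRIVER_OK is set. Under those assumptions, a buffer that was only
kicked before DRIVER_OK is never processed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	bool driver_ok;  /* DRIVER_OK status bit as seen by the device */
	int queued;      /* buffers the driver has made available */
	int processed;   /* buffers the device has consumed */
};

/* Driver side: make one buffer available, without notifying. */
void model_queue(struct model_dev *d)
{
	d->queued++;
}

/* Driver side: set DRIVER_OK. Assumption: the device does not scan vqs here. */
void model_set_driver_ok(struct model_dev *d)
{
	d->driver_ok = true;
}

/* Driver side: kick. Assumption: the device ignores kicks before DRIVER_OK. */
void model_kick(struct model_dev *d)
{
	if (!d->driver_ok)
		return;
	d->processed = d->queued;
	assert(d->processed <= d->queued);
}

/* Out-of-spec ordering: queue, kick, then DRIVER_OK. */
int model_kick_before_ok(void)
{
	struct model_dev d = {0};

	model_queue(&d);
	model_kick(&d);
	model_set_driver_ok(&d);
	return d.processed;
}

/* In-spec ordering: queue, DRIVER_OK, then kick. */
int model_kick_after_ok(void)
{
	struct model_dev d = {0};

	model_queue(&d);
	model_set_driver_ok(&d);
	model_kick(&d);
	return d.processed;
}
```

In this model, model_kick_before_ok() returns 0 and
model_kick_after_ok() returns 1. A real device may react differently
to an out-of-spec early kick, which is why a driver should not rely
on it.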


> >
> > > >
> > > >
> > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > >
> > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > me the code is correct.
> > > >
> > > > OK.
> > > >
> > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > drivers/net/caif/caif_virtio.c
> > > > > > drivers/nvdimm/virtio_pmem.c
> > > > >
> > > > > The above looks fine and we have three more:
> > > > >
> > > > > arm_scmi: probe() doesn't use vq
> > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > it looks to me we need a device_ready before the kick.
> > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > >
> > > > > I will post a patch for mac80211_hwsim.c.
> > > > > Thanks
> > > >
> > > > Same comments for all of the above. Might linux not start using the
> > > > device once it's registered?
> > >
> > > It depends on the specific subsystem.
> > >
> > > For the subsystem that can't use the device immediately, calling
> > > virtio_device_ready() after the subsystem's registration should be
> > > fine. E.g. for the networking subsystem, TX won't happen until
> > > ndo_open() is called, so calling virtio_device_ready() after
> > > register_netdev() seems to be fine.
> >
> > exactly
> >
> > > For the subsystem that can use the device immediately, if the
> > > subsystem does not depend on the result of a request in the probe to
> > > proceed, we are still fine, since those requests will be processed
> > > after DRIVER_OK.
> >
> > Well first won't driver code normally kick as well?
> 
> Kick itself is not blocked.

It is out of spec though.

> > And without kick, won't everything just be blocked?
> 
> It depends on the subsystem. E.g driver can choose to use a callback
> instead of polling the used buffer in the probe.
> 
> >
> >
> > > For the rest we need to do virtio_device_ready() before registration.
> > >
> > > Thanks
> >
> > Then we can get an interrupt for an unregistered device.
> 
> It depends on the device. For the device that doesn't have an rx queue
> (or device to driver queue), we are fine:
> 
> E.g in virtio-blk:
> 
>         virtio_device_ready(vdev);
> 
>         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
>         if (err)
>                 goto out_cleanup_disk;
> 
> Thanks

yes - as long as no buffers are used, no callback is expected.
However wasn't the point of your patches to handle a malicious device?
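
A minimal userspace sketch of that concern, in plain C (not kernel
code). The model_* names are hypothetical; the only idea taken from
the patch is that the interrupt handler returns early while
vq->broken is set. The real vring_interrupt() also checks for used
buffers first, which this model leaves out. The sketch assumes a
misbehaving device that raises one interrupt between the two probe
steps.

```c
#include <assert.h>
#include <stdbool.h>

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Hypothetical model of a virtqueue; not the kernel's struct. */
struct model_vq {
	bool broken;          /* true until the driver declares itself ready */
	bool registered;      /* has subsystem registration finished? */
	int callbacks;        /* how many times the vq callback ran */
	int early_callbacks;  /* callbacks that ran before registration */
};

void model_vq_init(struct model_vq *vq)
{
	*vq = (struct model_vq){ .broken = true };
}

/* Stands in for virtio_device_ready(): unbreak the queue. */
void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Stands in for the subsystem registration step in probe(). */
void model_register(struct model_vq *vq)
{
	vq->registered = true;
}

/* Stands in for vring_interrupt(): drop the interrupt while broken. */
enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;
	vq->callbacks++;
	if (!vq->registered)
		vq->early_callbacks++;
	return MODEL_IRQ_HANDLED;
}

/* Ready before registration: the spurious interrupt reaches the callback. */
int model_early_callbacks_ready_first(void)
{
	struct model_vq vq;

	model_vq_init(&vq);
	model_device_ready(&vq);
	model_vring_interrupt(&vq);  /* spurious interrupt from the device */
	model_register(&vq);
	return vq.early_callbacks;
}

/* Registration before ready: the spurious interrupt is dropped. */
int model_early_callbacks_register_first(void)
{
	struct model_vq vq;
	enum model_irqreturn ret;

	model_vq_init(&vq);
	model_register(&vq);
	ret = model_vring_interrupt(&vq);  /* spurious interrupt, dropped */
	assert(ret == MODEL_IRQ_NONE);
	model_device_ready(&vq);
	return vq.early_callbacks;
}
```

In this model, calling ready before registration lets one callback
run against an unregistered device (the first function returns 1),
while the other ordering drops it (the second returns 0).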

> >
> >
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       ccw->flags = 0;
> > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > +      * completed before ssch.
> > > > > > > > > +      */
> > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > >       if (ret)
> > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > >   * */
> > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > >  {
> > > > > > > > > +     /*
> > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > +      * vq->broken as true.
> > > > > > > > > +      */
> > > > > > > > > +     virtio_break_device(dev);
> > > > > > > >
> > > > > > > > So make this conditional
> > > > > > > >
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +
> > > > > > > > >       dev->config->reset(dev);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       dev->config_enabled = false;
> > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > >
> > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > +
> > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > >
> > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > -
> > > > > > > > >       /*
> > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > >        * driver.
> > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > >  {
> > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > >               return IRQ_NONE;
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > +     }
> > > > > > > > >
> > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > >       if (vq->event)
> > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > >
> > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > +
> > > > > > > > > +     /*
> > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > +      */
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > +     /*
> > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > +      * specific set_status() method.
> > > > > > > > > +      *
> > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > +      */
> > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >



* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  9:26                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  9:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 05:08:20PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > >
> > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > 2) done only for PCI transport
> > > > > > > > >
> > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > when vq->broken is true. In that case it now returns IRQ_NONE, which
> > > > > > > > > makes the interrupt core aware of such invalid interrupts and helps
> > > > > > > > > prevent an IRQ storm.
> > > > > > > > >
> > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > >
> > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > expensive.
> > > > > > > > >
> > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > >
> > > > > > > >
> > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > users if we find more buggy drivers.
> > > > > > > >
> > > > > > > > suppress_interrupt_hardening ?
> > > > > > >
> > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > report. Or we need a plan to enable it by default.
> > > > > > >
> > > > > > > It's rc2, how about waiting for 1 or 2 more rcs? Or it may be better if
> > > > > > > we simply warn instead of disabling it by default.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > I found 5 of these:
> > > > > >
> > > > > > drivers/bluetooth/virtio_bt.c
> > > > >
> > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > >
> > > >
> > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > work. Also, can linux start using the device immediately after
> > > > it's registered?
> > >
> > > So I think the driver is allowed to queue before DRIVER_OK.
> >
> > it's not allowed to kick
> 
> Yes.
> 
> >
> > > If yes,
> > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > for a well behaved device.
> >
> > your patches drop the interrupt though, it won't be just delayed.
> 
> For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> 
> So for virtio bt, it works like:
> 
> 1) driver queues a buffer and kicks
> 2) driver sets DRIVER_OK
> 3) device starts to process the buffer
> 4) device sends a notification
> 
> The only risk is that the virtqueue could be filled before DRIVER_OK,
> or anything I missed?
> 
> >
> > > If not, we need to clarify it in the spec
> > > and call virtio_device_ready() before subsystem registration.
> >
> > hmm, i don't get what we need to clarify
> 
> E.g the driver is not allowed to kick or after DRIVER_OK should the
> device only process the buffer after a kick after DRIVER_OK (I think
> no)?

I am not sure I understand. Are you asking whether the device
must check the vqs for buffers upon DRIVER_OK? I don't think so:
if the driver wants buffers processed, it must kick after DRIVER_OK.

And kicking before DRIVER_OK is out of spec.
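
To illustrate the rule above, here is a minimal userspace sketch (not
kernel code). It assumes a device that honours a kick only after
DRIVER_OK and does not scan the queue when DRIVER_OK is set. The
toy_* names and struct toy_device are hypothetical; only the status
bit value matches VIRTIO_CONFIG_S_DRIVER_OK.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_S_DRIVER_OK 0x4 /* same bit value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical toy device, not a kernel structure. */
struct toy_device {
	unsigned int status; /* device status bits */
	int queued;          /* buffers made available by the driver */
	int processed;       /* buffers the device has consumed */
};

/* The driver may add buffers at any time in this model. */
void toy_queue_buffer(struct toy_device *d)
{
	d->queued++;
}

/* Setting DRIVER_OK only flips the status bit; the queue is not scanned. */
void toy_set_driver_ok(struct toy_device *d)
{
	d->status |= TOY_S_DRIVER_OK;
}

/* A kick is honoured only after DRIVER_OK. Returns true if the device acted. */
bool toy_kick(struct toy_device *d)
{
	if (!(d->status & TOY_S_DRIVER_OK))
		return false; /* early kick: ignored by this toy device */
	d->processed += d->queued;
	d->queued = 0;
	return true;
}
```

In this model, buffers queued and kicked before DRIVER_OK stay
unprocessed until the driver kicks again after DRIVER_OK.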


> >
> > > >
> > > >
> > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > >
> > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > me the code is correct.
> > > >
> > > > OK.
> > > >
> > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > drivers/net/caif/caif_virtio.c
> > > > > > drivers/nvdimm/virtio_pmem.c
> > > > >
> > > > > The above looks fine and we have three more:
> > > > >
> > > > > arm_scmi: probe() doesn't use vq
> > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > it looks to me we need a device_ready before the kick.
> > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > >
> > > > > I will post a patch for mac80211_hwsim.c.
> > > > > Thanks
> > > >
> > > > Same comments for all of the above. Might linux not start using the
> > > > device once it's registered?
> > >
> > > It depends on the specific subsystem.
> > >
> > > For the subsystem that can't use the device immediately, calling
> > > virtio_device_ready() after the subsystem's registration should be
> > > fine. E.g for the networking subsystem, the TX won't happen if
> > > ndo_open() is not called, calling virtio_device_ready() after
> > > netdev_register() seems to be fine.
> >
> > exactly
> >
> > > For the subsystem that can use the device immediately, if the
> > > subsystem does not depend on the result of a request in the probe to
> > > proceed, we are still fine. Since those requests will be processed after
> > > DRIVER_OK.
> >
> > Well first won't driver code normally kick as well?
> 
> Kick itself is not blocked.

It is out of spec though.

> > And without kick, won't everything just be blocked?
> 
> It depends on the subsystem. E.g driver can choose to use a callback
> instead of polling the used buffer in the probe.
> 
> >
> >
> > > For the rest we need to do virtio_device_ready() before registration.
> > >
> > > Thanks
> >
> > Then we can get an interrupt for an unregistered device.
> 
> It depends on the device. For the device that doesn't have an rx queue
> (or device to driver queue), we are fine:
> 
> E.g in virtio-blk:
> 
>         virtio_device_ready(vdev);
> 
>         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
>         if (err)
>                 goto out_cleanup_disk;
> 
> Thanks

yes - as long as no buffers are used, no callback is expected.
However wasn't the point of your patches to handle a malicious device?
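
For reference, the gate the patch relies on can be sketched in
userspace as below. This is a single-threaded toy model, so it leaves
out virtio_synchronize_cbs() and every memory-ordering concern; the
toy_* names are hypothetical and only mirror the patch's vq->broken
logic.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

/* Hypothetical stand-in for the vq state touched by the patch. */
struct toy_vq {
	bool broken;   /* true until the driver is ready, and again on reset */
	int callbacks; /* number of times the vq callback ran */
};

/* Mirrors "vq->broken = true" at virtqueue creation. */
void toy_vq_init(struct toy_vq *vq)
{
	vq->broken = true;
	vq->callbacks = 0;
}

/* Mirrors the unbreak step done before DRIVER_OK is set. */
void toy_device_ready(struct toy_vq *vq)
{
	vq->broken = false;
}

/* Mirrors the break step done on reset. */
void toy_reset_device(struct toy_vq *vq)
{
	vq->broken = true;
}

/* Interrupts seen while the vq is broken are rejected, not handled. */
enum toy_irqreturn toy_vring_interrupt(struct toy_vq *vq)
{
	if (vq->broken)
		return TOY_IRQ_NONE;
	vq->callbacks++;
	return TOY_IRQ_HANDLED;
}
```

An interrupt raised before toy_device_ready() or after
toy_reset_device() never reaches the callback in this model, whether
or not the device is well behaved.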

> >
> >
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       ccw->flags = 0;
> > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > +      * completed before ssch.
> > > > > > > > > +      */
> > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > >       if (ret)
> > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > >   * */
> > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > >  {
> > > > > > > > > +     /*
> > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > +      * vq->broken as true.
> > > > > > > > > +      */
> > > > > > > > > +     virtio_break_device(dev);
> > > > > > > >
> > > > > > > > So make this conditional
> > > > > > > >
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +
> > > > > > > > >       dev->config->reset(dev);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       dev->config_enabled = false;
> > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > >
> > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > +
> > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > >
> > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > -
> > > > > > > > >       /*
> > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > >        * driver.
> > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > >  {
> > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > >
> > > > > > > > > +     /*
> > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > +      */
> > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > >               return IRQ_NONE;
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > +     }
> > > > > > > > >
> > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > >       if (vq->event)
> > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > >       vq->notify = notify;
> > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > -     vq->broken = false;
> > > > > > > > > +     vq->broken = true;
> > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > >       vq->event_triggered = false;
> > > > > > > > >       vq->num_added = 0;
> > > > > > > >
> > > > > > > > and make this conditional
> > > > > > > >
> > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > >
> > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > +
> > > > > > > > > +     /*
> > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > +      */
> > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > +     /*
> > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > +      * specific set_status() method.
> > > > > > > > > +      *
> > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > +      */
> > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  9:14                     ` Jason Wang
@ 2022-06-13  9:27                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  9:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > >
> > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > >
> > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > IRQ storm.
> > > > > > > > > >
> > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > >
> > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > expensive.
> > > > > > > > > >
> > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > users if we find more buggy drivers.
> > > > > > > > >
> > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > >
> > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > >
> > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > simply warn instead of disable it by default.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > I found 5 of these:
> > > > > > >
> > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > >
> > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > >
> > > > >
> > > > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > > work. Also, can linux start using the device immediately after
> > > > > it's registered?
> > > >
> > > > So I think the driver is allowed to queue before DRIVER_OK.
> > >
> > > it's not allowed to kick
> >
> > Yes.
> >
> > >
> > > > If yes,
> > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > for a well behaved device.
> > >
> > > your patches drop the interrupt though, it won't be just delayed.
> >
> > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> >
> > So for virtio bt, it works like:
> >
> > 1) driver queues a buffer and kicks
> > 2) driver sets DRIVER_OK
> > 3) device starts to process the buffer
> > 4) device sends a notification
> >
> > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > or anything I missed?
> 
> btw, hci has an open and close method and we do rx refill in
> hdev->open, so we're probably fine here.
> 
> Thanks


Sounds good. Now to audit the rest of them from this POV ;)

 drivers/i2c/busses/i2c-virtio.c
 drivers/net/caif/caif_virtio.c
 drivers/nvdimm/virtio_pmem.c
 arm_scmi
 virtio_rpmsg_bus.c
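
One way to frame that audit is the toy model below. It assumes a
driver whose probe() registers with its subsystem without calling
virtio_device_ready(), with DRIVER_OK set only once probe() has
returned, and whose open() refills rx and kicks. All names are
hypothetical; the point is only to count kicks issued before
DRIVER_OK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state for the audit, not a kernel structure. */
struct toy_drv {
	bool registered; /* subsystem registration done in probe() */
	bool driver_ok;  /* DRIVER_OK has been set */
	int rx_buffers;  /* rx buffers posted by open() */
	int early_kicks; /* kicks issued before DRIVER_OK: out of spec */
};

/* probe() registers with the subsystem but does not mark the device ready. */
void toy_probe(struct toy_drv *t)
{
	t->registered = true;
}

/* Stand-in for DRIVER_OK being set after probe() has returned. */
void toy_core_ready(struct toy_drv *t)
{
	t->driver_ok = true;
}

/* open(): refill rx and kick. Returns -1 if the device is not registered. */
int toy_open(struct toy_drv *t)
{
	if (!t->registered)
		return -1;
	t->rx_buffers += 4;       /* arbitrary refill size for the sketch */
	if (!t->driver_ok)
		t->early_kicks++; /* this is the case the audit looks for */
	return 0;
}
```

If the subsystem can call open() between registration and DRIVER_OK,
early_kicks becomes non-zero; if open() can only run afterwards, it
stays at zero.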



> >
> > >
> > > > If not, we need to clarify it in the spec
> > > > and call virtio_device_ready() before subsystem registration.
> > >
> > > hmm, i don't get what we need to clarify
> >
> > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > device only process the buffer after a kick after DRIVER_OK (I think
> > no)?
> >
> > >
> > > > >
> > > > >
> > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > >
> > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > me the code is correct.
> > > > >
> > > > > OK.
> > > > >
> > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > >
> > > > > > The above looks fine and we have three more:
> > > > > >
> > > > > > arm_scmi: probe() doesn't use vq
> > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > it looks to me we need a device_ready before the kick.
> > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > >
> > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > Thanks
> > > > >
> > > > > Same comments for all of the above. Might linux not start using the
> > > > > device once it's registered?
> > > >
> > > > It depends on the specific subsystem.
> > > >
> > > > For the subsystem that can't use the device immediately, calling
> > > > virtio_device_ready() after the subsystem's registration should be
> > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > netdev_register() seems to be fine.
> > >
> > > exactly
> > >
> > > > For the subsystem that can use the device immediately, if the
> > > > subsystem does not depend on the result of a request in the probe to
> > > > > proceed, we are still fine. Since those requests will be processed after
> > > > DRIVER_OK.
> > >
> > > Well first won't driver code normally kick as well?
> >
> > Kick itself is not blocked.
> >
> > > And without kick, won't everything just be blocked?
> >
> > It depends on the subsystem. E.g driver can choose to use a callback
> > instead of polling the used buffer in the probe.
> >
> > >
> > >
> > > > For the rest we need to do virtio_device_ready() before registration.
> > > >
> > > > Thanks
> > >
> > > Then we can get an interrupt for an unregistered device.
> >
> > It depends on the device. For the device that doesn't have an rx queue
> > (or device to driver queue), we are fine:
> >
> > E.g in virtio-blk:
> >
> >         virtio_device_ready(vdev);
> >
> >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> >         if (err)
> >                 goto out_cleanup_disk;
> >
> > Thanks
> >
> > >
> > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > +      */
> > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > >       if (ret)
> > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > >   * */
> > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > >  {
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > >
> > > > > > > > > So make this conditional
> > > > > > > > >
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +
> > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > >
> > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > +
> > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > >
> > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > -
> > > > > > > > > >       /*
> > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > >        * driver.
> > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > >  {
> > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > >       }
> > > > > > > > > >
> > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > +     }
> > > > > > > > > >
> > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > >       if (vq->event)
> > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > >
> > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > +
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > +      *
> > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > +      */
> > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > >
> > >


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-13  9:27                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-13  9:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > >
> > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > >
> > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > IRQ storm.
> > > > > > > > > >
> > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > >
> > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > expensive.
> > > > > > > > > >
> > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > users if we find more buggy drivers.
> > > > > > > > >
> > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > >
> > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > >
> > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > simply warn instead of disable it by default.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > I found 5 of these:
> > > > > > >
> > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > >
> > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > >
> > > > >
> > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > work. Also, can linux start using the device immediately after
> > > > > it's registered?
> > > >
> > > > So I think the driver is allowed to queue before DRIVER_OK.
> > >
> > > it's not allowed to kick
> >
> > Yes.
> >
> > >
> > > > If yes,
> > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > for a well behaved device.
> > >
> > > your patches drop the interrupt though, it won't be just delayed.
> >
> > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> >
> > So for virtio bt, it works like:
> >
> > 1) driver queues buffer and kicks
> > 2) driver sets DRIVER_OK
> > 3) device starts to process the buffer
> > 4) device sends a notification
> >
> > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > or anything I missed?
> 
> btw, hci has an open and close method and we do rx refill in
> hdev->open, so we're probably fine here.
> 
> Thanks


Sounds good. Now to audit the rest of them from this POV ;)

 drivers/i2c/busses/i2c-virtio.c
 drivers/net/caif/caif_virtio.c
 drivers/nvdimm/virtio_pmem.c
 arm_scmi
 virtio_rpmsg_bus.c



> >
> > >
> > > > If not, we need to clarify it in the spec
> > > > and call virtio_device_ready() before subsystem registration.
> > >
> > > hmm, i don't get what we need to clarify
> >
> > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > device only process the buffer after a kick after DRIVER_OK (I think
> > no)?
> >
> > >
> > > > >
> > > > >
> > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > >
> > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > me the code is correct.
> > > > >
> > > > > OK.
> > > > >
> > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > >
> > > > > > The above looks fine and we have three more:
> > > > > >
> > > > > > arm_scmi: probe() doesn't use vq
> > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > it looks to me we need a device_ready before the kick.
> > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > >
> > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > Thanks
> > > > >
> > > > > Same comments for all of the above. Might linux not start using the
> > > > > device once it's registered?
> > > >
> > > > It depends on the specific subsystem.
> > > >
> > > > For the subsystem that can't use the device immediately, calling
> > > > virtio_device_ready() after the subsystem's registration should be
> > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > register_netdev() seems to be fine.
> > >
> > > exactly
> > >
> > > > For the subsystem that can use the device immediately, if the
> > > > subsystem does not depend on the result of a request in the probe to
> > > > proceed, we are still fine. Since those requests will be processed after
> > > > DRIVER_OK.
> > >
> > > Well first won't driver code normally kick as well?
> >
> > Kick itself is not blocked.
> >
> > > And without kick, won't everything just be blocked?
> >
> > It depends on the subsystem. E.g driver can choose to use a callback
> > instead of polling the used buffer in the probe.
> >
> > >
> > >
> > > > For the rest we need to do virtio_device_ready() before registration.
> > > >
> > > > Thanks
> > >
> > > Then we can get an interrupt for an unregistered device.
> >
> > It depends on the device. For the device that doesn't have an rx queue
> > (or device to driver queue), we are fine:
> >
> > E.g in virtio-blk:
> >
> >         virtio_device_ready(vdev);
> >
> >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> >         if (err)
> >                 goto out_cleanup_disk;
> >
> > Thanks
> >
> > >
> > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > +      */
> > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > >       if (ret)
> > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > >   * */
> > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > >  {
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > >
> > > > > > > > > So make this conditional
> > > > > > > > >
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +
> > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > >
> > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > +
> > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > >
> > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > -
> > > > > > > > > >       /*
> > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > >        * driver.
> > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > >  {
> > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > >       }
> > > > > > > > > >
> > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > +     }
> > > > > > > > > >
> > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > >       if (vq->event)
> > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > >
> > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > +
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > +      *
> > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > +      */
> > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > >
> > >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  9:26                     ` Michael S. Tsirkin
@ 2022-06-14  7:19                       ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-14  7:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Oberparleiter, Cindy Lu, Paul E. McKenney, linux-s390,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Mon, Jun 13, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 05:08:20PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > >
> > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > >
> > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > IRQ storm.
> > > > > > > > > >
> > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > >
> > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > expensive.
> > > > > > > > > >
> > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > users if we find more buggy drivers.
> > > > > > > > >
> > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > >
> > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > >
> > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > simply warn instead of disable it by default.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > I found 5 of these:
> > > > > > >
> > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > >
> > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > >
> > > > >
> > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > work. Also, can linux start using the device immediately after
> > > > > it's registered?
> > > >
> > > > So I think the driver is allowed to queue before DRIVER_OK.
> > >
> > > it's not allowed to kick
> >
> > Yes.
> >
> > >
> > > > If yes,
> > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > for a well behaved device.
> > >
> > > your patches drop the interrupt though, it won't be just delayed.
> >
> > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> >
> > So for virtio bt, it works like:
> >
> > 1) driver queues buffer and kicks
> > 2) driver sets DRIVER_OK
> > 3) device starts to process the buffer
> > 4) device sends a notification
> >
> > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > or anything I missed?
> >
> > >
> > > > If not, we need to clarify it in the spec
> > > > and call virtio_device_ready() before subsystem registration.
> > >
> > > hmm, i don't get what we need to clarify
> >
> > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > device only process the buffer after a kick after DRIVER_OK (I think
> > no)?
>
> I am not sure I understand. Are you asking whether device
> must check vqs for buffers upon DRIVER_OK?

Yes.

> I don't think so,
> if driver wants buffers processed it must kick after DRIVER_OK.

Ok.

>
> And kicking before DRIVER_OK is out of spec.

Right.
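
The ordering agreed on above can be illustrated with a minimal userspace
sketch. This is not kernel code: struct toy_dev and the toy_*() helpers are
hypothetical names, and the device behaviour (ignoring a kick that arrives
before DRIVER_OK, not scanning the queue when DRIVER_OK is set) is an
assumption taken from this discussion rather than from any transport.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_S_DRIVER_OK 4	/* same bit value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical model of a device with a single virtqueue. */
struct toy_dev {
	unsigned int status;	/* device status bits */
	unsigned int avail;	/* buffers queued by the driver */
	unsigned int used;	/* buffers consumed by the device */
};

/* Queueing before DRIVER_OK is allowed: it only touches driver memory. */
static void toy_add_buf(struct toy_dev *d)
{
	d->avail++;
}

/* Setting DRIVER_OK does not make the modelled device scan the queue. */
static void toy_device_ready(struct toy_dev *d)
{
	assert(!(d->status & TOY_S_DRIVER_OK));	/* stands in for BUG_ON() */
	d->status |= TOY_S_DRIVER_OK;
}

/*
 * Driver-to-device notification. Returns true if the device acted on it.
 * A kick before DRIVER_OK is out of spec; this model assumes the device
 * simply ignores it.
 */
static bool toy_kick(struct toy_dev *d)
{
	if (!(d->status & TOY_S_DRIVER_OK))
		return false;
	d->used += d->avail;
	d->avail = 0;
	return true;
}
```

In this model a buffer queued early stays pending until the driver kicks
again after toy_device_ready(), which is why a driver that wants its buffers
processed has to kick after DRIVER_OK.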

>
>
> > >
> > > > >
> > > > >
> > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > >
> > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > me the code is correct.
> > > > >
> > > > > OK.
> > > > >
> > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > >
> > > > > > The above looks fine and we have three more:
> > > > > >
> > > > > > arm_scmi: probe() doesn't use vq
> > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > it looks to me we need a device_ready before the kick.
> > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > >
> > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > Thanks
> > > > >
> > > > > Same comments for all of the above. Might linux not start using the
> > > > > device once it's registered?
> > > >
> > > > It depends on the specific subsystem.
> > > >
> > > > For the subsystem that can't use the device immediately, calling
> > > > virtio_device_ready() after the subsystem's registration should be
> > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > register_netdev() seems to be fine.
> > >
> > > exactly
> > >
> > > > For the subsystem that can use the device immediately, if the
> > > > subsystem does not depend on the result of a request in the probe to
> > > > proceed, we are still fine. Since those requests will be processed after
> > > > DRIVER_OK.
> > >
> > > Well first won't driver code normally kick as well?
> >
> > Kick itself is not blocked.
>
> It is out of spec though.

Yes.

>
> > > And without kick, won't everything just be blocked?
> >
> > It depends on the subsystem. E.g driver can choose to use a callback
> > instead of polling the used buffer in the probe.
> >
> > >
> > >
> > > > For the rest we need to do virtio_device_ready() before registration.
> > > >
> > > > Thanks
> > >
> > > Then we can get an interrupt for an unregistered device.
> >
> > It depends on the device. For the device that doesn't have an rx queue
> > (or device to driver queue), we are fine:
> >
> > E.g in virtio-blk:
> >
> >         virtio_device_ready(vdev);
> >
> >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> >         if (err)
> >                 goto out_cleanup_disk;
> >
> > Thanks
>
> yes - as long as no buffers are used, no callback is expected.
> However wasn't the point of your patches to handle a malicious device?

Right, but for subsystems that have no way to open/close devices,
like the block layer, there's not much we can do except depend on
the callback - virtblk_done itself, which seems to be safe.

We guard against a malicious hypervisor by:

1) checking the token
2) when the token is NULL, not dereferencing any subsystem data structure
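
The two guards can be sketched as a small userspace model. This is only an
illustration under stated assumptions: struct toy_vq, toy_get_buf(),
toy_done() and toy_interrupt() are hypothetical stand-ins for the virtqueue,
virtqueue_get_buf(), a virtblk_done-style callback and the vq->broken check
in vring_interrupt(); the other checks in the real handler are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

struct toy_req { int done; };

/* Hypothetical single-slot virtqueue: at most one used buffer at a time. */
struct toy_vq {
	bool broken;		/* true until the driver is ready */
	struct toy_req *used;	/* token of the completed request, or NULL */
	void (*callback)(struct toy_vq *vq);
};

/* Stand-in for virtqueue_get_buf(): return the token, NULL if none is used. */
static void *toy_get_buf(struct toy_vq *vq)
{
	struct toy_req *req = vq->used;

	vq->used = NULL;
	return req;
}

/* Completion callback: subsystem data is only touched for a valid token. */
static void toy_done(struct toy_vq *vq)
{
	struct toy_req *req;

	while ((req = toy_get_buf(vq)) != NULL)
		req->done = 1;
}

/* Stand-in for the vq->broken check: drop interrupts while broken. */
static enum toy_irqreturn toy_interrupt(struct toy_vq *vq)
{
	assert(vq);	/* a NULL vq would be a bug in the caller */
	if (vq->broken)
		return TOY_IRQ_NONE;
	if (vq->callback)
		vq->callback(vq);
	return TOY_IRQ_HANDLED;
}
```

With this shape, an interrupt raised while the queue is broken never reaches
the callback, and an interrupt raised with nothing in the used slot runs the
callback without dereferencing any request.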

Thanks

>
> > >
> > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > +      */
> > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > >       if (ret)
> > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > >   * */
> > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > >  {
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > >
> > > > > > > > > So make this conditional
> > > > > > > > >
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +
> > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > >
> > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > +
> > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > >
> > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > -
> > > > > > > > > >       /*
> > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > >        * driver.
> > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > >  {
> > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > >       }
> > > > > > > > > >
> > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > +     }
> > > > > > > > > >
> > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > >       if (vq->event)
> > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > >
> > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > +
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > +      *
> > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > +      */
> > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-14  7:19                       ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-14  7:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390

On Mon, Jun 13, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 05:08:20PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > >
> > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > >
> > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > IRQ storm.
> > > > > > > > > >
> > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > >
> > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > expensive.
> > > > > > > > > >
> > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > users if we find more buggy drivers.
> > > > > > > > >
> > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > >
> > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > >
> > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > simply warn instead of disable it by default.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > I found 5 of these:
> > > > > > >
> > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > >
> > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > >
> > > > >
> > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > work. Also, can linux start using the device immediately after
> > > > > it's registered?
> > > >
> > > > So I think the driver is allowed to queue before DRIVER_OK.
> > >
> > > it's not allowed to kick
> >
> > Yes.
> >
> > >
> > > > If yes,
> > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > for a well behaved device.
> > >
> > > your patches drop the interrupt though, it won't be just delayed.
> >
> > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> >
> > So for virtio bt, it works like:
> >
> > 1) driver queue buffer and kick
> > 2) driver set DRIVER_OK
> > 3) device start to process the buffer
> > 4) device send an notification
> >
> > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > or anything I missed?
> >
> > >
> > > > If not, we need to clarify it in the spec
> > > > and call virtio_device_ready() before subsystem registration.
> > >
> > > hmm, i don't get what we need to clarify
> >
> > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > device only process the buffer after a kick after DRIVER_OK (I think
> > no)?
>
> I am not sure I understand. Are you asking whether device
> must check vqs for buffers upon DRIVER_OK?

Yes.

> I don't think so,
> if driver wants buffers processed it must kick after DRIVER_OK.

Ok.

>
> And kicking before DRIVER_OK is out of spec.

Right.

>
>
> > >
> > > > >
> > > > >
> > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > >
> > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > me the code is correct.
> > > > >
> > > > > OK.
> > > > >
> > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > >
> > > > > > The above looks fine and we have three more:
> > > > > >
> > > > > > arm_scmi: probe() doesn't use vq
> > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > it looks to me we need a device_ready before the kick.
> > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > >
> > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > Thanks
> > > > >
> > > > > Same comments for all of the above. Might linux not start using the
> > > > > device once it's registered?
> > > >
> > > > It depends on the specific subsystem.
> > > >
> > > > For the subsystem that can't use the device immediately, calling
> > > > virtio_device_ready() after the subsystem's registration should be
> > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > netdev_register() seems to be fine.
> > >
> > > exactly
> > >
> > > > For the subsystem that can use the device immediately, if the
> > > > subsystem does not depend on the result of a request in the probe to
> > > > > proceed, we are still fine. Since those requests will be processed after
> > > > DRIVER_OK.
> > >
> > > Well first won't driver code normally kick as well?
> >
> > Kick itself is not blocked.
>
> It is out of spec though.

Yes.

>
> > > And without kick, won't everything just be blocked?
> >
> > It depends on the subsystem. E.g driver can choose to use a callback
> > instead of polling the used buffer in the probe.
> >
> > >
> > >
> > > > For the rest we need to do virtio_device_ready() before registration.
> > > >
> > > > Thanks
> > >
> > > Then we can get an interrupt for an unregistered device.
> >
> > It depends on the device. For the device that doesn't have an rx queue
> > (or device to driver queue), we are fine:
> >
> > E.g in virtio-blk:
> >
> >         virtio_device_ready(vdev);
> >
> >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> >         if (err)
> >                 goto out_cleanup_disk;
> >
> > Thanks
>
> yes - as long as no buffers are used, no callback is expected.
> However wasn't the point of your patches to handle a malicious device?

Right, but for subsystems such as the block layer that have no way to
open/close devices, there's not much we can do except rely on the
callback itself (virtblk_done here), which seems to be safe.

We guard against a malicious hypervisor by:

1) checking the token
2) not dereferencing any subsystem data structure when the token is NULL
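
To illustrate the two points above, here is a minimal userspace sketch
of a completion callback written in the style of virtblk_done(). It is
only a model under stated assumptions: every model_* name is
hypothetical and does not exist in the kernel, and model_get_buf()
merely stands in for virtqueue_get_buf() returning NULL when no buffer
has been used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_RING_SIZE 4

struct model_req {
	bool completed;
};

struct model_vq {
	struct model_req *used[MODEL_RING_SIZE]; /* tokens of used buffers */
	unsigned int head;                       /* next token to hand out */
	unsigned int tail;                       /* next free slot */
};

/* Device side of the model: mark one buffer as used. */
static bool model_mark_used(struct model_vq *vq, struct model_req *req)
{
	if (vq->tail - vq->head == MODEL_RING_SIZE)
		return false; /* ring is full */
	vq->used[vq->tail++ % MODEL_RING_SIZE] = req;
	return true;
}

/* Stands in for virtqueue_get_buf(): NULL means nothing was used. */
static struct model_req *model_get_buf(struct model_vq *vq)
{
	if (vq->head == vq->tail)
		return NULL;
	return vq->used[vq->head++ % MODEL_RING_SIZE];
}

/*
 * Completion callback: request state is only reached through a
 * non-NULL token, so a spurious call on an empty ring touches nothing.
 * Returns the number of requests completed by this call.
 */
static int model_done(struct model_vq *vq)
{
	struct model_req *req;
	int completed = 0;

	while ((req = model_get_buf(vq)) != NULL) {
		req->completed = true;
		completed++;
	}
	return completed;
}
```

In this model a callback that fires with no used buffers returns 0
without touching any request, which is the property relied on when the
callback can run before the subsystem registration has finished.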

Thanks

>
> > >
> > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > +      */
> > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > >       if (ret)
> > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > >   * */
> > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > >  {
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > >
> > > > > > > > > So make this conditional
> > > > > > > > >
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +
> > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > >
> > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > +
> > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > >
> > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > -
> > > > > > > > > >       /*
> > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > >        * driver.
> > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > >  {
> > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > >
> > > > > > > > > > +     /*
> > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > +      */
> > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > >  }
> > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > >       }
> > > > > > > > > >
> > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > +     }
> > > > > > > > > >
> > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > >       if (vq->event)
> > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > >       vq->notify = notify;
> > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > +     vq->broken = true;
> > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > >       vq->num_added = 0;
> > > > > > > > >
> > > > > > > > > and make this conditional
> > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > >
> > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > +
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > +      */
> > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > +     /*
> > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > +      *
> > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > +      */
> > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-13  9:27                       ` Michael S. Tsirkin
@ 2022-06-14  7:40                         ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-14  7:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, cristian.marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > >
> > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > >
> > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > IRQ storm.
> > > > > > > > > > >
> > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > >
> > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > expensive.
> > > > > > > > > > >
> > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > >
> > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > >
> > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > >
> > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > I found 5 of these:
> > > > > > > >
> > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > >
> > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > >
> > > > > >
> > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > work. Also, can linux start using the device immediately after
> > > > > > it's registered?
> > > > >
> > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > >
> > > > it's not allowed to kick
> > >
> > > Yes.
> > >
> > > >
> > > > > If yes,
> > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > for a well behaved device.
> > > >
> > > > your patches drop the interrupt though, it won't be just delayed.
> > >
> > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > >
> > > So for virtio bt, it works like:
> > >
> > > 1) driver queue buffer and kick
> > > 2) driver set DRIVER_OK
> > > 3) device start to process the buffer
> > > 4) device send an notification
> > >
> > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > or anything I missed?
> >
> > btw, hci has an open and close method and we do rx refill in
> > hdev->open, so we're probably fine here.
> >
> > Thanks
>
>
> Sounds good. Now to audit the rest of them from this POV ;)

Adding maintainers.

>
>  drivers/i2c/busses/i2c-virtio.c

It looks to me like the device could be used immediately after
i2c_add_adapter() returns. So we probably need to call
virtio_device_ready() before that. Fortunately, there's no rx vq in
i2c, and the callback looks safe if it is called after
virtio_device_ready() but before the i2c registration.
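
To make the ordering concrete, here is a minimal user-space sketch of
the gating, not kernel code. Every toy_* name is made up for
illustration; only the idea (vring_interrupt() returns early while
vq->broken is set, and virtio_device_ready() clears it before setting
DRIVER_OK) comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; these are not the kernel's structures. */
struct toy_vq {
	bool broken;    /* starts true, cleared by toy_device_ready() */
	int delivered;  /* interrupts that reached the driver callback */
};

enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

/* Models the vq->broken check in vring_interrupt(). */
static enum toy_irqreturn toy_vring_interrupt(struct toy_vq *vq)
{
	if (vq->broken)
		return TOY_IRQ_NONE;	/* dropped, not deferred */
	vq->delivered++;
	return TOY_IRQ_HANDLED;
}

/* Models virtio_device_ready(): unbreak the vq, then set DRIVER_OK. */
static void toy_device_ready(struct toy_vq *vq, bool *driver_ok)
{
	vq->broken = false;
	*driver_ok = true;
}

/* Probe ordering suggested above: ready first, then register. */
int toy_probe(void)
{
	struct toy_vq vq = { .broken = true, .delivered = 0 };
	bool driver_ok = false;
	bool registered = false;

	/* An interrupt before ready is rejected. */
	assert(toy_vring_interrupt(&vq) == TOY_IRQ_NONE);

	toy_device_ready(&vq, &driver_ok);

	/*
	 * Window between ready and registration: the callback can run
	 * here, so it must not rely on anything set up by registration.
	 */
	assert(!registered);
	assert(toy_vring_interrupt(&vq) == TOY_IRQ_HANDLED);

	registered = true;	/* stands in for i2c_add_adapter() */
	assert(toy_vring_interrupt(&vq) == TOY_IRQ_HANDLED);

	return (driver_ok && registered) ? vq.delivered : -1;
}
```

Calling toy_probe() from a small main() is enough to exercise it. The
point of the sketch is the middle window: once ready is called before
registration, the callback has to be safe there.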

>  drivers/net/caif/caif_virtio.c

This is a networking device. RX is backed by vringh, so we don't need
to refill. TX is backed by virtio and is not available until ndo_open.
So it's fine to let the core set DRIVER_OK after probe().

>  drivers/nvdimm/virtio_pmem.c

It doesn't use interrupts so far, so it has nothing to do with the IRQ
hardening.

But the device could be used by the subsystem immediately after
nvdimm_pmem_region_create(), which means a flush could be issued
before DRIVER_OK. We need virtio_device_ready() before that. We don't
have an RX virtqueue, and the callback looks safe if it is called
after virtio_device_ready() but before the nvdimm region is created.

It also looks to me like there's a race between the assignment of
provider_data and virtio_pmem_flush(). If the flush is issued before
the assignment, we will end up with a NULL pointer dereference. This is
something we need to fix.
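
A rough user-space model of that race, and of one possible way to
close it. This is only a sketch: all toy_* names are hypothetical, and
it assumes the region descriptor can carry provider_data so the
pointer is set before the region becomes visible. I have not checked
that against the real nvdimm API here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_region;
typedef int (*toy_flush_fn)(struct toy_region *r);

/* Hypothetical descriptor handed to the region constructor. */
struct toy_region_desc {
	void *provider_data;
	toy_flush_fn flush;
};

struct toy_region {
	void *provider_data;
	toy_flush_fn flush;
};

/* Stands in for the flush path, which needs provider_data. */
static int toy_flush(struct toy_region *r)
{
	if (r->provider_data == NULL)
		return -1;	/* the real code would dereference NULL */
	return 0;
}

/*
 * Creates the region. To model "the subsystem may use the device
 * immediately", a flush is issued before create returns.
 */
static int toy_region_create(struct toy_region *r,
			     const struct toy_region_desc *d)
{
	r->provider_data = d->provider_data;
	r->flush = d->flush;
	return r->flush(r);
}

/* Racy ordering: provider_data is assigned after the region exists. */
int toy_racy_probe(void *vdev)
{
	struct toy_region_desc d = { .provider_data = NULL, .flush = toy_flush };
	struct toy_region r;
	int ret = toy_region_create(&r, &d);

	r.provider_data = vdev;	/* too late for the early flush */
	return ret;
}

/* Safer ordering: provider_data travels in with the descriptor. */
int toy_fixed_probe(void *vdev)
{
	struct toy_region_desc d = { .provider_data = vdev, .flush = toy_flush };
	struct toy_region r;

	return toy_region_create(&r, &d);
}
```

With any non-NULL pointer, toy_racy_probe() reports the failed early
flush while toy_fixed_probe() succeeds.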

>  arm_scmi

It looks to me the singleton device could be used by SCMI immediately after

        /* Ensure initialized scmi_vdev is visible */
        smp_store_mb(scmi_vdev, vdev);

So we probably need to do virtio_device_ready() before that. It has an
optional rx queue but the filling is done after the above assignment,
so it's safe. And the callback looks safe if a callback is triggered
after virtio_device_ready() but before the above assignment.

>  virtio_rpmsg_bus.c
>

This one is somewhat more complicated. It has an rx queue; the rx
filling is done before virtio_device_ready() but the kick is done
after. And it looks to me like the device could be used by the
subsystem immediately after rpmsg_virtio_add_ctrl_dev() returns.

This means, if we do virtio_device_ready() after
rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
be exploited.

It requires more thought.

Thanks

>
>
> > >
> > > >
> > > > > If not, we need to clarify it in the spec
> > > > > and call virtio_device_ready() before subsystem registration.
> > > >
> > > > hmm, i don't get what we need to clarify
> > >
> > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > device only process the buffer after a kick after DRIVER_OK (I think
> > > no)?
> > >
> > > >
> > > > > >
> > > > > >
> > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > >
> > > > > > > It calles virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > me the code is correct.
> > > > > >
> > > > > > OK.
> > > > > >
> > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > >
> > > > > > > The above looks fine and we have three more:
> > > > > > >
> > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > >
> > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > Thanks
> > > > > >
> > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > device once it's registered?
> > > > >
> > > > > It depends on the specific subsystem.
> > > > >
> > > > > For the subsystem that can't use the device immediately, calling
> > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > netdev_register() seems to be fine.
> > > >
> > > > exactly
> > > >
> > > > > For the subsystem that can use the device immediately, if the
> > > > > subsystem does not depend on the result of a request in the probe to
> > > > > proceed, we are still fine. Since those requests will be proceed after
> > > > > DRIVER_OK.
> > > >
> > > > Well first won't driver code normally kick as well?
> > >
> > > Kick itself is not blocked.
> > >
> > > > And without kick, won't everything just be blocked?
> > >
> > > It depends on the subsystem. E.g driver can choose to use a callback
> > > instead of polling the used buffer in the probe.
> > >
> > > >
> > > >
> > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > >
> > > > > Thanks
> > > >
> > > > Then we can get an interrupt for an unregistered device.
> > >
> > > It depends on the device. For the device that doesn't have an rx queue
> > > (or device to driver queue), we are fine:
> > >
> > > E.g in virtio-blk:
> > >
> > >         virtio_device_ready(vdev);
> > >
> > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > >         if (err)
> > >                 goto out_cleanup_disk;
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > +      */
> > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > >       if (ret)
> > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > >   * */
> > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > >  {
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > +      */
> > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > >
> > > > > > > > > > So make this conditional
> > > > > > > > > >
> > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > +
> > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > >  }
> > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > >
> > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > +
> > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > >
> > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > -
> > > > > > > > > > >       /*
> > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > >        * driver.
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > >
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > +      */
> > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > >  {
> > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > >
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > +      */
> > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > >  }
> > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > >
> > > > > > > > > > and make this conditional
> > > > > > > > > >
> > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > >       }
> > > > > > > > > > >
> > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > +     }
> > > > > > > > > > >
> > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > >
> > > > > > > > > > and make this conditional
> > > > > > > > > >
> > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > >
> > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > +
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > +      */
> > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > +      *
> > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > +      */
> > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-14  7:40                         ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-14  7:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	cristian.marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > >
> > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > >
> > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > IRQ storm.
> > > > > > > > > > >
> > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > >
> > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > expensive.
> > > > > > > > > > >
> > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > >
> > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > >
> > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > >
> > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > I found 5 of these:
> > > > > > > >
> > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > >
> > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > >
> > > > > >
> > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > work. Also, can linux start using the device immediately after
> > > > > > it's registered?
> > > > >
> > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > >
> > > > it's not allowed to kick
> > >
> > > Yes.
> > >
> > > >
> > > > > If yes,
> > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > for a well behaved device.
> > > >
> > > > your patches drop the interrupt though, it won't be just delayed.
> > >
> > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > >
> > > So for virtio bt, it works like:
> > >
> > > 1) driver queue buffer and kick
> > > 2) driver set DRIVER_OK
> > > 3) device start to process the buffer
> > > 4) device send an notification
> > >
> > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > or anything I missed?
> >
> > btw, hci has an open and close method and we do rx refill in
> > hdev->open, so we're probably fine here.
> >
> > Thanks
>
>
> Sounds good. Now to audit the rest of them from this POV ;)

Adding maintainers.

>
>  drivers/i2c/busses/i2c-virtio.c

It looks to me like the device could be used immediately after
i2c_add_adapter() returns. So we probably need to call
virtio_device_ready() before that. Fortunately, there's no rx vq in
i2c, and the callback looks safe if it is called after
virtio_device_ready() but before the i2c registration.

>  drivers/net/caif/caif_virtio.c

This is a networking device. RX is backed by vringh, so we don't need
to refill. TX is backed by virtio and is not available until ndo_open.
So it's fine to let the core set DRIVER_OK after probe().

>  drivers/nvdimm/virtio_pmem.c

It doesn't use interrupts so far, so it has nothing to do with the IRQ
hardening.

But the device could be used by the subsystem immediately after
nvdimm_pmem_region_create(), which means a flush could be issued
before DRIVER_OK. We need virtio_device_ready() before that. We don't
have an RX virtqueue, and the callback looks safe if it is called
after virtio_device_ready() but before the nvdimm region is created.

It also looks to me like there's a race between the assignment of
provider_data and virtio_pmem_flush(). If the flush is issued before
the assignment, we will end up with a NULL pointer dereference. This is
something we need to fix.

>  arm_scmi

It looks to me the singleton device could be used by SCMI immediately after

        /* Ensure initialized scmi_vdev is visible */
        smp_store_mb(scmi_vdev, vdev);

So we probably need to do virtio_device_ready() before that. It has an
optional rx queue but the filling is done after the above assignment,
so it's safe. And the callback looks safe if a callback is triggered
after virtio_device_ready() but before the above assignment.

>  virtio_rpmsg_bus.c
>

This one is somewhat more complicated. It has an rx queue; the rx
filling is done before virtio_device_ready() but the kick is done
after. And it looks to me like the device could be used by the
subsystem immediately after rpmsg_virtio_add_ctrl_dev() returns.

This means, if we do virtio_device_ready() after
rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
be exploited.

It requires more thought.

Thanks

>
>
> > >
> > > >
> > > > > If not, we need to clarify it in the spec
> > > > > and call virtio_device_ready() before subsystem registration.
> > > >
> > > > hmm, i don't get what we need to clarify
> > >
> > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > device only process the buffer after a kick after DRIVER_OK (I think
> > > no)?
> > >
> > > >
> > > > > >
> > > > > >
> > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > >
> > > > > > > It calles virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > me the code is correct.
> > > > > >
> > > > > > OK.
> > > > > >
> > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > >
> > > > > > > The above looks fine and we have three more:
> > > > > > >
> > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > >
> > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > Thanks
> > > > > >
> > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > device once it's registered?
> > > > >
> > > > > It depends on the specific subsystem.
> > > > >
> > > > > For the subsystem that can't use the device immediately, calling
> > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > netdev_register() seems to be fine.
> > > >
> > > > exactly
> > > >
> > > > > For the subsystem that can use the device immediately, if the
> > > > > subsystem does not depend on the result of a request in the probe to
> > > > > proceed, we are still fine. Since those requests will be proceed after
> > > > > DRIVER_OK.
> > > >
> > > > Well first won't driver code normally kick as well?
> > >
> > > Kick itself is not blocked.
> > >
> > > > And without kick, won't everything just be blocked?
> > >
> > > It depends on the subsystem. E.g driver can choose to use a callback
> > > instead of polling the used buffer in the probe.
> > >
> > > >
> > > >
> > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > >
> > > > > Thanks
> > > >
> > > > Then we can get an interrupt for an unregistered device.
> > >
> > > It depends on the device. For the device that doesn't have an rx queue
> > > (or device to driver queue), we are fine:
> > >
> > > E.g in virtio-blk:
> > >
> > >         virtio_device_ready(vdev);
> > >
> > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > >         if (err)
> > >                 goto out_cleanup_disk;
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > +      */
> > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > >       if (ret)
> > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > >   * */
> > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > >  {
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > +      */
> > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > >
> > > > > > > > > > So make this conditional
> > > > > > > > > >
> > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > +
> > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > >  }
> > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > >
> > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > +
> > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > >
> > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > -
> > > > > > > > > > >       /*
> > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > >        * driver.
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > >
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > +      */
> > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > >  {
> > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > >
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > +      */
> > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > >  }
> > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > >
> > > > > > > > > > and make this conditional
> > > > > > > > > >
> > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > >       }
> > > > > > > > > > >
> > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > +     }
> > > > > > > > > > >
> > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > >
> > > > > > > > > > and make this conditional
> > > > > > > > > >
> > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > >
> > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > +
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > +      */
> > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > +     /*
> > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > +      *
> > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > +      */
> > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
>
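
The register_virtio_device() hunk quoted above moves INIT_LIST_HEAD(&dev->vqs)
ahead of the first virtio_reset_device() call, because the reset path now
walks the vq list to mark every queue broken. A minimal userspace sketch of
that ordering follows. It is a toy model, not kernel code: every model_* name
is hypothetical, a plain singly linked list stands in for dev->vqs, and the
point where virtio_synchronize_cbs() would wait is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all model_* names are hypothetical. */
struct model_vq {
	bool broken;            /* stands in for vq->broken */
	struct model_vq *next;
};

struct model_dev {
	struct model_vq *vqs;   /* stands in for dev->vqs */
	int resets;             /* how many resets reached the transport */
};

/* Must run before the first reset, like INIT_LIST_HEAD(&dev->vqs). */
static void model_init_vq_list(struct model_dev *dev)
{
	dev->vqs = NULL;
	dev->resets = 0;
}

/* New queues start out broken, as in the patched vring constructors. */
static void model_add_vq(struct model_dev *dev, struct model_vq *vq)
{
	vq->broken = true;
	vq->next = dev->vqs;
	dev->vqs = vq;
}

/* Walks the list, so the list head has to be valid already. */
static void model_set_broken(struct model_dev *dev, bool broken)
{
	for (struct model_vq *vq = dev->vqs; vq != NULL; vq = vq->next)
		vq->broken = broken;
}

/* Models the patched reset path: break first, then reset. */
static void model_reset_device(struct model_dev *dev)
{
	model_set_broken(dev, true);
	/* Real code would wait for running callbacks here. */
	dev->resets++;
}
```

In this model a reset on a freshly initialized device is harmless because the
list is empty but valid; calling model_reset_device() before
model_init_vq_list() would walk an uninitialized pointer.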


* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-14  7:40                         ` Jason Wang
@ 2022-06-14 13:50                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-14 13:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, cristian.marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > This is a rework of the previous IRQ hardening done for virtio-pci,
> > > > > > > > > > > > which was reverted after several drawbacks were found:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) it relied on IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > > > > > > > > > >    managed IRQs used by some devices such as virtio-blk
> > > > > > > > > > > > 2) it was done only for the PCI transport
> > > > > > > > > > > >
> > > > > > > > > > > > vq->broken is re-used in this patch to implement the IRQ hardening.
> > > > > > > > > > > > It is set to true during both initialization and reset, and set to
> > > > > > > > > > > > false in virtio_device_ready(). vring_interrupt() can then check it
> > > > > > > > > > > > and return early when vq->broken is true. In that case, return
> > > > > > > > > > > > IRQ_NONE so that the interrupt core is aware of the invalid interrupt
> > > > > > > > > > > > and can prevent an IRQ storm.
> > > > > > > > > > > >
> > > > > > > > > > > > The reason for using a per-queue variable instead of a per-device one
> > > > > > > > > > > > is that we may need it for per-queue reset hardening in the future.
> > > > > > > > > > > >
> > > > > > > > > > > > Note that the hardening is only done for the vring interrupt, since
> > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > ("virtio: defer config changed notifications"). The method used for
> > > > > > > > > > > > the config interrupt can't be reused by the vring interrupt handler
> > > > > > > > > > > > because it uses a spinlock for synchronization, which is expensive.
> > > > > > > > > > > >
> > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > >
> > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > >
> > > > > > > > > > I can post a patch, but I'm afraid that if we disable it by default it
> > > > > > > > > > won't be used, so there's no way for us to receive bug reports. Or we
> > > > > > > > > > need a plan to enable it by default.
> > > > > > > > > >
> > > > > > > > > > It's rc2, how about waiting for one or two more rcs? Or it might be
> > > > > > > > > > better to simply warn instead of disabling it by default.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > I found 5 of these:
> > > > > > > > >
> > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > >
> > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > >
> > > > > > >
> > > > > > > But it calls hci_register_dev() and that in turn queues all kinds of
> > > > > > > work. Also, can Linux start using the device immediately after
> > > > > > > it's registered?
> > > > > >
> > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > >
> > > > > it's not allowed to kick
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > > If yes,
> > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > for a well behaved device.
> > > > >
> > > > > your patches drop the interrupt though, it won't be just delayed.
> > > >
> > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > >
> > > > So for virtio bt, it works like:
> > > >
> > > > 1) driver queues a buffer and kicks
> > > > 2) driver sets DRIVER_OK
> > > > 3) device starts to process the buffer
> > > > 4) device sends a notification
> > > >
> > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > or anything I missed?
> > >
> > > btw, hci has an open and close method and we do rx refill in
> > > hdev->open, so we're probably fine here.
> > >
> > > Thanks
> >
> >
> > Sounds good. Now to audit the rest of them from this POV ;)
> 
> Adding maintainers.
> 
> >
> >  drivers/i2c/busses/i2c-virtio.c
> 
> It looks to me like the device could be used immediately after
> i2c_add_adapter() returns, so we probably need to call
> virtio_device_ready() before that. Fortunately, there's no rx vq in
> i2c, and the callback looks safe if it is called after
> virtio_device_ready() but before the i2c registration.
> 
> >  drivers/net/caif/caif_virtio.c
> 
> A networking device. RX is backed by vringh so we don't need to
> refill. TX is backed by virtio and is not available until ndo_open().
> So it's fine to let the core set DRIVER_OK after probe().

How about we just add an explicit ready in the driver anyway?
I think the implicit ready is just creating a mess as people
tend to forget to think about it.
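
To make the explicit versus implicit ready question concrete, here is a
minimal userspace sketch in C. It is a toy model, not kernel code: every
model_* name is hypothetical, and it only mirrors the rule described in the
patch (vq->broken starts out true, virtio_device_ready() clears it, and
vring_interrupt() returns IRQ_NONE while it is set), plus the fallback
where the core sets DRIVER_OK after probe() if the driver did not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; all model_* names are hypothetical. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_dev {
	bool vq_broken;   /* stands in for vq->broken */
	bool driver_ok;   /* stands in for the DRIVER_OK status bit */
	bool registered;  /* subsystem may start using the device */
	int callbacks;    /* vq callbacks actually delivered */
};

static void model_setup(struct model_dev *dev)
{
	dev->vq_broken = true;   /* vqs are created broken */
	dev->driver_ok = false;
	dev->registered = false;
	dev->callbacks = 0;
}

/* Models virtio_device_ready(): unbreak first, then set DRIVER_OK. */
static void model_device_ready(struct model_dev *dev)
{
	dev->vq_broken = false;
	dev->driver_ok = true;
}

/* Models the hardened check in vring_interrupt(). */
static enum model_irqreturn model_vring_interrupt(struct model_dev *dev)
{
	if (dev->vq_broken)
		return MODEL_IRQ_NONE;
	dev->callbacks++;
	return MODEL_IRQ_HANDLED;
}

/* Explicit ready: the driver is ready before the subsystem can see it. */
static void model_probe_explicit(struct model_dev *dev)
{
	model_setup(dev);
	model_device_ready(dev);
	dev->registered = true;
}

/* Implicit ready: probe() registers and leaves DRIVER_OK to the core. */
static void model_probe_implicit(struct model_dev *dev)
{
	model_setup(dev);
	dev->registered = true;
}

/* Models the core's fallback after probe() returns. */
static void model_core_after_probe(struct model_dev *dev)
{
	if (!dev->driver_ok)
		model_device_ready(dev);
}
```

In this model the implicit variant leaves a window between registration and
model_core_after_probe() in which an interrupt is dropped rather than
delayed; the explicit variant has no such window.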

> >  drivers/nvdimm/virtio_pmem.c
> 
> It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> 
> But the device could be used by the subsystem immediately after
> nvdimm_pmem_region_create(), which means a flush could be issued
> before DRIVER_OK. We need virtio_device_ready() before that. We don't
> have an RX virtqueue, and the callback looks safe if it is called
> after virtio_device_ready() but before the nvdimm region is created.
> 
> And it looks to me there's a race between the assignment of
> provider_data and virtio_pmem_flush(). If the flush was issued before
> the assignment we will end up with a NULL pointer dereference. This is
> something we need to fix.
> 
> >  arm_scmi
> 
> It looks to me the singleton device could be used by SCMI immediately after
> 
>         /* Ensure initialized scmi_vdev is visible */
>         smp_store_mb(scmi_vdev, vdev);
> 
> So we probably need to call virtio_device_ready() before that. It has
> an optional rx queue, but the filling is done after the above
> assignment, so it's safe. And the callback looks safe if it is
> triggered after virtio_device_ready() but before the above assignment.
> 
> >  virtio_rpmsg_bus.c
> >
> 
> This is somewhat more complicated. It has an rx queue; the rx filling
> is done before virtio_device_ready() but the kick is done after. And
> it looks to me like the device could be used by the subsystem
> immediately after rpmsg_virtio_add_ctrl_dev() returns.
> 
> This means that if we call virtio_device_ready() after
> rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
> call virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's
> a race between the callbacks and rpmsg_virtio_add_ctrl_dev() that
> could be exploited.
> 
> This needs more thought.
> 
> Thanks
> 
> >
> >
> > > >
> > > > >
> > > > > > If not, we need to clarify it in the spec
> > > > > > and call virtio_device_ready() before subsystem registration.
> > > > >
> > > > > hmm, i don't get what we need to clarify
> > > >
> > > > E.g. that the driver is not allowed to kick before DRIVER_OK, or
> > > > whether, after DRIVER_OK, the device should only process a buffer
> > > > following a kick that itself comes after DRIVER_OK (I think not).
> > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > >
> > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > me the code is correct.
> > > > > > >
> > > > > > > OK.
> > > > > > >
> > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > >
> > > > > > > > The above looks fine and we have three more:
> > > > > > > >
> > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > >
> > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > device once it's registered?
> > > > > >
> > > > > > It depends on the specific subsystem.
> > > > > >
> > > > > > For a subsystem that can't use the device immediately, calling
> > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > fine. E.g. for the networking subsystem, TX won't happen if
> > > > > > ndo_open() is not called, so calling virtio_device_ready() after
> > > > > > register_netdev() seems to be fine.
> > > > >
> > > > > exactly
> > > > >
> > > > > > For a subsystem that can use the device immediately, if the
> > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > proceed, we are still fine, since those requests will be processed
> > > > > > after DRIVER_OK.
> > > > >
> > > > > Well first won't driver code normally kick as well?
> > > >
> > > > Kick itself is not blocked.
> > > >
> > > > > And without kick, won't everything just be blocked?
> > > >
> > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > instead of polling the used buffer in the probe.
> > > >
> > > > >
> > > > >
> > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Then we can get an interrupt for an unregistered device.
> > > >
> > > > It depends on the device. For the device that doesn't have an rx queue
> > > > (or device to driver queue), we are fine:
> > > >
> > > > E.g in virtio-blk:
> > > >
> > > >         virtio_device_ready(vdev);
> > > >
> > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > >         if (err)
> > > >                 goto out_cleanup_disk;
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > >   * */
> > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > >  {
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > >
> > > > > > > > > > > So make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +
> > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > >
> > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > +
> > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > >
> > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > -
> > > > > > > > > > > >       /*
> > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > >        * driver.
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > >  {
> > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > >       }
> > > > > > > > > > > >
> > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > +     }
> > > > > > > > > > > >
> > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > >
> > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > +
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > +      *
> > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> >
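
The quoted changelog switches the broken-vq path from IRQ_HANDLED to
IRQ_NONE so that the interrupt core can notice an invalid interrupt and
prevent an IRQ storm. The userspace sketch below illustrates why the return
value matters. It is a simplified model, not the kernel's accounting: all
model_* names are hypothetical, and the window and threshold constants are
illustrative assumptions picked for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; all model_* names are hypothetical. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Illustrative thresholds, chosen for this sketch only. */
#define MODEL_WINDOW    1000
#define MODEL_UNHANDLED  990

struct model_irq_line {
	unsigned int count;      /* interrupts seen in the current window */
	unsigned int unhandled;  /* of those, how many returned IRQ_NONE */
	bool disabled;           /* set once the line is judged stuck */
};

/* Per-interrupt accounting: disable the line if almost nothing is handled. */
static void model_note_interrupt(struct model_irq_line *line,
				 enum model_irqreturn ret)
{
	if (line->disabled)
		return;
	if (ret == MODEL_IRQ_NONE)
		line->unhandled++;
	if (++line->count < MODEL_WINDOW)
		return;
	if (line->unhandled > MODEL_UNHANDLED)
		line->disabled = true;
	line->count = 0;
	line->unhandled = 0;
}

/* Handler for a vq; report_none selects the new or the old behaviour. */
static enum model_irqreturn model_vring_interrupt(bool vq_broken,
						  bool report_none)
{
	if (vq_broken)
		return report_none ? MODEL_IRQ_NONE : MODEL_IRQ_HANDLED;
	return MODEL_IRQ_HANDLED;
}

/* Deliver n interrupts to a broken vq; report whether the line was disabled. */
static bool model_storm(unsigned int n, bool report_none)
{
	struct model_irq_line line = { 0, 0, false };

	for (unsigned int i = 0; i < n; i++)
		model_note_interrupt(&line,
				     model_vring_interrupt(true, report_none));
	return line.disabled;
}
```

With report_none set, a full window of interrupts against a broken vq gets
the line disabled in this model; with the old IRQ_HANDLED return the
accounting never sees anything wrong and the storm continues.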


* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-14 13:50                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-14 13:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	cristian.marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > This is a rework of the previous IRQ hardening done for virtio-pci,
> > > > > > > > > > > > which was reverted after several drawbacks were found:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) it relied on IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > > > > > > > > > >    managed IRQs used by some devices such as virtio-blk
> > > > > > > > > > > > 2) it was done only for the PCI transport
> > > > > > > > > > > >
> > > > > > > > > > > > vq->broken is re-used in this patch to implement the IRQ hardening.
> > > > > > > > > > > > It is set to true during both initialization and reset, and set to
> > > > > > > > > > > > false in virtio_device_ready(). vring_interrupt() can then check it
> > > > > > > > > > > > and return early when vq->broken is true. In that case, return
> > > > > > > > > > > > IRQ_NONE so that the interrupt core is aware of the invalid interrupt
> > > > > > > > > > > > and can prevent an IRQ storm.
> > > > > > > > > > > >
> > > > > > > > > > > > The reason for using a per-queue variable instead of a per-device one
> > > > > > > > > > > > is that we may need it for per-queue reset hardening in the future.
> > > > > > > > > > > >
> > > > > > > > > > > > Note that the hardening is only done for the vring interrupt, since
> > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > ("virtio: defer config changed notifications"). The method used for
> > > > > > > > > > > > the config interrupt can't be reused by the vring interrupt handler
> > > > > > > > > > > > because it uses a spinlock for synchronization, which is expensive.
> > > > > > > > > > > >
> > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > >
> > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > >
> > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > >
> > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > I found 5 of these:
> > > > > > > > >
> > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > >
> > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > >
> > > > > > >
> > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > it's registered?
> > > > > >
> > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > >
> > > > > it's not allowed to kick
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > > If yes,
> > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > for a well behaved device.
> > > > >
> > > > > your patches drop the interrupt though, it won't be just delayed.
> > > >
> > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > >
> > > > So for virtio bt, it works like:
> > > >
> > > > 1) driver queues a buffer and kicks
> > > > 2) driver sets DRIVER_OK
> > > > 3) device starts to process the buffer
> > > > 4) device sends a notification
> > > >
> > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > or anything I missed?
> > >
> > > btw, hci has an open and close method and we do rx refill in
> > > hdev->open, so we're probably fine here.
> > >
> > > Thanks
> >
> >
> > Sounds good. Now to audit the rest of them from this POV ;)
> 
> Adding maintainers.
> 
> >
> >  drivers/i2c/busses/i2c-virtio.c
> 
> It looks to me the device could be used immediately after
> i2c_add_adapter() returns. So we probably need to add
> virtio_device_ready() before that. Fortunately, there's no rx vq in
> i2c and the callback looks safe if the callback is called before the
> i2c registration and after virtio_device_ready().
> 
> >  drivers/net/caif/caif_virtio.c
> 
> A networking device, RX is backed by vringh so we don't need to
> refill. TX is backed by virtio and is not available until ndo_open. So
> it's fine to let the core set DRIVER_OK after probe().

How about we just add an explicit ready in the driver anyway?
I think the implicit ready is just creating a mess as people
tend to forget to think about it.
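
To make the ordering concrete, here is a rough userspace sketch, not
kernel code. Every mock_* name is made up for illustration; the only
things taken from the patch are the ideas that vq->broken starts out
true, that virtio_device_ready() clears it before setting DRIVER_OK,
and that vring_interrupt() returns IRQ_NONE while the queue is broken.
It assumes a driver whose subsystem can use the device as soon as it
is registered, so ready is called explicitly before registration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
enum mock_irqreturn { MOCK_IRQ_NONE, MOCK_IRQ_HANDLED };

#define MOCK_S_DRIVER_OK 0x04

struct mock_vdev {
	unsigned int status;  /* device status bits */
	bool vq_broken;       /* models vq->broken */
	bool registered;      /* subsystem may use the device once set */
	int callbacks_run;    /* vq callbacks actually delivered */
};

void mock_vdev_init(struct mock_vdev *d)
{
	d->status = 0;
	d->vq_broken = true;  /* queues start out broken, as in the patch */
	d->registered = false;
	d->callbacks_run = 0;
}

/* Models virtio_device_ready(): unbreak first, then set DRIVER_OK. */
void mock_device_ready(struct mock_vdev *d)
{
	assert(!(d->status & MOCK_S_DRIVER_OK)); /* mirrors the BUG_ON() */
	d->vq_broken = false;
	d->status |= MOCK_S_DRIVER_OK;
}

/* Models vring_interrupt(): drop the interrupt while the vq is broken. */
enum mock_irqreturn mock_vring_interrupt(struct mock_vdev *d)
{
	if (d->vq_broken)
		return MOCK_IRQ_NONE;
	d->callbacks_run++;
	return MOCK_IRQ_HANDLED;
}

/* Probe with an explicit ready placed before subsystem registration. */
void mock_probe_explicit_ready(struct mock_vdev *d)
{
	mock_vdev_init(d);
	/* ... find vqs and do driver-specific setup here ... */
	mock_device_ready(d);  /* explicit, not left to the core */
	d->registered = true;  /* stand-in for the subsystem register call */
}
```

With this ordering an interrupt that arrives before ready is reported
as unhandled, and one that arrives after ready reaches the callback
even if registration has not finished, so the callback has to be safe
in that window.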

> >  drivers/nvdimm/virtio_pmem.c
> 
> It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> 
> But the device could be used by the subsystem immediately after
> nvdimm_pmem_region_create(), this means the flush could be issued
> before DRIVER_OK. We need virtio_device_ready() before. We don't have
> a RX virtqueue and the callback looks safe if the callback is called
> after virtio_device_ready() but before the nvdimm region creation.
> 
> And it looks to me there's a race between the assignment of
> provider_data and virtio_pmem_flush(). If the flush was issued before
> the assignment we will end up with a NULL pointer dereference. This is
> something we need to fix.
> 
> >  arm_scmi
> 
> It looks to me the singleton device could be used by SCMI immediately after
> 
>         /* Ensure initialized scmi_vdev is visible */
>         smp_store_mb(scmi_vdev, vdev);
> 
> So we probably need to do virtio_device_ready() before that. It has an
> optional rx queue but the filling is done after the above assignment,
> so it's safe. And the callback looks safe if a callback is triggered
> after virtio_device_ready() but before the above assignment.
> 
> >  virtio_rpmsg_bus.c
> >
> 
> This is somehow more complicated. It has an rx queue, the rx filling
> is done before virtio_device_ready() but the kick is done after. And
> it looks to me the device could be used by the subsystem immediately
> after rpmsg_virtio_add_ctrl_dev() returns.
> 
> This means, if we do virtio_device_ready() after
> rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> be exploited.
> 
> It requires more thoughts.
> 
> Thanks
> 
> >
> >
> > > >
> > > > >
> > > > > > If not, we need to clarify it in the spec
> > > > > > and call virtio_device_ready() before subsystem registration.
> > > > >
> > > > > hmm, i don't get what we need to clarify
> > > >
> > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > no)?
> > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > >
> > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > me the code is correct.
> > > > > > >
> > > > > > > OK.
> > > > > > >
> > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > >
> > > > > > > > The above looks fine and we have three more:
> > > > > > > >
> > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > >
> > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > device once it's registered?
> > > > > >
> > > > > > It depends on the specific subsystem.
> > > > > >
> > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > netdev_register() seems to be fine.
> > > > >
> > > > > exactly
> > > > >
> > > > > > For the subsystem that can use the device immediately, if the
> > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > proceed, we are still fine. Since those requests will be processed after
> > > > > > DRIVER_OK.
> > > > >
> > > > > Well first won't driver code normally kick as well?
> > > >
> > > > Kick itself is not blocked.
> > > >
> > > > > And without kick, won't everything just be blocked?
> > > >
> > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > instead of polling the used buffer in the probe.
> > > >
> > > > >
> > > > >
> > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Then we can get an interrupt for an unregistered device.
> > > >
> > > > It depends on the device. For the device that doesn't have an rx queue
> > > > (or device to driver queue), we are fine:
> > > >
> > > > E.g in virtio-blk:
> > > >
> > > >         virtio_device_ready(vdev);
> > > >
> > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > >         if (err)
> > > >                 goto out_cleanup_disk;
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > >   * */
> > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > >  {
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > >
> > > > > > > > > > > So make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +
> > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > >
> > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > +
> > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > >
> > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > -
> > > > > > > > > > > >       /*
> > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > >        * driver.
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > >  {
> > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > >       }
> > > > > > > > > > > >
> > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > +     }
> > > > > > > > > > > >
> > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > >
> > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > +
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > +      *
> > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-14  7:40                         ` Jason Wang
@ 2022-06-14 15:49                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-14 15:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, cristian.marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > >
> > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > >
> > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > >
> > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > expensive.
> > > > > > > > > > > >
> > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > >
> > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > >
> > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > >
> > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > I found 5 of these:
> > > > > > > > >
> > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > >
> > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > >
> > > > > > >
> > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > it's registered?
> > > > > >
> > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > >
> > > > > it's not allowed to kick
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > > If yes,
> > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > for a well behaved device.
> > > > >
> > > > > your patches drop the interrupt though, it won't be just delayed.
> > > >
> > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > >
> > > > So for virtio bt, it works like:
> > > >
> > > > 1) driver queues a buffer and kicks
> > > > 2) driver sets DRIVER_OK
> > > > 3) device starts to process the buffer
> > > > 4) device sends a notification
> > > >
> > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > or anything I missed?
> > >
> > > btw, hci has an open and close method and we do rx refill in
> > > hdev->open, so we're probably fine here.
> > >
> > > Thanks
> >
> >
> > Sounds good. Now to audit the rest of them from this POV ;)
> 
> Adding maintainers.
> 
> >
> >  drivers/i2c/busses/i2c-virtio.c
> 
> It looks to me the device could be used immediately after
> i2c_add_adapter() returns. So we probably need to add
> virtio_device_ready() before that. Fortunately, there's no rx vq in
> i2c and the callback looks safe if the callback is called before the
> i2c registration and after virtio_device_ready().
> 
> >  drivers/net/caif/caif_virtio.c
> 
> A networking device, RX is backed by vringh so we don't need to
> refill. TX is backed by virtio and is not available until ndo_open. So
> it's fine to let the core set DRIVER_OK after probe().
> 
> >  drivers/nvdimm/virtio_pmem.c
> 
> It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> 
> But the device could be used by the subsystem immediately after
> nvdimm_pmem_region_create(), this means the flush could be issued
> before DRIVER_OK. We need virtio_device_ready() before. We don't have
> a RX virtqueue and the callback looks safe if the callback is called
> after virtio_device_ready() but before the nvdimm region creation.
> 
> And it looks to me there's a race between the assignment of
> provider_data and virtio_pmem_flush(). If the flush was issued before
> the assignment we will end up with a NULL pointer dereference. This is
> something we need to fix.
> 
> >  arm_scmi
> 
> It looks to me the singleton device could be used by SCMI immediately after
> 
>         /* Ensure initialized scmi_vdev is visible */
>         smp_store_mb(scmi_vdev, vdev);
> 
> So we probably need to do virtio_device_ready() before that. It has an
> optional rx queue but the filling is done after the above assignment,
> so it's safe. And the callback looks safe if a callback is triggered
> after virtio_device_ready() but before the above assignment.
> 
> >  virtio_rpmsg_bus.c
> >
> 
> This is somehow more complicated. It has an rx queue, the rx filling
> is done before virtio_device_ready() but the kick is done after. And
> it looks to me the device could be used by the subsystem immediately
> after rpmsg_virtio_add_ctrl_dev() returns.
> 
> This means, if we do virtio_device_ready() after
> rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> be exploited.
> 
> It requires more thoughts.
> 
> Thanks

I think at this point let's do it before so we at least do not
get a regression with your patches, add a big comment and work
on fixing properly in the next Linux version. Do you think you can
commit to a full fix in the next Linux version?
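
For reference, the two orderings discussed for rpmsg can be modelled
in userspace roughly as below. This is only a sketch under stated
assumptions: all mock_* names are invented, registration is assumed to
let the subsystem kick immediately, and the callback race that remains
with the "before" ordering is deliberately not modelled. It only counts
kicks that would reach the device before DRIVER_OK.

```c
#include <assert.h>

#define MOCK_S_DRIVER_OK 0x04

/* Hypothetical model of the probe-time state; not the real rpmsg driver. */
struct mock_rpmsg_dev {
	unsigned int status;  /* device status bits */
	int kicks;            /* all kicks issued by the driver side */
	int early_kicks;      /* kicks issued before DRIVER_OK was set */
};

void mock_reset(struct mock_rpmsg_dev *d)
{
	d->status = 0;
	d->kicks = 0;
	d->early_kicks = 0;
}

void mock_kick(struct mock_rpmsg_dev *d)
{
	d->kicks++;
	if (!(d->status & MOCK_S_DRIVER_OK))
		d->early_kicks++;  /* not allowed: kick before DRIVER_OK */
}

/* Stand-in for subsystem registration: assume it may kick right away. */
void mock_register(struct mock_rpmsg_dev *d)
{
	mock_kick(d);
}

void mock_ready(struct mock_rpmsg_dev *d)
{
	assert(!(d->status & MOCK_S_DRIVER_OK)); /* mirrors the BUG_ON() */
	d->status |= MOCK_S_DRIVER_OK;
}

/* Ordering 1: ready after registration, so a kick can come too early. */
void mock_probe_ready_after(struct mock_rpmsg_dev *d)
{
	mock_reset(d);
	mock_register(d);
	mock_ready(d);
}

/* Ordering 2: ready before registration, the interim choice. */
void mock_probe_ready_before(struct mock_rpmsg_dev *d)
{
	mock_reset(d);
	/*
	 * NOTE: callbacks may now fire before registration completes.
	 * That race is known and still needs a proper fix.
	 */
	mock_ready(d);
	mock_register(d);
}
```

In this model the "after" ordering produces one early kick and the
"before" ordering produces none, which is the regression the interim
ordering is meant to avoid.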


> >
> >
> > > >
> > > > >
> > > > > > If not, we need to clarify it in the spec
> > > > > > and call virtio_device_ready() before subsystem registration.
> > > > >
> > > > > hmm, i don't get what we need to clarify
> > > >
> > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > no)?
> > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > >
> > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > me the code is correct.
> > > > > > >
> > > > > > > OK.
> > > > > > >
> > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > >
> > > > > > > > The above looks fine and we have three more:
> > > > > > > >
> > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > >
> > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > Thanks
> > > > > > >
> > > > > > > > Same comments for all of the above. Might Linux not start using the
> > > > > > > > device once it's registered?
> > > > > >
> > > > > > It depends on the specific subsystem.
> > > > > >
> > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > fine. E.g. for the networking subsystem, TX won't happen until
> > > > > > > ndo_open() is called, so calling virtio_device_ready() after
> > > > > > > register_netdev() seems to be fine.
> > > > >
> > > > > exactly
> > > > >
> > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > proceed, we are still fine, since those requests will be processed
> > > > > > > after DRIVER_OK.
> > > > >
> > > > > Well first won't driver code normally kick as well?
> > > >
> > > > Kick itself is not blocked.
> > > >
> > > > > And without kick, won't everything just be blocked?
> > > >
> > > > It depends on the subsystem. E.g. the driver can choose to use a
> > > > callback instead of polling the used buffer in the probe.
> > > >
> > > > >
> > > > >
> > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Then we can get an interrupt for an unregistered device.
> > > >
> > > > It depends on the device. For the device that doesn't have an rx queue
> > > > (or device to driver queue), we are fine:
> > > >
> > > > E.g in virtio-blk:
> > > >
> > > >         virtio_device_ready(vdev);
> > > >
> > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > >         if (err)
> > > >                 goto out_cleanup_disk;
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > >   * */
> > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > >  {
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > >
> > > > > > > > > > > So make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +
> > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > >
> > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > +
> > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > >
> > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > -
> > > > > > > > > > > >       /*
> > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > >        * driver.
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > >  {
> > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > >       }
> > > > > > > > > > > >
> > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > +     }
> > > > > > > > > > > >
> > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > >
> > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > +
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > +      *
> > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> >


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-14 15:49                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-14 15:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	cristian.marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > >
> > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > to make the interrupt core aware of such invalid interrupts and
> > > > > > > > > > > > > prevent an IRQ storm.
> > > > > > > > > > > >
> > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > >
> > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > expensive.
> > > > > > > > > > > >
> > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > >
> > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > >
> > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > >
> > > > > > > > > > > It's rc2, how about waiting for 1 or 2 more rcs? Or it may be better
> > > > > > > > > > > if we simply warn instead of disabling it by default.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > I found 5 of these:
> > > > > > > > >
> > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > >
> > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > >
> > > > > > >
> > > > > > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > > > > > work. Also, can Linux start using the device immediately after
> > > > > > > > it's registered?
> > > > > >
> > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > >
> > > > > it's not allowed to kick
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > > If yes,
> > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > for a well behaved device.
> > > > >
> > > > > your patches drop the interrupt though, it won't be just delayed.
> > > >
> > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > >
> > > > So for virtio bt, it works like:
> > > >
> > > > 1) driver queues a buffer and kicks
> > > > 2) driver sets DRIVER_OK
> > > > 3) device starts to process the buffer
> > > > 4) device sends a notification
> > > >
> > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > or is there anything I missed?
> > >
> > > btw, hci has an open and close method and we do rx refill in
> > > hdev->open, so we're probably fine here.
> > >
> > > Thanks
> >
> >
> > Sounds good. Now to audit the rest of them from this POV ;)
> 
> Adding maintainers.
> 
> >
> >  drivers/i2c/busses/i2c-virtio.c
> 
> It looks to me like the device could be used immediately after
> i2c_add_adapter() returns. So we probably need to add
> virtio_device_ready() before that. Fortunately, there's no rx vq in
> i2c and the callback looks safe if it is called after
> virtio_device_ready() but before the i2c registration.
> 
> >  drivers/net/caif/caif_virtio.c
> 
> A networking device, RX is backed by vringh so we don't need to
> refill. TX is backed by virtio and is not available until ndo_open. So
> it's fine to let the core set DRIVER_OK after probe().
> 
> >  drivers/nvdimm/virtio_pmem.c
> 
> It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> 
> But the device could be used by the subsystem immediately after
> nvdimm_pmem_region_create(), which means the flush could be issued
> before DRIVER_OK. We need virtio_device_ready() before that. We don't
> have an RX virtqueue and the callback looks safe if it is called
> after virtio_device_ready() but before the nvdimm region creation.
> 
> And it looks to me there's a race between the assignment of
> provider_data and virtio_pmem_flush(). If the flush was issued before
> the assignment we will end up with a NULL pointer dereference. This is
> something we need to fix.
> 
> >  arm_scmi
> 
> It looks to me the singleton device could be used by SCMI immediately after
> 
>         /* Ensure initialized scmi_vdev is visible */
>         smp_store_mb(scmi_vdev, vdev);
> 
> So we probably need to do virtio_device_ready() before that. It has an
> optional rx queue but the filling is done after the above assignment,
> so it's safe. And the callback looks safe if a callback is triggered
> after virtio_device_ready() but before the above assignment.
> 
> >  virtio_rpmsg_bus.c
> >
> 
> This is somewhat more complicated. It has an rx queue; the rx filling
> is done before virtio_device_ready() but the kick is done after. And
> it looks to me like the device could be used by the subsystem
> immediately after rpmsg_virtio_add_ctrl_dev() returns.
> 
> This means, if we do virtio_device_ready() after
> rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
> do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> be exploited.
> 
> It requires more thought.
> 
> Thanks

I think at this point let's do virtio_device_ready() before
registration so we at least do not get a regression with your
patches, add a big comment, and work on fixing it properly in the
next Linux version. Do you think you can commit to a full fix in the
next Linux version?
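
To make the ordering concrete, here is a rough userspace sketch, not
kernel code and not tested against any real driver. All sim_* names are
made up for illustration; only the ordering mirrors what is discussed in
this thread: the broken flag starts as true, the ready step clears it
and then sets DRIVER_OK, and the subsystem registration happens
afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real virtio API. */
enum sim_irqreturn { SIM_IRQ_NONE, SIM_IRQ_HANDLED };

struct sim_dev {
	bool vq_broken;   /* models vq->broken: true until the driver is ready */
	bool driver_ok;   /* models the DRIVER_OK status bit */
	bool registered;  /* models "visible to the subsystem" */
	int callbacks;    /* vq callbacks actually delivered */
};

static inline void sim_dev_init(struct sim_dev *d)
{
	d->vq_broken = true;
	d->driver_ok = false;
	d->registered = false;
	d->callbacks = 0;
}

/* Models the check in vring_interrupt(): drop the IRQ while broken. */
static inline enum sim_irqreturn sim_vring_interrupt(struct sim_dev *d)
{
	if (d->vq_broken)
		return SIM_IRQ_NONE;
	d->callbacks++;
	return SIM_IRQ_HANDLED;
}

/* Models virtio_device_ready(): unbreak first, then set DRIVER_OK. */
static inline void sim_device_ready(struct sim_dev *d)
{
	assert(!d->driver_ok); /* stands in for the BUG_ON() on DRIVER_OK */
	d->vq_broken = false;
	d->driver_ok = true;
}

/* Models a kick from the subsystem; true means DRIVER_OK was already set. */
static inline bool sim_subsystem_kick(const struct sim_dev *d)
{
	return d->registered && d->driver_ok;
}

static inline void sim_probe(struct sim_dev *d)
{
	sim_dev_init(d);
	/*
	 * BIG COMMENT (assumed wording): the device is marked ready
	 * before registration so the subsystem can never kick before
	 * DRIVER_OK. Callbacks may now race with registration; that
	 * part still needs a proper fix.
	 */
	sim_device_ready(d);
	d->registered = true;
}
```

With this ordering an interrupt arriving before sim_probe() is dropped
with SIM_IRQ_NONE, and any kick issued after registration is guaranteed
to come after DRIVER_OK. The remaining race between callbacks and
registration is deliberately left open here, as in the proposal above.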


> >
> >
> > > >
> > > > >
> > > > > > If not, we need to clarify it in the spec
> > > > > > and call virtio_device_ready() before subsystem registration.
> > > > >
> > > > > hmm, i don't get what we need to clarify
> > > >
> > > > E.g. is the driver not allowed to kick before DRIVER_OK, or should
> > > > the device only process the buffer after a kick that comes after
> > > > DRIVER_OK (I think no)?
> > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > >
> > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > me the code is correct.
> > > > > > >
> > > > > > > OK.
> > > > > > >
> > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > >
> > > > > > > > The above looks fine and we have three more:
> > > > > > > >
> > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > >
> > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > Thanks
> > > > > > >
> > > > > > > > Same comments for all of the above. Might Linux not start using the
> > > > > > > > device once it's registered?
> > > > > >
> > > > > > It depends on the specific subsystem.
> > > > > >
> > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > fine. E.g. for the networking subsystem, TX won't happen until
> > > > > > > ndo_open() is called, so calling virtio_device_ready() after
> > > > > > > register_netdev() seems to be fine.
> > > > >
> > > > > exactly
> > > > >
> > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > proceed, we are still fine, since those requests will be processed
> > > > > > > after DRIVER_OK.
> > > > >
> > > > > Well first won't driver code normally kick as well?
> > > >
> > > > Kick itself is not blocked.
> > > >
> > > > > And without kick, won't everything just be blocked?
> > > >
> > > > It depends on the subsystem. E.g. the driver can choose to use a
> > > > callback instead of polling the used buffer in the probe.
> > > >
> > > > >
> > > > >
> > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Then we can get an interrupt for an unregistered device.
> > > >
> > > > It depends on the device. For the device that doesn't have an rx queue
> > > > (or device to driver queue), we are fine:
> > > >
> > > > E.g in virtio-blk:
> > > >
> > > >         virtio_device_ready(vdev);
> > > >
> > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > >         if (err)
> > > >                 goto out_cleanup_disk;
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > >   * */
> > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > >  {
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > >
> > > > > > > > > > > So make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +
> > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > >
> > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > +
> > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > >
> > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > -
> > > > > > > > > > > >       /*
> > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > >        * driver.
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > >  {
> > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > >
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > >  }
> > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > >       }
> > > > > > > > > > > >
> > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > +     }
> > > > > > > > > > > >
> > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > >
> > > > > > > > > > > and make this conditional
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > >
> > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > +
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > +      */
> > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > +     /*
> > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > +      *
> > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > +      */
> > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> >
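
To make the ordering in the quoted virtio_device_ready() and
vring_interrupt() hunks easier to follow, here is a minimal userspace
sketch of the idea, assuming a single virtqueue. It is only a model, not
kernel code: struct model_vq, struct model_dev and the model_*() helpers
are hypothetical names, and the memory barriers, virtio_synchronize_cbs()
and the more_used() check are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit value as VIRTIO_CONFIG_S_DRIVER_OK. */
#define MODEL_S_DRIVER_OK 4

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Hypothetical stand-in for the per-virtqueue state. */
struct model_vq {
	bool broken;   /* true until the driver declares itself ready */
	int callbacks; /* how many times the driver callback ran */
};

/* Hypothetical stand-in for the device: a status byte and one vq. */
struct model_dev {
	unsigned int status;
	struct model_vq vq;
};

/* Mirrors the patched vq creation: the vq starts out broken. */
static void model_init(struct model_dev *dev)
{
	dev->status = 0;
	dev->vq.broken = true;
	dev->vq.callbacks = 0;
}

/* Mirrors virtio_device_ready(): unbreak first, then set DRIVER_OK. */
static void model_device_ready(struct model_dev *dev)
{
	/* Stands in for BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK). */
	assert(!(dev->status & MODEL_S_DRIVER_OK));
	dev->vq.broken = false;
	dev->status |= MODEL_S_DRIVER_OK;
}

/* Mirrors virtio_reset_device(): break the vq, then reset the status. */
static void model_reset_device(struct model_dev *dev)
{
	dev->vq.broken = true;
	dev->status = 0;
}

/* Mirrors the early return in vring_interrupt() while the vq is broken. */
static enum model_irqreturn model_vring_interrupt(struct model_dev *dev)
{
	if (dev->vq.broken)
		return MODEL_IRQ_NONE;
	dev->vq.callbacks++;
	return MODEL_IRQ_HANDLED;
}
```

In this model an interrupt that arrives before model_device_ready() or
after model_reset_device() never reaches the callback, which is the
behaviour the patch is after.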


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-14  7:40                         ` Jason Wang
                                           ` (2 preceding siblings ...)
  (?)
@ 2022-06-14 16:46                         ` Cristian Marussi
  2022-06-15  1:41                             ` Jason Wang
  -1 siblings, 1 reply; 106+ messages in thread
From: Cristian Marussi @ 2022-06-14 16:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Paul E. McKenney, Marc Zyngier,
	Halil Pasic, Cornelia Huck, eperezma, Cindy Lu,
	Stefano Garzarella, Xuan Zhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390, conghui.chen, Viresh Kumar,
	netdev, pankaj.gupta.linux, sudeep.holla, Bjorn Andersson,
	Mathieu Poirier

On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >

Hi Jason,

> > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > >
> > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > >
> > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > >
> > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > expensive.
> > > > > > > > > > > >
> > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > >
> > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > >
> > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > >
> > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > I found 5 of these:
> > > > > > > > >
> > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > >
> > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > >
> > > > > > >
> > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > it's registered?
> > > > > >
> > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > >
> > > > > it's not allowed to kick
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > > If yes,
> > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > for a well behaved device.
> > > > >
> > > > > your patches drop the interrupt though, it won't be just delayed.
> > > >
> > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > >
> > > > So for virtio bt, it works like:
> > > >
> > > > 1) driver queue buffer and kick
> > > > 2) driver set DRIVER_OK
> > > > 3) device start to process the buffer
> > > > 4) device send a notification
> > > >
> > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > or anything I missed?
> > >
> > > btw, hci has an open and close method and we do rx refill in
> > > hdev->open, so we're probably fine here.
> > >
> > > Thanks
> >
> >
> > Sounds good. Now to audit the rest of them from this POV ;)
> 
> Adding maintainers.
> 
> >
> >  drivers/i2c/busses/i2c-virtio.c
> 
> It looks to me the device could be used immediately after
> i2c_add_adapter() return. So we probably need to add
> virtio_device_ready() before that. Fortunately, there's no rx vq in
> i2c and the callback looks safe if the callback is called before the
> i2c registration and after virtio_device_ready().
> 
> >  drivers/net/caif/caif_virtio.c
> 
> A networking device, RX is backed by vringh so we don't need to
> refill. TX is backed by virtio and is not available until ndo_open. So
> it's fine to let the core set DRIVER_OK after probe().
> 
> >  drivers/nvdimm/virtio_pmem.c
> 
> It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> 
> But the device could be used by the subsystem immediately after
> nvdimm_pmem_region_create(), this means the flush could be issued
> before DRIVER_OK. We need virtio_device_ready() before. We don't have
> a RX virtqueue and the callback looks safe if the callback is called
> after virtio_device_ready() but before the nvdimm region creating.
> 
> And it looks to me there's a race between the assignment of
> provider_data and virtio_pmem_flush(). If the flush was issued before
> the assignment we will end up with a NULL pointer dereference. This is
> something we need to fix.
> 
> >  arm_scmi
> 
> It looks to me the singleton device could be used by SCMI immediately after
> 
>         /* Ensure initialized scmi_vdev is visible */
>         smp_store_mb(scmi_vdev, vdev);
> 
> So we probably need to do virtio_device_ready() before that. It has an
> optional rx queue but the filling is done after the above assignment,
> so it's safe. And the callback looks safe if a callback is triggered
> after virtio_device_ready() but before the above assignment.
> 

I wanted to give this series a go by testing it in the context of
SCMI, but it does not apply

- not on v5.18:

17:33 $ git rebase -i v5.18
17:33 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
Applying: virtio: use virtio_device_ready() in virtio_device_restore()
Applying: virtio: use virtio_reset_device() when possible
Applying: virtio: introduce config op to synchronize vring callbacks
Applying: virtio-pci: implement synchronize_cbs()
Applying: virtio-mmio: implement synchronize_cbs()
error: patch failed: drivers/virtio/virtio_mmio.c:345
error: drivers/virtio/virtio_mmio.c: patch does not apply
Patch failed at 0005 virtio-mmio: implement synchronize_cbs()

- nor on v5.19-rc2:

17:33 $ git rebase -i v5.19-rc2 
17:35 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
Applying: virtio: use virtio_device_ready() in virtio_device_restore()
error: patch failed: drivers/virtio/virtio.c:526
error: drivers/virtio/virtio.c: patch does not apply
Patch failed at 0001 virtio: use virtio_device_ready() in virtio_device_restore()
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".

... what should I take as the base?

Thanks,
Cristian


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-14 13:50                           ` Michael S. Tsirkin
@ 2022-06-15  1:32                             ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-15  1:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, cristian.marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Tue, Jun 14, 2022 at 9:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > >
> > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > >
> > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > >
> > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > >
> > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > I found 5 of these:
> > > > > > > > > >
> > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > >
> > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > >
> > > > > > > >
> > > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > it's registered?
> > > > > > >
> > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > >
> > > > > > it's not allowed to kick
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > > If yes,
> > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > for a well behaved device.
> > > > > >
> > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > >
> > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > >
> > > > > So for virtio bt, it works like:
> > > > >
> > > > > 1) driver queue buffer and kick
> > > > > 2) driver set DRIVER_OK
> > > > > 3) device start to process the buffer
> > > > > > 4) device send a notification
> > > > >
> > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > or anything I missed?
> > > >
> > > > btw, hci has an open and close method and we do rx refill in
> > > > hdev->open, so we're probably fine here.
> > > >
> > > > Thanks
> > >
> > >
> > > Sounds good. Now to audit the rest of them from this POV ;)
> >
> > Adding maintainers.
> >
> > >
> > >  drivers/i2c/busses/i2c-virtio.c
> >
> > It looks to me the device could be used immediately after
> > i2c_add_adapter() return. So we probably need to add
> > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > i2c and the callback looks safe if the callback is called before the
> > i2c registration and after virtio_device_ready().
> >
> > >  drivers/net/caif/caif_virtio.c
> >
> > A networking device, RX is backed by vringh so we don't need to
> > refill. TX is backed by virtio and is not available until ndo_open. So
> > it's fine to let the core set DRIVER_OK after probe().
>
> How about we just add an explicit ready in the driver anyway?
> I think the implicit ready is just creating a mess as people
> tend to forget to think about it.

This is possible, and we could fail the probe if ready is not set by a driver.
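
Roughly something like the following, as an untested userspace sketch
rather than kernel code. It assumes the core checks the status once the
driver's probe() has returned; struct model_vdev, struct model_driver,
model_core_probe() and the -1 error value are hypothetical names made up
for illustration.

```c
#include <assert.h>

/* Same bit value as VIRTIO_CONFIG_S_DRIVER_OK. */
#define MODEL_S_DRIVER_OK 4

/* Hypothetical stand-in for struct virtio_device: only the status. */
struct model_vdev {
	unsigned int status;
};

/* Hypothetical stand-in for struct virtio_driver: only probe(). */
struct model_driver {
	int (*probe)(struct model_vdev *vdev);
};

/* What a driver would call explicitly, like virtio_device_ready(). */
static void model_device_ready(struct model_vdev *vdev)
{
	/* Stands in for BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK). */
	assert(!(vdev->status & MODEL_S_DRIVER_OK));
	vdev->status |= MODEL_S_DRIVER_OK;
}

/*
 * Core-side probe wrapper: instead of setting DRIVER_OK implicitly on
 * behalf of the driver, fail the probe when the driver forgot to do it.
 * Returns 0 on success and a negative value on failure.
 */
static int model_core_probe(struct model_driver *drv, struct model_vdev *vdev)
{
	int err = drv->probe(vdev);

	if (err)
		return err;

	if (!(vdev->status & MODEL_S_DRIVER_OK)) {
		vdev->status = 0; /* model a device reset on failure */
		return -1;
	}
	return 0;
}

/* A driver that sets ready explicitly. */
static int good_probe(struct model_vdev *vdev)
{
	model_device_ready(vdev);
	return 0;
}

/* A driver that relies on the old implicit ready. */
static int forgetful_probe(struct model_vdev *vdev)
{
	(void)vdev;
	return 0;
}
```

With this, a driver like forgetful_probe() would be caught at probe time
instead of silently depending on the core.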

Thanks

>
> > >  drivers/nvdimm/virtio_pmem.c
> >
> > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> >
> > But the device could be used by the subsystem immediately after
> > nvdimm_pmem_region_create(), this means the flush could be issued
> > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > a RX virtqueue and the callback looks safe if the callback is called
> > after virtio_device_ready() but before the nvdimm region creating.
> >
> > And it looks to me there's a race between the assignment of
> > provider_data and virtio_pmem_flush(). If the flush was issued before
> > the assignment we will end up with a NULL pointer dereference. This is
> > something we need to fix.
> >
> > >  arm_scmi
> >
> > It looks to me the singleton device could be used by SCMI immediately after
> >
> >         /* Ensure initialized scmi_vdev is visible */
> >         smp_store_mb(scmi_vdev, vdev);
> >
> > So we probably need to do virtio_device_ready() before that. It has an
> > optional rx queue but the filling is done after the above assignment,
> > so it's safe. And the callback looks safe if a callback is triggered
> > after virtio_device_ready() but before the above assignment.
> >
> > >  virtio_rpmsg_bus.c
> > >
> >
> > This is somehow more complicated. It has an rx queue, the rx filling
> > is done before virtio_device_ready() but the kick is done after. And
> > it looks to me the device could be used by the subsystem immediately
> > after rpmsg_virtio_add_ctrl_dev() returns.
> >
> > This means, if we do virtio_device_ready() after
> > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > be exploited.
> >
> > It requires more thoughts.
> >
> > Thanks
> >
> > >
> > >
> > > > >
> > > > > >
> > > > > > > If not, we need to clarify it in the spec
> > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > >
> > > > > > hmm, i don't get what we need to clarify
> > > > >
> > > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > > no)?
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > >
> > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > me the code is correct.
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > >
> > > > > > > > > The above looks fine and we have three more:
> > > > > > > > >
> > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > >
> > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > device once it's registered?
> > > > > > >
> > > > > > > It depends on the specific subsystem.
> > > > > > >
> > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > register_netdev() seems to be fine.
> > > > > >
> > > > > > exactly
> > > > > >
> > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > > DRIVER_OK.
> > > > > >
> > > > > > Well first won't driver code normally kick as well?
> > > > >
> > > > > Kick itself is not blocked.
> > > > >
> > > > > > And without kick, won't everything just be blocked?
> > > > >
> > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > instead of polling the used buffer in the probe.
> > > > >
> > > > > >
> > > > > >
> > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Then we can get an interrupt for an unregistered device.
> > > > >
> > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > (or device to driver queue), we are fine:
> > > > >
> > > > > E.g in virtio-blk:
> > > > >
> > > > >         virtio_device_ready(vdev);
> > > > >
> > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > >         if (err)
> > > > >                 goto out_cleanup_disk;
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > >   * */
> > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > >
> > > > > > > > > > > > So make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > -
> > > > > > > > > > > > >       /*
> > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > >  {
> > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > >       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > +     }
> > > > > > > > > > > > >
> > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > +      *
> > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-15  1:32                             ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-15  1:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	cristian.marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Tue, Jun 14, 2022 at 9:50 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > >
> > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > >
> > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > >
> > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > >
> > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > I found 5 of these:
> > > > > > > > > >
> > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > >
> > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > >
> > > > > > > >
> > > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > it's registered?
> > > > > > >
> > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > >
> > > > > > it's not allowed to kick
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > > If yes,
> > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > for a well behaved device.
> > > > > >
> > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > >
> > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > >
> > > > > So for virtio bt, it works like:
> > > > >
> > > > > 1) driver queues a buffer and kicks
> > > > > 2) driver sets DRIVER_OK
> > > > > 3) device starts to process the buffer
> > > > > 4) device sends a notification
> > > > >
> > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > or anything I missed?
> > > >
> > > > btw, hci has an open and close method and we do rx refill in
> > > > hdev->open, so we're probably fine here.
> > > >
> > > > Thanks
> > >
> > >
> > > Sounds good. Now to audit the rest of them from this POV ;)
> >
> > Adding maintainers.
> >
> > >
> > >  drivers/i2c/busses/i2c-virtio.c
> >
> > It looks to me the device could be used immediately after
> > i2c_add_adapter() return. So we probably need to add
> > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > i2c and the callback looks safe if the callback is called before the
> > i2c registration and after virtio_device_ready().
> >
> > >  drivers/net/caif/caif_virtio.c
> >
> > A networking device, RX is backed by vringh so we don't need to
> > refill. TX is backed by virtio and is not available until ndo_open. So
> > it's fine to let the core set DRIVER_OK after probe().
>
> How about we just add an explicit ready in the driver anyway?
> I think the implicit ready is just creating a mess as people
> tend to forget to think about it.

This is possible, and we could fail the probe if ready is not set by a driver.
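
A minimal sketch of such a check, written as a userspace model rather
than kernel code. All the probe_model_* names are hypothetical and only
illustrate the idea: after the driver's probe() returns successfully,
the core looks at the status and fails the probe when DRIVER_OK is
missing. The only value taken from the real headers is DRIVER_OK = 4.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as VIRTIO_CONFIG_S_DRIVER_OK; everything else is a model. */
#define PROBE_MODEL_S_DRIVER_OK 4u

/* Hypothetical stand-in for a virtio device: only the status byte. */
struct probe_model_device {
	uint8_t status;
};

typedef int (*probe_model_fn)(struct probe_model_device *dev);

/* Stand-in for virtio_device_ready(): DRIVER_OK must be set only once. */
void probe_model_device_ready(struct probe_model_device *dev)
{
	assert(!(dev->status & PROBE_MODEL_S_DRIVER_OK));
	dev->status |= PROBE_MODEL_S_DRIVER_OK;
}

/* A driver probe that sets ready explicitly. */
int probe_model_good(struct probe_model_device *dev)
{
	probe_model_device_ready(dev);
	return 0;
}

/* A driver probe that forgets to set ready. */
int probe_model_forgetful(struct probe_model_device *dev)
{
	(void)dev;
	return 0;
}

/*
 * Hypothetical core-side wrapper: instead of setting DRIVER_OK on the
 * driver's behalf, fail the probe when the driver did not set it.
 */
int probe_model_core_probe(struct probe_model_device *dev, probe_model_fn probe)
{
	int err = probe(dev);

	if (err)
		return err;
	if (!(dev->status & PROBE_MODEL_S_DRIVER_OK))
		return -EINVAL;
	return 0;
}
```

With this model, probe_model_core_probe() returns 0 for
probe_model_good and -EINVAL for probe_model_forgetful. The error code
is an arbitrary choice for the sketch.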

Thanks

>
> > >  drivers/nvdimm/virtio_pmem.c
> >
> > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> >
> > But the device could be used by the subsystem immediately after
> > nvdimm_pmem_region_create(), this means the flush could be issued
> > before DRIVER_OK. We need virtio_device_ready() before that. We don't have
> > an RX virtqueue and the callback looks safe if it is called
> > after virtio_device_ready() but before the nvdimm region is created.
> >
> > And it looks to me there's a race between the assignment of
> > provider_data and virtio_pmem_flush(). If the flush was issued before
> > the assignment we will end up with a NULL pointer dereference. This is
> > something we need to fix.
> >
> > >  arm_scmi
> >
> > It looks to me the singleton device could be used by SCMI immediately after
> >
> >         /* Ensure initialized scmi_vdev is visible */
> >         smp_store_mb(scmi_vdev, vdev);
> >
> > So we probably need to do virtio_device_ready() before that. It has an
> > optional rx queue but the filling is done after the above assignment,
> > so it's safe. And the callback looks safe if a callback is triggered
> > after virtio_device_ready() but before the above assignment.
> >
> > >  virtio_rpmsg_bus.c
> > >
> >
> > This is somewhat more complicated. It has an rx queue, the rx filling
> > is done before virtio_device_ready() but the kick is done after. And
> > it looks to me the device could be used by the subsystem immediately after
> > rpmsg_virtio_add_ctrl_dev() returns.
> >
> > This means, if we do virtio_device_ready() after
> > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > be exploited.
> >
> > It requires more thoughts.
> >
> > Thanks
> >
> > >
> > >
> > > > >
> > > > > >
> > > > > > > If not, we need to clarify it in the spec
> > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > >
> > > > > > hmm, i don't get what we need to clarify
> > > > >
> > > > > E.g. is the driver not allowed to kick before DRIVER_OK, or should the
> > > > > device only process buffers after a kick that follows DRIVER_OK (I
> > > > > think no)?
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > >
> > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > me the code is correct.
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > >
> > > > > > > > > The above looks fine and we have three more:
> > > > > > > > >
> > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > >
> > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > device once it's registered?
> > > > > > >
> > > > > > > It depends on the specific subsystem.
> > > > > > >
> > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > register_netdev() seems to be fine.
> > > > > >
> > > > > > exactly
> > > > > >
> > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > DRIVER_OK.
> > > > > >
> > > > > > Well first won't driver code normally kick as well?
> > > > >
> > > > > Kick itself is not blocked.
> > > > >
> > > > > > And without kick, won't everything just be blocked?
> > > > >
> > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > instead of polling the used buffer in the probe.
> > > > >
> > > > > >
> > > > > >
> > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Then we can get an interrupt for an unregistered device.
> > > > >
> > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > (or device to driver queue), we are fine:
> > > > >
> > > > > E.g in virtio-blk:
> > > > >
> > > > >         virtio_device_ready(vdev);
> > > > >
> > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > >         if (err)
> > > > >                 goto out_cleanup_disk;
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > >   * */
> > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > >
> > > > > > > > > > > > So make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > -
> > > > > > > > > > > > >       /*
> > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > >  {
> > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > >       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > +     }
> > > > > > > > > > > > >
> > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > +      *
> > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > >
>

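For reference, the gating that the quoted patch describes can be
modelled in plain userspace C. This is only a sketch under stated
assumptions: the gate_* names are hypothetical, and the model leaves out
everything the real code needs for correctness on real hardware
(virtio_synchronize_cbs(), the transport-specific ordering in
set_status(), and the used-ring check in vring_interrupt()). It only
shows the state machine: a queue starts broken, becomes usable at
device ready, and is broken again on reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum gate_irqreturn { GATE_IRQ_NONE, GATE_IRQ_HANDLED };

/* Hypothetical model of a virtqueue: only the fields the gate needs. */
struct gate_vq {
	bool broken;            /* true from creation until the driver is ready */
	unsigned int callbacks; /* how many times the driver callback ran */
};

/* Queue creation: start out broken, as the patch does. */
void gate_vq_init(struct gate_vq *vq)
{
	vq->broken = true;
	vq->callbacks = 0;
}

/* Stand-in for the unbreak step in virtio_device_ready(). */
void gate_device_ready(struct gate_vq *vq)
{
	vq->broken = false;
}

/* Stand-in for the break step in virtio_reset_device(). */
void gate_reset_device(struct gate_vq *vq)
{
	vq->broken = true;
}

/* Stand-in for vring_interrupt(): drop the interrupt while broken. */
enum gate_irqreturn gate_vring_interrupt(struct gate_vq *vq)
{
	if (vq->broken)
		return GATE_IRQ_NONE;   /* reported as unhandled, callback skipped */

	vq->callbacks++;                /* the driver callback would run here */
	return GATE_IRQ_HANDLED;
}
```

In this model an interrupt that arrives before gate_device_ready() or
after gate_reset_device() never reaches the callback, which is the
behaviour the audit in this thread is worried about for drivers that
rely on the implicit ready.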
^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-14 15:49                           ` Michael S. Tsirkin
@ 2022-06-15  1:38                             ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-15  1:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, cristian.marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > This is a rework of the previous IRQ hardening for virtio-pci, which
> > > > > > > > > > > > > was reverted after several drawbacks were found:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) it tried to use IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > > > > > > > > > > >    managed IRQs used by some devices such as virtio-blk
> > > > > > > > > > > > > 2) it was done only for the PCI transport
> > > > > > > > > > > > >
> > > > > > > > > > > > > vq->broken is re-used in this patch to implement the IRQ hardening.
> > > > > > > > > > > > > vq->broken is set to true during both initialization and reset, and
> > > > > > > > > > > > > is set to false in virtio_device_ready(). vring_interrupt() can then
> > > > > > > > > > > > > check vq->broken and return early when it is true. In that case it
> > > > > > > > > > > > > now returns IRQ_NONE to make the interrupt core aware of such invalid
> > > > > > > > > > > > > interrupts and prevent an IRQ storm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason for using a per queue variable instead of a per device one
> > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for the vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method used
> > > > > > > > > > > > > by the config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses a spinlock to do the synchronization, which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > >
> > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > >
> > > > > > > > > > > I can post a patch, but I'm afraid that if we disable it by default it
> > > > > > > > > > > won't be used by users, so there's no way for us to receive bug
> > > > > > > > > > > reports. Or we need a plan to enable it by default.
> > > > > > > > > > >
> > > > > > > > > > > It's rc2, how about waiting for one or two more rcs? Or it might be
> > > > > > > > > > > better if we simply warn instead of disabling it by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > I found 5 of these:
> > > > > > > > > >
> > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > >
> > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > >
> > > > > > > >
> > > > > > > > But it calls hci_register_dev() and that in turn queues all kinds of
> > > > > > > > work. Also, can Linux start using the device immediately after
> > > > > > > > it's registered?
> > > > > > >
> > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > >
> > > > > > it's not allowed to kick
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > > If yes,
> > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > for a well behaved device.
> > > > > >
> > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > >
> > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > >
> > > > > So for virtio bt, it works like:
> > > > >
> > > > > 1) driver queues buffers and kicks
> > > > > 2) driver sets DRIVER_OK
> > > > > 3) device starts to process the buffers
> > > > > 4) device sends a notification
> > > > >
> > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > or is there anything I missed?
> > > >
> > > > btw, hci has an open and close method and we do rx refill in
> > > > hdev->open, so we're probably fine here.
> > > >
> > > > Thanks
> > >
> > >
> > > Sounds good. Now to audit the rest of them from this POV ;)
> >
> > Adding maintainers.
> >
> > >
> > >  drivers/i2c/busses/i2c-virtio.c
> >
> > It looks to me like the device could be used immediately after
> > i2c_add_adapter() returns. So we probably need to add
> > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > i2c and the callback looks safe if it is called before the
> > i2c registration and after virtio_device_ready().
> >
> > >  drivers/net/caif/caif_virtio.c
> >
> > A networking device. RX is backed by vringh so we don't need to
> > refill. TX is backed by virtio and is unavailable until ndo_open(). So
> > it's fine to let the core set DRIVER_OK after probe().
> >
> > >  drivers/nvdimm/virtio_pmem.c
> >
> > It doesn't use interrupts so far, so it has nothing to do with the IRQ hardening.
> >
> > But the device could be used by the subsystem immediately after
> > nvdimm_pmem_region_create(), which means the flush could be issued
> > before DRIVER_OK. We need virtio_device_ready() before that. We don't have
> > an RX virtqueue and the callback looks safe if it is called
> > after virtio_device_ready() but before the nvdimm region is created.
> >
> > And it looks to me there's a race between the assignment of
> > provider_data and virtio_pmem_flush(). If the flush was issued before
> > the assignment we will end up with a NULL pointer dereference. This is
> > something we need to fix.
> >
> > >  arm_scmi
> >
> > It looks to me the singleton device could be used by SCMI immediately after
> >
> >         /* Ensure initialized scmi_vdev is visible */
> >         smp_store_mb(scmi_vdev, vdev);
> >
> > So we probably need to do virtio_device_ready() before that. It has an
> > optional rx queue but the filling is done after the above assignment,
> > so it's safe. And the callback looks safe if a callback is triggered
> > after virtio_device_ready() but before the above assignment.
> >
> > >  virtio_rpmsg_bus.c
> > >
> >
> > This is somewhat more complicated. It has an rx queue; the rx filling
> > is done before virtio_device_ready() but the kick is done after. And
> > it looks to me like the device could be used by the subsystem immediately
> > after rpmsg_virtio_add_ctrl_dev() returns.
> >
> > This means that if we do virtio_device_ready() after
> > rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
> > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > be exploited.
> >
> > It requires more thought.
> >
> > Thanks
>
> I think at this point let's do it before so we at least do not
> get a regression with your patches, add a big comment and work
> on fixing properly in the next Linux version. Do you think you can
> commit to a full fix in the next linux version?

I think it should be ok.

If I understand you correctly, you mean to disable the hardening in
this release?

(Actually, my understanding is that since we are developing mainline
rather than a downstream version with hardening features, bug reports
are somewhat expected, especially considering that most of the bugs
are not related to the hardening itself.)

Thanks

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 13a7348cedff..7ef3115efbad 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
        vq->we_own_ring = true;
        vq->notify = notify;
        vq->weak_barriers = weak_barriers;
-       vq->broken = true;
+       vq->broken = false;
        vq->last_used_idx = 0;
        vq->event_triggered = false;
        vq->num_added = 0;
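
For anyone following along, here is a minimal userspace sketch of the gating that vq->broken provides. It is not kernel code: struct model_vq and the model_*() functions are hypothetical stand-ins for the real virtqueue, vring_interrupt(), virtio_device_ready() and virtio_reset_device(), and the enum only imitates irqreturn_t. The initial_broken parameter models the one-line change in the diff above: when it is false, an interrupt arriving before virtio_device_ready() reaches the callback again.

```c
#include <stdbool.h>

/* Userspace imitation of the kernel's irqreturn_t values (assumption). */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical, heavily reduced model of a virtqueue. */
struct model_vq {
	bool broken;       /* true while callbacks must not run */
	int callbacks_run; /* number of times the driver callback ran */
};

/* Models virtqueue creation; initial_broken selects hardened (true) or not. */
static void model_vq_init(struct model_vq *vq, bool initial_broken)
{
	vq->broken = initial_broken;
	vq->callbacks_run = 0;
}

/* Models the check in vring_interrupt(): no callback while broken. */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;
	vq->callbacks_run++;
	return MODEL_IRQ_HANDLED;
}

/* Models the role of virtio_device_ready(): unbreak before DRIVER_OK. */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models the role of virtio_reset_device(): break the queue again. */
static void model_reset_device(struct model_vq *vq)
{
	vq->broken = true;
}
```

With initial_broken set to true, an early interrupt is reported as unhandled and the callback never runs; after model_device_ready() the same interrupt is handled, and after model_reset_device() it is rejected again.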

>
>
> > >
> > >
> > > > >
> > > > > >
> > > > > > > If not, we need to clarify it in the spec
> > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > >
> > > > > > hmm, i don't get what we need to clarify
> > > > >
> > > > > E.g. is the driver not allowed to kick before DRIVER_OK, or should the
> > > > > device, after DRIVER_OK, only process a buffer after a kick that comes
> > > > > after DRIVER_OK (I think no)?
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > >
> > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > me like the code is correct.
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > >
> > > > > > > > > The above looks fine and we have three more:
> > > > > > > > >
> > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > >
> > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > device once it's registered?
> > > > > > >
> > > > > > > It depends on the specific subsystem.
> > > > > > >
> > > > > > > For a subsystem that can't use the device immediately, calling
> > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > fine. E.g. for the networking subsystem, TX won't happen if
> > > > > > > ndo_open() is not called, so calling virtio_device_ready() after
> > > > > > > register_netdev() seems to be fine.
> > > > > >
> > > > > > exactly
> > > > > >
> > > > > > > For a subsystem that can use the device immediately, if the
> > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > DRIVER_OK.
> > > > > >
> > > > > > Well first won't driver code normally kick as well?
> > > > >
> > > > > Kick itself is not blocked.
> > > > >
> > > > > > And without kick, won't everything just be blocked?
> > > > >
> > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > instead of polling the used buffer in the probe.
> > > > >
> > > > > >
> > > > > >
> > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Then we can get an interrupt for an unregistered device.
> > > > >
> > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > (or device to driver queue), we are fine:
> > > > >
> > > > > E.g in virtio-blk:
> > > > >
> > > > >         virtio_device_ready(vdev);
> > > > >
> > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > >         if (err)
> > > > >                 goto out_cleanup_disk;
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > >   * */
> > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > >
> > > > > > > > > > > > So make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > -
> > > > > > > > > > > > >       /*
> > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > >  {
> > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > >       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > +     }
> > > > > > > > > > > > >
> > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > +      *
> > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > >
>
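
To make the ordering question in the audit above concrete, here is a small userspace sketch under stated assumptions. It models a subsystem that may use the device as soon as it is registered, and counts kicks that would be issued before DRIVER_OK. All names (struct probe_model_dev, probe_model_*()) are hypothetical and do not correspond to kernel APIs; the sketch only illustrates why some drivers need virtio_device_ready() before subsystem registration.

```c
#include <stdbool.h>

/* Hypothetical model of a virtio device as seen during probe(). */
struct probe_model_dev {
	bool driver_ok;      /* set once the driver has declared itself ready */
	int kicks_before_ok; /* kicks issued while DRIVER_OK was not yet set */
};

/* Models a kick; records it as a violation if DRIVER_OK is not set yet. */
static void probe_model_kick(struct probe_model_dev *dev)
{
	if (!dev->driver_ok)
		dev->kicks_before_ok++;
}

/* Models the role of virtio_device_ready(). */
static void probe_model_device_ready(struct probe_model_dev *dev)
{
	dev->driver_ok = true;
}

/*
 * Models subsystem registration for a subsystem that may use the
 * device immediately, i.e. it submits a request and kicks right away.
 */
static void probe_model_register_subsystem(struct probe_model_dev *dev)
{
	probe_model_kick(dev);
}

/* Runs a modeled probe() and returns how many kicks preceded DRIVER_OK. */
static int probe_model_probe(bool ready_before_register)
{
	struct probe_model_dev dev = { .driver_ok = false, .kicks_before_ok = 0 };

	if (ready_before_register) {
		probe_model_device_ready(&dev);
		probe_model_register_subsystem(&dev);
	} else {
		probe_model_register_subsystem(&dev);
		probe_model_device_ready(&dev);
	}
	return dev.kicks_before_ok;
}
```

In this model, probe_model_probe(true) reports no early kicks while probe_model_probe(false) reports one. The sketch deliberately ignores the other half of the trade-off raised for rpmsg, namely callbacks that may fire between the ready call and the registration.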


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-15  1:38                             ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-15  1:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	cristian.marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > This is a rework of the previous IRQ hardening for virtio-pci, which
> > > > > > > > > > > > > was reverted after several drawbacks were found:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) it tried to use IRQF_NO_AUTOEN, which is not friendly to the affinity
> > > > > > > > > > > > >    managed IRQs used by some devices such as virtio-blk
> > > > > > > > > > > > > 2) it was done only for the PCI transport
> > > > > > > > > > > > >
> > > > > > > > > > > > > vq->broken is re-used in this patch to implement the IRQ hardening.
> > > > > > > > > > > > > vq->broken is set to true during both initialization and reset, and
> > > > > > > > > > > > > is set to false in virtio_device_ready(). vring_interrupt() can then
> > > > > > > > > > > > > check vq->broken and return early when it is true. In that case it
> > > > > > > > > > > > > now returns IRQ_NONE to make the interrupt core aware of such invalid
> > > > > > > > > > > > > interrupts and prevent an IRQ storm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason for using a per queue variable instead of a per device one
> > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for the vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method used
> > > > > > > > > > > > > by the config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses a spinlock to do the synchronization, which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > >
> > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > >
> > > > > > > > > > > I can post a patch, but I'm afraid that if we disable it by default it
> > > > > > > > > > > won't be used by users, so there's no way for us to receive bug
> > > > > > > > > > > reports. Or we need a plan to enable it by default.
> > > > > > > > > > >
> > > > > > > > > > > It's rc2, how about waiting for one or two more rcs? Or it might be
> > > > > > > > > > > better if we simply warn instead of disabling it by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > I found 5 of these:
> > > > > > > > > >
> > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > >
> > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > >
> > > > > > > >
> > > > > > > > But it calls hci_register_dev() and that in turn queues all kinds of
> > > > > > > > work. Also, can Linux start using the device immediately after
> > > > > > > > it's registered?
> > > > > > >
> > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > >
> > > > > > it's not allowed to kick
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > > If yes,
> > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > for a well behaved device.
> > > > > >
> > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > >
> > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > >
> > > > > So for virtio bt, it works like:
> > > > >
> > > > > 1) driver queues buffers and kicks
> > > > > 2) driver sets DRIVER_OK
> > > > > 3) device starts to process the buffers
> > > > > 4) device sends a notification
> > > > >
> > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > or is there anything I missed?
> > > >
> > > > btw, hci has an open and close method and we do rx refill in
> > > > hdev->open, so we're probably fine here.
> > > >
> > > > Thanks
> > >
> > >
> > > Sounds good. Now to audit the rest of them from this POV ;)
> >
> > Adding maintainers.
> >
> > >
> > >  drivers/i2c/busses/i2c-virtio.c
> >
> > It looks to me like the device could be used immediately after
> > i2c_add_adapter() returns. So we probably need to add
> > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > i2c and the callback looks safe if it is called before the
> > i2c registration and after virtio_device_ready().
> >
> > >  drivers/net/caif/caif_virtio.c
> >
> > A networking device. RX is backed by vringh so we don't need to
> > refill. TX is backed by virtio and is unavailable until ndo_open(). So
> > it's fine to let the core set DRIVER_OK after probe().
> >
> > >  drivers/nvdimm/virtio_pmem.c
> >
> > It doesn't use interrupts so far, so it has nothing to do with the IRQ hardening.
> >
> > But the device could be used by the subsystem immediately after
> > nvdimm_pmem_region_create(), which means the flush could be issued
> > before DRIVER_OK. We need virtio_device_ready() before that. We don't have
> > an RX virtqueue and the callback looks safe if it is called
> > after virtio_device_ready() but before the nvdimm region is created.
> >
> > And it looks to me there's a race between the assignment of
> > provider_data and virtio_pmem_flush(). If the flush was issued before
> > the assignment we will end up with a NULL pointer dereference. This is
> > something we need to fix.
> >
> > >  arm_scmi
> >
> > It looks to me the singleton device could be used by SCMI immediately after
> >
> >         /* Ensure initialized scmi_vdev is visible */
> >         smp_store_mb(scmi_vdev, vdev);
> >
> > So we probably need to do virtio_device_ready() before that. It has an
> > optional rx queue but the filling is done after the above assignment,
> > so it's safe. And the callback looks safe if a callback is triggered
> > after virtio_device_ready() but before the above assignment.
> >
> > >  virtio_rpmsg_bus.c
> > >
> >
> > This is somewhat more complicated. It has an rx queue; the rx filling
> > is done before virtio_device_ready() but the kick is done after. And
> > it looks to me the device could be used by the subsystem immediately
> > after rpmsg_virtio_add_ctrl_dev() returns.
> >
> > This means, if we do virtio_device_ready() after
> > rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
> > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > be exploited.
> >
> > It requires more thought.
> >
> > Thanks
>
> I think at this point let's do it before so we at least do not
> get a regression with your patches, add a big comment and work
> on fixing properly in the next Linux version. Do you think you can
> commit to a full fix in the next linux version?

I think it should be ok.

If I understand you correctly, you meant to disable the hardening in
this release?

(Actually, my understanding is that since we are developing mainline
instead of a downstream version with hardening features, bug reports
are somewhat expected, especially considering most of the bugs are not
related to the hardening itself)

Thanks

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 13a7348cedff..7ef3115efbad 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
        vq->we_own_ring = true;
        vq->notify = notify;
        vq->weak_barriers = weak_barriers;
-       vq->broken = true;
+       vq->broken = false;
        vq->last_used_idx = 0;
        vq->event_triggered = false;
        vq->num_added = 0;
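
To illustrate what the hunk above toggles, here is a minimal userspace
sketch of how vq->broken gates vring_interrupt() as described in this
thread. It is not kernel code: every name prefixed with model_ or MODEL_
is hypothetical, and the sketch ignores locking and memory ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical model of the one virtqueue field discussed here. */
struct model_vq {
	bool broken;          /* mirrors vq->broken */
	unsigned int handled; /* callbacks that were actually run */
};

/* With hardening, a new vq starts broken; without it, unbroken. */
static void model_vq_init(struct model_vq *vq, bool hardening)
{
	vq->broken = hardening;
	vq->handled = 0;
}

/* Models the unbreak step done in virtio_device_ready(). */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models the break step done in virtio_reset_device(). */
static void model_reset_device(struct model_vq *vq)
{
	vq->broken = true;
}

/* Models the early return in vring_interrupt() for a broken vq. */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;
	vq->handled++;
	return MODEL_IRQ_HANDLED;
}

/* Walks init -> early IRQ -> ready -> IRQ -> reset -> IRQ. */
unsigned int model_demo(bool hardening)
{
	struct model_vq vq;

	model_vq_init(&vq, hardening);
	model_vring_interrupt(&vq);  /* dropped only if hardening is on */
	model_device_ready(&vq);
	assert(model_vring_interrupt(&vq) == MODEL_IRQ_HANDLED);
	model_reset_device(&vq);
	assert(model_vring_interrupt(&vq) == MODEL_IRQ_NONE);
	return vq.handled;
}
```

Calling model_demo(false) corresponds to the state after the hunk above:
the interrupt that arrives before the ready step is handled instead of
being dropped.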

>
>
> > >
> > >
> > > > >
> > > > > >
> > > > > > > If not, we need to clarify it in the spec
> > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > >
> > > > > > hmm, i don't get what we need to clarify
> > > > >
> > > > > > E.g. whether the driver is not allowed to kick before DRIVER_OK, or
> > > > > > whether the device should only process the buffer after a kick that
> > > > > > comes after DRIVER_OK (I think no)?
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > >
> > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > me the code is correct.
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > >
> > > > > > > > > The above looks fine and we have three more:
> > > > > > > > >
> > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > >
> > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > device once it's registered?
> > > > > > >
> > > > > > > It depends on the specific subsystem.
> > > > > > >
> > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > netdev_register() seems to be fine.
> > > > > >
> > > > > > exactly
> > > > > >
> > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > > DRIVER_OK.
> > > > > >
> > > > > > Well first won't driver code normally kick as well?
> > > > >
> > > > > Kick itself is not blocked.
> > > > >
> > > > > > And without kick, won't everything just be blocked?
> > > > >
> > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > instead of polling the used buffer in the probe.
> > > > >
> > > > > >
> > > > > >
> > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Then we can get an interrupt for an unregistered device.
> > > > >
> > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > (or device to driver queue), we are fine:
> > > > >
> > > > > E.g in virtio-blk:
> > > > >
> > > > >         virtio_device_ready(vdev);
> > > > >
> > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > >         if (err)
> > > > >                 goto out_cleanup_disk;
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > >   * */
> > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > >
> > > > > > > > > > > > So make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > -
> > > > > > > > > > > > >       /*
> > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > >  {
> > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > >       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > +     }
> > > > > > > > > > > > >
> > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > >
> > > > > > > > > > > > and make this conditional
> > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > +      *
> > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > +      */
> > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-14 16:46                         ` Cristian Marussi
@ 2022-06-15  1:41                             ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-15  1:41 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Michael S. Tsirkin, virtualization, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Paul E. McKenney, Marc Zyngier,
	Halil Pasic, Cornelia Huck, eperezma, Cindy Lu,
	Stefano Garzarella, Xuan Zhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390, conghui.chen, Viresh Kumar,
	netdev, pankaj.gupta.linux, sudeep.holla, Bjorn Andersson,
	Mathieu Poirier

On Wed, Jun 15, 2022 at 12:46 AM Cristian Marussi
<cristian.marussi@arm.com> wrote:
>
> On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
>
> Hi Jason,
>
> > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > >
> > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > to make the interrupt core aware of such invalid interrupts to prevent
> > > > > > > > > > > > > > an IRQ storm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > >
> > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > >
> > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > >
> > > > > > > > > > > > It's rc2, how about waiting for 1 or 2 more rcs? Or it may be better if we
> > > > > > > > > > > > simply warn instead of disabling it by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > I found 5 of these:
> > > > > > > > > >
> > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > >
> > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > >
> > > > > > > >
> > > > > > > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > it's registered?
> > > > > > >
> > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > >
> > > > > > it's not allowed to kick
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > > If yes,
> > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > for a well behaved device.
> > > > > >
> > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > >
> > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > >
> > > > > So for virtio bt, it works like:
> > > > >
> > > > > 1) driver queues a buffer and kicks
> > > > > 2) driver sets DRIVER_OK
> > > > > 3) device starts to process the buffer
> > > > > 4) device sends a notification
> > > > >
> > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > or anything I missed?
> > > >
> > > > btw, hci has an open and close method and we do rx refill in
> > > > hdev->open, so we're probably fine here.
> > > >
> > > > Thanks
> > >
> > >
> > > Sounds good. Now to audit the rest of them from this POV ;)
> >
> > Adding maintainers.
> >
> > >
> > >  drivers/i2c/busses/i2c-virtio.c
> >
> > It looks to me the device could be used immediately after
> > i2c_add_adapter() returns. So we probably need to add
> > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > i2c and the callback looks safe if the callback is called before the
> > i2c registration and after virtio_device_ready().
> >
> > >  drivers/net/caif/caif_virtio.c
> >
> > A networking device. RX is backed by vringh so we don't need to
> > refill. TX is backed by virtio and is not available until ndo_open. So
> > it's fine to let the core set DRIVER_OK after probe().
> >
> > >  drivers/nvdimm/virtio_pmem.c
> >
> > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> >
> > But the device could be used by the subsystem immediately after
> > nvdimm_pmem_region_create(), this means the flush could be issued
> > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > a RX virtqueue and the callback looks safe if the callback is called
> > after virtio_device_ready() but before the nvdimm region creation.
> >
> > And it looks to me there's a race between the assignment of
> > provider_data and virtio_pmem_flush(). If the flush was issued before
> > the assignment we will end up with a NULL pointer dereference. This is
> > something we need to fix.
> >
> > >  arm_scmi
> >
> > It looks to me the singleton device could be used by SCMI immediately after
> >
> >         /* Ensure initialized scmi_vdev is visible */
> >         smp_store_mb(scmi_vdev, vdev);
> >
> > So we probably need to do virtio_device_ready() before that. It has an
> > optional rx queue but the filling is done after the above assignment,
> > so it's safe. And the callback looks safe if a callback is triggered
> > after virtio_device_ready() but before the above assignment.
> >
>
> I wanted to give this series a go, testing it in the context of
> SCMI, but it does not apply
>
> - not on a v5.18:
>
> 17:33 $ git rebase -i v5.18
> 17:33 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> Applying: virtio: use virtio_reset_device() when possible
> Applying: virtio: introduce config op to synchronize vring callbacks
> Applying: virtio-pci: implement synchronize_cbs()
> Applying: virtio-mmio: implement synchronize_cbs()
> error: patch failed: drivers/virtio/virtio_mmio.c:345
> error: drivers/virtio/virtio_mmio.c: patch does not apply
> Patch failed at 0005 virtio-mmio: implement synchronize_cbs()
>
> - neither on a v5.19-rc2:
>
> 17:33 $ git rebase -i v5.19-rc2
> 17:35 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> error: patch failed: drivers/virtio/virtio.c:526
> error: drivers/virtio/virtio.c: patch does not apply
> Patch failed at 0001 virtio: use virtio_device_ready() in
> virtio_device_restore()
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
>
> ... what should I take as base?

It should have already been included in rc2, so there's no need to
apply the patch manually.

Thanks

>
> Thanks,
> Cristian
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-15  1:41                             ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-15  1:41 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Michael S. Tsirkin, Peter Zijlstra, Viresh Kumar, linux-kernel,
	Vineeth Vijayan, Cindy Lu, Marc Zyngier, Halil Pasic, eperezma,
	Paul E. McKenney, linux-s390, Thomas Gleixner, virtualization,
	conghui.chen, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Wed, Jun 15, 2022 at 12:46 AM Cristian Marussi
<cristian.marussi@arm.com> wrote:
>
> On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
>
> Hi Jason,
>
> > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > >
> > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > to make the interrupt core aware of such invalid interrupts to prevent
> > > > > > > > > > > > > > an IRQ storm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > >
> > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > >
> > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > >
> > > > > > > > > > > > It's rc2, how about waiting for 1 or 2 more rcs? Or it may be better
> > > > > > > > > > > > if we simply warn instead of disabling it by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > I found 5 of these:
> > > > > > > > > >
> > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > >
> > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > >
> > > > > > > >
> > > > > > > > > But it calls hci_register_dev() and that in turn queues all kinds of
> > > > > > > > > work. Also, can Linux start using the device immediately after
> > > > > > > > > it's registered?
> > > > > > >
> > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > >
> > > > > > it's not allowed to kick
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > > If yes,
> > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > for a well behaved device.
> > > > > >
> > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > >
> > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > >
> > > > > So for virtio bt, it works like:
> > > > >
> > > > > > 1) driver queues a buffer and kicks
> > > > > > 2) driver sets DRIVER_OK
> > > > > > 3) device starts to process the buffer
> > > > > > 4) device sends a notification
> > > > >
> > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > or anything I missed?
> > > >
> > > > btw, hci has an open and close method and we do rx refill in
> > > > hdev->open, so we're probably fine here.
> > > >
> > > > Thanks
> > >
> > >
> > > Sounds good. Now to audit the rest of them from this POV ;)
> >
> > Adding maintainers.
> >
> > >
> > >  drivers/i2c/busses/i2c-virtio.c
> >
> > It looks to me that the device could be used immediately after
> > i2c_add_adapter() returns. So we probably need to add
> > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > i2c and the callback looks safe if it is called before the
> > i2c registration and after virtio_device_ready().
> >
> > >  drivers/net/caif/caif_virtio.c
> >
> > A networking device: RX is backed by vringh so we don't need to
> > refill. TX is backed by virtio and is not available until ndo_open. So
> > it's fine to let the core set DRIVER_OK after probe().
> >
> > >  drivers/nvdimm/virtio_pmem.c
> >
> > It doesn't use interrupts so far, so it has nothing to do with the IRQ hardening.
> >
> > But the device could be used by the subsystem immediately after
> > nvdimm_pmem_region_create(), which means a flush could be issued
> > before DRIVER_OK. We need virtio_device_ready() before that. We don't have
> > an RX virtqueue and the callback looks safe if it is called
> > after virtio_device_ready() but before the nvdimm region creation.
> >
> > And it looks to me there's a race between the assignment of
> > provider_data and virtio_pmem_flush(). If the flush was issued before
> > the assignment we will end up with a NULL pointer dereference. This is
> > something we need to fix.
> >
> > >  arm_scmi
> >
> > It looks to me the singleton device could be used by SCMI immediately after
> >
> >         /* Ensure initialized scmi_vdev is visible */
> >         smp_store_mb(scmi_vdev, vdev);
> >
> > So we probably need to do virtio_device_ready() before that. It has an
> > optional rx queue but the filling is done after the above assignment,
> > so it's safe. And the callback looks safe if a callback is triggered
> > after virtio_device_ready() but before the above assignment.
> >
>
> I wanted to give this series a go, testing it in the context of
> SCMI, but it does not apply
>
> - not on a v5.18:
>
> 17:33 $ git rebase -i v5.18
> 17:33 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> Applying: virtio: use virtio_reset_device() when possible
> Applying: virtio: introduce config op to synchronize vring callbacks
> Applying: virtio-pci: implement synchronize_cbs()
> Applying: virtio-mmio: implement synchronize_cbs()
> error: patch failed: drivers/virtio/virtio_mmio.c:345
> error: drivers/virtio/virtio_mmio.c: patch does not apply
> Patch failed at 0005 virtio-mmio: implement synchronize_cbs()
>
> - neither on a v5.19-rc2:
>
> 17:33 $ git rebase -i v5.19-rc2
> 17:35 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> error: patch failed: drivers/virtio/virtio.c:526
> error: drivers/virtio/virtio.c: patch does not apply
> Patch failed at 0001 virtio: use virtio_device_ready() in
> virtio_device_restore()
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
>
> ... what should I take as base?

The series should already be included in rc2, so there's no need to
apply the patches manually.

Thanks

>
> Thanks,
> Cristian
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-15  1:41                             ` Jason Wang
  (?)
@ 2022-06-15 18:24                             ` Cristian Marussi
  2022-06-17  3:14                                 ` Jason Wang
  -1 siblings, 1 reply; 106+ messages in thread
From: Cristian Marussi @ 2022-06-15 18:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Paul E. McKenney, Marc Zyngier,
	Halil Pasic, Cornelia Huck, eperezma, Cindy Lu,
	Stefano Garzarella, Xuan Zhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390, conghui.chen, Viresh Kumar,
	netdev, pankaj.gupta.linux, sudeep.holla, Bjorn Andersson,
	Mathieu Poirier

On Wed, Jun 15, 2022 at 09:41:18AM +0800, Jason Wang wrote:
> On Wed, Jun 15, 2022 at 12:46 AM Cristian Marussi
> <cristian.marussi@arm.com> wrote:

Hi Jason,

> >
> > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> >

[snip]

> > >
> > > >  arm_scmi
> > >
> > > It looks to me the singleton device could be used by SCMI immediately after
> > >
> > >         /* Ensure initialized scmi_vdev is visible */
> > >         smp_store_mb(scmi_vdev, vdev);
> > >
> > > So we probably need to do virtio_device_ready() before that. It has an
> > > optional rx queue but the filling is done after the above assignment,
> > > so it's safe. And the callback looks safe if a callback is triggered
> > > after virtio_device_ready() but before the above assignment.
> > >
> >
> > I wanted to give this series a go, testing it in the context of
> > SCMI, but it does not apply
> >
> > - not on a v5.18:
> >
> > 17:33 $ git rebase -i v5.18
> > 17:33 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> > Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> > Applying: virtio: use virtio_reset_device() when possible
> > Applying: virtio: introduce config op to synchronize vring callbacks
> > Applying: virtio-pci: implement synchronize_cbs()
> > Applying: virtio-mmio: implement synchronize_cbs()
> > error: patch failed: drivers/virtio/virtio_mmio.c:345
> > error: drivers/virtio/virtio_mmio.c: patch does not apply
> > Patch failed at 0005 virtio-mmio: implement synchronize_cbs()
> >
> > - neither on a v5.19-rc2:
> >
> > 17:33 $ git rebase -i v5.19-rc2
> > 17:35 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> > Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> > error: patch failed: drivers/virtio/virtio.c:526
> > error: drivers/virtio/virtio.c: patch does not apply
> > Patch failed at 0001 virtio: use virtio_device_ready() in
> > virtio_device_restore()
> > hint: Use 'git am --show-current-patch=diff' to see the failed patch
> > When you have resolved this problem, run "git am --continue".
> >
> > ... what should I take as base?
> 
> The series should already be included in rc2, so there's no need to
> apply the patches manually.
> 

I tested this series as included in v5.19-rc2 (WITHOUT adding a
virtio_device_ready() in SCMI virtio as you mentioned above, if I got it
right) and I have NOT seen any issue around SCMI virtio using my usual
test setup (using both SCMI vqueues).

No anomalies even when using SCMI virtio in atomic/polling mode.

Adding a virtio_device_ready() at the end of the SCMI virtio probe()
works fine too; it does not make any difference in my setup.
(Tested using both QEMU and kvmtool, the latter NOT supporting
 virtio_V1... not sure if it makes a difference but I thought it was
 worth mentioning.)

Thanks,
Cristian
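
For reference, the two probe variants compared above can be modelled in
user space. This is only a sketch under stated assumptions: every name
in it (model_vq, model_dev, model_core_probe, ...) is a hypothetical
stand-in and not a kernel definition. The only behaviour taken from the
thread is that vq->broken starts out true, that virtio_device_ready()
clears it, that vring_interrupt() returns IRQ_NONE while it is set, and
that the core sets DRIVER_OK after probe() when the driver did not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; not the kernel definitions. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_vq {
	bool broken;		/* true until the device is marked ready */
	int callbacks_run;	/* interrupts that reached the driver callback */
};

struct model_dev {
	bool driver_ok;
	struct model_vq vq;
};

/* Queues start out broken, as in the patch under discussion. */
void model_dev_init(struct model_dev *dev)
{
	dev->driver_ok = false;
	dev->vq.broken = true;
	dev->vq.callbacks_run = 0;
}

/* Models virtio_device_ready(): clear broken, then set DRIVER_OK. */
void model_device_ready(struct model_dev *dev)
{
	dev->vq.broken = false;
	dev->driver_ok = true;
}

/* Models the early return in vring_interrupt(). */
enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;
	vq->callbacks_run++;
	return MODEL_IRQ_HANDLED;
}

/*
 * Models the core's probe path: run the driver's probe(), then set
 * DRIVER_OK if the driver did not do so itself.
 */
void model_core_probe(struct model_dev *dev, bool driver_calls_ready)
{
	model_dev_init(dev);
	if (driver_calls_ready)
		model_device_ready(dev);	/* driver's own call in probe() */
	if (!dev->driver_ok)
		model_device_ready(dev);	/* fallback done by the core */
}

/* Runs both variants and checks that interrupts are delivered in each. */
int model_demo(void)
{
	struct model_dev without_call, with_call;

	model_core_probe(&without_call, false);
	model_core_probe(&with_call, true);

	assert(model_vring_interrupt(&without_call.vq) == MODEL_IRQ_HANDLED);
	assert(model_vring_interrupt(&with_call.vq) == MODEL_IRQ_HANDLED);

	printf("callbacks: %d %d\n", without_call.vq.callbacks_run,
	       with_call.vq.callbacks_run);
	return 0;
}
```

In this model the two variants differ only in the window between the end
of the driver's probe() and the core's fallback call, which matches the
observation that the explicit call made no visible difference in the
test setup.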


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-15  1:38                             ` Jason Wang
@ 2022-06-16 17:11                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-16 17:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, cristian.marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > > when vq->broken is true. In this case, switch to returning IRQ_NONE
> > > > > > > > > > > > > > > to make the interrupt core aware of such invalid interrupts and
> > > > > > > > > > > > > > > prevent an IRQ storm.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The reason for using a per queue variable instead of a per device one
> > > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > >
> > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > >
> > > > > > > > > > > > > It's rc2, how about waiting for 1 or 2 more rcs? Or it may be better
> > > > > > > > > > > > > if we simply warn instead of disabling it by default.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > I found 5 of these:
> > > > > > > > > > >
> > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > >
> > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > But it calls hci_register_dev() and that in turn queues all kinds of
> > > > > > > > > > work. Also, can Linux start using the device immediately after
> > > > > > > > > > it's registered?
> > > > > > > >
> > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > >
> > > > > > > it's not allowed to kick
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > >
> > > > > > > > If yes,
> > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > for a well behaved device.
> > > > > > >
> > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > >
> > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > >
> > > > > > So for virtio bt, it works like:
> > > > > >
> > > > > > > 1) driver queues a buffer and kicks
> > > > > > > 2) driver sets DRIVER_OK
> > > > > > > 3) device starts to process the buffer
> > > > > > > 4) device sends a notification
> > > > > >
> > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > or anything I missed?
> > > > >
> > > > > btw, hci has an open and close method and we do rx refill in
> > > > > hdev->open, so we're probably fine here.
> > > > >
> > > > > Thanks
> > > >
> > > >
> > > > Sounds good. Now to audit the rest of them from this POV ;)
> > >
> > > Adding maintainers.
> > >
> > > >
> > > >  drivers/i2c/busses/i2c-virtio.c
> > >
> > > It looks to me that the device could be used immediately after
> > > i2c_add_adapter() returns. So we probably need to add
> > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > i2c and the callback looks safe if it is called before the
> > > i2c registration and after virtio_device_ready().
> > >
> > > >  drivers/net/caif/caif_virtio.c
> > >
> > > A networking device: RX is backed by vringh so we don't need to
> > > refill. TX is backed by virtio and is not available until ndo_open. So
> > > it's fine to let the core set DRIVER_OK after probe().
> > >
> > > >  drivers/nvdimm/virtio_pmem.c
> > >
> > > It doesn't use interrupts so far, so it has nothing to do with the IRQ hardening.
> > >
> > > But the device could be used by the subsystem immediately after
> > > nvdimm_pmem_region_create(), which means a flush could be issued
> > > before DRIVER_OK. We need virtio_device_ready() before that. We don't have
> > > an RX virtqueue and the callback looks safe if it is called
> > > after virtio_device_ready() but before the nvdimm region creation.
> > >
> > > And it looks to me there's a race between the assignment of
> > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > the assignment we will end up with a NULL pointer dereference. This is
> > > something we need to fix.
> > >
> > > >  arm_scmi
> > >
> > > It looks to me the singleton device could be used by SCMI immediately after
> > >
> > >         /* Ensure initialized scmi_vdev is visible */
> > >         smp_store_mb(scmi_vdev, vdev);
> > >
> > > So we probably need to do virtio_device_ready() before that. It has an
> > > optional rx queue but the filling is done after the above assignment,
> > > so it's safe. And the callback looks safe if a callback is triggered
> > > after virtio_device_ready() but before the above assignment.
> > >
> > > >  virtio_rpmsg_bus.c
> > > >
> > >
> > > This is somewhat more complicated. It has an rx queue; the rx filling
> > > is done before virtio_device_ready() but the kick is done after. And
> > > it looks to me that the device could be used by the subsystem immediately
> > > after rpmsg_virtio_add_ctrl_dev() returns.
> > >
> > > This means that if we do virtio_device_ready() after
> > > rpmsg_virtio_add_ctrl_dev(), we may get a kick before DRIVER_OK. If we
> > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > be exploited.
> > >
> > > It requires more thought.
> > >
> > > Thanks
> >
> > I think at this point let's do it before so we at least do not
> > get a regression with your patches, add a big comment and work
> > on fixing properly in the next Linux version. Do you think you can
> > commit to a full fix in the next linux version?
> 
> I think it should be ok.
> 
> If I understand you correctly, you meant to disable the hardening in
> this release?
> 
> (Actually, my understanding is that since we are developing mainline
> instead of a downstream version with hardening features, bug reports
> are somewhat expected, especially considering that most of the bugs are
> not related to the hardening itself)


Absolutely. The question is, do you think we can fix everything by the
release? At least for rpmsg we don't seem to have a handle on it yet.


> Thanks
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..7ef3115efbad 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>         vq->we_own_ring = true;
>         vq->notify = notify;
>         vq->weak_barriers = weak_barriers;
> -       vq->broken = true;
> +       vq->broken = false;
>         vq->last_used_idx = 0;
>         vq->event_triggered = false;
>         vq->num_added = 0;


and drop it on reset?
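
A minimal user-space sketch of that combination (queues not created
broken, and no break on reset), assuming a hypothetical switch named
after the suppress_interrupt_hardening flag proposed earlier in this
thread. All struct and function names here are illustrative stand-ins,
not kernel code, and where such a switch would actually live is left
open:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch, named after the flag proposed in this thread. */
struct sketch_cfg {
	bool suppress_interrupt_hardening;
};

/* Hypothetical stand-in for the per-virtqueue state. */
struct sketch_vq {
	bool broken;
};

/* Queue creation: start out broken only when hardening is active. */
void sketch_vq_init(const struct sketch_cfg *cfg, struct sketch_vq *vq)
{
	vq->broken = !cfg->suppress_interrupt_hardening;
}

/* Reset: mark the queue broken only when hardening is active. */
void sketch_reset(const struct sketch_cfg *cfg, struct sketch_vq *vq)
{
	if (!cfg->suppress_interrupt_hardening)
		vq->broken = true;
	/* the transport-specific reset would follow here */
}

/* Ready: clearing the flag is harmless in both modes. */
void sketch_ready(struct sketch_vq *vq)
{
	vq->broken = false;
}

/* True when an interrupt would be passed on to the queue callback. */
bool sketch_irq_delivered(const struct sketch_vq *vq)
{
	return !vq->broken;
}

/* Self-check: with the switch set, interrupts are never suppressed. */
int sketch_selfcheck(void)
{
	struct sketch_cfg cfg = { .suppress_interrupt_hardening = true };
	struct sketch_vq vq;

	sketch_vq_init(&cfg, &vq);
	assert(sketch_irq_delivered(&vq));
	sketch_reset(&cfg, &vq);
	assert(sketch_irq_delivered(&vq));
	return 0;
}
```

With the switch set, the model behaves like the code before the
hardening: an interrupt arriving before DRIVER_OK or after reset is
passed to the callback instead of being dropped.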

> >
> >
> > > >
> > > >
> > > > > >
> > > > > > >
> > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > >
> > > > > > > hmm, i don't get what we need to clarify
> > > > > >
> > > > > > > E.g. is the driver not allowed to kick before DRIVER_OK, or should the
> > > > > > > device only process a buffer after a kick that comes after DRIVER_OK (I
> > > > > > > think not)?
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > >
> > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > me the code is correct.
> > > > > > > > >
> > > > > > > > > OK.
> > > > > > > > >
> > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > >
> > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > >
> > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > >
> > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > device once it's registered?
> > > > > > > >
> > > > > > > > It depends on the specific subsystem.
> > > > > > > >
> > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > fine. E.g. for the networking subsystem, TX won't happen if
> > > > > > > > > ndo_open() is not called, so calling virtio_device_ready() after
> > > > > > > > > register_netdev() seems to be fine.
> > > > > > >
> > > > > > > exactly
> > > > > > >
> > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > > > DRIVER_OK.
> > > > > > >
> > > > > > > Well first won't driver code normally kick as well?
> > > > > >
> > > > > > Kick itself is not blocked.
> > > > > >
> > > > > > > And without kick, won't everything just be blocked?
> > > > > >
> > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > instead of polling the used buffer in the probe.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > >
> > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > (or device to driver queue), we are fine:
> > > > > >
> > > > > > E.g in virtio-blk:
> > > > > >
> > > > > >         virtio_device_ready(vdev);
> > > > > >
> > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > >         if (err)
> > > > > >                 goto out_cleanup_disk;
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > >
> > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > -
> > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > >  {
> > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > >
> > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > >       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > >
> > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-16 17:11                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-16 17:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	cristian.marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > >
> > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > >
> > > > > > > > > > > > It's rc2, how about waiting for one or two more rcs? Or it might be better if we
> > > > > > > > > > > > simply warn instead of disabling it by default.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > I found 5 of these:
> > > > > > > > > > >
> > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > >
> > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > it's registered?
> > > > > > > >
> > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > >
> > > > > > > it's not allowed to kick
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > >
> > > > > > > > If yes,
> > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > for a well behaved device.
> > > > > > >
> > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > >
> > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > >
> > > > > > So for virtio bt, it works like:
> > > > > >
> > > > > > 1) driver queues a buffer and kicks
> > > > > > 2) driver sets DRIVER_OK
> > > > > > 3) device starts to process the buffer
> > > > > > 4) device sends a notification
> > > > > >
> > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > or anything I missed?
> > > > >
> > > > > btw, hci has an open and close method and we do rx refill in
> > > > > hdev->open, so we're probably fine here.
> > > > >
> > > > > Thanks
> > > >
> > > >
> > > > Sounds good. Now to audit the rest of them from this POV ;)
> > >
> > > Adding maintainers.
> > >
> > > >
> > > >  drivers/i2c/busses/i2c-virtio.c
> > >
> > > It looks to me the device could be used immediately after
> > > i2c_add_adapter() returns. So we probably need to add
> > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > i2c and the callback looks safe if the callback is called before the
> > > i2c registration and after virtio_device_ready().
> > >
> > > >  drivers/net/caif/caif_virtio.c
> > >
> > > A networking device, RX is backed by vringh so we don't need to
> > > refill. TX is backed by virtio and is not available until ndo_open. So
> > > it's fine to let the core set DRIVER_OK after probe().
> > >
> > > >  drivers/nvdimm/virtio_pmem.c
> > >
> > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > >
> > > But the device could be used by the subsystem immediately after
> > > nvdimm_pmem_region_create(), this means the flush could be issued
> > > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > > a RX virtqueue and the callback looks safe if the callback is called
> > > after virtio_device_ready() but before the nvdimm region creation.
> > >
> > > And it looks to me there's a race between the assignment of
> > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > the assignment we will end up with a NULL pointer dereference. This is
> > > something we need to fix.
> > >
> > > >  arm_scmi
> > >
> > > It looks to me the singleton device could be used by SCMI immediately after
> > >
> > >         /* Ensure initialized scmi_vdev is visible */
> > >         smp_store_mb(scmi_vdev, vdev);
> > >
> > > So we probably need to do virtio_device_ready() before that. It has an
> > > optional rx queue but the filling is done after the above assignment,
> > > so it's safe. And the callback looks safe if a callback is triggered
> > > after virtio_device_ready() but before the above assignment.
> > >
> > > >  virtio_rpmsg_bus.c
> > > >
> > >
> > > This is somewhat more complicated. It has an rx queue, the rx filling
> > > is done before virtio_device_ready() but the kick is done after. And
> > > it looks to me the device could be used by the subsystem immediately
> > > after rpmsg_virtio_add_ctrl_dev() returns.
> > >
> > > This means, if we do virtio_device_ready() after
> > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > be exploited.
> > >
> > > It requires more thoughts.
> > >
> > > Thanks
> >
> > I think at this point let's do it before so we at least do not
> > get a regression with your patches, add a big comment and work
> > on fixing properly in the next Linux version. Do you think you can
> > commit to a full fix in the next linux version?
> 
> I think it should be ok.
> 
> If I understand you correctly, you meant to disable the hardening in
> this release?
> 
> (Actually, my understanding is that since we are developing mainline
> instead of a downstream version with hardening features, bug reports
> are somewhat expected, especially considering most of the bugs are not
> related to the hardening itself)


Absolutely. Question is do you think we can fix everything by the
release? At least for rpmsg we don't seem to have a handle on it yet.


> Thanks
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..7ef3115efbad 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>         vq->we_own_ring = true;
>         vq->notify = notify;
>         vq->weak_barriers = weak_barriers;
> -       vq->broken = true;
> +       vq->broken = false;
>         vq->last_used_idx = 0;
>         vq->event_triggered = false;
>         vq->num_added = 0;


and drop it on reset?
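
A minimal userspace sketch of what the conditional variant could look
like, assuming a hypothetical per-driver opt-out flag (named
suppress_interrupt_hardening here after the suggestion earlier in the
thread; it does not exist in the tree). All model_* names are stand-ins
rather than kernel APIs, and the real vring_interrupt() does more than
this one check:

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these types are the kernel's. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_driver {
	/* Hypothetical opt-out flag, not an existing virtio_driver field. */
	bool suppress_interrupt_hardening;
};

struct model_vq {
	bool broken;
	unsigned int callbacks;	/* how many times the vq callback ran */
};

struct model_dev {
	const struct model_driver *drv;
	struct model_vq *vqs;
	size_t nvqs;
};

/* vq creation: start out broken only when hardening is in effect. */
static void model_vq_init(struct model_vq *vq, const struct model_driver *drv)
{
	vq->broken = !drv->suppress_interrupt_hardening;
	vq->callbacks = 0;
}

/* Reset path: mark the vqs broken only when hardening is in effect. */
static void model_reset_device(struct model_dev *dev)
{
	if (dev->drv->suppress_interrupt_hardening)
		return;
	for (size_t i = 0; i < dev->nvqs; i++)
		dev->vqs[i].broken = true;
}

/* Stand-in for the unbreak step done just before DRIVER_OK is set. */
static void model_device_ready(struct model_dev *dev)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		dev->vqs[i].broken = false;
}

/* Interrupt path: report "not ours" while the vq is still broken. */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;
	vq->callbacks++;
	return MODEL_IRQ_HANDLED;
}
```

With the flag clear, an interrupt that arrives before the ready step is
reported as unhandled and the callback is skipped. With the flag set, the
queue is never marked broken, so creation and reset both behave as they
did before the hardening.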

> >
> >
> > > >
> > > >
> > > > > >
> > > > > > >
> > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > >
> > > > > > > hmm, i don't get what we need to clarify
> > > > > >
> > > > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > > > no)?
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > >
> > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > me the code is correct.
> > > > > > > > >
> > > > > > > > > OK.
> > > > > > > > >
> > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > >
> > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > >
> > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > >
> > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > device once it's registered?
> > > > > > > >
> > > > > > > > It depends on the specific subsystem.
> > > > > > > >
> > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > netdev_register() seems to be fine.
> > > > > > >
> > > > > > > exactly
> > > > > > >
> > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > proceed, we are still fine. Since those requests will be processed after
> > > > > > > > DRIVER_OK.
> > > > > > >
> > > > > > > Well first won't driver code normally kick as well?
> > > > > >
> > > > > > Kick itself is not blocked.
> > > > > >
> > > > > > > And without kick, won't everything just be blocked?
> > > > > >
> > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > instead of polling the used buffer in the probe.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > >
> > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > (or device to driver queue), we are fine:
> > > > > >
> > > > > > E.g in virtio-blk:
> > > > > >
> > > > > >         virtio_device_ready(vdev);
> > > > > >
> > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > >         if (err)
> > > > > >                 goto out_cleanup_disk;
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > >
> > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > -
> > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > >  {
> > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > >
> > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > >       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > >
> > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-16 17:11                               ` Michael S. Tsirkin
@ 2022-06-17  1:24                                 ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-17  1:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, Cristian Marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > > I found 5 of these:
> > > > > > > > > > > >
> > > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > > >
> > > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > > it's registered?
> > > > > > > > >
> > > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > > >
> > > > > > > > it's not allowed to kick
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > > If yes,
> > > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > > for a well behaved device.
> > > > > > > >
> > > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > > >
> > > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > > >
> > > > > > > So for virtio bt, it works like:
> > > > > > >
> > > > > > > 1) driver queues buffer and kicks
> > > > > > > 2) driver sets DRIVER_OK
> > > > > > > 3) device starts to process the buffer
> > > > > > > 4) device sends a notification
> > > > > > >
> > > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > > or anything I missed?
> > > > > >
> > > > > > btw, hci has an open and close method and we do rx refill in
> > > > > > hdev->open, so we're probably fine here.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > >
> > > > > Sounds good. Now to audit the rest of them from this POV ;)
> > > >
> > > > Adding maintainers.
> > > >
> > > > >
> > > > >  drivers/i2c/busses/i2c-virtio.c
> > > >
> > > > It looks to me the device could be used immediately after
> > > > i2c_add_adapter() return. So we probably need to add
> > > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > > i2c and the callback looks safe if the callback is called before the
> > > > i2c registration and after virtio_device_ready().
> > > >
> > > > >  drivers/net/caif/caif_virtio.c
> > > >
> > > > A networking device, RX is backed by vringh so we don't need to
> > > > refill. TX is backed by virtio and is available until ndo_open. So
> > > > it's fine to let the core to set DRIVER_OK after probe().
> > > >
> > > > >  drivers/nvdimm/virtio_pmem.c
> > > >
> > > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > > >
> > > > But the device could be used by the subsystem immediately after
> > > > nvdimm_pmem_region_create(), this means the flush could be issued
> > > > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > > > a RX virtqueue and the callback looks safe if the callback is called
> > > > after virtio_device_ready() but before the nvdimm region creating.
> > > >
> > > > And it looks to me there's a race between the assignment of
> > > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > > the assignment we will end up with a NULL pointer dereference. This is
> > > > something we need to fix.
> > > >
> > > > >  arm_scmi
> > > >
> > > > It looks to me the singleton device could be used by SCMI immediately after
> > > >
> > > >         /* Ensure initialized scmi_vdev is visible */
> > > >         smp_store_mb(scmi_vdev, vdev);
> > > >
> > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > optional rx queue but the filling is done after the above assignment,
> > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > after virtio_device_ready() but before the above assignment.
> > > >
> > > > >  virtio_rpmsg_bus.c
> > > > >
> > > >
> > > > This is somehow more complicated. It has an rx queue, the rx filling
> > > > is done before virtio_device_ready() but the kick is done after. And
> > > > it looks to me the device could be used by the subsystem immediately after
> > > > rpmsg_virtio_add_ctrl_dev() returns.
> > > >
> > > > This means, if we do virtio_device_ready() after
> > > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > > be exploited.
> > > >
> > > > It requires more thoughts.
> > > >
> > > > Thanks
> > >
> > > I think at this point let's do it before so we at least do not
> > > get a regression with your patches, add a big comment and work
> > > on fixing properly in the next Linux version. Do you think you can
> > > commit to a full fix in the next linux version?
> >
> > I think it should be ok.
> >
> > If I understand you correctly, you meant to disable the hardening in
> > this release?
> >
> > (Actually, my understanding is that since we are developing mainline
> > instead of a downstream version with a hardening features, bug reports
> > are somehow expected, especially consider most of the bugs are not
> > related to hardening itself)
>
>
> Absolutely. Question is do you think we can fix everything by the
> release?

Probably not, I'm auditing all the virtio drivers and it seems we have
many issues:

1) race between subsystem registration/use and virtio_device_ready()
2) race between notifications and subsystem registration/use

And it looks to me even virtio-net has this race.
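
To make the two races concrete, here is a minimal userspace sketch. It is
illustration only and not kernel code: every toy_* name is hypothetical,
DRIVER_OK is modelled as a plain bool, and "subsystem registration" is just
a pointer assignment. Under those assumptions, registering first lets a
request kick before DRIVER_OK (race 1), while setting DRIVER_OK first lets a
notification run the callback before the subsystem state exists (race 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real structures. */
struct toy_dev {
	bool driver_ok;       /* models the DRIVER_OK status bit */
	void *subsys_priv;    /* set only once subsystem registration is done */
	int kicks_before_ok;  /* kicks issued while DRIVER_OK was still clear */
	int cbs_before_reg;   /* callbacks that ran with subsys_priv == NULL */
};

/* Models virtio_device_ready(): only the status bit, no synchronization. */
void toy_dev_ready(struct toy_dev *d)
{
	d->driver_ok = true;
}

/* A request from the subsystem ends in a kick. */
void toy_dev_kick(struct toy_dev *d)
{
	if (!d->driver_ok)
		d->kicks_before_ok++;
}

/*
 * Registration publishes the device. use_now models a subsystem that may
 * issue a request as soon as registration returns.
 */
void toy_dev_register(struct toy_dev *d, void *priv, bool use_now)
{
	d->subsys_priv = priv;
	if (use_now)
		toy_dev_kick(d);
}

/* vq callback: a well behaved device only notifies after DRIVER_OK. */
void toy_dev_callback(struct toy_dev *d)
{
	if (d->driver_ok && d->subsys_priv == NULL)
		d->cbs_before_reg++;
}

/* Order A: register first, then DRIVER_OK. Exposes race 1). */
struct toy_dev toy_probe_register_first(void)
{
	static int priv;
	struct toy_dev d = { 0 };

	toy_dev_register(&d, &priv, true);
	toy_dev_ready(&d);
	return d;
}

/* Order B: DRIVER_OK first, then register. Exposes race 2). */
struct toy_dev toy_probe_ready_first(void)
{
	static int priv;
	struct toy_dev d = { 0 };

	toy_dev_ready(&d);
	toy_dev_callback(&d);	/* notification lands inside the window */
	toy_dev_register(&d, &priv, true);
	return d;
}
```

Neither ordering is safe for every driver in this model, which is why each
driver has to be audited against what its subsystem and its callbacks
actually do.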

So I think I will post a patch to disable this like below for this release.

> At least for rpmsg we don't seem to have a handle on it yet.

Yes.

>
>
> > Thanks
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 13a7348cedff..7ef3115efbad 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >         vq->we_own_ring = true;
> >         vq->notify = notify;
> >         vq->weak_barriers = weak_barriers;
> > -       vq->broken = true;
> > +       vq->broken = false;
> >         vq->last_used_idx = 0;
> >         vq->event_triggered = false;
> >         vq->num_added = 0;
>
>
> and drop it on reset?

Right.

Thanks
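
For readers following along, a rough userspace model of what "disable"
means here. This is a sketch under assumptions, not the patch itself: the
toy_* names and the toy_hardening switch are hypothetical, and the real
vring_interrupt() does more than this (for example it checks for used
buffers first). With the switch on, a queue starts broken and is broken
again on reset, so early interrupts return IRQ_NONE; with it off, the queue
is never marked broken by init or reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy versions of IRQ_NONE / IRQ_HANDLED. */
enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

struct toy_vq {
	bool broken;    /* models vq->broken */
	int callbacks;  /* how many times the driver callback would have run */
};

/* Hypothetical global switch standing in for "hardening enabled". */
bool toy_hardening;

/* Models virtqueue creation: broken only when hardening is on. */
void toy_vq_init(struct toy_vq *vq)
{
	vq->broken = toy_hardening;
	vq->callbacks = 0;
}

/* Models virtio_reset_device(): break the queue only when hardening is on. */
void toy_vq_reset(struct toy_vq *vq)
{
	if (toy_hardening)
		vq->broken = true;
}

/* Models the unbreak step done in virtio_device_ready(). */
void toy_vq_ready(struct toy_vq *vq)
{
	vq->broken = false;
}

/* Models the vq->broken check in vring_interrupt(). */
enum toy_irqreturn toy_vring_interrupt(struct toy_vq *vq)
{
	if (vq->broken)
		return TOY_IRQ_NONE;	/* interrupt is dropped, callback not run */
	vq->callbacks++;
	return TOY_IRQ_HANDLED;
}
```

In this model a driver that never calls the ready step loses every
interrupt when the switch is on, and works unchanged when it is off.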

>
> > >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > > >
> > > > > > > > hmm, i don't get what we need to clarify
> > > > > > >
> > > > > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > > > > no)?
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > > >
> > > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > me the code is correct.
> > > > > > > > > >
> > > > > > > > > > OK.
> > > > > > > > > >
> > > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > > >
> > > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > > >
> > > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > > >
> > > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > > device once it's registered?
> > > > > > > > >
> > > > > > > > > It depends on the specific subsystem.
> > > > > > > > >
> > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > > netdev_register() seems to be fine.
> > > > > > > >
> > > > > > > > exactly
> > > > > > > >
> > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > > proceed, we are still fine. Since those requests will be processed after
> > > > > > > > > DRIVER_OK.
> > > > > > > >
> > > > > > > > Well first won't driver code normally kick as well?
> > > > > > >
> > > > > > > Kick itself is not blocked.
> > > > > > >
> > > > > > > > And without kick, won't everything just be blocked?
> > > > > > >
> > > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > > instead of polling the used buffer in the probe.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > > >
> > > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > > (or device to driver queue), we are fine:
> > > > > > >
> > > > > > > E.g in virtio-blk:
> > > > > > >
> > > > > > >         virtio_device_ready(vdev);
> > > > > > >
> > > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > > >         if (err)
> > > > > > >                 goto out_cleanup_disk;
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > > >       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-17  1:24                                 ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-17  1:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	Cristian Marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > > I found 5 of these:
> > > > > > > > > > > >
> > > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > > >
> > > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > > it's registered?
> > > > > > > > >
> > > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > > >
> > > > > > > > it's not allowed to kick
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > > If yes,
> > > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > > for a well behaved device.
> > > > > > > >
> > > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > > >
> > > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > > >
> > > > > > > So for virtio bt, it works like:
> > > > > > >
> > > > > > > 1) driver queue buffer and kick
> > > > > > > 2) driver set DRIVER_OK
> > > > > > > 3) device start to process the buffer
> > > > > > > > 4) device send a notification
> > > > > > >
> > > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > > or anything I missed?
> > > > > >
> > > > > > btw, hci has an open and close method and we do rx refill in
> > > > > > hdev->open, so we're probably fine here.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > >
> > > > > Sounds good. Now to audit the rest of them from this POV ;)
> > > >
> > > > Adding maintainers.
> > > >
> > > > >
> > > > >  drivers/i2c/busses/i2c-virtio.c
> > > >
> > > > It looks to me the device could be used immediately after
> > > > i2c_add_adapter() return. So we probably need to add
> > > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > > i2c and the callback looks safe if the callback is called before the
> > > > i2c registration and after virtio_device_ready().
> > > >
> > > > >  drivers/net/caif/caif_virtio.c
> > > >
> > > > A networking device, RX is backed by vringh so we don't need to
> > > > refill. TX is backed by virtio and is not available until ndo_open. So
> > > > it's fine to let the core set DRIVER_OK after probe().
> > > >
> > > > >  drivers/nvdimm/virtio_pmem.c
> > > >
> > > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > > >
> > > > But the device could be used by the subsystem immediately after
> > > > nvdimm_pmem_region_create(), this means the flush could be issued
> > > > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > > > a RX virtqueue and the callback looks safe if the callback is called
> > > > after virtio_device_ready() but before the nvdimm region creating.
> > > >
> > > > And it looks to me there's a race between the assignment of
> > > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > > the assignment we will end up with a NULL pointer dereference. This is
> > > > something we need to fix.
> > > >
> > > > >  arm_scmi
> > > >
> > > > It looks to me the singleton device could be used by SCMI immediately after
> > > >
> > > >         /* Ensure initialized scmi_vdev is visible */
> > > >         smp_store_mb(scmi_vdev, vdev);
> > > >
> > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > optional rx queue but the filling is done after the above assignment,
> > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > after virtio_device_ready() but before the above assignment.
> > > >
> > > > >  virtio_rpmsg_bus.c
> > > > >
> > > >
> > > > This is somehow more complicated. It has an rx queue, the rx filling
> > > > is done before virtio_device_ready() but the kick is done after. And
> > > > it looks to me the device could be used by subsystem immediately
> > > > after rpmsg_virtio_add_ctrl_dev() returns.
> > > >
> > > > This means, if we do virtio_device_ready() after
> > > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > > be exploited.
> > > >
> > > > It requires more thoughts.
> > > >
> > > > Thanks
> > >
> > > I think at this point let's do it before so we at least do not
> > > get a regression with your patches, add a big comment and work
> > > on fixing properly in the next Linux version. Do you think you can
> > > commit to a full fix in the next linux version?
> >
> > I think it should be ok.
> >
> > If I understand you correctly, you meant to disable the hardening in
> > this release?
> >
> > (Actually, my understanding is that since we are developing mainline
> > instead of a downstream version with hardening features, bug reports
> > are somehow expected, especially considering most of the bugs are not
> > related to hardening itself)
>
>
> Absolutely. Question is do you think we can fix everything by the
> release?

Probably not, I'm auditing all the virtio drivers and it seems we have
many issues:

1) race between subsystem registration/use and virtio_device_ready()
2) race between notifications and subsystem registration/use

And it looks to me even virtio-net has this race.

So I think I will post a patch to disable this like below for this release.

> At least for rpmsg we don't seem to have a handle on it yet.

Yes.

>
>
> > Thanks
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 13a7348cedff..7ef3115efbad 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >         vq->we_own_ring = true;
> >         vq->notify = notify;
> >         vq->weak_barriers = weak_barriers;
> > -       vq->broken = true;
> > +       vq->broken = false;
> >         vq->last_used_idx = 0;
> >         vq->event_triggered = false;
> >         vq->num_added = 0;
>
>
> and drop it on reset?

Right.
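
For reference, the gate that the diff above turns off can be modelled in
user space roughly like this. It is only a sketch: the gate_* names and the
"hardening" switch are hypothetical, and the real vring_interrupt() does
more than this. With hardening on, a queue starts out broken, is unbroken
when the device is made ready and is broken again on reset. With hardening
off, neither creation nor reset marks the queue broken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for irqreturn_t values. */
enum gate_irqreturn { GATE_IRQ_NONE, GATE_IRQ_HANDLED };

/* Hypothetical model of the per-virtqueue state used by the gate. */
struct gate_vq {
	bool broken;    /* models vq->broken */
	int callbacks;  /* number of interrupts that reached the callback */
};

/* Models virtqueue creation: hardened queues start out broken. */
void gate_vq_init(struct gate_vq *vq, bool hardening)
{
	assert(vq != NULL);
	vq->broken = hardening;
	vq->callbacks = 0;
}

/* Models the unbreak step done when the device is made ready. */
void gate_device_ready(struct gate_vq *vq)
{
	assert(vq != NULL);
	vq->broken = false;
}

/* Models reset: the queue is broken again only when hardening is on. */
void gate_reset_device(struct gate_vq *vq, bool hardening)
{
	assert(vq != NULL);
	if (hardening)
		vq->broken = true;
}

/* Models the early return in the interrupt handler. */
enum gate_irqreturn gate_vring_interrupt(struct gate_vq *vq)
{
	assert(vq != NULL);
	if (vq->broken)
		return GATE_IRQ_NONE;   /* interrupt is dropped, not delayed */
	vq->callbacks++;
	return GATE_IRQ_HANDLED;
}
```

With hardening == false every interrupt reaches the callback, which is the
pre-hardening behaviour we would fall back to for this release.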

Thanks

>
> > >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > > >
> > > > > > > > hmm, i don't get what we need to clarify
> > > > > > >
> > > > > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > > > > no)?
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > > >
> > > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > me the code is correct.
> > > > > > > > > >
> > > > > > > > > > OK.
> > > > > > > > > >
> > > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > > >
> > > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > > >
> > > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > > >
> > > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > > device once it's registered?
> > > > > > > > >
> > > > > > > > > It depends on the specific subsystem.
> > > > > > > > >
> > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > > netdev_register() seems to be fine.
> > > > > > > >
> > > > > > > > exactly
> > > > > > > >
> > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > proceed, we are still fine. Since those requests will be proceed after
> > > > > > > > > DRIVER_OK.
> > > > > > > >
> > > > > > > > Well first won't driver code normally kick as well?
> > > > > > >
> > > > > > > Kick itself is not blocked.
> > > > > > >
> > > > > > > > And without kick, won't everything just be blocked?
> > > > > > >
> > > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > > instead of polling the used buffer in the probe.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > > >
> > > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > > (or device to driver queue), we are fine:
> > > > > > >
> > > > > > > E.g in virtio-blk:
> > > > > > >
> > > > > > >         virtio_device_ready(vdev);
> > > > > > >
> > > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > > >         if (err)
> > > > > > >                 goto out_cleanup_disk;
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > > +      * virtio_synchronize_vqs() has completed is guaranteed to see
> > > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > +      * that the the cache coherent memory writes have completed
> > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > > >       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherenct
> > > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > >
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-15 18:24                             ` Cristian Marussi
@ 2022-06-17  3:14                                 ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-17  3:14 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Michael S. Tsirkin, virtualization, linux-kernel,
	Thomas Gleixner, Peter Zijlstra, Paul E. McKenney, Marc Zyngier,
	Halil Pasic, Cornelia Huck, eperezma, Cindy Lu,
	Stefano Garzarella, Xuan Zhuo, Vineeth Vijayan,
	Peter Oberparleiter, linux-s390, conghui.chen, Viresh Kumar,
	netdev, pankaj.gupta.linux, sudeep.holla, Bjorn Andersson,
	Mathieu Poirier

On Thu, Jun 16, 2022 at 2:24 AM Cristian Marussi
<cristian.marussi@arm.com> wrote:
>
> On Wed, Jun 15, 2022 at 09:41:18AM +0800, Jason Wang wrote:
> > On Wed, Jun 15, 2022 at 12:46 AM Cristian Marussi
> > <cristian.marussi@arm.com> wrote:
>
> Hi Jason,
>
> > >
> > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > >
>
> [snip]
>
> > > >
> > > > >  arm_scmi
> > > >
> > > > It looks to me the singleton device could be used by SCMI immediately after
> > > >
> > > >         /* Ensure initialized scmi_vdev is visible */
> > > >         smp_store_mb(scmi_vdev, vdev);
> > > >
> > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > optional rx queue but the filling is done after the above assignment,
> > > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > > after virtio_device_ready() but before the above assignment.
> > > >
> > >
> > > I wanted to give it a go at this series testing it on the context of
> > > SCMI but it does not apply
> > >
> > > - not on a v5.18:
> > >
> > > 17:33 $ git rebase -i v5.18
> > > 17:33 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> > > Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> > > Applying: virtio: use virtio_reset_device() when possible
> > > Applying: virtio: introduce config op to synchronize vring callbacks
> > > Applying: virtio-pci: implement synchronize_cbs()
> > > Applying: virtio-mmio: implement synchronize_cbs()
> > > error: patch failed: drivers/virtio/virtio_mmio.c:345
> > > error: drivers/virtio/virtio_mmio.c: patch does not apply
> > > Patch failed at 0005 virtio-mmio: implement synchronize_cbs()
> > >
> > > - neither on a v5.19-rc2:
> > >
> > > 17:33 $ git rebase -i v5.19-rc2
> > > 17:35 $ git am ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> > > Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> > > error: patch failed: drivers/virtio/virtio.c:526
> > > error: drivers/virtio/virtio.c: patch does not apply
> > > Patch failed at 0001 virtio: use virtio_device_ready() in
> > > virtio_device_restore()
> > > hint: Use 'git am --show-current-patch=diff' to see the failed patch
> > > When you have resolved this problem, run "git am --continue".
> > >
> > > ... what I should take as base ?
> >
> > It should have already been included in rc2, so there's no need to
> > apply patch manually.
> >
>
> I tested this series as included in v5.19-rc2 (WITHOUT adding a virtio_device_ready
> in SCMI virtio as you mentioned above ... if I got it right) and I have NOT seen any
> issue around SCMI virtio using my usual test setup (using both SCMI vqueues).
>
> No anomalies even when using SCMI virtio in atomic/polling mode.
>
> Adding a virtio_device_ready() at the end of the SCMI virtio probe()
> works fine either, it does not make any difference in my setup.
> (both using QEMU and kvmtool with this latter NOT supporting
>  virtio_V1...not sure if it makes a difference but I thought was worth
>  mentioning)

Thanks a lot for the testing.

We want to prevent malicious hypervisors from attacking us. So more questions:

Assuming we do:

virtio_device_ready();
/* Ensure initialized scmi_vdev is visible */
smp_store_mb(scmi_vdev, vdev);

This means we allow the callbacks (scmi_vio_complete) to be called
before smp_store_mb(). We need to make sure the callbacks are robust.
And this looks fine since we have the check of
scmi_vio_channel_acquire() and if the notification is called before
smp_store_mb(), the acquire will fail.

If we put virtio_device_ready() after smp_store_mb() like:

/* Ensure initialized scmi_vdev is visible */
smp_store_mb(scmi_vdev, vdev);
virtio_device_ready();

If I understand correctly, there will be a race: SCMI may try to use the
device before virtio_device_ready(), which violates the virtio spec.
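
To make the two orderings concrete, here is a minimal user-space sketch. It
is not the arm_scmi code: all model_* names are hypothetical, C11 atomics
stand in for smp_store_mb(), and the assumption is that the acquire check
fails until the device pointer has been published, which is how I read
scmi_vio_channel_acquire() above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the virtio device. */
struct model_vdev {
	atomic_bool driver_ok;  /* models DRIVER_OK having been set */
	int completions;        /* callbacks that actually did work */
};

/* Models the scmi_vdev singleton: NULL until probe() publishes it. */
static _Atomic(struct model_vdev *) published_vdev;

/* Models virtio_device_ready(): from here on callbacks may fire. */
void model_device_ready(struct model_vdev *vdev)
{
	atomic_store(&vdev->driver_ok, true);
}

/* Models smp_store_mb(scmi_vdev, vdev): sequentially consistent store. */
void model_publish(struct model_vdev *vdev)
{
	atomic_store(&published_vdev, vdev);
}

/* Assumed acquire check: fails until the device has been published. */
bool model_channel_acquire(void)
{
	return atomic_load(&published_vdev) != NULL;
}

/* Models the completion callback; returns true if it did any work. */
bool model_complete(struct model_vdev *vdev)
{
	if (!model_channel_acquire())
		return false;  /* early notification is ignored */
	vdev->completions++;
	return true;
}

/* Models a subsystem request arriving at an arbitrary point in probe(). */
bool model_request(void)
{
	struct model_vdev *vdev = atomic_load(&published_vdev);

	if (vdev == NULL)
		return false;  /* device not visible yet, nothing is sent */
	/* Using the device before DRIVER_OK is what we must avoid. */
	assert(atomic_load(&vdev->driver_ok));
	return true;
}
```

Calling model_device_ready() and then model_publish() means an early
callback is ignored and a request can never see the device before
DRIVER_OK. Swapping the two calls opens a window in which model_request()
trips the assert, which is the race described above.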

Thanks

>
> Thanks,
> Cristian
>


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-17  1:24                                 ` Jason Wang
@ 2022-06-17  5:36                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 106+ messages in thread
From: Michael S. Tsirkin @ 2022-06-17  5:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, Cristian Marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Fri, Jun 17, 2022 at 09:24:57AM +0800, Jason Wang wrote:
> On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > > > > to make the interrupt core aware of such invalid interrupts and prevent
> > > > > > > > > > > > > > > > > an IRQ storm.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > > > I found 5 of these:
> > > > > > > > > > > > >
> > > > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > > > >
> > > > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > But it calls hci_register_dev and that in turn queues all kind of
> > > > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > > > it's registered?
> > > > > > > > > >
> > > > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > > > >
> > > > > > > > > it's not allowed to kick
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > If yes,
> > > > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > > > for a well behaved device.
> > > > > > > > >
> > > > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > > > >
> > > > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > > > >
> > > > > > > > So for virtio bt, it works like:
> > > > > > > >
> > > > > > > > 1) driver queues a buffer and kicks
> > > > > > > > 2) driver sets DRIVER_OK
> > > > > > > > 3) device starts to process the buffer
> > > > > > > > 4) device sends a notification
> > > > > > > >
> > > > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > > > or anything I missed?
> > > > > > >
> > > > > > > btw, hci has an open and close method and we do rx refill in
> > > > > > > hdev->open, so we're probably fine here.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > Sounds good. Now to audit the rest of them from this POV ;)
> > > > >
> > > > > Adding maintainers.
> > > > >
> > > > > >
> > > > > >  drivers/i2c/busses/i2c-virtio.c
> > > > >
> > > > > It looks to me the device could be used immediately after
> > > > > > i2c_add_adapter() returns. So we probably need to add
> > > > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > > > i2c and the callback looks safe if the callback is called before the
> > > > > i2c registration and after virtio_device_ready().
> > > > >
> > > > > >  drivers/net/caif/caif_virtio.c
> > > > >
> > > > > A networking device, RX is backed by vringh so we don't need to
> > > > > refill. TX is backed by virtio and is available until ndo_open. So
> > > > > it's fine to let the core to set DRIVER_OK after probe().
> > > > >
> > > > > >  drivers/nvdimm/virtio_pmem.c
> > > > >
> > > > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > > > >
> > > > > But the device could be used by the subsystem immediately after
> > > > > nvdimm_pmem_region_create(), this means the flush could be issued
> > > > > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > > > > > an RX virtqueue and the callback looks safe if the callback is called
> > > > > > after virtio_device_ready() but before the nvdimm region creation.
> > > > >
> > > > > And it looks to me there's a race between the assignment of
> > > > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > > > the assignment we will end up with a NULL pointer dereference. This is
> > > > > something we need to fix.
> > > > >
> > > > > >  arm_scmi
> > > > >
> > > > > It looks to me the singleton device could be used by SCMI immediately after
> > > > >
> > > > >         /* Ensure initialized scmi_vdev is visible */
> > > > >         smp_store_mb(scmi_vdev, vdev);
> > > > >
> > > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > > optional rx queue but the filling is done after the above assignment,
> > > > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > > > after virtio_device_ready() but before the above assignment.
> > > > >
> > > > > >  virtio_rpmsg_bus.c
> > > > > >
> > > > >
> > > > > This is somehow more complicated. It has an rx queue, the rx filling
> > > > > is done before virtio_device_ready() but the kick is done after. And
> > > > > > it looks to me the device could be used by the subsystem immediately
> > > > > > after rpmsg_virtio_add_ctrl_dev() returns.
> > > > >
> > > > > This means, if we do virtio_device_ready() after
> > > > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > > > be exploited.
> > > > >
> > > > > It requires more thoughts.
> > > > >
> > > > > Thanks
> > > >
> > > > I think at this point let's do it before so we at least do not
> > > > get a regression with your patches, add a big comment and work
> > > > on fixing properly in the next Linux version. Do you think you can
> > > > commit to a full fix in the next linux version?
> > >
> > > I think it should be ok.
> > >
> > > If I understand you correctly, you meant to disable the hardening in
> > > this release?
> > >
> > > (Actually, my understanding is that since we are developing mainline
> > > instead of a downstream version with a hardening features, bug reports
> > > are somehow expected, especially consider most of the bugs are not
> > > related to hardening itself)
> >
> >
> > Absolutely. Question is do you think we can fix everything by the
> > release?
> 
> Probably not, I'm auditing all the virtio drivers and it seems we have
> many issues:
> 
> 1) race between subsystem registration/use and virtio_device_ready()
> > 2) race between notifications and subsystem registration/use
> 
> And it looks to me even virtio-net has this race.

Interesting. How does it look for virtio-net?

> So I think I will post a patch to disable this like below for this release.


However, please do post patches that add virtio_device_ready() as
appropriate. This is basic spec compliance.

Also do you think we should do a full revert? Maybe a Kconfig option is
ok for now.
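
As a rough illustration of what a compile-time switch could look like,
here is a userspace model of the vq->broken logic from the patch.
MODEL_HARDEN_NOTIFICATION and the harden_model_* names are invented for
the sketch and are not an existing Kconfig symbol or kernel API; it only
mirrors the init/ready/reset/interrupt behaviour discussed in this
thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch standing in for a Kconfig option.
 * Build with -DMODEL_HARDEN_NOTIFICATION=0 to compile the hardening out. */
#ifndef MODEL_HARDEN_NOTIFICATION
#define MODEL_HARDEN_NOTIFICATION 1
#endif

enum harden_model_irqreturn { HARDEN_MODEL_IRQ_NONE, HARDEN_MODEL_IRQ_HANDLED };

struct harden_model_vq {
	bool broken;    /* models vq->broken */
	bool more_used; /* whether the device has returned any buffers */
};

/* Queue creation: start broken only when the hardening is enabled. */
static void harden_model_vq_init(struct harden_model_vq *vq)
{
	vq->broken = MODEL_HARDEN_NOTIFICATION ? true : false;
	vq->more_used = false;
}

/* Models the unbreak step done by virtio_device_ready(). */
static void harden_model_device_ready(struct harden_model_vq *vq)
{
	vq->broken = false;
}

/* Models the break step done by virtio_reset_device(), made conditional. */
static void harden_model_reset_device(struct harden_model_vq *vq)
{
	if (MODEL_HARDEN_NOTIFICATION)
		vq->broken = true;
}

/* Models the early-return checks at the top of vring_interrupt(). */
static enum harden_model_irqreturn
harden_model_vring_interrupt(const struct harden_model_vq *vq)
{
	if (!vq->more_used)
		return HARDEN_MODEL_IRQ_NONE; /* no work pending */
	if (vq->broken)
		return HARDEN_MODEL_IRQ_NONE; /* not ready yet, or already reset */
	return HARDEN_MODEL_IRQ_HANDLED;
}
```

With the switch left at 1, an interrupt before the ready step or after
a reset is reported as unhandled; with 0 the queue never starts broken
and the old behaviour is kept.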


> > At least for rpmsg we don't seem to have a handle on it yet.
> 
> Yes.
> 
> >
> >
> > > Thanks
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 13a7348cedff..7ef3115efbad 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > >         vq->we_own_ring = true;
> > >         vq->notify = notify;
> > >         vq->weak_barriers = weak_barriers;
> > > -       vq->broken = true;
> > > +       vq->broken = false;
> > >         vq->last_used_idx = 0;
> > >         vq->event_triggered = false;
> > >         vq->num_added = 0;
> >
> >
> > and drop it on reset?
> 
> Right.
> 
> Thanks
> 
> >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > > > >
> > > > > > > > > hmm, i don't get what we need to clarify
> > > > > > > >
> > > > > > > > E.g the driver is not allowed to kick or after DRIVER_OK should the
> > > > > > > > device only process the buffer after a kick after DRIVER_OK (I think
> > > > > > > > no)?
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > > > >
> > > > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > > me the code is correct.
> > > > > > > > > > >
> > > > > > > > > > > OK.
> > > > > > > > > > >
> > > > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > > > >
> > > > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > > > >
> > > > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > > > >
> > > > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > > > device once it's registered?
> > > > > > > > > >
> > > > > > > > > > It depends on the specific subsystem.
> > > > > > > > > >
> > > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > > > netdev_register() seems to be fine.
> > > > > > > > >
> > > > > > > > > exactly
> > > > > > > > >
> > > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > > > > > DRIVER_OK.
> > > > > > > > >
> > > > > > > > > Well first won't driver code normally kick as well?
> > > > > > > >
> > > > > > > > Kick itself is not blocked.
> > > > > > > >
> > > > > > > > > And without kick, won't everything just be blocked?
> > > > > > > >
> > > > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > > > instead of polling the used buffer in the probe.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > > > >
> > > > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > > > (or device to driver queue), we are fine:
> > > > > > > >
> > > > > > > > E.g in virtio-blk:
> > > > > > > >
> > > > > > > >         virtio_device_ready(vdev);
> > > > > > > >
> > > > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > > > >         if (err)
> > > > > > > >                 goto out_cleanup_disk;
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > > > >       }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 106+ messages in thread

> > > > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > > > I found 5 of these:
> > > > > > > > > > > > >
> > > > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > > > >
> > > > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > > > it's registered?
> > > > > > > > > >
> > > > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > > > >
> > > > > > > > > it's not allowed to kick
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > If yes,
> > > > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > > > for a well behaved device.
> > > > > > > > >
> > > > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > > > >
> > > > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > > > >
> > > > > > > > So for virtio bt, it works like:
> > > > > > > >
> > > > > > > > 1) driver queues a buffer and kicks
> > > > > > > > 2) driver sets DRIVER_OK
> > > > > > > > 3) device starts to process the buffer
> > > > > > > > 4) device sends a notification
> > > > > > > >
> > > > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > > > or anything I missed?
> > > > > > >
> > > > > > > btw, hci has an open and close method and we do rx refill in
> > > > > > > hdev->open, so we're probably fine here.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > Sounds good. Now to audit the rest of them from this POV ;)
> > > > >
> > > > > Adding maintainers.
> > > > >
> > > > > >
> > > > > >  drivers/i2c/busses/i2c-virtio.c
> > > > >
> > > > > It looks to me the device could be used immediately after
> > > > > i2c_add_adapter() returns. So we probably need to add
> > > > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > > > i2c and the callback looks safe if the callback is called before the
> > > > > i2c registration and after virtio_device_ready().
> > > > >
> > > > > >  drivers/net/caif/caif_virtio.c
> > > > >
> > > > > A networking device, RX is backed by vringh so we don't need to
> > > > > refill. TX is backed by virtio and is not available until ndo_open. So
> > > > > it's fine to let the core set DRIVER_OK after probe().
> > > > >
> > > > > >  drivers/nvdimm/virtio_pmem.c
> > > > >
> > > > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > > > >
> > > > > But the device could be used by the subsystem immediately after
> > > > > nvdimm_pmem_region_create(), this means the flush could be issued
> > > > > before DRIVER_OK. We need virtio_device_ready() before that. We don't have
> > > > > an RX virtqueue and the callback looks safe if it is called
> > > > > after virtio_device_ready() but before the nvdimm region is created.
> > > > >
> > > > > And it looks to me there's a race between the assignment of
> > > > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > > > the assignment we will end up with a NULL pointer dereference. This is
> > > > > something we need to fix.
> > > > >
> > > > > >  arm_scmi
> > > > >
> > > > > It looks to me the singleton device could be used by SCMI immediately after
> > > > >
> > > > >         /* Ensure initialized scmi_vdev is visible */
> > > > >         smp_store_mb(scmi_vdev, vdev);
> > > > >
> > > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > > optional rx queue but the filling is done after the above assignment,
> > > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > > after virtio_device_ready() but before the above assignment.
> > > > >
> > > > > >  virtio_rpmsg_bus.c
> > > > > >
> > > > >
> > > > > This is somehow more complicated. It has an rx queue, the rx filling
> > > > > is done before virtio_device_ready() but the kick is done after. And
> > > > > it looks to me the device could be used by the subsystem immediately
> > > > > after rpmsg_virtio_add_ctrl_dev() returns.
> > > > >
> > > > > This means, if we do virtio_device_ready() after
> > > > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > > > be exploited.
> > > > >
> > > > > It requires more thoughts.
> > > > >
> > > > > Thanks
> > > >
> > > > I think at this point let's do it before so we at least do not
> > > > get a regression with your patches, add a big comment and work
> > > > on fixing properly in the next Linux version. Do you think you can
> > > > commit to a full fix in the next linux version?
> > >
> > > I think it should be ok.
> > >
> > > If I understand you correctly, you meant to disable the hardening in
> > > this release?
> > >
> > > (Actually, my understanding is that since we are developing mainline
> > > instead of a downstream version with hardening features, bug reports
> > > are somewhat expected, especially considering most of the bugs are not
> > > related to the hardening itself)
> >
> >
> > Absolutely. Question is do you think we can fix everything by the
> > release?
> 
> Probably not, I'm auditing all the virtio drivers and it seems we have
> many issues:
> 
> 1) race between subsystem registration/use and virtio_device_ready()
> 2) race between notifications and subsystem registration/use
> 
> And it looks to me even virtio-net has this race.

Interesting. How does it look for virtio-net?

> So I think I will post a patch to disable this like below for this release.


However please do post patches that add device_ready as appropriate.
This is basic spec compliance.

Also do you think we should do a full revert? Maybe a Kconfig option is
ok for now.
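
As a rough illustration of what such a build-time switch might look like,
the userspace sketch below gates both the initial vq->broken value and the
break-on-reset step on a single macro. The macro SKETCH_HARDEN_NOTIFICATION
and all sketch_* names are hypothetical and are not taken from any posted
patch; in the kernel this would presumably be a Kconfig symbol tested with
IS_ENABLED().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switch; 0 keeps the pre-hardening behaviour. */
#ifndef SKETCH_HARDEN_NOTIFICATION
#define SKETCH_HARDEN_NOTIFICATION 0
#endif

/* Stand-in for the one virtqueue field that matters in this sketch. */
struct sketch_vq {
	bool broken;
};

/* Queue creation: start broken only when hardening is compiled in. */
static void sketch_vq_init(struct sketch_vq *vq)
{
	assert(vq != NULL);
	vq->broken = SKETCH_HARDEN_NOTIFICATION ? true : false;
}

/* Device reset: break the queue only when hardening is compiled in. */
static void sketch_vq_reset(struct sketch_vq *vq)
{
	assert(vq != NULL);
	if (SKETCH_HARDEN_NOTIFICATION)
		vq->broken = true;
}
```

Building the sketch with -DSKETCH_HARDEN_NOTIFICATION=1 gives the hardened
behaviour; the default of 0 leaves queues usable from creation onwards.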


> > At least for rpmsg we don't seem to have a handle on it yet.
> 
> Yes.
> 
> >
> >
> > > Thanks
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 13a7348cedff..7ef3115efbad 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > >         vq->we_own_ring = true;
> > >         vq->notify = notify;
> > >         vq->weak_barriers = weak_barriers;
> > > -       vq->broken = true;
> > > +       vq->broken = false;
> > >         vq->last_used_idx = 0;
> > >         vq->event_triggered = false;
> > >         vq->num_added = 0;
> >
> >
> > and drop it on reset?
> 
> Right.
> 
> Thanks
> 
> >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > > > >
> > > > > > > > > hmm, i don't get what we need to clarify
> > > > > > > >
> > > > > > > > E.g. whether the driver is not allowed to kick before DRIVER_OK, or
> > > > > > > > whether the device should only process a buffer after a kick that
> > > > > > > > follows DRIVER_OK (I think no)?
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > > > >
> > > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > > me the code is correct.
> > > > > > > > > > >
> > > > > > > > > > > OK.
> > > > > > > > > > >
> > > > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > > > >
> > > > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > > > >
> > > > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > > > >
> > > > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > > > device once it's registered?
> > > > > > > > > >
> > > > > > > > > > It depends on the specific subsystem.
> > > > > > > > > >
> > > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > > > netdev_register() seems to be fine.
> > > > > > > > >
> > > > > > > > > exactly
> > > > > > > > >
> > > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > > > > DRIVER_OK.
> > > > > > > > >
> > > > > > > > > Well first won't driver code normally kick as well?
> > > > > > > >
> > > > > > > > Kick itself is not blocked.
> > > > > > > >
> > > > > > > > > And without kick, won't everything just be blocked?
> > > > > > > >
> > > > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > > > instead of polling the used buffer in the probe.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > > > >
> > > > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > > > (or device to driver queue), we are fine:
> > > > > > > >
> > > > > > > > E.g in virtio-blk:
> > > > > > > >
> > > > > > > >         virtio_device_ready(vdev);
> > > > > > > >
> > > > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > > > >         if (err)
> > > > > > > >                 goto out_cleanup_disk;
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > > > >       }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > >
> >
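
For readers following the thread, the vq->broken gating that the quoted
patch describes can be modelled in plain userspace C as below. This is a
sketch under stated assumptions, not kernel code: every model_* name is
made up for illustration, the real vring_interrupt() also checks for
pending used buffers, and the virtio_synchronize_cbs() and memory-ordering
steps from the patch are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

#define MODEL_S_DRIVER_OK 0x4u	/* same bit as VIRTIO_CONFIG_S_DRIVER_OK */

struct model_vq {
	bool broken;		/* true while the driver is not ready */
	unsigned int callbacks;	/* how often the driver callback ran */
};

struct model_dev {
	unsigned int status;
	struct model_vq *vqs;
	size_t nvqs;
};

static void model_set_broken(struct model_dev *dev, bool broken)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		dev->vqs[i].broken = broken;
}

/* Queues start out broken, as the patch does at virtqueue creation. */
static void model_init(struct model_dev *dev, struct model_vq *vqs,
		       size_t nvqs)
{
	dev->status = 0;
	dev->vqs = vqs;
	dev->nvqs = nvqs;
	for (size_t i = 0; i < nvqs; i++)
		vqs[i].callbacks = 0;
	model_set_broken(dev, true);
}

/* Models virtio_device_ready(): unbreak first, then set DRIVER_OK. */
static void model_device_ready(struct model_dev *dev)
{
	assert(!(dev->status & MODEL_S_DRIVER_OK)); /* stands in for BUG_ON() */
	model_set_broken(dev, false);
	dev->status |= MODEL_S_DRIVER_OK;
}

/* Models virtio_reset_device(): break the queues, then reset status. */
static void model_reset(struct model_dev *dev)
{
	model_set_broken(dev, true);
	dev->status = 0;
}

/* Models only the vq->broken check at the top of vring_interrupt(). */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE;	/* report the IRQ as unhandled */
	vq->callbacks++;
	return MODEL_IRQ_HANDLED;
}
```

In this model an interrupt that arrives before model_device_ready() or
after model_reset() never reaches the driver callback, which is exactly why
a driver that relies on notifications before DRIVER_OK loses them instead
of seeing them delayed.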


* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-17  5:36                                   ` Michael S. Tsirkin
@ 2022-06-17  7:26                                     ` Jason Wang
  -1 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-17  7:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Marc Zyngier, Halil Pasic, Cornelia Huck,
	eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, Cristian Marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Fri, Jun 17, 2022 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jun 17, 2022 at 09:24:57AM +0800, Jason Wang wrote:
> > On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > > > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > > > > when vq->broken is true. And in this case, switch to returning IRQ_NONE
> > > > > > > > > > > > > > > > > to make the interrupt core aware of such invalid interrupts and prevent
> > > > > > > > > > > > > > > > > an IRQ storm.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > > > > I found 5 of these:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > > > > >
> > > > > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > > > > it's registered?
> > > > > > > > > > >
> > > > > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > > > > >
> > > > > > > > > > it's not allowed to kick
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > If yes,
> > > > > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > > > > for a well behaved device.
> > > > > > > > > >
> > > > > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > > > > >
> > > > > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > > > > >
> > > > > > > > > So for virtio bt, it works like:
> > > > > > > > >
> > > > > > > > > 1) driver queue buffer and kick
> > > > > > > > > 2) driver set DRIVER_OK
> > > > > > > > > 3) device start to process the buffer
> > > > > > > > > 4) device sends a notification
> > > > > > > > >
> > > > > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > > > > or anything I missed?
> > > > > > > >
> > > > > > > > btw, hci has an open and close method and we do rx refill in
> > > > > > > > hdev->open, so we're probably fine here.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > Sounds good. Now to audit the rest of them from this POV ;)
> > > > > >
> > > > > > Adding maintainers.
> > > > > >
> > > > > > >
> > > > > > >  drivers/i2c/busses/i2c-virtio.c
> > > > > >
> > > > > > It looks to me the device could be used immediately after
> > > > > > i2c_add_adapter() return. So we probably need to add
> > > > > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > > > > i2c and the callback looks safe if the callback is called before the
> > > > > > i2c registration and after virtio_device_ready().
> > > > > >
> > > > > > >  drivers/net/caif/caif_virtio.c
> > > > > >
> > > > > > A networking device, RX is backed by vringh so we don't need to
> > > > > > refill. TX is backed by virtio and is not available until ndo_open. So
> > > > > > it's fine to let the core set DRIVER_OK after probe().
> > > > > >
> > > > > > >  drivers/nvdimm/virtio_pmem.c
> > > > > >
> > > > > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > > > > >
> > > > > > But the device could be used by the subsystem immediately after
> > > > > > nvdimm_pmem_region_create(), this means the flush could be issued
> > > > > > before DRIVER_OK. We need virtio_device_ready() before. We don't have
> > > > > > a RX virtqueue and the callback looks safe if the callback is called
> > > > > > after virtio_device_ready() but before the nvdimm region creation.
> > > > > >
> > > > > > And it looks to me there's a race between the assignment of
> > > > > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > > > > the assignment we will end up with a NULL pointer dereference. This is
> > > > > > something we need to fix.
> > > > > >
> > > > > > >  arm_scmi
> > > > > >
> > > > > > It looks to me the singleton device could be used by SCMI immediately after
> > > > > >
> > > > > >         /* Ensure initialized scmi_vdev is visible */
> > > > > >         smp_store_mb(scmi_vdev, vdev);
> > > > > >
> > > > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > > > optional rx queue but the filling is done after the above assignment,
> > > > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > > > after virtio_device_ready() but before the above assignment.
> > > > > >
> > > > > > >  virtio_rpmsg_bus.c
> > > > > > >
> > > > > >
> > > > > > This is somehow more complicated. It has an rx queue, the rx filling
> > > > > > is done before virtio_device_ready() but the kick is done after. And
> > > > > > it looks to me the device could be used by the subsystem immediately
> > > > > > after rpmsg_virtio_add_ctrl_dev() returns.
> > > > > >
> > > > > > This means, if we do virtio_device_ready() after
> > > > > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > > > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > > > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > > > > be exploited.
> > > > > >
> > > > > > It requires more thoughts.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I think at this point let's do it before so we at least do not
> > > > > get a regression with your patches, add a big comment and work
> > > > > on fixing properly in the next Linux version. Do you think you can
> > > > > commit to a full fix in the next linux version?
> > > >
> > > > I think it should be ok.
> > > >
> > > > If I understand you correctly, you meant to disable the hardening in
> > > > this release?
> > > >
> > > > (Actually, my understanding is that since we are developing mainline
> > > > instead of a downstream version with hardening features, bug reports
> > > > are somewhat expected, especially considering most of the bugs are not
> > > > related to the hardening itself)
> > >
> > >
> > > Absolutely. Question is do you think we can fix everything by the
> > > release?
> >
> > Probably not, I'm auditing all the virtio drivers and it seems we have
> > many issues:
> >
> > 1) race between subsystem registration/use and virtio_device_ready()
> > 2) race between notifications and subsystem registration/use
> >
> > And it looks to me even virtio-net has this race.
>
> Interesting. How does it look for virtio-net?

Will post a patch soon.

>
> > So I think I will post a patch to disable this like below for this release.
>
>
> However please do post patches that add device_ready as appropriate.
> This is basic spec compliance.

Working on this.

>
> Also do you think we should do a full revert? Maybe a Kconfig option is
> ok for now.

Yes, Kconfig should be fine.

Patch will be posted soon.

Thanks
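
To make the Kconfig idea concrete, here is a minimal user-space sketch of
the vq->broken gating discussed in this thread. It is only a model:
MODEL_HARDEN_NOTIFICATION, struct model_vq and the model_*() helpers are
made-up names for illustration, not the kernel's symbols, and the real code
additionally needs the synchronization and ordering that the patch comments
describe.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum irq_result { IRQ_NONE, IRQ_HANDLED };

/*
 * Assumed compile-time switch standing in for the proposed Kconfig
 * option: 1 keeps the hardening, 0 suppresses it.
 */
#ifndef MODEL_HARDEN_NOTIFICATION
#define MODEL_HARDEN_NOTIFICATION 1
#endif

/* Toy virtqueue: only the state needed to show the gating. */
struct model_vq {
	bool broken;	/* true: interrupts are ignored */
	int callbacks;	/* number of callbacks actually run */
};

/* Queue creation: with hardening on, the queue starts out broken. */
static void model_vq_init(struct model_vq *vq)
{
	vq->broken = MODEL_HARDEN_NOTIFICATION ? true : false;
	vq->callbacks = 0;
}

/* Models virtio_device_ready(): unbreak, then DRIVER_OK would be set. */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models virtio_reset_device(): break again only when hardening is on. */
static void model_reset_device(struct model_vq *vq)
{
	if (MODEL_HARDEN_NOTIFICATION)
		vq->broken = true;
}

/* Models vring_interrupt(): early return while the queue is broken. */
static enum irq_result model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return IRQ_NONE;
	vq->callbacks++;
	return IRQ_HANDLED;
}
```

With the switch at its default of 1, an interrupt that arrives before
model_device_ready() or after model_reset_device() is reported as IRQ_NONE
and no callback runs. Building with -DMODEL_HARDEN_NOTIFICATION=0 gives the
behaviour of the small diff quoted in this mail: the queue is never marked
broken by init or reset.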

>
>
> > > At least for rpmsg we don't seem to have a handle on it yet.
> >
> > Yes.
> >
> > >
> > >
> > > > Thanks
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 13a7348cedff..7ef3115efbad 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >         vq->we_own_ring = true;
> > > >         vq->notify = notify;
> > > >         vq->weak_barriers = weak_barriers;
> > > > -       vq->broken = true;
> > > > +       vq->broken = false;
> > > >         vq->last_used_idx = 0;
> > > >         vq->event_triggered = false;
> > > >         vq->num_added = 0;
> > >
> > >
> > > and drop it on reset?
> >
> > Right.
> >
> > Thanks
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > > > > >
> > > > > > > > > > hmm, i don't get what we need to clarify
> > > > > > > > >
> > > > > > > > > E.g. that the driver is not allowed to kick before DRIVER_OK, or
> > > > > > > > > whether the device should only process buffers after a kick that
> > > > > > > > > comes after DRIVER_OK (I think no).
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > > > > >
> > > > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > > > me the code is correct.
> > > > > > > > > > > >
> > > > > > > > > > > > OK.
> > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > > > > >
> > > > > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > > > > >
> > > > > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > > > > >
> > > > > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > > > > device once it's registered?
> > > > > > > > > > >
> > > > > > > > > > > It depends on the specific subsystem.
> > > > > > > > > > >
> > > > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > > > > netdev_register() seems to be fine.
> > > > > > > > > >
> > > > > > > > > > exactly
> > > > > > > > > >
> > > > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > > > proceed, we are still fine. Since those requests will be processed after
> > > > > > > > > > > DRIVER_OK.
> > > > > > > > > >
> > > > > > > > > > Well first won't driver code normally kick as well?
> > > > > > > > >
> > > > > > > > > Kick itself is not blocked.
> > > > > > > > >
> > > > > > > > > > And without kick, won't everything just be blocked?
> > > > > > > > >
> > > > > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > > > > instead of polling the used buffer in the probe.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > > > > >
> > > > > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > > > > (or device to driver queue), we are fine:
> > > > > > > > >
> > > > > > > > > E.g in virtio-blk:
> > > > > > > > >
> > > > > > > > >         virtio_device_ready(vdev);
> > > > > > > > >
> > > > > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > > > > >         if (err)
> > > > > > > > >                 goto out_cleanup_disk;
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > > > > >       }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > >
> > >
>



* Re: [PATCH V6 8/9] virtio: harden vring IRQ
@ 2022-06-17  7:26                                     ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-06-17  7:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Zijlstra, Viresh Kumar, linux-kernel, Vineeth Vijayan,
	Cindy Lu, Marc Zyngier, Halil Pasic, eperezma, Paul E. McKenney,
	linux-s390, Thomas Gleixner, virtualization, conghui.chen,
	Cristian Marussi, pankaj.gupta.linux, Mathieu Poirier, netdev,
	Cornelia Huck, Peter Oberparleiter, Bjorn Andersson,
	sudeep.holla

On Fri, Jun 17, 2022 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jun 17, 2022 at 09:24:57AM +0800, Jason Wang wrote:
> > On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > > > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > > > > > >    that is used by some device such as virtio-blk
> > > > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The vq->broken is re-used in this patch for implementing the IRQ
> > > > > > > > > > > > > > > > > hardening. The vq->broken is set to true during both initialization
> > > > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can check and return
> > > > > > > > > > > > > > > > > when vq->broken is true. And in this case, switch to return IRQ_NONE
> > > > > > > > > > > > > > > > > to let the interrupt core aware of such invalid interrupt to prevent
> > > > > > > > > > > > > > > > > IRQ storm.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The reason of using a per queue variable instead of a per device one
> > > > > > > > > > > > > > > > > is that we may need it for per queue reset hardening in the future.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > > > > > > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > > > > > > > > > > > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > > > > > > > > > > > > > > Cc: Halil Pasic <pasic@linux.ibm.com>
> > > > > > > > > > > > > > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > > > > > > > > > > > > > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > > > > > > > > > > > > > > > > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > > > > > > > > > > > > > > > > Cc: linux-s390@vger.kernel.org
> > > > > > > > > > > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > > > > this will be a debugging aid and a work around for
> > > > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by default, it
> > > > > > > > > > > > > > > won't be used by the users so there's no way for us to receive the bug
> > > > > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks better if we
> > > > > > > > > > > > > > > simply warn instead of disable it by default.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I meant more like a flag in struct virtio_driver.
> > > > > > > > > > > > > > For now, could you audit all drivers which don't call _ready?
> > > > > > > > > > > > > > I found 5 of these:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/bluetooth/virtio_bt.c
> > > > > > > > > > > > >
> > > > > > > > > > > > > This driver seems to be fine, it doesn't use the device/vq in its probe().
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > But it calls hci_register_dev and that in turn queues all kinds of
> > > > > > > > > > > > work. Also, can linux start using the device immediately after
> > > > > > > > > > > > it's registered?
> > > > > > > > > > >
> > > > > > > > > > > So I think the driver is allowed to queue before DRIVER_OK.
> > > > > > > > > >
> > > > > > > > > > it's not allowed to kick
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > If yes,
> > > > > > > > > > > the only side effect is the delay of the tx interrupt after DRIVER_OK
> > > > > > > > > > > for a well behaved device.
> > > > > > > > > >
> > > > > > > > > > your patches drop the interrupt though, it won't be just delayed.
> > > > > > > > >
> > > > > > > > > For a well behaved device, it can only trigger the interrupt after DRIVER_OK.
> > > > > > > > >
> > > > > > > > > So for virtio bt, it works like:
> > > > > > > > >
> > > > > > > > > 1) driver queue buffer and kick
> > > > > > > > > 2) driver set DRIVER_OK
> > > > > > > > > 3) device start to process the buffer
> > > > > > > > > 4) device sends a notification
> > > > > > > > >
> > > > > > > > > The only risk is that the virtqueue could be filled before DRIVER_OK,
> > > > > > > > > or anything I missed?
> > > > > > > >
> > > > > > > > btw, hci has an open and close method and we do rx refill in
> > > > > > > > hdev->open, so we're probably fine here.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > Sounds good. Now to audit the rest of them from this POV ;)
> > > > > >
> > > > > > Adding maintainers.
> > > > > >
> > > > > > >
> > > > > > >  drivers/i2c/busses/i2c-virtio.c
> > > > > >
> > > > > > > It looks to me the device could be used immediately after
> > > > > > > i2c_add_adapter() returns. So we probably need to add
> > > > > > virtio_device_ready() before that. Fortunately, there's no rx vq in
> > > > > > i2c and the callback looks safe if the callback is called before the
> > > > > > i2c registration and after virtio_device_ready().
> > > > > >
> > > > > > >  drivers/net/caif/caif_virtio.c
> > > > > >
> > > > > > > A networking device, RX is backed by vringh so we don't need to
> > > > > > > refill. TX is backed by virtio and is not available until ndo_open. So
> > > > > > > it's fine to let the core set DRIVER_OK after probe().
> > > > > >
> > > > > > >  drivers/nvdimm/virtio_pmem.c
> > > > > >
> > > > > > It doesn't use interrupt so far, so it has nothing to do with the IRQ hardening.
> > > > > >
> > > > > > > But the device could be used by the subsystem immediately after
> > > > > > > nvdimm_pmem_region_create(), which means the flush could be issued
> > > > > > > before DRIVER_OK. We need virtio_device_ready() before that. We don't
> > > > > > > have an RX virtqueue and the callback looks safe if the callback is called
> > > > > > > after virtio_device_ready() but before the nvdimm region creation.
> > > > > >
> > > > > > And it looks to me there's a race between the assignment of
> > > > > > provider_data and virtio_pmem_flush(). If the flush was issued before
> > > > > > the assignment we will end up with a NULL pointer dereference. This is
> > > > > > something we need to fix.
> > > > > >
> > > > > > >  arm_scmi
> > > > > >
> > > > > > It looks to me the singleton device could be used by SCMI immediately after
> > > > > >
> > > > > >         /* Ensure initialized scmi_vdev is visible */
> > > > > >         smp_store_mb(scmi_vdev, vdev);
> > > > > >
> > > > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > > > optional rx queue but the filling is done after the above assignment,
> > > > > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > > > > after virtio_device_ready() but before the above assignment.
> > > > > >
> > > > > > >  virtio_rpmsg_bus.c
> > > > > > >
> > > > > >
> > > > > > > This is somewhat more complicated. It has an rx queue, the rx filling
> > > > > > > is done before virtio_device_ready() but the kick is done after. And
> > > > > > > it looks to me the device could be used by the subsystem immediately
> > > > > > > after rpmsg_virtio_add_ctrl_dev() returns.
> > > > > >
> > > > > > This means, if we do virtio_device_ready() after
> > > > > > rpmsg_virtio_add_ctrl_dev(), we may get kick before DRIVER_OK. If we
> > > > > > do virtio_device_ready() before rpmsg_virtio_add_ctrl_dev(), there's a
> > > > > > race between the callbacks and rpmsg_virtio_add_ctrl_dev() that could
> > > > > > be exploited.
> > > > > >
> > > > > > It requires more thoughts.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I think at this point let's do it before so we at least do not
> > > > > get a regression with your patches, add a big comment and work
> > > > > on fixing properly in the next Linux version. Do you think you can
> > > > > commit to a full fix in the next linux version?
> > > >
> > > > I think it should be ok.
> > > >
> > > > If I understand you correctly, you meant to disable the hardening in
> > > > this release?
> > > >
> > > > > (Actually, my understanding is that since we are developing mainline
> > > > > instead of a downstream version with hardening features, bug reports
> > > > > are somewhat expected, especially considering most of the bugs are not
> > > > > related to the hardening itself)
> > >
> > >
> > > Absolutely. Question is do you think we can fix everything by the
> > > release?
> >
> > Probably not, I'm auditing all the virtio drivers and it seems we have
> > many issues:
> >
> > 1) race between subsystem registration/use and virtio_device_ready()
> > > 2) race between notifications and subsystem registration/use
> >
> > And it looks to me even virtio-net has this race.
>
> Interesting. How does it look for virtio-net?

Will post a patch soon.
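
The first race listed above (subsystem use before virtio_device_ready()) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct toy_dev`, `toy_kick()`, `toy_register_with_subsystem()`, `toy_device_ready()` and `toy_probe()` are hypothetical names, and the "subsystem" is assumed to issue one request as soon as the device is registered.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_dev {
	bool driver_ok;           /* models the DRIVER_OK status bit */
	unsigned int kicks;       /* all kicks issued by the driver */
	unsigned int early_kicks; /* kicks issued before DRIVER_OK */
};

/* Models a driver notifying the device about a new buffer. */
static void toy_kick(struct toy_dev *d)
{
	d->kicks++;
	if (!d->driver_ok)
		d->early_kicks++;
}

/*
 * Models a subsystem that may start using the device as soon as
 * registration returns (assumed here to issue exactly one request).
 */
static void toy_register_with_subsystem(struct toy_dev *d)
{
	toy_kick(d);
}

/* Models virtio_device_ready(): the driver sets DRIVER_OK. */
static void toy_device_ready(struct toy_dev *d)
{
	d->driver_ok = true;
}

/*
 * Runs a toy probe() with one of the two orderings discussed in the
 * thread and returns how many kicks happened before DRIVER_OK.
 */
static unsigned int toy_probe(bool ready_before_registration)
{
	struct toy_dev d = { false, 0, 0 };

	if (ready_before_registration) {
		toy_device_ready(&d);
		toy_register_with_subsystem(&d);
	} else {
		toy_register_with_subsystem(&d);
		toy_device_ready(&d);
	}
	return d.early_kicks;
}
```

In this model, only the "ready before registration" ordering avoids a kick before DRIVER_OK. It does not capture the second race (a notification arriving before registration has finished), which depends on whether the device has a device-to-driver queue.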

>
> > So I think I will post a patch to disable this like below for this release.
>
>
> However please do post patches that add device_ready as appropriate.
> This is basic spec compliance.

Working on this.

>
> Also do you think we should do a full revert? Maybe a Kconfig option is
> ok for now.

Yes, Kconfig should be fine.

Patch will be posted soon.

Thanks
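
For readers following along, the behaviour that such an option would gate can be modelled in plain userspace C. This is a minimal sketch under stated assumptions, not the kernel implementation: `struct toy_vq`, `toy_vq_init()`, `toy_device_ready()`, `toy_reset_device()` and `toy_vring_interrupt()` are hypothetical stand-ins for vq->broken, virtio_device_ready(), virtio_reset_device() and vring_interrupt(), and the `TOY_HARDEN_NOTIFICATION` macro is a made-up placeholder for the Kconfig option (no option name is given in this thread).

```c
#include <stdbool.h>

/* Hypothetical compile-time switch standing in for the Kconfig option. */
#ifndef TOY_HARDEN_NOTIFICATION
#define TOY_HARDEN_NOTIFICATION 1
#endif

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

struct toy_vq {
	bool broken;            /* models vq->broken */
	unsigned int callbacks; /* how many times the driver callback ran */
};

/* With hardening on, a new queue starts broken until the driver is ready. */
static void toy_vq_init(struct toy_vq *vq)
{
	vq->broken = TOY_HARDEN_NOTIFICATION ? true : false;
	vq->callbacks = 0;
}

/* Models virtio_device_ready(): unbreak the queue before DRIVER_OK. */
static void toy_device_ready(struct toy_vq *vq)
{
	vq->broken = false;
}

/* Models virtio_reset_device(): break the queue again only when hardening is on. */
static void toy_reset_device(struct toy_vq *vq)
{
	if (TOY_HARDEN_NOTIFICATION)
		vq->broken = true;
}

/* Models vring_interrupt(): refuse to run callbacks on a broken queue. */
static enum toy_irqreturn toy_vring_interrupt(struct toy_vq *vq)
{
	if (vq->broken)
		return TOY_IRQ_NONE;
	vq->callbacks++;
	return TOY_IRQ_HANDLED;
}
```

Building the sketch with `-DTOY_HARDEN_NOTIFICATION=0` models the "disable for this release" variant: the queue never starts broken and reset leaves it untouched, so an interrupt is always delivered to the callback.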

>
>
> > > At least for rpmsg we don't seem to have a handle on it yet.
> >
> > Yes.
> >
> > >
> > >
> > > > Thanks
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 13a7348cedff..7ef3115efbad 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >         vq->we_own_ring = true;
> > > >         vq->notify = notify;
> > > >         vq->weak_barriers = weak_barriers;
> > > > -       vq->broken = true;
> > > > +       vq->broken = false;
> > > >         vq->last_used_idx = 0;
> > > >         vq->event_triggered = false;
> > > >         vq->num_added = 0;
> > >
> > >
> > > and drop it on reset?
> >
> > Right.
> >
> > Thanks
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > If not, we need to clarify it in the spec
> > > > > > > > > > > and call virtio_device_ready() before subsystem registration.
> > > > > > > > > >
> > > > > > > > > > hmm, i don't get what we need to clarify
> > > > > > > > >
> > > > > > > > > > E.g. whether the driver is not allowed to kick before DRIVER_OK, or
> > > > > > > > > > whether the device should only process a buffer after a kick that
> > > > > > > > > > follows DRIVER_OK (I think no)?
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/gpu/drm/virtio/virtgpu_drv.c
> > > > > > > > > > > > >
> > > > > > > > > > > > > > It calls virtio_device_ready() in virtio_gpu_init(), and it looks to
> > > > > > > > > > > > > me the code is correct.
> > > > > > > > > > > >
> > > > > > > > > > > > OK.
> > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/i2c/busses/i2c-virtio.c
> > > > > > > > > > > > > > drivers/net/caif/caif_virtio.c
> > > > > > > > > > > > > > drivers/nvdimm/virtio_pmem.c
> > > > > > > > > > > > >
> > > > > > > > > > > > > The above looks fine and we have three more:
> > > > > > > > > > > > >
> > > > > > > > > > > > > arm_scmi: probe() doesn't use vq
> > > > > > > > > > > > > mac80211_hwsim.c: doesn't use vq (only fill rx), but it kicks the rx,
> > > > > > > > > > > > > it looks to me we need a device_ready before the kick.
> > > > > > > > > > > > > virtio_rpmsg_bus.c: doesn't use vq
> > > > > > > > > > > > >
> > > > > > > > > > > > > I will post a patch for mac80211_hwsim.c.
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > Same comments for all of the above. Might linux not start using the
> > > > > > > > > > > > device once it's registered?
> > > > > > > > > > >
> > > > > > > > > > > It depends on the specific subsystem.
> > > > > > > > > > >
> > > > > > > > > > > For the subsystem that can't use the device immediately, calling
> > > > > > > > > > > virtio_device_ready() after the subsystem's registration should be
> > > > > > > > > > > fine. E.g for the networking subsystem, the TX won't happen if
> > > > > > > > > > > ndo_open() is not called, calling virtio_device_ready() after
> > > > > > > > > > > netdev_register() seems to be fine.
> > > > > > > > > >
> > > > > > > > > > exactly
> > > > > > > > > >
> > > > > > > > > > > For the subsystem that can use the device immediately, if the
> > > > > > > > > > > > subsystem does not depend on the result of a request in the probe to
> > > > > > > > > > > > proceed, we are still fine, since those requests will be processed after
> > > > > > > > > > > > DRIVER_OK.
> > > > > > > > > >
> > > > > > > > > > Well first won't driver code normally kick as well?
> > > > > > > > >
> > > > > > > > > Kick itself is not blocked.
> > > > > > > > >
> > > > > > > > > > And without kick, won't everything just be blocked?
> > > > > > > > >
> > > > > > > > > It depends on the subsystem. E.g driver can choose to use a callback
> > > > > > > > > instead of polling the used buffer in the probe.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > For the rest we need to do virtio_device_ready() before registration.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Then we can get an interrupt for an unregistered device.
> > > > > > > > >
> > > > > > > > > It depends on the device. For the device that doesn't have an rx queue
> > > > > > > > > (or device to driver queue), we are fine:
> > > > > > > > >
> > > > > > > > > E.g in virtio-blk:
> > > > > > > > >
> > > > > > > > >         virtio_device_ready(vdev);
> > > > > > > > >
> > > > > > > > >         err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
> > > > > > > > >         if (err)
> > > > > > > > >                 goto out_cleanup_disk;
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > >  drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio.c                | 15 ++++++++++++---
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio_mmio.c           |  5 +++++
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> > > > > > > > > > > > > > > > >  drivers/virtio/virtio_ring.c           | 11 +++++++----
> > > > > > > > > > > > > > > > >  include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> > > > > > > > > > > > > > > > >  6 files changed, 53 insertions(+), 7 deletions(-)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > > index c188e4f20ca3..97e51c34e6cf 100644
> > > > > > > > > > > > > > > > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > > > > > > > > > > > > > > > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > > >       ccw->flags = 0;
> > > > > > > > > > > > > > > > >       ccw->count = sizeof(status);
> > > > > > > > > > > > > > > > >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > > > > > > > > > > > > > > > > +     /* We use ssch for setting the status which is a serializing
> > > > > > > > > > > > > > > > > +      * instruction that guarantees the memory writes have
> > > > > > > > > > > > > > > > > +      * completed before ssch.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> > > > > > > > > > > > > > > > >       /* Write failed? We assume status is unchanged. */
> > > > > > > > > > > > > > > > >       if (ret)
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > > index aa1eb5132767..95fac4c97c8b 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > > > > > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >   * */
> > > > > > > > > > > > > > > > >  void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * The below virtio_synchronize_cbs() guarantees that any
> > > > > > > > > > > > > > > > > +      * interrupt for this line arriving after
> > > > > > > > > > > > > > > > > > +      * virtio_synchronize_cbs() has completed is guaranteed to see
> > > > > > > > > > > > > > > > > +      * vq->broken as true.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > > +     virtio_break_device(dev);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So make this conditional
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >       dev->config->reset(dev);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > > > > > > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >       dev->config_enabled = false;
> > > > > > > > > > > > > > > > >       dev->config_change_pending = false;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > > +     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >       /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > > > > >        * driver messed it up.  This also tests that code path a little. */
> > > > > > > > > > > > > > > > >       virtio_reset_device(dev);
> > > > > > > > > > > > > > > > > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >       /* Acknowledge that we've seen the device. */
> > > > > > > > > > > > > > > > >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -     INIT_LIST_HEAD(&dev->vqs);
> > > > > > > > > > > > > > > > > -     spin_lock_init(&dev->vqs_list_lock);
> > > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > > >       /*
> > > > > > > > > > > > > > > > >        * device_add() causes the bus infrastructure to look for a matching
> > > > > > > > > > > > > > > > >        * driver.
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > > index c9699a59f93c..f9a36bc7ac27 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > > > > > > > > > > > > > > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> > > > > > > > > > > > > > > > >       /* We should never be setting status to 0. */
> > > > > > > > > > > > > > > > >       BUG_ON(status == 0);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > > index 4093f9cca7a6..a0fa14f28a7f 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > > > > > > > > > > > > > > > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> > > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > > > > > > > > > > > > > > > > > +      * that the cache coherent memory writes have completed
> > > > > > > > > > > > > > > > > +      * before writing to the MMIO region.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       vp_iowrite8(status, &cfg->device_status);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >  EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > > > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > > index 9c231e1fded7..13a7348cedff 100644
> > > > > > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > > > > > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > > > > > > > > > > > > > >       vq->we_own_ring = true;
> > > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > > > > >               return IRQ_NONE;
> > > > > > > > > > > > > > > > >       }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -     if (unlikely(vq->broken))
> > > > > > > > > > > > > > > > > -             return IRQ_HANDLED;
> > > > > > > > > > > > > > > > > +     if (unlikely(vq->broken)) {
> > > > > > > > > > > > > > > > > +             dev_warn_once(&vq->vq.vdev->dev,
> > > > > > > > > > > > > > > > > +                           "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > > > > > +             return IRQ_NONE;
> > > > > > > > > > > > > > > > > +     }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >       /* Just a hint for performance: so it's ok that this can be racy! */
> > > > > > > > > > > > > > > > >       if (vq->event)
> > > > > > > > > > > > > > > > > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > > > > > > > > > > > > > >       vq->we_own_ring = false;
> > > > > > > > > > > > > > > > >       vq->notify = notify;
> > > > > > > > > > > > > > > > >       vq->weak_barriers = weak_barriers;
> > > > > > > > > > > > > > > > > -     vq->broken = false;
> > > > > > > > > > > > > > > > > +     vq->broken = true;
> > > > > > > > > > > > > > > > >       vq->last_used_idx = 0;
> > > > > > > > > > > > > > > > >       vq->event_triggered = false;
> > > > > > > > > > > > > > > > >       vq->num_added = 0;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > and make this conditional
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > > index 25be018810a7..d4edfd7d91bb 100644
> > > > > > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > > > > > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > > > > >       unsigned status = dev->config->get_status(dev);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > > > > > > > > > > > > > > > > +      * will see the driver specific setup if it sees vq->broken
> > > > > > > > > > > > > > > > > +      * as false (even if the notifications come before DRIVER_OK).
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > > +     virtio_synchronize_cbs(dev);
> > > > > > > > > > > > > > > > > +     __virtio_unbreak_device(dev);
> > > > > > > > > > > > > > > > > +     /*
> > > > > > > > > > > > > > > > > +      * The transport should ensure the visibility of vq->broken
> > > > > > > > > > > > > > > > > +      * before setting DRIVER_OK. See the comments for the transport
> > > > > > > > > > > > > > > > > +      * specific set_status() method.
> > > > > > > > > > > > > > > > > +      *
> > > > > > > > > > > > > > > > > +      * A well behaved device will only notify a virtqueue after
> > > > > > > > > > > > > > > > > > +      * DRIVER_OK, this means the device should "see" the coherent
> > > > > > > > > > > > > > > > > +      * memory write that set vq->broken as false which is done by
> > > > > > > > > > > > > > > > > +      * the driver when it sees DRIVER_OK, then the following
> > > > > > > > > > > > > > > > > +      * driver's vring_interrupt() will see vq->broken as false so
> > > > > > > > > > > > > > > > > +      * we won't lose any notification.
> > > > > > > > > > > > > > > > > +      */
> > > > > > > > > > > > > > > > >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > >
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-06-17  7:26                                     ` Jason Wang
@ 2022-06-17 14:33                                       ` Peter Zijlstra
  -1 siblings, 0 replies; 106+ messages in thread
From: Peter Zijlstra @ 2022-06-17 14:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Halil Pasic,
	Cornelia Huck, eperezma, Cindy Lu, Stefano Garzarella, Xuan Zhuo,
	Vineeth Vijayan, Peter Oberparleiter, linux-s390, conghui.chen,
	Viresh Kumar, netdev, pankaj.gupta.linux, Cristian Marussi,
	sudeep.holla, Bjorn Andersson, Mathieu Poirier

On Fri, Jun 17, 2022 at 03:26:16PM +0800, Jason Wang wrote:
> On Fri, Jun 17, 2022 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jun 17, 2022 at 09:24:57AM +0800, Jason Wang wrote:
> > > On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > > > > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for

Guys; have you heard about trimming emails on reply?

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-05-27  6:01   ` Jason Wang
                     ` (3 preceding siblings ...)
  (?)
@ 2022-07-05 11:06   ` chenxiang (M)
  2022-07-05 12:56       ` Jason Wang
  -1 siblings, 1 reply; 106+ messages in thread
From: chenxiang (M) @ 2022-07-05 11:06 UTC (permalink / raw)
  To: Jason Wang, mst, virtualization, linux-kernel
  Cc: tglx, peterz, paulmck, maz, pasic, cohuck, eperezma, lulu,
	sgarzare, xuanzhuo, Vineeth Vijayan, Peter Oberparleiter,
	linux-s390

Hi,

I encountered an issue when testing virtio-balloon on my platform (ARM64)
with kernel 5.19-rc4. I booted a VM with "-device virtio-balloon" and
then changed the balloon size in the QEMU monitor, but the change did not
take effect. The log is as follows:

QEMU 6.1.50 monitor - type 'help' for more information
(qemu) info balloon
info balloon
balloon: actual=4096
(qemu) balloon 3172
balloon 3172
(qemu) info balloon
info balloon
balloon: actual=4096

I ran git bisect, which eventually pointed to this patch
([8b4ec69d7e098a7ddf832e1e7840de53ed474c77] virtio: harden vring IRQ).

Do you have any idea about it?


Best regards,

Xiang Chen

On 2022/5/27 14:01, Jason Wang wrote:
> This is a rework on the previous IRQ hardening that is done for
> virtio-pci where several drawbacks were found and were reverted:
>
> 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
>     that is used by some device such as virtio-blk
> 2) done only for PCI transport
>
> The vq->broken is re-used in this patch for implementing the IRQ
> hardening. The vq->broken is set to true during both initialization
> and reset. And the vq->broken is set to false in
> virtio_device_ready(). Then vring_interrupt() can check and return
> when vq->broken is true. And in this case, switch to return IRQ_NONE
> to let the interrupt core aware of such invalid interrupt to prevent
> IRQ storm.
>
> The reason of using a per queue variable instead of a per device one
> is that we may need it for per queue reset hardening in the future.
>
> Note that the hardening is only done for vring interrupt since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method that is
> used by config interrupt can't be reused by the vring interrupt
> handler because it uses spinlock to do the synchronization which is
> expensive.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   drivers/s390/virtio/virtio_ccw.c       |  4 ++++
>   drivers/virtio/virtio.c                | 15 ++++++++++++---
>   drivers/virtio/virtio_mmio.c           |  5 +++++
>   drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
>   drivers/virtio/virtio_ring.c           | 11 +++++++----
>   include/linux/virtio_config.h          | 20 ++++++++++++++++++++
>   6 files changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index c188e4f20ca3..97e51c34e6cf 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
>   	ccw->flags = 0;
>   	ccw->count = sizeof(status);
>   	ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> +	/* We use ssch for setting the status which is a serializing
> +	 * instruction that guarantees the memory writes have
> +	 * completed before ssch.
> +	 */
>   	ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
>   	/* Write failed? We assume status is unchanged. */
>   	if (ret)
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index aa1eb5132767..95fac4c97c8b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
>    * */
>   void virtio_reset_device(struct virtio_device *dev)
>   {
> +	/*
> +	 * The virtio_synchronize_cbs() below guarantees that any
> +	 * interrupt for this line arriving after
> +	 * virtio_synchronize_cbs() has completed will see
> +	 * vq->broken as true.
> +	 */
> +	virtio_break_device(dev);
> +	virtio_synchronize_cbs(dev);
> +
>   	dev->config->reset(dev);
>   }
>   EXPORT_SYMBOL_GPL(virtio_reset_device);
> @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
>   	dev->config_enabled = false;
>   	dev->config_change_pending = false;
>   
> +	INIT_LIST_HEAD(&dev->vqs);
> +	spin_lock_init(&dev->vqs_list_lock);
> +
>   	/* We always start by resetting the device, in case a previous
>   	 * driver messed it up.  This also tests that code path a little. */
>   	virtio_reset_device(dev);
> @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
>   	/* Acknowledge that we've seen the device. */
>   	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>   
> -	INIT_LIST_HEAD(&dev->vqs);
> -	spin_lock_init(&dev->vqs_list_lock);
> -
>   	/*
>   	 * device_add() causes the bus infrastructure to look for a matching
>   	 * driver.
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index c9699a59f93c..f9a36bc7ac27 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
>   	/* We should never be setting status to 0. */
>   	BUG_ON(status == 0);
>   
> +	/*
> +	 * Per memory-barriers.txt, wmb() is not needed to guarantee
> +	 * that the cache coherent memory writes have completed
> +	 * before writing to the MMIO region.
> +	 */
>   	writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
>   }
>   
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 4093f9cca7a6..a0fa14f28a7f 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
>   {
>   	struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
>   
> +	/*
> +	 * Per memory-barriers.txt, wmb() is not needed to guarantee
> +	 * that the cache coherent memory writes have completed
> +	 * before writing to the MMIO region.
> +	 */
>   	vp_iowrite8(status, &cfg->device_status);
>   }
>   EXPORT_SYMBOL_GPL(vp_modern_set_status);
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9c231e1fded7..13a7348cedff 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   	vq->we_own_ring = true;
>   	vq->notify = notify;
>   	vq->weak_barriers = weak_barriers;
> -	vq->broken = false;
> +	vq->broken = true;
>   	vq->last_used_idx = 0;
>   	vq->event_triggered = false;
>   	vq->num_added = 0;
> @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   		return IRQ_NONE;
>   	}
>   
> -	if (unlikely(vq->broken))
> -		return IRQ_HANDLED;
> +	if (unlikely(vq->broken)) {
> +		dev_warn_once(&vq->vq.vdev->dev,
> +			      "virtio vring IRQ raised before DRIVER_OK");
> +		return IRQ_NONE;
> +	}
>   
>   	/* Just a hint for performance: so it's ok that this can be racy! */
>   	if (vq->event)
> @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   	vq->we_own_ring = false;
>   	vq->notify = notify;
>   	vq->weak_barriers = weak_barriers;
> -	vq->broken = false;
> +	vq->broken = true;
>   	vq->last_used_idx = 0;
>   	vq->event_triggered = false;
>   	vq->num_added = 0;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 25be018810a7..d4edfd7d91bb 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
>   	unsigned status = dev->config->get_status(dev);
>   
>   	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +
> +	/*
> +	 * The virtio_synchronize_cbs() makes sure vring_interrupt()
> +	 * will see the driver specific setup if it sees vq->broken
> +	 * as false (even if the notifications come before DRIVER_OK).
> +	 */
> +	virtio_synchronize_cbs(dev);
> +	__virtio_unbreak_device(dev);
> +	/*
> +	 * The transport should ensure the visibility of vq->broken
> +	 * before setting DRIVER_OK. See the comments for the transport
> +	 * specific set_status() method.
> +	 *
> +	 * A well behaved device will only notify a virtqueue after
> +	 * DRIVER_OK. This means that by the time the device sees
> +	 * DRIVER_OK, it should also "see" the driver's coherent
> +	 * memory write that set vq->broken to false. The driver's
> +	 * subsequent vring_interrupt() will then see vq->broken as
> +	 * false, so we won't lose any notification.
> +	 */
>   	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
>   }
>   


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH V6 8/9] virtio: harden vring IRQ
  2022-07-05 11:06   ` chenxiang (M)
@ 2022-07-05 12:56       ` Jason Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Wang @ 2022-07-05 12:56 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: linux-s390, Peter Oberparleiter, Cindy Lu, Paul E. McKenney, mst,
	Peter Zijlstra, Marc Zyngier, Cornelia Huck, linux-kernel,
	virtualization, Halil Pasic, eperezma, Vineeth Vijayan,
	Thomas Gleixner

On Tue, Jul 5, 2022 at 7:09 PM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
> Hi,
>
> I encountered an issue when testing virtio-balloon on my platform (ARM64)
> with kernel 5.19-rc4. I boot the VM with "-device virtio-balloon" and
> then change the balloon size in the QEMU monitor, but the change does not
> take effect. The log is as follows:
>
> QEMU 6.1.50 monitor - type 'help' for more information
> (qemu) info balloon
> info balloon
> balloon: actual=4096
> (qemu) balloon 3172
> balloon 3172
> (qemu) info balloon
> info balloon
> balloon: actual=4096
>
> I ran git bisect, which finally pointed to this patch
> ([8b4ec69d7e098a7ddf832e1e7840de53ed474c77] virtio: harden vring IRQ).
>
> Do you have any idea about it?

Yes, we noticed this issue and have disabled the hardening feature via:

c346dae4f3fbce51bbd4f2ec5e8c6f9b91e93163 ("virtio: disable
notification hardening by default")
6a9720576cd00d30722c5f755bd17d4cfa9df636 ("virtio:
VIRTIO_HARDEN_NOTIFICATION is broken")

which have been merged in Michael's tree.

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=6a9720576cd00d30722c5f755bd17d4cfa9df636
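
For context, the gating that the quoted patch adds (and that the two commits above switch off by default) can be modelled in a few lines of userspace C. This is only a rough sketch of the vq->broken idea, not kernel code: struct model_vq, the model_* helpers and the MODEL_IRQ_* values are hypothetical names, and neither locking nor the synchronize_cbs() step is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical, heavily reduced model of a virtqueue. */
struct model_vq {
	bool broken;            /* true from creation/reset until the driver is ready */
	unsigned int callbacks; /* interrupts actually passed on to the driver */
};

/* Models vring creation in the quoted patch: queues start out broken. */
static void model_vq_create(struct model_vq *vq)
{
	vq->broken = true;
	vq->callbacks = 0;
}

/* Models only the vq->broken part of virtio_device_ready(). */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models only the vq->broken part of virtio_reset_device(). */
static void model_reset_device(struct model_vq *vq)
{
	vq->broken = true;
}

/* Models the early-return check in vring_interrupt(). */
static enum model_irqreturn model_vring_interrupt(struct model_vq *vq)
{
	if (vq->broken)
		return MODEL_IRQ_NONE; /* dropped: driver is not ready */
	vq->callbacks++;
	return MODEL_IRQ_HANDLED;
}

/* Walks one queue through create -> ready -> reset and
 * returns how many interrupts reached the driver. */
unsigned int model_demo(void)
{
	struct model_vq vq;

	model_vq_create(&vq);
	assert(model_vring_interrupt(&vq) == MODEL_IRQ_NONE);    /* before ready */
	model_device_ready(&vq);
	assert(model_vring_interrupt(&vq) == MODEL_IRQ_HANDLED); /* after ready */
	model_reset_device(&vq);
	assert(model_vring_interrupt(&vq) == MODEL_IRQ_NONE);    /* after reset */
	return vq.callbacks;
}
```

Calling model_demo() from a main() returns 1: of the three interrupts raised, only the one between "ready" and "reset" reaches the driver, and the other two are dropped with MODEL_IRQ_NONE.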

Thanks

>
>
> Best regards,
>
> Xiang Chen
>
> On 2022/5/27 14:01, Jason Wang wrote:
> > This is a rework of the previous IRQ hardening for virtio-pci, which
> > was reverted after several drawbacks were found:
> >
> > 1) it relies on IRQF_NO_AUTOEN, which is not friendly to the affinity
> >    managed IRQs used by some devices such as virtio-blk
> > 2) it was done only for the PCI transport
> >
> > This patch re-uses vq->broken to implement the IRQ hardening.
> > vq->broken is set to true during both initialization and reset, and
> > set to false in virtio_device_ready(). vring_interrupt() can then
> > check it and return early while vq->broken is true. In that case it
> > now returns IRQ_NONE so that the interrupt core is aware of the
> > invalid interrupt and can prevent an IRQ storm.
> >
> > A per queue variable is used instead of a per device one because we
> > may need it for per queue reset hardening in the future.
> >
> > Note that the hardening is only done for the vring interrupt, since
> > config interrupt hardening was already done in commit 22b7050a024d7
> > ("virtio: defer config changed notifications"). The method used for
> > the config interrupt can't be reused by the vring interrupt handler
> > because it synchronizes with a spinlock, which is expensive.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Halil Pasic <pasic@linux.ibm.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
> > Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >   drivers/s390/virtio/virtio_ccw.c       |  4 ++++
> >   drivers/virtio/virtio.c                | 15 ++++++++++++---
> >   drivers/virtio/virtio_mmio.c           |  5 +++++
> >   drivers/virtio/virtio_pci_modern_dev.c |  5 +++++
> >   drivers/virtio/virtio_ring.c           | 11 +++++++----
> >   include/linux/virtio_config.h          | 20 ++++++++++++++++++++
> >   6 files changed, 53 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > index c188e4f20ca3..97e51c34e6cf 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status)
> >       ccw->flags = 0;
> >       ccw->count = sizeof(status);
> >       ccw->cda = (__u32)(unsigned long)&vcdev->dma_area->status;
> > +     /* We use ssch for setting the status which is a serializing
> > +      * instruction that guarantees the memory writes have
> > +      * completed before ssch.
> > +      */
> >       ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
> >       /* Write failed? We assume status is unchanged. */
> >       if (ret)
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index aa1eb5132767..95fac4c97c8b 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> >    * */
> >   void virtio_reset_device(struct virtio_device *dev)
> >   {
> > +     /*
> > +      * The virtio_synchronize_cbs() below guarantees that any
> > +      * interrupt for this line arriving after
> > +      * virtio_synchronize_cbs() has completed will see
> > +      * vq->broken as true.
> > +      */
> > +     virtio_break_device(dev);
> > +     virtio_synchronize_cbs(dev);
> > +
> >       dev->config->reset(dev);
> >   }
> >   EXPORT_SYMBOL_GPL(virtio_reset_device);
> > @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
> >       dev->config_enabled = false;
> >       dev->config_change_pending = false;
> >
> > +     INIT_LIST_HEAD(&dev->vqs);
> > +     spin_lock_init(&dev->vqs_list_lock);
> > +
> >       /* We always start by resetting the device, in case a previous
> >        * driver messed it up.  This also tests that code path a little. */
> >       virtio_reset_device(dev);
> > @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
> >       /* Acknowledge that we've seen the device. */
> >       virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> >
> > -     INIT_LIST_HEAD(&dev->vqs);
> > -     spin_lock_init(&dev->vqs_list_lock);
> > -
> >       /*
> >        * device_add() causes the bus infrastructure to look for a matching
> >        * driver.
> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > index c9699a59f93c..f9a36bc7ac27 100644
> > --- a/drivers/virtio/virtio_mmio.c
> > +++ b/drivers/virtio/virtio_mmio.c
> > @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status)
> >       /* We should never be setting status to 0. */
> >       BUG_ON(status == 0);
> >
> > +     /*
> > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > +      * that the cache coherent memory writes have completed
> > +      * before writing to the MMIO region.
> > +      */
> >       writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
> >   }
> >
> > diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> > index 4093f9cca7a6..a0fa14f28a7f 100644
> > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > @@ -467,6 +467,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev,
> >   {
> >       struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
> >
> > +     /*
> > +      * Per memory-barriers.txt, wmb() is not needed to guarantee
> > +      * that the cache coherent memory writes have completed
> > +      * before writing to the MMIO region.
> > +      */
> >       vp_iowrite8(status, &cfg->device_status);
> >   }
> >   EXPORT_SYMBOL_GPL(vp_modern_set_status);
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 9c231e1fded7..13a7348cedff 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1688,7 +1688,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >       vq->we_own_ring = true;
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> > -     vq->broken = false;
> > +     vq->broken = true;
> >       vq->last_used_idx = 0;
> >       vq->event_triggered = false;
> >       vq->num_added = 0;
> > @@ -2134,8 +2134,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >               return IRQ_NONE;
> >       }
> >
> > -     if (unlikely(vq->broken))
> > -             return IRQ_HANDLED;
> > +     if (unlikely(vq->broken)) {
> > +             dev_warn_once(&vq->vq.vdev->dev,
> > +                           "virtio vring IRQ raised before DRIVER_OK");
> > +             return IRQ_NONE;
> > +     }
> >
> >       /* Just a hint for performance: so it's ok that this can be racy! */
> >       if (vq->event)
> > @@ -2177,7 +2180,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >       vq->we_own_ring = false;
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> > -     vq->broken = false;
> > +     vq->broken = true;
> >       vq->last_used_idx = 0;
> >       vq->event_triggered = false;
> >       vq->num_added = 0;
> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > index 25be018810a7..d4edfd7d91bb 100644
> > --- a/include/linux/virtio_config.h
> > +++ b/include/linux/virtio_config.h
> > @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev)
> >       unsigned status = dev->config->get_status(dev);
> >
> >       BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > +
> > +     /*
> > +      * The virtio_synchronize_cbs() makes sure vring_interrupt()
> > +      * will see the driver specific setup if it sees vq->broken
> > +      * as false (even if the notifications come before DRIVER_OK).
> > +      */
> > +     virtio_synchronize_cbs(dev);
> > +     __virtio_unbreak_device(dev);
> > +     /*
> > +      * The transport should ensure the visibility of vq->broken
> > +      * before setting DRIVER_OK. See the comments for the transport
> > +      * specific set_status() method.
> > +      *
> > +      * A well behaved device will only notify a virtqueue after
> > +      * DRIVER_OK. This means that by the time the device sees
> > +      * DRIVER_OK, it should also "see" the driver's coherent
> > +      * memory write that set vq->broken to false. The driver's
> > +      * subsequent vring_interrupt() will then see vq->broken as
> > +      * false, so we won't lose any notification.
> > +      */
> >       dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> >   }
> >
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2022-07-05 13:36 UTC | newest]

Thread overview: 106+ messages
2022-05-27  6:01 [PATCH V6 0/9] rework on the IRQ hardening of virtio Jason Wang
2022-05-27  6:01 ` Jason Wang
2022-05-27  6:01 ` [PATCH V6 1/9] virtio: use virtio_device_ready() in virtio_device_restore() Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:29   ` Xuan Zhuo
2022-05-27  7:29     ` Xuan Zhuo
2022-05-27  6:01 ` [PATCH V6 2/9] virtio: use virtio_reset_device() when possible Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:30   ` Xuan Zhuo
2022-05-27  7:30     ` Xuan Zhuo
2022-05-27  8:52   ` Eugenio Perez Martin
2022-05-27 10:34   ` Stefano Garzarella
2022-05-27 10:34     ` Stefano Garzarella
2022-05-27  6:01 ` [PATCH V6 3/9] virtio: introduce config op to synchronize vring callbacks Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:30   ` Xuan Zhuo
2022-05-27  7:30     ` Xuan Zhuo
2022-05-27 10:36   ` Stefano Garzarella
2022-05-27 10:36     ` Stefano Garzarella
2022-05-27  6:01 ` [PATCH V6 4/9] virtio-pci: implement synchronize_cbs() Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:31   ` Xuan Zhuo
2022-05-27  7:31     ` Xuan Zhuo
2022-05-27  6:01 ` [PATCH V6 5/9] virtio-mmio: " Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:32   ` Xuan Zhuo
2022-05-27  7:32     ` Xuan Zhuo
2022-05-27  6:01 ` [PATCH V6 6/9] virtio-ccw: " Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-30 15:12   ` Cornelia Huck
2022-05-30 15:12     ` Cornelia Huck
2022-05-27  6:01 ` [PATCH V6 7/9] virtio: allow to unbreak virtqueue Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:33   ` Xuan Zhuo
2022-05-27  7:33     ` Xuan Zhuo
2022-05-30 15:15   ` Cornelia Huck
2022-05-30 15:15     ` Cornelia Huck
2022-05-27  6:01 ` [PATCH V6 8/9] virtio: harden vring IRQ Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:49   ` Xuan Zhuo
2022-05-27  7:49     ` Xuan Zhuo
2022-05-30 15:18   ` Cornelia Huck
2022-05-30 15:18     ` Cornelia Huck
2022-06-11  5:12   ` Michael S. Tsirkin
2022-06-11  5:12     ` Michael S. Tsirkin
2022-06-13  5:26     ` Jason Wang
2022-06-13  5:26       ` Jason Wang
2022-06-13  7:23       ` Michael S. Tsirkin
2022-06-13  7:23         ` Michael S. Tsirkin
2022-06-13  8:07         ` Jason Wang
2022-06-13  8:07           ` Jason Wang
2022-06-13  8:19           ` Michael S. Tsirkin
2022-06-13  8:19             ` Michael S. Tsirkin
2022-06-13  8:51             ` Jason Wang
2022-06-13  8:51               ` Jason Wang
2022-06-13  8:59               ` Michael S. Tsirkin
2022-06-13  8:59                 ` Michael S. Tsirkin
2022-06-13  9:08                 ` Jason Wang
2022-06-13  9:08                   ` Jason Wang
2022-06-13  9:14                   ` Jason Wang
2022-06-13  9:14                     ` Jason Wang
2022-06-13  9:27                     ` Michael S. Tsirkin
2022-06-13  9:27                       ` Michael S. Tsirkin
2022-06-14  7:40                       ` Jason Wang
2022-06-14  7:40                         ` Jason Wang
2022-06-14 13:50                         ` Michael S. Tsirkin
2022-06-14 13:50                           ` Michael S. Tsirkin
2022-06-15  1:32                           ` Jason Wang
2022-06-15  1:32                             ` Jason Wang
2022-06-14 15:49                         ` Michael S. Tsirkin
2022-06-14 15:49                           ` Michael S. Tsirkin
2022-06-15  1:38                           ` Jason Wang
2022-06-15  1:38                             ` Jason Wang
2022-06-16 17:11                             ` Michael S. Tsirkin
2022-06-16 17:11                               ` Michael S. Tsirkin
2022-06-17  1:24                               ` Jason Wang
2022-06-17  1:24                                 ` Jason Wang
2022-06-17  5:36                                 ` Michael S. Tsirkin
2022-06-17  5:36                                   ` Michael S. Tsirkin
2022-06-17  7:26                                   ` Jason Wang
2022-06-17  7:26                                     ` Jason Wang
2022-06-17 14:33                                     ` Peter Zijlstra
2022-06-17 14:33                                       ` Peter Zijlstra
2022-06-14 16:46                         ` Cristian Marussi
2022-06-15  1:41                           ` Jason Wang
2022-06-15  1:41                             ` Jason Wang
2022-06-15 18:24                             ` Cristian Marussi
2022-06-17  3:14                               ` Jason Wang
2022-06-17  3:14                                 ` Jason Wang
2022-06-13  9:26                   ` Michael S. Tsirkin
2022-06-13  9:26                     ` Michael S. Tsirkin
2022-06-14  7:19                     ` Jason Wang
2022-06-14  7:19                       ` Jason Wang
2022-07-05 11:06   ` chenxiang (M)
2022-07-05 12:56     ` Jason Wang
2022-07-05 12:56       ` Jason Wang
2022-05-27  6:01 ` [PATCH V6 9/9] virtio: use WARN_ON() to warning illegal status value Jason Wang
2022-05-27  6:01   ` Jason Wang
2022-05-27  7:49   ` Xuan Zhuo
2022-05-27  7:49     ` Xuan Zhuo
2022-05-27 10:35   ` Stefano Garzarella
2022-05-27 10:35     ` Stefano Garzarella
2022-05-27 10:50   ` Michael S. Tsirkin
2022-05-27 10:50     ` Michael S. Tsirkin
2022-05-30  3:48     ` Jason Wang
2022-05-30  3:48       ` Jason Wang
