All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 00/10] More virtio hardening
@ 2021-10-19  7:01 ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

Hi All:

This series treis to do more hardening for virito.

patch 1 validates the num_queues for virio-blk device.
patch 2 validates max_nr_ports for virito-console device.
patch 3-5 harden virtio-pci interrupts to make sure no exepcted
interrupt handler is tiggered. If this makes sense we can do similar
things in other transport drivers.
patch 6-7 validate used ring length.
patch 8-10 let the driver to validate the used length instead of the
virtio core when possible.

Smoking test on blk/net with packed=on/off and iommu_platform=on/off.

Please review.

Changes since V2:
- don't validate max_nr_ports in .validate()
- fail the probe instead of trying to work when blk/console returns
  invalid number of queues/ports
- use READ_ONCE() instead of smp_load_acquire() for checking
  intx_soft_enabled
- use "suppress_used_validation" instead of "validate_used"

Changes since V1:
- fix and document the memory ordering around the intx_soft_enabled
  when enabling and disabling INTX interrupt
- for the driver that can validate the used length, virtio core
  won't even try to allocate auxilary arrays and validate the used length
- tweak the commit log
- fix typos

Jason Wang (10):
  virtio-blk: validate num_queues during probe
  virtio_console: validate max_nr_ports before trying to use it
  virtio_config: introduce a new .enable_cbs method
  virtio_pci: harden MSI-X interrupts
  virtio-pci: harden INTX interrupts
  virtio_ring: fix typos in vring_desc_extra
  virtio_ring: validate used buffer length
  virtio-net: don't let virtio core to validate used length
  virtio-blk: don't let virtio core to validate used length
  virtio-scsi: don't let virtio core to validate used buffer length

 drivers/block/virtio_blk.c         |  5 +++
 drivers/char/virtio_console.c      |  9 +++++
 drivers/net/virtio_net.c           |  1 +
 drivers/scsi/virtio_scsi.c         |  1 +
 drivers/virtio/virtio_pci_common.c | 48 ++++++++++++++++++++----
 drivers/virtio/virtio_pci_common.h |  7 +++-
 drivers/virtio/virtio_pci_legacy.c |  5 ++-
 drivers/virtio/virtio_pci_modern.c |  6 ++-
 drivers/virtio/virtio_ring.c       | 60 +++++++++++++++++++++++++++++-
 include/linux/virtio.h             |  2 +
 include/linux/virtio_config.h      |  6 +++
 11 files changed, 135 insertions(+), 15 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V3 00/10] More virtio hardening
@ 2021-10-19  7:01 ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

Hi All:

This series treis to do more hardening for virito.

patch 1 validates the num_queues for virio-blk device.
patch 2 validates max_nr_ports for virito-console device.
patch 3-5 harden virtio-pci interrupts to make sure no exepcted
interrupt handler is tiggered. If this makes sense we can do similar
things in other transport drivers.
patch 6-7 validate used ring length.
patch 8-10 let the driver to validate the used length instead of the
virtio core when possible.

Smoking test on blk/net with packed=on/off and iommu_platform=on/off.

Please review.

Changes since V2:
- don't validate max_nr_ports in .validate()
- fail the probe instead of trying to work when blk/console returns
  invalid number of queues/ports
- use READ_ONCE() instead of smp_load_acquire() for checking
  intx_soft_enabled
- use "suppress_used_validation" instead of "validate_used"

Changes since V1:
- fix and document the memory ordering around the intx_soft_enabled
  when enabling and disabling INTX interrupt
- for the driver that can validate the used length, virtio core
  won't even try to allocate auxilary arrays and validate the used length
- tweak the commit log
- fix typos

Jason Wang (10):
  virtio-blk: validate num_queues during probe
  virtio_console: validate max_nr_ports before trying to use it
  virtio_config: introduce a new .enable_cbs method
  virtio_pci: harden MSI-X interrupts
  virtio-pci: harden INTX interrupts
  virtio_ring: fix typos in vring_desc_extra
  virtio_ring: validate used buffer length
  virtio-net: don't let virtio core to validate used length
  virtio-blk: don't let virtio core to validate used length
  virtio-scsi: don't let virtio core to validate used buffer length

 drivers/block/virtio_blk.c         |  5 +++
 drivers/char/virtio_console.c      |  9 +++++
 drivers/net/virtio_net.c           |  1 +
 drivers/scsi/virtio_scsi.c         |  1 +
 drivers/virtio/virtio_pci_common.c | 48 ++++++++++++++++++++----
 drivers/virtio/virtio_pci_common.h |  7 +++-
 drivers/virtio/virtio_pci_legacy.c |  5 ++-
 drivers/virtio/virtio_pci_modern.c |  6 ++-
 drivers/virtio/virtio_ring.c       | 60 +++++++++++++++++++++++++++++-
 include/linux/virtio.h             |  2 +
 include/linux/virtio_config.h      |  6 +++
 11 files changed, 135 insertions(+), 15 deletions(-)

-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V3 01/10] virtio-blk: validate num_queues during probe
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Paolo Bonzini, Stefan Hajnoczi, Stefano Garzarella

If an untrusted device neogitates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocating zero size
buffers where ZERO_SIZE_PTR is returned which may pass the checking
against the NULL. This will lead unexpected results.

Fixing this by failing the probe in this case.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/block/virtio_blk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 9b3bd083b411..10bc0879e618 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -497,6 +497,10 @@ static int init_vq(struct virtio_blk *vblk)
 				   &num_vqs);
 	if (err)
 		num_vqs = 1;
+	if (!err && !num_vqs) {
+		dev_err(&vdev->dev, "MQ advertisted but zero queues reported\n");
+		return -EINVAL;
+	}
 
 	num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 01/10] virtio-blk: validate num_queues during probe
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, Stefan Hajnoczi, Paolo Bonzini

If an untrusted device neogitates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocating zero size
buffers where ZERO_SIZE_PTR is returned which may pass the checking
against the NULL. This will lead unexpected results.

Fixing this by failing the probe in this case.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/block/virtio_blk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 9b3bd083b411..10bc0879e618 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -497,6 +497,10 @@ static int init_vq(struct virtio_blk *vblk)
 				   &num_vqs);
 	if (err)
 		num_vqs = 1;
+	if (!err && !num_vqs) {
+		dev_err(&vdev->dev, "MQ advertisted but zero queues reported\n");
+		return -EINVAL;
+	}
 
 	num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
 
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 02/10] virtio_console: validate max_nr_ports before trying to use it
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Amit Shah

We calculate nr_ports based on the max_nr_ports:

nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;

If the device advertises a large max_nr_ports, we will end up with a
integer overflow. Fixing this by validating the max_nr_ports and fail
the probe for invalid max_nr_ports in this case.

Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/char/virtio_console.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7a86..660c5c388c29 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -28,6 +28,7 @@
 #include "../tty/hvc/hvc_console.h"
 
 #define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
+#define VIRTCONS_MAX_PORTS 0x8000
 
 /*
  * This is a global struct for storing common data for all the devices
@@ -2036,6 +2037,14 @@ static int virtcons_probe(struct virtio_device *vdev)
 	    virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
 				 struct virtio_console_config, max_nr_ports,
 				 &portdev->max_nr_ports) == 0) {
+		if (portdev->max_nr_ports == 0 ||
+		    portdev->max_nr_ports > VIRTCONS_MAX_PORTS) {
+			dev_err(&vdev->dev,
+				"Invalidate max_nr_ports %d",
+				portdev->max_nr_ports);
+			err = -EINVAL;
+			goto free;
+		}
 		multiport = true;
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 02/10] virtio_console: validate max_nr_ports before trying to use it
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: f.hetzelt, david.kaplan, Amit Shah, konrad.wilk, linux-kernel,
	virtualization

We calculate nr_ports based on the max_nr_ports:

nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;

If the device advertises a large max_nr_ports, we will end up with a
integer overflow. Fixing this by validating the max_nr_ports and fail
the probe for invalid max_nr_ports in this case.

Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/char/virtio_console.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7a86..660c5c388c29 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -28,6 +28,7 @@
 #include "../tty/hvc/hvc_console.h"
 
 #define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
+#define VIRTCONS_MAX_PORTS 0x8000
 
 /*
  * This is a global struct for storing common data for all the devices
@@ -2036,6 +2037,14 @@ static int virtcons_probe(struct virtio_device *vdev)
 	    virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
 				 struct virtio_console_config, max_nr_ports,
 				 &portdev->max_nr_ports) == 0) {
+		if (portdev->max_nr_ports == 0 ||
+		    portdev->max_nr_ports > VIRTCONS_MAX_PORTS) {
+			dev_err(&vdev->dev,
+				"Invalidate max_nr_ports %d",
+				portdev->max_nr_ports);
+			err = -EINVAL;
+			goto free;
+		}
 		multiport = true;
 	}
 
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 03/10] virtio_config: introduce a new .enable_cbs method
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

This patch introduces a new method to enable the callbacks for config
and virtqueues. This will be used for making sure the virtqueue
callbacks are only enabled after virtio_device_ready() if transport
implements this method.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio_config.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 8519b3ae5d52..4d107ad31149 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -23,6 +23,8 @@ struct virtio_shm_region {
  *       any of @get/@set, @get_status/@set_status, or @get_features/
  *       @finalize_features are NOT safe to be called from an atomic
  *       context.
+ * @enable_cbs: enable the callbacks
+ *      vdev: the virtio_device
  * @get: read the value of a configuration field
  *	vdev: the virtio_device
  *	offset: the offset of the configuration field
@@ -75,6 +77,7 @@ struct virtio_shm_region {
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
+	void (*enable_cbs)(struct virtio_device *vdev);
 	void (*get)(struct virtio_device *vdev, unsigned offset,
 		    void *buf, unsigned len);
 	void (*set)(struct virtio_device *vdev, unsigned offset,
@@ -229,6 +232,9 @@ void virtio_device_ready(struct virtio_device *dev)
 {
 	unsigned status = dev->config->get_status(dev);
 
+	if (dev->config->enable_cbs)
+                  dev->config->enable_cbs(dev);
+
 	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
 	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 03/10] virtio_config: introduce a new .enable_cbs method
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

This patch introduces a new method to enable the callbacks for config
and virtqueues. This will be used for making sure the virtqueue
callbacks are only enabled after virtio_device_ready() if transport
implements this method.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio_config.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 8519b3ae5d52..4d107ad31149 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -23,6 +23,8 @@ struct virtio_shm_region {
  *       any of @get/@set, @get_status/@set_status, or @get_features/
  *       @finalize_features are NOT safe to be called from an atomic
  *       context.
+ * @enable_cbs: enable the callbacks
+ *      vdev: the virtio_device
  * @get: read the value of a configuration field
  *	vdev: the virtio_device
  *	offset: the offset of the configuration field
@@ -75,6 +77,7 @@ struct virtio_shm_region {
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
+	void (*enable_cbs)(struct virtio_device *vdev);
 	void (*get)(struct virtio_device *vdev, unsigned offset,
 		    void *buf, unsigned len);
 	void (*set)(struct virtio_device *vdev, unsigned offset,
@@ -229,6 +232,9 @@ void virtio_device_ready(struct virtio_device *dev)
 {
 	unsigned status = dev->config->get_status(dev);
 
+	if (dev->config->enable_cbs)
+                  dev->config->enable_cbs(dev);
+
 	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
 	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: Paul E . McKenney, david.kaplan, konrad.wilk, Peter Zijlstra,
	f.hetzelt, linux-kernel, virtualization, Thomas Gleixner

We used to synchronize pending MSI-X irq handlers via
synchronize_irq(), this may not work for the untrusted device which
may keep sending interrupts after reset which may lead unexpected
results. Similarly, we should not enable MSI-X interrupt until the
device is ready. So this patch fixes those two issues by:

1) switching to use disable_irq() to prevent the virtio interrupt
   handlers to be called after the device is reset.
2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()

This can make sure the virtio interrupt handler won't be called before
virtio_device_ready() and after reset.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
 drivers/virtio/virtio_pci_common.h |  6 ++++--
 drivers/virtio/virtio_pci_legacy.c |  5 +++--
 drivers/virtio/virtio_pci_modern.c |  6 ++++--
 4 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index b35bb2d57f62..8d8f83aca721 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
 		 "Force legacy mode for transitional virtio 1 devices");
 #endif
 
-/* wait for pending irq handlers */
-void vp_synchronize_vectors(struct virtio_device *vdev)
+/* disable irq handlers */
+void vp_disable_cbs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
@@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
 		synchronize_irq(vp_dev->pci_dev->irq);
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
-		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
+		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
+}
+
+/* enable irq handlers */
+void vp_enable_cbs(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled)
+		return;
+
+	for (i = 0; i < vp_dev->msix_vectors; ++i)
+		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
 }
 
 /* the notify function used when creating a virt queue */
@@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 		 "%s-config", name);
 	err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
-			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  vp_config_changed, IRQF_NO_AUTOEN,
+			  vp_dev->msix_names[v],
 			  vp_dev);
 	if (err)
 		goto error;
@@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 			 "%s-virtqueues", name);
 		err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
-				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
+				  vp_vring_interrupt, IRQF_NO_AUTOEN,
+				  vp_dev->msix_names[v],
 				  vp_dev);
 		if (err)
 			goto error;
@@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 			 "%s-%s",
 			 dev_name(&vp_dev->vdev.dev), names[i]);
 		err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
-				  vring_interrupt, 0,
+				  vring_interrupt, IRQF_NO_AUTOEN,
 				  vp_dev->msix_names[msix_vec],
 				  vqs[i]);
 		if (err)
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index beec047a8f8d..52e924603075 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
 	return container_of(vdev, struct virtio_pci_device, vdev);
 }
 
-/* wait for pending irq handlers */
-void vp_synchronize_vectors(struct virtio_device *vdev);
+/* disable irq handlers */
+void vp_disable_cbs(struct virtio_device *vdev);
+/* enable irq handlers */
+void vp_enable_cbs(struct virtio_device *vdev);
 /* the notify function used when creating a virt queue */
 bool vp_notify(struct virtqueue *vq);
 /* the config->del_vqs() implementation */
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index d62e9835aeec..d9c95d89cdd0 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
 	/* Flush out the status write, and flush in device writes,
 	 * including MSi-X interrupts, if any. */
 	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
+	/* Disable VQ/configuration callbacks. */
+	vp_disable_cbs(vdev);
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
@@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
 }
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
+	.enable_cbs	= vp_enable_cbs,
 	.get		= vp_get,
 	.set		= vp_set,
 	.get_status	= vp_get_status,
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 30654d3a0b41..5455bc041fb6 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
 	 */
 	while (vp_modern_get_status(mdev))
 		msleep(1);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
+	/* Disable VQ/configuration callbacks. */
+	vp_disable_cbs(vdev);
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
@@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
 }
 
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
+	.enable_cbs	= vp_enable_cbs,
 	.get		= NULL,
 	.set		= NULL,
 	.generation	= vp_generation,
@@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
+	.enable_cbs	= vp_enable_cbs,
 	.get		= vp_get,
 	.set		= vp_set,
 	.generation	= vp_generation,
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Thomas Gleixner, Peter Zijlstra, Paul E . McKenney

We used to synchronize pending MSI-X irq handlers via
synchronize_irq(), this may not work for the untrusted device which
may keep sending interrupts after reset which may lead unexpected
results. Similarly, we should not enable MSI-X interrupt until the
device is ready. So this patch fixes those two issues by:

1) switching to use disable_irq() to prevent the virtio interrupt
   handlers to be called after the device is reset.
2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()

This can make sure the virtio interrupt handler won't be called before
virtio_device_ready() and after reset.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
 drivers/virtio/virtio_pci_common.h |  6 ++++--
 drivers/virtio/virtio_pci_legacy.c |  5 +++--
 drivers/virtio/virtio_pci_modern.c |  6 ++++--
 4 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index b35bb2d57f62..8d8f83aca721 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
 		 "Force legacy mode for transitional virtio 1 devices");
 #endif
 
-/* wait for pending irq handlers */
-void vp_synchronize_vectors(struct virtio_device *vdev)
+/* disable irq handlers */
+void vp_disable_cbs(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
@@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
 		synchronize_irq(vp_dev->pci_dev->irq);
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
-		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
+		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
+}
+
+/* enable irq handlers */
+void vp_enable_cbs(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled)
+		return;
+
+	for (i = 0; i < vp_dev->msix_vectors; ++i)
+		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
 }
 
 /* the notify function used when creating a virt queue */
@@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 		 "%s-config", name);
 	err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
-			  vp_config_changed, 0, vp_dev->msix_names[v],
+			  vp_config_changed, IRQF_NO_AUTOEN,
+			  vp_dev->msix_names[v],
 			  vp_dev);
 	if (err)
 		goto error;
@@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 		snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
 			 "%s-virtqueues", name);
 		err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
-				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
+				  vp_vring_interrupt, IRQF_NO_AUTOEN,
+				  vp_dev->msix_names[v],
 				  vp_dev);
 		if (err)
 			goto error;
@@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 			 "%s-%s",
 			 dev_name(&vp_dev->vdev.dev), names[i]);
 		err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
-				  vring_interrupt, 0,
+				  vring_interrupt, IRQF_NO_AUTOEN,
 				  vp_dev->msix_names[msix_vec],
 				  vqs[i]);
 		if (err)
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index beec047a8f8d..52e924603075 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
 	return container_of(vdev, struct virtio_pci_device, vdev);
 }
 
-/* wait for pending irq handlers */
-void vp_synchronize_vectors(struct virtio_device *vdev);
+/* disable irq handlers */
+void vp_disable_cbs(struct virtio_device *vdev);
+/* enable irq handlers */
+void vp_enable_cbs(struct virtio_device *vdev);
 /* the notify function used when creating a virt queue */
 bool vp_notify(struct virtqueue *vq);
 /* the config->del_vqs() implementation */
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index d62e9835aeec..d9c95d89cdd0 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -97,8 +97,8 @@ static void vp_reset(struct virtio_device *vdev)
 	/* Flush out the status write, and flush in device writes,
 	 * including MSi-X interrupts, if any. */
 	ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
+	/* Disable VQ/configuration callbacks. */
+	vp_disable_cbs(vdev);
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
@@ -194,6 +194,7 @@ static void del_vq(struct virtio_pci_vq_info *info)
 }
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
+	.enable_cbs	= vp_enable_cbs,
 	.get		= vp_get,
 	.set		= vp_set,
 	.get_status	= vp_get_status,
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 30654d3a0b41..5455bc041fb6 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -172,8 +172,8 @@ static void vp_reset(struct virtio_device *vdev)
 	 */
 	while (vp_modern_get_status(mdev))
 		msleep(1);
-	/* Flush pending VQ/configuration callbacks. */
-	vp_synchronize_vectors(vdev);
+	/* Disable VQ/configuration callbacks. */
+	vp_disable_cbs(vdev);
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
@@ -380,6 +380,7 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
 }
 
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
+	.enable_cbs	= vp_enable_cbs,
 	.get		= NULL,
 	.set		= NULL,
 	.generation	= vp_generation,
@@ -397,6 +398,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
+	.enable_cbs	= vp_enable_cbs,
 	.get		= vp_get,
 	.set		= vp_set,
 	.generation	= vp_generation,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 05/10] virtio-pci: harden INTX interrupts
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Boqun Feng, Thomas Gleixner, Peter Zijlstra,
	Paul E . McKenney

This patch tries to make sure the virtio interrupt handler for INTX
won't be called after a reset and before virtio_device_ready(). We
can't use IRQF_NO_AUTOEN since we're using shared interrupt
(IRQF_SHARED). So this patch tracks the INTX enabling status in a new
intx_soft_enabled variable and toggle it during in
vp_disable/enable_vectors(). The INTX interrupt handler will check
intx_soft_enabled before processing the actual interrupt.

Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
 drivers/virtio/virtio_pci_common.h |  1 +
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 8d8f83aca721..1bce254a462a 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
-	if (vp_dev->intx_enabled)
+	if (vp_dev->intx_enabled) {
+		/*
+		 * The below synchronize() guarantees that any
+		 * interrupt for this line arriving after
+		 * synchronize_irq() has completed is guaranteed to see
+		 * intx_soft_enabled == false.
+		 */
+		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
 		synchronize_irq(vp_dev->pci_dev->irq);
+	}
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
 		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
@@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
-	if (vp_dev->intx_enabled)
+	if (vp_dev->intx_enabled) {
+		disable_irq(vp_dev->pci_dev->irq);
+		/*
+		 * The above disable_irq() provides TSO ordering and
+		 * as such promotes the below store to store-release.
+		 */
+		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
+		enable_irq(vp_dev->pci_dev->irq);
 		return;
+	}
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
 		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
@@ -97,6 +113,9 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 	struct virtio_pci_device *vp_dev = opaque;
 	u8 isr;
 
+	if (!READ_ONCE(vp_dev->intx_soft_enabled))
+		return IRQ_NONE;
+
 	/* reading the ISR has the effect of also clearing it so it's very
 	 * important to save off the value. */
 	isr = ioread8(vp_dev->isr);
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 52e924603075..7b59e10063c3 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -64,6 +64,7 @@ struct virtio_pci_device {
 	/* MSI-X support */
 	int msix_enabled;
 	int intx_enabled;
+	bool intx_soft_enabled;
 	cpumask_var_t *msix_affinity_masks;
 	/* Name strings for interrupts. This size should be enough,
 	 * and I'm too lazy to allocate each name separately. */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 05/10] virtio-pci: harden INTX interrupts
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: Paul E . McKenney, david.kaplan, konrad.wilk, Peter Zijlstra,
	Boqun Feng, f.hetzelt, linux-kernel, virtualization,
	Thomas Gleixner

This patch tries to make sure the virtio interrupt handler for INTX
won't be called after a reset and before virtio_device_ready(). We
can't use IRQF_NO_AUTOEN since we're using shared interrupt
(IRQF_SHARED). So this patch tracks the INTX enabling status in a new
intx_soft_enabled variable and toggle it during in
vp_disable/enable_vectors(). The INTX interrupt handler will check
intx_soft_enabled before processing the actual interrupt.

Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
 drivers/virtio/virtio_pci_common.h |  1 +
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 8d8f83aca721..1bce254a462a 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
-	if (vp_dev->intx_enabled)
+	if (vp_dev->intx_enabled) {
+		/*
+		 * The below synchronize() guarantees that any
+		 * interrupt for this line arriving after
+		 * synchronize_irq() has completed is guaranteed to see
+		 * intx_soft_enabled == false.
+		 */
+		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
 		synchronize_irq(vp_dev->pci_dev->irq);
+	}
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
 		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
@@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	int i;
 
-	if (vp_dev->intx_enabled)
+	if (vp_dev->intx_enabled) {
+		disable_irq(vp_dev->pci_dev->irq);
+		/*
+		 * The above disable_irq() provides TSO ordering and
+		 * as such promotes the below store to store-release.
+		 */
+		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
+		enable_irq(vp_dev->pci_dev->irq);
 		return;
+	}
 
 	for (i = 0; i < vp_dev->msix_vectors; ++i)
 		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
@@ -97,6 +113,9 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 	struct virtio_pci_device *vp_dev = opaque;
 	u8 isr;
 
+	if (!READ_ONCE(vp_dev->intx_soft_enabled))
+		return IRQ_NONE;
+
 	/* reading the ISR has the effect of also clearing it so it's very
 	 * important to save off the value. */
 	isr = ioread8(vp_dev->isr);
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 52e924603075..7b59e10063c3 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -64,6 +64,7 @@ struct virtio_pci_device {
 	/* MSI-X support */
 	int msix_enabled;
 	int intx_enabled;
+	bool intx_soft_enabled;
 	cpumask_var_t *msix_affinity_masks;
 	/* Name strings for interrupts. This size should be enough,
 	 * and I'm too lazy to allocate each name separately. */
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 06/10] virtio_ring: fix typos in vring_desc_extra
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

We're actually tracking descriptor address and length instead of the
buffer.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..d2ca0a7365f8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -79,8 +79,8 @@ struct vring_desc_state_packed {
 };
 
 struct vring_desc_extra {
-	dma_addr_t addr;		/* Buffer DMA addr. */
-	u32 len;			/* Buffer length. */
+	dma_addr_t addr;		/* Descriptor DMA addr. */
+	u32 len;			/* Descriptor length. */
 	u16 flags;			/* Descriptor flags. */
 	u16 next;			/* The next desc state in a list. */
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 06/10] virtio_ring: fix typos in vring_desc_extra
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

We're actually tracking descriptor address and length instead of the
buffer.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..d2ca0a7365f8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -79,8 +79,8 @@ struct vring_desc_state_packed {
 };
 
 struct vring_desc_extra {
-	dma_addr_t addr;		/* Buffer DMA addr. */
-	u32 len;			/* Buffer length. */
+	dma_addr_t addr;		/* Descriptor DMA addr. */
+	u32 len;			/* Descriptor length. */
 	u16 flags;			/* Descriptor flags. */
 	u16 next;			/* The next desc state in a list. */
 };
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 07/10] virtio_ring: validate used buffer length
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

This patch validate the used buffer length provided by the device
before trying to use it. This is done by record the in buffer length
in a new field in desc_state structure during virtqueue_add(), then we
can fail the virtqueue_get_buf() when we find the device is trying to
give us a used buffer length which is greater than the in buffer
length.

Since some drivers have already done the validation by themselves,
this patch tries to makes the core validation optional. For the driver
that doesn't want the validation, it can set the validate_used to be
true (which could be overridden by force_used_validation). To be more
efficient, a dedicate array is used for storing the validate used
length, this helps to eliminate the cache stress if validation is done
by the driver.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 56 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 58 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d2ca0a7365f8..306c86b94480 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -14,6 +14,9 @@
 #include <linux/spinlock.h>
 #include <xen/xen.h>
 
+static bool force_used_validation = false;
+module_param(force_used_validation, bool, 0444);
+
 #ifdef DEBUG
 /* For development, we want to crash whenever the ring is screwed. */
 #define BAD_RING(_vq, fmt, args...)				\
@@ -182,6 +185,9 @@ struct vring_virtqueue {
 		} packed;
 	};
 
+	/* Per-descriptor in buffer length */
+	u32 *buflen;
+
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	bool (*notify)(struct virtqueue *vq);
 
@@ -490,6 +496,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	unsigned int i, n, avail, descs_used, prev, err_idx;
 	int head;
 	bool indirect;
+	u32 buflen = 0;
 
 	START_USE(vq);
 
@@ -571,6 +578,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 						     VRING_DESC_F_NEXT |
 						     VRING_DESC_F_WRITE,
 						     indirect);
+			buflen += sg->length;
 		}
 	}
 	/* Last one doesn't continue. */
@@ -610,6 +618,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	else
 		vq->split.desc_state[head].indir_desc = ctx;
 
+	/* Store in buffer length if necessary */
+	if (vq->buflen)
+		vq->buflen[head] = buflen;
+
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
 	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
@@ -784,6 +796,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 		BAD_RING(vq, "id %u is not a head!\n", i);
 		return NULL;
 	}
+	if (vq->buflen && unlikely(*len > vq->buflen[i])) {
+		BAD_RING(vq, "used len %d is larger than in buflen %u\n",
+			*len, vq->buflen[i]);
+		return NULL;
+	}
 
 	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->split.desc_state[i].data;
@@ -1062,6 +1079,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	unsigned int i, n, err_idx;
 	u16 head, id;
 	dma_addr_t addr;
+	u32 buflen = 0;
 
 	head = vq->packed.next_avail_idx;
 	desc = alloc_indirect_packed(total_sg, gfp);
@@ -1089,6 +1107,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 			desc[i].addr = cpu_to_le64(addr);
 			desc[i].len = cpu_to_le32(sg->length);
 			i++;
+			if (n >= out_sgs)
+				buflen += sg->length;
 		}
 	}
 
@@ -1142,6 +1162,10 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	vq->packed.desc_state[id].indir_desc = desc;
 	vq->packed.desc_state[id].last = id;
 
+	/* Store in buffer length if necessary */
+	if (vq->buflen)
+		vq->buflen[id] = buflen;
+
 	vq->num_added += 1;
 
 	pr_debug("Added buffer head %i to %p\n", head, vq);
@@ -1176,6 +1200,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	unsigned int i, n, c, descs_used, err_idx;
 	__le16 head_flags, flags;
 	u16 head, id, prev, curr, avail_used_flags;
+	u32 buflen = 0;
 
 	START_USE(vq);
 
@@ -1250,6 +1275,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 					1 << VRING_PACKED_DESC_F_AVAIL |
 					1 << VRING_PACKED_DESC_F_USED;
 			}
+			if (n >= out_sgs)
+				buflen += sg->length;
 		}
 	}
 
@@ -1269,6 +1296,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	vq->packed.desc_state[id].indir_desc = ctx;
 	vq->packed.desc_state[id].last = prev;
 
+	/* Store in buffer length if necessary */
+	if (vq->buflen)
+		vq->buflen[id] = buflen;
+
 	/*
 	 * A driver MUST NOT make the first descriptor in the list
 	 * available before all subsequent descriptors comprising
@@ -1455,6 +1486,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 		BAD_RING(vq, "id %u is not a head!\n", id);
 		return NULL;
 	}
+	if (vq->buflen && unlikely(*len > vq->buflen[id])) {
+		BAD_RING(vq, "used len %d is larger than in buflen %u\n",
+			*len, vq->buflen[id]);
+		return NULL;
+	}
 
 	/* detach_buf_packed clears data, so grab it now. */
 	ret = vq->packed.desc_state[id].data;
@@ -1660,6 +1696,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	struct vring_virtqueue *vq;
 	struct vring_packed_desc *ring;
 	struct vring_packed_desc_event *driver, *device;
+	struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
 	dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr;
 	size_t ring_size_in_bytes, event_size_in_bytes;
 
@@ -1749,6 +1786,13 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	if (!vq->packed.desc_extra)
 		goto err_desc_extra;
 
+	if (!drv->suppress_used_validation || force_used_validation) {
+		vq->buflen = kmalloc_array(num, sizeof(*vq->buflen),
+					   GFP_KERNEL);
+		if (!vq->buflen)
+			goto err_buflen;
+	}
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
 		vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
@@ -1761,6 +1805,8 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	spin_unlock(&vdev->vqs_list_lock);
 	return &vq->vq;
 
+err_buflen:
+	kfree(vq->packed.desc_extra);
 err_desc_extra:
 	kfree(vq->packed.desc_state);
 err_desc_state:
@@ -2168,6 +2214,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					void (*callback)(struct virtqueue *),
 					const char *name)
 {
+	struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
 	struct vring_virtqueue *vq;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
@@ -2227,6 +2274,13 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	if (!vq->split.desc_extra)
 		goto err_extra;
 
+	if (!drv->suppress_used_validation || force_used_validation) {
+		vq->buflen = kmalloc_array(vring.num, sizeof(*vq->buflen),
+					   GFP_KERNEL);
+		if (!vq->buflen)
+			goto err_buflen;
+	}
+
 	/* Put everything in free lists. */
 	vq->free_head = 0;
 	memset(vq->split.desc_state, 0, vring.num *
@@ -2237,6 +2291,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	spin_unlock(&vdev->vqs_list_lock);
 	return &vq->vq;
 
+err_buflen:
+	kfree(vq->split.desc_extra);
 err_extra:
 	kfree(vq->split.desc_state);
 err_state:
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41edbc01ffa4..44d0e09da2d9 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -152,6 +152,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev);
  * @feature_table_size: number of entries in the feature table array.
  * @feature_table_legacy: same as feature_table but when working in legacy mode.
  * @feature_table_size_legacy: number of entries in feature table legacy array.
+ * @suppress_used_validation: set to not have core validate used length
  * @probe: the function to call when a device is found.  Returns 0 or -errno.
  * @scan: optional function to call after successful probe; intended
  *    for virtio-scsi to invoke a scan.
@@ -168,6 +169,7 @@ struct virtio_driver {
 	unsigned int feature_table_size;
 	const unsigned int *feature_table_legacy;
 	unsigned int feature_table_size_legacy;
+	bool suppress_used_validation;
 	int (*validate)(struct virtio_device *dev);
 	int (*probe)(struct virtio_device *dev);
 	void (*scan)(struct virtio_device *dev);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 07/10] virtio_ring: validate used buffer length
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

This patch validate the used buffer length provided by the device
before trying to use it. This is done by record the in buffer length
in a new field in desc_state structure during virtqueue_add(), then we
can fail the virtqueue_get_buf() when we find the device is trying to
give us a used buffer length which is greater than the in buffer
length.

Since some drivers have already done the validation by themselves,
this patch tries to makes the core validation optional. For the driver
that doesn't want the validation, it can set the validate_used to be
true (which could be overridden by force_used_validation). To be more
efficient, a dedicate array is used for storing the validate used
length, this helps to eliminate the cache stress if validation is done
by the driver.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 56 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 58 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d2ca0a7365f8..306c86b94480 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -14,6 +14,9 @@
 #include <linux/spinlock.h>
 #include <xen/xen.h>
 
+static bool force_used_validation = false;
+module_param(force_used_validation, bool, 0444);
+
 #ifdef DEBUG
 /* For development, we want to crash whenever the ring is screwed. */
 #define BAD_RING(_vq, fmt, args...)				\
@@ -182,6 +185,9 @@ struct vring_virtqueue {
 		} packed;
 	};
 
+	/* Per-descriptor in buffer length */
+	u32 *buflen;
+
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	bool (*notify)(struct virtqueue *vq);
 
@@ -490,6 +496,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	unsigned int i, n, avail, descs_used, prev, err_idx;
 	int head;
 	bool indirect;
+	u32 buflen = 0;
 
 	START_USE(vq);
 
@@ -571,6 +578,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 						     VRING_DESC_F_NEXT |
 						     VRING_DESC_F_WRITE,
 						     indirect);
+			buflen += sg->length;
 		}
 	}
 	/* Last one doesn't continue. */
@@ -610,6 +618,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	else
 		vq->split.desc_state[head].indir_desc = ctx;
 
+	/* Store in buffer length if necessary */
+	if (vq->buflen)
+		vq->buflen[head] = buflen;
+
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
 	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
@@ -784,6 +796,11 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 		BAD_RING(vq, "id %u is not a head!\n", i);
 		return NULL;
 	}
+	if (vq->buflen && unlikely(*len > vq->buflen[i])) {
+		BAD_RING(vq, "used len %d is larger than in buflen %u\n",
+			*len, vq->buflen[i]);
+		return NULL;
+	}
 
 	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->split.desc_state[i].data;
@@ -1062,6 +1079,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	unsigned int i, n, err_idx;
 	u16 head, id;
 	dma_addr_t addr;
+	u32 buflen = 0;
 
 	head = vq->packed.next_avail_idx;
 	desc = alloc_indirect_packed(total_sg, gfp);
@@ -1089,6 +1107,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 			desc[i].addr = cpu_to_le64(addr);
 			desc[i].len = cpu_to_le32(sg->length);
 			i++;
+			if (n >= out_sgs)
+				buflen += sg->length;
 		}
 	}
 
@@ -1142,6 +1162,10 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	vq->packed.desc_state[id].indir_desc = desc;
 	vq->packed.desc_state[id].last = id;
 
+	/* Store in buffer length if necessary */
+	if (vq->buflen)
+		vq->buflen[id] = buflen;
+
 	vq->num_added += 1;
 
 	pr_debug("Added buffer head %i to %p\n", head, vq);
@@ -1176,6 +1200,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	unsigned int i, n, c, descs_used, err_idx;
 	__le16 head_flags, flags;
 	u16 head, id, prev, curr, avail_used_flags;
+	u32 buflen = 0;
 
 	START_USE(vq);
 
@@ -1250,6 +1275,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 					1 << VRING_PACKED_DESC_F_AVAIL |
 					1 << VRING_PACKED_DESC_F_USED;
 			}
+			if (n >= out_sgs)
+				buflen += sg->length;
 		}
 	}
 
@@ -1269,6 +1296,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	vq->packed.desc_state[id].indir_desc = ctx;
 	vq->packed.desc_state[id].last = prev;
 
+	/* Store in buffer length if necessary */
+	if (vq->buflen)
+		vq->buflen[id] = buflen;
+
 	/*
 	 * A driver MUST NOT make the first descriptor in the list
 	 * available before all subsequent descriptors comprising
@@ -1455,6 +1486,11 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 		BAD_RING(vq, "id %u is not a head!\n", id);
 		return NULL;
 	}
+	if (vq->buflen && unlikely(*len > vq->buflen[id])) {
+		BAD_RING(vq, "used len %d is larger than in buflen %u\n",
+			*len, vq->buflen[id]);
+		return NULL;
+	}
 
 	/* detach_buf_packed clears data, so grab it now. */
 	ret = vq->packed.desc_state[id].data;
@@ -1660,6 +1696,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	struct vring_virtqueue *vq;
 	struct vring_packed_desc *ring;
 	struct vring_packed_desc_event *driver, *device;
+	struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
 	dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr;
 	size_t ring_size_in_bytes, event_size_in_bytes;
 
@@ -1749,6 +1786,13 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	if (!vq->packed.desc_extra)
 		goto err_desc_extra;
 
+	if (!drv->suppress_used_validation || force_used_validation) {
+		vq->buflen = kmalloc_array(num, sizeof(*vq->buflen),
+					   GFP_KERNEL);
+		if (!vq->buflen)
+			goto err_buflen;
+	}
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
 		vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
@@ -1761,6 +1805,8 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	spin_unlock(&vdev->vqs_list_lock);
 	return &vq->vq;
 
+err_buflen:
+	kfree(vq->packed.desc_extra);
 err_desc_extra:
 	kfree(vq->packed.desc_state);
 err_desc_state:
@@ -2168,6 +2214,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					void (*callback)(struct virtqueue *),
 					const char *name)
 {
+	struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
 	struct vring_virtqueue *vq;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
@@ -2227,6 +2274,13 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	if (!vq->split.desc_extra)
 		goto err_extra;
 
+	if (!drv->suppress_used_validation || force_used_validation) {
+		vq->buflen = kmalloc_array(vring.num, sizeof(*vq->buflen),
+					   GFP_KERNEL);
+		if (!vq->buflen)
+			goto err_buflen;
+	}
+
 	/* Put everything in free lists. */
 	vq->free_head = 0;
 	memset(vq->split.desc_state, 0, vring.num *
@@ -2237,6 +2291,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	spin_unlock(&vdev->vqs_list_lock);
 	return &vq->vq;
 
+err_buflen:
+	kfree(vq->split.desc_extra);
 err_extra:
 	kfree(vq->split.desc_state);
 err_state:
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41edbc01ffa4..44d0e09da2d9 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -152,6 +152,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev);
  * @feature_table_size: number of entries in the feature table array.
  * @feature_table_legacy: same as feature_table but when working in legacy mode.
  * @feature_table_size_legacy: number of entries in feature table legacy array.
+ * @suppress_used_validation: set to not have core validate used length
  * @probe: the function to call when a device is found.  Returns 0 or -errno.
  * @scan: optional function to call after successful probe; intended
  *    for virtio-scsi to invoke a scan.
@@ -168,6 +169,7 @@ struct virtio_driver {
 	unsigned int feature_table_size;
 	const unsigned int *feature_table_legacy;
 	unsigned int feature_table_size_legacy;
+	bool suppress_used_validation;
 	int (*validate)(struct virtio_device *dev);
 	int (*probe)(struct virtio_device *dev);
 	void (*scan)(struct virtio_device *dev);
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 08/10] virtio-net: don't let virtio core to validate used length
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

For RX virtuqueue, the used length is validated in all the three paths
(big, small and mergeable). For control vq, we never tries to use used
length. So this patch forbids the core to validate the used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 79bd2585ec6b..6a8b52aed05f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3384,6 +3384,7 @@ static struct virtio_driver virtio_net_driver = {
 	.feature_table_size = ARRAY_SIZE(features),
 	.feature_table_legacy = features_legacy,
 	.feature_table_size_legacy = ARRAY_SIZE(features_legacy),
+	.suppress_used_validation = true,
 	.driver.name =	KBUILD_MODNAME,
 	.driver.owner =	THIS_MODULE,
 	.id_table =	id_table,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 08/10] virtio-net: don't let virtio core to validate used length
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

For RX virtuqueue, the used length is validated in all the three paths
(big, small and mergeable). For control vq, we never tries to use used
length. So this patch forbids the core to validate the used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 79bd2585ec6b..6a8b52aed05f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3384,6 +3384,7 @@ static struct virtio_driver virtio_net_driver = {
 	.feature_table_size = ARRAY_SIZE(features),
 	.feature_table_legacy = features_legacy,
 	.feature_table_size_legacy = ARRAY_SIZE(features_legacy),
+	.suppress_used_validation = true,
 	.driver.name =	KBUILD_MODNAME,
 	.driver.owner =	THIS_MODULE,
 	.id_table =	id_table,
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 09/10] virtio-blk: don't let virtio core to validate used length
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

We never tries to use used length, so the patch prevents the virtio
core from validating used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/block/virtio_blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 10bc0879e618..6f19dcb7c3aa 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1010,6 +1010,7 @@ static struct virtio_driver virtio_blk = {
 	.feature_table_size		= ARRAY_SIZE(features),
 	.feature_table_legacy		= features_legacy,
 	.feature_table_size_legacy	= ARRAY_SIZE(features_legacy),
+	.suppress_used_validation	= true,
 	.driver.name			= KBUILD_MODNAME,
 	.driver.owner			= THIS_MODULE,
 	.id_table			= id_table,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 09/10] virtio-blk: don't let virtio core to validate used length
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

We never tries to use used length, so the patch prevents the virtio
core from validating used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/block/virtio_blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 10bc0879e618..6f19dcb7c3aa 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1010,6 +1010,7 @@ static struct virtio_driver virtio_blk = {
 	.feature_table_size		= ARRAY_SIZE(features),
 	.feature_table_legacy		= features_legacy,
 	.feature_table_size_legacy	= ARRAY_SIZE(features_legacy),
+	.suppress_used_validation	= true,
 	.driver.name			= KBUILD_MODNAME,
 	.driver.owner			= THIS_MODULE,
 	.id_table			= id_table,
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 10/10] virtio-scsi: don't let virtio core to validate used buffer length
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-19  7:01   ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

We never tries to use used length, so the patch prevents the virtio
core from validating used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index c25ce8f0e0af..be7870a092cf 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -977,6 +977,7 @@ static unsigned int features[] = {
 static struct virtio_driver virtio_scsi_driver = {
 	.feature_table = features,
 	.feature_table_size = ARRAY_SIZE(features),
+	.suppress_used_validation = true,
 	.driver.name = KBUILD_MODNAME,
 	.driver.owner = THIS_MODULE,
 	.id_table = id_table,
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V3 10/10] virtio-scsi: don't let virtio core to validate used buffer length
@ 2021-10-19  7:01   ` Jason Wang
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2021-10-19  7:01 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

We never tries to use used length, so the patch prevents the virtio
core from validating used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index c25ce8f0e0af..be7870a092cf 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -977,6 +977,7 @@ static unsigned int features[] = {
 static struct virtio_driver virtio_scsi_driver = {
 	.feature_table = features,
 	.feature_table_size = ARRAY_SIZE(features),
+	.suppress_used_validation = true,
 	.driver.name = KBUILD_MODNAME,
 	.driver.owner = THIS_MODULE,
 	.id_table = id_table,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
  2021-10-19  7:01   ` Jason Wang
@ 2021-10-20  7:18     ` Stefano Garzarella
  -1 siblings, 0 replies; 48+ messages in thread
From: Stefano Garzarella @ 2021-10-20  7:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Paolo Bonzini, Stefan Hajnoczi

On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
>If an untrusted device neogitates BLK_F_MQ but advertises a zero

s/neogitates/negotiates

>num_queues, the driver may end up trying to allocating zero size
>buffers where ZERO_SIZE_PTR is returned which may pass the checking
>against the NULL. This will lead unexpected results.
>
>Fixing this by failing the probe in this case.
>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Cc: Stefano Garzarella <sgarzare@redhat.com>
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> drivers/block/virtio_blk.c | 4 ++++
> 1 file changed, 4 insertions(+)

Should we CC stable?

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
@ 2021-10-20  7:18     ` Stefano Garzarella
  0 siblings, 0 replies; 48+ messages in thread
From: Stefano Garzarella @ 2021-10-20  7:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, mst, Stefan Hajnoczi, Paolo Bonzini

On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
>If an untrusted device neogitates BLK_F_MQ but advertises a zero

s/neogitates/negotiates

>num_queues, the driver may end up trying to allocating zero size
>buffers where ZERO_SIZE_PTR is returned which may pass the checking
>against the NULL. This will lead unexpected results.
>
>Fixing this by failing the probe in this case.
>
>Cc: Paolo Bonzini <pbonzini@redhat.com>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Cc: Stefano Garzarella <sgarzare@redhat.com>
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> drivers/block/virtio_blk.c | 4 ++++
> 1 file changed, 4 insertions(+)

Should we CC stable?

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
  2021-10-20  7:18     ` Stefano Garzarella
@ 2021-10-20  7:37       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2021-10-20  7:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jason Wang, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Paolo Bonzini, Stefan Hajnoczi

On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote:
> On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> > If an untrusted device neogitates BLK_F_MQ but advertises a zero
> 
> s/neogitates/negotiates
> 
> > num_queues, the driver may end up trying to allocating zero size
> > buffers where ZERO_SIZE_PTR is returned which may pass the checking
> > against the NULL. This will lead unexpected results.
> > 
> > Fixing this by failing the probe in this case.
> > 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Cc: Stefano Garzarella <sgarzare@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/block/virtio_blk.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> 
> Should we CC stable?
> 
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

No IMO - I don't think we can reasonably expect stable to become
protected against attacks on encrypted guests. That's
a new feature, not a bugfix.

-- 
MST


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
@ 2021-10-20  7:37       ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2021-10-20  7:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, Stefan Hajnoczi, Paolo Bonzini

On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote:
> On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> > If an untrusted device neogitates BLK_F_MQ but advertises a zero
> 
> s/neogitates/negotiates
> 
> > num_queues, the driver may end up trying to allocating zero size
> > buffers where ZERO_SIZE_PTR is returned which may pass the checking
> > against the NULL. This will lead unexpected results.
> > 
> > Fixing this by failing the probe in this case.
> > 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Cc: Stefano Garzarella <sgarzare@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/block/virtio_blk.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> 
> Should we CC stable?
> 
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

No IMO - I don't think we can reasonably expect stable to become
protected against attacks on encrypted guests. That's
a new feature, not a bugfix.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
  2021-10-19  7:01   ` Jason Wang
@ 2021-10-20  7:55     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2021-10-20  7:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Paolo Bonzini, Stefano Garzarella

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> If an untrusted device neogitates BLK_F_MQ but advertises a zero
> num_queues, the driver may end up trying to allocating zero size
> buffers where ZERO_SIZE_PTR is returned which may pass the checking
> against the NULL. This will lead unexpected results.
> 
> Fixing this by failing the probe in this case.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/block/virtio_blk.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
@ 2021-10-20  7:55     ` Stefan Hajnoczi
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Hajnoczi @ 2021-10-20  7:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, mst, Paolo Bonzini


[-- Attachment #1.1: Type: text/plain, Size: 706 bytes --]

On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
> If an untrusted device neogitates BLK_F_MQ but advertises a zero
> num_queues, the driver may end up trying to allocating zero size
> buffers where ZERO_SIZE_PTR is returned which may pass the checking
> against the NULL. This will lead unexpected results.
> 
> Fixing this by failing the probe in this case.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/block/virtio_blk.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
  2021-10-20  7:37       ` Michael S. Tsirkin
@ 2021-10-20  8:44         ` Stefano Garzarella
  -1 siblings, 0 replies; 48+ messages in thread
From: Stefano Garzarella @ 2021-10-20  8:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Paolo Bonzini, Stefan Hajnoczi

On Wed, Oct 20, 2021 at 03:37:31AM -0400, Michael S. Tsirkin wrote:
>On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote:
>> On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
>> > If an untrusted device neogitates BLK_F_MQ but advertises a zero
>>
>> s/neogitates/negotiates
>>
>> > num_queues, the driver may end up trying to allocating zero size
>> > buffers where ZERO_SIZE_PTR is returned which may pass the checking
>> > against the NULL. This will lead unexpected results.
>> >
>> > Fixing this by failing the probe in this case.
>> >
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> > Cc: Stefano Garzarella <sgarzare@redhat.com>
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> > drivers/block/virtio_blk.c | 4 ++++
>> > 1 file changed, 4 insertions(+)
>>
>> Should we CC stable?
>>
>> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>No IMO - I don't think we can reasonably expect stable to become
>protected against attacks on encrypted guests. That's
>a new feature, not a bugfix.

Yep, make sense.
I had only seen the single patch, not the entire series, and it seemed 
like a fix.

Viewed as a whole, it makes sense to consider it a new feature to 
improve audits in the guest.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 01/10] virtio-blk: validate num_queues during probe
@ 2021-10-20  8:44         ` Stefano Garzarella
  0 siblings, 0 replies; 48+ messages in thread
From: Stefano Garzarella @ 2021-10-20  8:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: david.kaplan, konrad.wilk, f.hetzelt, linux-kernel,
	virtualization, Stefan Hajnoczi, Paolo Bonzini

On Wed, Oct 20, 2021 at 03:37:31AM -0400, Michael S. Tsirkin wrote:
>On Wed, Oct 20, 2021 at 09:18:17AM +0200, Stefano Garzarella wrote:
>> On Tue, Oct 19, 2021 at 03:01:43PM +0800, Jason Wang wrote:
>> > If an untrusted device neogitates BLK_F_MQ but advertises a zero
>>
>> s/neogitates/negotiates
>>
>> > num_queues, the driver may end up trying to allocating zero size
>> > buffers where ZERO_SIZE_PTR is returned which may pass the checking
>> > against the NULL. This will lead unexpected results.
>> >
>> > Fixing this by failing the probe in this case.
>> >
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> > Cc: Stefano Garzarella <sgarzare@redhat.com>
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> > drivers/block/virtio_blk.c | 4 ++++
>> > 1 file changed, 4 insertions(+)
>>
>> Should we CC stable?
>>
>> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>No IMO - I don't think we can reasonably expect stable to become
>protected against attacks on encrypted guests. That's
>a new feature, not a bugfix.

Yep, make sense.
I had only seen the single patch, not the entire series, and it seemed 
like a fix.

Viewed as a whole, it makes sense to consider it a new feature to 
improve audits in the guest.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 00/10] More virtio hardening
  2021-10-19  7:01 ` Jason Wang
@ 2021-10-23 21:31   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2021-10-23 21:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: konrad.wilk, f.hetzelt, linux-kernel, david.kaplan, virtualization

On Tue, Oct 19, 2021 at 03:01:42PM +0800, Jason Wang wrote:
> Hi All:
> 
> This series treis to do more hardening for virito.

OK. So patches 7-10 caused a crash in virtio-blk.
I'm close to sure it's patch 10 actually, and forcing
validation seems to fix the crash.
I also suspect it has something to do with the fact that
blk submits requests in the middle of the probe function.

picked up 1-6 for now.


> patch 1 validates the num_queues for virio-blk device.
> patch 2 validates max_nr_ports for virito-console device.
> patch 3-5 harden virtio-pci interrupts to make sure no exepcted
> interrupt handler is tiggered. If this makes sense we can do similar
> things in other transport drivers.
> patch 6-7 validate used ring length.
> patch 8-10 let the driver to validate the used length instead of the
> virtio core when possible.
> 
> Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> 
> Please review.
> 
> Changes since V2:
> - don't validate max_nr_ports in .validate()
> - fail the probe instead of trying to work when blk/console returns
>   invalid number of queues/ports
> - use READ_ONCE() instead of smp_load_acquire() for checking
>   intx_soft_enabled
> - use "suppress_used_validation" instead of "validate_used"
> 
> Changes since V1:
> - fix and document the memory ordering around the intx_soft_enabled
>   when enabling and disabling INTX interrupt
> - for the driver that can validate the used length, virtio core
>   won't even try to allocate auxilary arrays and validate the used length
> - tweak the commit log
> - fix typos
> 
> Jason Wang (10):
>   virtio-blk: validate num_queues during probe
>   virtio_console: validate max_nr_ports before trying to use it
>   virtio_config: introduce a new .enable_cbs method
>   virtio_pci: harden MSI-X interrupts
>   virtio-pci: harden INTX interrupts
>   virtio_ring: fix typos in vring_desc_extra
>   virtio_ring: validate used buffer length
>   virtio-net: don't let virtio core to validate used length
>   virtio-blk: don't let virtio core to validate used length
>   virtio-scsi: don't let virtio core to validate used buffer length
> 
>  drivers/block/virtio_blk.c         |  5 +++
>  drivers/char/virtio_console.c      |  9 +++++
>  drivers/net/virtio_net.c           |  1 +
>  drivers/scsi/virtio_scsi.c         |  1 +
>  drivers/virtio/virtio_pci_common.c | 48 ++++++++++++++++++++----
>  drivers/virtio/virtio_pci_common.h |  7 +++-
>  drivers/virtio/virtio_pci_legacy.c |  5 ++-
>  drivers/virtio/virtio_pci_modern.c |  6 ++-
>  drivers/virtio/virtio_ring.c       | 60 +++++++++++++++++++++++++++++-
>  include/linux/virtio.h             |  2 +
>  include/linux/virtio_config.h      |  6 +++
>  11 files changed, 135 insertions(+), 15 deletions(-)
> 
> -- 
> 2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 00/10] More virtio hardening
@ 2021-10-23 21:31   ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2021-10-23 21:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, f.hetzelt, david.kaplan, konrad.wilk

On Tue, Oct 19, 2021 at 03:01:42PM +0800, Jason Wang wrote:
> Hi All:
> 
> This series treis to do more hardening for virito.

OK. So patches 7-10 caused a crash in virtio-blk.
I'm close to sure it's patch 10 actually, and forcing
validation seems to fix the crash.
I also suspect it has something to do with the fact that
blk submits requests in the middle of the probe function.

picked up 1-6 for now.


> patch 1 validates the num_queues for virio-blk device.
> patch 2 validates max_nr_ports for virito-console device.
> patch 3-5 harden virtio-pci interrupts to make sure no exepcted
> interrupt handler is tiggered. If this makes sense we can do similar
> things in other transport drivers.
> patch 6-7 validate used ring length.
> patch 8-10 let the driver to validate the used length instead of the
> virtio core when possible.
> 
> Smoking test on blk/net with packed=on/off and iommu_platform=on/off.
> 
> Please review.
> 
> Changes since V2:
> - don't validate max_nr_ports in .validate()
> - fail the probe instead of trying to work when blk/console returns
>   invalid number of queues/ports
> - use READ_ONCE() instead of smp_load_acquire() for checking
>   intx_soft_enabled
> - use "suppress_used_validation" instead of "validate_used"
> 
> Changes since V1:
> - fix and document the memory ordering around the intx_soft_enabled
>   when enabling and disabling INTX interrupt
> - for the driver that can validate the used length, virtio core
>   won't even try to allocate auxilary arrays and validate the used length
> - tweak the commit log
> - fix typos
> 
> Jason Wang (10):
>   virtio-blk: validate num_queues during probe
>   virtio_console: validate max_nr_ports before trying to use it
>   virtio_config: introduce a new .enable_cbs method
>   virtio_pci: harden MSI-X interrupts
>   virtio-pci: harden INTX interrupts
>   virtio_ring: fix typos in vring_desc_extra
>   virtio_ring: validate used buffer length
>   virtio-net: don't let virtio core to validate used length
>   virtio-blk: don't let virtio core to validate used length
>   virtio-scsi: don't let virtio core to validate used buffer length
> 
>  drivers/block/virtio_blk.c         |  5 +++
>  drivers/char/virtio_console.c      |  9 +++++
>  drivers/net/virtio_net.c           |  1 +
>  drivers/scsi/virtio_scsi.c         |  1 +
>  drivers/virtio/virtio_pci_common.c | 48 ++++++++++++++++++++----
>  drivers/virtio/virtio_pci_common.h |  7 +++-
>  drivers/virtio/virtio_pci_legacy.c |  5 ++-
>  drivers/virtio/virtio_pci_modern.c |  6 ++-
>  drivers/virtio/virtio_ring.c       | 60 +++++++++++++++++++++++++++++-
>  include/linux/virtio.h             |  2 +
>  include/linux/virtio_config.h      |  6 +++
>  11 files changed, 135 insertions(+), 15 deletions(-)
> 
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2021-10-19  7:01   ` Jason Wang
  (?)
@ 2022-03-08 15:19   ` Marc Zyngier
  2022-03-08 16:35       ` Michael S. Tsirkin
  -1 siblings, 1 reply; 48+ messages in thread
From: Marc Zyngier @ 2022-03-08 15:19 UTC (permalink / raw)
  To: Thomas Gleixner, Jason Wang
  Cc: mst, virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Peter Zijlstra, Paul E . McKenney, keirf

On Tue, 19 Oct 2021 08:01:46 +0100,
Jason Wang <jasowang@redhat.com> wrote:
> 
> We used to synchronize pending MSI-X irq handlers via
> synchronize_irq(), this may not work for the untrusted device which
> may keep sending interrupts after reset which may lead unexpected
> results. Similarly, we should not enable MSI-X interrupt until the
> device is ready. So this patch fixes those two issues by:
> 
> 1) switching to use disable_irq() to prevent the virtio interrupt
>    handlers to be called after the device is reset.
> 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> 
> This can make sure the virtio interrupt handler won't be called before
> virtio_device_ready() and after reset.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
>  drivers/virtio/virtio_pci_common.h |  6 ++++--
>  drivers/virtio/virtio_pci_legacy.c |  5 +++--
>  drivers/virtio/virtio_pci_modern.c |  6 ++++--
>  4 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index b35bb2d57f62..8d8f83aca721 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
>  		 "Force legacy mode for transitional virtio 1 devices");
>  #endif
>  
> -/* wait for pending irq handlers */
> -void vp_synchronize_vectors(struct virtio_device *vdev)
> +/* disable irq handlers */
> +void vp_disable_cbs(struct virtio_device *vdev)
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
> @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>  		synchronize_irq(vp_dev->pci_dev->irq);
>  
>  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +}
> +
> +/* enable irq handlers */
> +void vp_enable_cbs(struct virtio_device *vdev)
> +{
> +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +	int i;
> +
> +	if (vp_dev->intx_enabled)
> +		return;
> +
> +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));

This results in a splat at boot time if you set maxcpus=<whatever>,
see below. Enabling interrupts that are affinity managed is *bad*. You
don't even know whether the CPU which is supposed to handle this is
online or not.

The core kernel notices it, shouts and keeps the interrupt disabled,
but this should be fixed. The whole point of managed interrupts is to
let them be dealt with outside of the drivers, and tied into the CPUs
being brought up and down. If virtio needs (for one reason or another)
to manage interrupts on its own, so be it. But this patch isn't the
way to do it, I'm afraid.

	M.

[    3.434849] ------------[ cut here ]------------
[    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210 irq_startup+0x10
e/0x120
[    3.434861] Modules linked in: virtio_net(E+) net_failover(E) failover(E) vir
tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) ahci(E+) libah
ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) virtio(E) 
drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) crc32
_pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_smbus(E) scsi_
common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
[    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G            E     5.
17.0-rc7-00020-gea4424be1688 #63
[    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02
/06/2015
[    3.434897] RIP: 0010:irq_startup+0x10e/0x120
[    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48 89 ef e8
 94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f> 0b eb ac
 0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
[    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX: 0000000000000040
[    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI: ffff8bcf8ce34648
[    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09: ffffffffa74cb828
[    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15: 0000000000000200
[    3.434918] FS:  00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000) knlGS:00000
00000000000
[    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4: 0000000000170ef0
[    3.434928] Call Trace:
[    3.434936]  <TASK>
[    3.434938]  enable_irq+0x48/0x90
[    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
[    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
[    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
[    3.434959]  really_probe+0x1f5/0x3d0
[    3.434966]  __driver_probe_device+0xfe/0x180
[    3.434969]  driver_probe_device+0x1e/0x90
[    3.434971]  __driver_attach+0xc0/0x1c0
[    3.434974]  ? __device_attach_driver+0xe0/0xe0
[    3.434976]  ? __device_attach_driver+0xe0/0xe0
[    3.434978]  bus_for_each_dev+0x78/0xc0
[    3.434982]  bus_add_driver+0x149/0x1e0
[    3.434985]  driver_register+0x8b/0xe0
[    3.434987]  ? 0xffffffffc01aa000
[    3.434990]  init+0x52/0x1000 [virtio_blk]
[    3.434994]  do_one_initcall+0x44/0x200
[    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
[    3.435006]  do_init_module+0x4c/0x260
[    3.435013]  __do_sys_finit_module+0xb4/0x120
[    3.435018]  do_syscall_64+0x3b/0xc0
[    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    3.435037] RIP: 0033:0x7f5b31c589b9
[    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8
 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
[    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX: 00000000000
00139
[    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX: 00007f5b31c589b9
[    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI: 0000000000000005
[    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ca47ba9030
[    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f5b31de3e2d
[    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15: 000055ca47ba8700
[    3.435053]  </TASK>
[    3.435059] ---[ end trace 0000000000000000 ]---
[    3.440593]  vda: vda1 vda2 vda3
[    3.445283] scsi host0: Virtio SCSI HBA
[    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU CD-ROM      2.5+ PQ
: 0 ANSI: 5

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2022-03-08 15:19   ` Marc Zyngier
@ 2022-03-08 16:35       ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 16:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Jason Wang, virtualization, linux-kernel,
	f.hetzelt, david.kaplan, konrad.wilk, Peter Zijlstra,
	Paul E . McKenney, keirf

On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> On Tue, 19 Oct 2021 08:01:46 +0100,
> Jason Wang <jasowang@redhat.com> wrote:
> > 
> > We used to synchronize pending MSI-X irq handlers via
> > synchronize_irq(), this may not work for the untrusted device which
> > may keep sending interrupts after reset which may lead unexpected
> > results. Similarly, we should not enable MSI-X interrupt until the
> > device is ready. So this patch fixes those two issues by:
> > 
> > 1) switching to use disable_irq() to prevent the virtio interrupt
> >    handlers to be called after the device is reset.
> > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > 
> > This can make sure the virtio interrupt handler won't be called before
> > virtio_device_ready() and after reset.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> >  4 files changed, 32 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index b35bb2d57f62..8d8f83aca721 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> >  		 "Force legacy mode for transitional virtio 1 devices");
> >  #endif
> >  
> > -/* wait for pending irq handlers */
> > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > +/* disable irq handlers */
> > +void vp_disable_cbs(struct virtio_device *vdev)
> >  {
> >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >  	int i;
> > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> >  		synchronize_irq(vp_dev->pci_dev->irq);
> >  
> >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +}
> > +
> > +/* enable irq handlers */
> > +void vp_enable_cbs(struct virtio_device *vdev)
> > +{
> > +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > +	int i;
> > +
> > +	if (vp_dev->intx_enabled)
> > +		return;
> > +
> > +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> 
> This results in a splat at boot time if you set maxcpus=<whatever>,
> see below. Enabling interrupts that are affinity managed is *bad*. You
> don't even know whether the CPU which is supposed to handle this is
> online or not.
> 
> The core kernel notices it, shouts and keeps the interrupt disabled,
> but this should be fixed. The whole point of managed interrupts is to
> let them be dealt with outside of the drivers, and tied into the CPUs
> being brought up and down. If virtio needs (for one reason or another)
> to manage interrupts on its own, so be it. But this patch isn't the
> way to do it, I'm afraid.
> 
> 	M.

Thanks for reporting this!

What virtio is doing here isn't unique however.

If device driver is to be hardened against device sending interrupts
while driver is initializing/cleaning up, it needs kernel to provide
capability to disable these.

If we then declare that that is impossible for managed interrupts
then that will mean most devices can't use managed interrupts
because ideally we'd have all drivers hardened.

Thomas I think you were the one who suggested enabling/disabling
interrupts originally - thoughts?

Feedback appreciated.



> [    3.434849] ------------[ cut here ]------------
> [    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210 irq_startup+0x10
> e/0x120
> [    3.434861] Modules linked in: virtio_net(E+) net_failover(E) failover(E) vir
> tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) ahci(E+) libah
> ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) virtio(E) 
> drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) crc32
> _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_smbus(E) scsi_
> common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
> [    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G            E     5.
> 17.0-rc7-00020-gea4424be1688 #63
> [    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02
> /06/2015
> [    3.434897] RIP: 0010:irq_startup+0x10e/0x120
> [    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48 89 ef e8
>  94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f> 0b eb ac
>  0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> [    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
> [    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX: 0000000000000040
> [    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI: ffff8bcf8ce34648
> [    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09: ffffffffa74cb828
> [    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15: 0000000000000200
> [    3.434918] FS:  00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000) knlGS:00000
> 00000000000
> [    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4: 0000000000170ef0
> [    3.434928] Call Trace:
> [    3.434936]  <TASK>
> [    3.434938]  enable_irq+0x48/0x90
> [    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
> [    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
> [    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
> [    3.434959]  really_probe+0x1f5/0x3d0
> [    3.434966]  __driver_probe_device+0xfe/0x180
> [    3.434969]  driver_probe_device+0x1e/0x90
> [    3.434971]  __driver_attach+0xc0/0x1c0
> [    3.434974]  ? __device_attach_driver+0xe0/0xe0
> [    3.434976]  ? __device_attach_driver+0xe0/0xe0
> [    3.434978]  bus_for_each_dev+0x78/0xc0
> [    3.434982]  bus_add_driver+0x149/0x1e0
> [    3.434985]  driver_register+0x8b/0xe0
> [    3.434987]  ? 0xffffffffc01aa000
> [    3.434990]  init+0x52/0x1000 [virtio_blk]
> [    3.434994]  do_one_initcall+0x44/0x200
> [    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
> [    3.435006]  do_init_module+0x4c/0x260
> [    3.435013]  __do_sys_finit_module+0xb4/0x120
> [    3.435018]  do_syscall_64+0x3b/0xc0
> [    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [    3.435037] RIP: 0033:0x7f5b31c589b9
> [    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8
>  48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
>  ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
> [    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX: 00000000000
> 00139
> [    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX: 00007f5b31c589b9
> [    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI: 0000000000000005
> [    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ca47ba9030
> [    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f5b31de3e2d
> [    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15: 000055ca47ba8700
> [    3.435053]  </TASK>
> [    3.435059] ---[ end trace 0000000000000000 ]---
> [    3.440593]  vda: vda1 vda2 vda3
> [    3.445283] scsi host0: Virtio SCSI HBA
> [    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU CD-ROM      2.5+ PQ
> : 0 ANSI: 5
> 
> -- 
> Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
@ 2022-03-08 16:35       ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 16:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: keirf, Paul E . McKenney, david.kaplan, konrad.wilk,
	Peter Zijlstra, f.hetzelt, linux-kernel, virtualization,
	Thomas Gleixner

On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> On Tue, 19 Oct 2021 08:01:46 +0100,
> Jason Wang <jasowang@redhat.com> wrote:
> > 
> > We used to synchronize pending MSI-X irq handlers via
> > synchronize_irq(), this may not work for the untrusted device which
> > may keep sending interrupts after reset which may lead unexpected
> > results. Similarly, we should not enable MSI-X interrupt until the
> > device is ready. So this patch fixes those two issues by:
> > 
> > 1) switching to use disable_irq() to prevent the virtio interrupt
> >    handlers to be called after the device is reset.
> > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > 
> > This can make sure the virtio interrupt handler won't be called before
> > virtio_device_ready() and after reset.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> >  4 files changed, 32 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index b35bb2d57f62..8d8f83aca721 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> >  		 "Force legacy mode for transitional virtio 1 devices");
> >  #endif
> >  
> > -/* wait for pending irq handlers */
> > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > +/* disable irq handlers */
> > +void vp_disable_cbs(struct virtio_device *vdev)
> >  {
> >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >  	int i;
> > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> >  		synchronize_irq(vp_dev->pci_dev->irq);
> >  
> >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > +}
> > +
> > +/* enable irq handlers */
> > +void vp_enable_cbs(struct virtio_device *vdev)
> > +{
> > +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > +	int i;
> > +
> > +	if (vp_dev->intx_enabled)
> > +		return;
> > +
> > +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> 
> This results in a splat at boot time if you set maxcpus=<whatever>,
> see below. Enabling interrupts that are affinity managed is *bad*. You
> don't even know whether the CPU which is supposed to handle this is
> online or not.
> 
> The core kernel notices it, shouts and keeps the interrupt disabled,
> but this should be fixed. The whole point of managed interrupts is to
> let them be dealt with outside of the drivers, and tied into the CPUs
> being brought up and down. If virtio needs (for one reason or another)
> to manage interrupts on its own, so be it. But this patch isn't the
> way to do it, I'm afraid.
> 
> 	M.

Thanks for reporting this!

What virtio is doing here isn't unique however.

If device driver is to be hardened against device sending interrupts
while driver is initializing/cleaning up, it needs kernel to provide
capability to disable these.

If we then declare that that is impossible for managed interrupts
then that will mean most devices can't use managed interrupts
because ideally we'd have all drivers hardened.

Thomas I think you were the one who suggested enabling/disabling
interrupts originally - thoughts?

Feedback appreciated.



> [    3.434849] ------------[ cut here ]------------
> [    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210 irq_startup+0x10
> e/0x120
> [    3.434861] Modules linked in: virtio_net(E+) net_failover(E) failover(E) vir
> tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) ahci(E+) libah
> ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) virtio(E) 
> drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) crc32
> _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_smbus(E) scsi_
> common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
> [    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G            E     5.
> 17.0-rc7-00020-gea4424be1688 #63
> [    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02
> /06/2015
> [    3.434897] RIP: 0010:irq_startup+0x10e/0x120
> [    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48 89 ef e8
>  94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f> 0b eb ac
>  0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> [    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
> [    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX: 0000000000000040
> [    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI: ffff8bcf8ce34648
> [    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09: ffffffffa74cb828
> [    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15: 0000000000000200
> [    3.434918] FS:  00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000) knlGS:00000
> 00000000000
> [    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4: 0000000000170ef0
> [    3.434928] Call Trace:
> [    3.434936]  <TASK>
> [    3.434938]  enable_irq+0x48/0x90
> [    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
> [    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
> [    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
> [    3.434959]  really_probe+0x1f5/0x3d0
> [    3.434966]  __driver_probe_device+0xfe/0x180
> [    3.434969]  driver_probe_device+0x1e/0x90
> [    3.434971]  __driver_attach+0xc0/0x1c0
> [    3.434974]  ? __device_attach_driver+0xe0/0xe0
> [    3.434976]  ? __device_attach_driver+0xe0/0xe0
> [    3.434978]  bus_for_each_dev+0x78/0xc0
> [    3.434982]  bus_add_driver+0x149/0x1e0
> [    3.434985]  driver_register+0x8b/0xe0
> [    3.434987]  ? 0xffffffffc01aa000
> [    3.434990]  init+0x52/0x1000 [virtio_blk]
> [    3.434994]  do_one_initcall+0x44/0x200
> [    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
> [    3.435006]  do_init_module+0x4c/0x260
> [    3.435013]  __do_sys_finit_module+0xb4/0x120
> [    3.435018]  do_syscall_64+0x3b/0xc0
> [    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [    3.435037] RIP: 0033:0x7f5b31c589b9
> [    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8
>  48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
>  ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
> [    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX: 00000000000
> 00139
> [    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX: 00007f5b31c589b9
> [    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI: 0000000000000005
> [    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ca47ba9030
> [    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f5b31de3e2d
> [    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15: 000055ca47ba8700
> [    3.435053]  </TASK>
> [    3.435059] ---[ end trace 0000000000000000 ]---
> [    3.440593]  vda: vda1 vda2 vda3
> [    3.445283] scsi host0: Virtio SCSI HBA
> [    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU CD-ROM      2.5+ PQ
> : 0 ANSI: 5
> 
> -- 
> Without deviation from the norm, progress is not possible.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2022-03-08 16:35       ` Michael S. Tsirkin
  (?)
@ 2022-03-09  3:41       ` Jason Wang
  2022-03-09  7:04           ` Michael S. Tsirkin
  -1 siblings, 1 reply; 48+ messages in thread
From: Jason Wang @ 2022-03-09  3:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: keirf, Paul E . McKenney, kaplan, david, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Marc Zyngier, Hetzelt, Felicitas, linux-kernel,
	virtualization, Thomas Gleixner


[-- Attachment #1.1: Type: text/plain, Size: 8366 bytes --]

On Wed, Mar 9, 2022 at 12:36 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> > On Tue, 19 Oct 2021 08:01:46 +0100,
> > Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > We used to synchronize pending MSI-X irq handlers via
> > > synchronize_irq(), this may not work for the untrusted device which
> > > may keep sending interrupts after reset which may lead unexpected
> > > results. Similarly, we should not enable MSI-X interrupt until the
> > > device is ready. So this patch fixes those two issues by:
> > >
> > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > >    handlers to be called after the device is reset.
> > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > >
> > > This can make sure the virtio interrupt handler won't be called before
> > > virtio_device_ready() and after reset.
> > >
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c
> b/drivers/virtio/virtio_pci_common.c
> > > index b35bb2d57f62..8d8f83aca721 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > >              "Force legacy mode for transitional virtio 1 devices");
> > >  #endif
> > >
> > > -/* wait for pending irq handlers */
> > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > +/* disable irq handlers */
> > > +void vp_disable_cbs(struct virtio_device *vdev)
> > >  {
> > >     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >     int i;
> > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device
> *vdev)
> > >             synchronize_irq(vp_dev->pci_dev->irq);
> > >
> > >     for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > -           synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +           disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +}
> > > +
> > > +/* enable irq handlers */
> > > +void vp_enable_cbs(struct virtio_device *vdev)
> > > +{
> > > +   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > +   int i;
> > > +
> > > +   if (vp_dev->intx_enabled)
> > > +           return;
> > > +
> > > +   for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > +           enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >
> > This results in a splat at boot time if you set maxcpus=<whatever>,
> > see below. Enabling interrupts that are affinity managed is *bad*. You
> > don't even know whether the CPU which is supposed to handle this is
> > online or not.
> >
> > The core kernel notices it, shouts and keeps the interrupt disabled,
> > but this should be fixed. The whole point of managed interrupts is to
> > let them be dealt with outside of the drivers, and tied into the CPUs
> > being brought up and down. If virtio needs (for one reason or another)
> > to manage interrupts on its own, so be it. But this patch isn't the
> > way to do it, I'm afraid.
> >
> >       M.
>
> Thanks for reporting this!
>
> What virtio is doing here isn't unique however.
>
> If device driver is to be hardened against device sending interrupts
> while driver is initializing/cleaning up, it needs kernel to provide
> capability to disable these.
>
> If we then declare that that is impossible for managed interrupts
> then that will mean most devices can't use managed interrupts
> because ideally we'd have all drivers hardened.
>

Or that probably means we need to use a driver specific mechanism as what
we did for INTX instead of using NO_AUTO_EN.

Thanks


>
> Thomas I think you were the one who suggested enabling/disabling
> interrupts originally - thoughts?
>
> Feedback appreciated.
>
>
>
> > [    3.434849] ------------[ cut here ]------------
> > [    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210
> irq_startup+0x10
> > e/0x120
> > [    3.434861] Modules linked in: virtio_net(E+) net_failover(E)
> failover(E) vir
> > tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E)
> ahci(E+) libah
> > ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E)
> virtio(E)
> > drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E)
> crct10dif_common(E) crc32
> > _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E)
> i2c_smbus(E) scsi_
> > common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
> > [    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G
> E     5.
> > 17.0-rc7-00020-gea4424be1688 #63
> > [    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 0.0.0 02
> > /06/2015
> > [    3.434897] RIP: 0010:irq_startup+0x10e/0x120
> > [    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48
> 89 ef e8
> >  94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f>
> 0b eb ac
> >  0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> > [    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
> > [    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX:
> 0000000000000040
> > [    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI:
> ffff8bcf8ce34648
> > [    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09:
> ffffffffa74cb828
> > [    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000001
> > [    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15:
> 0000000000000200
> > [    3.434918] FS:  00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000)
> knlGS:00000
> > 00000000000
> > [    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4:
> 0000000000170ef0
> > [    3.434928] Call Trace:
> > [    3.434936]  <TASK>
> > [    3.434938]  enable_irq+0x48/0x90
> > [    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
> > [    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
> > [    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
> > [    3.434959]  really_probe+0x1f5/0x3d0
> > [    3.434966]  __driver_probe_device+0xfe/0x180
> > [    3.434969]  driver_probe_device+0x1e/0x90
> > [    3.434971]  __driver_attach+0xc0/0x1c0
> > [    3.434974]  ? __device_attach_driver+0xe0/0xe0
> > [    3.434976]  ? __device_attach_driver+0xe0/0xe0
> > [    3.434978]  bus_for_each_dev+0x78/0xc0
> > [    3.434982]  bus_add_driver+0x149/0x1e0
> > [    3.434985]  driver_register+0x8b/0xe0
> > [    3.434987]  ? 0xffffffffc01aa000
> > [    3.434990]  init+0x52/0x1000 [virtio_blk]
> > [    3.434994]  do_one_initcall+0x44/0x200
> > [    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
> > [    3.435006]  do_init_module+0x4c/0x260
> > [    3.435013]  __do_sys_finit_module+0xb4/0x120
> > [    3.435018]  do_syscall_64+0x3b/0xc0
> > [    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [    3.435037] RIP: 0033:0x7f5b31c589b9
> > [    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> 48 89 f8
> >  48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> 3d 01 f0
> >  ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
> > [    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX:
> 00000000000
> > 00139
> > [    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX:
> 00007f5b31c589b9
> > [    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI:
> 0000000000000005
> > [    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09:
> 000055ca47ba9030
> > [    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12:
> 00007f5b31de3e2d
> > [    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15:
> 000055ca47ba8700
> > [    3.435053]  </TASK>
> > [    3.435059] ---[ end trace 0000000000000000 ]---
> > [    3.440593]  vda: vda1 vda2 vda3
> > [    3.445283] scsi host0: Virtio SCSI HBA
> > [    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU CD-ROM
> 2.5+ PQ
> > : 0 ANSI: 5
> >
> > --
> > Without deviation from the norm, progress is not possible.
>
>

[-- Attachment #1.2: Type: text/html, Size: 10596 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2022-03-09  3:41       ` Jason Wang
@ 2022-03-09  7:04           ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09  7:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: Marc Zyngier, Thomas Gleixner, virtualization, linux-kernel,
	Hetzelt, Felicitas, kaplan, david, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Paul E . McKenney, keirf

On Wed, Mar 09, 2022 at 11:41:01AM +0800, Jason Wang wrote:
> 
> 
> On Wed, Mar 9, 2022 at 12:36 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
>     > On Tue, 19 Oct 2021 08:01:46 +0100,
>     > Jason Wang <jasowang@redhat.com> wrote:
>     > >
>     > > We used to synchronize pending MSI-X irq handlers via
>     > > synchronize_irq(), this may not work for the untrusted device which
>     > > may keep sending interrupts after reset which may lead unexpected
>     > > results. Similarly, we should not enable MSI-X interrupt until the
>     > > device is ready. So this patch fixes those two issues by:
>     > >
>     > > 1) switching to use disable_irq() to prevent the virtio interrupt
>     > >    handlers to be called after the device is reset.
>     > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
>     > >
>     > > This can make sure the virtio interrupt handler won't be called before
>     > > virtio_device_ready() and after reset.
>     > >
>     > > Cc: Thomas Gleixner <tglx@linutronix.de>
>     > > Cc: Peter Zijlstra <peterz@infradead.org>
>     > > Cc: Paul E. McKenney <paulmck@kernel.org>
>     > > Signed-off-by: Jason Wang <jasowang@redhat.com>
>     > > ---
>     > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
>     > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
>     > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
>     > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
>     > >  4 files changed, 32 insertions(+), 12 deletions(-)
>     > >
>     > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/
>     virtio_pci_common.c
>     > > index b35bb2d57f62..8d8f83aca721 100644
>     > > --- a/drivers/virtio/virtio_pci_common.c
>     > > +++ b/drivers/virtio/virtio_pci_common.c
>     > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
>     > >              "Force legacy mode for transitional virtio 1 devices");
>     > >  #endif
>     > > 
>     > > -/* wait for pending irq handlers */
>     > > -void vp_synchronize_vectors(struct virtio_device *vdev)
>     > > +/* disable irq handlers */
>     > > +void vp_disable_cbs(struct virtio_device *vdev)
>     > >  {
>     > >     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>     > >     int i;
>     > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device
>     *vdev)
>     > >             synchronize_irq(vp_dev->pci_dev->irq);
>     > > 
>     > >     for (i = 0; i < vp_dev->msix_vectors; ++i)
>     > > -           synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>     > > +           disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>     > > +}
>     > > +
>     > > +/* enable irq handlers */
>     > > +void vp_enable_cbs(struct virtio_device *vdev)
>     > > +{
>     > > +   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>     > > +   int i;
>     > > +
>     > > +   if (vp_dev->intx_enabled)
>     > > +           return;
>     > > +
>     > > +   for (i = 0; i < vp_dev->msix_vectors; ++i)
>     > > +           enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>     >
>     > This results in a splat at boot time if you set maxcpus=<whatever>,
>     > see below. Enabling interrupts that are affinity managed is *bad*. You
>     > don't even know whether the CPU which is supposed to handle this is
>     > online or not.
>     >
>     > The core kernel notices it, shouts and keeps the interrupt disabled,
>     > but this should be fixed. The whole point of managed interrupts is to
>     > let them be dealt with outside of the drivers, and tied into the CPUs
>     > being brought up and down. If virtio needs (for one reason or another)
>     > to manage interrupts on its own, so be it. But this patch isn't the
>     > way to do it, I'm afraid.
>     >
>     >       M.
> 
>     Thanks for reporting this!
> 
>     What virtio is doing here isn't unique however.
> 
>     If device driver is to be hardened against device sending interrupts
>     while driver is initializing/cleaning up, it needs kernel to provide
>     capability to disable these.
> 
>     If we then declare that that is impossible for managed interrupts
>     then that will mean most devices can't use managed interrupts
>     because ideally we'd have all drivers hardened.
> 
> 
> Or that probably means we need to use a driver specific mechanism as what we
> did for INTX instead of using NO_AUTO_EN.
> 
> Thanks 

What we did for INTX was pretty expensive, we justified this
by saying INTX handling involves expensive IO reads anyway
so it's in the noise.

Let's see what does Thomas suggest.

> 
> 
>     Thomas I think you were the one who suggested enabling/disabling
>     interrupts originally - thoughts?
> 
>     Feedback appreciated.
> 
> 
> 
>     > [    3.434849] ------------[ cut here ]------------
>     > [    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210
>     irq_startup+0x10
>     > e/0x120
>     > [    3.434861] Modules linked in: virtio_net(E+) net_failover(E) failover
>     (E) vir
>     > tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) ahci
>     (E+) libah
>     > ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E)
>     virtio(E)
>     > drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) crct10dif_common
>     (E) crc32
>     > _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_smbus
>     (E) scsi_
>     > common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
>     > [    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G           
>     E     5.
>     > 17.0-rc7-00020-gea4424be1688 #63
>     > [    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>     0.0.0 02
>     > /06/2015
>     > [    3.434897] RIP: 0010:irq_startup+0x10e/0x120
>     > [    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48
>     89 ef e8
>     >  94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f>
>     0b eb ac
>     >  0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>     > [    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
>     > [    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX:
>     0000000000000040
>     > [    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI:
>     ffff8bcf8ce34648
>     > [    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09:
>     ffffffffa74cb828
>     > [    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12:
>     0000000000000001
>     > [    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15:
>     0000000000000200
>     > [    3.434918] FS:  00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000)
>     knlGS:00000
>     > 00000000000
>     > [    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     > [    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4:
>     0000000000170ef0
>     > [    3.434928] Call Trace:
>     > [    3.434936]  <TASK>
>     > [    3.434938]  enable_irq+0x48/0x90
>     > [    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
>     > [    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
>     > [    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
>     > [    3.434959]  really_probe+0x1f5/0x3d0
>     > [    3.434966]  __driver_probe_device+0xfe/0x180
>     > [    3.434969]  driver_probe_device+0x1e/0x90
>     > [    3.434971]  __driver_attach+0xc0/0x1c0
>     > [    3.434974]  ? __device_attach_driver+0xe0/0xe0
>     > [    3.434976]  ? __device_attach_driver+0xe0/0xe0
>     > [    3.434978]  bus_for_each_dev+0x78/0xc0
>     > [    3.434982]  bus_add_driver+0x149/0x1e0
>     > [    3.434985]  driver_register+0x8b/0xe0
>     > [    3.434987]  ? 0xffffffffc01aa000
>     > [    3.434990]  init+0x52/0x1000 [virtio_blk]
>     > [    3.434994]  do_one_initcall+0x44/0x200
>     > [    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
>     > [    3.435006]  do_init_module+0x4c/0x260
>     > [    3.435013]  __do_sys_finit_module+0xb4/0x120
>     > [    3.435018]  do_syscall_64+0x3b/0xc0
>     > [    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>     > [    3.435037] RIP: 0033:0x7f5b31c589b9
>     > [    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
>     48 89 f8
>     >  48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>     3d 01 f0
>     >  ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
>     > [    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX:
>     00000000000
>     > 00139
>     > [    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX:
>     00007f5b31c589b9
>     > [    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI:
>     0000000000000005
>     > [    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09:
>     000055ca47ba9030
>     > [    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12:
>     00007f5b31de3e2d
>     > [    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15:
>     000055ca47ba8700
>     > [    3.435053]  </TASK>
>     > [    3.435059] ---[ end trace 0000000000000000 ]---
>     > [    3.440593]  vda: vda1 vda2 vda3
>     > [    3.445283] scsi host0: Virtio SCSI HBA
>     > [    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU CD-ROM     
>     2.5+ PQ
>     > : 0 ANSI: 5
>     >
>     > --
>     > Without deviation from the norm, progress is not possible.
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
@ 2022-03-09  7:04           ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09  7:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: keirf, Paul E . McKenney, kaplan, david, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Marc Zyngier, Hetzelt, Felicitas, linux-kernel,
	virtualization, Thomas Gleixner

On Wed, Mar 09, 2022 at 11:41:01AM +0800, Jason Wang wrote:
> 
> 
> On Wed, Mar 9, 2022 at 12:36 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
>     > On Tue, 19 Oct 2021 08:01:46 +0100,
>     > Jason Wang <jasowang@redhat.com> wrote:
>     > >
>     > > We used to synchronize pending MSI-X irq handlers via
>     > > synchronize_irq(), this may not work for the untrusted device which
>     > > may keep sending interrupts after reset which may lead unexpected
>     > > results. Similarly, we should not enable MSI-X interrupt until the
>     > > device is ready. So this patch fixes those two issues by:
>     > >
>     > > 1) switching to use disable_irq() to prevent the virtio interrupt
>     > >    handlers to be called after the device is reset.
>     > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
>     > >
>     > > This can make sure the virtio interrupt handler won't be called before
>     > > virtio_device_ready() and after reset.
>     > >
>     > > Cc: Thomas Gleixner <tglx@linutronix.de>
>     > > Cc: Peter Zijlstra <peterz@infradead.org>
>     > > Cc: Paul E. McKenney <paulmck@kernel.org>
>     > > Signed-off-by: Jason Wang <jasowang@redhat.com>
>     > > ---
>     > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
>     > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
>     > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
>     > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
>     > >  4 files changed, 32 insertions(+), 12 deletions(-)
>     > >
>     > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/
>     virtio_pci_common.c
>     > > index b35bb2d57f62..8d8f83aca721 100644
>     > > --- a/drivers/virtio/virtio_pci_common.c
>     > > +++ b/drivers/virtio/virtio_pci_common.c
>     > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
>     > >              "Force legacy mode for transitional virtio 1 devices");
>     > >  #endif
>     > > 
>     > > -/* wait for pending irq handlers */
>     > > -void vp_synchronize_vectors(struct virtio_device *vdev)
>     > > +/* disable irq handlers */
>     > > +void vp_disable_cbs(struct virtio_device *vdev)
>     > >  {
>     > >     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>     > >     int i;
>     > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device
>     *vdev)
>     > >             synchronize_irq(vp_dev->pci_dev->irq);
>     > > 
>     > >     for (i = 0; i < vp_dev->msix_vectors; ++i)
>     > > -           synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>     > > +           disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>     > > +}
>     > > +
>     > > +/* enable irq handlers */
>     > > +void vp_enable_cbs(struct virtio_device *vdev)
>     > > +{
>     > > +   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>     > > +   int i;
>     > > +
>     > > +   if (vp_dev->intx_enabled)
>     > > +           return;
>     > > +
>     > > +   for (i = 0; i < vp_dev->msix_vectors; ++i)
>     > > +           enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>     >
>     > This results in a splat at boot time if you set maxcpus=<whatever>,
>     > see below. Enabling interrupts that are affinity managed is *bad*. You
>     > don't even know whether the CPU which is supposed to handle this is
>     > online or not.
>     >
>     > The core kernel notices it, shouts and keeps the interrupt disabled,
>     > but this should be fixed. The whole point of managed interrupts is to
>     > let them be dealt with outside of the drivers, and tied into the CPUs
>     > being brought up and down. If virtio needs (for one reason or another)
>     > to manage interrupts on its own, so be it. But this patch isn't the
>     > way to do it, I'm afraid.
>     >
>     >       M.
> 
>     Thanks for reporting this!
> 
>     What virtio is doing here isn't unique however.
> 
>     If device driver is to be hardened against device sending interrupts
>     while driver is initializing/cleaning up, it needs kernel to provide
>     capability to disable these.
> 
>     If we then declare that that is impossible for managed interrupts
>     then that will mean most devices can't use managed interrupts
>     because ideally we'd have all drivers hardened.
> 
> 
> Or that probably means we need to use a driver specific mechanism as what we
> did for INTX instead of using NO_AUTO_EN.
> 
> Thanks 

What we did for INTX was pretty expensive, we justified this
by saying INTX handling involves expensive IO reads anyway
so it's in the noise.

Let's see what does Thomas suggest.

> 
> 
>     Thomas I think you were the one who suggested enabling/disabling
>     interrupts originally - thoughts?
> 
>     Feedback appreciated.
> 
> 
> 
>     > [    3.434849] ------------[ cut here ]------------
>     > [    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210
>     irq_startup+0x10
>     > e/0x120
>     > [    3.434861] Modules linked in: virtio_net(E+) net_failover(E) failover
>     (E) vir
>     > tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) ahci
>     (E+) libah
>     > ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E)
>     virtio(E)
>     > drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) crct10dif_common
>     (E) crc32
>     > _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_smbus
>     (E) scsi_
>     > common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
>     > [    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G           
>     E     5.
>     > 17.0-rc7-00020-gea4424be1688 #63
>     > [    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>     0.0.0 02
>     > /06/2015
>     > [    3.434897] RIP: 0010:irq_startup+0x10e/0x120
>     > [    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48
>     89 ef e8
>     >  94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f>
>     0b eb ac
>     >  0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>     > [    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
>     > [    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX:
>     0000000000000040
>     > [    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI:
>     ffff8bcf8ce34648
>     > [    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09:
>     ffffffffa74cb828
>     > [    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12:
>     0000000000000001
>     > [    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15:
>     0000000000000200
>     > [    3.434918] FS:  00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000)
>     knlGS:00000
>     > 00000000000
>     > [    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     > [    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4:
>     0000000000170ef0
>     > [    3.434928] Call Trace:
>     > [    3.434936]  <TASK>
>     > [    3.434938]  enable_irq+0x48/0x90
>     > [    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
>     > [    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
>     > [    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
>     > [    3.434959]  really_probe+0x1f5/0x3d0
>     > [    3.434966]  __driver_probe_device+0xfe/0x180
>     > [    3.434969]  driver_probe_device+0x1e/0x90
>     > [    3.434971]  __driver_attach+0xc0/0x1c0
>     > [    3.434974]  ? __device_attach_driver+0xe0/0xe0
>     > [    3.434976]  ? __device_attach_driver+0xe0/0xe0
>     > [    3.434978]  bus_for_each_dev+0x78/0xc0
>     > [    3.434982]  bus_add_driver+0x149/0x1e0
>     > [    3.434985]  driver_register+0x8b/0xe0
>     > [    3.434987]  ? 0xffffffffc01aa000
>     > [    3.434990]  init+0x52/0x1000 [virtio_blk]
>     > [    3.434994]  do_one_initcall+0x44/0x200
>     > [    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
>     > [    3.435006]  do_init_module+0x4c/0x260
>     > [    3.435013]  __do_sys_finit_module+0xb4/0x120
>     > [    3.435018]  do_syscall_64+0x3b/0xc0
>     > [    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>     > [    3.435037] RIP: 0033:0x7f5b31c589b9
>     > [    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
>     48 89 f8
>     >  48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>     3d 01 f0
>     >  ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
>     > [    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX:
>     00000000000
>     > 00139
>     > [    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX:
>     00007f5b31c589b9
>     > [    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI:
>     0000000000000005
>     > [    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09:
>     000055ca47ba9030
>     > [    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12:
>     00007f5b31de3e2d
>     > [    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15:
>     000055ca47ba8700
>     > [    3.435053]  </TASK>
>     > [    3.435059] ---[ end trace 0000000000000000 ]---
>     > [    3.440593]  vda: vda1 vda2 vda3
>     > [    3.445283] scsi host0: Virtio SCSI HBA
>     > [    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU CD-ROM     
>     2.5+ PQ
>     > : 0 ANSI: 5
>     >
>     > --
>     > Without deviation from the norm, progress is not possible.
> 
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2022-03-09  7:04           ` Michael S. Tsirkin
  (?)
@ 2022-03-09  8:14           ` Jason Wang
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Wang @ 2022-03-09  8:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: keirf, Paul E . McKenney, kaplan, david, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Marc Zyngier, Hetzelt, Felicitas, linux-kernel,
	virtualization, Thomas Gleixner


[-- Attachment #1.1: Type: text/plain, Size: 10123 bytes --]

On Wed, Mar 9, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, Mar 09, 2022 at 11:41:01AM +0800, Jason Wang wrote:
> >
> >
> > On Wed, Mar 9, 2022 at 12:36 AM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> >
> >     On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> >     > On Tue, 19 Oct 2021 08:01:46 +0100,
> >     > Jason Wang <jasowang@redhat.com> wrote:
> >     > >
> >     > > We used to synchronize pending MSI-X irq handlers via
> >     > > synchronize_irq(), this may not work for the untrusted device
> which
> >     > > may keep sending interrupts after reset which may lead unexpected
> >     > > results. Similarly, we should not enable MSI-X interrupt until
> the
> >     > > device is ready. So this patch fixes those two issues by:
> >     > >
> >     > > 1) switching to use disable_irq() to prevent the virtio interrupt
> >     > >    handlers to be called after the device is reset.
> >     > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> >     > >
> >     > > This can make sure the virtio interrupt handler won't be called
> before
> >     > > virtio_device_ready() and after reset.
> >     > >
> >     > > Cc: Thomas Gleixner <tglx@linutronix.de>
> >     > > Cc: Peter Zijlstra <peterz@infradead.org>
> >     > > Cc: Paul E. McKenney <paulmck@kernel.org>
> >     > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >     > > ---
> >     > >  drivers/virtio/virtio_pci_common.c | 27
> +++++++++++++++++++++------
> >     > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> >     > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> >     > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> >     > >  4 files changed, 32 insertions(+), 12 deletions(-)
> >     > >
> >     > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/
> >     virtio_pci_common.c
> >     > > index b35bb2d57f62..8d8f83aca721 100644
> >     > > --- a/drivers/virtio/virtio_pci_common.c
> >     > > +++ b/drivers/virtio/virtio_pci_common.c
> >     > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> >     > >              "Force legacy mode for transitional virtio 1
> devices");
> >     > >  #endif
> >     > >
> >     > > -/* wait for pending irq handlers */
> >     > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> >     > > +/* disable irq handlers */
> >     > > +void vp_disable_cbs(struct virtio_device *vdev)
> >     > >  {
> >     > >     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >     > >     int i;
> >     > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct
> virtio_device
> >     *vdev)
> >     > >             synchronize_irq(vp_dev->pci_dev->irq);
> >     > >
> >     > >     for (i = 0; i < vp_dev->msix_vectors; ++i)
> >     > > -           synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >     > > +           disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >     > > +}
> >     > > +
> >     > > +/* enable irq handlers */
> >     > > +void vp_enable_cbs(struct virtio_device *vdev)
> >     > > +{
> >     > > +   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >     > > +   int i;
> >     > > +
> >     > > +   if (vp_dev->intx_enabled)
> >     > > +           return;
> >     > > +
> >     > > +   for (i = 0; i < vp_dev->msix_vectors; ++i)
> >     > > +           enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >     >
> >     > This results in a splat at boot time if you set maxcpus=<whatever>,
> >     > see below. Enabling interrupts that are affinity managed is *bad*.
> You
> >     > don't even know whether the CPU which is supposed to handle this is
> >     > online or not.
> >     >
> >     > The core kernel notices it, shouts and keeps the interrupt
> disabled,
> >     > but this should be fixed. The whole point of managed interrupts is
> to
> >     > let them be dealt with outside of the drivers, and tied into the
> CPUs
> >     > being brought up and down. If virtio needs (for one reason or
> another)
> >     > to manage interrupts on its own, so be it. But this patch isn't the
> >     > way to do it, I'm afraid.
> >     >
> >     >       M.
> >
> >     Thanks for reporting this!
> >
> >     What virtio is doing here isn't unique however.
> >
> >     If device driver is to be hardened against device sending interrupts
> >     while driver is initializing/cleaning up, it needs kernel to provide
> >     capability to disable these.
> >
> >     If we then declare that that is impossible for managed interrupts
> >     then that will mean most devices can't use managed interrupts
> >     because ideally we'd have all drivers hardened.
> >
> >
> > Or that probably means we need to use a driver specific mechanism as
> what we
> > did for INTX instead of using NO_AUTO_EN.
> >
> > Thanks
>
> What we did for INTX was pretty expensive, we justified this
> by saying INTX handling involves expensive IO reads anyway
> so it's in the noise.
>

We only add a READ_ONCE() on the callback like:

+       if (!READ_ONCE(vp_dev->intx_soft_enabled))
+               return IRQ_NONE;

It should not be that expensive.


>
> Let's see what does Thomas suggest.
>

Yes, sure.

Thanks


>
> >
> >
> >     Thomas I think you were the one who suggested enabling/disabling
> >     interrupts originally - thoughts?
> >
> >     Feedback appreciated.
> >
> >
> >
> >     > [    3.434849] ------------[ cut here ]------------
> >     > [    3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210
> >     irq_startup+0x10
> >     > e/0x120
> >     > [    3.434861] Modules linked in: virtio_net(E+) net_failover(E)
> failover
> >     (E) vir
> >     > tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E)
> ahci
> >     (E+) libah
> >     > ci(E) virtio_pci(E) virtio_pci_legacy_dev(E)
> virtio_pci_modern_dev(E)
> >     virtio(E)
> >     > drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E)
> crct10dif_common
> >     (E) crc32
> >     > _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E)
> i2c_smbus
> >     (E) scsi_
> >     > common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
> >     > [    3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G
>
> >     E     5.
> >     > 17.0-rc7-00020-gea4424be1688 #63
> >     > [    3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS
> >     0.0.0 02
> >     > /06/2015
> >     > [    3.434897] RIP: 0010:irq_startup+0x10e/0x120
> >     > [    3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff
> ff 48
> >     89 ef e8
> >     >  94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff
> <0f>
> >     0b eb ac
> >     >  0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> >     > [    3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
> >     > [    3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX:
> >     0000000000000040
> >     > [    3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI:
> >     ffff8bcf8ce34648
> >     > [    3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09:
> >     ffffffffa74cb828
> >     > [    3.434915] R10: 0000000000000000 R11: 0000000000000000 R12:
> >     0000000000000001
> >     > [    3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15:
> >     0000000000000200
> >     > [    3.434918] FS:  00007f5b3179f8c0(0000)
> GS:ffff8bcffbc00000(0000)
> >     knlGS:00000
> >     > 00000000000
> >     > [    3.434919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >     > [    3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4:
> >     0000000000170ef0
> >     > [    3.434928] Call Trace:
> >     > [    3.434936]  <TASK>
> >     > [    3.434938]  enable_irq+0x48/0x90
> >     > [    3.434943]  vp_enable_cbs+0x36/0x70 [virtio_pci]
> >     > [    3.434948]  virtblk_probe+0x457/0x7dc [virtio_blk]
> >     > [    3.434954]  virtio_dev_probe+0x1ae/0x280 [virtio]
> >     > [    3.434959]  really_probe+0x1f5/0x3d0
> >     > [    3.434966]  __driver_probe_device+0xfe/0x180
> >     > [    3.434969]  driver_probe_device+0x1e/0x90
> >     > [    3.434971]  __driver_attach+0xc0/0x1c0
> >     > [    3.434974]  ? __device_attach_driver+0xe0/0xe0
> >     > [    3.434976]  ? __device_attach_driver+0xe0/0xe0
> >     > [    3.434978]  bus_for_each_dev+0x78/0xc0
> >     > [    3.434982]  bus_add_driver+0x149/0x1e0
> >     > [    3.434985]  driver_register+0x8b/0xe0
> >     > [    3.434987]  ? 0xffffffffc01aa000
> >     > [    3.434990]  init+0x52/0x1000 [virtio_blk]
> >     > [    3.434994]  do_one_initcall+0x44/0x200
> >     > [    3.435001]  ? kmem_cache_alloc_trace+0x300/0x400
> >     > [    3.435006]  do_init_module+0x4c/0x260
> >     > [    3.435013]  __do_sys_finit_module+0xb4/0x120
> >     > [    3.435018]  do_syscall_64+0x3b/0xc0
> >     > [    3.435027]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >     > [    3.435037] RIP: 0033:0x7f5b31c589b9
> >     > [    3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
> 00 00
> >     48 89 f8
> >     >  48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05
> <48>
> >     3d 01 f0
> >     >  ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
> >     > [    3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246
> ORIG_RAX:
> >     00000000000
> >     > 00139
> >     > [    3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX:
> >     00007f5b31c589b9
> >     > [    3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI:
> >     0000000000000005
> >     > [    3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09:
> >     000055ca47ba9030
> >     > [    3.435049] R10: 0000000000000005 R11: 0000000000000246 R12:
> >     00007f5b31de3e2d
> >     > [    3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15:
> >     000055ca47ba8700
> >     > [    3.435053]  </TASK>
> >     > [    3.435059] ---[ end trace 0000000000000000 ]---
> >     > [    3.440593]  vda: vda1 vda2 vda3
> >     > [    3.445283] scsi host0: Virtio SCSI HBA
> >     > [    3.450373] scsi 0:0:0:0: CD-ROM            QEMU     QEMU
> CD-ROM
> >     2.5+ PQ
> >     > : 0 ANSI: 5
> >     >
> >     > --
> >     > Without deviation from the norm, progress is not possible.
> >
> >
>
>

[-- Attachment #1.2: Type: text/html, Size: 13978 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 05/10] virtio-pci: harden INTX interrupts
  2021-10-19  7:01   ` Jason Wang
  (?)
@ 2022-03-09 10:45   ` Marc Zyngier
  2022-03-09 11:27       ` Michael S. Tsirkin
  -1 siblings, 1 reply; 48+ messages in thread
From: Marc Zyngier @ 2022-03-09 10:45 UTC (permalink / raw)
  To: Jason Wang, Will Deacon
  Cc: mst, virtualization, linux-kernel, f.hetzelt, david.kaplan,
	konrad.wilk, Boqun Feng, Thomas Gleixner, Peter Zijlstra,
	Paul E . McKenney, keirf

[Adding Will to check on my understanding of the interactions between
 spinlocks and WRITE_ONCE()]

On Tue, 19 Oct 2021 08:01:47 +0100,
Jason Wang <jasowang@redhat.com> wrote:
> 
> This patch tries to make sure the virtio interrupt handler for INTX
> won't be called after a reset and before virtio_device_ready(). We
> can't use IRQF_NO_AUTOEN since we're using shared interrupt
> (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> intx_soft_enabled variable and toggle it during in
> vp_disable/enable_vectors(). The INTX interrupt handler will check
> intx_soft_enabled before processing the actual interrupt.
> 
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
>  drivers/virtio/virtio_pci_common.h |  1 +
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 8d8f83aca721..1bce254a462a 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
>  
> -	if (vp_dev->intx_enabled)
> +	if (vp_dev->intx_enabled) {
> +		/*
> +		 * The below synchronize() guarantees that any
> +		 * interrupt for this line arriving after
> +		 * synchronize_irq() has completed is guaranteed to see
> +		 * intx_soft_enabled == false.
> +		 */
> +		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
>  		synchronize_irq(vp_dev->pci_dev->irq);
> +	}
>  
>  	for (i = 0; i < vp_dev->msix_vectors; ++i)
>  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> @@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	int i;
>  
> -	if (vp_dev->intx_enabled)
> +	if (vp_dev->intx_enabled) {
> +		disable_irq(vp_dev->pci_dev->irq);
> +		/*
> +		 * The above disable_irq() provides TSO ordering and
> +		 * as such promotes the below store to store-release.
> +		 */
> +		WRITE_ONCE(vp_dev->intx_soft_enabled, true);

What do you mean by TSO here? AFAICT, the CPU is allowed hoist this
write up into the lock used by disable_irq(), as the unlock only has
release semantics. Is that what you are relying on? I don't see how
this upgrades WRITE_ONCE() to have release semantics.

> +		enable_irq(vp_dev->pci_dev->irq);

Same thing does here: my understanding is that the write can be pushed
down into the lock, which has acquire semantics only.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2022-03-08 16:35       ` Michael S. Tsirkin
  (?)
  (?)
@ 2022-03-09 11:08       ` Marc Zyngier
  2022-03-09 12:13           ` Michael S. Tsirkin
  -1 siblings, 1 reply; 48+ messages in thread
From: Marc Zyngier @ 2022-03-09 11:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Thomas Gleixner, Jason Wang, virtualization, linux-kernel,
	f.hetzelt, david.kaplan, konrad.wilk, Peter Zijlstra,
	Paul E . McKenney, keirf

On Tue, 08 Mar 2022 16:35:52 +0000,
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> > On Tue, 19 Oct 2021 08:01:46 +0100,
> > Jason Wang <jasowang@redhat.com> wrote:
> > > 
> > > We used to synchronize pending MSI-X irq handlers via
> > > synchronize_irq(), this may not work for the untrusted device which
> > > may keep sending interrupts after reset which may lead unexpected
> > > results. Similarly, we should not enable MSI-X interrupt until the
> > > device is ready. So this patch fixes those two issues by:
> > > 
> > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > >    handlers to be called after the device is reset.
> > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > 
> > > This can make sure the virtio interrupt handler won't be called before
> > > virtio_device_ready() and after reset.
> > > 
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > index b35bb2d57f62..8d8f83aca721 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > >  		 "Force legacy mode for transitional virtio 1 devices");
> > >  #endif
> > >  
> > > -/* wait for pending irq handlers */
> > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > +/* disable irq handlers */
> > > +void vp_disable_cbs(struct virtio_device *vdev)
> > >  {
> > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >  	int i;
> > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > >  		synchronize_irq(vp_dev->pci_dev->irq);
> > >  
> > >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +}
> > > +
> > > +/* enable irq handlers */
> > > +void vp_enable_cbs(struct virtio_device *vdev)
> > > +{
> > > +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > +	int i;
> > > +
> > > +	if (vp_dev->intx_enabled)
> > > +		return;
> > > +
> > > +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > 
> > This results in a splat at boot time if you set maxcpus=<whatever>,
> > see below. Enabling interrupts that are affinity managed is *bad*. You
> > don't even know whether the CPU which is supposed to handle this is
> > online or not.
> > 
> > The core kernel notices it, shouts and keeps the interrupt disabled,
> > but this should be fixed. The whole point of managed interrupts is to
> > let them be dealt with outside of the drivers, and tied into the CPUs
> > being brought up and down. If virtio needs (for one reason or another)
> > to manage interrupts on its own, so be it. But this patch isn't the
> > way to do it, I'm afraid.
> > 
> > 	M.
> 
> Thanks for reporting this!
> 
> What virtio is doing here isn't unique however.

Then it is even worse than I though. Can you point me to the other
drivers doing such thing?

> If device driver is to be hardened against device sending interrupts
> while driver is initializing/cleaning up, it needs kernel to provide
> capability to disable these.
>
> If we then declare that that is impossible for managed interrupts
> then that will mean most devices can't use managed interrupts
> because ideally we'd have all drivers hardened.

What I find odd is that you want to do the interrupt hardening in the
individual endpoint drivers. This makes everything complicated, and
just doesn't scale.

The natural place for this sort of checks would be in the interrupt
controller driver, which has all the state as its disposal, and is
guaranteed to be able to take the right course of action if it sees
something that contradicts its internal state tracking (affinity,
masking, interrupt life cycle in general).

Because even if you were allowed to mess with the enable state, this
doesn't give you any guarantee that the interrupt is delivered on the
correct CPU either.

> Thomas I think you were the one who suggested enabling/disabling
> interrupts originally - thoughts?
> 
> Feedback appreciated.

Feedback given.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 05/10] virtio-pci: harden INTX interrupts
  2022-03-09 10:45   ` Marc Zyngier
@ 2022-03-09 11:27       ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09 11:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Jason Wang, Will Deacon, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Boqun Feng, Thomas Gleixner,
	Peter Zijlstra, Paul E . McKenney, keirf

On Wed, Mar 09, 2022 at 10:45:49AM +0000, Marc Zyngier wrote:
> [Adding Will to check on my understanding of the interactions between
>  spinlocks and WRITE_ONCE()]
> 
> On Tue, 19 Oct 2021 08:01:47 +0100,
> Jason Wang <jasowang@redhat.com> wrote:
> > 
> > This patch tries to make sure the virtio interrupt handler for INTX
> > won't be called after a reset and before virtio_device_ready(). We
> > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > intx_soft_enabled variable and toggle it during in
> > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > intx_soft_enabled before processing the actual interrupt.
> > 
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
> >  drivers/virtio/virtio_pci_common.h |  1 +
> >  2 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index 8d8f83aca721..1bce254a462a 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
> >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >  	int i;
> >  
> > -	if (vp_dev->intx_enabled)
> > +	if (vp_dev->intx_enabled) {
> > +		/*
> > +		 * The below synchronize() guarantees that any
> > +		 * interrupt for this line arriving after
> > +		 * synchronize_irq() has completed is guaranteed to see
> > +		 * intx_soft_enabled == false.
> > +		 */
> > +		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> >  		synchronize_irq(vp_dev->pci_dev->irq);
> > +	}
> >  
> >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> >  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > @@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
> >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >  	int i;
> >  
> > -	if (vp_dev->intx_enabled)
> > +	if (vp_dev->intx_enabled) {
> > +		disable_irq(vp_dev->pci_dev->irq);
> > +		/*
> > +		 * The above disable_irq() provides TSO ordering and
> > +		 * as such promotes the below store to store-release.
> > +		 */
> > +		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> 
> What do you mean by TSO here? AFAICT, the CPU is allowed hoist this
> write up into the lock used by disable_irq(), as the unlock only has
> release semantics. Is that what you are relying on? I don't see how
> this upgrades WRITE_ONCE() to have release semantics.
> 
> > +		enable_irq(vp_dev->pci_dev->irq);
> 
> Same thing does here: my understanding is that the write can be pushed
> down into the lock, which has acquire semantics only.
> 
> Thanks,
> 
> 	M.

Overall I feel what we are doing here is very standard and should be
pretty common for a driver that wants to be protected against a
malicious device:


1- get IRQ
2- initialize device with IRQ
3- enable IRQ

Doing it in the core kernel helps make sure interrupts are
not lost if they trigger during 2.

Without core kernel support one has to refactor the driver along the lines of:

a- initialize driver
b- get IRQ
c- initialize device with IRQ

and this is often tricky especially if one wants to do things like
discover device configuration and reconfigure the driver accordingly.


> -- 
> Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 05/10] virtio-pci: harden INTX interrupts
@ 2022-03-09 11:27       ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09 11:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: keirf, Paul E . McKenney, david.kaplan, konrad.wilk,
	Peter Zijlstra, f.hetzelt, linux-kernel, virtualization,
	Thomas Gleixner, Will Deacon, Boqun Feng

On Wed, Mar 09, 2022 at 10:45:49AM +0000, Marc Zyngier wrote:
> [Adding Will to check on my understanding of the interactions between
>  spinlocks and WRITE_ONCE()]
> 
> On Tue, 19 Oct 2021 08:01:47 +0100,
> Jason Wang <jasowang@redhat.com> wrote:
> > 
> > This patch tries to make sure the virtio interrupt handler for INTX
> > won't be called after a reset and before virtio_device_ready(). We
> > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > intx_soft_enabled variable and toggle it during in
> > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > intx_soft_enabled before processing the actual interrupt.
> > 
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
> >  drivers/virtio/virtio_pci_common.h |  1 +
> >  2 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index 8d8f83aca721..1bce254a462a 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
> >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >  	int i;
> >  
> > -	if (vp_dev->intx_enabled)
> > +	if (vp_dev->intx_enabled) {
> > +		/*
> > +		 * The below synchronize() guarantees that any
> > +		 * interrupt for this line arriving after
> > +		 * synchronize_irq() has completed is guaranteed to see
> > +		 * intx_soft_enabled == false.
> > +		 */
> > +		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> >  		synchronize_irq(vp_dev->pci_dev->irq);
> > +	}
> >  
> >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> >  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > @@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
> >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >  	int i;
> >  
> > -	if (vp_dev->intx_enabled)
> > +	if (vp_dev->intx_enabled) {
> > +		disable_irq(vp_dev->pci_dev->irq);
> > +		/*
> > +		 * The above disable_irq() provides TSO ordering and
> > +		 * as such promotes the below store to store-release.
> > +		 */
> > +		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> 
> What do you mean by TSO here? AFAICT, the CPU is allowed hoist this
> write up into the lock used by disable_irq(), as the unlock only has
> release semantics. Is that what you are relying on? I don't see how
> this upgrades WRITE_ONCE() to have release semantics.
> 
> > +		enable_irq(vp_dev->pci_dev->irq);
> 
> Same thing does here: my understanding is that the write can be pushed
> down into the lock, which has acquire semantics only.
> 
> Thanks,
> 
> 	M.

Overall I feel what we are doing here is very standard and should be
pretty common for a driver that wants to be protected against a
malicious device:


1- get IRQ
2- initialize device with IRQ
3- enable IRQ

Doing it in the core kernel helps make sure interrupts are
not lost if they trigger during 2.

Without core kernel support one has to refactor the driver along the lines of:

a- initialize driver
b- get IRQ
c- initialize device with IRQ

and this is often tricky especially if one wants to do things like
discover device configuration and reconfigure the driver accordingly.


> -- 
> Without deviation from the norm, progress is not possible.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
  2022-03-09 11:08       ` Marc Zyngier
@ 2022-03-09 12:13           ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09 12:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Jason Wang, virtualization, linux-kernel,
	f.hetzelt, david.kaplan, konrad.wilk, Peter Zijlstra,
	Paul E . McKenney, keirf

On Wed, Mar 09, 2022 at 11:08:41AM +0000, Marc Zyngier wrote:
> On Tue, 08 Mar 2022 16:35:52 +0000,
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> > > On Tue, 19 Oct 2021 08:01:46 +0100,
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > We used to synchronize pending MSI-X irq handlers via
> > > > synchronize_irq(), this may not work for the untrusted device which
> > > > may keep sending interrupts after reset which may lead unexpected
> > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > device is ready. So this patch fixes those two issues by:
> > > > 
> > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > >    handlers to be called after the device is reset.
> > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > 
> > > > This can make sure the virtio interrupt handler won't be called before
> > > > virtio_device_ready() and after reset.
> > > > 
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index b35bb2d57f62..8d8f83aca721 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > >  		 "Force legacy mode for transitional virtio 1 devices");
> > > >  #endif
> > > >  
> > > > -/* wait for pending irq handlers */
> > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > +/* disable irq handlers */
> > > > +void vp_disable_cbs(struct virtio_device *vdev)
> > > >  {
> > > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >  	int i;
> > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > >  		synchronize_irq(vp_dev->pci_dev->irq);
> > > >  
> > > >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +}
> > > > +
> > > > +/* enable irq handlers */
> > > > +void vp_enable_cbs(struct virtio_device *vdev)
> > > > +{
> > > > +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > +	int i;
> > > > +
> > > > +	if (vp_dev->intx_enabled)
> > > > +		return;
> > > > +
> > > > +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > 
> > > This results in a splat at boot time if you set maxcpus=<whatever>,
> > > see below. Enabling interrupts that are affinity managed is *bad*. You
> > > don't even know whether the CPU which is supposed to handle this is
> > > online or not.
> > > 
> > > The core kernel notices it, shouts and keeps the interrupt disabled,
> > > but this should be fixed. The whole point of managed interrupts is to
> > > let them be dealt with outside of the drivers, and tied into the CPUs
> > > being brought up and down. If virtio needs (for one reason or another)
> > > to manage interrupts on its own, so be it. But this patch isn't the
> > > way to do it, I'm afraid.
> > > 
> > > 	M.
> > 
> > Thanks for reporting this!
> > 
> > What virtio is doing here isn't unique however.
> 
> Then it is even worse than I though. Can you point me to the other
> drivers doing such thing?


What are you asking? Whether any drivers that set PCI_IRQ_AFFINITY also
set IRQF_NO_AUTOEN? I could not find any other drivers doing that, no.
When I said "isn't unique" I rather meant that other drivers need
something like this, and they likely do it in a driver specific, complex
fashion.

just poking around at random I found
drivers/scsi/mpi3mr/mpi3mr_fw.c

which does this last thing during initialization:

void mpi3mr_ioc_enable_intr(struct mpi3mr_ioc *mrioc)
{
        mrioc->intr_enabled = 1;
}

and the interrupt handler does:

static irqreturn_t mpi3mr_isr_primary(int irq, void *privdata)
{
        struct mpi3mr_intr_info *intr_info = privdata;
        struct mpi3mr_ioc *mrioc;
        u16 midx;
        u32 num_admin_replies = 0, num_op_reply = 0;

        if (!intr_info)
                return IRQ_NONE;

        mrioc = intr_info->mrioc;

        if (!mrioc->intr_enabled)
                return IRQ_NONE;


which seems to be trying to accomplish exactly the same thing and might
or might not actually need WRITE_ONCE or some barriers if it were to be
made 100% foolproof.



> > If device driver is to be hardened against device sending interrupts
> > while driver is initializing/cleaning up, it needs kernel to provide
> > capability to disable these.
> >
> > If we then declare that that is impossible for managed interrupts
> > then that will mean most devices can't use managed interrupts
> > because ideally we'd have all drivers hardened.
> 
> What I find odd is that you want to do the interrupt hardening in the
> individual endpoint drivers. This makes everything complicated, and
> just doesn't scale.
> The natural place for this sort of checks would be in the interrupt
> controller driver, which has all the state as its disposal, and is
> guaranteed to be able to take the right course of action if it sees
> something that contradicts its internal state tracking (affinity,
> masking, interrupt life cycle in general).

Exactly.  So here's what we are trying to do: driver is initializing
both itself and the device. As part of that, it assigns some IRQs. Once that
happens, device can trigger the IRQ callback by sending the interrupt.
If this happens too soon driver is not yet fully initialized and might
access uninitialized memory (and generally get confused because the
state is inconsistent).

Getting the IRQ in a disabled state and only enabling when we are
ready sounds like a very reasonable way to go about it just from
API perspective.





> 
> Because even if you were allowed to mess with the enable state, this
> doesn't give you any guarantee that the interrupt is delivered on the
> correct CPU either.

For virtio affinity is mostly an optimization, I don't think this
affects correctness.

> > Thomas I think you were the one who suggested enabling/disabling
> > interrupts originally - thoughts?
> > 
> > Feedback appreciated.
> 
> Feedback given.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts
@ 2022-03-09 12:13           ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09 12:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: keirf, Paul E . McKenney, david.kaplan, konrad.wilk,
	Peter Zijlstra, f.hetzelt, linux-kernel, virtualization,
	Thomas Gleixner

On Wed, Mar 09, 2022 at 11:08:41AM +0000, Marc Zyngier wrote:
> On Tue, 08 Mar 2022 16:35:52 +0000,
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> > > On Tue, 19 Oct 2021 08:01:46 +0100,
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > We used to synchronize pending MSI-X irq handlers via
> > > > synchronize_irq(), this may not work for the untrusted device which
> > > > may keep sending interrupts after reset which may lead unexpected
> > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > device is ready. So this patch fixes those two issues by:
> > > > 
> > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > >    handlers to be called after the device is reset.
> > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > > 
> > > > This can make sure the virtio interrupt handler won't be called before
> > > > virtio_device_ready() and after reset.
> > > > 
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > >  drivers/virtio/virtio_pci_common.h |  6 ++++--
> > > >  drivers/virtio/virtio_pci_legacy.c |  5 +++--
> > > >  drivers/virtio/virtio_pci_modern.c |  6 ++++--
> > > >  4 files changed, 32 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index b35bb2d57f62..8d8f83aca721 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > >  		 "Force legacy mode for transitional virtio 1 devices");
> > > >  #endif
> > > >  
> > > > -/* wait for pending irq handlers */
> > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > +/* disable irq handlers */
> > > > +void vp_disable_cbs(struct virtio_device *vdev)
> > > >  {
> > > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >  	int i;
> > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > >  		synchronize_irq(vp_dev->pci_dev->irq);
> > > >  
> > > >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > -		synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +}
> > > > +
> > > > +/* enable irq handlers */
> > > > +void vp_enable_cbs(struct virtio_device *vdev)
> > > > +{
> > > > +	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > +	int i;
> > > > +
> > > > +	if (vp_dev->intx_enabled)
> > > > +		return;
> > > > +
> > > > +	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > +		enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > 
> > > This results in a splat at boot time if you set maxcpus=<whatever>,
> > > see below. Enabling interrupts that are affinity managed is *bad*. You
> > > don't even know whether the CPU which is supposed to handle this is
> > > online or not.
> > > 
> > > The core kernel notices it, shouts and keeps the interrupt disabled,
> > > but this should be fixed. The whole point of managed interrupts is to
> > > let them be dealt with outside of the drivers, and tied into the CPUs
> > > being brought up and down. If virtio needs (for one reason or another)
> > > to manage interrupts on its own, so be it. But this patch isn't the
> > > way to do it, I'm afraid.
> > > 
> > > 	M.
> > 
> > Thanks for reporting this!
> > 
> > What virtio is doing here isn't unique however.
> 
> Then it is even worse than I though. Can you point me to the other
> drivers doing such thing?


What are you asking? Whether any drivers that set PCI_IRQ_AFFINITY also
set IRQF_NO_AUTOEN? I could not find any other drivers doing that, no.
When I said "isn't unique" I rather meant that other drivers need
something like this, and they likely do it in a driver specific, complex
fashion.

just poking around at random I found
drivers/scsi/mpi3mr/mpi3mr_fw.c

which does this last thing during initialization:

void mpi3mr_ioc_enable_intr(struct mpi3mr_ioc *mrioc)
{
        mrioc->intr_enabled = 1;
}

and the interrupt handler does:

static irqreturn_t mpi3mr_isr_primary(int irq, void *privdata)
{
        struct mpi3mr_intr_info *intr_info = privdata;
        struct mpi3mr_ioc *mrioc;
        u16 midx;
        u32 num_admin_replies = 0, num_op_reply = 0;

        if (!intr_info)
                return IRQ_NONE;

        mrioc = intr_info->mrioc;

        if (!mrioc->intr_enabled)
                return IRQ_NONE;


which seems to be trying to accomplish exactly the same thing and might
or might not actually need WRITE_ONCE or some barriers if it were to be
made 100% foolproof.



> > If device driver is to be hardened against device sending interrupts
> > while driver is initializing/cleaning up, it needs kernel to provide
> > capability to disable these.
> >
> > If we then declare that that is impossible for managed interrupts
> > then that will mean most devices can't use managed interrupts
> > because ideally we'd have all drivers hardened.
> 
> What I find odd is that you want to do the interrupt hardening in the
> individual endpoint drivers. This makes everything complicated, and
> just doesn't scale.
> The natural place for this sort of checks would be in the interrupt
> controller driver, which has all the state as its disposal, and is
> guaranteed to be able to take the right course of action if it sees
> something that contradicts its internal state tracking (affinity,
> masking, interrupt life cycle in general).

Exactly.  So here's what we are trying to do: driver is initializing
both itself and the device. As part of that, it assigns some IRQs. Once that
happens, device can trigger the IRQ callback by sending the interrupt.
If this happens too soon driver is not yet fully initialized and might
access uninitialized memory (and generally get confused because the
state is inconsistent).

Getting the IRQ in a disabled state and only enabling when we are
ready sounds like a very reasonable way to go about it just from
API perspective.





> 
> Because even if you were allowed to mess with the enable state, this
> doesn't give you any guarantee that the interrupt is delivered on the
> correct CPU either.

For virtio affinity is mostly an optimization, I don't think this
affects correctness.

> > Thomas I think you were the one who suggested enabling/disabling
> > interrupts originally - thoughts?
> > 
> > Feedback appreciated.
> 
> Feedback given.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 05/10] virtio-pci: harden INTX interrupts
  2022-03-09 11:27       ` Michael S. Tsirkin
  (?)
@ 2022-03-09 12:14       ` Marc Zyngier
  2022-03-09 12:30           ` Michael S. Tsirkin
  -1 siblings, 1 reply; 48+ messages in thread
From: Marc Zyngier @ 2022-03-09 12:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Will Deacon, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Boqun Feng, Thomas Gleixner,
	Peter Zijlstra, Paul E . McKenney, keirf

On Wed, 09 Mar 2022 11:27:42 +0000,
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> On Wed, Mar 09, 2022 at 10:45:49AM +0000, Marc Zyngier wrote:
> > [Adding Will to check on my understanding of the interactions between
> >  spinlocks and WRITE_ONCE()]
> > 
> > On Tue, 19 Oct 2021 08:01:47 +0100,
> > Jason Wang <jasowang@redhat.com> wrote:
> > > 
> > > This patch tries to make sure the virtio interrupt handler for INTX
> > > won't be called after a reset and before virtio_device_ready(). We
> > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > intx_soft_enabled variable and toggle it during in
> > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > intx_soft_enabled before processing the actual interrupt.
> > > 
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
> > >  drivers/virtio/virtio_pci_common.h |  1 +
> > >  2 files changed, 22 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > index 8d8f83aca721..1bce254a462a 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
> > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >  	int i;
> > >  
> > > -	if (vp_dev->intx_enabled)
> > > +	if (vp_dev->intx_enabled) {
> > > +		/*
> > > +		 * The below synchronize() guarantees that any
> > > +		 * interrupt for this line arriving after
> > > +		 * synchronize_irq() has completed is guaranteed to see
> > > +		 * intx_soft_enabled == false.
> > > +		 */
> > > +		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> > >  		synchronize_irq(vp_dev->pci_dev->irq);
> > > +	}
> > >  
> > >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > >  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > @@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
> > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >  	int i;
> > >  
> > > -	if (vp_dev->intx_enabled)
> > > +	if (vp_dev->intx_enabled) {
> > > +		disable_irq(vp_dev->pci_dev->irq);
> > > +		/*
> > > +		 * The above disable_irq() provides TSO ordering and
> > > +		 * as such promotes the below store to store-release.
> > > +		 */
> > > +		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> > 
> > What do you mean by TSO here? AFAICT, the CPU is allowed hoist this
> > write up into the lock used by disable_irq(), as the unlock only has
> > release semantics. Is that what you are relying on? I don't see how
> > this upgrades WRITE_ONCE() to have release semantics.
> > 
> > > +		enable_irq(vp_dev->pci_dev->irq);
> > 
> > Same thing does here: my understanding is that the write can be pushed
> > down into the lock, which has acquire semantics only.
> > 
> > Thanks,
> > 
> > 	M.
> 
> Overall I feel what we are doing here is very standard and should be
> pretty common for a driver that wants to be protected against a
> malicious device:
> 
> 
> 1- get IRQ
> 2- initialize device with IRQ
> 3- enable IRQ
> 
> Doing it in the core kernel helps make sure interrupts are
> not lost if they trigger during 2.

But this isn't the core kernel. You're doing that in some random
driver (and even more, only for the PCI version of that driver).

> 
> Without core kernel support one has to refactor the driver along the lines of:
> 
> a- initialize driver
> b- get IRQ
> c- initialize device with IRQ
> 
> and this is often tricky especially if one wants to do things like
> discover device configuration and reconfigure the driver accordingly.

But this isn't what this patch is about, is it? You are just tracking
whether interrupts are enabled or not. To which my reply to you on the
previous patch still applies (this is the wrong place to track such
state).

You also haven't answered my question: what are your ordering
expectations wrt the WRITE_ONCE() above? The comment says 'TSO', and I
don't really understand how this is enforced.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 05/10] virtio-pci: harden INTX interrupts
  2022-03-09 12:14       ` Marc Zyngier
@ 2022-03-09 12:30           ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09 12:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Jason Wang, Will Deacon, virtualization, linux-kernel, f.hetzelt,
	david.kaplan, konrad.wilk, Boqun Feng, Thomas Gleixner,
	Peter Zijlstra, Paul E . McKenney, keirf

On Wed, Mar 09, 2022 at 12:14:14PM +0000, Marc Zyngier wrote:
> On Wed, 09 Mar 2022 11:27:42 +0000,
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > On Wed, Mar 09, 2022 at 10:45:49AM +0000, Marc Zyngier wrote:
> > > [Adding Will to check on my understanding of the interactions between
> > >  spinlocks and WRITE_ONCE()]
> > > 
> > > On Tue, 19 Oct 2021 08:01:47 +0100,
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > This patch tries to make sure the virtio interrupt handler for INTX
> > > > won't be called after a reset and before virtio_device_ready(). We
> > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > > intx_soft_enabled variable and toggle it during in
> > > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > > intx_soft_enabled before processing the actual interrupt.
> > > > 
> > > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
> > > >  drivers/virtio/virtio_pci_common.h |  1 +
> > > >  2 files changed, 22 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index 8d8f83aca721..1bce254a462a 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
> > > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >  	int i;
> > > >  
> > > > -	if (vp_dev->intx_enabled)
> > > > +	if (vp_dev->intx_enabled) {
> > > > +		/*
> > > > +		 * The below synchronize() guarantees that any
> > > > +		 * interrupt for this line arriving after
> > > > +		 * synchronize_irq() has completed is guaranteed to see
> > > > +		 * intx_soft_enabled == false.
> > > > +		 */
> > > > +		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> > > >  		synchronize_irq(vp_dev->pci_dev->irq);
> > > > +	}
> > > >  
> > > >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > >  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > @@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
> > > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >  	int i;
> > > >  
> > > > -	if (vp_dev->intx_enabled)
> > > > +	if (vp_dev->intx_enabled) {
> > > > +		disable_irq(vp_dev->pci_dev->irq);
> > > > +		/*
> > > > +		 * The above disable_irq() provides TSO ordering and
> > > > +		 * as such promotes the below store to store-release.
> > > > +		 */
> > > > +		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> > > 
> > > What do you mean by TSO here? AFAICT, the CPU is allowed hoist this
> > > write up into the lock used by disable_irq(), as the unlock only has
> > > release semantics. Is that what you are relying on? I don't see how
> > > this upgrades WRITE_ONCE() to have release semantics.
> > > 
> > > > +		enable_irq(vp_dev->pci_dev->irq);
> > > 
> > > Same thing does here: my understanding is that the write can be pushed
> > > down into the lock, which has acquire semantics only.
> > > 
> > > Thanks,
> > > 
> > > 	M.
> > 
> > Overall I feel what we are doing here is very standard and should be
> > pretty common for a driver that wants to be protected against a
> > malicious device:
> > 
> > 
> > 1- get IRQ
> > 2- initialize device with IRQ
> > 3- enable IRQ
> > 
> > Doing it in the core kernel helps make sure interrupts are
> > not lost if they trigger during 2.
> 
> But this isn't the core kernel. You're doing that in some random
> driver (and even more, only for the PCI version of that driver).


Oh, I'm answering on the INTX patch. I was referring to the MSIX
path actually. Yes INTX can not currently start in disabled state
because it's shared.


> > 
> > Without core kernel support one has to refactor the driver along the lines of:
> > 
> > a- initialize driver
> > b- get IRQ
> > c- initialize device with IRQ
> > 
> > and this is often tricky especially if one wants to do things like
> > discover device configuration and reconfigure the driver accordingly.
> 
> But this isn't what this patch is about, is it? You are just tracking
> whether interrupts are enabled or not. To which my reply to you on the
> previous patch still applies (this is the wrong place to track such
> state).

It's not that we care. What we need is to make sure that interrupts sent
before driver is ready to accept them do not cause a callback.
If there was a way to tell kernel "I'm ready to accept callbacks now"
problem would be solved.



> You also haven't answered my question: what are your ordering
> expectations wrt the WRITE_ONCE() above? The comment says 'TSO', and I
> don't really understand how this is enforced.
> 
> Thanks,
> 
> 	M.

It's based on Peter's suggestion:
https://lore.kernel.org/r/YUCBZjjk77q8JS4f%40hirez.programming.kicks-ass.net


> -- 
> Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V3 05/10] virtio-pci: harden INTX interrupts
@ 2022-03-09 12:30           ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09 12:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: keirf, Paul E . McKenney, david.kaplan, konrad.wilk,
	Peter Zijlstra, f.hetzelt, linux-kernel, virtualization,
	Thomas Gleixner, Will Deacon, Boqun Feng

On Wed, Mar 09, 2022 at 12:14:14PM +0000, Marc Zyngier wrote:
> On Wed, 09 Mar 2022 11:27:42 +0000,
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > On Wed, Mar 09, 2022 at 10:45:49AM +0000, Marc Zyngier wrote:
> > > [Adding Will to check on my understanding of the interactions between
> > >  spinlocks and WRITE_ONCE()]
> > > 
> > > On Tue, 19 Oct 2021 08:01:47 +0100,
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > This patch tries to make sure the virtio interrupt handler for INTX
> > > > won't be called after a reset and before virtio_device_ready(). We
> > > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > > intx_soft_enabled variable and toggle it during in
> > > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > > intx_soft_enabled before processing the actual interrupt.
> > > > 
> > > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  drivers/virtio/virtio_pci_common.c | 23 +++++++++++++++++++++--
> > > >  drivers/virtio/virtio_pci_common.h |  1 +
> > > >  2 files changed, 22 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index 8d8f83aca721..1bce254a462a 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -30,8 +30,16 @@ void vp_disable_cbs(struct virtio_device *vdev)
> > > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >  	int i;
> > > >  
> > > > -	if (vp_dev->intx_enabled)
> > > > +	if (vp_dev->intx_enabled) {
> > > > +		/*
> > > > +		 * The below synchronize() guarantees that any
> > > > +		 * interrupt for this line arriving after
> > > > +		 * synchronize_irq() has completed is guaranteed to see
> > > > +		 * intx_soft_enabled == false.
> > > > +		 */
> > > > +		WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> > > >  		synchronize_irq(vp_dev->pci_dev->irq);
> > > > +	}
> > > >  
> > > >  	for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > >  		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > @@ -43,8 +51,16 @@ void vp_enable_cbs(struct virtio_device *vdev)
> > > >  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > >  	int i;
> > > >  
> > > > -	if (vp_dev->intx_enabled)
> > > > +	if (vp_dev->intx_enabled) {
> > > > +		disable_irq(vp_dev->pci_dev->irq);
> > > > +		/*
> > > > +		 * The above disable_irq() provides TSO ordering and
> > > > +		 * as such promotes the below store to store-release.
> > > > +		 */
> > > > +		WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> > > 
> > > What do you mean by TSO here? AFAICT, the CPU is allowed hoist this
> > > write up into the lock used by disable_irq(), as the unlock only has
> > > release semantics. Is that what you are relying on? I don't see how
> > > this upgrades WRITE_ONCE() to have release semantics.
> > > 
> > > > +		enable_irq(vp_dev->pci_dev->irq);
> > > 
> > > Same thing does here: my understanding is that the write can be pushed
> > > down into the lock, which has acquire semantics only.
> > > 
> > > Thanks,
> > > 
> > > 	M.
> > 
> > Overall I feel what we are doing here is very standard and should be
> > pretty common for a driver that wants to be protected against a
> > malicious device:
> > 
> > 
> > 1- get IRQ
> > 2- initialize device with IRQ
> > 3- enable IRQ
> > 
> > Doing it in the core kernel helps make sure interrupts are
> > not lost if they trigger during 2.
> 
> But this isn't the core kernel. You're doing that in some random
> driver (and even more, only for the PCI version of that driver).


Oh, I'm answering on the INTX patch. I was referring to the MSIX
path actually. Yes INTX can not currently start in disabled state
because it's shared.


> > 
> > Without core kernel support one has to refactor the driver along the lines of:
> > 
> > a- initialize driver
> > b- get IRQ
> > c- initialize device with IRQ
> > 
> > and this is often tricky especially if one wants to do things like
> > discover device configuration and reconfigure the driver accordingly.
> 
> But this isn't what this patch is about, is it? You are just tracking
> whether interrupts are enabled or not. To which my reply to you on the
> previous patch still applies (this is the wrong place to track such
> state).

It's not that we care. What we need is to make sure that interrupts sent
before driver is ready to accept them do not cause a callback.
If there was a way to tell kernel "I'm ready to accept callbacks now"
problem would be solved.



> You also haven't answered my question: what are your ordering
> expectations wrt the WRITE_ONCE() above? The comment says 'TSO', and I
> don't really understand how this is enforced.
> 
> Thanks,
> 
> 	M.

It's based on Peter's suggestion:
https://lore.kernel.org/r/YUCBZjjk77q8JS4f%40hirez.programming.kicks-ass.net


> -- 
> Without deviation from the norm, progress is not possible.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2022-03-09 12:30 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-19  7:01 [PATCH V3 00/10] More virtio hardening Jason Wang
2021-10-19  7:01 ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 01/10] virtio-blk: validate num_queues during probe Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-20  7:18   ` Stefano Garzarella
2021-10-20  7:18     ` Stefano Garzarella
2021-10-20  7:37     ` Michael S. Tsirkin
2021-10-20  7:37       ` Michael S. Tsirkin
2021-10-20  8:44       ` Stefano Garzarella
2021-10-20  8:44         ` Stefano Garzarella
2021-10-20  7:55   ` Stefan Hajnoczi
2021-10-20  7:55     ` Stefan Hajnoczi
2021-10-19  7:01 ` [PATCH V3 02/10] virtio_console: validate max_nr_ports before trying to use it Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 03/10] virtio_config: introduce a new .enable_cbs method Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts Jason Wang
2021-10-19  7:01   ` Jason Wang
2022-03-08 15:19   ` Marc Zyngier
2022-03-08 16:35     ` Michael S. Tsirkin
2022-03-08 16:35       ` Michael S. Tsirkin
2022-03-09  3:41       ` Jason Wang
2022-03-09  7:04         ` Michael S. Tsirkin
2022-03-09  7:04           ` Michael S. Tsirkin
2022-03-09  8:14           ` Jason Wang
2022-03-09 11:08       ` Marc Zyngier
2022-03-09 12:13         ` Michael S. Tsirkin
2022-03-09 12:13           ` Michael S. Tsirkin
2021-10-19  7:01 ` [PATCH V3 05/10] virtio-pci: harden INTX interrupts Jason Wang
2021-10-19  7:01   ` Jason Wang
2022-03-09 10:45   ` Marc Zyngier
2022-03-09 11:27     ` Michael S. Tsirkin
2022-03-09 11:27       ` Michael S. Tsirkin
2022-03-09 12:14       ` Marc Zyngier
2022-03-09 12:30         ` Michael S. Tsirkin
2022-03-09 12:30           ` Michael S. Tsirkin
2021-10-19  7:01 ` [PATCH V3 06/10] virtio_ring: fix typos in vring_desc_extra Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 07/10] virtio_ring: validate used buffer length Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 08/10] virtio-net: don't let virtio core to validate used length Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 09/10] virtio-blk: " Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-19  7:01 ` [PATCH V3 10/10] virtio-scsi: don't let virtio core to validate used buffer length Jason Wang
2021-10-19  7:01   ` Jason Wang
2021-10-23 21:31 ` [PATCH V3 00/10] More virtio hardening Michael S. Tsirkin
2021-10-23 21:31   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.