All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] virtio short improvements
@ 2021-07-17  7:42 Parav Pandit via Virtualization
  2021-07-17  7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-17  7:42 UTC (permalink / raw)
  To: mst, virtualization

Hi,

This series contains small improvements for virtio pci driver.
The main idea support surprise removal of virtio pci device.

Patches 1 to 3 prepare the code to handle surprise removal by marking
the device as broken in patch-4.

Patch summary:
patch-1: ensures that compiler optimization doesn't occur on vq->broken bit
patch-2: maintains the mirror sequence on VQ delete and VQ create
patch-3: protects vqs list for simultaneous access from reader and a writer
patch-4: handles surprise removal of virtio pci device which avoids
         call trace and system lockup

Parav Pandit (4):
  virtio: Improve vq->broken access to avoid any compiler optimization
  virtio: Keep vring_del_virtqueue() mirror of VQ create
  virtio: Protect vqs list access
  virtio_pci: Support surprise removal of virtio pci device

 drivers/virtio/virtio.c            |  1 +
 drivers/virtio/virtio_pci_common.c |  7 +++++++
 drivers/virtio/virtio_ring.c       | 17 ++++++++++++++---
 include/linux/virtio.h             |  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

-- 
2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization
  2021-07-17  7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
@ 2021-07-17  7:42 ` Parav Pandit via Virtualization
  2021-07-17 20:38   ` Michael S. Tsirkin
  2021-07-17  7:42 ` [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create Parav Pandit via Virtualization
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-17  7:42 UTC (permalink / raw)
  To: mst, virtualization

Currently vq->broken field is read by virtqueue_is_broken() in busy
loop in one context by virtnet_send_command().

vq->broken is set to true in other process context by
virtio_break_device(). Reader and writer are accessing it without any
synchronization. This may lead to a compiler optimization which may
result to optimize reading vq->broken only once.

Hence, force reading vq->broken on each invocation of
virtqueue_is_broken() and ensure that its update is visible.

Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.")
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 drivers/virtio/virtio_ring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 89bfe46a8a7f..7f379fe7d78d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2373,7 +2373,7 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->broken;
+	return READ_ONCE(vq->broken);
 }
 EXPORT_SYMBOL_GPL(virtqueue_is_broken);
 
@@ -2387,7 +2387,9 @@ void virtio_break_device(struct virtio_device *dev)
 
 	list_for_each_entry(_vq, &dev->vqs, list) {
 		struct vring_virtqueue *vq = to_vvq(_vq);
-		vq->broken = true;
+
+		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
+		smp_store_release(&vq->broken, true);
 	}
 }
 EXPORT_SYMBOL_GPL(virtio_break_device);
-- 
2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create
  2021-07-17  7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
  2021-07-17  7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
@ 2021-07-17  7:42 ` Parav Pandit via Virtualization
  2021-07-17  7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
  2021-07-17  7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
  3 siblings, 0 replies; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-17  7:42 UTC (permalink / raw)
  To: mst, virtualization

Keep the vring_del_virtqueue() mirror of the create routines.
i.e. to delete list entry first as it is added last during the create
routine.

Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 drivers/virtio/virtio_ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7f379fe7d78d..d2e1a7a21171 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2291,6 +2291,8 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	list_del(&_vq->list);
+
 	if (vq->we_own_ring) {
 		if (vq->packed_ring) {
 			vring_free_queue(vq->vq.vdev,
@@ -2321,7 +2323,6 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 		kfree(vq->split.desc_state);
 		kfree(vq->split.desc_extra);
 	}
-	list_del(&_vq->list);
 	kfree(vq);
 }
 EXPORT_SYMBOL_GPL(vring_del_virtqueue);
-- 
2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] virtio: Protect vqs list access
  2021-07-17  7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
  2021-07-17  7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
  2021-07-17  7:42 ` [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create Parav Pandit via Virtualization
@ 2021-07-17  7:42 ` Parav Pandit via Virtualization
  2021-07-17 20:41   ` Michael S. Tsirkin
  2021-07-17  7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
  3 siblings, 1 reply; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-17  7:42 UTC (permalink / raw)
  To: mst, virtualization

VQs may be accessed to mark the device broken while they are
created/destroyed. Hence protect the access to the vqs list.

Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.")
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 drivers/virtio/virtio.c      | 1 +
 drivers/virtio/virtio_ring.c | 8 ++++++++
 include/linux/virtio.h       | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 4b15c00c0a0a..a0d81e35ec4b 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -355,6 +355,7 @@ int register_virtio_device(struct virtio_device *dev)
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
 	INIT_LIST_HEAD(&dev->vqs);
+	rwlock_init(&dev->vqs_list_lock);
 
 	/*
 	 * device_add() causes the bus infrastructure to look for a matching
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d2e1a7a21171..66a91dec39d9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1755,7 +1755,9 @@ static struct virtqueue *vring_create_virtqueue_packed(
 			cpu_to_le16(vq->packed.event_flags_shadow);
 	}
 
+	write_lock(&vdev->vqs_list_lock);
 	list_add_tail(&vq->vq.list, &vdev->vqs);
+	write_unlock(&vdev->vqs_list_lock);
 	return &vq->vq;
 
 err_desc_extra:
@@ -2229,7 +2231,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	memset(vq->split.desc_state, 0, vring.num *
 			sizeof(struct vring_desc_state_split));
 
+	write_lock(&vdev->vqs_list_lock);
 	list_add_tail(&vq->vq.list, &vdev->vqs);
+	write_unlock(&vdev->vqs_list_lock);
 	return &vq->vq;
 
 err_extra:
@@ -2291,7 +2295,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	write_lock(&vq->vq.vdev->vqs_list_lock);
 	list_del(&_vq->list);
+	write_unlock(&vq->vq.vdev->vqs_list_lock);
 
 	if (vq->we_own_ring) {
 		if (vq->packed_ring) {
@@ -2386,12 +2392,14 @@ void virtio_break_device(struct virtio_device *dev)
 {
 	struct virtqueue *_vq;
 
+	read_lock(&dev->vqs_list_lock);
 	list_for_each_entry(_vq, &dev->vqs, list) {
 		struct vring_virtqueue *vq = to_vvq(_vq);
 
 		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
 		smp_store_release(&vq->broken, true);
 	}
+	read_unlock(&dev->vqs_list_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_break_device);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b1894e0323fa..1cf77d480ef3 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -115,6 +115,7 @@ struct virtio_device {
 	const struct virtio_config_ops *config;
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
+	rwlock_t vqs_list_lock;
 	u64 features;
 	void *priv;
 };
-- 
2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
  2021-07-17  7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
                   ` (2 preceding siblings ...)
  2021-07-17  7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
@ 2021-07-17  7:42 ` Parav Pandit via Virtualization
  2021-07-17 20:46   ` Michael S. Tsirkin
  3 siblings, 1 reply; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-17  7:42 UTC (permalink / raw)
  To: mst, virtualization

When a virtio pci device undergo surprise removal (aka async removaln in
PCIe spec), mark the device is broken so that any upper layer drivers can
abort any outstanding operation.

When a virtio net pci device undergo surprise removal which is used by a
NetworkManager, a below call trace was observed.

kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
Workqueue: events linkwatch_event
RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
Call Trace:
 virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
 ? __hw_addr_create_ex+0x85/0xc0
 __dev_mc_add+0x72/0x80
 igmp6_group_added+0xa7/0xd0
 ipv6_mc_up+0x3c/0x60
 ipv6_find_idev+0x36/0x80
 addrconf_add_dev+0x1e/0xa0
 addrconf_dev_config+0x71/0x130
 addrconf_notify+0x1f5/0xb40
 ? rtnl_is_locked+0x11/0x20
 ? __switch_to_asm+0x42/0x70
 ? finish_task_switch+0xaf/0x2c0
 ? raw_notifier_call_chain+0x3e/0x50
 raw_notifier_call_chain+0x3e/0x50
 netdev_state_change+0x67/0x90
 linkwatch_do_dev+0x3c/0x50
 __linkwatch_run_queue+0xd2/0x220
 linkwatch_event+0x21/0x30
 process_one_work+0x1c8/0x370
 worker_thread+0x30/0x380
 ? process_one_work+0x370/0x370
 kthread+0x118/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

Hence, add the ability to abort the command on surprise removal
which prevents infinite loop and system lockup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 drivers/virtio/virtio_pci_common.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 222d630c41fc..b35bb2d57f62 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
 	struct device *dev = get_device(&vp_dev->vdev.dev);
 
+	/*
+	 * Device is marked broken on surprise removal so that virtio upper
+	 * layers can abort any ongoing operation.
+	 */
+	if (!pci_device_is_present(pci_dev))
+		virtio_break_device(&vp_dev->vdev);
+
 	pci_disable_sriov(pci_dev);
 
 	unregister_virtio_device(&vp_dev->vdev);
-- 
2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization
  2021-07-17  7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
@ 2021-07-17 20:38   ` Michael S. Tsirkin
  2021-07-19  5:26     ` Parav Pandit via Virtualization
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-07-17 20:38 UTC (permalink / raw)
  To: Parav Pandit; +Cc: virtualization

On Sat, Jul 17, 2021 at 10:42:55AM +0300, Parav Pandit wrote:
> Currently vq->broken field is read by virtqueue_is_broken() in busy
> loop in one context by virtnet_send_command().
> 
> vq->broken is set to true in other process context by
> virtio_break_device(). Reader and writer are accessing it without any
> synchronization. This may lead to a compiler optimization which may
> result to optimize reading vq->broken only once.
> 
> Hence, force reading vq->broken on each invocation of
> virtqueue_is_broken() and ensure that its update is visible.
>
> Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.")

This is all theoretical right?
virtqueue_get_buf is not inlined so compiler generally
assumes any vq field can change.

I'm inclined to not include a Fixes
tag then. And please do change subject to say "theoretical"
to make that clear to people.
 
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
>  drivers/virtio/virtio_ring.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 89bfe46a8a7f..7f379fe7d78d 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2373,7 +2373,7 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> -	return vq->broken;
> +	return READ_ONCE(vq->broken);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_is_broken);
>  
> @@ -2387,7 +2387,9 @@ void virtio_break_device(struct virtio_device *dev)
>  
>  	list_for_each_entry(_vq, &dev->vqs, list) {
>  		struct vring_virtqueue *vq = to_vvq(_vq);
> -		vq->broken = true;
> +
> +		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
> +		smp_store_release(&vq->broken, true);

A bit puzzled here. Why do we need release semantics here?
IUC store_release does not generally pair with READ_ONCE - does it?

The commit log does not describe this either.

>  	}
>  }
>  EXPORT_SYMBOL_GPL(virtio_break_device);
> -- 
> 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] virtio: Protect vqs list access
  2021-07-17  7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
@ 2021-07-17 20:41   ` Michael S. Tsirkin
  2021-07-19  5:37     ` Parav Pandit via Virtualization
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-07-17 20:41 UTC (permalink / raw)
  To: Parav Pandit; +Cc: virtualization

On Sat, Jul 17, 2021 at 10:42:57AM +0300, Parav Pandit wrote:
> VQs may be accessed to mark the device broken while they are
> created/destroyed. Hence protect the access to the vqs list.
> 
> Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
>  drivers/virtio/virtio.c      | 1 +
>  drivers/virtio/virtio_ring.c | 8 ++++++++
>  include/linux/virtio.h       | 1 +
>  3 files changed, 10 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 4b15c00c0a0a..a0d81e35ec4b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -355,6 +355,7 @@ int register_virtio_device(struct virtio_device *dev)
>  	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>  
>  	INIT_LIST_HEAD(&dev->vqs);
> +	rwlock_init(&dev->vqs_list_lock);
>  
>  	/*
>  	 * device_add() causes the bus infrastructure to look for a matching

Let's just use a simple spinlock. I don't think we are worried about
scaling the breaking of devices to multiple CPUs.

> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d2e1a7a21171..66a91dec39d9 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1755,7 +1755,9 @@ static struct virtqueue *vring_create_virtqueue_packed(
>  			cpu_to_le16(vq->packed.event_flags_shadow);
>  	}
>  
> +	write_lock(&vdev->vqs_list_lock);
>  	list_add_tail(&vq->vq.list, &vdev->vqs);
> +	write_unlock(&vdev->vqs_list_lock);
>  	return &vq->vq;
>  
>  err_desc_extra:
> @@ -2229,7 +2231,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	memset(vq->split.desc_state, 0, vring.num *
>  			sizeof(struct vring_desc_state_split));
>  
> +	write_lock(&vdev->vqs_list_lock);
>  	list_add_tail(&vq->vq.list, &vdev->vqs);
> +	write_unlock(&vdev->vqs_list_lock);
>  	return &vq->vq;
>  
>  err_extra:
> @@ -2291,7 +2295,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> +	write_lock(&vq->vq.vdev->vqs_list_lock);
>  	list_del(&_vq->list);
> +	write_unlock(&vq->vq.vdev->vqs_list_lock);
>  
>  	if (vq->we_own_ring) {
>  		if (vq->packed_ring) {
> @@ -2386,12 +2392,14 @@ void virtio_break_device(struct virtio_device *dev)
>  {
>  	struct virtqueue *_vq;
>  
> +	read_lock(&dev->vqs_list_lock);
>  	list_for_each_entry(_vq, &dev->vqs, list) {
>  		struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
>  		smp_store_release(&vq->broken, true);
>  	}
> +	read_unlock(&dev->vqs_list_lock);
>  }
>  EXPORT_SYMBOL_GPL(virtio_break_device);
>  
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b1894e0323fa..1cf77d480ef3 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -115,6 +115,7 @@ struct virtio_device {
>  	const struct virtio_config_ops *config;
>  	const struct vringh_config_ops *vringh_config;
>  	struct list_head vqs;
> +	rwlock_t vqs_list_lock;
>  	u64 features;
>  	void *priv;
>  };
> -- 
> 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
  2021-07-17  7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
@ 2021-07-17 20:46   ` Michael S. Tsirkin
  2021-07-19  5:44     ` Parav Pandit via Virtualization
  2021-07-19  9:40     ` Cornelia Huck
  0 siblings, 2 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-07-17 20:46 UTC (permalink / raw)
  To: Parav Pandit; +Cc: virtualization

On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
> When a virtio pci device undergo surprise removal (aka async removaln in

typo

> PCIe spec), mark the device is broken so that any upper layer drivers can
> abort any outstanding operation.
> 
> When a virtio net pci device undergo surprise removal which is used by a
> NetworkManager, a below call trace was observed.
> 
> kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
> watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
> CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
> Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
> Workqueue: events linkwatch_event
> RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
> Call Trace:
>  virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
>  ? __hw_addr_create_ex+0x85/0xc0
>  __dev_mc_add+0x72/0x80
>  igmp6_group_added+0xa7/0xd0
>  ipv6_mc_up+0x3c/0x60
>  ipv6_find_idev+0x36/0x80
>  addrconf_add_dev+0x1e/0xa0
>  addrconf_dev_config+0x71/0x130
>  addrconf_notify+0x1f5/0xb40
>  ? rtnl_is_locked+0x11/0x20
>  ? __switch_to_asm+0x42/0x70
>  ? finish_task_switch+0xaf/0x2c0
>  ? raw_notifier_call_chain+0x3e/0x50
>  raw_notifier_call_chain+0x3e/0x50
>  netdev_state_change+0x67/0x90
>  linkwatch_do_dev+0x3c/0x50
>  __linkwatch_run_queue+0xd2/0x220
>  linkwatch_event+0x21/0x30
>  process_one_work+0x1c8/0x370
>  worker_thread+0x30/0x380
>  ? process_one_work+0x370/0x370
>  kthread+0x118/0x140
>  ? set_kthread_struct+0x40/0x40
>  ret_from_fork+0x1f/0x30
> 
> Hence, add the ability to abort the command on surprise removal
> which prevents infinite loop and system lockup.
> 
> Signed-off-by: Parav Pandit <parav@nvidia.com>

OK that's nice, but I suspect this is not enough.
For example we need to also fix up all config space reads
to handle all-ones patterns specially.

E.g.

        /* After writing 0 to device_status, the driver MUST wait for a read of
         * device_status to return 0 before reinitializing the device.
         * This will flush out the status write, and flush in device writes,
         * including MSI-X interrupts, if any.
         */
        while (vp_modern_get_status(mdev))
                msleep(1);

lots of code in drivers needs to be fixed too.

I guess we need to annotate drivers somehow to mark up
whether they support surprise removal? And maybe find a
way to let host know?



> ---
>  drivers/virtio/virtio_pci_common.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 222d630c41fc..b35bb2d57f62 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>  	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>  	struct device *dev = get_device(&vp_dev->vdev.dev);
>  
> +	/*
> +	 * Device is marked broken on surprise removal so that virtio upper
> +	 * layers can abort any ongoing operation.
> +	 */
> +	if (!pci_device_is_present(pci_dev))
> +		virtio_break_device(&vp_dev->vdev);
> +
>  	pci_disable_sriov(pci_dev);
>  
>  	unregister_virtio_device(&vp_dev->vdev);
> -- 
> 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization
  2021-07-17 20:38   ` Michael S. Tsirkin
@ 2021-07-19  5:26     ` Parav Pandit via Virtualization
  2021-07-19 12:04       ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-19  5:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Sunday, July 18, 2021 2:09 AM
> 
> On Sat, Jul 17, 2021 at 10:42:55AM +0300, Parav Pandit wrote:
> > Currently vq->broken field is read by virtqueue_is_broken() in busy
> > loop in one context by virtnet_send_command().
> >
> > vq->broken is set to true in other process context by
> > virtio_break_device(). Reader and writer are accessing it without any
> > synchronization. This may lead to a compiler optimization which may
> > result to optimize reading vq->broken only once.
> >
> > Hence, force reading vq->broken on each invocation of
> > virtqueue_is_broken() and ensure that its update is visible.
> >
> > Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all
> > virtqueues broken.")
> 
> This is all theoretical right?
> virtqueue_get_buf is not inlined so compiler generally assumes any vq field
> can change.
Broken bit checking cannot rely on some other kernel API for correctness.
So it possibly not hitting this case now, but we shouldn't depend other APIs usage to ensure correctness.

> 
> I'm inclined to not include a Fixes
> tag then. And please do change subject to say "theoretical"
> to make that clear to people.
> 
I do not have any strong opinion on fixes tag. But virtio_broken_queue() API should be self-contained; for that I am not sure if this just theoretical.
Do you mean theoretical, because we haven't hit this bug?

> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c
> > b/drivers/virtio/virtio_ring.c index 89bfe46a8a7f..7f379fe7d78d 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2373,7 +2373,7 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
> > {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > -	return vq->broken;
> > +	return READ_ONCE(vq->broken);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_is_broken);
> >
> > @@ -2387,7 +2387,9 @@ void virtio_break_device(struct virtio_device
> > *dev)
> >
> >  	list_for_each_entry(_vq, &dev->vqs, list) {
> >  		struct vring_virtqueue *vq = to_vvq(_vq);
> > -		vq->broken = true;
> > +
> > +		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
> > +		smp_store_release(&vq->broken, true);
> 
> A bit puzzled here. Why do we need release semantics here?
> IUC store_release does not generally pair with READ_ONCE - does it?
> 
It does; paired at few places, such as,

(a) in uverbs_main.c, default_async_file
(b) in netlink.c, cb_table
(c) fs/dcache.c i_dir_seq,

However, in this scenario, WRITE_ONCE() is enough. So I will simplify and use that in v1.


> The commit log does not describe this either.
In commit log I mentioned that "ensure that update is visible".
I think a better commit log is, to say: "ensure that broken bit is written".
I will send v2 with 
(a) updated comment
(b) replacing smp.. with WRITE_ONCE()
(c) dropping the fixes tag.

> 
> >  	}
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_break_device);
> > --
> > 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 3/4] virtio: Protect vqs list access
  2021-07-17 20:41   ` Michael S. Tsirkin
@ 2021-07-19  5:37     ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-19  5:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Sunday, July 18, 2021 2:11 AM
> 
> On Sat, Jul 17, 2021 at 10:42:57AM +0300, Parav Pandit wrote:
> > VQs may be accessed to mark the device broken while they are
> > created/destroyed. Hence protect the access to the vqs list.
> >
> > Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all
> > virtqueues broken.")
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > ---
> >  drivers/virtio/virtio.c      | 1 +
> >  drivers/virtio/virtio_ring.c | 8 ++++++++
> >  include/linux/virtio.h       | 1 +
> >  3 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index
> > 4b15c00c0a0a..a0d81e35ec4b 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -355,6 +355,7 @@ int register_virtio_device(struct virtio_device *dev)
> >  	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> >
> >  	INIT_LIST_HEAD(&dev->vqs);
> > +	rwlock_init(&dev->vqs_list_lock);
> >
> >  	/*
> >  	 * device_add() causes the bus infrastructure to look for a matching
> 
> Let's just use a simple spinlock. I don't think we are worried about scaling the
> breaking of devices to multiple CPUs.
>
It wasn't the scaling, just wanted to have the clarity on list access. But I realized that rwlock is bigger size too.
So yeah, will simplify to spinlock in v2.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
  2021-07-17 20:46   ` Michael S. Tsirkin
@ 2021-07-19  5:44     ` Parav Pandit via Virtualization
  2021-07-19 12:07       ` Michael S. Tsirkin
  2021-07-19  9:40     ` Cornelia Huck
  1 sibling, 1 reply; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-19  5:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Sunday, July 18, 2021 2:17 AM
> 
> On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
> > When a virtio pci device undergo surprise removal (aka async removaln
> > in
> 
> typo
Fixing it. Checkpatch.pl and codespell, both didn't catch it. 😊

> 
> OK that's nice, but I suspect this is not enough.
> For example we need to also fix up all config space reads to handle all-ones
> patterns specially.
> 
> E.g.
> 
>         /* After writing 0 to device_status, the driver MUST wait for a read of
>          * device_status to return 0 before reinitializing the device.
>          * This will flush out the status write, and flush in device writes,
>          * including MSI-X interrupts, if any.
>          */
>         while (vp_modern_get_status(mdev))
>                 msleep(1);
> 
> lots of code in drivers needs to be fixed too.
Above one particularly known to us in the hot plug area.
Above fix is needed to close the race of hot plug and unplug happening in parallel, which is occurring today, but less common.
It is in my todo list to fix it.
Will take care of it in near future in other series.

> 
> I guess we need to annotate drivers somehow to mark up whether they
> support surprise removal? And maybe find a way to let host know?
What is the benefit of it? Who can make use of that information?

In virtio pci case, it is similar improvement to what nvme pci driver went through few years back to support hot plug/unplug.
Lets complete this of fixes to make it little more robust like nvme.

> 
> 
> 
> > ---
> >  drivers/virtio/virtio_pci_common.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index 222d630c41fc..b35bb2d57f62 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev
> *pci_dev)
> >  	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> >  	struct device *dev = get_device(&vp_dev->vdev.dev);
> >
> > +	/*
> > +	 * Device is marked broken on surprise removal so that virtio upper
> > +	 * layers can abort any ongoing operation.
> > +	 */
> > +	if (!pci_device_is_present(pci_dev))
> > +		virtio_break_device(&vp_dev->vdev);
> > +
> >  	pci_disable_sriov(pci_dev);
> >
> >  	unregister_virtio_device(&vp_dev->vdev);
> > --
> > 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
  2021-07-17 20:46   ` Michael S. Tsirkin
  2021-07-19  5:44     ` Parav Pandit via Virtualization
@ 2021-07-19  9:40     ` Cornelia Huck
  1 sibling, 0 replies; 16+ messages in thread
From: Cornelia Huck @ 2021-07-19  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit; +Cc: virtualization

On Sat, Jul 17 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
>> When a virtio pci device undergo surprise removal (aka async removaln in
>
> typo
>
>> PCIe spec), mark the device is broken so that any upper layer drivers can
>> abort any outstanding operation.
>> 
>> When a virtio net pci device undergo surprise removal which is used by a
>> NetworkManager, a below call trace was observed.
>> 
>> kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
>> watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
>> CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
>> Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
>> Workqueue: events linkwatch_event
>> RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
>> Call Trace:
>>  virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
>>  ? __hw_addr_create_ex+0x85/0xc0
>>  __dev_mc_add+0x72/0x80
>>  igmp6_group_added+0xa7/0xd0
>>  ipv6_mc_up+0x3c/0x60
>>  ipv6_find_idev+0x36/0x80
>>  addrconf_add_dev+0x1e/0xa0
>>  addrconf_dev_config+0x71/0x130
>>  addrconf_notify+0x1f5/0xb40
>>  ? rtnl_is_locked+0x11/0x20
>>  ? __switch_to_asm+0x42/0x70
>>  ? finish_task_switch+0xaf/0x2c0
>>  ? raw_notifier_call_chain+0x3e/0x50
>>  raw_notifier_call_chain+0x3e/0x50
>>  netdev_state_change+0x67/0x90
>>  linkwatch_do_dev+0x3c/0x50
>>  __linkwatch_run_queue+0xd2/0x220
>>  linkwatch_event+0x21/0x30
>>  process_one_work+0x1c8/0x370
>>  worker_thread+0x30/0x380
>>  ? process_one_work+0x370/0x370
>>  kthread+0x118/0x140
>>  ? set_kthread_struct+0x40/0x40
>>  ret_from_fork+0x1f/0x30
>> 
>> Hence, add the ability to abort the command on surprise removal
>> which prevents infinite loop and system lockup.
>> 
>> Signed-off-by: Parav Pandit <parav@nvidia.com>
>
> OK that's nice, but I suspect this is not enough.
> For example we need to also fix up all config space reads
> to handle all-ones patterns specially.
>
> E.g.
>
>         /* After writing 0 to device_status, the driver MUST wait for a read of
>          * device_status to return 0 before reinitializing the device.
>          * This will flush out the status write, and flush in device writes,
>          * including MSI-X interrupts, if any.
>          */
>         while (vp_modern_get_status(mdev))
>                 msleep(1);
>
> lots of code in drivers needs to be fixed too.
>
> I guess we need to annotate drivers somehow to mark up
> whether they support surprise removal? And maybe find a
> way to let host know?

I'm wondering whether virtio-pci surprise removal would need more
support in drivers than virtio-ccw surprise removal; given that
virtio-ccw *only* supports surprise removal and I don't remember any
problem reports, the situation is probably not that bad.

Is surprise removal of block devices still a big problem? We have some
support for (non-virtio) ccw devices (e.g. dasd) via a 'disconnected'
state that was designed to mitigate problems with block devices that are
suddenly gone.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization
  2021-07-19  5:26     ` Parav Pandit via Virtualization
@ 2021-07-19 12:04       ` Michael S. Tsirkin
  2021-07-19 14:20         ` Parav Pandit via Virtualization
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-07-19 12:04 UTC (permalink / raw)
  To: Parav Pandit; +Cc: virtualization

On Mon, Jul 19, 2021 at 05:26:22AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Sunday, July 18, 2021 2:09 AM
> > 
> > On Sat, Jul 17, 2021 at 10:42:55AM +0300, Parav Pandit wrote:
> > > Currently vq->broken field is read by virtqueue_is_broken() in busy
> > > loop in one context by virtnet_send_command().
> > >
> > > vq->broken is set to true in other process context by
> > > virtio_break_device(). Reader and writer are accessing it without any
> > > synchronization. This may lead to a compiler optimization which may
> > > result to optimize reading vq->broken only once.
> > >
> > > Hence, force reading vq->broken on each invocation of
> > > virtqueue_is_broken() and ensure that its update is visible.
> > >
> > > Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all
> > > virtqueues broken.")
> > 
> > This is all theoretical right?
> > virtqueue_get_buf is not inlined so compiler generally assumes any vq field
> > can change.
> Broken bit checking cannot rely on some other kernel API for correctness.
> So it possibly not hitting this case now, but we shouldn't depend other APIs usage to ensure correctness.
> 
> > 
> > I'm inclined to not include a Fixes
> > tag then. And please do change subject to say "theoretical"
> > to make that clear to people.
> > 
> I do not have any strong opinion on fixes tag. But virtio_broken_queue() API should be self-contained; for that I am not sure if this just theoretical.
> Do you mean theoretical, because we haven't hit this bug?

Because with existing code I don't believe existing compilers are clever enough to
optimize this away.

> > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c
> > > b/drivers/virtio/virtio_ring.c index 89bfe46a8a7f..7f379fe7d78d 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -2373,7 +2373,7 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
> > > {
> > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > -	return vq->broken;
> > > +	return READ_ONCE(vq->broken);
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtqueue_is_broken);
> > >
> > > @@ -2387,7 +2387,9 @@ void virtio_break_device(struct virtio_device
> > > *dev)
> > >
> > >  	list_for_each_entry(_vq, &dev->vqs, list) {
> > >  		struct vring_virtqueue *vq = to_vvq(_vq);
> > > -		vq->broken = true;
> > > +
> > > +		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
> > > +		smp_store_release(&vq->broken, true);
> > 
> > A bit puzzled here. Why do we need release semantics here?
> > IUC store_release does not generally pair with READ_ONCE - does it?
> > 
> It does; paired at few places, such as,
> 
> (a) in uverbs_main.c, default_async_file
> (b) in netlink.c, cb_table
> (c) fs/dcache.c i_dir_seq,
> 
> However, in this scenario, WRITE_ONCE() is enough. So I will simplify and use that in v1.
> 
> 
> > The commit log does not describe this either.
> In commit log I mentioned that "ensure that update is visible".
> I think a better commit log is, to say: "ensure that broken bit is written".

"is read repeatedly" maybe.

> I will send v2 with 
> (a) updated comment
> (b) replacing smp.. with WRITE_ONCE()
> (c) dropping the fixes tag.
> 
> > 
> > >  	}
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtio_break_device);
> > > --
> > > 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
  2021-07-19  5:44     ` Parav Pandit via Virtualization
@ 2021-07-19 12:07       ` Michael S. Tsirkin
  2021-07-19 14:15         ` Parav Pandit via Virtualization
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-07-19 12:07 UTC (permalink / raw)
  To: Parav Pandit; +Cc: virtualization

On Mon, Jul 19, 2021 at 05:44:49AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Sunday, July 18, 2021 2:17 AM
> > 
> > On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
> > > When a virtio pci device undergo surprise removal (aka async removaln
> > > in
> > 
> > typo
> Fixing it. Checkpatch.pl and codespell, both didn't catch it. 😊
> 
> > 
> > OK that's nice, but I suspect this is not enough.
> > For example we need to also fix up all config space reads to handle all-ones
> > patterns specially.
> > 
> > E.g.
> > 
> >         /* After writing 0 to device_status, the driver MUST wait for a read of
> >          * device_status to return 0 before reinitializing the device.
> >          * This will flush out the status write, and flush in device writes,
> >          * including MSI-X interrupts, if any.
> >          */
> >         while (vp_modern_get_status(mdev))
> >                 msleep(1);
> > 
> > lots of code in drivers needs to be fixed too.
> Above one particularly known to us in the hot plug area.
> Above fix is needed to close the race of hot plug and unplug happening in parallel, which is occurring today, but less common.
> It is in my todo list to fix it.
> Will take care of it in near future in other series.
> 
> > 
> > I guess we need to annotate drivers somehow to mark up whether they
> > support surprise removal? And maybe find a way to let host know?
> What is the benefit of it? Who can make use of that information?

For example, host could avoid removing devices by hot removal if guest
is not ready for it. Or libosinfo could use that to tell users what to
do.

> In virtio pci case, it is similar improvement to what nvme pci driver went through few years back to support hot plug/unplug.
> Lets complete this of fixes to make it little more robust like nvme.

At least please mention in commit log that it's incomplete.

> > 
> > 
> > 
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c
> > > b/drivers/virtio/virtio_pci_common.c
> > > index 222d630c41fc..b35bb2d57f62 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev
> > *pci_dev)
> > >  	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > >  	struct device *dev = get_device(&vp_dev->vdev.dev);
> > >
> > > +	/*
> > > +	 * Device is marked broken on surprise removal so that virtio upper
> > > +	 * layers can abort any ongoing operation.
> > > +	 */
> > > +	if (!pci_device_is_present(pci_dev))
> > > +		virtio_break_device(&vp_dev->vdev);
> > > +
> > >  	pci_disable_sriov(pci_dev);
> > >
> > >  	unregister_virtio_device(&vp_dev->vdev);
> > > --
> > > 2.27.0
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
  2021-07-19 12:07       ` Michael S. Tsirkin
@ 2021-07-19 14:15         ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-19 14:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Monday, July 19, 2021 5:37 PM
> 
> On Mon, Jul 19, 2021 at 05:44:49AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Sunday, July 18, 2021 2:17 AM
> > >
> > > On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
> > > > When a virtio pci device undergo surprise removal (aka async
> > > > removaln in
> > >
> > > typo
> > Fixing it. Checkpatch.pl and codespell, both didn't catch it. 😊
> >
> > >
> > > OK that's nice, but I suspect this is not enough.
> > > For example we need to also fix up all config space reads to handle
> > > all-ones patterns specially.
> > >
> > > E.g.
> > >
> > >         /* After writing 0 to device_status, the driver MUST wait for a read of
> > >          * device_status to return 0 before reinitializing the device.
> > >          * This will flush out the status write, and flush in device writes,
> > >          * including MSI-X interrupts, if any.
> > >          */
> > >         while (vp_modern_get_status(mdev))
> > >                 msleep(1);
> > >
> > > lots of code in drivers needs to be fixed too.
> > Above one particularly known to us in the hot plug area.
> > Above fix is needed to close the race of hot plug and unplug happening in
> parallel, which is occurring today, but less common.
> > It is in my todo list to fix it.
> > Will take care of it in near future in other series.
> >
> > >
> > > I guess we need to annotate drivers somehow to mark up whether they
> > > support surprise removal? And maybe find a way to let host know?
> > What is the benefit of it? Who can make use of that information?
> 
> For example, host could avoid removing devices by hot removal if guest is
> not ready for it. Or libosinfo could use that to tell users what to do.
Not sure how to achieve it. Because this is decided by the pci hot plug driver at bit early stage.
And pci hot plug slot doesn't know what is going to arrive in it.

> 
> > In virtio pci case, it is similar improvement to what nvme pci driver went
> through few years back to support hot plug/unplug.
> > Lets complete this of fixes to make it little more robust like nvme.
> 
> At least please mention in commit log that it's incomplete.
This fix is self-contained. I don’t see mentioning about other bugs in this commit log.
But sure yes, I will mentioned in the cover letter that more improvements will be done subsequently.
Thanks.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization
  2021-07-19 12:04       ` Michael S. Tsirkin
@ 2021-07-19 14:20         ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 16+ messages in thread
From: Parav Pandit via Virtualization @ 2021-07-19 14:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Monday, July 19, 2021 5:35 PM
> 
> On Mon, Jul 19, 2021 at 05:26:22AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Sunday, July 18, 2021 2:09 AM
> > >
> > > On Sat, Jul 17, 2021 at 10:42:55AM +0300, Parav Pandit wrote:
> > > > Currently vq->broken field is read by virtqueue_is_broken() in
> > > > busy loop in one context by virtnet_send_command().
> > > >
> > > > vq->broken is set to true in other process context by
> > > > virtio_break_device(). Reader and writer are accessing it without
> > > > any synchronization. This may lead to a compiler optimization
> > > > which may result to optimize reading vq->broken only once.
> > > >
> > > > Hence, force reading vq->broken on each invocation of
> > > > virtqueue_is_broken() and ensure that its update is visible.
> > > >
> > > > Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all
> > > > virtqueues broken.")
> > >
> > > This is all theoretical right?
> > > virtqueue_get_buf is not inlined so compiler generally assumes any
> > > vq field can change.
> > Broken bit checking cannot rely on some other kernel API for correctness.
> > So it possibly not hitting this case now, but we shouldn't depend other APIs
> usage to ensure correctness.
> >
> > >
> > > I'm inclined to not include a Fixes
> > > tag then. And please do change subject to say "theoretical"
> > > to make that clear to people.
> > >
> > I do not have any strong opinion on fixes tag. But virtio_broken_queue()
> API should be self-contained; for that I am not sure if this just theoretical.
> > Do you mean theoretical, because we haven't hit this bug?
> 
> Because with existing code I don't believe existing compilers are clever
> enough to optimize this away.
Ok. got it. I will mention in the commit log.

> 
> > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c
> > > > b/drivers/virtio/virtio_ring.c index 89bfe46a8a7f..7f379fe7d78d
> > > > 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -2373,7 +2373,7 @@ bool virtqueue_is_broken(struct virtqueue
> > > > *_vq) {
> > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > >
> > > > -	return vq->broken;
> > > > +	return READ_ONCE(vq->broken);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_is_broken);
> > > >
> > > > @@ -2387,7 +2387,9 @@ void virtio_break_device(struct
> > > > virtio_device
> > > > *dev)
> > > >
> > > >  	list_for_each_entry(_vq, &dev->vqs, list) {
> > > >  		struct vring_virtqueue *vq = to_vvq(_vq);
> > > > -		vq->broken = true;
> > > > +
> > > > +		/* Pairs with READ_ONCE() in virtqueue_is_broken(). */
> > > > +		smp_store_release(&vq->broken, true);
> > >
> > > A bit puzzled here. Why do we need release semantics here?
> > > IUC store_release does not generally pair with READ_ONCE - does it?
> > >
> > It does; paired at few places, such as,
> >
> > (a) in uverbs_main.c, default_async_file
> > (b) in netlink.c, cb_table
> > (c) fs/dcache.c i_dir_seq,
> >
> > However, in this scenario, WRITE_ONCE() is enough. So I will simplify and
> use that in v1.
> >
> >
> > > The commit log does not describe this either.
> > In commit log I mentioned that "ensure that update is visible".
> > I think a better commit log is, to say: "ensure that broken bit is written".
> 
> "is read repeatedly" maybe.

I updated it to below in v2.

" Hence, force reading vq->broken on each invocation of
virtqueue_is_broken() and also force writing it so that such update is visible to the readers."


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-19 14:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-17  7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
2021-07-17 20:38   ` Michael S. Tsirkin
2021-07-19  5:26     ` Parav Pandit via Virtualization
2021-07-19 12:04       ` Michael S. Tsirkin
2021-07-19 14:20         ` Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
2021-07-17 20:41   ` Michael S. Tsirkin
2021-07-19  5:37     ` Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
2021-07-17 20:46   ` Michael S. Tsirkin
2021-07-19  5:44     ` Parav Pandit via Virtualization
2021-07-19 12:07       ` Michael S. Tsirkin
2021-07-19 14:15         ` Parav Pandit via Virtualization
2021-07-19  9:40     ` Cornelia Huck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.