All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>,
	virtualization@lists.linux-foundation.org,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: [PATCH AUTOSEL 5.10 05/18] virtio_pci: Support surprise removal of virtio pci device
Date: Mon, 23 Aug 2021 20:54:19 -0400	[thread overview]
Message-ID: <20210824005432.631154-5-sashal@kernel.org> (raw)
In-Reply-To: <20210824005432.631154-1-sashal@kernel.org>

From: Parav Pandit <parav@nvidia.com>

[ Upstream commit 43bb40c5b92659966bdf4bfe584fde0a3575a049 ]

When a virtio pci device undergo surprise removal (aka async removal in
PCIe spec), mark the device as broken so that any upper layer drivers can
abort any outstanding operation.

When a virtio net pci device undergo surprise removal which is used by a
NetworkManager, a below call trace was observed.

kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
Workqueue: events linkwatch_event
RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
Call Trace:
 virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
 ? __hw_addr_create_ex+0x85/0xc0
 __dev_mc_add+0x72/0x80
 igmp6_group_added+0xa7/0xd0
 ipv6_mc_up+0x3c/0x60
 ipv6_find_idev+0x36/0x80
 addrconf_add_dev+0x1e/0xa0
 addrconf_dev_config+0x71/0x130
 addrconf_notify+0x1f5/0xb40
 ? rtnl_is_locked+0x11/0x20
 ? __switch_to_asm+0x42/0x70
 ? finish_task_switch+0xaf/0x2c0
 ? raw_notifier_call_chain+0x3e/0x50
 raw_notifier_call_chain+0x3e/0x50
 netdev_state_change+0x67/0x90
 linkwatch_do_dev+0x3c/0x50
 __linkwatch_run_queue+0xd2/0x220
 linkwatch_event+0x21/0x30
 process_one_work+0x1c8/0x370
 worker_thread+0x30/0x380
 ? process_one_work+0x370/0x370
 kthread+0x118/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

Hence, add the ability to abort the command on surprise removal
which prevents infinite loop and system lockup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/virtio/virtio_pci_common.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 222d630c41fc..b35bb2d57f62 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
 	struct device *dev = get_device(&vp_dev->vdev.dev);
 
+	/*
+	 * Device is marked broken on surprise removal so that virtio upper
+	 * layers can abort any ongoing operation.
+	 */
+	if (!pci_device_is_present(pci_dev))
+		virtio_break_device(&vp_dev->vdev);
+
 	pci_disable_sriov(pci_dev);
 
 	unregister_virtio_device(&vp_dev->vdev);
-- 
2.30.2

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

WARNING: multiple messages have this Message-ID (diff)
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Parav Pandit <parav@nvidia.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Sasha Levin <sashal@kernel.org>,
	virtualization@lists.linux-foundation.org
Subject: [PATCH AUTOSEL 5.10 05/18] virtio_pci: Support surprise removal of virtio pci device
Date: Mon, 23 Aug 2021 20:54:19 -0400	[thread overview]
Message-ID: <20210824005432.631154-5-sashal@kernel.org> (raw)
In-Reply-To: <20210824005432.631154-1-sashal@kernel.org>

From: Parav Pandit <parav@nvidia.com>

[ Upstream commit 43bb40c5b92659966bdf4bfe584fde0a3575a049 ]

When a virtio pci device undergo surprise removal (aka async removal in
PCIe spec), mark the device as broken so that any upper layer drivers can
abort any outstanding operation.

When a virtio net pci device undergo surprise removal which is used by a
NetworkManager, a below call trace was observed.

kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
Workqueue: events linkwatch_event
RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
Call Trace:
 virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
 ? __hw_addr_create_ex+0x85/0xc0
 __dev_mc_add+0x72/0x80
 igmp6_group_added+0xa7/0xd0
 ipv6_mc_up+0x3c/0x60
 ipv6_find_idev+0x36/0x80
 addrconf_add_dev+0x1e/0xa0
 addrconf_dev_config+0x71/0x130
 addrconf_notify+0x1f5/0xb40
 ? rtnl_is_locked+0x11/0x20
 ? __switch_to_asm+0x42/0x70
 ? finish_task_switch+0xaf/0x2c0
 ? raw_notifier_call_chain+0x3e/0x50
 raw_notifier_call_chain+0x3e/0x50
 netdev_state_change+0x67/0x90
 linkwatch_do_dev+0x3c/0x50
 __linkwatch_run_queue+0xd2/0x220
 linkwatch_event+0x21/0x30
 process_one_work+0x1c8/0x370
 worker_thread+0x30/0x380
 ? process_one_work+0x370/0x370
 kthread+0x118/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

Hence, add the ability to abort the command on surprise removal
which prevents infinite loop and system lockup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/virtio/virtio_pci_common.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 222d630c41fc..b35bb2d57f62 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
 	struct device *dev = get_device(&vp_dev->vdev.dev);
 
+	/*
+	 * Device is marked broken on surprise removal so that virtio upper
+	 * layers can abort any ongoing operation.
+	 */
+	if (!pci_device_is_present(pci_dev))
+		virtio_break_device(&vp_dev->vdev);
+
 	pci_disable_sriov(pci_dev);
 
 	unregister_virtio_device(&vp_dev->vdev);
-- 
2.30.2


  parent reply	other threads:[~2021-08-24  0:54 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24  0:54 [PATCH AUTOSEL 5.10 01/18] iwlwifi: pnvm: accept multiple HW-type TLVs Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 02/18] opp: remove WARN when no valid OPPs remain Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 03/18] cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 04/18] virtio: Improve vq->broken access to avoid any compiler optimization Sasha Levin
2021-08-24  0:54   ` Sasha Levin
2021-08-24  0:54 ` Sasha Levin [this message]
2021-08-24  0:54   ` [PATCH AUTOSEL 5.10 05/18] virtio_pci: Support surprise removal of virtio pci device Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 06/18] virtio_vdpa: reject invalid vq indices Sasha Levin
2021-08-24  0:54   ` Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 07/18] vringh: Use wiov->used to check for read/write desc order Sasha Levin
2021-08-24  0:54   ` Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 08/18] tools/virtio: fix build Sasha Levin
2021-08-24  0:54   ` Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 09/18] qed: qed ll2 race condition fixes Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 10/18] qed: Fix null-pointer dereference in qed_rdma_create_qp() Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 11/18] Revert "drm/amd/pm: fix workload mismatch on vega10" Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 12/18] drm/amd/pm: change the workload type for some cards Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 13/18] blk-mq: don't grab rq's refcount in blk_mq_check_expired() Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 14/18] drm: Copy drm_wait_vblank to user before returning Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 15/18] drm/nouveau/disp: power down unused DP links during init Sasha Levin
2021-08-24  0:54   ` [Nouveau] " Sasha Levin
2021-08-24  0:54 ` [Nouveau] [PATCH AUTOSEL 5.10 16/18] drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences Sasha Levin
2021-08-24  0:54   ` Sasha Levin
2021-08-24 17:07   ` [Nouveau] " Lyude Paul
2021-08-24 17:07     ` Lyude Paul
2021-08-24 17:07     ` Lyude Paul
2021-08-24  0:54 ` [Nouveau] [PATCH AUTOSEL 5.10 17/18] drm/nouveau: block a bunch of classes from userspace Sasha Levin
2021-08-24  0:54   ` Sasha Levin
2021-08-24  0:54 ` [PATCH AUTOSEL 5.10 18/18] net/rds: dma_map_sg is entitled to merge entries Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210824005432.631154-5-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.