All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: bhelgaas@google.com, davem@davemloft.net, decui@microsoft.com,
	edumazet@google.com, haiyangz@microsoft.com, jakeo@microsoft.com,
	kuba@kernel.org, kw@linux.com, kys@microsoft.com,
	leon@kernel.org, linux-pci@vger.kernel.org,
	lpieralisi@kernel.org, mikelley@microsoft.com, pabeni@redhat.com,
	robh@kernel.org, saeedm@nvidia.com, wei.liu@kernel.org,
	longli@microsoft.com, boqun.feng@gmail.com,
	ssengar@microsoft.com, helgaas@kernel.org
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
	josete@microsoft.com, simon.horman@corigine.com,
	stable@vger.kernel.org
Subject: [PATCH v4 1/5] PCI: hv: Fix a race condition bug in hv_pci_query_relations()
Date: Wed, 14 Jun 2023 21:44:47 -0700	[thread overview]
Message-ID: <20230615044451.5580-2-decui@microsoft.com> (raw)
In-Reply-To: <20230615044451.5580-1-decui@microsoft.com>

Since day 1 of the driver, there has been a race between
hv_pci_query_relations() and survey_child_resources(): during fast
device hotplug, hv_pci_query_relations() may error out due to
device-remove and the stack variable 'comp' is no longer valid;
however, pci_devices_present_work() -> survey_child_resources() ->
complete() may be running on another CPU and accessing the no-longer-valid
'comp'. Fix the race by flushing the workqueue before we exit from
hv_pci_query_relations().

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
---

v2:
  Removed the "debug code".
  No change to the patch body.
  Added Cc:stable

v3:
  Added Michael's Reviewed-by.

v4:
  Provided more details of the issue in the comment and commit log.
  Added Lorenzo's Acked-by.

 drivers/pci/controller/pci-hyperv.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index bc32662c6bb7f..ea8862e656b68 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3401,6 +3401,24 @@ static int hv_pci_query_relations(struct hv_device *hdev)
 	if (!ret)
 		ret = wait_for_response(hdev, &comp);
 
+	/*
+	 * In the case of fast device addition/removal, it's possible that
+	 * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+	 * already got a PCI_BUS_RELATIONS* message from the host and the
+	 * channel callback already scheduled a work to hbus->wq, which can be
+	 * running pci_devices_present_work() -> survey_child_resources() ->
+	 * complete(&hbus->survey_event), even after hv_pci_query_relations()
+	 * exits and the stack variable 'comp' is no longer valid; as a result,
+	 * a hang or a page fault may happen when the complete() calls
+	 * raw_spin_lock_irqsave(). Flush hbus->wq before we exit from
+	 * hv_pci_query_relations() to avoid the issues. Note: if 'ret' is
+	 * -ENODEV, there can't be any more work item scheduled to hbus->wq
+	 * after the flush_workqueue(): see vmbus_onoffer_rescind() ->
+	 * vmbus_reset_channel_cb(), vmbus_rescind_cleanup() ->
+	 * channel->rescind = true.
+	 */
+	flush_workqueue(hbus->wq);
+
 	return ret;
 }
 
-- 
2.25.1


  reply	other threads:[~2023-06-15  4:46 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15  4:44 [PATCH v4 0/5] pci-hyperv: Fix race condition bugs for fast device hotplug Dexuan Cui
2023-06-15  4:44 ` Dexuan Cui [this message]
2023-06-15  4:44 ` [PATCH v4 2/5] PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic Dexuan Cui
2023-06-15  4:44 ` [PATCH v4 3/5] PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev Dexuan Cui
2023-06-15  4:44 ` [PATCH v4 4/5] Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" Dexuan Cui
2023-06-15  4:44 ` [PATCH v4 5/5] PCI: hv: Add a per-bus mutex state_lock Dexuan Cui
2023-06-18  3:06 ` [PATCH v4 0/5] pci-hyperv: Fix race condition bugs for fast device hotplug Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230615044451.5580-2-decui@microsoft.com \
    --to=decui@microsoft.com \
    --cc=bhelgaas@google.com \
    --cc=boqun.feng@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=helgaas@kernel.org \
    --cc=jakeo@microsoft.com \
    --cc=josete@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kw@linux.com \
    --cc=kys@microsoft.com \
    --cc=leon@kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=lpieralisi@kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=robh@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=simon.horman@corigine.com \
    --cc=ssengar@microsoft.com \
    --cc=stable@vger.kernel.org \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.