All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Adrian Suhov <v-adsuho@microsoft.com>,
	Chris Valean <v-chvale@microsoft.com>,
	Dexuan Cui <decui@microsoft.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Michael Kelley <mikelley@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Jack Morgenstein <jackm@mellanox.com>
Subject: [PATCH 4.15 19/53] PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()
Date: Tue, 17 Apr 2018 17:58:44 +0200	[thread overview]
Message-ID: <20180417155724.035131836@linuxfoundation.org> (raw)
In-Reply-To: <20180417155723.091120060@linuxfoundation.org>

4.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dexuan Cui <decui@microsoft.com>

commit de0aa7b2f97d348ba7d1e17a00744c989baa0cb6 upstream.

1. With the patch "x86/vector/msi: Switch to global reservation mode",
the recent v4.15 and newer kernels always hang for 1-vCPU Hyper-V VM
with SR-IOV. This is because when we reach hv_compose_msi_msg() by
request_irq() -> request_threaded_irq() ->__setup_irq()->irq_startup()
-> __irq_startup() -> irq_domain_activate_irq() -> ... ->
msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
disabled in __setup_irq().

Note: when we reach hv_compose_msi_msg() by another code path:
pci_enable_msix_range() -> ... -> irq_domain_activate_irq() -> ... ->
hv_compose_msi_msg(), local irq is not disabled.

hv_compose_msi_msg() depends on an interrupt from the host.
With interrupts disabled, a UP VM always hangs in the busy loop in
the function, because the interrupt callback hv_pci_onchannelcallback()
can not be called.

We can do nothing but work it around by polling the channel. This
is ugly, but we don't have any other choice.

2. If the host is ejecting the VF device before we reach
hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
forever, because at this time the host doesn't respond to the
CREATE_INTERRUPT request. This issue exists the first day the
pci-hyperv driver appears in the kernel.

Luckily, this can also by worked around by polling the channel
for the PCI_EJECT message and hpdev->state, and by checking the
PCI vendor ID.

Note: actually the above 2 issues also happen to a SMP VM, if
"hbus->hdev->channel->target_cpu == smp_processor_id()" is true.

Fixes: 4900be83602b ("x86/vector/msi: Switch to global reservation mode")
Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Tested-by: Chris Valean <v-chvale@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: <stable@vger.kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/pci/host/pci-hyperv.c |   58 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -531,6 +531,8 @@ struct hv_pci_compl {
 	s32 completion_status;
 };
 
+static void hv_pci_onchannelcallback(void *context);
+
 /**
  * hv_pci_generic_compl() - Invoked for a completion packet
  * @context:		Set up by the sender of the packet.
@@ -675,6 +677,31 @@ static void _hv_pcifront_read_config(str
 	}
 }
 
+static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
+{
+	u16 ret;
+	unsigned long flags;
+	void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET +
+			     PCI_VENDOR_ID;
+
+	spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
+
+	/* Choose the function to be read. (See comment above) */
+	writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
+	/* Make sure the function was chosen before we start reading. */
+	mb();
+	/* Read from that function's config space. */
+	ret = readw(addr);
+	/*
+	 * mb() is not required here, because the spin_unlock_irqrestore()
+	 * is a barrier.
+	 */
+
+	spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+
+	return ret;
+}
+
 /**
  * _hv_pcifront_write_config() - Internal PCI config write
  * @hpdev:	The PCI driver's representation of the device
@@ -1117,8 +1144,37 @@ static void hv_compose_msi_msg(struct ir
 	 * Since this function is called with IRQ locks held, can't
 	 * do normal wait for completion; instead poll.
 	 */
-	while (!try_wait_for_completion(&comp.comp_pkt.host_event))
+	while (!try_wait_for_completion(&comp.comp_pkt.host_event)) {
+		/* 0xFFFF means an invalid PCI VENDOR ID. */
+		if (hv_pcifront_get_vendor_id(hpdev) == 0xFFFF) {
+			dev_err_once(&hbus->hdev->device,
+				     "the device has gone\n");
+			goto free_int_desc;
+		}
+
+		/*
+		 * When the higher level interrupt code calls us with
+		 * interrupt disabled, we must poll the channel by calling
+		 * the channel callback directly when channel->target_cpu is
+		 * the current CPU. When the higher level interrupt code
+		 * calls us with interrupt enabled, let's add the
+		 * local_bh_disable()/enable() to avoid race.
+		 */
+		local_bh_disable();
+
+		if (hbus->hdev->channel->target_cpu == smp_processor_id())
+			hv_pci_onchannelcallback(hbus);
+
+		local_bh_enable();
+
+		if (hpdev->state == hv_pcichild_ejecting) {
+			dev_err_once(&hbus->hdev->device,
+				     "the device is being ejected\n");
+			goto free_int_desc;
+		}
+
 		udelay(100);
+	}
 
 	if (comp.comp_pkt.completion_status < 0) {
 		dev_err(&hbus->hdev->device,

  parent reply	other threads:[~2018-04-17 15:58 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17 15:58 [PATCH 4.15 00/53] 4.15.18-stable review Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 01/53] drm/i915/edp: Do not do link training fallback or prune modes on EDP Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 02/53] netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to ip_set_net_exit() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 03/53] cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 04/53] rds: MP-RDS may use an invalid c_path Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 05/53] slip: Check if rstate is initialized before uncompressing Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 06/53] vhost: fix vhost_vq_access_ok() log check Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 07/53] l2tp: fix races in tunnel creation Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 08/53] l2tp: fix race in duplicate tunnel detection Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 09/53] ip_gre: clear feature flags when incompatible o_flags are set Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 10/53] vhost: Fix vhost_copy_to_user() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 11/53] lan78xx: Correctly indicate invalid OTP Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 12/53] media: v4l2-compat-ioctl32: dont oops on overlay Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 13/53] media: v4l: vsp1: Fix header display list status check in continuous mode Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 14/53] ipmi: Fix some error cleanup issues Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 15/53] parisc: Fix out of array access in match_pci_device() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 16/53] parisc: Fix HPMC handler by increasing size to multiple of 16 bytes Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 17/53] Drivers: hv: vmbus: do not mark HV_PCIE as perf_device Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 18/53] PCI: hv: Serialize the present and eject work items Greg Kroah-Hartman
2018-04-17 15:58 ` Greg Kroah-Hartman [this message]
2018-04-17 15:58 ` [PATCH 4.15 20/53] KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 21/53] perf/core: Fix use-after-free in uprobe_perf_close() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 22/53] x86/mce/AMD: Get address from already initialized block Greg Kroah-Hartman
2018-04-17 15:58   ` [4.15,22/53] " Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 23/53] hwmon: (ina2xx) Fix access to uninitialized mutex Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 24/53] ath9k: Protect queue draining by rcu_read_lock() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 25/53] x86/apic: Fix signedness bug in APIC ID validity checks Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 26/53] sunrpc: remove incorrect HMAC request initialization Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 27/53] f2fs: fix heap mode to reset it back Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 28/53] block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 29/53] nvme: Skip checking heads without namespaces Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 30/53] lib: fix stall in __bitmap_parselist() Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 31/53] blk-mq: order getting budget and driver tag Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 32/53] blk-mq: dont keep offline CPUs mapped to hctx 0 Greg Kroah-Hartman
2018-04-17 15:58   ` Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 33/53] ovl: fix lookup with middle layer opaque dir and absolute path redirects Greg Kroah-Hartman
2018-04-17 15:58 ` [PATCH 4.15 34/53] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 35/53] hugetlbfs: fix bug in pgoff overflow checking Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 36/53] nfsd: fix incorrect umasks Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 37/53] scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 38/53] apparmor: fix logging of the existence test for signals Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 39/53] apparmor: fix display of .ns_name for containers Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 40/53] apparmor: fix resource audit messages when auditing peer Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 41/53] block/loop: fix deadlock after loop_set_status Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 42/53] nfit: fix region registration vs block-data-window ranges Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 43/53] s390/qdio: dont retry EQBS after CCQ 96 Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 44/53] s390/qdio: dont merge ERROR output buffers Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 45/53] s390/ipl: ensure loadparm valid flag is set Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 46/53] s390/compat: fix setup_frame32 Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 47/53] get_user_pages_fast(): return -EFAULT on access_ok failure Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 48/53] mm/gup_benchmark: handle gup failures Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 49/53] getname_kernel() needs to make sure that ->name != ->iname in long case Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 50/53] Bluetooth: Fix connection if directed advertising and privacy is used Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 51/53] Bluetooth: hci_bcm: Treat Interrupt ACPI resources as always being active-low Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 52/53] rtl8187: Fix NULL pointer dereference in priv->conf_mutex Greg Kroah-Hartman
2018-04-17 15:59 ` [PATCH 4.15 53/53] ovl: set lower layer st_dev only if setting lower st_ino Greg Kroah-Hartman
2018-04-17 21:03 ` [PATCH 4.15 00/53] 4.15.18-stable review kernelci.org bot
2018-04-17 21:04 ` Shuah Khan
2018-04-18  5:22 ` Naresh Kamboju
2018-04-18 15:39 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180417155724.035131836@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=jackm@mellanox.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=mikelley@microsoft.com \
    --cc=stable@vger.kernel.org \
    --cc=sthemmin@microsoft.com \
    --cc=v-adsuho@microsoft.com \
    --cc=v-chvale@microsoft.com \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.