From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	Alex Estrin <alex.estrin@intel.com>,
	Kaike Wan <kaike.wan@intel.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Jason Gunthorpe <jgg@mellanox.com>
Subject: [PATCH 4.19 108/110] IB/hfi1: Failed to drain send queue when QP is put into error state
Date: Thu, 18 Apr 2019 19:57:37 +0200
Message-ID: <20190418160447.737038054@linuxfoundation.org>
In-Reply-To: <20190418160437.484158340@linuxfoundation.org>

From: Kaike Wan <kaike.wan@intel.com>

commit 662d66466637862ef955f7f6e78a286d8cf0ebef upstream.

When a QP is put into error state, all pending requests in the send work
queue should be drained. The following sequence of events could lead to a
failure, causing a request to hang:

(1) The QP builds a packet and tries to send it through the SDMA engine.
    However, the PIO engine is still busy. Consequently, the packet is put
    on the QP's tx list and the QP is put on the PIO waiting list. The
    HFI1_S_WAIT_PIO_DRAIN bit is set in qp->s_flags;

(2) The QP is put into error state by the user application and
    notify_error_qp() is called, which removes the QP from the PIO waiting
    list and the packet from the QP's tx list. In addition, the
    RVT_S_ANY_WAIT_IO bits are cleared from qp->s_flags, but that mask
    does not include the HFI1_S_WAIT_PIO_DRAIN bit;

(3) The hfi1_schedule_send() function is called to drain the QP's send
    queue. Subsequently, hfi1_do_send() is called. Since the flag bit
    HFI1_S_WAIT_PIO_DRAIN is still set in qp->s_flags, hfi1_send_ok()
    fails. As a result, hfi1_do_send() bails out without draining any
    request from the send queue;

(4) The PIO engine completes the send and tries to wake up any QP on
    its waiting list. But the QP has already been removed from the PIO
    waiting list, so it is never woken and sleeps forever.

The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
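
The effect of the mask change can be sketched with a small user-space
model. This is only an illustration under stated assumptions: the bit
values are made up, struct toy_qp and the toy_* helpers are hypothetical
names, and toy_send_ok() checks only the wait-I/O bits, which is a
simplification of the real hfi1_send_ok().

```c
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in the driver headers. */
#define RVT_S_WAIT_PIO        0x1u
#define RVT_S_WAIT_TX         0x2u
#define RVT_S_ANY_WAIT_IO     (RVT_S_WAIT_PIO | RVT_S_WAIT_TX)
#define HFI1_S_WAIT_PIO_DRAIN 0x4u
#define HFI1_S_ANY_WAIT_IO    (RVT_S_ANY_WAIT_IO | HFI1_S_WAIT_PIO_DRAIN)

/* Hypothetical stand-in for the parts of the QP state used in this model. */
struct toy_qp {
	unsigned int s_flags;
	bool on_wait_list;
};

/* Simplified send check: the QP may send only if no wait-I/O bit is set. */
static bool toy_send_ok(const struct toy_qp *qp)
{
	return !(qp->s_flags & HFI1_S_ANY_WAIT_IO);
}

/* Models step (2): take the QP off the waiting list and clear clear_mask. */
static void toy_notify_error(struct toy_qp *qp, unsigned int clear_mask)
{
	if (qp->on_wait_list) {
		qp->s_flags &= ~clear_mask;
		qp->on_wait_list = false;
	}
}

/* Runs steps (1) to (3) and reports whether the send queue could drain. */
static bool toy_drains_after_error(unsigned int clear_mask)
{
	/* Step (1): the QP waits for PIO to drain and sits on the waiting list. */
	struct toy_qp qp = {
		.s_flags = HFI1_S_WAIT_PIO_DRAIN,
		.on_wait_list = true,
	};

	toy_notify_error(&qp, clear_mask);	/* step (2) */
	return toy_send_ok(&qp);		/* step (3) */
}
```

In this model, toy_drains_after_error(RVT_S_ANY_WAIT_IO) returns false
because HFI1_S_WAIT_PIO_DRAIN survives the clear while the QP is no longer
on any waiting list, whereas toy_drains_after_error(HFI1_S_ANY_WAIT_IO)
returns true.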

Fixes: 2e2ba09e48b7 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
Cc: <stable@vger.kernel.org> # 4.19.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/infiniband/hw/hfi1/qp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/infiniband/hw/hfi1/qp.c
+++ b/drivers/infiniband/hw/hfi1/qp.c
@@ -784,7 +784,7 @@ void notify_error_qp(struct rvt_qp *qp)
 		write_seqlock(lock);
 		if (!list_empty(&priv->s_iowait.list) &&
 		    !(qp->s_flags & RVT_S_BUSY)) {
-			qp->s_flags &= ~RVT_S_ANY_WAIT_IO;
+			qp->s_flags &= ~HFI1_S_ANY_WAIT_IO;
 			list_del_init(&priv->s_iowait.list);
 			priv->s_iowait.lock = NULL;
 			rvt_put_qp(qp);



