linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Potnuri Bharat Teja <bharat@chelsio.com>,
	Samuel Jones <sjones@kalrayinc.com>,
	Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Subject: [PATCH 5.10 083/103] nvme-tcp: Fix possible race of io_work and direct send
Date: Fri, 15 Jan 2021 13:28:16 +0100	[thread overview]
Message-ID: <20210115122010.034701969@linuxfoundation.org> (raw)
In-Reply-To: <20210115122006.047132306@linuxfoundation.org>

From: Sagi Grimberg <sagi@grimberg.me>

commit 5c11f7d9f843bdd24cd29b95401938bc3f168070 upstream.

We may send a request (with or without its data) from two paths:

  1. From our I/O context nvme_tcp_io_work which is triggered from:
    - queue_rq
    - r2t reception
    - socket data_ready and write_space callbacks
  2. Directly from queue_rq if the send_list is empty (because we want to
     save the context switch associated with scheduling our io_work).

However, given that now we have the send_mutex, we may run into a race
condition where none of these contexts will send the pending payload to
the controller. Both io_work send path and queue_rq send path
opportunistically attempt to acquire the send_mutex however queue_rq only
attempts to send a single request, and if io_work context fails to
acquire the send_mutex it will complete without rescheduling itself.

The race can trigger with the following sequence:

  1. queue_rq sends request (no incapsule data) and blocks
  2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
     to the send_list and schedules io_work
  3. io_work triggers and cannot acquire the send_mutex - because of (1),
     ends without self rescheduling
  4. queue_rq completes the send, and completes

==> no context will send the h2cdata - timeout.

Fix this by having queue_rq sending as much as it can from the send_list
such that if it still has any left, its because the socket buffer is
full and the socket write_space callback will trigger, thus guaranteeing
that a context will be scheduled to send the h2cdata PDU.

Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/tcp.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(
 	}
 }
 
+static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
+{
+	int ret;
+
+	/* drain the send queue as much as we can... */
+	do {
+		ret = nvme_tcp_try_send(queue);
+	} while (ret > 0);
+}
+
 static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
 		bool sync, bool last)
 {
@@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_reques
 	if (queue->io_cpu == smp_processor_id() &&
 	    sync && empty && mutex_trylock(&queue->send_mutex)) {
 		queue->more_requests = !last;
-		nvme_tcp_try_send(queue);
+		nvme_tcp_send_all(queue);
 		queue->more_requests = false;
 		mutex_unlock(&queue->send_mutex);
 	} else if (last) {



  parent reply	other threads:[~2021-01-15 12:43 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-15 12:26 [PATCH 5.10 000/103] 5.10.8-rc1 review Greg Kroah-Hartman
2021-01-15 12:26 ` [PATCH 5.10 001/103] powerpc/32s: Fix RTAS machine check with VMAP stack Greg Kroah-Hartman
2021-01-15 12:26 ` [PATCH 5.10 002/103] io_uring: synchronise IOPOLL on task_submit fail Greg Kroah-Hartman
2021-01-15 12:26 ` [PATCH 5.10 003/103] io_uring: limit {io|sq}poll submit locking scope Greg Kroah-Hartman
2021-01-15 12:26 ` [PATCH 5.10 004/103] io_uring: patch up IOPOLL overflow_flush sync Greg Kroah-Hartman
2021-01-15 12:26 ` [PATCH 5.10 005/103] RDMA/hns: Avoid filling sl in high 3 bits of vlan_id Greg Kroah-Hartman
2021-01-15 12:26 ` [PATCH 5.10 006/103] iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 007/103] drm/panfrost: Dont corrupt the queue mutex on open/close Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 008/103] io_uring: Fix return value from alloc_fixed_file_ref_node Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 009/103] scsi: ufs: Fix -Wsometimes-uninitialized warning Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 010/103] btrfs: skip unnecessary searches for xattrs when logging an inode Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 011/103] btrfs: fix deadlock when cloning inline extent and low on free metadata space Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 012/103] btrfs: shrink delalloc pages instead of full inodes Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 013/103] net: cdc_ncm: correct overhead in delayed_ndp_size Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 014/103] net: hns3: fix incorrect handling of sctp6 rss tuple Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 015/103] net: hns3: fix the number of queues actually used by ARQ Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 016/103] net: hns3: fix a phy loopback fail issue Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 017/103] net: stmmac: dwmac-sun8i: Fix probe error handling Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 018/103] net: stmmac: dwmac-sun8i: Balance internal PHY resource references Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 019/103] net: stmmac: dwmac-sun8i: Balance internal PHY power Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 020/103] net: stmmac: dwmac-sun8i: Balance syscon (de)initialization Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 021/103] net: vlan: avoid leaks on register_vlan_dev() failures Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 022/103] net/sonic: Fix some resource leaks in error handling paths Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 023/103] net: bareudp: add missing error handling for bareudp_link_config() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 024/103] ptp: ptp_ines: prevent build when HAS_IOMEM is not set Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 025/103] net: ipv6: fib: flush exceptions when purging route Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 026/103] tools: selftests: add test for changing routes with PTMU exceptions Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 027/103] net: fix pmtu check in nopmtudisc mode Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 028/103] net: ip: always refragment ip defragmented packets Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 029/103] chtls: Fix hardware tid leak Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 030/103] chtls: Remove invalid set_tcb call Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 031/103] chtls: Fix panic when route to peer not configured Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 032/103] chtls: Avoid unnecessary freeing of oreq pointer Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 033/103] chtls: Replace skb_dequeue with skb_peek Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 034/103] chtls: Added a check to avoid NULL pointer dereference Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 035/103] chtls: Fix chtls resources release sequence Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 036/103] octeontx2-af: fix memory leak of lmac and lmac->name Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 037/103] nexthop: Fix off-by-one error in error path Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 038/103] nexthop: Unlink nexthop group entry " Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 039/103] nexthop: Bounce NHA_GATEWAY in FDB nexthop groups Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 040/103] s390/qeth: fix deadlock during recovery Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 041/103] s390/qeth: fix locking for discipline setup / removal Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 042/103] s390/qeth: fix L2 header access in qeth_l3_osa_features_check() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 043/103] net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 044/103] net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 045/103] net/mlx5e: ethtool, Fix restriction of autoneg with 56G Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 046/103] net/mlx5e: In skb build skip setting mark in switchdev mode Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 047/103] net/mlx5: Check if lag is supported before creating one Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 048/103] scsi: lpfc: Fix variable vport set but not used in lpfc_sli4_abts_err_handler() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 049/103] ionic: start queues before announcing link up Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 050/103] HID: wacom: Fix memory leakage caused by kfifo_alloc Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 051/103] fanotify: Fix sys_fanotify_mark() on native x86-32 Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 052/103] ARM: OMAP2+: omap_device: fix idling of devices during probe Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 053/103] i2c: sprd: use a specific timeout to avoid system hang up issue Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 054/103] dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 055/103] selftests/bpf: Clarify build error if no vmlinux Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 056/103] can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 057/103] can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 058/103] can: kvaser_pciefd: select CONFIG_CRC32 Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 059/103] spi: spi-geni-qcom: Fail new xfers if xfer/cancel/abort pending Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 060/103] cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 061/103] spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 062/103] spi: stm32: FIFO threshold level - fix align packet size Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 063/103] i2c: i801: Fix the i2c-mux gpiod_lookup_table not being properly terminated Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 064/103] i2c: mediatek: Fix apdma and i2c hand-shake timeout Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 065/103] bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 5.10 066/103] interconnect: imx: Add a missing of_node_put after of_device_is_available Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 067/103] interconnect: qcom: fix rpmh link failures Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 068/103] dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 069/103] dmaengine: milbeaut-xdmac: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 070/103] dmaengine: xilinx_dma: check dma_async_device_register return value Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 071/103] dmaengine: xilinx_dma: fix incompatible param warning in _child_probe() Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 072/103] dmaengine: xilinx_dma: fix mixed_enum_type coverity warning Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 073/103] arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 074/103] qed: select CONFIG_CRC32 Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 075/103] phy: dp83640: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 076/103] wil6210: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 077/103] block: rsxx: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 078/103] lightnvm: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 079/103] zonefs: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 080/103] iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb() Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 081/103] iommu/intel: Fix memleak in intel_irq_remapping_alloc Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 082/103] bpftool: Fix compilation failure for net.o with older glibc Greg Kroah-Hartman
2021-01-15 12:28 ` Greg Kroah-Hartman [this message]
2021-01-15 12:28 ` [PATCH 5.10 084/103] net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 085/103] net/mlx5e: Fix two double free cases Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 086/103] regmap: debugfs: Fix a memory leak when calling regmap_attach_dev Greg Kroah-Hartman
2021-01-15 20:18   ` Pavel Machek
2021-01-15 20:22     ` Nathan Chancellor
2021-01-15 20:26     ` Pavel Machek
2021-01-15 12:28 ` [PATCH 5.10 087/103] wan: ds26522: select CONFIG_BITREVERSE Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 088/103] arm64: cpufeature: remove non-exist CONFIG_KVM_ARM_HOST Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 089/103] regulator: qcom-rpmh-regulator: correct hfsmps515 definition Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 090/103] net: mvpp2: disable force link UP during port init procedure Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 091/103] drm/i915/dp: Track pm_qos per connector Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 092/103] net: mvneta: fix error message when MTU too large for XDP Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 093/103] selftests: fib_nexthops: Fix wrong mausezahn invocation Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 094/103] KVM: arm64: Dont access PMCR_EL0 when no PMU is available Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 095/103] xsk: Fix race in SKB mode transmit with shared cq Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 096/103] xsk: Rollback reservation at NETDEV_TX_BUSY Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 097/103] block/rnbd-clt: avoid module unload race with close confirmation Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 098/103] can: isotp: isotp_getname(): fix kernel information leak Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 099/103] block: fix use-after-free in disk_part_iter_next Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 100/103] net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 101/103] regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init() Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 102/103] drm/panfrost: Remove unused variables in panfrost_job_close() Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 5.10 103/103] tools headers UAPI: Sync linux/fscrypt.h with the kernel sources Greg Kroah-Hartman
2021-01-15 21:13 ` [PATCH 5.10 000/103] 5.10.8-rc1 review Shuah Khan
2021-01-17 13:20   ` Greg Kroah-Hartman
2021-01-15 21:19 ` Guenter Roeck
2021-01-17 13:21   ` Greg Kroah-Hartman
2021-01-16  4:09 ` Naresh Kamboju
2021-01-17 13:21   ` Greg Kroah-Hartman
2021-01-16  7:57 ` Pavel Machek
2021-01-17 13:21   ` Greg Kroah-Hartman
2021-01-18  9:29     ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210115122010.034701969@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bharat@chelsio.com \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sagi@grimberg.me \
    --cc=sjones@kalrayinc.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).