From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Yu Kuai <yukuai3@huawei.com>, Josef Bacik <josef@toxicpanda.com>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>,
	linux-block@vger.kernel.org, nbd@other.debian.org
Subject: [PATCH AUTOSEL 5.10 35/38] nbd: fix io hung while disconnecting device
Date: Tue,  7 Jun 2022 13:58:30 -0400
Message-ID: <20220607175835.480735-35-sashal@kernel.org>
In-Reply-To: <20220607175835.480735-1-sashal@kernel.org>

From: Yu Kuai <yukuai3@huawei.com>

[ Upstream commit 09dadb5985023e27d4740ebd17e6fea4640110e5 ]

In our tests, "qemu-nbd" triggers an I/O hang:

INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
      Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd        state:D stack:    0 pid:11445 ppid:     1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x480/0x1050
 ? _raw_spin_lock_irqsave+0x3e/0xb0
 schedule+0x9c/0x1b0
 blk_mq_freeze_queue_wait+0x9d/0xf0
 ? ipi_rseq+0x70/0x70
 blk_mq_freeze_queue+0x2b/0x40
 nbd_add_socket+0x6b/0x270 [nbd]
 nbd_ioctl+0x383/0x510 [nbd]
 blkdev_ioctl+0x18e/0x3e0
 __x64_sys_ioctl+0xac/0x120
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0

"qemu-nbd -d" first calls ioctl 'NBD_DISCONNECT'; however, the following
message was found:

block nbd0: Send disconnect failed -32

This indicates that something is wrong with the server. Then,
"qemu-nbd -d" calls ioctl 'NBD_CLEAR_SOCK'; however, the ioctl can't clear
requests after commit 2516ab1543fd ("nbd: only clear the queue on device
teardown"). In the meantime, requests can't complete through timeout
because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
means such requests will never be completed in this situation.

Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't
complete multiple times, switch back to calling nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.
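
For illustration only, the reasoning above can be modelled in a few lines
of userspace C. This is a rough sketch, not driver code: every toy_* name
is hypothetical, the 'inflight' field merely stands in for the role of
'NBD_CMD_INFLIGHT', and the timeout handler is assumed to always re-arm
the timer, as described for this situation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
enum toy_timeout_action { TOY_RESET_TIMER, TOY_DONE };

struct toy_request {
	bool inflight;	/* plays the role of an "in flight" flag */
	bool completed;
};

/* Assumed behaviour: the handler re-arms the timer and never completes. */
static enum toy_timeout_action toy_timeout(struct toy_request *rq)
{
	(void)rq;
	return TOY_RESET_TIMER;
}

/* Complete a request once; a second caller sees the flag clear and bails. */
static bool toy_complete(struct toy_request *rq)
{
	if (!rq->inflight)
		return false;
	rq->inflight = false;
	rq->completed = true;
	return true;
}

/* Complete everything still in flight; returns how many were completed. */
static int toy_clear_queue(struct toy_request *rqs, int n)
{
	int cleared = 0;

	assert(rqs != NULL);
	for (int i = 0; i < n; i++)
		if (toy_complete(&rqs[i]))
			cleared++;
	return cleared;
}

/* Count requests that would keep a queue freeze waiting. */
static int toy_pending(const struct toy_request *rqs, int n)
{
	int pending = 0;

	for (int i = 0; i < n; i++)
		if (!rqs[i].completed)
			pending++;
	return pending;
}
```

In this model, calling toy_timeout() any number of times leaves
toy_pending() unchanged, while a single toy_clear_queue() drains it, and a
repeated toy_clear_queue() completes nothing a second time.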

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 29f9f1bcf1cb..edd49f1b4499 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1355,7 +1355,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
 				 struct block_device *bdev)
 {
-	sock_shutdown(nbd);
+	nbd_clear_sock(nbd);
 	__invalidate_device(bdev, true);
 	nbd_bdev_reset(bdev);
 	if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
-- 
2.35.1

