From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
	linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Ming Lei <ming.lei@redhat.com>,
	Bart Van Assche <bart.vanassche@wdc.com>,
	Roman Pen <roman.penyaev@profitbricks.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 4.16 38/64] blk-mq: reinit q->tag_set_list entry only after grace period
Date: Sun, 24 Jun 2018 23:22:19 +0800
Message-ID: <20180624142746.192170408@linuxfoundation.org>
In-Reply-To: <20180624142743.613370789@linuxfoundation.org>

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roman Pen <roman.penyaev@profitbricks.com>

commit a347c7ad8edf4c5685154f3fdc3c12fc1db800ba upstream.

The q->tag_set_list list entry must not be reinitialized before the RCU
grace period has completed; otherwise the following soft lockup occurs
in blk_mq_sched_restart():

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664]  <IRQ>
[ 1064.256824]  blk_mq_free_request+0xea/0x100
[ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007]  irq_poll_softirq+0xb7/0xe0
[ 1064.258165]  __do_softirq+0x106/0x2a2
[ 1064.258328]  irq_exit+0x92/0xa0
[ 1064.258509]  do_IRQ+0x4a/0xd0
[ 1064.258660]  common_interrupt+0x7a/0x7a
[ 1064.258818]  </IRQ>

Meanwhile, another context frees a different queue that uses the same
set of shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash            D    0  5910   5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315]  schedule+0x32/0x80
[ 1288.202462]  schedule_timeout+0x1e5/0x380
[ 1288.203838]  wait_for_completion+0xb0/0x120
[ 1288.204137]  __wait_rcu_gp+0x125/0x160
[ 1288.204287]  synchronize_sched+0x6e/0x80
[ 1288.204770]  blk_mq_free_queue+0x74/0xe0
[ 1288.204922]  blk_cleanup_queue+0xc7/0x110
[ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548]  kernfs_fop_write+0x109/0x180
[ 1288.206328]  vfs_write+0xb3/0x1a0
[ 1288.206476]  SyS_write+0x52/0xc0
[ 1288.206624]  do_syscall_64+0x68/0x1d0
[ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed, and the task is now in
   blk_mq_del_queue_tag_set().
3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
   the tag list in order to find an hctx to restart.

Because the linked list entry was modified in blk_mq_del_queue_tag_set()
without waiting for a grace period, blk_mq_sched_restart() never
finishes: it spins in list_for_each_entry_rcu_rr(), hence the soft
lockup.

The fix is simple: reinit the list entry only after an RCU grace period
has elapsed.
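
To illustrate why the early reinit is harmful, here is a minimal
user-space sketch. It is not kernel code: struct list_head is re-declared
locally, and init_list_head(), list_add_tail_sketch(),
list_del_rcu_sketch() and steps_to_reach() are hypothetical stand-ins
written for this example. The idea is that a reader standing on a removed
entry can still walk back into the list as long as that entry's ->next
pointer is left alone, but spins forever once the entry points to itself.

```c
#include <stddef.h>

/* Local stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): the entry points to itself. */
static inline void init_list_head(struct list_head *e)
{
	e->next = e;
	e->prev = e;
}

/* Append 'e' before 'head', i.e. at the tail of the list. */
static inline void list_add_tail_sketch(struct list_head *e,
					struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/*
 * Mimics the property of list_del_rcu() that matters here: the entry is
 * unlinked, but e->next is left intact so that concurrent readers which
 * are still standing on 'e' can keep walking.
 */
static inline void list_del_rcu_sketch(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = NULL; /* the kernel stores a poison value here */
}

/*
 * Models a reader that stands on 'pos' and follows ->next until it
 * arrives at 'stop'. Returns the number of steps taken, or -1 if 'stop'
 * was not reached within 'limit' steps (the "soft lockup" case).
 */
static inline int steps_to_reach(const struct list_head *pos,
				 const struct list_head *stop, int limit)
{
	int n = 0;

	while (n < limit) {
		pos = pos->next;
		n++;
		if (pos == stop)
			return n;
	}
	return -1;
}
```

With a list head -> q1 -> q2 -> q3 and a reader standing on q2, calling
list_del_rcu_sketch(&q2) still lets the reader get back to q1 through q3
and the head. Calling init_list_head(&q2) while the reader is still there
leaves q2.next pointing at q2, so the walk never terminates. Deferring the
reinit until after synchronize_rcu() guarantees no such reader remains.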

Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/blk-mq.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2513,7 +2513,6 @@ static void blk_mq_del_queue_tag_set(str
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2521,8 +2520,8 @@ static void blk_mq_del_queue_tag_set(str
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-
 	synchronize_rcu();
+	INIT_LIST_HEAD(&q->tag_set_list);
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,


