All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
	linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Ming Lei <ming.lei@redhat.com>,
	Bart Van Assche <bart.vanassche@wdc.com>,
	Roman Pen <roman.penyaev@profitbricks.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 4.14 37/52] blk-mq: reinit q->tag_set_list entry only after grace period
Date: Sun, 24 Jun 2018 23:21:30 +0800	[thread overview]
Message-ID: <20180624142746.685488450@linuxfoundation.org> (raw)
In-Reply-To: <20180624142744.234164867@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roman Pen <roman.penyaev@profitbricks.com>

commit a347c7ad8edf4c5685154f3fdc3c12fc1db800ba upstream.

It is not allowed to reinit q->tag_set_list list entry while RCU grace
period has not completed yet, otherwise the following soft lockup in
blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664]  <IRQ>
[ 1064.256824]  blk_mq_free_request+0xea/0x100
[ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007]  irq_poll_softirq+0xb7/0xe0
[ 1064.258165]  __do_softirq+0x106/0x2a2
[ 1064.258328]  irq_exit+0x92/0xa0
[ 1064.258509]  do_IRQ+0x4a/0xd0
[ 1064.258660]  common_interrupt+0x7a/0x7a
[ 1064.258818]  </IRQ>

Meanwhile another context frees other queue but with the same set of
shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash            D    0  5910   5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315]  schedule+0x32/0x80
[ 1288.202462]  schedule_timeout+0x1e5/0x380
[ 1288.203838]  wait_for_completion+0xb0/0x120
[ 1288.204137]  __wait_rcu_gp+0x125/0x160
[ 1288.204287]  synchronize_sched+0x6e/0x80
[ 1288.204770]  blk_mq_free_queue+0x74/0xe0
[ 1288.204922]  blk_cleanup_queue+0xc7/0x110
[ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548]  kernfs_fop_write+0x109/0x180
[ 1288.206328]  vfs_write+0xb3/0x1a0
[ 1288.206476]  SyS_write+0x52/0xc0
[ 1288.206624]  do_syscall_64+0x68/0x1d0
[ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed and now task is in
   blk_mq_del_queue_tag_set().
3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
   tag list in order to find hctx to restart.

Because linked list entry was modified in blk_mq_del_queue_tag_set()
without proper waiting for a grace period, blk_mq_sched_restart()
never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup.

Fix is simple: reinit list entry after an RCU grace period elapsed.

Fixes: Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/blk-mq.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2252,7 +2252,6 @@ static void blk_mq_del_queue_tag_set(str
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2260,8 +2259,8 @@ static void blk_mq_del_queue_tag_set(str
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-
 	synchronize_rcu();
+	INIT_LIST_HEAD(&q->tag_set_list);
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,



  parent reply	other threads:[~2018-06-24 15:31 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-24 15:20 [PATCH 4.14 00/52] 4.14.52-stable review Greg Kroah-Hartman
2018-06-24 15:20 ` [PATCH 4.14 01/52] bonding: re-evaluate force_primary when the primary slave name changes Greg Kroah-Hartman
2018-06-24 15:20 ` [PATCH 4.14 03/52] ipv6: allow PMTU exceptions to local routes Greg Kroah-Hartman
2018-06-24 15:20 ` [PATCH 4.14 04/52] net: dsa: add error handling for pskb_trim_rcsum Greg Kroah-Hartman
2018-06-24 15:20 ` [PATCH 4.14 05/52] net/sched: act_simple: fix parsing of TCA_DEF_DATA Greg Kroah-Hartman
2018-06-24 15:20 ` [PATCH 4.14 06/52] tcp: verify the checksum of the first data segment in a new connection Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 07/52] socket: close race condition between sock_close() and sockfs_setattr() Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 08/52] udp: fix rx queue len reported by diag and proc interface Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 09/52] net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 10/52] hv_netvsc: Fix a network regression after ifdown/ifup Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 11/52] tls: fix use-after-free in tls_push_record Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 12/52] NFSv4.1: Fix up replays of interrupted requests Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 13/52] ext4: fix hole length detection in ext4_ind_map_blocks() Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 14/52] ext4: update mtime in ext4_punch_hole even if no blocks are released Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 15/52] ext4: do not allow external inodes for inline data Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 16/52] ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget() Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 17/52] ext4: correctly handle a zero-length xattr with a non-zero e_value_offs Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 18/52] ext4: fix fencepost error in check for inode count overflow during resize Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 19/52] driver core: Dont ignore class_dir_create_and_add() failure Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 20/52] Btrfs: fix clone vs chattr NODATASUM race Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 21/52] Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2() Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 22/52] btrfs: return error value if create_io_em failed in cow_file_range Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 23/52] btrfs: scrub: Dont use inode pages for device replace Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 24/52] ALSA: hda/realtek - Enable mic-mute hotkey for several Lenovo AIOs Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 25/52] ALSA: hda/conexant - Add fixup for HP Z2 G4 workstation Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 26/52] ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream() Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 27/52] ALSA: hda: add dock and led support for HP EliteBook 830 G5 Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 28/52] ALSA: hda: add dock and led support for HP ProBook 640 G4 Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 29/52] x86/MCE: Fix stack out-of-bounds write in mce-inject.c: Flags_read() Greg Kroah-Hartman
2018-06-24 15:21   ` [4.14,29/52] " Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 30/52] smb3: fix various xid leaks Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 31/52] smb3: on reconnect set PreviousSessionId field Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 32/52] CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiry Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 33/52] cifs: For SMB2 security informaion query, check for minimum sized security descriptor instead of sizeof FileAllInformation class Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 34/52] nbd: fix nbd device deletion Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 35/52] nbd: update size when connected Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 36/52] nbd: use bd_set_size when updating disk size Greg Kroah-Hartman
2018-06-24 15:21 ` Greg Kroah-Hartman [this message]
2018-06-24 15:21 ` [PATCH 4.14 38/52] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue Greg Kroah-Hartman
2018-06-24 15:21   ` Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 39/52] cpufreq: Fix new policy initialization during limits updates via sysfs Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 40/52] cpufreq: governors: Fix long idle detection logic in load calculation Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 41/52] libata: zpodd: small read overflow in eject_tray() Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 42/52] libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 43/52] w1: mxc_w1: Enable clock before calling clk_get_rate() on it Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 44/52] x86/intel_rdt: Enable CMT and MBM on new Skylake stepping Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 45/52] iwlwifi: fw: harden page loading code Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 46/52] orangefs: set i_size on new symlink Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 47/52] orangefs: report attributes_mask and attributes for statx Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 48/52] HID: intel_ish-hid: ipc: register more pm callbacks to support hibernation Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 49/52] HID: wacom: Correct logical maximum Y for 2nd-gen Intuos Pro large Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 50/52] vhost: fix info leak due to uninitialized memory Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 51/52] fs/binfmt_misc.c: do not allow offset overflow Greg Kroah-Hartman
2018-06-24 15:21 ` [PATCH 4.14 52/52] mm, page_alloc: do not break __GFP_THISNODE by zonelist reset Greg Kroah-Hartman
2018-06-25  6:43 ` [PATCH 4.14 00/52] 4.14.52-stable review Naresh Kamboju
2018-06-25 17:19 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180624142746.685488450@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=bart.vanassche@wdc.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=roman.penyaev@profitbricks.com \
    --cc=sagi@grimberg.me \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.