From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757555AbcFHQU1 (ORCPT );
	Wed, 8 Jun 2016 12:20:27 -0400
Received: from mail-pa0-f66.google.com ([209.85.220.66]:36498 "EHLO
	mail-pa0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1757045AbcFHQUZ (ORCPT );
	Wed, 8 Jun 2016 12:20:25 -0400
From: Cong Wang
To: linux-raid@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Cong Wang, Shaohua Li
Subject: [PATCH] md: use a mutex to protect a global list
Date: Wed, 8 Jun 2016 09:20:16 -0700
Message-Id: <1465402816-10882-1-git-send-email-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

We saw a list corruption in the list all_detected_devices:

 WARNING: CPU: 16 PID: 226 at lib/list_debug.c:29 __list_add+0x3c/0xa9()
 list_add corruption. next->prev should be prev (ffff880859d58320), but was ffff880859ce74c0. (next=ffffffff81abfdb0).
 Modules linked in: ahci libahci libata sd_mod scsi_mod
 CPU: 16 PID: 226 Comm: kworker/u241:4 Not tainted 4.1.20 #1
 Hardware name: Dell Inc. PowerEdge C6220/04GD66, BIOS 2.2.3 11/07/2013
 Workqueue: events_unbound async_run_entry_fn
  0000000000000000 ffff880859a5baf8 ffffffff81502872 ffff880859a5bb48
  0000000000000009 ffff880859a5bb38 ffffffff810692a5 ffff880859ee8828
  ffffffff812ad02c ffff880859d58320 ffffffff81abfdb0 ffff880859eb90c0
 Call Trace:
  [] dump_stack+0x4d/0x63
  [] warn_slowpath_common+0xa1/0xbb
  [] ? __list_add+0x3c/0xa9
  [] warn_slowpath_fmt+0x46/0x48
  [] __list_add+0x3c/0xa9
  [] md_autodetect_dev+0x41/0x62
  [] rescan_partitions+0x25f/0x29d
  [] ? mutex_lock+0x13/0x31
  [] __blkdev_get+0x1aa/0x3cd
  [] blkdev_get+0x5f/0x294
  [] ? put_device+0x17/0x19
  [] ? disk_put_part+0x12/0x14
  [] add_disk+0x29d/0x407
  [] ? __pm_runtime_use_autosuspend+0x5c/0x64
  [] sd_probe_async+0x115/0x1af [sd_mod]
  [] async_run_entry_fn+0x72/0x12c
  [] process_one_work+0x198/0x2ce
  [] worker_thread+0x1dd/0x2bb
  [] ? cancel_delayed_work_sync+0x15/0x15
  [] ? cancel_delayed_work_sync+0x15/0x15
  [] kthread+0xae/0xb6
  [] ? param_array_set+0x40/0xfa
  [] ? __kthread_parkme+0x61/0x61
  [] ret_from_fork+0x42/0x70
  [] ? __kthread_parkme+0x61/0x61

I suspect this is because no lock protects this global list:
autostart_arrays() is called in the ioctl() path, where no lock is held.

Cc: Shaohua Li
Signed-off-by: Cong Wang
---
 drivers/md/md.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 866825f..f2f1912 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8799,6 +8799,7 @@ EXPORT_SYMBOL(md_reload_sb);
  * at boot time.
  */
 
+static DEFINE_MUTEX(detected_devices_mutex);
 static LIST_HEAD(all_detected_devices);
 struct detected_devices_node {
 	struct list_head list;
@@ -8812,7 +8813,9 @@ void md_autodetect_dev(dev_t dev)
 	node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);
 	if (node_detected_dev) {
 		node_detected_dev->dev = dev;
+		mutex_lock(&detected_devices_mutex);
 		list_add_tail(&node_detected_dev->list, &all_detected_devices);
+		mutex_unlock(&detected_devices_mutex);
 	} else {
 		printk(KERN_CRIT "md: md_autodetect_dev: kzalloc failed"
 			", skipping dev(%d,%d)\n", MAJOR(dev), MINOR(dev));
@@ -8831,6 +8834,7 @@ static void autostart_arrays(int part)
 
 	printk(KERN_INFO "md: Autodetecting RAID arrays.\n");
 
+	mutex_lock(&detected_devices_mutex);
 	while (!list_empty(&all_detected_devices) && i_scanned < INT_MAX) {
 		i_scanned++;
 		node_detected_dev = list_entry(all_detected_devices.next,
@@ -8849,6 +8853,7 @@ static void autostart_arrays(int part)
 		list_add(&rdev->same_set, &pending_raid_disks);
 		i_passed++;
 	}
+	mutex_unlock(&detected_devices_mutex);
 
 	printk(KERN_INFO "md: Scanned %d and added %d devices.\n",
 						i_scanned, i_passed);
-- 
2.1.0
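
For readers who want to try the locking pattern outside the kernel, here is a minimal userspace sketch. It assumes POSIX threads in place of the kernel mutex API and a hand-rolled singly linked list in place of struct list_head. All names in it (struct dev_node, detect_dev(), drain_devs(), detect_many()) are hypothetical stand-ins and are not taken from drivers/md/md.c. It only illustrates the idea of the patch: the producer and the draining consumer take the same mutex around every access to the shared list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct detected_devices_node. */
struct dev_node {
	struct dev_node *next;
	unsigned int dev;
};

/* One mutex guards both the head and the tail pointer of the list. */
static pthread_mutex_t detected_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct dev_node *detected_head;
static struct dev_node **detected_tail = &detected_head;

/* Rough analogue of md_autodetect_dev(): append one node under the lock. */
int detect_dev(unsigned int dev)
{
	struct dev_node *node = calloc(1, sizeof(*node));

	if (!node)
		return -1;
	node->dev = dev;

	pthread_mutex_lock(&detected_mutex);
	*detected_tail = node;
	detected_tail = &node->next;
	pthread_mutex_unlock(&detected_mutex);
	return 0;
}

/*
 * Rough analogue of the loop in autostart_arrays(): hold the lock for the
 * whole drain, free every node, and return how many were removed.
 */
int drain_devs(void)
{
	int scanned = 0;

	pthread_mutex_lock(&detected_mutex);
	while (detected_head) {
		struct dev_node *node = detected_head;

		detected_head = node->next;
		free(node);
		scanned++;
	}
	detected_tail = &detected_head;
	pthread_mutex_unlock(&detected_mutex);
	return scanned;
}

/*
 * Thread entry point with the signature pthread_create() expects.
 * arg points to an int holding the number of devices to add.
 * Returns NULL on success, or arg if an allocation failed.
 */
void *detect_many(void *arg)
{
	assert(arg != NULL);
	int count = *(int *)arg;

	for (int i = 0; i < count; i++) {
		if (detect_dev((unsigned int)i) != 0)
			return arg;
	}
	return NULL;
}
```

To exercise the concurrent case, detect_many() can be passed to pthread_create() from several threads, with drain_devs() called after joining them. Without the lock/unlock pairs, two appends could interleave their updates to the tail pointer and lose or corrupt nodes, which is the userspace counterpart of the list_add corruption in the report.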