From: "Rafael J. Wysocki"
To: Xie XiuQi
Cc: lenb@kernel.org, guohanjun@huawei.com, hanjun.guo@linaro.org, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ACPI / HOTPLUG: fix device->physical_node_lock deadlock
Date: Tue, 07 Apr 2015 13:22:52 +0200
Message-ID: <5219717.RxvSXtBaZ8@vostro.rjw.lan>
In-Reply-To: <1428397392-26200-1-git-send-email-xiexiuqi@huawei.com>
References: <1428397392-26200-1-git-send-email-xiexiuqi@huawei.com>

On Tuesday, April 07, 2015 05:03:12 PM Xie XiuQi wrote:
> I ran into a deadlock during CPU hotplug. The code path is below:
> 
> Call Trace:
>  [] dump_stack+0x19/0x1b
>  [] validate_chain.isra.43+0xf4a/0x1120
>  [] ? sched_clock+0x9/0x10
>  [] ? sched_clock_local+0x1d/0x80
>  [] ? sched_clock_cpu+0xa8/0x100
>  [] __lock_acquire+0x3c6/0xb70
>  [] ? sched_clock_cpu+0xa8/0x100
>  [] lock_acquire+0xa2/0x1f0
>  [] ? acpi_scan_is_offline+0x2c/0xa3
>  [] mutex_lock_nested+0x94/0x3f0
>  [] ? acpi_scan_is_offline+0x2c/0xa3
>  [] ? acpi_scan_is_offline+0x2c/0xa3
>  [] ? trace_hardirqs_on+0xd/0x10
>  [] acpi_scan_is_offline+0x2c/0xa3    --> LOCK (DEADLOCK)

Is it the same device, actually?

acpi_container_offline() walks the *children* of the container, while
acpi_bus_offline() locks the container itself.

Or is that not the case?

>  [] acpi_container_offline+0x32/0x4e
>  [] container_offline+0x19/0x20
>  [] device_offline+0x95/0xc0
>  [] acpi_bus_offline+0xbc/0x126    --> LOCK
>  [] acpi_device_hotplug+0x236/0x46b
>  [] acpi_hotplug_work_fn+0x1e/0x29
>  [] process_one_work+0x220/0x710
>  [] ? process_one_work+0x1b4/0x710
>  [] worker_thread+0x11b/0x3a0
>  [] ? process_one_work+0x710/0x710
>  [] kthread+0xed/0x100
>  [] ? insert_kthread_work+0x80/0x80
>  [] ret_from_fork+0x7c/0xb0
>  [] ? insert_kthread_work+0x80/0x80
> 
> This deadlock was introduced by commit caa73ea
> ("ACPI / hotplug / driver core: Handle containers in a special way").
> 
> This patch introduces a lockless version, __acpi_scan_is_offline(), for
> acpi_container_offline() to call, in order to avoid this deadlock.

So why is this a correct approach?  Why can acpi_container_offline() suddenly
call __acpi_scan_is_offline() without taking the lock?
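For reference, the split made below is the common pattern of a lockless
__helper() for callers that already hold the lock, plus a locking wrapper for
everyone else.  A rough userspace sketch of that shape (pthreads and made-up
names standing in for the kernel mutex and the ACPI structures; this is an
illustration, not the actual code):

/*
 * Userspace illustration only (not kernel code): a lockless helper for
 * callers that already hold the lock, plus a locking wrapper.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct node {
	pthread_mutex_t lock;	/* stands in for physical_node_lock */
	bool offline;		/* stands in for the list-walk result */
};

/* Caller must already hold n->lock. */
static bool __node_is_offline(struct node *n)
{
	return n->offline;
}

/* For callers that do not hold the lock: take it, then use the helper. */
static bool node_is_offline(struct node *n)
{
	bool ret;

	pthread_mutex_lock(&n->lock);
	ret = __node_is_offline(n);
	pthread_mutex_unlock(&n->lock);
	return ret;
}

int main(void)
{
	struct node n = { .lock = PTHREAD_MUTEX_INITIALIZER, .offline = true };

	/* A caller without the lock uses the wrapper... */
	printf("offline: %d\n", node_is_offline(&n));

	/*
	 * ...while a caller that already holds the lock uses the __ variant
	 * directly, instead of trying to take the same mutex again.
	 */
	pthread_mutex_lock(&n.lock);
	printf("offline (lock held): %d\n", __node_is_offline(&n));
	pthread_mutex_unlock(&n.lock);

	return 0;
}

That split is only safe if every caller of the lockless variant really does
hold the lock protecting the list being walked, which is what the question
above is getting at.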
> Cc: # v3.14+
> Signed-off-by: Xie XiuQi
> ---
>  drivers/acpi/container.c |  2 +-
>  drivers/acpi/internal.h  |  1 +
>  drivers/acpi/scan.c      | 15 ++++++++++++---
>  3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
> index c8ead9f..43bda3b2 100644
> --- a/drivers/acpi/container.c
> +++ b/drivers/acpi/container.c
> @@ -50,7 +50,7 @@ static int acpi_container_offline(struct container_dev *cdev)
>  
>  	/* Check all of the dependent devices' physical companions. */
>  	list_for_each_entry(child, &adev->children, node)
> -		if (!acpi_scan_is_offline(child, false))
> +		if (!__acpi_scan_is_offline(child, false))
>  			return -EBUSY;
>  
>  	return 0;
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 56b321a..3b7a07b 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -80,6 +80,7 @@ void acpi_apd_init(void);
>  acpi_status acpi_hotplug_schedule(struct acpi_device *adev, u32 src);
>  bool acpi_queue_hotplug_work(struct work_struct *work);
>  void acpi_device_hotplug(struct acpi_device *adev, u32 src);
> +bool __acpi_scan_is_offline(struct acpi_device *adev, bool uevent);
>  bool acpi_scan_is_offline(struct acpi_device *adev, bool uevent);
>  
>  /* --------------------------------------------------------------------------
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index bbca783..ea55a9a 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -293,13 +293,12 @@ acpi_device_modalias_show(struct device *dev, struct device_attribute *attr, cha
>  }
>  static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
>  
> -bool acpi_scan_is_offline(struct acpi_device *adev, bool uevent)
> +/* Must be called under physical_node_lock. */
> +bool __acpi_scan_is_offline(struct acpi_device *adev, bool uevent)
>  {
>  	struct acpi_device_physical_node *pn;
>  	bool offline = true;
>  
> -	mutex_lock(&adev->physical_node_lock);
> -
>  	list_for_each_entry(pn, &adev->physical_node_list, node)
>  		if (device_supports_offline(pn->dev) && !pn->dev->offline) {
>  			if (uevent)
> @@ -309,7 +308,17 @@ bool acpi_scan_is_offline(struct acpi_device *adev, bool uevent)
>  			break;
>  		}
>  
> +	return offline;
> +}
> +
> +bool acpi_scan_is_offline(struct acpi_device *adev, bool uevent)
> +{
> +	bool offline = true;
> +
> +	mutex_lock(&adev->physical_node_lock);
> +	offline = __acpi_scan_is_offline(adev, uevent);
>  	mutex_unlock(&adev->physical_node_lock);
> +
>  	return offline;
>  }
> 
> --

I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.