From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752197Ab3LRAvT (ORCPT ); Tue, 17 Dec 2013 19:51:19 -0500 Received: from v094114.home.net.pl ([79.96.170.134]:59243 "HELO v094114.home.net.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751877Ab3LRAvS convert rfc822-to-8bit (ORCPT ); Tue, 17 Dec 2013 19:51:18 -0500 From: "Rafael J. Wysocki" To: Tejun Heo Cc: "Rafael J. Wysocki" , Jens Axboe , tomaz.solc@tablix.org, aaron.lu@intel.com, linux-kernel@vger.kernel.org, Oleg Nesterov , Greg Kroah-Hartman , Fengguang Wu Subject: Re: [PATCH v2] libata, freezer: avoid block device removal while system is frozen Date: Wed, 18 Dec 2013 02:04:35 +0100 Message-ID: <3253153.CIRSLE6KOu@vostro.rjw.lan> User-Agent: KMail/4.10.5 (Linux/3.12.0-rc6+; KDE/4.10.5; x86_64; ; ) In-Reply-To: <20131217125042.GF29989@htj.dyndns.org> References: <20131213174932.GA27070@htj.dyndns.org> <20131213204034.GE27070@htj.dyndns.org> <20131217125042.GF29989@htj.dyndns.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8BIT Content-Type: text/plain; charset="utf-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tuesday, December 17, 2013 07:50:42 AM Tejun Heo wrote: > Hello, > > Rafael, if you're okay with the workaround, I'll route it through > libata/for-3.13-fixes. > > Thanks. > ------- 8< ------- > Freezable kthreads and workqueues are fundamentally problematic in > that they effectively introduce a big kernel lock widely used in the > kernel and have already been the culprit of several deadlock > scenarios. This is the latest occurrence. > > During resume, libata rescans all the ports and revalidates all > pre-existing devices. If it determines that a device has gone > missing, the device is removed from the system which involves > invalidating block device and flushing bdi while holding driver core > layer locks. Unfortunately, this can race with the rest of device > resume. Because freezable kthreads and workqueues are thawed after > device resume is complete and block device removal depends on > freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make > progress, this can lead to deadlock - block device removal can't > proceed because kthreads are frozen and kthreads can't be thawed > because device resume is blocked behind block device removal. > > 839a8e8660b6 ("writeback: replace custom worker pool implementation > with unbound workqueue") made this particular deadlock scenario more > visible but the underlying problem has always been there - the > original forker task and jbd2 are freezable too. In fact, this is > highly likely just one of many possible deadlock scenarios given that > freezer behaves as a big kernel lock and we don't have any debug > mechanism around it. > > I believe the right thing to do is getting rid of freezable kthreads > and workqueues. This is something fundamentally broken. For now, > implement a funny workaround in libata - just avoid doing block device > hot[un]plug while the system is frozen. Kernel engineering at its > finest. :( > > v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built > as a module. > > Signed-off-by: Tejun Heo > Reported-by: Tomaž Šolc > Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801 > Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org > Cc: "Rafael J. Wysocki" > Cc: Greg Kroah-Hartman > Cc: Len Brown > Cc: Oleg Nesterov > Cc: stable@vger.kernel.org > --- > drivers/ata/libata-scsi.c | 19 +++++++++++++++++++ > kernel/freezer.c | 6 ++++++ > 2 files changed, 25 insertions(+) > > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -3871,6 +3871,25 @@ void ata_scsi_hotplug(struct work_struct > return; > } > > + /* > + * XXX - UGLY HACK > + * > + * The core suspend/resume path is fundamentally broken due to > + * freezable kthreads and workqueue and may deadlock if a block > + * device gets removed while resume is in progress. I don't know > + * what the solution is short of removing freezable kthreads and > + * workqueues altogether. Do you mean the block device core or the SCSI core or something else? It would be good to clarify that here to avoid confusion. > + * The following is an ugly hack to avoid kicking off device > + * removal while freezer is active. This is a joke but does avoid > + * this particular deadlock scenario. > + * > + * https://bugzilla.kernel.org/show_bug.cgi?id=62801 > + * http://marc.info/?l=linux-kernel&m=138695698516487 > + */ > + while (pm_freezing) > + msleep(100); Why is the sleep time 100 ms exactly? And why does it matter? For example, what would change if it were 10 ms? > + > DPRINTK("ENTER\n"); > mutex_lock(&ap->scsi_scan_mutex); > > --- a/kernel/freezer.c > +++ b/kernel/freezer.c > @@ -19,6 +19,12 @@ EXPORT_SYMBOL(system_freezing_cnt); > bool pm_freezing; > bool pm_nosig_freezing; > > +/* > + * Temporary export for the deadlock workaround in ata_scsi_hotplug(). > + * Remove once the hack becomes unnecessary. > + */ > +EXPORT_SYMBOL_GPL(pm_freezing); > + > /* protects freezing and frozen transitions */ > static DEFINE_SPINLOCK(freezer_lock); > Thanks, Rafael