From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Jens Axboe <axboe@kernel.dk>,
	tomaz.solc@tablix.org, aaron.lu@intel.com,
	linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>
Subject: Re: [PATCH v2] libata, freezer: avoid block device removal while system is frozen
Date: Wed, 18 Dec 2013 02:04:35 +0100
Message-ID: <3253153.CIRSLE6KOu@vostro.rjw.lan>
In-Reply-To: <20131217125042.GF29989@htj.dyndns.org>

On Tuesday, December 17, 2013 07:50:42 AM Tejun Heo wrote:
> Hello,
> 
> Rafael, if you're okay with the workaround, I'll route it through
> libata/for-3.13-fixes.
> 
> Thanks.
> ------- 8< -------
> Freezable kthreads and workqueues are fundamentally problematic in
> that they effectively introduce a big kernel lock widely used in the
> kernel and have already been the culprit of several deadlock
> scenarios.  This is the latest occurrence.
> 
> During resume, libata rescans all the ports and revalidates all
> pre-existing devices.  If it determines that a device has gone
> missing, the device is removed from the system which involves
> invalidating block device and flushing bdi while holding driver core
> layer locks.  Unfortunately, this can race with the rest of device
> resume.  Because freezable kthreads and workqueues are thawed after
> device resume is complete and block device removal depends on
> freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make
> progress, this can lead to deadlock - block device removal can't
> proceed because kthreads are frozen and kthreads can't be thawed
> because device resume is blocked behind block device removal.
> 
> 839a8e8660b6 ("writeback: replace custom worker pool implementation
> with unbound workqueue") made this particular deadlock scenario more
> visible but the underlying problem has always been there - the
> original forker task and jbd2 are freezable too.  In fact, this is
> highly likely just one of many possible deadlock scenarios given that
> freezer behaves as a big kernel lock and we don't have any debug
> mechanism around it.
> 
> I believe the right thing to do is getting rid of freezable kthreads
> and workqueues.  This is something fundamentally broken.  For now,
> implement a funny workaround in libata - just avoid doing block device
> hot[un]plug while the system is frozen.  Kernel engineering at its
> finest.  :(
> 
> v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built
>     as a module.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Tomaž Šolc <tomaz.solc@tablix.org>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801
> Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/ata/libata-scsi.c |   19 +++++++++++++++++++
>  kernel/freezer.c          |    6 ++++++
>  2 files changed, 25 insertions(+)
> 
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3871,6 +3871,25 @@ void ata_scsi_hotplug(struct work_struct
>  		return;
>  	}
>  
> +	/*
> +	 * XXX - UGLY HACK
> +	 *
> +	 * The core suspend/resume path is fundamentally broken due to
> +	 * freezable kthreads and workqueue and may deadlock if a block
> +	 * device gets removed while resume is in progress.  I don't know
> +	 * what the solution is short of removing freezable kthreads and
> +	 * workqueues altogether.

Do you mean the block device core or the SCSI core or something else?  It would
be good to clarify that here to avoid confusion.

> +	 * The following is an ugly hack to avoid kicking off device
> +	 * removal while freezer is active.  This is a joke but does avoid
> +	 * this particular deadlock scenario.
> +	 *
> +	 * https://bugzilla.kernel.org/show_bug.cgi?id=62801
> +	 * http://marc.info/?l=linux-kernel&m=138695698516487
> +	 */
> +	while (pm_freezing)
> +		msleep(100);

Why is the sleep time 100 ms exactly?  And why does it matter?

For example, what would change if it were 10 ms?
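
To make the question concrete, here is a rough user-space model of the
polling loop above. It is a sketch only, not kernel code: model_poll(),
struct poll_cost and its fields are made-up names, and it assumes an
idealized msleep() that sleeps for exactly the requested time and a flag
that stays set for freeze_ms milliseconds from the moment the loop starts.

```c
/* Cost of polling a flag with a fixed sleep interval (idealized model). */
struct poll_cost {
	unsigned int wakeups;        /* how many times the loop slept and re-checked */
	unsigned int extra_delay_ms; /* time spent asleep after the flag was cleared */
};

/*
 * Models "while (flag) msleep(interval_ms);" where the flag is set during
 * [0, freeze_ms) and clear afterwards.  Hypothetical helper, for
 * illustration only.
 */
static struct poll_cost model_poll(unsigned int freeze_ms,
				   unsigned int interval_ms)
{
	struct poll_cost cost = { 0, 0 };
	unsigned int now = 0;

	if (interval_ms == 0)	/* a zero interval would never advance time */
		return cost;

	while (now < freeze_ms) {	/* flag still set: sleep one more interval */
		now += interval_ms;
		cost.wakeups++;
	}

	cost.extra_delay_ms = now - freeze_ms;
	return cost;
}
```

Under those assumptions, model_poll(2050, 100) gives 21 wakeups and 50 ms
of extra delay, while model_poll(2050, 10) gives 205 wakeups and no extra
delay.  In this model the interval only trades the number of wakeups
against at most one interval of added latency once the flag clears.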

> +
>  	DPRINTK("ENTER\n");
>  	mutex_lock(&ap->scsi_scan_mutex);
>  
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -19,6 +19,12 @@ EXPORT_SYMBOL(system_freezing_cnt);
>  bool pm_freezing;
>  bool pm_nosig_freezing;
>  
> +/*
> + * Temporary export for the deadlock workaround in ata_scsi_hotplug().
> + * Remove once the hack becomes unnecessary.
> + */
> +EXPORT_SYMBOL_GPL(pm_freezing);
> +
>  /* protects freezing and frozen transitions */
>  static DEFINE_SPINLOCK(freezer_lock);
>  

Thanks,
Rafael
