LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Tejun Heo <tj@kernel.org>
To: "Rafael J. Wysocki" <rjw@sisk.pl>, Jens Axboe <axboe@kernel.dk>
Cc: tomaz.solc@tablix.org, aaron.lu@intel.com,
	linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>
Subject: Writeback threads and freezable
Date: Fri, 13 Dec 2013 12:49:32 -0500
Message-ID: <20131213174932.GA27070@htj.dyndns.org> (raw)

Hello, guys.

This is discovered while investigating bug 62801 - "EliteBoot hangs at
dock, suspend, undock, resume".

 https://bugzilla.kernel.org/show_bug.cgi?id=62801

The laptop locks up during resume if undocked while suspended.  The
dock contains a hard drive and the drive removal path and resume path
get stuck.  This got bisected to 839a8e8660b6 ("writeback: replace
custom worker pool implementation with unbound workqueue") by the
reporter.  The problem can be reproduced by just removing mounted
harddrive while a machine is suspended.  I first thought it was some
dumb mistake but it turns out to be something fundamental.

So, here's the lock up.

* Resume starts and libata resume is kicked off.

* libata EH determines that the device is gone.  Eventually it invokes
  device_del() through a work item.

* device_del() tries to delete the block device which invokes
  writeback_inodes_sb() on the mounted filesystem which in turn
  schedules and flushes bdi work item.  Note that at this point, the
  kworker is holding multiple driver layer locks.

Now, if the rest of dpm resume races with the above and gets stuck
behind one of the locks libata device removal path is holding, it
deadlocks because thaw_workqueues() is called only after dpm resume is
complete.  libata device removal waits for the scheduled writeback
work item to finish which in turn is waiting for the workqueue to be
thawed, and dpm resume can't make it to thaw_workqueues() because it's
stuck behind one of the mutexes held by the libata device removal
path.

I think the issues already existed before 839a8e8660b6 ("writeback:
replace custom worker pool implementation with unbound workqueue")
although the patch seems to have exposed it a lot more.  The issue
probably was a lot less visible because, if the bdi had a flusher
spawned, kthread_stop() overrides freezer and if not the device has
been idle & clean for some time and unlikely to require any action on
shutdown, but the forker thread was still freezable and I think it
should be possible to trigger some condition where forker action is
waited upon during shutdown sequence.

The fundamental problem, really, is that device drivers make use of
kthreads and/or workqueues in their operation and freezable kthreads
and workqueues essentially behave as a giant lock.  If any
suspend/resume operation somehow ends up depending on a freezable
kthread or workqueue, which can happen through quite convoluted and
long dependency chain, there are possibilities for deadlock even if
those operations themselves aren't interlocked.  In this case, libata
resume finishes fine and device removal is run completely
asynchronously but the dependency chain is still completed through
driver layer locks.

I've always thought that using freezer for kernel threads / workqueues
is something fundamentally undesirable, exactly because freezer acts
as a giant lock and we don't have any meangingful debug support around
it.  Even in leaf drivers proper where the dependency isn't
complicated, this has ample possibility to lead to surprise lockups.

I'm unsure why bdi should be freezable.  It's not like corking things
at bdi can reliably prevent block layer from issuing further commands
down to drivers.  Maybe there are things which directly hook into bdi
and depend on it being frozen, I don't know, but it just seems
dangerous to inject a large locking dependency into the chain from
such high level construct.

I'm not completely sure what to do.  We can make an exception for
bdi_wq and thaw it early but that breaks the guarantee that bdi is
frozen between device suspend and resume invocations and we might as
well just remove WQ_FREEZABLE from it.  I don't know.

Ideas?

Thanks.

-- 
tejun

             reply index

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-13 17:49 Tejun Heo [this message]
2013-12-13 18:52 ` Tejun Heo
2013-12-13 20:40   ` [PATCH] libata, freezer: avoid block device removal while system is frozen Tejun Heo
2013-12-13 22:45     ` Nigel Cunningham
2013-12-13 23:07       ` Tejun Heo
2013-12-13 23:15         ` Nigel Cunningham
2013-12-14  1:55           ` Dave Chinner
2013-12-14 20:31           ` Tejun Heo
2013-12-14 20:36             ` Tejun Heo
2013-12-14 21:21               ` Nigel Cunningham
2013-12-17  2:35                 ` Rafael J. Wysocki
2013-12-17  2:34             ` Rafael J. Wysocki
2013-12-17 12:34               ` Tejun Heo
2013-12-18  0:35                 ` Rafael J. Wysocki
2013-12-18 11:17                   ` Tejun Heo
2013-12-18 21:48                     ` Rafael J. Wysocki
2013-12-18 21:39                       ` Tejun Heo
2013-12-18 21:41                         ` Tejun Heo
2013-12-18 22:04                           ` Rafael J. Wysocki
2013-12-19 23:35                             ` [PATCH wq/for-3.14 1/2] workqueue: update max_active clamping rules Tejun Heo
2013-12-20  1:26                               ` Rafael J. Wysocki
2013-12-19 23:37                             ` [PATCH wq/for-3.14 2/2] workqueue: implement @drain for workqueue_set_max_active() Tejun Heo
2013-12-20  1:31                               ` Rafael J. Wysocki
2013-12-20 13:32                                 ` Tejun Heo
2013-12-20 13:56                                   ` Rafael J. Wysocki
2013-12-20 14:23                                     ` Tejun Heo
2013-12-16 12:12         ` [PATCH] libata, freezer: avoid block device removal while system is frozen Ming Lei
2013-12-16 12:45           ` Tejun Heo
2013-12-16 13:24             ` Ming Lei
2013-12-16 16:05               ` Tejun Heo
2013-12-17  2:38     ` Rafael J. Wysocki
2013-12-17 12:36       ` Tejun Heo
2013-12-18  0:23         ` Rafael J. Wysocki
2013-12-17 12:50     ` [PATCH v2] " Tejun Heo
2013-12-18  1:04       ` Rafael J. Wysocki
2013-12-18 11:08         ` Tejun Heo
2013-12-18 12:07       ` [PATCH v3] " Tejun Heo
2013-12-18 22:08         ` Rafael J. Wysocki
2013-12-19 17:24           ` Tejun Heo
2013-12-19 18:54         ` [PATCH v4] " Tejun Heo
2013-12-14  1:53 ` Writeback threads and freezable Dave Chinner
2013-12-14 17:30   ` Greg Kroah-Hartman
2013-12-14 20:23   ` Tejun Heo
2013-12-16  3:56     ` Dave Chinner
2013-12-16 12:51       ` Tejun Heo
2013-12-16 12:56         ` Tejun Heo
2013-12-18  0:35           ` Dave Chinner
2013-12-18 11:43             ` Tejun Heo
2013-12-18 22:14               ` Rafael J. Wysocki
2013-12-19  4:08               ` Dave Chinner
2013-12-19 16:24                 ` Tejun Heo
2013-12-20  0:51                   ` Dave Chinner
2013-12-20 14:51                     ` Tejun Heo
2013-12-20 14:00                   ` Rafael J. Wysocki

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131213174932.GA27070@htj.dyndns.org \
    --to=tj@kernel.org \
    --cc=aaron.lu@intel.com \
    --cc=axboe@kernel.dk \
    --cc=fengguang.wu@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=rjw@sisk.pl \
    --cc=tomaz.solc@tablix.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox