Linux-Block Archive on
 help / color / Atom feed
From: "Luis R. Rodriguez" <>
	"Luis R. Rodriguez" <>
Subject: [RFC 0/5] fs: replace kthread freezing with filesystem freeze/thaw
Date: Tue,  3 Oct 2017 11:53:08 -0700
Message-ID: <> (raw)

At the 2015 South Korea kernel summit Jiri Kosina had pointd out the issue of
the sloppy semantics of the kthread freezer, lwn covered this pretty well [0].
In short, he explained how the original purpose of the freezer was to aid
in going to suspend to ensure no uwanted IO activity would cause filesystem
corruption. Kernel kthreads require special freezer handling though, the call
try_to_freeze() is often sprinkled at strategic places, but sometimes this is
done without set_freezable() making try_to_freeze() useless. Other helpers such
as freezable_schedule_timeout() exist, and very likely they are not used in any
consistent and proper way either all over the kernel. Dealing with these
helpers alone also does not and cannot ensure that everything that has been
spawned asynchronously from a kthread (such as timers) are stopped as well,
these are left to the kthread user implementation, and chances are pretty high
there are many bugs lurking here. There are even non-IO bounds kthreads now
using the freezer APIs, where this is not even needed!

Jiri suggested we can easily replace the complexities of the kthread freezer
by just using the existing filesystem freeze/thaw callbacks on hibernation and

I've picked up Jiri's work given there are bugs which after inspection don't
see like real bugs, but just random IO loosely waiting to be completed and the
kernel not really being able to tell me who the culprit is. In fact even if one
plugs a fix, one ends up in another random place and its really unclear who is
to blaim for next.

For instance, to reproduce a simple suspend bug on what may at first seem to be
an XFS bug, one can issue a dd onto disk prior to suspend, and we'll get a
stall on our way to suspend, claiming the issue was the SCSI layer not being
able to quiesce the disk. This was reported on OpenSUSE and reproduced on
linux-next easily [1]. The following script can be run while we loop on
systemctl suspend and qemu system_wakeup calls to resume:

	while true; do
		dd if=/dev/zero of=crap bs=1M count=1024 &> /dev/null

You end up with with a hung suspend attempt, and eventually a splat
as follows with a hunk task notification:

INFO: task kworker/u8:8:1320 blocked for more than 10 seconds.
      Tainted: G            E   4.13.0-next-20170907+ #88
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:8    D    0  1320      2 0x80000000
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 ? remove_wait_queue+0x70/0x70
 scsi_execute+0x7a/0x310 [scsi_mod]
 sd_sync_cache+0xa3/0x190 [sd_mod]
 ? blk_run_queue+0x3f/0x50
 sd_suspend_common+0x7b/0x130 [sd_mod]
 ? scsi_print_result+0x270/0x270 [scsi_mod]
 sd_suspend_system+0x13/0x20 [sd_mod]
 do_scsi_suspend+0x1b/0x30 [scsi_mod]
 scsi_bus_suspend_common+0xb1/0xd0 [scsi_mod]
 ? device_for_each_child+0x69/0x90
 scsi_bus_suspend+0x15/0x20 [scsi_mod]
 ? scsi_bus_freeze+0x20/0x20 [scsi_mod]
 ? process_one_work+0x380/0x380
 ? kthread_create_on_node+0x70/0x70

Sprinkling freezer_do_not_count() on scsi_execute() *iff* we're going to
suspend is one way to fix this splat, however this just pushes the bug
elsewhere, and we keep doing this ever on. The right thing to do here is ensure
that after userpsace is frozen we also freeze all filesystem behaviour.

Its possible other symptoms have been observed elsewhere, and similar work
arounds are being devised. Perhaps this work can help put a stop gap for such
work arounds and complexities.

While this all seems trivial, well discussed, and so it should be well accepted
for us to go full swing with a replacement of the kthread freezer with
filesystem suspend / thaw -- not all filesystems have a respective freeze/thaw
call. Also in practice it would seem some thaw'ing calls need some revision.
For instance thawing with a sync call on ext4 currently issues a bio_submit(),
which in turn stalls on resume. Each call/filesystem then should be vetted
manually, as I've done for ext4 in this series.

Likewise while kthread freezing may have been designed to help with avoiding IO
on disk, its sloppy use on kthreads may now be providing buggy functionality
which *helps* somehow now. The removal of freezer helpers on kthreads which are
not IO bound should also be carefully vetted for to avoid potential

Then we have the few areas in the kernel which may generate disk IO which do
not come from filesystems -- this was mentioned recently the the Alpine Linux
Persistence and Storage Summit [2] -- such drivers would need to get proper
suspend calls eventually.

Given all the above then I'd like to move *carefully* with the kthread freezer
replacement with filesystem freeze/thawing instead of doing it all in one shot,
as Jiri had originally intended.

Filesystmes which have been vetted to work properly can use the super block
FS_FREEZE_ON_SUSPEND flag when things are ready as a transitional flag. I hope
we can get to the point we can rely on all filesystmes supporting it, so that
once all filesystem code is vetted for, and all non-IO bound kthread callers
have been cleared of the freezer call we can just remove the
FS_FREEZE_ON_SUSPEND flag and then kill the kthread freezer completely (the
last patch in this series).

The way this would be merged then is, only the first four patches would
be considered for upstream at first. We'd then move on to do the same with
other filesystems. Then or in parallel move on to tackle vetting removal
of all disk non-IO bounds kthread freezer callers. Only once this is all
done should we consider the last patch to put final nail on the kthread
freezer coffin.


Thoughts? Rants?

Luis R. Rodriguez (5):
  fs: add iterate_supers_reverse()
  fs: freeze on suspend and thaw on resume
  xfs: allow fs freeze on suspend/hibernation
  ext4: add fs freezing support on suspend/hibernation
  pm: remove kernel thread freezing

 Documentation/power/freezing-of-tasks.txt |   9 ---
 drivers/xen/manage.c                      |   6 --
 fs/ext4/super.c                           |  13 ++--
 fs/super.c                                | 122 ++++++++++++++++++++++++++++++
 fs/xfs/xfs_super.c                        |   3 +-
 fs/xfs/xfs_trans_ail.c                    |   7 +-
 include/linux/freezer.h                   |   4 -
 include/linux/fs.h                        |  14 ++++
 kernel/power/hibernate.c                  |  10 +--
 kernel/power/power.h                      |  20 +----
 kernel/power/process.c                    |  61 ++++-----------
 kernel/power/user.c                       |   9 ---
 tools/power/pm-graph/   |   1 -
 13 files changed, 163 insertions(+), 116 deletions(-)


             reply index

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-03 18:53 Luis R. Rodriguez [this message]
2017-10-03 18:53 ` [RFC 1/5] fs: add iterate_supers_reverse() Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 2/5] fs: freeze on suspend and thaw on resume Luis R. Rodriguez
2017-10-03 20:02   ` Bart Van Assche
2017-10-03 20:23     ` Luis R. Rodriguez
2017-10-03 20:32       ` Bart Van Assche
2017-10-03 20:39         ` Luis R. Rodriguez
2017-10-03 20:06   ` Jiri Kosina
2017-10-03 20:58   ` Dave Chinner
2017-10-03 21:16     ` Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 3/5] xfs: allow fs freeze on suspend/hibernation Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 4/5] ext4: add fs freezing support " Luis R. Rodriguez
2017-10-03 19:59   ` Theodore Ts'o
2017-10-03 20:13     ` Luis R. Rodriguez
2017-10-04  1:42       ` Theodore Ts'o
2017-10-04  7:05         ` Dave Chinner
2017-10-04 15:25           ` Bart Van Assche
2017-10-04 16:48           ` Theodore Ts'o
2017-10-04 22:22             ` Dave Chinner
2017-10-03 18:53 ` [RFC 5/5] pm: remove kernel thread freezing Luis R. Rodriguez
2017-10-03 18:59   ` Rafael J. Wysocki
2017-10-03 21:15     ` Rafael J. Wysocki
2017-10-04  0:47       ` Luis R. Rodriguez
2017-10-04  1:03         ` Bart Van Assche
2017-11-29 23:05           ` Luis R. Rodriguez
2017-10-04  7:18         ` Dave Chinner
2017-10-03 20:12   ` Pavel Machek
2017-10-03 20:15     ` Jiri Kosina
2017-10-03 20:21       ` Pavel Machek
2017-10-03 20:38         ` Jiri Kosina
2017-10-03 20:41           ` Rafael J. Wysocki
2017-10-03 20:57           ` Pavel Machek
2017-10-03 21:00             ` Jiri Kosina
2017-10-03 21:09               ` Shuah Khan
2017-10-03 21:18                 ` Luis R. Rodriguez
2017-10-03 20:49     ` Luis R. Rodriguez
2017-10-06 12:07       ` Pavel Machek
2017-10-06 12:54         ` Theodore Ts'o
2017-10-03 20:13   ` Bart Van Assche
2017-10-03 20:17     ` Jiri Kosina
2017-10-03 20:21       ` Bart Van Assche
2017-10-03 20:24         ` Jiri Kosina
2017-10-03 20:27         ` Luis R. Rodriguez
2017-10-03 20:51       ` Jiri Kosina
2017-10-03 21:04   ` Dave Chinner
2017-10-03 21:07     ` Luis R. Rodriguez
2017-10-04  6:07   ` Hannes Reinecke
2017-10-03 19:33 ` [RFC 0/5] fs: replace kthread freezing with filesystem freeze/thaw Ming Lei
2017-10-03 20:05   ` Luis R. Rodriguez
2017-10-03 20:47     ` Matthew Wilcox
2017-10-03 20:54       ` Luis R. Rodriguez
2017-10-03 20:59       ` Bart Van Assche
2017-10-04 15:43     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Block Archive on

Archives are clonable:
	git clone --mirror linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ \
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone