linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, stable@kernel.org,
	Dave Chinner <dchinner@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: [PATCH 4.9 29/30] fs: dont scan the inode cache before SB_BORN is set
Date: Mon,  4 Feb 2019 11:37:07 +0100	[thread overview]
Message-ID: <20190204103610.652434682@linuxfoundation.org> (raw)
In-Reply-To: <20190204103605.271746870@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Chinner <dchinner@redhat.com>

commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.

We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land.  It produces
an oops down this path during the failed mount:

  radix_tree_gang_lookup_tag+0xc4/0x130
  xfs_perag_get_tag+0x37/0xf0
  xfs_reclaim_inodes_count+0x32/0x40
  xfs_fs_nr_cached_objects+0x11/0x20
  super_cache_count+0x35/0xc0
  shrink_slab.part.66+0xb1/0x370
  shrink_node+0x7e/0x1a0
  try_to_free_pages+0x199/0x470
  __alloc_pages_slowpath+0x3a1/0xd20
  __alloc_pages_nodemask+0x1c3/0x200
  cache_grow_begin+0x20b/0x2e0
  fallback_alloc+0x160/0x200
  kmem_cache_alloc+0x111/0x4e0

The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does it's setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.

To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.

Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/super.c |   30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

--- a/fs/super.c
+++ b/fs/super.c
@@ -119,13 +119,23 @@ static unsigned long super_cache_count(s
 	sb = container_of(shrink, struct super_block, s_shrink);
 
 	/*
-	 * Don't call trylock_super as it is a potential
-	 * scalability bottleneck. The counts could get updated
-	 * between super_cache_count and super_cache_scan anyway.
-	 * Call to super_cache_count with shrinker_rwsem held
-	 * ensures the safety of call to list_lru_shrink_count() and
-	 * s_op->nr_cached_objects().
+	 * We don't call trylock_super() here as it is a scalability bottleneck,
+	 * so we're exposed to partial setup state. The shrinker rwsem does not
+	 * protect filesystem operations backing list_lru_shrink_count() or
+	 * s_op->nr_cached_objects(). Counts can change between
+	 * super_cache_count and super_cache_scan, so we really don't need locks
+	 * here.
+	 *
+	 * However, if we are currently mounting the superblock, the underlying
+	 * filesystem might be in a state of partial construction and hence it
+	 * is dangerous to access it.  trylock_super() uses a MS_BORN check to
+	 * avoid this situation, so do the same here. The memory barrier is
+	 * matched with the one in mount_fs() as we don't hold locks here.
 	 */
+	if (!(sb->s_flags & MS_BORN))
+		return 0;
+	smp_rmb();
+
 	if (sb->s_op && sb->s_op->nr_cached_objects)
 		total_objects = sb->s_op->nr_cached_objects(sb, sc);
 
@@ -1193,6 +1203,14 @@ mount_fs(struct file_system_type *type,
 	sb = root->d_sb;
 	BUG_ON(!sb);
 	WARN_ON(!sb->s_bdi);
+
+	/*
+	 * Write barrier is for super_cache_count(). We place it before setting
+	 * MS_BORN as the data dependency between the two functions is the
+	 * superblock structure contents that we just set up, not the MS_BORN
+	 * flag.
+	 */
+	smp_wmb();
 	sb->s_flags |= MS_BORN;
 
 	error = security_sb_kern_mount(sb, flags, secdata);



  parent reply	other threads:[~2019-02-04 10:44 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-04 10:36 [PATCH 4.9 00/30] 4.9.155-stable review Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 01/30] Fix "net: ipv4: do not handle duplicate fragments as overlapping" Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 02/30] fs: add the fsnotify call to vfs_iter_write Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 03/30] ipv6: Consider sk_bound_dev_if when binding a socket to an address Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 04/30] l2tp: copy 4 more bytes to linear part if necessary Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 05/30] net/mlx4_core: Add masking for a few queries on HCA caps Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 06/30] netrom: switch to sock timer API Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 07/30] net/rose: fix NULL ax25_cb kernel panic Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 08/30] ucc_geth: Reset BQL queue when stopping device Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 09/30] net/mlx5e: Allow MAC invalidation while spoofchk is ON Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 10/30] l2tp: remove l2specific_len dependency in l2tp_core Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 11/30] l2tp: fix reading optional fields of L2TPv3 Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 12/30] ipvlan, l3mdev: fix broken l3s mode wrt local routes Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 13/30] CIFS: Do not count -ENODATA as failure for query directory Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 14/30] fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 15/30] ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 16/30] arm64: kaslr: ensure randomized quantities are clean also when kaslr is off Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 17/30] arm64: hyp-stub: Forbid kprobing of the hyp-stub Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 18/30] arm64: hibernate: Clean the __hyp_text to PoC after resume Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 19/30] gfs2: Revert "Fix loop in gfs2_rbm_find" Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 20/30] platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK Greg Kroah-Hartman
2019-02-04 10:36 ` [PATCH 4.9 21/30] platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 22/30] mmc: sdhci-iproc: handle mmc_of_parse() errors during probe Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 23/30] kernel/exit.c: release ptraced tasks before zap_pid_ns_processes Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 24/30] mm, oom: fix use-after-free in oom_kill_process Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 25/30] mm: hwpoison: use do_send_sig_info() instead of force_sig() Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 26/30] mm: migrate: dont rely on __PageMovable() of newpage after unlocking it Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 27/30] cifs: Always resolve hostname before reconnecting Greg Kroah-Hartman
2019-02-04 10:37 ` [PATCH 4.9 28/30] drivers: core: Remove glue dirs from sysfs earlier Greg Kroah-Hartman
2019-02-04 10:37 ` Greg Kroah-Hartman [this message]
2019-02-04 10:37 ` [PATCH 4.9 30/30] fanotify: fix handling of events on child sub-directory Greg Kroah-Hartman
2019-02-04 10:48   ` Amir Goldstein
2019-02-04 11:03     ` Greg Kroah-Hartman
2019-02-04 21:48 ` [PATCH 4.9 00/30] 4.9.155-stable review Guenter Roeck
2019-02-05  6:39 ` Naresh Kamboju
2019-02-05 11:38 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190204103610.652434682@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=dchinner@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).