From: Sasha Levin <sasha.levin@oracle.com>
To: stable@vger.kernel.org, stable-commits@vger.kernel.org
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>,
	Sergei Antonov <saproj@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Vyacheslav Dubeyko <slava@dubeyko.com>,
	Sougata Santra <sougata@tuxera.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sasha.levin@oracle.com>
Subject: [added to the 3.18 stable tree] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
Date: Wed, 28 Oct 2015 01:22:13 -0400
Message-ID: <1446009925-26739-37-git-send-email-sasha.levin@oracle.com>
In-Reply-To: <1446009925-26739-1-git-send-email-sasha.levin@oracle.com>

From: Hin-Tak Leung <htl10@users.sourceforge.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 ]

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would confuse the hfsplus driver, which then complained about
B-tree inconsistencies and generated a "BUG: Bad page state".  Most
recently, I can reproduce this problem on up-to-date Fedora 22 with the
shipped kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other
smaller mounts) and "du /mnt" simultaneously in two windows, where /mnt is
a lightly-used QEMU VM image of the full Mac OS X 10.9:

$ df -i / /home /mnt
Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/fedora-root    3276800  551665    2725135   17% /
/dev/mapper/fedora-home   52879360  716221   52163139    2% /home
/dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years.  [1]

The unpatched code [2] has been wrong ever since it entered the kernel
tree.  The only reason it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release(), so
most of the time the kernel has not had a chance to reuse the memory for
something else.
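
The reference lifetime described above can be illustrated with a small
userspace sketch.  This is a toy model, not kernel code: every toy_* name
and TOY_PAGES_PER_BNODE are hypothetical, only the node's own reference is
counted (the page cache's reference is ignored), and "unpinned" simply
means the count has reached zero, so the memory could be reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant; the real value comes from the B-tree geometry. */
#define TOY_PAGES_PER_BNODE 2

/* Toy stand-in for struct page: only a reference count is modelled. */
struct toy_page {
	int refcount;
};

/* Toy stand-in for struct hfs_bnode: only the page pointers are modelled. */
struct toy_bnode {
	struct toy_page *page[TOY_PAGES_PER_BNODE];
};

/* Models the page-cache lookup: the page comes back with one reference
 * held on behalf of the caller. */
static struct toy_page *toy_lookup_page(struct toy_page *page)
{
	page->refcount++;
	return page;
}

/* Models page_cache_release(): drops one reference. */
static void toy_page_release(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* In this model, a page nobody references may be reused at any time. */
static bool toy_page_unpinned(const struct toy_page *page)
{
	return page->refcount == 0;
}

/* patched == false models the old code: the reference is dropped right
 * after the lookup, yet the pointer is still stored in the node. */
static void toy_bnode_create(struct toy_bnode *node, struct toy_page *cache,
			     bool patched)
{
	for (int i = 0; i < TOY_PAGES_PER_BNODE; i++) {
		struct toy_page *page = toy_lookup_page(&cache[i]);

		if (!patched)
			toy_page_release(page);
		node->page[i] = page;
	}
}

/* patched == true models the new code: references are dropped only when
 * the node itself is freed. */
static void toy_bnode_free(struct toy_bnode *node, bool patched)
{
	for (int i = 0; i < TOY_PAGES_PER_BNODE; i++) {
		if (patched && node->page[i])
			toy_page_release(node->page[i]);
		node->page[i] = NULL;
	}
}
```

With patched set to false, every page is already unpinned when
toy_bnode_create() returns, although node->page[] still points at it.  With
patched set to true, each page stays pinned until toy_bnode_free().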

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], whereby version 0.1 (Dec
2001) had a B-tree node-centric approach of
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents.  When the current RW code first entered the kernel [2]
in 2004, there was a REF_PAGES conditional (and "//" commented-out code)
to switch between B-node-centric paging and inode-centric paging.  There
was a mistake in the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGES" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761

[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

[3]:
http://sourceforge.net/projects/linux-hfsplus/

http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5

Date:   Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.

[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/hfs/bnode.c     | 9 ++++-----
 fs/hfsplus/bnode.c | 3 ---
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index d3fa6bd..221719e 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -398,11 +397,11 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-	//int i;
+	int i;
 
-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
-	//	if (node->page[i])
-	//		page_cache_release(node->page[i]);
+	for (i = 0; i < node->tree->pages_per_bnode; i++)
+		if (node->page[i])
+			page_cache_release(node->page[i]);
 	kfree(node);
 }
 
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 759708f..6392466 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -566,13 +565,11 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-#if 0
 	int i;
 
 	for (i = 0; i < node->tree->pages_per_bnode; i++)
 		if (node->page[i])
 			page_cache_release(node->page[i]);
-#endif
 	kfree(node);
 }
 
-- 
2.1.4

