From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>,
	Sergei Antonov <saproj@gmail.com>,
	Anton Altaparmakov <anton@tuxera.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Vyacheslav Dubeyko <slava@dubeyko.com>,
	Sougata Santra <sougata@tuxera.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [PATCH 2.6.32 20/38] [PATCH 20/38] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
Date: Sun, 29 Nov 2015 22:47:22 +0100
Message-ID: <20151129214703.726752716@1wt.eu>
In-Reply-To: <8acf8256ccc72771a80b7851061027bc@local>

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would leave the hfsplus driver confused, complaining about
B-tree inconsistencies and generating a "BUG: Bad page state".  Most
recently, I can reproduce this problem on an up-to-date Fedora 22 with the
shipped kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other
smaller mounts) and "du /mnt" simultaneously in two windows, where /mnt is
a lightly-used QEMU VM image of a full Mac OS X 10.9 installation:

$ df -i / /home /mnt
Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/fedora-root    3276800  551665    2725135   17% /
/dev/mapper/fedora-home   52879360  716221   52163139    2% /home
/dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years.  [1]

The unpatched code [2] has always been wrong since it entered the kernel
tree.  The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.
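
To illustrate the lifetime problem described above, here is a minimal
userspace sketch in C.  It is a toy model, not kernel code: struct
toy_page, struct toy_bnode and all toy_*() helpers are hypothetical names
invented for this illustration, and a plain integer stands in for the real
page reference count.  The model assumes each page starts with one
reference owned by the page cache itself, so a page is only safe from
reuse while some other holder keeps an extra reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BNODE 2

/* Toy stand-in for struct page: only a reference count. */
struct toy_page {
	int refcount;
};

/* Toy stand-in for struct hfs_bnode: just the cached page pointers. */
struct toy_bnode {
	struct toy_page *page[PAGES_PER_BNODE];
};

static void toy_page_get(struct toy_page *p) { p->refcount++; }
static void toy_page_put(struct toy_page *p) { p->refcount--; }

/* Models a page cache lookup: the caller receives one extra reference. */
static struct toy_page *toy_lookup(struct toy_page *pool, int i)
{
	toy_page_get(&pool[i]);
	return &pool[i];
}

/*
 * Models __hfs_bnode_create().  With release_early set, the lookup
 * reference is dropped right away (the unpatched behaviour) even though
 * the node keeps the pointer.
 */
static void toy_bnode_create(struct toy_bnode *node, struct toy_page *pool,
			     bool release_early)
{
	for (int i = 0; i < PAGES_PER_BNODE; i++) {
		struct toy_page *page = toy_lookup(pool, i);

		if (release_early)
			toy_page_put(page);
		node->page[i] = page;
	}
}

/* Models the patched hfs_bnode_free(): drop the references held by the node. */
static void toy_bnode_free(struct toy_bnode *node)
{
	for (int i = 0; i < PAGES_PER_BNODE; i++) {
		if (node->page[i]) {
			toy_page_put(node->page[i]);
			node->page[i] = NULL;
		}
	}
}

/* True if every page the node points at has a reference beyond the cache's own. */
static bool toy_bnode_pinned(const struct toy_bnode *node)
{
	for (int i = 0; i < PAGES_PER_BNODE; i++)
		if (!node->page[i] || node->page[i]->refcount <= 1)
			return false;
	return true;
}
```

With release_early set, toy_bnode_pinned() reports false immediately after
creation: the node still dereferences its pages, but nothing stops them
from being reclaimed and reused.  With release_early clear, the pages stay
pinned until toy_bnode_free() runs, which is the ordering this patch
restores.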

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], whereby version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents.  When the current RW code first entered the kernel [2]
in 2005, there was a REF_PAGES conditional (and "//" commented-out code)
to switch between B-node-centric paging and inode-centric paging.  There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() calls per bnode_get()/bnode_put()
were removed, but a page_cache_release() was mistakenly left in
(propagating the "REF_PAGES <-> !REF_PAGES" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761

[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

[3]:
http://sourceforge.net/projects/linux-hfsplus/

http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5

Date:   Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.

[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit dd04e674cde34f570509b9e2a6b549af89897640)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hfs/bnode.c     | 9 ++++-----
 fs/hfsplus/bnode.c | 9 ++++-----
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index 0d20006..3136308 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -286,7 +286,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -396,11 +395,11 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-	//int i;
+	int i;
 
-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
-	//	if (node->page[i])
-	//		page_cache_release(node->page[i]);
+	for (i = 0; i < node->tree->pages_per_bnode; i++)
+		if (node->page[i])
+			page_cache_release(node->page[i]);
 	kfree(node);
 }
 
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 29da657..7d75904 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -446,7 +446,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -556,11 +555,11 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-	//int i;
+	int i;
 
-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
-	//	if (node->page[i])
-	//		page_cache_release(node->page[i]);
+	for (i = 0; i < node->tree->pages_per_bnode; i++)
+		if (node->page[i])
+			page_cache_release(node->page[i]);
 	kfree(node);
 }
 
-- 
1.7.12.2.21.g234cd45.dirty