From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hin-Tak Leung Subject: Re: [PATCH] hfsplus: release bnode pages after use, not before Date: Thu, 18 Jun 2015 03:51:09 +0100 Message-ID: References: <1434584504.1063.YahooMailBasic@web172304.mail.ir2.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Cc: Anton Altaparmakov , Andrew Morton , linux-fsdevel@vger.kernel.org, Sasha Levin , Anton Altaparmakov , Al Viro , Christoph Hellwig , Vyacheslav Dubeyko , Sougata Santra , Sergei Antonov To: Hin-Tak Leung Return-path: Received: from mail-wi0-f171.google.com ([209.85.212.171]:38580 "EHLO mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751367AbbFRCvL convert rfc822-to-8bit (ORCPT ); Wed, 17 Jun 2015 22:51:11 -0400 Received: by wibdq8 with SMTP id dq8so73044816wib.1 for ; Wed, 17 Jun 2015 19:51:09 -0700 (PDT) In-Reply-To: <1434584504.1063.YahooMailBasic@web172304.mail.ir2.yahoo.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Hi Andrew and everybody else, I looked through the pre-git history and seem to have found the reason of how the current code had come to be, and why Sergei's fix seems to involve re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, in that the first !REF_PAGES should be REF_PAGES. Then the "remove old debug code" commit below did not remove the right stuff. Looking at the old code, I think REF_PAGES may have started out being 1 (i.e. pages are released right after create, then get/put when needed, no need to release on bnode_free), then the code was modified for efficiency so that pages are not released on bnode_create and not put/get at bnode_put/get but release at bnode_free. But it was still largely working in the REF_PAGE=1 mode (and with the commented out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). Then it was flipped over, and seems to work, and things were forgotten. I think the release at bnode_free was commented out because the person who wrote it wasn't sure - I am not sure either, since it seems like there might be other/more places that one might need to release the pages than just at bnode_free(). Does this train of thought make sense? Hin-Tak commit a5e3985fa014029eb6795664c704953720cc7f7d Author: Roman Zippel Date: Tue Sep 6 15:18:47 2005 -0700 [PATCH] hfs: remove debug code This removes some old debug code, which is no longer needed. Signed-off-by: Roman Zippel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c index a096c5a..3d5cdc6 100644 --- a/fs/hfs/bnode.c +++ b/fs/hfs/bnode.c @@ -13,8 +13,6 @@ #include "btree.h" -#define REF_PAGES 0 - void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) { @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } -#if !REF_PAGES page_cache_release(page); -#endif node->page[i] = page; } @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) { if (node) { atomic_inc(&node->refcnt); -#if REF_PAGES - { - int i; - for (i = 0; i < node->tree->pages_per_bnode; i++) - get_page(node->page[i]); - } -#endif dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", node->tree->cnid, node->this, atomic_read(&node->refcnt)); } @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) node->tree->cnid, node->this, atomic_read(&node->refcnt)); if (!atomic_read(&node->refcnt)) BUG(); - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { -#if REF_PAGES - for (i = 0; i < tree->pages_per_bnode; i++) - put_page(node->page[i]); -#endif + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) return; - } for (i = 0; i < tree->pages_per_bnode; i++) { if (!node->page[i]) continue; mark_page_accessed(node->page[i]); -#if REF_PAGES - put_page(node->page[i]); -#endif } if (test_bit(HFS_BNODE_DELETED, &node->flags)) { diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c index 8868d3b..b85abc6 100644 --- a/fs/hfsplus/bnode.c +++ b/fs/hfsplus/bnode.c @@ -18,8 +18,6 @@ #include "hfsplus_fs.h" #include "hfsplus_raw.h" -#define REF_PAGES 0 - /* Copy a specified range of bytes from the raw data of a node */ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) { @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } -#if !REF_PAGES page_cache_release(page); -#endif node->page[i] = page; } @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) { if (node) { atomic_inc(&node->refcnt); -#if REF_PAGES - { - int i; - for (i = 0; i < node->tree->pages_per_bnode; i++) - get_page(node->page[i]); - } -#endif dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", node->tree->cnid, node->this, atomic_read(&node->refcnt)); } @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) node->tree->cnid, node->this, atomic_read(&node->refcnt)); if (!atomic_read(&node->refcnt)) BUG(); - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { -#if REF_PAGES - for (i = 0; i < tree->pages_per_bnode; i++) - put_page(node->page[i]); -#endif + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) return; - } for (i = 0; i < tree->pages_per_bnode; i++) { if (!node->page[i]) continue; mark_page_accessed(node->page[i]); -#if REF_PAGES - put_page(node->page[i]); -#endif } if (test_bit(HFS_BNODE_DELETED, &node->flags)) { On 18 June 2015 at 00:41, Hin-Tak Leung wrote: > ------------------------------ > On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: > >>Hi Andrew, >> >>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >> >>Feel free to add: >> >>Reviewed-by: Anton Altaparmakov >> >>Thanks a lot in advance! >> >>Best regards, >> >> Anton > > I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. > > Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. > > BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. > I'm looking over the pre-git history to see how that comes about. > > Hin-Tak > >>> On 7 Jun 2015, at 03:42, Sergei Antonov wrote: >>> >>> Fix this bugreport by Sasha Levin: >>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>> >>> Cc: Al Viro >>> Cc: Christoph Hellwig >>> Cc: Andrew Morton >>> Cc: Vyacheslav Dubeyko >>> Cc: Hin-Tak Leung >>> Cc: Sougata Santra >>> Reported-by: Sasha Levin >>> Signed-off-by: Sergei Antonov >>> --- >>> fs/hfsplus/bnode.c | 6 ++---- >>> 1 file changed, 2 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>> index 759708f..5af50fb 100644 >>> --- a/fs/hfsplus/bnode.c >>> +++ b/fs/hfsplus/bnode.c >>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> - page_cache_release(page); >>> node->page[i] = page; >>> } >>> >>> @@ -566,13 +565,12 @@ node_error: >>> >>> void hfs_bnode_free(struct hfs_bnode *node) >>> { >>> -#if 0 >>> int i; >>> >>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>> if (node->page[i]) >>> page_cache_release(node->page[i]); >>> -#endif >>> + } >>> kfree(node); >>> } >>> >>> -- >>> 2.3.0 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >>-- >>Anton Altaparmakov (replace at with @) >>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>Linux NTFS maintainer >> >