* [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus @ 2015-06-28 1:39 Hin-Tak Leung 2015-06-28 1:39 ` [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Hin-Tak Leung 2015-06-28 1:39 ` [PATCH] hfs: fix B-tree corruption after insertion at position 0 Hin-Tak Leung 0 siblings, 2 replies; 16+ messages in thread From: Hin-Tak Leung @ 2015-06-28 1:39 UTC (permalink / raw) To: linux-fsdevel; +Cc: Hin-Tak Leung From: Hin-Tak Leung <htl10@users.sourceforge.net> At least 4 people had expressed concerns about the clarity of the patch description of Sergei Antonov's "hfsplus: release bnode pages after use, not before"; but also at least two people (me and Anton Altaparmakov) besides Sergei really think the issue addressed by the patch is important to fix and should be fixed soon. So here is an attempt to write a different and hopefully clearer patch description. Along the way, I also think it would be a good idea to keep the hfs and hfsplus code in sync where it is obvious to do so, so the first patch is a combo hfs+hfsplus one, but otherwise functionally identical to Sergei's. The 2nd patch is an hfs follow-up to a previous hfsplus change. Hin-Tak Leung (2): hfs,hfsplus: cache pages correctly between bnode_create and bnode_free hfs: fix B-tree corruption after insertion at position 0 fs/hfs/bnode.c | 9 ++++----- fs/hfs/brec.c | 20 +++++++++++--------- fs/hfsplus/bnode.c | 3 --- 3 files changed, 15 insertions(+), 17 deletions(-) -- 2.4.3 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-06-28 1:39 [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus Hin-Tak Leung @ 2015-06-28 1:39 ` Hin-Tak Leung 2015-06-28 18:52 ` Sergei Antonov 2015-06-30 15:55 ` Fwd: " Hin-Tak Leung 2015-06-28 1:39 ` [PATCH] hfs: fix B-tree corruption after insertion at position 0 Hin-Tak Leung 1 sibling, 2 replies; 16+ messages in thread From: Hin-Tak Leung @ 2015-06-28 1:39 UTC (permalink / raw) To: linux-fsdevel Cc: Hin-Tak Leung, Sergei Antonov, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra From: Hin-Tak Leung <htl10@users.sourceforge.net> Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and hfs_bnode_find() for finding or creating pages corresponding to an inode) are immediately kmap()'ed and used (both read and write) and kunmap()'ed, and should not be page_cache_release()'ed until hfs_bnode_free(). This patch fixes a problem I first saw in July 2012: merely running "du" on a large hfsplus-mounted directory a few times on a reasonably loaded system would get the hfsplus driver all confused and complaining about B-tree inconsistencies, and generates a "BUG: Bad page state". Most recently, I can generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du /mnt" simultaneously on two windows, where /mnt is a lightly-used QEMU VM image of the full Mac OS X 10.9: $ df -i / /home /mnt Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/fedora-root 3276800 551665 2725135 17% / /dev/mapper/fedora-home 52879360 716221 52163139 2% /home /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt After applying the patch, I was able to run "du /" (60+ times) and "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. There are many reports of the hfsplus driver getting confused under load and generating "BUG: Bad page state" or other similar issues over the years. [1] The unpatched code [2] has always been wrong since it entered the kernel tree. The only reason why it gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else, most of the time. The current RW driver appears to have followed the design and development of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) had a B-tree node-centric approach to read_cache_page()/page_cache_release() per bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of caching and releasing pages per inode extents. When the current RW code first entered the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" commented out code) to switch between B-node centric paging to inode-centric paging. There was a mistake with the direction of one of the REF_PAGES conditionals in __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were removed, but a page_cache_release() was mistakenly left in (propagating the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out page_cache_release() in bnode_release() (which should be spanned by !REF_PAGES) was never enabled. References: [1]: Michael Fox, Apr 2013 http://www.spinics.net/lists/linux-fsdevel/msg63807.html ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") Sasha Levin, Feb 2015 http://lkml.org/lkml/2015/2/20/85 ("use after free") https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 https://bugzilla.kernel.org/show_bug.cgi?id=42342 https://bugzilla.kernel.org/show_bug.cgi?id=63841 https://bugzilla.kernel.org/show_bug.cgi?id=78761 [2]: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db commit d1081202f1d0ee35ab0beb490da4b65d4bc763db Author: Andrew Morton <akpm@osdl.org> Date: Wed Feb 25 16:17:36 2004 -0800 [PATCH] HFS rewrite http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd commit 91556682e0bf004d98a529bf829d339abb98bbbd Author: Andrew Morton <akpm@osdl.org> Date: Wed Feb 25 16:17:48 2004 -0800 [PATCH] HFS+ support [3]: http://sourceforge.net/projects/linux-hfsplus/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ fs/hfsplus/bnode.c?r1=1.4&r2=1.5 Date: Thu Jun 6 09:45:14 2002 +0000 Use buffer cache instead of page cache in bnode.c. Cache inode extents. [4]: http://git.kernel.org/cgit/linux/kernel/git/\ stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d commit a5e3985fa014029eb6795664c704953720cc7f7d Author: Roman Zippel <zippel@linux-m68k.org> Date: Tue Sep 6 15:18:47 2005 -0700 [PATCH] hfs: remove debug code Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Sergei Antonov <saproj@gmail.com> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Sougata Santra <sougata@tuxera.com> --- fs/hfs/bnode.c | 9 ++++----- fs/hfsplus/bnode.c | 3 --- 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c index d3fa6bd..221719e 100644 --- a/fs/hfs/bnode.c +++ b/fs/hfs/bnode.c @@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } - page_cache_release(page); node->page[i] = page; } @@ -398,11 +397,11 @@ node_error: void hfs_bnode_free(struct hfs_bnode *node) { - //int i; + int i; - //for (i = 0; i < node->tree->pages_per_bnode; i++) - // if (node->page[i]) - // page_cache_release(node->page[i]); + for (i = 0; i < node->tree->pages_per_bnode; i++) + if (node->page[i]) + page_cache_release(node->page[i]); kfree(node); } diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c index 759708f..6392466 100644 --- a/fs/hfsplus/bnode.c +++ b/fs/hfsplus/bnode.c @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } - page_cache_release(page); node->page[i] = page; } @@ -566,13 +565,11 @@ node_error: void hfs_bnode_free(struct hfs_bnode *node) { -#if 0 int i; for (i = 0; i < node->tree->pages_per_bnode; i++) if (node->page[i]) page_cache_release(node->page[i]); -#endif kfree(node); } -- 2.4.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-06-28 1:39 ` [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Hin-Tak Leung @ 2015-06-28 18:52 ` Sergei Antonov 2015-06-30 15:40 ` Hin-Tak Leung 2015-06-30 15:55 ` Fwd: " Hin-Tak Leung 1 sibling, 1 reply; 16+ messages in thread From: Sergei Antonov @ 2015-06-28 18:52 UTC (permalink / raw) To: Hin-Tak Leung Cc: linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra On 28 June 2015 at 03:39, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > From: Hin-Tak Leung <htl10@users.sourceforge.net> > > Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and > hfs_bnode_find() for finding or creating pages corresponding to an inode) are > immediately kmap()'ed and used (both read and write) and kunmap()'ed, and > should not be page_cache_release()'ed until hfs_bnode_free(). > > This patch fixes a problem I first saw in July 2012: merely running "du" on a > large hfsplus-mounted directory a few times on a reasonably loaded system would > get the hfsplus driver all confused and complaining about B-tree > inconsistencies, and generates a "BUG: Bad page state". Most recently, I can > generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, by > running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du /mnt" > simultaneously on two windows, where /mnt is a lightly-used QEMU VM image of > the full Mac OS X 10.9: > > $ df -i / /home /mnt > Filesystem Inodes IUsed IFree IUse% Mounted on > /dev/mapper/fedora-root 3276800 551665 2725135 17% / > /dev/mapper/fedora-home 52879360 716221 52163139 2% /home > /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt > > After applying the patch, I was able to run "du /" (60+ times) and > "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. > > There are many reports of the hfsplus driver getting confused under load and > generating "BUG: Bad page state" or other similar issues over the years. [1] > > The unpatched code [2] has always been wrong since it entered the kernel tree. > The only reason why it gets away with it is that the kmap/memcpy/kunmap follow > very quickly after the page_cache_release() so the kernel has not had a chance > to reuse the memory for something else, most of the time. > > The current RW driver appears to have followed the design and development of > the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) had a > B-tree node-centric approach to read_cache_page()/page_cache_release() per > bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of caching > and releasing pages per inode extents. When the current RW code first entered > the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" commented > out code) to switch between B-node centric paging to inode-centric paging. > There was a mistake with the direction of one of the REF_PAGES conditionals in > __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the > read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were > removed, but a page_cache_release() was mistakenly left in (propagating the > "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out page_cache_release() > in bnode_release() (which should be spanned by !REF_PAGES) was never enabled. > > References: > [1]: > Michael Fox, Apr 2013 > http://www.spinics.net/lists/linux-fsdevel/msg63807.html > ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") > > Sasha Levin, Feb 2015 > http://lkml.org/lkml/2015/2/20/85 ("use after free") > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 > https://bugzilla.kernel.org/show_bug.cgi?id=42342 > https://bugzilla.kernel.org/show_bug.cgi?id=63841 > https://bugzilla.kernel.org/show_bug.cgi?id=78761 > > [2]: > http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ > fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db > commit d1081202f1d0ee35ab0beb490da4b65d4bc763db > Author: Andrew Morton <akpm@osdl.org> > Date: Wed Feb 25 16:17:36 2004 -0800 > > [PATCH] HFS rewrite > > http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ > fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd > > commit 91556682e0bf004d98a529bf829d339abb98bbbd > Author: Andrew Morton <akpm@osdl.org> > Date: Wed Feb 25 16:17:48 2004 -0800 > > [PATCH] HFS+ support > > [3]: > http://sourceforge.net/projects/linux-hfsplus/ > > http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ > http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ > > http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ > fs/hfsplus/bnode.c?r1=1.4&r2=1.5 > > Date: Thu Jun 6 09:45:14 2002 +0000 > Use buffer cache instead of page cache in bnode.c. Cache inode extents. > > [4]: > http://git.kernel.org/cgit/linux/kernel/git/\ > stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d > > commit a5e3985fa014029eb6795664c704953720cc7f7d > Author: Roman Zippel <zippel@linux-m68k.org> > Date: Tue Sep 6 15:18:47 2005 -0700 > > [PATCH] hfs: remove debug code > > Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> > Signed-off-by: Sergei Antonov <saproj@gmail.com> > Reviewed-by: Anton Altaparmakov <anton@tuxera.com> > Reported-by: Sasha Levin <sasha.levin@oracle.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Vyacheslav Dubeyko <slava@dubeyko.com> > Cc: Sougata Santra <sougata@tuxera.com> > --- > fs/hfs/bnode.c | 9 ++++----- > fs/hfsplus/bnode.c | 3 --- > 2 files changed, 4 insertions(+), 8 deletions(-) > > diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c > index d3fa6bd..221719e 100644 > --- a/fs/hfs/bnode.c > +++ b/fs/hfs/bnode.c > @@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) > page_cache_release(page); > goto fail; > } > - page_cache_release(page); > node->page[i] = page; > } > > @@ -398,11 +397,11 @@ node_error: > > void hfs_bnode_free(struct hfs_bnode *node) > { > - //int i; > + int i; > > - //for (i = 0; i < node->tree->pages_per_bnode; i++) > - // if (node->page[i]) > - // page_cache_release(node->page[i]); > + for (i = 0; i < node->tree->pages_per_bnode; i++) > + if (node->page[i]) > + page_cache_release(node->page[i]); > kfree(node); > } > > diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c > index 759708f..6392466 100644 > --- a/fs/hfsplus/bnode.c > +++ b/fs/hfsplus/bnode.c > @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) > page_cache_release(page); > goto fail; > } > - page_cache_release(page); > node->page[i] = page; > } > > @@ -566,13 +565,11 @@ node_error: > > void hfs_bnode_free(struct hfs_bnode *node) > { > -#if 0 > int i; > > for (i = 0; i < node->tree->pages_per_bnode; i++) > if (node->page[i]) > page_cache_release(node->page[i]); > -#endif > kfree(node); > } > > -- > 2.4.3 If I fix something else in hfsplus in the future, will you again submit a combined hfsplus+hfs patch? I would prefer separation. Hoped to receive your "Tested-by:" for my "hfsplus: release bnode pages after use, not before" and then submit a V2 of it with a longer description. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-06-28 18:52 ` Sergei Antonov @ 2015-06-30 15:40 ` Hin-Tak Leung 2015-07-01 16:09 ` Sergei Antonov 0 siblings, 1 reply; 16+ messages in thread From: Hin-Tak Leung @ 2015-06-30 15:40 UTC (permalink / raw) To: Sergei Antonov Cc: linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra On 28 June 2015 at 19:52, Sergei Antonov <saproj@gmail.com> wrote: <snipped> > If I fix something else in hfsplus in the future, will you again > submit a combined hfsplus+hfs patch? I would prefer separation. Hoped > to receive your "Tested-by:" for my "hfsplus: release bnode pages > after use, not before" and then submit a V2 of it with a longer > description. Possibly yes, if the patch description is clearly unsatisfactory and deemed incomprehensible, and you have not re-submitted a v2 within a reasonable time. I already explained why I re-submitted with a different patch description in the first of 3 below: [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and [PATCH] hfs: fix B-tree corruption after insertion at position 0 Please just re-submit v2 yourself if more than a few people thinks your patch description is unsatisfactory, instead of waiting for somebody else to do it for you; and also please just say "thank you", when others are willing spend their valuable time to look at and check and verify what you do. I would not be willing to put my name down with a Test-By: or Signed-of: on your original patch as it was. Hin-Tak ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-06-30 15:40 ` Hin-Tak Leung @ 2015-07-01 16:09 ` Sergei Antonov 2015-07-01 23:24 ` Hin-Tak Leung 0 siblings, 1 reply; 16+ messages in thread From: Sergei Antonov @ 2015-07-01 16:09 UTC (permalink / raw) To: Hin-Tak Leung Cc: linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra On 30 June 2015 at 17:40, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > On 28 June 2015 at 19:52, Sergei Antonov <saproj@gmail.com> wrote: > <snipped> >> If I fix something else in hfsplus in the future, will you again >> submit a combined hfsplus+hfs patch? I would prefer separation. Hoped >> to receive your "Tested-by:" for my "hfsplus: release bnode pages >> after use, not before" and then submit a V2 of it with a longer >> description. > > Possibly yes, if the patch description is clearly unsatisfactory and > deemed incomprehensible, and you have not re-submitted a v2 > within a reasonable time. I already explained why I re-submitted > with a different patch description in the first of 3 below: > > [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus > [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and > [PATCH] hfs: fix B-tree corruption after insertion at position 0 > > Please just re-submit v2 yourself if more than a few people thinks your patch > description is unsatisfactory, instead of waiting for somebody else to > do it for you; > and also please just say "thank you", when others are willing spend their > valuable time to look at and check and verify what you do. This "what I do" fixes the problem you have been complaining about for years. The historical research you have done is interesting, but simple testing is to be expected in the first place. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-01 16:09 ` Sergei Antonov @ 2015-07-01 23:24 ` Hin-Tak Leung 2015-07-02 17:04 ` Sergei Antonov 0 siblings, 1 reply; 16+ messages in thread From: Hin-Tak Leung @ 2015-07-01 23:24 UTC (permalink / raw) To: Sergei Antonov Cc: linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra On 1 July 2015 at 17:09, Sergei Antonov <saproj@gmail.com> wrote: > On 30 June 2015 at 17:40, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >> On 28 June 2015 at 19:52, Sergei Antonov <saproj@gmail.com> wrote: >> <snipped> >>> If I fix something else in hfsplus in the future, will you again >>> submit a combined hfsplus+hfs patch? I would prefer separation. Hoped >>> to receive your "Tested-by:" for my "hfsplus: release bnode pages >>> after use, not before" and then submit a V2 of it with a longer >>> description. >> >> Possibly yes, if the patch description is clearly unsatisfactory and >> deemed incomprehensible, and you have not re-submitted a v2 >> within a reasonable time. I already explained why I re-submitted >> with a different patch description in the first of 3 below: >> >> [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus >> [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and >> [PATCH] hfs: fix B-tree corruption after insertion at position 0 >> >> Please just re-submit v2 yourself if more than a few people thinks your patch >> description is unsatisfactory, instead of waiting for somebody else to >> do it for you; >> and also please just say "thank you", when others are willing spend their >> valuable time to look at and check and verify what you do. > > This "what I do" fixes the problem you have been complaining about for > years. The historical research you have done is interesting, but > simple testing is to be expected in the first place. You are still trying to argue that your patch description is not poor, as you did a few times in this thread already. Quite a few of us had already said it is poor. I did not think you were going to submit a v2, because you have not really accepted any criticisms as valid, either. You don't think the lost development history between 2001 and 2005 is important. I think it is. That is clearly reflected in how poor your patch description is, and why I re-wrote the patch description. Not new idea here. I am still waiting for that 'thank you' for time spent on testing and collating all the discussion into the new patch description. Hin-Tak ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-01 23:24 ` Hin-Tak Leung @ 2015-07-02 17:04 ` Sergei Antonov 2015-07-02 18:02 ` Hin-Tak Leung 0 siblings, 1 reply; 16+ messages in thread From: Sergei Antonov @ 2015-07-02 17:04 UTC (permalink / raw) To: Hin-Tak Leung Cc: linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra On 2 July 2015 at 01:24, Hin-Tak Leung <hintak.leung@gmail.com> wrote: > On 1 July 2015 at 17:09, Sergei Antonov <saproj@gmail.com> wrote: >> On 30 June 2015 at 17:40, Hin-Tak Leung <hintak.leung@gmail.com> wrote: >>> On 28 June 2015 at 19:52, Sergei Antonov <saproj@gmail.com> wrote: >>> <snipped> >>>> If I fix something else in hfsplus in the future, will you again >>>> submit a combined hfsplus+hfs patch? I would prefer separation. Hoped >>>> to receive your "Tested-by:" for my "hfsplus: release bnode pages >>>> after use, not before" and then submit a V2 of it with a longer >>>> description. >>> >>> Possibly yes, if the patch description is clearly unsatisfactory and >>> deemed incomprehensible, and you have not re-submitted a v2 >>> within a reasonable time. I already explained why I re-submitted >>> with a different patch description in the first of 3 below: >>> >>> [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus >>> [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and >>> [PATCH] hfs: fix B-tree corruption after insertion at position 0 >>> >>> Please just re-submit v2 yourself if more than a few people thinks your patch >>> description is unsatisfactory, instead of waiting for somebody else to >>> do it for you; >>> and also please just say "thank you", when others are willing spend their >>> valuable time to look at and check and verify what you do. >> >> This "what I do" fixes the problem you have been complaining about for >> years. The historical research you have done is interesting, but >> simple testing is to be expected in the first place. > > You are still trying to argue that your patch description is not poor, > as you did a few times in this thread already. Quite a few of us had > already said > it is poor. I did not think you were going to submit a v2, because you > have not really accepted any criticisms as valid, either. > > You don't think the lost development history between 2001 > and 2005 is important. I think it is. That is clearly reflected in how poor > your patch description is, and why I re-wrote the patch description. > Not new idea here. > > I am still waiting for that 'thank you' for time spent on testing and collating > all the discussion into the new patch description. Just sent a V2 of my patch. Took three links to previous bug reports found by you. Thank you for them! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-02 17:04 ` Sergei Antonov @ 2015-07-02 18:02 ` Hin-Tak Leung 2015-07-07 22:19 ` Andrew Morton 0 siblings, 1 reply; 16+ messages in thread From: Hin-Tak Leung @ 2015-07-02 18:02 UTC (permalink / raw) To: Sergei Antonov Cc: linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Andrew Morton, Vyacheslav Dubeyko, Sougata Santra On 2 July 2015 at 18:04, Sergei Antonov <saproj@gmail.com> wrote: <snipped> > > > Just sent a V2 of my patch. Took three links to previous bug reports > found by you. Thank you for them! Your v2 is still poor, and you have not taken much review on board. I am getting tired of this not-discussion. I'd like to see who in the CC'ed want to Acked-By/Reviewed-By/Signed-off on which one to go into the kernel. Hin-Tak ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-02 18:02 ` Hin-Tak Leung @ 2015-07-07 22:19 ` Andrew Morton 2015-07-07 23:19 ` Sergei Antonov ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Andrew Morton @ 2015-07-07 22:19 UTC (permalink / raw) To: Hin-Tak Leung Cc: Sergei Antonov, linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Vyacheslav Dubeyko, Sougata Santra On Thu, 2 Jul 2015 19:02:15 +0100 Hin-Tak Leung <hintak.leung@gmail.com> wrote: > On 2 July 2015 at 18:04, Sergei Antonov <saproj@gmail.com> wrote: > <snipped> > > > > > > Just sent a V2 of my patch. Took three links to previous bug reports > > found by you. Thank you for them! > > Your v2 is still poor, and you have not taken much review on board. > I am getting tired of this not-discussion. > > I'd like to see who in the CC'ed want to Acked-By/Reviewed-By/Signed-off > on which one to go into the kernel. > c'mon guys, this drama isn't helping. I merged Hin-Tak's two patches based on Vyacheslav's recommendation and because I like elaborate changelogs. Sergei, should I convert Cc:you into Signed-off-by:you? I added cc:stable to both patches. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-07 22:19 ` Andrew Morton @ 2015-07-07 23:19 ` Sergei Antonov 2015-07-08 14:58 ` Sergei Antonov 2015-08-03 13:33 ` Luc Pionchon 2 siblings, 0 replies; 16+ messages in thread From: Sergei Antonov @ 2015-07-07 23:19 UTC (permalink / raw) To: Andrew Morton Cc: Hin-Tak Leung, linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Vyacheslav Dubeyko, Sougata Santra On 8 July 2015 at 00:19, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 2 Jul 2015 19:02:15 +0100 Hin-Tak Leung <hintak.leung@gmail.com> wrote: > >> On 2 July 2015 at 18:04, Sergei Antonov <saproj@gmail.com> wrote: >> <snipped> >> > >> > >> > Just sent a V2 of my patch. Took three links to previous bug reports >> > found by you. Thank you for them! >> >> Your v2 is still poor, and you have not taken much review on board. >> I am getting tired of this not-discussion. >> >> I'd like to see who in the CC'ed want to Acked-By/Reviewed-By/Signed-off >> on which one to go into the kernel. >> > > c'mon guys, this drama isn't helping. > > I merged Hin-Tak's two patches based on Vyacheslav's recommendation and > because I like elaborate changelogs. > > Sergei, should I convert Cc:you into Signed-off-by:you? No, thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-07 22:19 ` Andrew Morton 2015-07-07 23:19 ` Sergei Antonov @ 2015-07-08 14:58 ` Sergei Antonov 2015-07-08 15:29 ` Hin-Tak Leung 2015-08-03 13:33 ` Luc Pionchon 2 siblings, 1 reply; 16+ messages in thread From: Sergei Antonov @ 2015-07-08 14:58 UTC (permalink / raw) To: Andrew Morton Cc: Hin-Tak Leung, linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Vyacheslav Dubeyko, Sougata Santra On 8 July 2015 at 00:19, Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 2 Jul 2015 19:02:15 +0100 Hin-Tak Leung <hintak.leung@gmail.com> wrote: > >> On 2 July 2015 at 18:04, Sergei Antonov <saproj@gmail.com> wrote: >> <snipped> >> > >> > >> > Just sent a V2 of my patch. Took three links to previous bug reports >> > found by you. Thank you for them! >> >> Your v2 is still poor, and you have not taken much review on board. >> I am getting tired of this not-discussion. >> >> I'd like to see who in the CC'ed want to Acked-By/Reviewed-By/Signed-off >> on which one to go into the kernel. >> > > c'mon guys, this drama isn't helping. Can't help mentioning it: this drama reminds me that of Grisha Perelman's and Shing-Tung Yau's. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-08 14:58 ` Sergei Antonov @ 2015-07-08 15:29 ` Hin-Tak Leung 0 siblings, 0 replies; 16+ messages in thread From: Hin-Tak Leung @ 2015-07-08 15:29 UTC (permalink / raw) To: Sergei Antonov Cc: Andrew Morton, linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Vyacheslav Dubeyko, Sougata Santra On 8 July 2015 at 15:58, Sergei Antonov <saproj@gmail.com> wrote: > On 8 July 2015 at 00:19, Andrew Morton <akpm@linux-foundation.org> wrote: >> On Thu, 2 Jul 2015 19:02:15 +0100 Hin-Tak Leung <hintak.leung@gmail.com> wrote: >> >>> On 2 July 2015 at 18:04, Sergei Antonov <saproj@gmail.com> wrote: >>> <snipped> >>> > >>> > >>> > Just sent a V2 of my patch. Took three links to previous bug reports >>> > found by you. Thank you for them! >>> >>> Your v2 is still poor, and you have not taken much review on board. >>> I am getting tired of this not-discussion. >>> >>> I'd like to see who in the CC'ed want to Acked-By/Reviewed-By/Signed-off >>> on which one to go into the kernel. >>> >> >> c'mon guys, this drama isn't helping. > > Can't help mentioning it: this drama reminds me that of Grisha > Perelman's and Shing-Tung Yau's. I do not know of either of those individuals, and just google'd them and learned that they are mathematicians of some sort, the latter obviously chinese, the former, apparently russian. I would really prefer that you do not try to make any points about others' cultural/ethnic origins. You are not endearing yourself to others by making any points other than quiet acceptance of others' cultural/ethnic origins. That sort of behaviour is called "bigotry", and frowned upon in most modern civilized societies. Please do not do that. FWIW, my usual response to "are you chinese?" from an unknown chinese person in the street, is usually either "no", or pretend that I do not understand the question. In any case, it is irrelevant and none of anybody's business, including you. Hin-Tak ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-07-07 22:19 ` Andrew Morton 2015-07-07 23:19 ` Sergei Antonov 2015-07-08 14:58 ` Sergei Antonov @ 2015-08-03 13:33 ` Luc Pionchon 2 siblings, 0 replies; 16+ messages in thread From: Luc Pionchon @ 2015-08-03 13:33 UTC (permalink / raw) To: Andrew Morton Cc: Hin-Tak Leung, Sergei Antonov, linux-fsdevel, Hin-Tak Leung, Al Viro, Christoph Hellwig, Vyacheslav Dubeyko, Sougata Santra On 8 July 2015 at 00:19, Andrew Morton <akpm@linux-foundation.org> wrote: > [snip] > > I merged Hin-Tak's two patches based on Vyacheslav's recommendation and > because I like elaborate changelogs. into which kernels were the patches applied ? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Fwd: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free 2015-06-28 1:39 ` [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Hin-Tak Leung 2015-06-28 18:52 ` Sergei Antonov @ 2015-06-30 15:55 ` Hin-Tak Leung [not found] ` <CABbL6oanheE9JAwevN8Mrf-5o=y-e7JV2Nt4VW_-iW9Pt3jQuQ@mail.gmail.com> 1 sibling, 1 reply; 16+ messages in thread From: Hin-Tak Leung @ 2015-06-30 15:55 UTC (permalink / raw) To: 415fox, linux-fsdevel Michael Fox: FYI. If you feel like responding (and possibly add a "Tested-By:"), please include liux-fs-devel linux-fsdevel in the reply. ---------- Forwarded message ---------- From: Hin-Tak Leung <hintak.leung@gmail.com> Date: 28 June 2015 at 02:39 Subject: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free To: linux-fsdevel@vger.kernel.org Cc: Hin-Tak Leung <htl10@users.sourceforge.net>, Sergei Antonov <saproj@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Vyacheslav Dubeyko <slava@dubeyko.com>, Sougata Santra <sougata@tuxera.com> From: Hin-Tak Leung <htl10@users.sourceforge.net> Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and hfs_bnode_find() for finding or creating pages corresponding to an inode) are immediately kmap()'ed and used (both read and write) and kunmap()'ed, and should not be page_cache_release()'ed until hfs_bnode_free(). This patch fixes a problem I first saw in July 2012: merely running "du" on a large hfsplus-mounted directory a few times on a reasonably loaded system would get the hfsplus driver all confused and complaining about B-tree inconsistencies, and generates a "BUG: Bad page state". Most recently, I can generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du /mnt" simultaneously on two windows, where /mnt is a lightly-used QEMU VM image of the full Mac OS X 10.9: $ df -i / /home /mnt Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/fedora-root 3276800 551665 2725135 17% / /dev/mapper/fedora-home 52879360 716221 52163139 2% /home /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt After applying the patch, I was able to run "du /" (60+ times) and "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. There are many reports of the hfsplus driver getting confused under load and generating "BUG: Bad page state" or other similar issues over the years. [1] The unpatched code [2] has always been wrong since it entered the kernel tree. The only reason why it gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else, most of the time. The current RW driver appears to have followed the design and development of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) had a B-tree node-centric approach to read_cache_page()/page_cache_release() per bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of caching and releasing pages per inode extents. When the current RW code first entered the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" commented out code) to switch between B-node centric paging to inode-centric paging. There was a mistake with the direction of one of the REF_PAGES conditionals in __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were removed, but a page_cache_release() was mistakenly left in (propagating the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out page_cache_release() in bnode_release() (which should be spanned by !REF_PAGES) was never enabled. References: [1]: Michael Fox, Apr 2013 http://www.spinics.net/lists/linux-fsdevel/msg63807.html ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") Sasha Levin, Feb 2015 http://lkml.org/lkml/2015/2/20/85 ("use after free") https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 https://bugzilla.kernel.org/show_bug.cgi?id=42342 https://bugzilla.kernel.org/show_bug.cgi?id=63841 https://bugzilla.kernel.org/show_bug.cgi?id=78761 [2]: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db commit d1081202f1d0ee35ab0beb490da4b65d4bc763db Author: Andrew Morton <akpm@osdl.org> Date: Wed Feb 25 16:17:36 2004 -0800 [PATCH] HFS rewrite http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd commit 91556682e0bf004d98a529bf829d339abb98bbbd Author: Andrew Morton <akpm@osdl.org> Date: Wed Feb 25 16:17:48 2004 -0800 [PATCH] HFS+ support [3]: http://sourceforge.net/projects/linux-hfsplus/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ fs/hfsplus/bnode.c?r1=1.4&r2=1.5 Date: Thu Jun 6 09:45:14 2002 +0000 Use buffer cache instead of page cache in bnode.c. Cache inode extents. [4]: http://git.kernel.org/cgit/linux/kernel/git/\ stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d commit a5e3985fa014029eb6795664c704953720cc7f7d Author: Roman Zippel <zippel@linux-m68k.org> Date: Tue Sep 6 15:18:47 2005 -0700 [PATCH] hfs: remove debug code Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Sergei Antonov <saproj@gmail.com> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Sougata Santra <sougata@tuxera.com> --- fs/hfs/bnode.c | 9 ++++----- fs/hfsplus/bnode.c | 3 --- 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c index d3fa6bd..221719e 100644 --- a/fs/hfs/bnode.c +++ b/fs/hfs/bnode.c @@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } - page_cache_release(page); node->page[i] = page; } @@ -398,11 +397,11 @@ node_error: void hfs_bnode_free(struct hfs_bnode *node) { - //int i; + int i; - //for (i = 0; i < node->tree->pages_per_bnode; i++) - // if (node->page[i]) - // page_cache_release(node->page[i]); + for (i = 0; i < node->tree->pages_per_bnode; i++) + if (node->page[i]) + page_cache_release(node->page[i]); kfree(node); } diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c index 759708f..6392466 100644 --- a/fs/hfsplus/bnode.c +++ b/fs/hfsplus/bnode.c @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } - page_cache_release(page); node->page[i] = page; } @@ -566,13 +565,11 @@ node_error: void hfs_bnode_free(struct hfs_bnode *node) { -#if 0 int i; for (i = 0; i < node->tree->pages_per_bnode; i++) if (node->page[i]) page_cache_release(node->page[i]); -#endif kfree(node); } -- 2.4.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
[parent not found: <CABbL6oanheE9JAwevN8Mrf-5o=y-e7JV2Nt4VW_-iW9Pt3jQuQ@mail.gmail.com>]
[parent not found: <CABbL6oZ2vQsU9xG+vh_GkmfHHdcZvSYPcZYjUpnZq-r3b_HjPQ@mail.gmail.com>]
* Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free [not found] ` <CABbL6oZ2vQsU9xG+vh_GkmfHHdcZvSYPcZYjUpnZq-r3b_HjPQ@mail.gmail.com> @ 2015-07-01 23:29 ` Hin-Tak Leung 0 siblings, 0 replies; 16+ messages in thread From: Hin-Tak Leung @ 2015-07-01 23:29 UTC (permalink / raw) To: Michael Fox, linux-fsdevel Hopefully this should go through. I am posting from my gmail account (which I rarely use) as vger.kernel.org (not just linux-fsdevel@, but also git@) is bouncing my @users.sf.net account also. On 30 June 2015 at 17:59, Michael Fox <415fox@gmail.com> wrote: > By the way, linux-fsdevel@vger.kernel.org, bounced my response. > > On Tue, Jun 30, 2015 at 9:56 AM, Michael Fox <415fox@gmail.com> wrote: >> >> Hi Hin-Tak. I gave up on dual-booting many hard drives ago so I have >> nowhere to test your patch. Thank you for your work. Your e-mail is >> especially thorough and professional. I hope your patch gets accepted soon. >> >> On Tue, Jun 30, 2015 at 8:55 AM, Hin-Tak Leung <hintak.leung@gmail.com> >> wrote: >>> >>> Michael Fox: FYI. If you feel like responding (and possibly add a >>> "Tested-By:"), >>> please include liux-fs-devel linux-fsdevel in the reply. >>> >>> >>> ---------- Forwarded message ---------- >>> From: Hin-Tak Leung <hintak.leung@gmail.com> >>> Date: 28 June 2015 at 02:39 >>> Subject: [PATCH v2] hfs,hfsplus: cache pages correctly between >>> bnode_create and bnode_free >>> To: linux-fsdevel@vger.kernel.org >>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>, Sergei Antonov >>> <saproj@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>, Christoph >>> Hellwig <hch@infradead.org>, Andrew Morton >>> <akpm@linux-foundation.org>, Vyacheslav Dubeyko <slava@dubeyko.com>, >>> Sougata Santra <sougata@tuxera.com> >>> >>> >>> From: Hin-Tak Leung <htl10@users.sourceforge.net> >>> >>> Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and >>> hfs_bnode_find() for finding or creating pages corresponding to an inode) >>> are >>> immediately kmap()'ed and used (both read and write) and kunmap()'ed, and >>> should not be page_cache_release()'ed until hfs_bnode_free(). >>> >>> This patch fixes a problem I first saw in July 2012: merely running "du" >>> on a >>> large hfsplus-mounted directory a few times on a reasonably loaded system >>> would >>> get the hfsplus driver all confused and complaining about B-tree >>> inconsistencies, and generates a "BUG: Bad page state". Most recently, I >>> can >>> generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, >>> by >>> running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du >>> /mnt" >>> simultaneously on two windows, where /mnt is a lightly-used QEMU VM image >>> of >>> the full Mac OS X 10.9: >>> >>> $ df -i / /home /mnt >>> Filesystem Inodes IUsed IFree IUse% Mounted on >>> /dev/mapper/fedora-root 3276800 551665 2725135 17% / >>> /dev/mapper/fedora-home 52879360 716221 52163139 2% /home >>> /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt >>> >>> After applying the patch, I was able to run "du /" (60+ times) and >>> "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. >>> >>> There are many reports of the hfsplus driver getting confused under load >>> and >>> generating "BUG: Bad page state" or other similar issues over the years. >>> [1] >>> >>> The unpatched code [2] has always been wrong since it entered the kernel >>> tree. >>> The only reason why it gets away with it is that the kmap/memcpy/kunmap >>> follow >>> very quickly after the page_cache_release() so the kernel has not had a >>> chance >>> to reuse the memory for something else, most of the time. >>> >>> The current RW driver appears to have followed the design and development >>> of >>> the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) >>> had a >>> B-tree node-centric approach to read_cache_page()/page_cache_release() >>> per >>> bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of >>> caching >>> and releasing pages per inode extents. When the current RW code first >>> entered >>> the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" >>> commented >>> out code) to switch between B-node centric paging to inode-centric >>> paging. >>> There was a mistake with the direction of one of the REF_PAGES >>> conditionals in >>> __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the >>> read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were >>> removed, but a page_cache_release() was mistakenly left in (propagating >>> the >>> "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out >>> page_cache_release() >>> in bnode_release() (which should be spanned by !REF_PAGES) was never >>> enabled. >>> >>> References: >>> [1]: >>> Michael Fox, Apr 2013 >>> http://www.spinics.net/lists/linux-fsdevel/msg63807.html >>> ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") >>> >>> Sasha Levin, Feb 2015 >>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>> >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 >>> https://bugzilla.kernel.org/show_bug.cgi?id=42342 >>> https://bugzilla.kernel.org/show_bug.cgi?id=63841 >>> https://bugzilla.kernel.org/show_bug.cgi?id=78761 >>> >>> [2]: >>> http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ >>> fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db >>> commit d1081202f1d0ee35ab0beb490da4b65d4bc763db >>> Author: Andrew Morton <akpm@osdl.org> >>> Date: Wed Feb 25 16:17:36 2004 -0800 >>> >>> [PATCH] HFS rewrite >>> >>> http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ >>> fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd >>> >>> commit 91556682e0bf004d98a529bf829d339abb98bbbd >>> Author: Andrew Morton <akpm@osdl.org> >>> Date: Wed Feb 25 16:17:48 2004 -0800 >>> >>> [PATCH] HFS+ support >>> >>> [3]: >>> http://sourceforge.net/projects/linux-hfsplus/ >>> >>> >>> http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ >>> >>> http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>> >>> http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ >>> fs/hfsplus/bnode.c?r1=1.4&r2=1.5 >>> >>> Date: Thu Jun 6 09:45:14 2002 +0000 >>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>> >>> [4]: >>> http://git.kernel.org/cgit/linux/kernel/git/\ >>> >>> stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d >>> >>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>> Author: Roman Zippel <zippel@linux-m68k.org> >>> Date: Tue Sep 6 15:18:47 2005 -0700 >>> >>> [PATCH] hfs: remove debug code >>> >>> Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> >>> Signed-off-by: Sergei Antonov <saproj@gmail.com> >>> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> >>> Reported-by: Sasha Levin <sasha.levin@oracle.com> >>> Cc: Al Viro <viro@zeniv.linux.org.uk> >>> Cc: Christoph Hellwig <hch@infradead.org> >>> Cc: Andrew Morton <akpm@linux-foundation.org> >>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> >>> Cc: Sougata Santra <sougata@tuxera.com> >>> --- >>> fs/hfs/bnode.c | 9 ++++----- >>> fs/hfsplus/bnode.c | 3 --- >>> 2 files changed, 4 insertions(+), 8 deletions(-) >>> >>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>> index d3fa6bd..221719e 100644 >>> --- a/fs/hfs/bnode.c >>> +++ b/fs/hfs/bnode.c >>> @@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> - page_cache_release(page); >>> node->page[i] = page; >>> } >>> >>> @@ -398,11 +397,11 @@ node_error: >>> >>> void hfs_bnode_free(struct hfs_bnode *node) >>> { >>> - //int i; >>> + int i; >>> >>> - //for (i = 0; i < node->tree->pages_per_bnode; i++) >>> - // if (node->page[i]) >>> - // page_cache_release(node->page[i]); >>> + for (i = 0; i < node->tree->pages_per_bnode; i++) >>> + if (node->page[i]) >>> + page_cache_release(node->page[i]); >>> kfree(node); >>> } >>> >>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>> index 759708f..6392466 100644 >>> --- a/fs/hfsplus/bnode.c >>> +++ b/fs/hfsplus/bnode.c >>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> - page_cache_release(page); >>> node->page[i] = page; >>> } >>> >>> @@ -566,13 +565,11 @@ node_error: >>> >>> void hfs_bnode_free(struct hfs_bnode *node) >>> { >>> -#if 0 >>> int i; >>> >>> for (i = 0; i < node->tree->pages_per_bnode; i++) >>> if (node->page[i]) >>> page_cache_release(node->page[i]); >>> -#endif >>> kfree(node); >>> } >>> >>> -- >>> 2.4.3 >> >> >> >> >> -- >> >> - >> Michael > > > > > -- > > - > Michael ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] hfs: fix B-tree corruption after insertion at position 0 2015-06-28 1:39 [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus Hin-Tak Leung 2015-06-28 1:39 ` [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Hin-Tak Leung @ 2015-06-28 1:39 ` Hin-Tak Leung 1 sibling, 0 replies; 16+ messages in thread From: Hin-Tak Leung @ 2015-06-28 1:39 UTC (permalink / raw) To: linux-fsdevel Cc: Hin-Tak Leung, Sergei Antonov, Joe Perches, Al Viro, Christoph Hellwig, Andrew Morton From: Hin-Tak Leung <htl10@users.sourceforge.net> Fix B-tree corruption when a new record is inserted at position 0 in the node in hfs_brec_insert(). This is an identical change to the corresponding hfs b-tree code to Sergei Antonov's "hfsplus: fix B-tree corruption after insertion at position 0", to keep similar code paths in the hfs and hfsplus drivers in sync, where appropriate. Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Sergei Antonov <saproj@gmail.com> Cc: Joe Perches <joe@perches.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Anton Altaparmakov <anton@tuxera.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> --- fs/hfs/brec.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/hfs/brec.c b/fs/hfs/brec.c index 9f4ee7f..6fc766d 100644 --- a/fs/hfs/brec.c +++ b/fs/hfs/brec.c @@ -131,13 +131,16 @@ skip: hfs_bnode_write(node, entry, data_off + key_len, entry_len); hfs_bnode_dump(node); - if (new_node) { - /* update parent key if we inserted a key - * at the start of the first node - */ - if (!rec && new_node != node) - hfs_brec_update_parent(fd); + /* + * update parent key if we inserted a key + * at the start of the node and it is not the new node + */ + if (!rec && new_node != node) { + hfs_bnode_read_key(node, fd->search_key, data_off + size); + hfs_brec_update_parent(fd); + } + if (new_node) { hfs_bnode_put(fd->bnode); if (!new_node->parent) { hfs_btree_inc_height(tree); @@ -166,9 +169,6 @@ skip: goto again; } - if (!rec) - hfs_brec_update_parent(fd); - return 0; } @@ -366,6 +366,8 @@ again: if (IS_ERR(parent)) return PTR_ERR(parent); __hfs_brec_find(parent, fd); + if (fd->record < 0) + return -ENOENT; hfs_bnode_dump(parent); rec = fd->record; -- 2.4.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-08-03 13:34 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-06-28 1:39 [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus Hin-Tak Leung 2015-06-28 1:39 ` [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Hin-Tak Leung 2015-06-28 18:52 ` Sergei Antonov 2015-06-30 15:40 ` Hin-Tak Leung 2015-07-01 16:09 ` Sergei Antonov 2015-07-01 23:24 ` Hin-Tak Leung 2015-07-02 17:04 ` Sergei Antonov 2015-07-02 18:02 ` Hin-Tak Leung 2015-07-07 22:19 ` Andrew Morton 2015-07-07 23:19 ` Sergei Antonov 2015-07-08 14:58 ` Sergei Antonov 2015-07-08 15:29 ` Hin-Tak Leung 2015-08-03 13:33 ` Luc Pionchon 2015-06-30 15:55 ` Fwd: " Hin-Tak Leung [not found] ` <CABbL6oanheE9JAwevN8Mrf-5o=y-e7JV2Nt4VW_-iW9Pt3jQuQ@mail.gmail.com> [not found] ` <CABbL6oZ2vQsU9xG+vh_GkmfHHdcZvSYPcZYjUpnZq-r3b_HjPQ@mail.gmail.com> 2015-07-01 23:29 ` Hin-Tak Leung 2015-06-28 1:39 ` [PATCH] hfs: fix B-tree corruption after insertion at position 0 Hin-Tak Leung
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.