Re: [PATCH] hfsplus: release bnode pages after use, not before

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH] hfsplus: release bnode pages after use, not before
       [not found] <1433781680.24509.YahooMailBasic@web172301.mail.ir2.yahoo.com>
@ 2015-06-08 16:47 ` Sergei Antonov
  0 siblings, 0 replies; 26+ messages in thread
From: Sergei Antonov @ 2015-06-08 16:47 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Vyacheslav Dubeyko, linux-fsdevel, Sasha Levin,
	Anton Altaparmakov, Al Viro, Christoph Hellwig, Andrew Morton,
	Sougata Santra

On 8 June 2015 at 18:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> ------------------------------
> On Mon, Jun 8, 2015 4:45 PM BST Vyacheslav Dubeyko wrote:
>
>>On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
>> Fix this bugreport by Sasha Levin:
>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>
>>
>>Sorry, I missed the point. What do you try to fix? How this change fixes
>>the issue?
>>
>>I think that maybe this fix makes sense. But it needs to describe it
>>more deeply. Could you describe the fix with more details?
>>
>>Thanks,
>>Vyacheslav Dubeyko.
>>
>>
>
> About this change, I also have two comments to make, one
> stylistic, one logical - you basically removed the ''#if 0",
> so I'd like to know how the "#if 0" came to be in the first place -
> I just haven't got the time to look through 'git blame' yet.
> Dead code should be removed, and that wasn't removed probably
> because it was too old i.e. already there for years.

"#if 0" has been there since the code was submitted to the kernel.

> Also there is no need to add those braces {}. I believe
> the kernel coding guideline explicitly stated not to,
> for one-statement bodies in loops?

Only for if statements (if I understand the coding style right).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18 22:16             ` Hin-Tak Leung
@ 2015-06-19  1:30               ` Sergei Antonov
  0 siblings, 0 replies; 26+ messages in thread
From: Sergei Antonov @ 2015-06-19  1:30 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 19 June 2015 at 00:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 18 June 2015 at 21:51, Sergei Antonov <saproj@gmail.com> wrote:
>> On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
>>>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>>>> Hi Andrew and everybody else,
>>>>>>>
>>>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>>>> debug code"
>>>>>>> commit below did not remove the right stuff.
>>>>>>>
>>>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>>>> need to release on bnode_free),
>>>>>>> then the code was modified for efficiency so that pages are not released
>>>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>>>
>>>>>>> I think the release at bnode_free was commented out because the person who
>>>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>>>> other/more places that one might need to release the pages than just
>>>>>>> at bnode_free().
>>>>>>>
>>>>>>> Does this train of thought make sense?
>>>>>>
>>>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>>>> at first it was my thought too. But later I considered what the logic
>>>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>>>> !REF_PAGES was what he really meant.
>>>>>>
>>>>>
>>>>> You are wrong, it seems. I found even earlier versions of the code,
>>>>> with history, at
>>>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>>>> The earlier version of the code
>>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>>>> )
>>>>> have those removed, dated 2001 december and 2002 june.
>>>>
>>>> Interesting. I only looked at historical linux git.
>>>>
>>>>> The change log says so:
>>>>>
>>>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>>>
>>>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>>>
>>>>> The change also removed any page_cache_release() . It would seem that
>>>>> the additional REF_PAGES
>>>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>>>> development logic, and eventually took over.
>>>>>
>>>>> The release-early code came first, not after. That makes sense, as
>>>>> releasing early
>>>>
>>>> We may be talking about different things. I did not write what code
>>>> came earlier and what after. Anyway, this becomes to difficult to
>>>> follow. My point was that in the variant without refs, calling
>>>> page_cache_release() right after to read_cache_page() was exactly what
>>>> the author wanted (although it is a bug).
>>>>
>>>> Again, we are probably arguing about different things. And I'm a
>>>> little less zealous than you to do this code archeology.
>>>>
>>>
>>> I would really prefer that you don't use personal descriptions such as "zealous"
>>> as well as making the kind of comments such as
>>> "The result of the test (which I'm sure will show the bug is gone) will
>>> be by far more valuable contribution than your other messages in the
>>> thread.". Those are not helpful.
>>>
>>> In my first reply I have already stated my reasoning: your fix consists
>>> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
>>> dead code was written for a purpose which is now lost and forgotten.
>>
>> Of course. And it is not hard to guess that purpose quite reliably :).
>> The code used to release pages correctly, i. e. when freeing the bnode
>> structure.
>>
>>> And here is that purpose - early development of the code was done
>>> with getting and putting pages within each routine, for safeness, and migrating
>>> towards caching pages for a longer life cycle in 2001-2002. When the
>>> code was cleaned
>>> up and re-worked before putting into the kernel in 2005, this
>>> development process
>>> was repeated but with a REF_PAGES conditional and some additional code
>>> which was // commented out.
>>>
>>> The migration to caching pages
>>> would have been more or less correct, if REF_PAGES was flipped
>>> and the // commented out code was enabled.  Except there was a mistake
>>> in one of the conditionals
>>
>> By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
>> a mistake under the conditions you provided above. With REF_PAGES
>> flipped to 1 and the // piece uncommented you don't want to release in
>> __hfs_bnode_create() because releasing is done in hfs_bnode_free().
>>
>
> Yes, that's what I am saying. I believe the development of the code happens
> as follows:
>
> - bnode_get() does get_page(), bnode_put() does put_page(), and the rest
> of the routines - i.e. bnode_create() look up the pages but
> immediately release them. bnode_free()
> does nothing about pages.
>
> - then it was desired to keep the pages longer without the repeated
> get_pages and put_pages.
> so those are "#if REF_PAGES" out. And we no longer get and put pages on demand
> and should also stop releasing them after looking up at bnode_create() also.
> (the bnode_create() is a poor name for what it does, but never mind). New code
> is added to release at bnode_free(). The new code should have been spanned
> by the opposite, i.e. "#if !REF_PAGES", but was inserted as place
> holder with //.
>
> The rest of the story I have already tried to explain a couple of
> times. The switch
> from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per
> bnode_get/bnode_put routines,
> to keeping pages around longer between bnode_create/bnode_free) did
> not quite work correctly.
>
> For the others who want to see the version of the code with those REF_PAGES
> and want to weigh in on this,
> I cannot find a web interface for the repo, but it is the commit below
> in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

A web link: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/log/fs/hfsplus/bnode.c

> commit 91556682e0bf004d98a529bf829d339abb98bbbd
> Author: Andrew Morton <akpm@osdl.org>
> Date:   Wed Feb 25 16:17:48 2004 -0800
>
>     [PATCH] HFS+ support
>
>     From: Roman Zippel <zippel@linux-m68k.org>
>
>     This driver adds full read/write support for HFS+ and is based on the
>     readonly driver by Brad Broyer.
>
>     Thanks to Ethan Benson <erbenson@alaska.net> for a number of patches to
>     make the driver more compliant with the spec.
>
> I would also repeat that earlier code and change history was at
> http://sourceforge.net/projects/linux-hfsplus/, and it showed the
> changes between
> 2001 dec to 2002 june
>
> from get_page/put_page per bnode_get/bnode_put to removing those and
> switching to a longer page life-cycle with the log message:
>
>  "Use buffer cache instead of page cache in bnode.c. Cache inode extents."

Alright. I did not correctly understand your original idea and write
some irrelevant stuff. Maybe you were right. However, it is only a
question of historical interest.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18 20:51           ` Sergei Antonov
@ 2015-06-18 22:16             ` Hin-Tak Leung
  2015-06-19  1:30               ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2015-06-18 22:16 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 18 June 2015 at 21:51, Sergei Antonov <saproj@gmail.com> wrote:
> On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
>>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>>> Hi Andrew and everybody else,
>>>>>>
>>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>>> debug code"
>>>>>> commit below did not remove the right stuff.
>>>>>>
>>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>>> need to release on bnode_free),
>>>>>> then the code was modified for efficiency so that pages are not released
>>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>>
>>>>>> I think the release at bnode_free was commented out because the person who
>>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>>> other/more places that one might need to release the pages than just
>>>>>> at bnode_free().
>>>>>>
>>>>>> Does this train of thought make sense?
>>>>>
>>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>>> at first it was my thought too. But later I considered what the logic
>>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>>> !REF_PAGES was what he really meant.
>>>>>
>>>>
>>>> You are wrong, it seems. I found even earlier versions of the code,
>>>> with history, at
>>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>>> The earlier version of the code
>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>>> )
>>>> have those removed, dated 2001 december and 2002 june.
>>>
>>> Interesting. I only looked at historical linux git.
>>>
>>>> The change log says so:
>>>>
>>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>>
>>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>>
>>>> The change also removed any page_cache_release() . It would seem that
>>>> the additional REF_PAGES
>>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>>> development logic, and eventually took over.
>>>>
>>>> The release-early code came first, not after. That makes sense, as
>>>> releasing early
>>>
>>> We may be talking about different things. I did not write what code
>>> came earlier and what after. Anyway, this becomes to difficult to
>>> follow. My point was that in the variant without refs, calling
>>> page_cache_release() right after to read_cache_page() was exactly what
>>> the author wanted (although it is a bug).
>>>
>>> Again, we are probably arguing about different things. And I'm a
>>> little less zealous than you to do this code archeology.
>>>
>>
>> I would really prefer that you don't use personal descriptions such as "zealous"
>> as well as making the kind of comments such as
>> "The result of the test (which I'm sure will show the bug is gone) will
>> be by far more valuable contribution than your other messages in the
>> thread.". Those are not helpful.
>>
>> In my first reply I have already stated my reasoning: your fix consists
>> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
>> dead code was written for a purpose which is now lost and forgotten.
>
> Of course. And it is not hard to guess that purpose quite reliably :).
> The code used to release pages correctly, i. e. when freeing the bnode
> structure.
>
>> And here is that purpose - early development of the code was done
>> with getting and putting pages within each routine, for safeness, and migrating
>> towards caching pages for a longer life cycle in 2001-2002. When the
>> code was cleaned
>> up and re-worked before putting into the kernel in 2005, this
>> development process
>> was repeated but with a REF_PAGES conditional and some additional code
>> which was // commented out.
>>
>> The migration to caching pages
>> would have been more or less correct, if REF_PAGES was flipped
>> and the // commented out code was enabled.  Except there was a mistake
>> in one of the conditionals
>
> By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
> a mistake under the conditions you provided above. With REF_PAGES
> flipped to 1 and the // piece uncommented you don't want to release in
> __hfs_bnode_create() because releasing is done in hfs_bnode_free().
>

Yes, that's what I am saying. I believe the development of the code happens
as follows:

- bnode_get() does get_page(), bnode_put() does put_page(), and the rest
of the routines - i.e. bnode_create() look up the pages but
immediately release them. bnode_free()
does nothing about pages.

- then it was desired to keep the pages longer without the repeated
get_pages and put_pages.
so those are "#if REF_PAGES" out. And we no longer get and put pages on demand
and should also stop releasing them after looking up at bnode_create() also.
(the bnode_create() is a poor name for what it does, but never mind). New code
is added to release at bnode_free(). The new code should have been spanned
by the opposite, i.e. "#if !REF_PAGES", but was inserted as place
holder with //.

The rest of the story I have already tried to explain a couple of
times. The switch
from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per
bnode_get/bnode_put routines,
to keeping pages around longer between bnode_create/bnode_free) did
not quite work correctly.

For the others who want to see the version of the code with those REF_PAGES
and want to weigh in on this,
I cannot find a web interface for the repo, but it is the commit below
in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

    From: Roman Zippel <zippel@linux-m68k.org>

    This driver adds full read/write support for HFS+ and is based on the
    readonly driver by Brad Broyer.

    Thanks to Ethan Benson <erbenson@alaska.net> for a number of patches to
    make the driver more compliant with the spec.

I would also repeat that earlier code and change history was at
http://sourceforge.net/projects/linux-hfsplus/, and it showed the
changes between
2001 dec to 2002 june

from get_page/put_page per bnode_get/bnode_put to removing those and
switching to a longer page life-cycle with the log message:

 "Use buffer cache instead of page cache in bnode.c. Cache inode extents."


>>, and also the commented out code never got
>> enabled. The commented out code was where a !REF_PAGES supposed to be.
>> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
>> was a mistake that has propagated.
>>
>>
>>>> and read_cache_page on each bnode_get is the slower, safer but
>>>> inefficient approach.
>>>> It would be absurd to go from caching pages, then to 'experiment' to
>>>> see if release-early + later read_cache_page
>>>> on each bnode_get 'improves'.
>>>>
>>>>>>
>>>>>> Hin-Tak
>>>>>>
>>>>>> <commit>
>>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>>>
>>>>>>     [PATCH] hfs: remove debug code
>>>>>>
>>>>>>     This removes some old debug code, which is no longer needed.
>>>>>>
>>>>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>>>
>>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>>>> index a096c5a..3d5cdc6 100644
>>>>>> --- a/fs/hfs/bnode.c
>>>>>> +++ b/fs/hfs/bnode.c
>>>>>> @@ -13,8 +13,6 @@
>>>>>>
>>>>>>  #include "btree.h"
>>>>>>
>>>>>> -#define REF_PAGES    0
>>>>>> -
>>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>>>          int off, int len)
>>>>>>  {
>>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>>> hfs_btree *tree, u32 cnid)
>>>>>>              page_cache_release(page);
>>>>>>              goto fail;
>>>>>>          }
>>>>>> -#if !REF_PAGES
>>>>>>          page_cache_release(page);
>>>>>> -#endif
>>>>>>          node->page[i] = page;
>>>>>>      }
>>>>>>
>>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>>  {
>>>>>>      if (node) {
>>>>>>          atomic_inc(&node->refcnt);
>>>>>> -#if REF_PAGES
>>>>>> -        {
>>>>>> -        int i;
>>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> -            get_page(node->page[i]);
>>>>>> -        }
>>>>>> -#endif
>>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>      }
>>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>          if (!atomic_read(&node->refcnt))
>>>>>>              BUG();
>>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>>> -#if REF_PAGES
>>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>>> -                put_page(node->page[i]);
>>>>>> -#endif
>>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>>              return;
>>>>>> -        }
>>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>>              if (!node->page[i])
>>>>>>                  continue;
>>>>>>              mark_page_accessed(node->page[i]);
>>>>>> -#if REF_PAGES
>>>>>> -            put_page(node->page[i]);
>>>>>> -#endif
>>>>>>          }
>>>>>>
>>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>> index 8868d3b..b85abc6 100644
>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>> @@ -18,8 +18,6 @@
>>>>>>  #include "hfsplus_fs.h"
>>>>>>  #include "hfsplus_raw.h"
>>>>>>
>>>>>> -#define REF_PAGES    0
>>>>>> -
>>>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>>>  {
>>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>>> hfs_btree *tree, u32 cnid)
>>>>>>              page_cache_release(page);
>>>>>>              goto fail;
>>>>>>          }
>>>>>> -#if !REF_PAGES
>>>>>>          page_cache_release(page);
>>>>>> -#endif
>>>>>>          node->page[i] = page;
>>>>>>      }
>>>>>>
>>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>>  {
>>>>>>      if (node) {
>>>>>>          atomic_inc(&node->refcnt);
>>>>>> -#if REF_PAGES
>>>>>> -        {
>>>>>> -        int i;
>>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> -            get_page(node->page[i]);
>>>>>> -        }
>>>>>> -#endif
>>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>      }
>>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>          if (!atomic_read(&node->refcnt))
>>>>>>              BUG();
>>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>>> -#if REF_PAGES
>>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>>> -                put_page(node->page[i]);
>>>>>> -#endif
>>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>>              return;
>>>>>> -        }
>>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>>              if (!node->page[i])
>>>>>>                  continue;
>>>>>>              mark_page_accessed(node->page[i]);
>>>>>> -#if REF_PAGES
>>>>>> -            put_page(node->page[i]);
>>>>>> -#endif
>>>>>>          }
>>>>>>
>>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>>> </commit>
>>>>>>
>>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>>>>> ------------------------------
>>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>>>
>>>>>>>>Hi Andrew,
>>>>>>>>
>>>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>>>
>>>>>>>>Feel free to add:
>>>>>>>>
>>>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>>>>
>>>>>>>>Thanks a lot in advance!
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>
>>>>>>>>    Anton
>>>>>>>
>>>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>>>
>>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>>>
>>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>>>
>>>>>>> Hin-Tak
>>>>>>>
>>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>>>
>>>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>>>>> ---
>>>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>>>> index 759708f..5af50fb 100644
>>>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>>>             page_cache_release(page);
>>>>>>>>>             goto fail;
>>>>>>>>>         }
>>>>>>>>> -        page_cache_release(page);
>>>>>>>>>         node->page[i] = page;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>>>
>>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>>>> {
>>>>>>>>> -#if 0
>>>>>>>>>     int i;
>>>>>>>>>
>>>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>>>         if (node->page[i])
>>>>>>>>>             page_cache_release(node->page[i]);
>>>>>>>>> -#endif
>>>>>>>>> +    }
>>>>>>>>>     kfree(node);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 2.3.0
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>--
>>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>>>Linux NTFS maintainer
>>>>>>>>
>>>>>>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18 17:16         ` Hin-Tak Leung
@ 2015-06-18 20:51           ` Sergei Antonov
  2015-06-18 22:16             ` Hin-Tak Leung
  0 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-18 20:51 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>> Hi Andrew and everybody else,
>>>>>
>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>> debug code"
>>>>> commit below did not remove the right stuff.
>>>>>
>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>> need to release on bnode_free),
>>>>> then the code was modified for efficiency so that pages are not released
>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>
>>>>> I think the release at bnode_free was commented out because the person who
>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>> other/more places that one might need to release the pages than just
>>>>> at bnode_free().
>>>>>
>>>>> Does this train of thought make sense?
>>>>
>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>> at first it was my thought too. But later I considered what the logic
>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>> !REF_PAGES was what he really meant.
>>>>
>>>
>>> You are wrong, it seems. I found even earlier versions of the code,
>>> with history, at
>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>> The earlier version of the code
>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>> )
>>> have those removed, dated 2001 december and 2002 june.
>>
>> Interesting. I only looked at historical linux git.
>>
>>> The change log says so:
>>>
>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>
>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>
>>> The change also removed any page_cache_release() . It would seem that
>>> the additional REF_PAGES
>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>> development logic, and eventually took over.
>>>
>>> The release-early code came first, not after. That makes sense, as
>>> releasing early
>>
>> We may be talking about different things. I did not write what code
>> came earlier and what after. Anyway, this becomes to difficult to
>> follow. My point was that in the variant without refs, calling
>> page_cache_release() right after to read_cache_page() was exactly what
>> the author wanted (although it is a bug).
>>
>> Again, we are probably arguing about different things. And I'm a
>> little less zealous than you to do this code archeology.
>>
>
> I would really prefer that you don't use personal descriptions such as "zealous"
> as well as making the kind of comments such as
> "The result of the test (which I'm sure will show the bug is gone) will
> be by far more valuable contribution than your other messages in the
> thread.". Those are not helpful.
>
> In my first reply I have already stated my reasoning: your fix consists
> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
> dead code was written for a purpose which is now lost and forgotten.

Of course. And it is not hard to guess that purpose quite reliably :).
The code used to release pages correctly, i. e. when freeing the bnode
structure.

> And here is that purpose - early development of the code was done
> with getting and putting pages within each routine, for safeness, and migrating
> towards caching pages for a longer life cycle in 2001-2002. When the
> code was cleaned
> up and re-worked before putting into the kernel in 2005, this
> development process
> was repeated but with a REF_PAGES conditional and some additional code
> which was // commented out.
>
> The migration to caching pages
> would have been more or less correct, if REF_PAGES was flipped
> and the // commented out code was enabled.  Except there was a mistake
> in one of the conditionals

By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
a mistake under the conditions you provided above. With REF_PAGES
flipped to 1 and the // piece uncommented you don't want to release in
__hfs_bnode_create() because releasing is done in hfs_bnode_free().

>, and also the commented out code never got
> enabled. The commented out code was where a !REF_PAGES supposed to be.
> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
> was a mistake that has propagated.
>
>
>>> and read_cache_page on each bnode_get is the slower, safer but
>>> inefficient approach.
>>> It would be absurd to go from caching pages, then to 'experiment' to
>>> see if release-early + later read_cache_page
>>> on each bnode_get 'improves'.
>>>
>>>>>
>>>>> Hin-Tak
>>>>>
>>>>> <commit>
>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>>
>>>>>     [PATCH] hfs: remove debug code
>>>>>
>>>>>     This removes some old debug code, which is no longer needed.
>>>>>
>>>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>>
>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>>> index a096c5a..3d5cdc6 100644
>>>>> --- a/fs/hfs/bnode.c
>>>>> +++ b/fs/hfs/bnode.c
>>>>> @@ -13,8 +13,6 @@
>>>>>
>>>>>  #include "btree.h"
>>>>>
>>>>> -#define REF_PAGES    0
>>>>> -
>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>>          int off, int len)
>>>>>  {
>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>> hfs_btree *tree, u32 cnid)
>>>>>              page_cache_release(page);
>>>>>              goto fail;
>>>>>          }
>>>>> -#if !REF_PAGES
>>>>>          page_cache_release(page);
>>>>> -#endif
>>>>>          node->page[i] = page;
>>>>>      }
>>>>>
>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>  {
>>>>>      if (node) {
>>>>>          atomic_inc(&node->refcnt);
>>>>> -#if REF_PAGES
>>>>> -        {
>>>>> -        int i;
>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> -            get_page(node->page[i]);
>>>>> -        }
>>>>> -#endif
>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>      }
>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>          if (!atomic_read(&node->refcnt))
>>>>>              BUG();
>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>> -#if REF_PAGES
>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>> -                put_page(node->page[i]);
>>>>> -#endif
>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>              return;
>>>>> -        }
>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>              if (!node->page[i])
>>>>>                  continue;
>>>>>              mark_page_accessed(node->page[i]);
>>>>> -#if REF_PAGES
>>>>> -            put_page(node->page[i]);
>>>>> -#endif
>>>>>          }
>>>>>
>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>> index 8868d3b..b85abc6 100644
>>>>> --- a/fs/hfsplus/bnode.c
>>>>> +++ b/fs/hfsplus/bnode.c
>>>>> @@ -18,8 +18,6 @@
>>>>>  #include "hfsplus_fs.h"
>>>>>  #include "hfsplus_raw.h"
>>>>>
>>>>> -#define REF_PAGES    0
>>>>> -
>>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>>  {
>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>> hfs_btree *tree, u32 cnid)
>>>>>              page_cache_release(page);
>>>>>              goto fail;
>>>>>          }
>>>>> -#if !REF_PAGES
>>>>>          page_cache_release(page);
>>>>> -#endif
>>>>>          node->page[i] = page;
>>>>>      }
>>>>>
>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>  {
>>>>>      if (node) {
>>>>>          atomic_inc(&node->refcnt);
>>>>> -#if REF_PAGES
>>>>> -        {
>>>>> -        int i;
>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> -            get_page(node->page[i]);
>>>>> -        }
>>>>> -#endif
>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>      }
>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>          if (!atomic_read(&node->refcnt))
>>>>>              BUG();
>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>> -#if REF_PAGES
>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>> -                put_page(node->page[i]);
>>>>> -#endif
>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>              return;
>>>>> -        }
>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>              if (!node->page[i])
>>>>>                  continue;
>>>>>              mark_page_accessed(node->page[i]);
>>>>> -#if REF_PAGES
>>>>> -            put_page(node->page[i]);
>>>>> -#endif
>>>>>          }
>>>>>
>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>> </commit>
>>>>>
>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>>>> ------------------------------
>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>>
>>>>>>>Hi Andrew,
>>>>>>>
>>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>>
>>>>>>>Feel free to add:
>>>>>>>
>>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>>>
>>>>>>>Thanks a lot in advance!
>>>>>>>
>>>>>>>Best regards,
>>>>>>>
>>>>>>>    Anton
>>>>>>
>>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>>
>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>>
>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>>
>>>>>> Hin-Tak
>>>>>>
>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>>
>>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>>>> ---
>>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>>> index 759708f..5af50fb 100644
>>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>>             page_cache_release(page);
>>>>>>>>             goto fail;
>>>>>>>>         }
>>>>>>>> -        page_cache_release(page);
>>>>>>>>         node->page[i] = page;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>>
>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>>> {
>>>>>>>> -#if 0
>>>>>>>>     int i;
>>>>>>>>
>>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>>         if (node->page[i])
>>>>>>>>             page_cache_release(node->page[i]);
>>>>>>>> -#endif
>>>>>>>> +    }
>>>>>>>>     kfree(node);
>>>>>>>> }
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.3.0
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>--
>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>>Linux NTFS maintainer
>>>>>>>
>>>>>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18 16:19       ` Sergei Antonov
@ 2015-06-18 17:16         ` Hin-Tak Leung
  2015-06-18 20:51           ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2015-06-18 17:16 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>> Hi Andrew and everybody else,
>>>>
>>>> I looked through the pre-git history and seem to have found the reason of how
>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>> debug code"
>>>> commit below did not remove the right stuff.
>>>>
>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>> need to release on bnode_free),
>>>> then the code was modified for efficiency so that pages are not released
>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>
>>>> I think the release at bnode_free was commented out because the person who
>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>> other/more places that one might need to release the pages than just
>>>> at bnode_free().
>>>>
>>>> Does this train of thought make sense?
>>>
>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>> at first it was my thought too. But later I considered what the logic
>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>> !REF_PAGES was what he really meant.
>>>
>>
>> You are wrong, it seems. I found even earlier versions of the code,
>> with history, at
>> http://sourceforge.net/projects/linux-hfsplus/ .
>> The earlier version of the code
>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>> does get_page and put_page at bnode_get and bnode_put, while the later version
>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>> )
>> have those removed, dated 2001 december and 2002 june.
>
> Interesting. I only looked at historical linux git.
>
>> The change log says so:
>>
>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>
>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>
>> The change also removed any page_cache_release() . It would seem that
>> the additional REF_PAGES
>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>> development logic, and eventually took over.
>>
>> The release-early code came first, not after. That makes sense, as
>> releasing early
>
> We may be talking about different things. I did not write what code
> came earlier and what after. Anyway, this becomes to difficult to
> follow. My point was that in the variant without refs, calling
> page_cache_release() right after to read_cache_page() was exactly what
> the author wanted (although it is a bug).
>
> Again, we are probably arguing about different things. And I'm a
> little less zealous than you to do this code archeology.
>

I would really prefer that you don't use personal descriptions such as "zealous"
as well as making the kind of comments such as
"The result of the test (which I'm sure will show the bug is gone) will
be by far more valuable contribution than your other messages in the
thread.". Those are not helpful.

In my first reply I have already stated my reasoning: your fix consists
mainly of removing an "if 0" and re-enabling seemingly dead code. So the
dead code was written for a purpose which is now lost and forgotten.

And here is that purpose - early development of the code was done
with getting and putting pages within each routine, for safeness, and migrating
towards caching pages for a longer life cycle in 2001-2002. When the
code was cleaned
up and re-worked before putting into the kernel in 2005, this
development process
was repeated but with a REF_PAGES conditional and some additional code
which was // commented out.

The migration to caching pages
would have been more or less correct, if REF_PAGES was flipped
and the // commented out code was enabled. Except there was a mistake
in one of the conditionals, and also the commented out code never got
enabled. The commented out code was where a !REF_PAGES supposed to be.
the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
was a mistake that has propagated.


>> and read_cache_page on each bnode_get is the slower, safer but
>> inefficient approach.
>> It would be absurd to go from caching pages, then to 'experiment' to
>> see if release-early + later read_cache_page
>> on each bnode_get 'improves'.
>>
>>>>
>>>> Hin-Tak
>>>>
>>>> <commit>
>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>
>>>>     [PATCH] hfs: remove debug code
>>>>
>>>>     This removes some old debug code, which is no longer needed.
>>>>
>>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>
>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>> index a096c5a..3d5cdc6 100644
>>>> --- a/fs/hfs/bnode.c
>>>> +++ b/fs/hfs/bnode.c
>>>> @@ -13,8 +13,6 @@
>>>>
>>>>  #include "btree.h"
>>>>
>>>> -#define REF_PAGES    0
>>>> -
>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>          int off, int len)
>>>>  {
>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>> hfs_btree *tree, u32 cnid)
>>>>              page_cache_release(page);
>>>>              goto fail;
>>>>          }
>>>> -#if !REF_PAGES
>>>>          page_cache_release(page);
>>>> -#endif
>>>>          node->page[i] = page;
>>>>      }
>>>>
>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>  {
>>>>      if (node) {
>>>>          atomic_inc(&node->refcnt);
>>>> -#if REF_PAGES
>>>> -        {
>>>> -        int i;
>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>> -            get_page(node->page[i]);
>>>> -        }
>>>> -#endif
>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>      }
>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>          if (!atomic_read(&node->refcnt))
>>>>              BUG();
>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>> -#if REF_PAGES
>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>> -                put_page(node->page[i]);
>>>> -#endif
>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>              return;
>>>> -        }
>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>              if (!node->page[i])
>>>>                  continue;
>>>>              mark_page_accessed(node->page[i]);
>>>> -#if REF_PAGES
>>>> -            put_page(node->page[i]);
>>>> -#endif
>>>>          }
>>>>
>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>> index 8868d3b..b85abc6 100644
>>>> --- a/fs/hfsplus/bnode.c
>>>> +++ b/fs/hfsplus/bnode.c
>>>> @@ -18,8 +18,6 @@
>>>>  #include "hfsplus_fs.h"
>>>>  #include "hfsplus_raw.h"
>>>>
>>>> -#define REF_PAGES    0
>>>> -
>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>  {
>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>> hfs_btree *tree, u32 cnid)
>>>>              page_cache_release(page);
>>>>              goto fail;
>>>>          }
>>>> -#if !REF_PAGES
>>>>          page_cache_release(page);
>>>> -#endif
>>>>          node->page[i] = page;
>>>>      }
>>>>
>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>  {
>>>>      if (node) {
>>>>          atomic_inc(&node->refcnt);
>>>> -#if REF_PAGES
>>>> -        {
>>>> -        int i;
>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>> -            get_page(node->page[i]);
>>>> -        }
>>>> -#endif
>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>      }
>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>          if (!atomic_read(&node->refcnt))
>>>>              BUG();
>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>> -#if REF_PAGES
>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>> -                put_page(node->page[i]);
>>>> -#endif
>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>              return;
>>>> -        }
>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>              if (!node->page[i])
>>>>                  continue;
>>>>              mark_page_accessed(node->page[i]);
>>>> -#if REF_PAGES
>>>> -            put_page(node->page[i]);
>>>> -#endif
>>>>          }
>>>>
>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>> </commit>
>>>>
>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>>> ------------------------------
>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>
>>>>>>Hi Andrew,
>>>>>>
>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>
>>>>>>Feel free to add:
>>>>>>
>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>>
>>>>>>Thanks a lot in advance!
>>>>>>
>>>>>>Best regards,
>>>>>>
>>>>>>    Anton
>>>>>
>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>
>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>
>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>
>>>>> Hin-Tak
>>>>>
>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>>
>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>
>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>>> ---
>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>> index 759708f..5af50fb 100644
>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>             page_cache_release(page);
>>>>>>>             goto fail;
>>>>>>>         }
>>>>>>> -        page_cache_release(page);
>>>>>>>         node->page[i] = page;
>>>>>>>     }
>>>>>>>
>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>
>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>> {
>>>>>>> -#if 0
>>>>>>>     int i;
>>>>>>>
>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>         if (node->page[i])
>>>>>>>             page_cache_release(node->page[i]);
>>>>>>> -#endif
>>>>>>> +    }
>>>>>>>     kfree(node);
>>>>>>> }
>>>>>>>
>>>>>>> --
>>>>>>> 2.3.0
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>--
>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>Linux NTFS maintainer
>>>>>>
>>>>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18 15:37     ` Hin-Tak Leung
@ 2015-06-18 16:19       ` Sergei Antonov
  2015-06-18 17:16         ` Hin-Tak Leung
  0 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-18 16:19 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> Hi Andrew and everybody else,
>>>
>>> I looked through the pre-git history and seem to have found the reason of how
>>> the current code had come to be, and why Sergei's fix seems to involve
>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>> debug code"
>>> commit below did not remove the right stuff.
>>>
>>> Looking at the old code, I think REF_PAGES may have started out being
>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>> need to release on bnode_free),
>>> then the code was modified for efficiency so that pages are not released
>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>
>>> I think the release at bnode_free was commented out because the person who
>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>> other/more places that one might need to release the pages than just
>>> at bnode_free().
>>>
>>> Does this train of thought make sense?
>>
>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>> at first it was my thought too. But later I considered what the logic
>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>> came to a conclusion that REF_PAGES=1 was a safe approach and
>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>> !REF_PAGES was what he really meant.
>>
>
> You are wrong, it seems. I found even earlier versions of the code,
> with history, at
> http://sourceforge.net/projects/linux-hfsplus/ .
> The earlier version of the code
> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
> does get_page and put_page at bnode_get and bnode_put, while the later version
> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
> )
> have those removed, dated 2001 december and 2002 june.

Interesting. I only looked at historical linux git.

> The change log says so:
>
> Date:   Thu Jun 6 09:45:14 2002 +0000
>
>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>
> The change also removed any page_cache_release() . It would seem that
> the additional REF_PAGES
> conditional was a fork/rewrite/clean-up which re-uses the earlier
> development logic, and eventually took over.
>
> The release-early code came first, not after. That makes sense, as
> releasing early

We may be talking about different things. I did not write what code
came earlier and what after. Anyway, this becomes to difficult to
follow. My point was that in the variant without refs, calling
page_cache_release() right after to read_cache_page() was exactly what
the author wanted (although it is a bug).

Again, we are probably arguing about different things. And I'm a
little less zealous than you to do this code archeology.

> and read_cache_page on each bnode_get is the slower, safer but
> inefficient approach.
> It would be absurd to go from caching pages, then to 'experiment' to
> see if release-early + later read_cache_page
> on each bnode_get 'improves'.
>
>>>
>>> Hin-Tak
>>>
>>> <commit>
>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>
>>>     [PATCH] hfs: remove debug code
>>>
>>>     This removes some old debug code, which is no longer needed.
>>>
>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>
>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>> index a096c5a..3d5cdc6 100644
>>> --- a/fs/hfs/bnode.c
>>> +++ b/fs/hfs/bnode.c
>>> @@ -13,8 +13,6 @@
>>>
>>>  #include "btree.h"
>>>
>>> -#define REF_PAGES    0
>>> -
>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>          int off, int len)
>>>  {
>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>              page_cache_release(page);
>>>              goto fail;
>>>          }
>>> -#if !REF_PAGES
>>>          page_cache_release(page);
>>> -#endif
>>>          node->page[i] = page;
>>>      }
>>>
>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>  {
>>>      if (node) {
>>>          atomic_inc(&node->refcnt);
>>> -#if REF_PAGES
>>> -        {
>>> -        int i;
>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -            get_page(node->page[i]);
>>> -        }
>>> -#endif
>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>      }
>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>          if (!atomic_read(&node->refcnt))
>>>              BUG();
>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>> -#if REF_PAGES
>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>> -                put_page(node->page[i]);
>>> -#endif
>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>              return;
>>> -        }
>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>              if (!node->page[i])
>>>                  continue;
>>>              mark_page_accessed(node->page[i]);
>>> -#if REF_PAGES
>>> -            put_page(node->page[i]);
>>> -#endif
>>>          }
>>>
>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>> index 8868d3b..b85abc6 100644
>>> --- a/fs/hfsplus/bnode.c
>>> +++ b/fs/hfsplus/bnode.c
>>> @@ -18,8 +18,6 @@
>>>  #include "hfsplus_fs.h"
>>>  #include "hfsplus_raw.h"
>>>
>>> -#define REF_PAGES    0
>>> -
>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>  {
>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>              page_cache_release(page);
>>>              goto fail;
>>>          }
>>> -#if !REF_PAGES
>>>          page_cache_release(page);
>>> -#endif
>>>          node->page[i] = page;
>>>      }
>>>
>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>  {
>>>      if (node) {
>>>          atomic_inc(&node->refcnt);
>>> -#if REF_PAGES
>>> -        {
>>> -        int i;
>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -            get_page(node->page[i]);
>>> -        }
>>> -#endif
>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>      }
>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>          if (!atomic_read(&node->refcnt))
>>>              BUG();
>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>> -#if REF_PAGES
>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>> -                put_page(node->page[i]);
>>> -#endif
>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>              return;
>>> -        }
>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>              if (!node->page[i])
>>>                  continue;
>>>              mark_page_accessed(node->page[i]);
>>> -#if REF_PAGES
>>> -            put_page(node->page[i]);
>>> -#endif
>>>          }
>>>
>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>> </commit>
>>>
>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>> ------------------------------
>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>
>>>>>Hi Andrew,
>>>>>
>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>
>>>>>Feel free to add:
>>>>>
>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>
>>>>>Thanks a lot in advance!
>>>>>
>>>>>Best regards,
>>>>>
>>>>>    Anton
>>>>
>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>
>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>
>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>> I'm looking over the pre-git history to see how that comes about.
>>>>
>>>> Hin-Tak
>>>>
>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>
>>>>>> Fix this bugreport by Sasha Levin:
>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>
>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>> ---
>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>> index 759708f..5af50fb 100644
>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>             page_cache_release(page);
>>>>>>             goto fail;
>>>>>>         }
>>>>>> -        page_cache_release(page);
>>>>>>         node->page[i] = page;
>>>>>>     }
>>>>>>
>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>
>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>> {
>>>>>> -#if 0
>>>>>>     int i;
>>>>>>
>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>         if (node->page[i])
>>>>>>             page_cache_release(node->page[i]);
>>>>>> -#endif
>>>>>> +    }
>>>>>>     kfree(node);
>>>>>> }
>>>>>>
>>>>>> --
>>>>>> 2.3.0
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>--
>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>Linux NTFS maintainer
>>>>>
>>>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18 12:58   ` Sergei Antonov
@ 2015-06-18 15:37     ` Hin-Tak Leung
  2015-06-18 16:19       ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2015-06-18 15:37 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> Hi Andrew and everybody else,
>>
>> I looked through the pre-git history and seem to have found the reason of how
>> the current code had come to be, and why Sergei's fix seems to involve
>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>> debug code"
>> commit below did not remove the right stuff.
>>
>> Looking at the old code, I think REF_PAGES may have started out being
>> 1 (i.e. pages are released right after create, then get/put when needed, no
>> need to release on bnode_free),
>> then the code was modified for efficiency so that pages are not released
>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>> Then it was flipped over, and seems to work, and things were forgotten.
>>
>> I think the release at bnode_free was commented out because the person who
>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>> other/more places that one might need to release the pages than just
>> at bnode_free().
>>
>> Does this train of thought make sense?
>
> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
> at first it was my thought too. But later I considered what the logic
> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
> came to a conclusion that REF_PAGES=1 was a safe approach and
> REF_PAGES=0 was an experimental unsafe release-early approach, so
> !REF_PAGES was what he really meant.
>

You are wrong, it seems. I found even earlier versions of the code,
with history, at
http://sourceforge.net/projects/linux-hfsplus/ .
The earlier version of the code
(http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
does get_page and put_page at bnode_get and bnode_put, while the later version
(http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
)
have those removed, dated 2001 december and 2002 june.

The change log says so:

Date:   Thu Jun 6 09:45:14 2002 +0000

    Use buffer cache instead of page cache in bnode.c. Cache inode extents.

The change also removed any page_cache_release() . It would seem that
the additional REF_PAGES
conditional was a fork/rewrite/clean-up which re-uses the earlier
development logic, and eventually took over.

The release-early code came first, not after. That makes sense, as
releasing early
and read_cache_page on each bnode_get is the slower, safer but
inefficient approach.
It would be absurd to go from caching pages, then to 'experiment' to
see if release-early + later read_cache_page
on each bnode_get 'improves'.

>>
>> Hin-Tak
>>
>> <commit>
>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>> Author: Roman Zippel <zippel@linux-m68k.org>
>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>
>>     [PATCH] hfs: remove debug code
>>
>>     This removes some old debug code, which is no longer needed.
>>
>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>
>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>> index a096c5a..3d5cdc6 100644
>> --- a/fs/hfs/bnode.c
>> +++ b/fs/hfs/bnode.c
>> @@ -13,8 +13,6 @@
>>
>>  #include "btree.h"
>>
>> -#define REF_PAGES    0
>> -
>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>          int off, int len)
>>  {
>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>> hfs_btree *tree, u32 cnid)
>>              page_cache_release(page);
>>              goto fail;
>>          }
>> -#if !REF_PAGES
>>          page_cache_release(page);
>> -#endif
>>          node->page[i] = page;
>>      }
>>
>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>  {
>>      if (node) {
>>          atomic_inc(&node->refcnt);
>> -#if REF_PAGES
>> -        {
>> -        int i;
>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>> -            get_page(node->page[i]);
>> -        }
>> -#endif
>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>      }
>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>          if (!atomic_read(&node->refcnt))
>>              BUG();
>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>> -#if REF_PAGES
>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>> -                put_page(node->page[i]);
>> -#endif
>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>              return;
>> -        }
>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>              if (!node->page[i])
>>                  continue;
>>              mark_page_accessed(node->page[i]);
>> -#if REF_PAGES
>> -            put_page(node->page[i]);
>> -#endif
>>          }
>>
>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>> index 8868d3b..b85abc6 100644
>> --- a/fs/hfsplus/bnode.c
>> +++ b/fs/hfsplus/bnode.c
>> @@ -18,8 +18,6 @@
>>  #include "hfsplus_fs.h"
>>  #include "hfsplus_raw.h"
>>
>> -#define REF_PAGES    0
>> -
>>  /* Copy a specified range of bytes from the raw data of a node */
>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>  {
>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>> hfs_btree *tree, u32 cnid)
>>              page_cache_release(page);
>>              goto fail;
>>          }
>> -#if !REF_PAGES
>>          page_cache_release(page);
>> -#endif
>>          node->page[i] = page;
>>      }
>>
>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>  {
>>      if (node) {
>>          atomic_inc(&node->refcnt);
>> -#if REF_PAGES
>> -        {
>> -        int i;
>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>> -            get_page(node->page[i]);
>> -        }
>> -#endif
>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>      }
>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>          if (!atomic_read(&node->refcnt))
>>              BUG();
>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>> -#if REF_PAGES
>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>> -                put_page(node->page[i]);
>> -#endif
>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>              return;
>> -        }
>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>              if (!node->page[i])
>>                  continue;
>>              mark_page_accessed(node->page[i]);
>> -#if REF_PAGES
>> -            put_page(node->page[i]);
>> -#endif
>>          }
>>
>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>> </commit>
>>
>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>> ------------------------------
>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>
>>>>Hi Andrew,
>>>>
>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>
>>>>Feel free to add:
>>>>
>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>
>>>>Thanks a lot in advance!
>>>>
>>>>Best regards,
>>>>
>>>>    Anton
>>>
>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>
>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>
>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>> I'm looking over the pre-git history to see how that comes about.
>>>
>>> Hin-Tak
>>>
>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>
>>>>> Fix this bugreport by Sasha Levin:
>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>
>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>> ---
>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>> index 759708f..5af50fb 100644
>>>>> --- a/fs/hfsplus/bnode.c
>>>>> +++ b/fs/hfsplus/bnode.c
>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>             page_cache_release(page);
>>>>>             goto fail;
>>>>>         }
>>>>> -        page_cache_release(page);
>>>>>         node->page[i] = page;
>>>>>     }
>>>>>
>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>
>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>> {
>>>>> -#if 0
>>>>>     int i;
>>>>>
>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>         if (node->page[i])
>>>>>             page_cache_release(node->page[i]);
>>>>> -#endif
>>>>> +    }
>>>>>     kfree(node);
>>>>> }
>>>>>
>>>>> --
>>>>> 2.3.0
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>--
>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>Linux NTFS maintainer
>>>>
>>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-17 23:26                 ` Hin-Tak Leung
@ 2015-06-18 13:09                   ` Sergei Antonov
  0 siblings, 0 replies; 26+ messages in thread
From: Sergei Antonov @ 2015-06-18 13:09 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Andrew Morton, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, Christoph Hellwig,
	Hin-Tak Leung, Sougata Santra

On 18 June 2015 at 01:26, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 14 June 2015 at 15:18, Sergei Antonov <saproj@gmail.com> wrote:
>> On 14 June 2015 at 04:27, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
>>> ...
>>>> I did not try to change the logic of the driver. Just fixed one
>>>> glaring defect. Which, by the way, in addition to the aforementioned
>>>> bug by Sasha Levin caused:
>>>> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
>>>> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
>>>> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
>>>> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342
>>>
>>> For some unknown reason majordomo won't let my regular sourceforge
>>> account/alias post
>>> to its lists (not just fsdevel, but also git), anyway, I just managed
>>> to reproduced the issue -
>>
>> With the patch or without?
>
> Without. My older computer died about 9 months ago and I have not tried
> to see the problem on my current hardware until now. The point
> I am trying to see is to first see the problem on my current hardware,
> then test the change (which I'll hope to find time to do in the next few days).
> Otherwise, it is from not-see-problem to not-see-problem.
>
> I'll write again after I've tested the change.

I'm looking forward to it.
The result of the test (which I'm sure will show the bug is gone) will
be by far more valuable contribution than your other messages in the
thread.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-18  2:51 ` Hin-Tak Leung
@ 2015-06-18 12:58   ` Sergei Antonov
  2015-06-18 15:37     ` Hin-Tak Leung
  0 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-18 12:58 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Hin-Tak Leung, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
	Sasha Levin, Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra

On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> Hi Andrew and everybody else,
>
> I looked through the pre-git history and seem to have found the reason of how
> the current code had come to be, and why Sergei's fix seems to involve
> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
> debug code"
> commit below did not remove the right stuff.
>
> Looking at the old code, I think REF_PAGES may have started out being
> 1 (i.e. pages are released right after create, then get/put when needed, no
> need to release on bnode_free),
> then the code was modified for efficiency so that pages are not released
> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
> But it was still largely working in the REF_PAGE=1 mode (and with the commented
> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
> Then it was flipped over, and seems to work, and things were forgotten.
>
> I think the release at bnode_free was commented out because the person who
> wrote it wasn't sure - I am not sure either, since it seems like there might be
> other/more places that one might need to release the pages than just
> at bnode_free().
>
> Does this train of thought make sense?

No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
at first it was my thought too. But later I considered what the logic
of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
came to a conclusion that REF_PAGES=1 was a safe approach and
REF_PAGES=0 was an experimental unsafe release-early approach, so
!REF_PAGES was what he really meant.

>
> Hin-Tak
>
> <commit>
> commit a5e3985fa014029eb6795664c704953720cc7f7d
> Author: Roman Zippel <zippel@linux-m68k.org>
> Date:   Tue Sep 6 15:18:47 2005 -0700
>
>     [PATCH] hfs: remove debug code
>
>     This removes some old debug code, which is no longer needed.
>
>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
> index a096c5a..3d5cdc6 100644
> --- a/fs/hfs/bnode.c
> +++ b/fs/hfs/bnode.c
> @@ -13,8 +13,6 @@
>
>  #include "btree.h"
>
> -#define REF_PAGES    0
> -
>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>          int off, int len)
>  {
> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
> hfs_btree *tree, u32 cnid)
>              page_cache_release(page);
>              goto fail;
>          }
> -#if !REF_PAGES
>          page_cache_release(page);
> -#endif
>          node->page[i] = page;
>      }
>
> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>  {
>      if (node) {
>          atomic_inc(&node->refcnt);
> -#if REF_PAGES
> -        {
> -        int i;
> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
> -            get_page(node->page[i]);
> -        }
> -#endif
>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>      }
> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>          if (!atomic_read(&node->refcnt))
>              BUG();
> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
> -#if REF_PAGES
> -            for (i = 0; i < tree->pages_per_bnode; i++)
> -                put_page(node->page[i]);
> -#endif
> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>              return;
> -        }
>          for (i = 0; i < tree->pages_per_bnode; i++) {
>              if (!node->page[i])
>                  continue;
>              mark_page_accessed(node->page[i]);
> -#if REF_PAGES
> -            put_page(node->page[i]);
> -#endif
>          }
>
>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
> index 8868d3b..b85abc6 100644
> --- a/fs/hfsplus/bnode.c
> +++ b/fs/hfsplus/bnode.c
> @@ -18,8 +18,6 @@
>  #include "hfsplus_fs.h"
>  #include "hfsplus_raw.h"
>
> -#define REF_PAGES    0
> -
>  /* Copy a specified range of bytes from the raw data of a node */
>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>  {
> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
> hfs_btree *tree, u32 cnid)
>              page_cache_release(page);
>              goto fail;
>          }
> -#if !REF_PAGES
>          page_cache_release(page);
> -#endif
>          node->page[i] = page;
>      }
>
> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>  {
>      if (node) {
>          atomic_inc(&node->refcnt);
> -#if REF_PAGES
> -        {
> -        int i;
> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
> -            get_page(node->page[i]);
> -        }
> -#endif
>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>      }
> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>          if (!atomic_read(&node->refcnt))
>              BUG();
> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
> -#if REF_PAGES
> -            for (i = 0; i < tree->pages_per_bnode; i++)
> -                put_page(node->page[i]);
> -#endif
> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>              return;
> -        }
>          for (i = 0; i < tree->pages_per_bnode; i++) {
>              if (!node->page[i])
>                  continue;
>              mark_page_accessed(node->page[i]);
> -#if REF_PAGES
> -            put_page(node->page[i]);
> -#endif
>          }
>
>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
> </commit>
>
> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>> ------------------------------
>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>
>>>Hi Andrew,
>>>
>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>
>>>Feel free to add:
>>>
>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>
>>>Thanks a lot in advance!
>>>
>>>Best regards,
>>>
>>>    Anton
>>
>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>
>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>
>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>> I'm looking over the pre-git history to see how that comes about.
>>
>> Hin-Tak
>>
>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>
>>>> Fix this bugreport by Sasha Levin:
>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>
>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>> ---
>>>> fs/hfsplus/bnode.c | 6 ++----
>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>> index 759708f..5af50fb 100644
>>>> --- a/fs/hfsplus/bnode.c
>>>> +++ b/fs/hfsplus/bnode.c
>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>             page_cache_release(page);
>>>>             goto fail;
>>>>         }
>>>> -        page_cache_release(page);
>>>>         node->page[i] = page;
>>>>     }
>>>>
>>>> @@ -566,13 +565,12 @@ node_error:
>>>>
>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>> {
>>>> -#if 0
>>>>     int i;
>>>>
>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>         if (node->page[i])
>>>>             page_cache_release(node->page[i]);
>>>> -#endif
>>>> +    }
>>>>     kfree(node);
>>>> }
>>>>
>>>> --
>>>> 2.3.0
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>--
>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>Linux NTFS maintainer
>>>
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
       [not found] <1434584504.1063.YahooMailBasic@web172304.mail.ir2.yahoo.com>
@ 2015-06-18  2:51 ` Hin-Tak Leung
  2015-06-18 12:58   ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2015-06-18  2:51 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Anton Altaparmakov, Andrew Morton, linux-fsdevel, Sasha Levin,
	Anton Altaparmakov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra, Sergei Antonov

Hi Andrew and everybody else,

I looked through the pre-git history and seem to have found the reason of how
the current code had come to be, and why Sergei's fix seems to involve
re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
debug code"
commit below did not remove the right stuff.

Looking at the old code, I think REF_PAGES may have started out being
1 (i.e. pages are released right after create, then get/put when needed, no
need to release on bnode_free),
then the code was modified for efficiency so that pages are not released
on bnode_create and not put/get at bnode_put/get but release at bnode_free.
But it was still largely working in the REF_PAGE=1 mode (and with the commented
out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
Then it was flipped over, and seems to work, and things were forgotten.

I think the release at bnode_free was commented out because the person who
wrote it wasn't sure - I am not sure either, since it seems like there might be
other/more places that one might need to release the pages than just
at bnode_free().

Does this train of thought make sense?

Hin-Tak

<commit>
commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

    [PATCH] hfs: remove debug code

    This removes some old debug code, which is no longer needed.

    Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index a096c5a..3d5cdc6 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -13,8 +13,6 @@

 #include "btree.h"

-#define REF_PAGES    0
-
 void hfs_bnode_read(struct hfs_bnode *node, void *buf,
         int off, int len)
 {
@@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
hfs_btree *tree, u32 cnid)
             page_cache_release(page);
             goto fail;
         }
-#if !REF_PAGES
         page_cache_release(page);
-#endif
         node->page[i] = page;
     }

@@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
 {
     if (node) {
         atomic_inc(&node->refcnt);
-#if REF_PAGES
-        {
-        int i;
-        for (i = 0; i < node->tree->pages_per_bnode; i++)
-            get_page(node->page[i]);
-        }
-#endif
         dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
     }
@@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
         if (!atomic_read(&node->refcnt))
             BUG();
-        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
-#if REF_PAGES
-            for (i = 0; i < tree->pages_per_bnode; i++)
-                put_page(node->page[i]);
-#endif
+        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
             return;
-        }
         for (i = 0; i < tree->pages_per_bnode; i++) {
             if (!node->page[i])
                 continue;
             mark_page_accessed(node->page[i]);
-#if REF_PAGES
-            put_page(node->page[i]);
-#endif
         }

         if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 8868d3b..b85abc6 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -18,8 +18,6 @@
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"

-#define REF_PAGES    0
-
 /* Copy a specified range of bytes from the raw data of a node */
 void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 {
@@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
hfs_btree *tree, u32 cnid)
             page_cache_release(page);
             goto fail;
         }
-#if !REF_PAGES
         page_cache_release(page);
-#endif
         node->page[i] = page;
     }

@@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
 {
     if (node) {
         atomic_inc(&node->refcnt);
-#if REF_PAGES
-        {
-        int i;
-        for (i = 0; i < node->tree->pages_per_bnode; i++)
-            get_page(node->page[i]);
-        }
-#endif
         dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
     }
@@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
         if (!atomic_read(&node->refcnt))
             BUG();
-        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
-#if REF_PAGES
-            for (i = 0; i < tree->pages_per_bnode; i++)
-                put_page(node->page[i]);
-#endif
+        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
             return;
-        }
         for (i = 0; i < tree->pages_per_bnode; i++) {
             if (!node->page[i])
                 continue;
             mark_page_accessed(node->page[i]);
-#if REF_PAGES
-            put_page(node->page[i]);
-#endif
         }

         if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
</commit>

On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> ------------------------------
> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>
>>Hi Andrew,
>>
>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>
>>Feel free to add:
>>
>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>
>>Thanks a lot in advance!
>>
>>Best regards,
>>
>>    Anton
>
> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>
> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>
> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
> I'm looking over the pre-git history to see how that comes about.
>
> Hin-Tak
>
>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>
>>> Fix this bugreport by Sasha Levin:
>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>
>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>> Cc: Christoph Hellwig <hch@infradead.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>> Cc: Sougata Santra <sougata@tuxera.com>
>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>> ---
>>> fs/hfsplus/bnode.c | 6 ++----
>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>> index 759708f..5af50fb 100644
>>> --- a/fs/hfsplus/bnode.c
>>> +++ b/fs/hfsplus/bnode.c
>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>             page_cache_release(page);
>>>             goto fail;
>>>         }
>>> -        page_cache_release(page);
>>>         node->page[i] = page;
>>>     }
>>>
>>> @@ -566,13 +565,12 @@ node_error:
>>>
>>> void hfs_bnode_free(struct hfs_bnode *node)
>>> {
>>> -#if 0
>>>     int i;
>>>
>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>         if (node->page[i])
>>>             page_cache_release(node->page[i]);
>>> -#endif
>>> +    }
>>>     kfree(node);
>>> }
>>>
>>> --
>>> 2.3.0
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>--
>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>Linux NTFS maintainer
>>
>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-14 14:18               ` Sergei Antonov
@ 2015-06-17 23:26                 ` Hin-Tak Leung
  2015-06-18 13:09                   ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2015-06-17 23:26 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: Andrew Morton, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, Christoph Hellwig,
	Hin-Tak Leung, Sougata Santra

On 14 June 2015 at 15:18, Sergei Antonov <saproj@gmail.com> wrote:
> On 14 June 2015 at 04:27, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
>> ...
>>> I did not try to change the logic of the driver. Just fixed one
>>> glaring defect. Which, by the way, in addition to the aforementioned
>>> bug by Sasha Levin caused:
>>> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
>>> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
>>> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
>>> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342
>>
>> For some unknown reason majordomo won't let my regular sourceforge
>> account/alias post
>> to its lists (not just fsdevel, but also git), anyway, I just managed
>> to reproduced the issue -
>
> With the patch or without?

Without. My older computer died about 9 months ago and I have not tried
to see the problem on my current hardware until now. The point
I am trying to see is to first see the problem on my current hardware,
then test the change (which I'll hope to find time to do in the next few days).
Otherwise, it is from not-see-problem to not-see-problem.

I'll write again after I've tested the change.

Hin-Tak

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-14  2:27             ` Hin-Tak Leung
@ 2015-06-14 14:18               ` Sergei Antonov
  2015-06-17 23:26                 ` Hin-Tak Leung
  0 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-14 14:18 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Andrew Morton, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, Christoph Hellwig,
	Hin-Tak Leung, Sougata Santra

On 14 June 2015 at 04:27, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
> ...
>> I did not try to change the logic of the driver. Just fixed one
>> glaring defect. Which, by the way, in addition to the aforementioned
>> bug by Sasha Levin caused:
>> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
>> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
>> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
>> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342
>
> For some unknown reason majordomo won't let my regular sourceforge
> account/alias post
> to its lists (not just fsdevel, but also git), anyway, I just managed
> to reproduced the issue -

With the patch or without?

> my older hardware has been retired so it is harder on more-generous
> memory systems.
> The steps are:
> 1. mount mac os x system disk image (I do read-only, and also from
> qemu-nbd read-only, but those
> details don't seem to be important)
> 2. run "du /mnt" on the mac os x disk mount repeatedly; I wasn't able
> to cause the kernel
> to fault on my current system - 8GB, kernel 4.0.5-300.fc22.x86_64 ,
> until I also run 'du /' (i.e. the whole linux system) in a separate window.
>
> Then I get this:
> --------------
> [17257.917645] hfsplus: recoff 22364 too large
> [17257.993329] hfsplus: recoff 22364 too large
> [17274.686714] hfsplus: recoff 27136 too large
> [17274.686747] hfsplus: recoff 27136 too large
> [17276.643246] hfsplus: recoff 28528 too large
> [17276.643283] hfsplus: recoff 28528 too large
> [17276.643334] hfsplus: recoff 31034 too large
> [17276.643432] hfsplus: recoff 31034 too large
> [17276.644568] hfsplus: recoff 31034 too large
> [17276.645408] hfsplus: recoff 31034 too large
> [17276.648083] hfsplus: recoff 31034 too large
> [17283.456405] hfsplus: recoff 18688 too large
> [17283.484236] hfsplus: recoff 18688 too large
> [17283.489637] hfsplus: recoff 18688 too large
> [17283.567689] hfsplus: recoff 18688 too large
> [17283.580719] hfsplus: recoff 18688 too large
> [17283.586192] hfsplus: recoff 18688 too large
> [17283.590115] hfsplus: recoff 18688 too large
> [17283.594808] hfsplus: recoff 18688 too large
> [17283.598225] hfsplus: recoff 18688 too large
> [17283.608131] hfsplus: recoff 18688 too large
> [17283.613090] hfsplus: recoff 18688 too large
> [17283.621220] hfsplus: recoff 18688 too large
> [17283.777970] hfsplus: recoff 18688 too large
> [17283.816374] hfsplus: recoff 18688 too large
> [17283.902398] hfsplus: recoff 18688 too large
> [17284.552987] hfsplus: recoff 16933 too large
> [17284.738526] hfsplus: recoff 25661 too large
> [17284.738557] hfsplus: recoff 25661 too large
> [17284.871063] hfsplus: recoff 14131 too large
> [17284.871992] hfsplus: recoff 14131 too large
> [17287.330129] hfsplus: recoff 14131 too large
> [17287.685383] hfsplus: recoff 14131 too large
> [17287.860485] hfsplus: recoff 14131 too large
> [17287.881720] hfsplus: recoff 14131 too large
> [17287.923793] hfsplus: recoff 14131 too large
> [17288.133657] hfsplus: recoff 14131 too large
> [17288.310771] hfsplus: recoff 14131 too large
> [17288.789545] BUG: Bad page state in process firefox  pfn:22cb45
> [17288.789559] page:ffffea0008b2d140 count:0 mapcount:0 mapping:
>    (null) index:0x2
> [17288.789563] flags: 0x5ffff800000004(referenced)
> [17288.789570] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> [17288.789573] bad because of flags:
> [17288.789576] flags: 0x4(referenced)
> [17288.789580] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
> fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
> fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
> v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
> snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
> ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
> ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
> snd_hwdep snd_seq snd_seq_device
> [17288.789660]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
> edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
> serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
> acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
> toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
> rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
> rtsx_pci mii mfd_core
> [17288.789722] CPU: 3 PID: 2844 Comm: firefox Tainted: G           O
>  4.0.5-300.fc22.x86_64 #1
> [17288.789726] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
> 1.30 06/06/2014
> [17288.789730]  0000000000000000 0000000064b601fc ffff8801fff9baf8
> ffffffff817834f4
> [17288.789736]  0000000000000000 ffffea0008b2d140 ffff8801fff9bb28
> ffffffff811a48ec
> [17288.789742]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
> ffff88023ed98e58
> [17288.789748] Call Trace:
> [17288.789762]  [<ffffffff817834f4>] dump_stack+0x45/0x57
> [17288.789771]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
> [17288.789780]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
> [17288.789787]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
> [17288.789794]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
> [17288.789802]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
> [17288.789808]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
> [17288.789817]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
> [17288.789825]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
> [17288.789832]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
> [17288.789839]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
> [17288.789846]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
> [17288.789851]  [<ffffffff81064011>] do_page_fault+0x31/0x70
> [17288.789858]  [<ffffffff8178bb08>] page_fault+0x28/0x30
> [17288.789862] Disabling lock debugging due to kernel taint
> [17288.790999] BUG: Bad page state in process firefox  pfn:22ce72
> [17288.791042] page:ffffea0008b39c80 count:0 mapcount:0 mapping:
>    (null) index:0x2
> [17288.791046] flags: 0x5ffff800000004(referenced)
> [17288.791053] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> [17288.791055] bad because of flags:
> [17288.791057] flags: 0x4(referenced)
> [17288.791062] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
> fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
> fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
> v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
> snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
> ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
> ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
> snd_hwdep snd_seq snd_seq_device
> [17288.791141]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
> edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
> serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
> acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
> toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
> rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
> rtsx_pci mii mfd_core
> [17288.791202] CPU: 3 PID: 2844 Comm: firefox Tainted: G    B      O
>  4.0.5-300.fc22.x86_64 #1
> [17288.791206] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
> 1.30 06/06/2014
> [17288.791209]  0000000000000000 0000000064b601fc ffff8801fff9baf8
> ffffffff817834f4
> [17288.791215]  0000000000000000 ffffea0008b39c80 ffff8801fff9bb28
> ffffffff811a48ec
> [17288.791221]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
> ffff88023ed98e58
> [17288.791226] Call Trace:
> [17288.791241]  [<ffffffff817834f4>] dump_stack+0x45/0x57
> [17288.791249]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
> [17288.791256]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
> [17288.791263]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
> [17288.791271]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
> [17288.791278]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
> [17288.791283]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
> [17288.791292]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
> [17288.791300]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
> [17288.791307]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
> [17288.791314]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
> [17288.791320]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
> [17288.791325]  [<ffffffff81064011>] do_page_fault+0x31/0x70
> [17288.791332]  [<ffffffff8178bb08>] page_fault+0x28/0x30
> [17288.791338] BUG: Bad page state in process firefox  pfn:22ce73
> [17288.791342] page:ffffea0008b39cc0 count:0 mapcount:0 mapping:
>    (null) index:0x2
> [17288.791344] flags: 0x5ffff800000004(referenced)
> [17288.791349] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> [17288.791351] bad because of flags:
> [17288.791354] flags: 0x4(referenced)
> [17288.791357] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
> fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
> fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
> v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
> snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
> ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
> ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
> snd_hwdep snd_seq snd_seq_device
> [17288.791419]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
> edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
> serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
> acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
> toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
> rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
> rtsx_pci mii mfd_core
> [17288.791464] CPU: 3 PID: 2844 Comm: firefox Tainted: G    B      O
>  4.0.5-300.fc22.x86_64 #1
> [17288.791467] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
> 1.30 06/06/2014
> [17288.791470]  0000000000000000 0000000064b601fc ffff8801fff9baf8
> ffffffff817834f4
> [17288.791475]  0000000000000000 ffffea0008b39cc0 ffff8801fff9bb28
> ffffffff811a48ec
> [17288.791480]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
> ffff88023ed98e58
> [17288.791486] Call Trace:
> [17288.791492]  [<ffffffff817834f4>] dump_stack+0x45/0x57
> [17288.791497]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
> [17288.791503]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
> [17288.791509]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
> [17288.791514]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
> [17288.791521]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
> [17288.791527]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
> [17288.791534]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
> [17288.791540]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
> [17288.791546]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
> [17288.791552]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
> [17288.791558]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
> [17288.791563]  [<ffffffff81064011>] do_page_fault+0x31/0x70
> [17288.791568]  [<ffffffff8178bb08>] page_fault+0x28/0x30
> [17289.351078] hfsplus: recoff 14131 too large
> [17289.900811] hfsplus: recoff 14131 too large
> [17290.059094] hfsplus: recoff 16933 too large
> [17323.760264] hfsplus: recoff 27392 too large
> [17323.761847] hfsplus: recoff 27392 too large
> [17323.762163] hfsplus: recoff 27392 too large
> [17323.762435] hfsplus: recoff 27392 too large
> [17323.762545] hfsplus: recoff 27392 too large
> [17323.762811] hfsplus: recoff 27392 too large
> [17323.763119] hfsplus: recoff 27392 too large
> [17323.763236] hfsplus: recoff 27392 too large
> [17323.763345] hfsplus: recoff 27392 too large
> [17323.763453] hfsplus: recoff 27392 too large
> [17323.763725] hfsplus: recoff 27392 too large
> [17348.390526] hfsplus: recoff 30565 too large
> [17348.390629] hfsplus: recoff 30565 too large
> [17371.062906] hfsplus: bad catalog entry type
> [17371.062944] hfsplus: bad catalog entry type
> [17387.389622] hfsplus: recoff 22364 too large
> [17387.501186] hfsplus: recoff 22364 too large
> [17394.430052] hfsplus: walked past end of dir
> [17394.430099] hfsplus: walked past end of dir
> [17395.077257] hfsplus: recoff 18688 too large
> [17395.081749] hfsplus: recoff 18688 too large
> [17395.086034] hfsplus: recoff 18688 too large
> [17395.090322] hfsplus: recoff 18688 too large
> [17395.094605] hfsplus: recoff 18688 too large
> [17395.099004] hfsplus: recoff 18688 too large
> [17395.103987] hfsplus: recoff 18688 too large
> [17395.108969] hfsplus: recoff 18688 too large
> [17395.113921] hfsplus: recoff 18688 too large
> [17395.123899] hfsplus: recoff 18688 too large
> [17395.128711] hfsplus: recoff 18688 too large
> [17395.133313] hfsplus: recoff 18688 too large
> [17395.142577] hfsplus: recoff 18688 too large
> [17395.147776] hfsplus: recoff 18688 too large
> [17395.157595] hfsplus: recoff 18688 too large
> [17399.292963] BUG: Bad page state in process qemu-nbd  pfn:2307de
> [17399.292975] page:ffffea0008c1f780 count:0 mapcount:0 mapping:
>    (null) index:0x2
> [17399.292979] flags: 0x5ffff800000004(referenced)
> [17399.292986] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> [17399.292988] bad because of flags:
> [17399.292991] flags: 0x4(referenced)
> [17399.292995] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
> fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
> fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
> v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
> snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
> ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
> ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
> snd_hwdep snd_seq snd_seq_device
> [17399.293117]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
> edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
> serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
> acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
> toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
> rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
> rtsx_pci mii mfd_core
> [17399.293181] CPU: 3 PID: 14819 Comm: qemu-nbd Tainted: G    B      O
>    4.0.5-300.fc22.x86_64 #1
> [17399.293185] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
> 1.30 06/06/2014
> [17399.293189]  0000000000000000 00000000ffffffff ffff88010a30b948
> ffffffff817834f4
> [17399.293194]  0000000000000000 ffffea0008c1f780 ffff88010a30b978
> ffffffff811a48ec
> [17399.293199]  ffffffff811c0330 ffff88010a30bb38 ffff88023ed98e38
> ffff88023ed98e58
> [17399.293205] Call Trace:
> [17399.293219]  [<ffffffff817834f4>] dump_stack+0x45/0x57
> [17399.293227]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
> [17399.293234]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
> [17399.293241]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
> [17399.293249]  [<ffffffff813965a1>] ? cfq_dispatch_requests+0xaf1/0xca0
> [17399.293256]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
> [17399.293263]  [<ffffffff810d3cfc>] ? update_curr+0x5c/0x160
> [17399.293272]  [<ffffffff811f2e21>] alloc_pages_current+0x91/0x110
> [17399.293277]  [<ffffffff8119fbdf>] __page_cache_alloc+0xaf/0xd0
> [17399.293284]  [<ffffffff811ad601>] __do_page_cache_readahead+0x101/0x250
> [17399.293291]  [<ffffffff81117a14>] ? futex_wait+0x204/0x290
> [17399.293298]  [<ffffffff811ad9c4>] ondemand_readahead+0x274/0x280
> [17399.293304]  [<ffffffff811adb1e>] page_cache_sync_readahead+0x2e/0x50
> [17399.293309]  [<ffffffff811a1574>] generic_file_read_iter+0x504/0x620
> [17399.293315]  [<ffffffff8121c92e>] new_sync_read+0x8e/0xd0
> [17399.293321]  [<ffffffff8121dc38>] __vfs_read+0x18/0x50
> [17399.293326]  [<ffffffff8121dcf7>] vfs_read+0x87/0x140
> [17399.293331]  [<ffffffff8121dfea>] SyS_pread64+0x9a/0xc0
> [17399.293339]  [<ffffffff81789b09>] system_call_fastpath+0x12/0x17
> ------------------
>
> My previous experience with is problem has a sightly different oops,
> but closer to what's in
> http://www.spinics.net/lists/linux-fsdevel/msg63807.html
> involving hfsplus_brec_find and hfsplus_bnode_read .
>
> I remember there were other simular reports concerning updatedb,but I
> could only find
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
> No doubt there are more.
>
> I agree that Sergei needs to explain the problem clearer though.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 23:40           ` Sergei Antonov
@ 2015-06-14  2:27             ` Hin-Tak Leung
  2015-06-14 14:18               ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Hin-Tak Leung @ 2015-06-14  2:27 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: Andrew Morton, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, Christoph Hellwig,
	Hin-Tak Leung, Sougata Santra

On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
...
> I did not try to change the logic of the driver. Just fixed one
> glaring defect. Which, by the way, in addition to the aforementioned
> bug by Sasha Levin caused:
> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342

For some unknown reason majordomo won't let my regular sourceforge
account/alias post
to its lists (not just fsdevel, but also git), anyway, I just managed
to reproduced the issue -
my older hardware has been retired so it is harder on more-generous
memory systems.
The steps are:
1. mount mac os x system disk image (I do read-only, and also from
qemu-nbd read-only, but those
details don't seem to be important)
2. run "du /mnt" on the mac os x disk mount repeatedly; I wasn't able
to cause the kernel
to fault on my current system - 8GB, kernel 4.0.5-300.fc22.x86_64 ,
until I also run 'du /' (i.e. the whole linux system) in a separate window.

Then I get this:
--------------
[17257.917645] hfsplus: recoff 22364 too large
[17257.993329] hfsplus: recoff 22364 too large
[17274.686714] hfsplus: recoff 27136 too large
[17274.686747] hfsplus: recoff 27136 too large
[17276.643246] hfsplus: recoff 28528 too large
[17276.643283] hfsplus: recoff 28528 too large
[17276.643334] hfsplus: recoff 31034 too large
[17276.643432] hfsplus: recoff 31034 too large
[17276.644568] hfsplus: recoff 31034 too large
[17276.645408] hfsplus: recoff 31034 too large
[17276.648083] hfsplus: recoff 31034 too large
[17283.456405] hfsplus: recoff 18688 too large
[17283.484236] hfsplus: recoff 18688 too large
[17283.489637] hfsplus: recoff 18688 too large
[17283.567689] hfsplus: recoff 18688 too large
[17283.580719] hfsplus: recoff 18688 too large
[17283.586192] hfsplus: recoff 18688 too large
[17283.590115] hfsplus: recoff 18688 too large
[17283.594808] hfsplus: recoff 18688 too large
[17283.598225] hfsplus: recoff 18688 too large
[17283.608131] hfsplus: recoff 18688 too large
[17283.613090] hfsplus: recoff 18688 too large
[17283.621220] hfsplus: recoff 18688 too large
[17283.777970] hfsplus: recoff 18688 too large
[17283.816374] hfsplus: recoff 18688 too large
[17283.902398] hfsplus: recoff 18688 too large
[17284.552987] hfsplus: recoff 16933 too large
[17284.738526] hfsplus: recoff 25661 too large
[17284.738557] hfsplus: recoff 25661 too large
[17284.871063] hfsplus: recoff 14131 too large
[17284.871992] hfsplus: recoff 14131 too large
[17287.330129] hfsplus: recoff 14131 too large
[17287.685383] hfsplus: recoff 14131 too large
[17287.860485] hfsplus: recoff 14131 too large
[17287.881720] hfsplus: recoff 14131 too large
[17287.923793] hfsplus: recoff 14131 too large
[17288.133657] hfsplus: recoff 14131 too large
[17288.310771] hfsplus: recoff 14131 too large
[17288.789545] BUG: Bad page state in process firefox  pfn:22cb45
[17288.789559] page:ffffea0008b2d140 count:0 mapcount:0 mapping:
   (null) index:0x2
[17288.789563] flags: 0x5ffff800000004(referenced)
[17288.789570] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17288.789573] bad because of flags:
[17288.789576] flags: 0x4(referenced)
[17288.789580] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17288.789660]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17288.789722] CPU: 3 PID: 2844 Comm: firefox Tainted: G           O
 4.0.5-300.fc22.x86_64 #1
[17288.789726] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17288.789730]  0000000000000000 0000000064b601fc ffff8801fff9baf8
ffffffff817834f4
[17288.789736]  0000000000000000 ffffea0008b2d140 ffff8801fff9bb28
ffffffff811a48ec
[17288.789742]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
ffff88023ed98e58
[17288.789748] Call Trace:
[17288.789762]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17288.789771]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17288.789780]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17288.789787]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17288.789794]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
[17288.789802]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17288.789808]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
[17288.789817]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
[17288.789825]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
[17288.789832]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
[17288.789839]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
[17288.789846]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
[17288.789851]  [<ffffffff81064011>] do_page_fault+0x31/0x70
[17288.789858]  [<ffffffff8178bb08>] page_fault+0x28/0x30
[17288.789862] Disabling lock debugging due to kernel taint
[17288.790999] BUG: Bad page state in process firefox  pfn:22ce72
[17288.791042] page:ffffea0008b39c80 count:0 mapcount:0 mapping:
   (null) index:0x2
[17288.791046] flags: 0x5ffff800000004(referenced)
[17288.791053] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17288.791055] bad because of flags:
[17288.791057] flags: 0x4(referenced)
[17288.791062] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17288.791141]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17288.791202] CPU: 3 PID: 2844 Comm: firefox Tainted: G    B      O
 4.0.5-300.fc22.x86_64 #1
[17288.791206] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17288.791209]  0000000000000000 0000000064b601fc ffff8801fff9baf8
ffffffff817834f4
[17288.791215]  0000000000000000 ffffea0008b39c80 ffff8801fff9bb28
ffffffff811a48ec
[17288.791221]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
ffff88023ed98e58
[17288.791226] Call Trace:
[17288.791241]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17288.791249]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17288.791256]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17288.791263]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17288.791271]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
[17288.791278]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17288.791283]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
[17288.791292]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
[17288.791300]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
[17288.791307]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
[17288.791314]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
[17288.791320]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
[17288.791325]  [<ffffffff81064011>] do_page_fault+0x31/0x70
[17288.791332]  [<ffffffff8178bb08>] page_fault+0x28/0x30
[17288.791338] BUG: Bad page state in process firefox  pfn:22ce73
[17288.791342] page:ffffea0008b39cc0 count:0 mapcount:0 mapping:
   (null) index:0x2
[17288.791344] flags: 0x5ffff800000004(referenced)
[17288.791349] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17288.791351] bad because of flags:
[17288.791354] flags: 0x4(referenced)
[17288.791357] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17288.791419]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17288.791464] CPU: 3 PID: 2844 Comm: firefox Tainted: G    B      O
 4.0.5-300.fc22.x86_64 #1
[17288.791467] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17288.791470]  0000000000000000 0000000064b601fc ffff8801fff9baf8
ffffffff817834f4
[17288.791475]  0000000000000000 ffffea0008b39cc0 ffff8801fff9bb28
ffffffff811a48ec
[17288.791480]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
ffff88023ed98e58
[17288.791486] Call Trace:
[17288.791492]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17288.791497]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17288.791503]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17288.791509]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17288.791514]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
[17288.791521]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17288.791527]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
[17288.791534]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
[17288.791540]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
[17288.791546]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
[17288.791552]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
[17288.791558]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
[17288.791563]  [<ffffffff81064011>] do_page_fault+0x31/0x70
[17288.791568]  [<ffffffff8178bb08>] page_fault+0x28/0x30
[17289.351078] hfsplus: recoff 14131 too large
[17289.900811] hfsplus: recoff 14131 too large
[17290.059094] hfsplus: recoff 16933 too large
[17323.760264] hfsplus: recoff 27392 too large
[17323.761847] hfsplus: recoff 27392 too large
[17323.762163] hfsplus: recoff 27392 too large
[17323.762435] hfsplus: recoff 27392 too large
[17323.762545] hfsplus: recoff 27392 too large
[17323.762811] hfsplus: recoff 27392 too large
[17323.763119] hfsplus: recoff 27392 too large
[17323.763236] hfsplus: recoff 27392 too large
[17323.763345] hfsplus: recoff 27392 too large
[17323.763453] hfsplus: recoff 27392 too large
[17323.763725] hfsplus: recoff 27392 too large
[17348.390526] hfsplus: recoff 30565 too large
[17348.390629] hfsplus: recoff 30565 too large
[17371.062906] hfsplus: bad catalog entry type
[17371.062944] hfsplus: bad catalog entry type
[17387.389622] hfsplus: recoff 22364 too large
[17387.501186] hfsplus: recoff 22364 too large
[17394.430052] hfsplus: walked past end of dir
[17394.430099] hfsplus: walked past end of dir
[17395.077257] hfsplus: recoff 18688 too large
[17395.081749] hfsplus: recoff 18688 too large
[17395.086034] hfsplus: recoff 18688 too large
[17395.090322] hfsplus: recoff 18688 too large
[17395.094605] hfsplus: recoff 18688 too large
[17395.099004] hfsplus: recoff 18688 too large
[17395.103987] hfsplus: recoff 18688 too large
[17395.108969] hfsplus: recoff 18688 too large
[17395.113921] hfsplus: recoff 18688 too large
[17395.123899] hfsplus: recoff 18688 too large
[17395.128711] hfsplus: recoff 18688 too large
[17395.133313] hfsplus: recoff 18688 too large
[17395.142577] hfsplus: recoff 18688 too large
[17395.147776] hfsplus: recoff 18688 too large
[17395.157595] hfsplus: recoff 18688 too large
[17399.292963] BUG: Bad page state in process qemu-nbd  pfn:2307de
[17399.292975] page:ffffea0008c1f780 count:0 mapcount:0 mapping:
   (null) index:0x2
[17399.292979] flags: 0x5ffff800000004(referenced)
[17399.292986] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17399.292988] bad because of flags:
[17399.292991] flags: 0x4(referenced)
[17399.292995] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17399.293117]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17399.293181] CPU: 3 PID: 14819 Comm: qemu-nbd Tainted: G    B      O
   4.0.5-300.fc22.x86_64 #1
[17399.293185] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17399.293189]  0000000000000000 00000000ffffffff ffff88010a30b948
ffffffff817834f4
[17399.293194]  0000000000000000 ffffea0008c1f780 ffff88010a30b978
ffffffff811a48ec
[17399.293199]  ffffffff811c0330 ffff88010a30bb38 ffff88023ed98e38
ffff88023ed98e58
[17399.293205] Call Trace:
[17399.293219]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17399.293227]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17399.293234]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17399.293241]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17399.293249]  [<ffffffff813965a1>] ? cfq_dispatch_requests+0xaf1/0xca0
[17399.293256]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17399.293263]  [<ffffffff810d3cfc>] ? update_curr+0x5c/0x160
[17399.293272]  [<ffffffff811f2e21>] alloc_pages_current+0x91/0x110
[17399.293277]  [<ffffffff8119fbdf>] __page_cache_alloc+0xaf/0xd0
[17399.293284]  [<ffffffff811ad601>] __do_page_cache_readahead+0x101/0x250
[17399.293291]  [<ffffffff81117a14>] ? futex_wait+0x204/0x290
[17399.293298]  [<ffffffff811ad9c4>] ondemand_readahead+0x274/0x280
[17399.293304]  [<ffffffff811adb1e>] page_cache_sync_readahead+0x2e/0x50
[17399.293309]  [<ffffffff811a1574>] generic_file_read_iter+0x504/0x620
[17399.293315]  [<ffffffff8121c92e>] new_sync_read+0x8e/0xd0
[17399.293321]  [<ffffffff8121dc38>] __vfs_read+0x18/0x50
[17399.293326]  [<ffffffff8121dcf7>] vfs_read+0x87/0x140
[17399.293331]  [<ffffffff8121dfea>] SyS_pread64+0x9a/0xc0
[17399.293339]  [<ffffffff81789b09>] system_call_fastpath+0x12/0x17
------------------

My previous experience with is problem has a sightly different oops,
but closer to what's in
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
involving hfsplus_brec_find and hfsplus_bnode_read .

I remember there were other simular reports concerning updatedb,but I
could only find
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
No doubt there are more.

I agree that Sergei needs to explain the problem clearer though.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 22:15         ` Andrew Morton
  2015-06-09 23:00           ` Anton Altaparmakov
  2015-06-09 23:08           ` Anton Altaparmakov
@ 2015-06-09 23:40           ` Sergei Antonov
  2015-06-14  2:27             ` Hin-Tak Leung
  2 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-09 23:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Viacheslav Dubeyko, Anton Altaparmakov, linux-fsdevel,
	Sasha Levin, Al Viro, Christoph Hellwig, Hin-Tak Leung,
	Sougata Santra

On 10 June 2015 at 00:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@gmail.com> wrote:
>
>> >> You are basically saying you don___t understand it. Too bad, because the
>> >> bug is very simple. It is the ___use after free___ type of bug, and it can
>> >> be illustrated by this:
>> >> (1) void *ptr = malloc(___);
>> >> (2) free(ptr);
>> >> (3) memcpy(___, ptr, 1);
>> >> Guess which two of these three lines are executed in wrong order.
>> >>
>> >> My patch is about the same type of bug, but with memory pages mapping.
>> >> The driver currently accesses pages that may be unavailable, or
>> >> contain different data. The problem is more likely to occur when
>> >> memory is a limited resource. I reproduced it while running a
>> >> memory-hungry program.
>> >
>> > I worried not about myself but about potential readers of description of
>> > the fix. The description is completely obscure. And it needs to describe
>> > the fix in clear and descriptive manner. This is my request. Please,
>> > describe the fix in a clear way.
>>
>> The description is just right.
>
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.

By calling page_cache_release() when it is OK to.

> The code is distressingly undocumented and has been that way since
> Roman Zippel's original commit in 2004.

I looked into it before submitting the patch. The code submitted in
2004 was already broken.

> From the looks of it, that loop in __hfs_bnode_create() is simply doing
> readahead and is designed as a performance optimisation.  The pages are
> pulled into pagecache in the expectation that they will soon be
> accessed.  What your patch does is to instead pin the pages in
> pagecache until the bnode is freed.  If we're going to do this then we
> need to be very careful about worst-case scenarios: we could even run
> the machine out of memory.

I did not try to change the logic of the driver. Just fixed one
glaring defect. Which, by the way, in addition to the aforementioned
bug by Sasha Levin caused:
1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
4. https://bugzilla.kernel.org/show_bug.cgi?id=42342

> If I'm correct, and this is just readahead then the bug lies elsewhere:

I pinpointed the bug so well. My test code dumped the content of the
page at the moment corrupted data was detected. Then I looked at the
dumped data, and - guess what - data from the memory-hungry program I
was using sneaked in there! So I'm certain the cause of the bug is
indeed the wrong sequence of
read_mapping_page/kmap/kunmap/page_cache_release.

> if other parts of hfsplus are assuming that this memory is in pagecache
> then that's an error - that code (where is it?) should instead be performing
> a pagecache lookup and if the page isn't present, read it from disk
> again.
>
> But for others to be able to review and understand this change and
> suggest alternatives, we'll need a much much better changelog!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 23:16             ` Andrew Morton
@ 2015-06-09 23:34               ` Anton Altaparmakov
  0 siblings, 0 replies; 26+ messages in thread
From: Anton Altaparmakov @ 2015-06-09 23:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergei Antonov, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, hch, Hin-Tak Leung,
	Sougata Santra

Hi,

> On 10 Jun 2015, at 02:16, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 9 Jun 2015 23:08:48 +0000 Anton Altaparmakov <anton@tuxera.com> wrote:
>> Hi Andrew,
>> 
>> Forgot to reply to one point you made:
>> 
>>> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Yes, I too would like to hear much more about your thinking on this,
>>> and a detailed description of the bug and how the patch fixes it.
>> 
>> Perhaps the patch description is lacking but here is what the current code does:
>> 
>> struct page *page = read_mapping_page();
>> page_cache_release(page);
>> u8 *kaddr = kmap(page);
>> memcpy(..., kaddr, ...);
>> kunmap(page);
>> 
>> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
>> 
>> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
>> 
>> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
>> 
>> And his patch changes the above to this instead:
>> 
>> struct page *page = read_mapping_page();
>> u8 *kaddr = kmap(page);
>> memcpy(..., kaddr, ...);
>> kunmap(page);
>> page_cache_release(page);
>> 
>> Which is the correct sequence of events.
> 
> OK, pinning 8 pages for the duration of hfs_bnode_find() sounds
> reasonable.
> 
> This is a painful way to write a changelog :(

I will grant you that Sergei's change log was a bit brief.  I had to wade through the code to ensure I knew what he was talking about which the changelog should have spared me from doing.

Sergei, perhaps your take home message is that more verbose changelogs would be a good idea because even if something is obvious to you because you have studied and worked on the code it does not mean it is obvious to anyone else who likely has never seen the HFS+ code except in passing.  (-;

/offtopic alert: This reminds me of the maths professor who wrote a long and very complicated proof of some difficult problem on the blackboard and finished with "... and thus this obviously concludes the proof." and an incredulous student asked him "Excuse me professor but is it obvious?".  The professor left the room came back some time later and simply answered "Yes it is" without further explanation.  (-;

>> Although perhaps there should also be a mark_page_accessed(page);
>> thrown in there, too, before the page_cache_release() in the
>> expectation that the B tree node is likely to be used again?
> 
> Probably.
> 
> Also, using read_mapping_page() is quite inefficient: it's a
> synchronous read.  Putting a single call to read_cache_pages() before
> the loop would be sufficient to get all that IO done in a single lump.

That is very true.  IIRC the code came before the advent of the ->readpages address space operation...  But yes it needs modernising...

> But first we fix the bug.

I am glad we agree on that point.  (-:

Best regards,

	Anton
-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 23:08           ` Anton Altaparmakov
  2015-06-09 23:16             ` Andrew Morton
@ 2015-06-09 23:23             ` Anton Altaparmakov
  1 sibling, 0 replies; 26+ messages in thread
From: Anton Altaparmakov @ 2015-06-09 23:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergei Antonov, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, hch, Hin-Tak Leung,
	Sougata Santra

btw. XFS does something very similar except they actually go and vm_map_ram() the pages into a contiguous virtual memory region so they can just do normal accesses into the linear buffer instead of having to mess about with the fact the node is spread across multiple pages like HFS+ is doing and they use submit_bio() directly instead of read_mapping_page() but the idea is much the same...  See fs/xfs/xfs_buf.c...

Best regards,

	Anton

> On 10 Jun 2015, at 02:08, Anton Altaparmakov <anton@tuxera.com> wrote:
> 
> Hi Andrew,
> 
> Forgot to reply to one point you made:
> 
>> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
>> Yes, I too would like to hear much more about your thinking on this,
>> and a detailed description of the bug and how the patch fixes it.
> 
> Perhaps the patch description is lacking but here is what the current code does:
> 
> struct page *page = read_mapping_page();
> page_cache_release(page);
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> 
> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
> 
> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
> 
> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
> 
> And his patch changes the above to this instead:
> 
> struct page *page = read_mapping_page();
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> page_cache_release(page);
> 
> Which is the correct sequence of events.
> 
> Although perhaps there should also be a mark_page_accessed(page); thrown in there, too, before the page_cache_release() in the expectation that the B tree node is likely to be used again?
> 
> Best regards,
> 
> 	Anton
-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 23:08           ` Anton Altaparmakov
@ 2015-06-09 23:16             ` Andrew Morton
  2015-06-09 23:34               ` Anton Altaparmakov
  2015-06-09 23:23             ` Anton Altaparmakov
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2015-06-09 23:16 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Sergei Antonov, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, hch, Hin-Tak Leung,
	Sougata Santra

On Tue, 9 Jun 2015 23:08:48 +0000 Anton Altaparmakov <anton@tuxera.com> wrote:

> Hi Andrew,
> 
> Forgot to reply to one point you made:
> 
> > On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Yes, I too would like to hear much more about your thinking on this,
> > and a detailed description of the bug and how the patch fixes it.
> 
> Perhaps the patch description is lacking but here is what the current code does:
> 
> struct page *page = read_mapping_page();
> page_cache_release(page);
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> 
> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
> 
> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
> 
> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
> 
> And his patch changes the above to this instead:
> 
> struct page *page = read_mapping_page();
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> page_cache_release(page);
> 
> Which is the correct sequence of events.

OK, pinning 8 pages for the duration of hfs_bnode_find() sounds
reasonable.

This is a painful way to write a changelog :(

> Although perhaps there should also be a mark_page_accessed(page);
> thrown in there, too, before the page_cache_release() in the
> expectation that the B tree node is likely to be used again?

Probably.

Also, using read_mapping_page() is quite inefficient: it's a
synchronous read.  Putting a single call to read_cache_pages() before
the loop would be sufficient to get all that IO done in a single lump. 

But first we fix the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 22:15         ` Andrew Morton
  2015-06-09 23:00           ` Anton Altaparmakov
@ 2015-06-09 23:08           ` Anton Altaparmakov
  2015-06-09 23:16             ` Andrew Morton
  2015-06-09 23:23             ` Anton Altaparmakov
  2015-06-09 23:40           ` Sergei Antonov
  2 siblings, 2 replies; 26+ messages in thread
From: Anton Altaparmakov @ 2015-06-09 23:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergei Antonov, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, hch, Hin-Tak Leung,
	Sougata Santra

Hi Andrew,

Forgot to reply to one point you made:

> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.

Perhaps the patch description is lacking but here is what the current code does:

struct page *page = read_mapping_page();
page_cache_release(page);
u8 *kaddr = kmap(page);
memcpy(..., kaddr, ...);
kunmap(page);

Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().

The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.

Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.

And his patch changes the above to this instead:

struct page *page = read_mapping_page();
u8 *kaddr = kmap(page);
memcpy(..., kaddr, ...);
kunmap(page);
page_cache_release(page);

Which is the correct sequence of events.

Although perhaps there should also be a mark_page_accessed(page); thrown in there, too, before the page_cache_release() in the expectation that the B tree node is likely to be used again?

Best regards,

	Anton
-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-09 22:15         ` Andrew Morton
@ 2015-06-09 23:00           ` Anton Altaparmakov
  2015-06-09 23:08           ` Anton Altaparmakov
  2015-06-09 23:40           ` Sergei Antonov
  2 siblings, 0 replies; 26+ messages in thread
From: Anton Altaparmakov @ 2015-06-09 23:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergei Antonov, Viacheslav Dubeyko, Anton Altaparmakov,
	linux-fsdevel, Sasha Levin, Al Viro, hch, Hin-Tak Leung,
	Sougata Santra

Hi Andrew,

> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@gmail.com> wrote:
>>>> You are basically saying you don___t understand it. Too bad, because the
>>>> bug is very simple. It is the ___use after free___ type of bug, and it can
>>>> be illustrated by this:
>>>> (1) void *ptr = malloc(___);
>>>> (2) free(ptr);
>>>> (3) memcpy(___, ptr, 1);
>>>> Guess which two of these three lines are executed in wrong order.
>>>> 
>>>> My patch is about the same type of bug, but with memory pages mapping.
>>>> The driver currently accesses pages that may be unavailable, or
>>>> contain different data. The problem is more likely to occur when
>>>> memory is a limited resource. I reproduced it while running a
>>>> memory-hungry program.
>>> 
>>> I worried not about myself but about potential readers of description of
>>> the fix. The description is completely obscure. And it needs to describe
>>> the fix in clear and descriptive manner. This is my request. Please,
>>> describe the fix in a clear way.
>> 
>> The description is just right.
> 
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.
> 
> 
> The code is distressingly undocumented and has been that way since
> Roman Zippel's original commit in 2004.
> 
> From the looks of it, that loop in __hfs_bnode_create() is simply doing
> readahead and is designed as a performance optimisation.  The pages are
> pulled into pagecache in the expectation that they will soon be
> accessed.  What your patch does is to instead pin the pages in
> pagecache until the bnode is freed.  If we're going to do this then we
> need to be very careful about worst-case scenarios: we could even run
> the machine out of memory.
> 
> If I'm correct, and this is just readahead then the bug lies elsewhere:
> if other parts of hfsplus are assuming that this memory is in pagecache
> then that's an error - that code (where is it?) should instead be performing
> a pagecache lookup and if the page isn't present, read it from disk
> again.
> 
> But for others to be able to review and understand this change and
> suggest alternatives, we'll need a much much better changelog!

Given I had just looked at the code...  __hfs_bnode_create() is not doing read-ahead at all from my reading of the code.  What it does is to gather the needed pages that are immediately processed, i.e.

hfs_bnode_find() does:

- call __hfs_bnode_create() which gathers the pages into the array of pages @node->page

- kmap() some page from @node->page array, read some information from the kmapped page then kunmap it again (this is actually insane - either all the pages should have been kmapped in __hfs_bnode_create or it should be using kmap_atomic!).

Here is the actual code:

        desc = (struct hfs_bnode_desc *)(kmap(node->page[0]) +
                        node->page_offset);
        node->prev = be32_to_cpu(desc->prev);
        node->next = be32_to_cpu(desc->next);
        node->num_recs = be16_to_cpu(desc->num_recs);
        node->type = desc->type;
        node->height = desc->height;
        kunmap(node->page[0]);

That actually makes some sense (though kmap_atomic would still be much better!) but then this follows:

	off = hfs_bnode_read_u16(node, rec_off);

and (omitting lines for clarity):

        for (i = 1; i <= node->num_recs; off = next_off, i++) {
                ...
                next_off = hfs_bnode_read_u16(node, rec_off);
                ...
                key_size = hfs_bnode_read_u16(node, off) + 2;
                ...
        }

Where all those hfs_bnode_read_u16() are simply "kmap, copy u16 from kmapped page, kunmap" so again crazy thing to be doing - either kmap_atomic or probably actually better just kmap all the pages to start with in @node->page array at the time of reading them all in...

If you consider __hfs_bnode_create to be doing readahead then all those calls to hfs_bnode_read_u16() would have to do a read_mapping_page() and incur the overhead of a radix tree lookup for each u16 read.  That would not be very good for performance/cpu usage.

Also note there is no danger of running out of RAM as the largest allowed B-tree node size in HFS+ according to Apple's own specification is 32kiB, i.e. on 4k page size only 8 pages maximum - readpages functions deal with significantly more pages at a time without worrying about running out of RAM.

But yes I agree with you that HFS+ is an undocumented mess but at least Sergei is putting some effort to make it better!

Best regards,

	Anton
-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-08 16:50       ` Sergei Antonov
@ 2015-06-09 22:15         ` Andrew Morton
  2015-06-09 23:00           ` Anton Altaparmakov
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Andrew Morton @ 2015-06-09 22:15 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: Viacheslav Dubeyko, Anton Altaparmakov, linux-fsdevel,
	Sasha Levin, Al Viro, Christoph Hellwig, Hin-Tak Leung,
	Sougata Santra

On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@gmail.com> wrote:

> >> You are basically saying you don___t understand it. Too bad, because the
> >> bug is very simple. It is the ___use after free___ type of bug, and it can
> >> be illustrated by this:
> >> (1) void *ptr = malloc(___);
> >> (2) free(ptr);
> >> (3) memcpy(___, ptr, 1);
> >> Guess which two of these three lines are executed in wrong order.
> >>
> >> My patch is about the same type of bug, but with memory pages mapping.
> >> The driver currently accesses pages that may be unavailable, or
> >> contain different data. The problem is more likely to occur when
> >> memory is a limited resource. I reproduced it while running a
> >> memory-hungry program.
> >
> > I worried not about myself but about potential readers of description of
> > the fix. The description is completely obscure. And it needs to describe
> > the fix in clear and descriptive manner. This is my request. Please,
> > describe the fix in a clear way.
> 
> The description is just right.

Yes, I too would like to hear much more about your thinking on this,
and a detailed description of the bug and how the patch fixes it.


The code is distressingly undocumented and has been that way since
Roman Zippel's original commit in 2004.

>From the looks of it, that loop in __hfs_bnode_create() is simply doing
readahead and is designed as a performance optimisation.  The pages are
pulled into pagecache in the expectation that they will soon be
accessed.  What your patch does is to instead pin the pages in
pagecache until the bnode is freed.  If we're going to do this then we
need to be very careful about worst-case scenarios: we could even run
the machine out of memory.

If I'm correct, and this is just readahead then the bug lies elsewhere:
if other parts of hfsplus are assuming that this memory is in pagecache
then that's an error - that code (where is it?) should instead be performing
a pagecache lookup and if the page isn't present, read it from disk
again.

But for others to be able to review and understand this change and
suggest alternatives, we'll need a much much better changelog!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-07  0:42 Sergei Antonov
  2015-06-08 15:45 ` Vyacheslav Dubeyko
@ 2015-06-09 18:06 ` Anton Altaparmakov
  1 sibling, 0 replies; 26+ messages in thread
From: Anton Altaparmakov @ 2015-06-09 18:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel, Sasha Levin, Anton Altaparmakov, Al Viro, hch,
	Vyacheslav Dubeyko, Hin-Tak Leung, Sougata Santra,
	Sergei Antonov

Hi Andrew,

Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.

Feel free to add:

Reviewed-by: Anton Altaparmakov <anton@tuxera.com>

Thanks a lot in advance!

Best regards,

	Anton

> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
> 
> Fix this bugreport by Sasha Levin:
> http://lkml.org/lkml/2015/2/20/85 ("use after free")
> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
> Cc: Sougata Santra <sougata@tuxera.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Sergei Antonov <saproj@gmail.com>
> ---
> fs/hfsplus/bnode.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
> index 759708f..5af50fb 100644
> --- a/fs/hfsplus/bnode.c
> +++ b/fs/hfsplus/bnode.c
> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
> 			page_cache_release(page);
> 			goto fail;
> 		}
> -		page_cache_release(page);
> 		node->page[i] = page;
> 	}
> 
> @@ -566,13 +565,12 @@ node_error:
> 
> void hfs_bnode_free(struct hfs_bnode *node)
> {
> -#if 0
> 	int i;
> 
> -	for (i = 0; i < node->tree->pages_per_bnode; i++)
> +	for (i = 0; i < node->tree->pages_per_bnode; i++) {
> 		if (node->page[i])
> 			page_cache_release(node->page[i]);
> -#endif
> +	}
> 	kfree(node);
> }
> 
> -- 
> 2.3.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-08 16:45     ` Viacheslav Dubeyko
@ 2015-06-08 16:50       ` Sergei Antonov
  2015-06-09 22:15         ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-08 16:50 UTC (permalink / raw)
  To: Viacheslav Dubeyko, Anton Altaparmakov
  Cc: linux-fsdevel, Sasha Levin, Al Viro, Christoph Hellwig,
	Andrew Morton, Hin-Tak Leung, Sougata Santra

On 8 June 2015 at 18:45, Viacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Mon, 2015-06-08 at 18:32 +0200, Sergei Antonov wrote:
>> On 8 June 2015 at 17:45, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>> > On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
>> >> Fix this bugreport by Sasha Levin:
>> >> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>> >> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>> >>
>> >
>> > Sorry, I missed the point. What do you try to fix? How this change fixes
>> > the issue?
>> >
>> > I think that maybe this fix makes sense. But it needs to describe it
>> > more deeply. Could you describe the fix with more details?
>>
>> You are basically saying you don’t understand it. Too bad, because the
>> bug is very simple. It is the „use after free“ type of bug, and it can
>> be illustrated by this:
>> (1) void *ptr = malloc(…);
>> (2) free(ptr);
>> (3) memcpy(…, ptr, 1);
>> Guess which two of these three lines are executed in wrong order.
>>
>> My patch is about the same type of bug, but with memory pages mapping.
>> The driver currently accesses pages that may be unavailable, or
>> contain different data. The problem is more likely to occur when
>> memory is a limited resource. I reproduced it while running a
>> memory-hungry program.
>
> I worried not about myself but about potential readers of description of
> the fix. The description is completely obscure. And it needs to describe
> the fix in clear and descriptive manner. This is my request. Please,
> describe the fix in a clear way.

The description is just right.
Anton, can you give your opinion? You commented my patches before.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-08 16:32   ` Sergei Antonov
@ 2015-06-08 16:45     ` Viacheslav Dubeyko
  2015-06-08 16:50       ` Sergei Antonov
  0 siblings, 1 reply; 26+ messages in thread
From: Viacheslav Dubeyko @ 2015-06-08 16:45 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: linux-fsdevel, Sasha Levin, Anton Altaparmakov, Al Viro,
	Christoph Hellwig, Andrew Morton, Hin-Tak Leung, Sougata Santra

On Mon, 2015-06-08 at 18:32 +0200, Sergei Antonov wrote:
> On 8 June 2015 at 17:45, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> > On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
> >> Fix this bugreport by Sasha Levin:
> >> http://lkml.org/lkml/2015/2/20/85 ("use after free")
> >> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
> >>
> >
> > Sorry, I missed the point. What do you try to fix? How this change fixes
> > the issue?
> >
> > I think that maybe this fix makes sense. But it needs to describe it
> > more deeply. Could you describe the fix with more details?
> 
> You are basically saying you don’t understand it. Too bad, because the
> bug is very simple. It is the „use after free“ type of bug, and it can
> be illustrated by this:
> (1) void *ptr = malloc(…);
> (2) free(ptr);
> (3) memcpy(…, ptr, 1);
> Guess which two of these three lines are executed in wrong order.
> 
> My patch is about the same type of bug, but with memory pages mapping.
> The driver currently accesses pages that may be unavailable, or
> contain different data. The problem is more likely to occur when
> memory is a limited resource. I reproduced it while running a
> memory-hungry program.

I worried not about myself but about potential readers of description of
the fix. The description is completely obscure. And it needs to describe
the fix in clear and descriptive manner. This is my request. Please,
describe the fix in a clear way.

Thanks,
Vyacheslav Dubeyko.


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-08 15:45 ` Vyacheslav Dubeyko
@ 2015-06-08 16:32   ` Sergei Antonov
  2015-06-08 16:45     ` Viacheslav Dubeyko
  0 siblings, 1 reply; 26+ messages in thread
From: Sergei Antonov @ 2015-06-08 16:32 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: linux-fsdevel, Sasha Levin, Anton Altaparmakov, Al Viro,
	Christoph Hellwig, Andrew Morton, Hin-Tak Leung, Sougata Santra

On 8 June 2015 at 17:45, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
>> Fix this bugreport by Sasha Levin:
>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>
>
> Sorry, I missed the point. What do you try to fix? How this change fixes
> the issue?
>
> I think that maybe this fix makes sense. But it needs to describe it
> more deeply. Could you describe the fix with more details?

You are basically saying you don’t understand it. Too bad, because the
bug is very simple. It is the „use after free“ type of bug, and it can
be illustrated by this:
(1) void *ptr = malloc(…);
(2) free(ptr);
(3) memcpy(…, ptr, 1);
Guess which two of these three lines are executed in wrong order.

My patch is about the same type of bug, but with memory pages mapping.
The driver currently accesses pages that may be unavailable, or
contain different data. The problem is more likely to occur when
memory is a limited resource. I reproduced it while running a
memory-hungry program.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hfsplus: release bnode pages after use, not before
  2015-06-07  0:42 Sergei Antonov
@ 2015-06-08 15:45 ` Vyacheslav Dubeyko
  2015-06-08 16:32   ` Sergei Antonov
  2015-06-09 18:06 ` Anton Altaparmakov
  1 sibling, 1 reply; 26+ messages in thread
From: Vyacheslav Dubeyko @ 2015-06-08 15:45 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: linux-fsdevel, Sasha Levin, Anton Altaparmakov, Al Viro,
	Christoph Hellwig, Andrew Morton, Hin-Tak Leung, Sougata Santra

On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
> Fix this bugreport by Sasha Levin:
> http://lkml.org/lkml/2015/2/20/85 ("use after free")
> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
> 

Sorry, I missed the point. What do you try to fix? How this change fixes
the issue?

I think that maybe this fix makes sense. But it needs to describe it
more deeply. Could you describe the fix with more details?

Thanks,
Vyacheslav Dubeyko.


> Cc: Anton Altaparmakov <aia21@cam.ac.uk>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
> Cc: Sougata Santra <sougata@tuxera.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Sergei Antonov <saproj@gmail.com>
> ---
>  fs/hfsplus/bnode.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
> index 759708f..5af50fb 100644
> --- a/fs/hfsplus/bnode.c
> +++ b/fs/hfsplus/bnode.c
> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>  			page_cache_release(page);
>  			goto fail;
>  		}
> -		page_cache_release(page);
>  		node->page[i] = page;
>  	}
>  
> @@ -566,13 +565,12 @@ node_error:
>  
>  void hfs_bnode_free(struct hfs_bnode *node)
>  {
> -#if 0
>  	int i;
>  
> -	for (i = 0; i < node->tree->pages_per_bnode; i++)
> +	for (i = 0; i < node->tree->pages_per_bnode; i++) {
>  		if (node->page[i])
>  			page_cache_release(node->page[i]);
> -#endif
> +	}
>  	kfree(node);
>  }
>  



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] hfsplus: release bnode pages after use, not before
@ 2015-06-07  0:42 Sergei Antonov
  2015-06-08 15:45 ` Vyacheslav Dubeyko
  2015-06-09 18:06 ` Anton Altaparmakov
  0 siblings, 2 replies; 26+ messages in thread
From: Sergei Antonov @ 2015-06-07  0:42 UTC (permalink / raw)
  To: linux-fsdevel, Sasha Levin
  Cc: Anton Altaparmakov, Al Viro, Christoph Hellwig, Andrew Morton,
	Vyacheslav Dubeyko, Hin-Tak Leung, Sougata Santra,
	Sergei Antonov

Fix this bugreport by Sasha Levin:
http://lkml.org/lkml/2015/2/20/85 ("use after free")
Make sure mapped pages are available for the entire lifetime of hfs_bnode.

Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sougata Santra <sougata@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
---
 fs/hfsplus/bnode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 759708f..5af50fb 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -566,13 +565,12 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-#if 0
 	int i;
 
-	for (i = 0; i < node->tree->pages_per_bnode; i++)
+	for (i = 0; i < node->tree->pages_per_bnode; i++) {
 		if (node->page[i])
 			page_cache_release(node->page[i]);
-#endif
+	}
 	kfree(node);
 }
 
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-06-19  1:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1433781680.24509.YahooMailBasic@web172301.mail.ir2.yahoo.com>
2015-06-08 16:47 ` [PATCH] hfsplus: release bnode pages after use, not before Sergei Antonov
     [not found] <1434584504.1063.YahooMailBasic@web172304.mail.ir2.yahoo.com>
2015-06-18  2:51 ` Hin-Tak Leung
2015-06-18 12:58   ` Sergei Antonov
2015-06-18 15:37     ` Hin-Tak Leung
2015-06-18 16:19       ` Sergei Antonov
2015-06-18 17:16         ` Hin-Tak Leung
2015-06-18 20:51           ` Sergei Antonov
2015-06-18 22:16             ` Hin-Tak Leung
2015-06-19  1:30               ` Sergei Antonov
2015-06-07  0:42 Sergei Antonov
2015-06-08 15:45 ` Vyacheslav Dubeyko
2015-06-08 16:32   ` Sergei Antonov
2015-06-08 16:45     ` Viacheslav Dubeyko
2015-06-08 16:50       ` Sergei Antonov
2015-06-09 22:15         ` Andrew Morton
2015-06-09 23:00           ` Anton Altaparmakov
2015-06-09 23:08           ` Anton Altaparmakov
2015-06-09 23:16             ` Andrew Morton
2015-06-09 23:34               ` Anton Altaparmakov
2015-06-09 23:23             ` Anton Altaparmakov
2015-06-09 23:40           ` Sergei Antonov
2015-06-14  2:27             ` Hin-Tak Leung
2015-06-14 14:18               ` Sergei Antonov
2015-06-17 23:26                 ` Hin-Tak Leung
2015-06-18 13:09                   ` Sergei Antonov
2015-06-09 18:06 ` Anton Altaparmakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).