linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anton Altaparmakov <anton@tuxera.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Sergei Antonov <saproj@gmail.com>,
	Viacheslav Dubeyko <slava@dubeyko.com>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"hch@infradead.org" <hch@infradead.org>,
	Hin-Tak Leung <htl10@users.sourceforge.net>,
	Sougata Santra <sougata@tuxera.com>
Subject: Re: [PATCH] hfsplus: release bnode pages after use, not before
Date: Tue, 9 Jun 2015 23:34:27 +0000	[thread overview]
Message-ID: <19E0B0BA-C122-4C1F-B54B-C16C3E3A53FB@tuxera.com> (raw)
In-Reply-To: <20150609161656.9c5c67b41d5dd12edfe6e0db@linux-foundation.org>

Hi,

> On 10 Jun 2015, at 02:16, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 9 Jun 2015 23:08:48 +0000 Anton Altaparmakov <anton@tuxera.com> wrote:
>> Hi Andrew,
>> 
>> Forgot to reply to one point you made:
>> 
>>> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Yes, I too would like to hear much more about your thinking on this,
>>> and a detailed description of the bug and how the patch fixes it.
>> 
>> Perhaps the patch description is lacking but here is what the current code does:
>> 
>> struct page *page = read_mapping_page();
>> page_cache_release(page);
>> u8 *kaddr = kmap(page);
>> memcpy(..., kaddr, ...);
>> kunmap(page);
>> 
>> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
>> 
>> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
>> 
>> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
>> 
>> And his patch changes the above to this instead:
>> 
>> struct page *page = read_mapping_page();
>> u8 *kaddr = kmap(page);
>> memcpy(..., kaddr, ...);
>> kunmap(page);
>> page_cache_release(page);
>> 
>> Which is the correct sequence of events.
> 
> OK, pinning 8 pages for the duration of hfs_bnode_find() sounds
> reasonable.
> 
> This is a painful way to write a changelog :(

I will grant you that Sergei's change log was a bit brief.  I had to wade through the code to ensure I knew what he was talking about which the changelog should have spared me from doing.

Sergei, perhaps your take home message is that more verbose changelogs would be a good idea because even if something is obvious to you because you have studied and worked on the code it does not mean it is obvious to anyone else who likely has never seen the HFS+ code except in passing.  (-;

/offtopic alert: This reminds me of the maths professor who wrote a long and very complicated proof of some difficult problem on the blackboard and finished with "... and thus this obviously concludes the proof." and an incredulous student asked him "Excuse me professor but is it obvious?".  The professor left the room came back some time later and simply answered "Yes it is" without further explanation.  (-;

>> Although perhaps there should also be a mark_page_accessed(page);
>> thrown in there, too, before the page_cache_release() in the
>> expectation that the B tree node is likely to be used again?
> 
> Probably.
> 
> Also, using read_mapping_page() is quite inefficient: it's a
> synchronous read.  Putting a single call to read_cache_pages() before
> the loop would be sufficient to get all that IO done in a single lump.

That is very true.  IIRC the code came before the advent of the ->readpages address space operation...  But yes it needs modernising...

> But first we fix the bug.

I am glad we agree on that point.  (-:

Best regards,

	Anton
-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


  reply	other threads:[~2015-06-09 23:34 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-07  0:42 [PATCH] hfsplus: release bnode pages after use, not before Sergei Antonov
2015-06-08 15:45 ` Vyacheslav Dubeyko
2015-06-08 16:32   ` Sergei Antonov
2015-06-08 16:45     ` Viacheslav Dubeyko
2015-06-08 16:50       ` Sergei Antonov
2015-06-09 22:15         ` Andrew Morton
2015-06-09 23:00           ` Anton Altaparmakov
2015-06-09 23:08           ` Anton Altaparmakov
2015-06-09 23:16             ` Andrew Morton
2015-06-09 23:34               ` Anton Altaparmakov [this message]
2015-06-09 23:23             ` Anton Altaparmakov
2015-06-09 23:40           ` Sergei Antonov
2015-06-14  2:27             ` Hin-Tak Leung
2015-06-14 14:18               ` Sergei Antonov
2015-06-17 23:26                 ` Hin-Tak Leung
2015-06-18 13:09                   ` Sergei Antonov
2015-06-09 18:06 ` Anton Altaparmakov
     [not found] <1433781680.24509.YahooMailBasic@web172301.mail.ir2.yahoo.com>
2015-06-08 16:47 ` Sergei Antonov
     [not found] <1434584504.1063.YahooMailBasic@web172304.mail.ir2.yahoo.com>
2015-06-18  2:51 ` Hin-Tak Leung
2015-06-18 12:58   ` Sergei Antonov
2015-06-18 15:37     ` Hin-Tak Leung
2015-06-18 16:19       ` Sergei Antonov
2015-06-18 17:16         ` Hin-Tak Leung
2015-06-18 20:51           ` Sergei Antonov
2015-06-18 22:16             ` Hin-Tak Leung
2015-06-19  1:30               ` Sergei Antonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19E0B0BA-C122-4C1F-B54B-C16C3E3A53FB@tuxera.com \
    --to=anton@tuxera.com \
    --cc=aia21@cam.ac.uk \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=htl10@users.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=saproj@gmail.com \
    --cc=sasha.levin@oracle.com \
    --cc=slava@dubeyko.com \
    --cc=sougata@tuxera.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).