linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anton Altaparmakov <anton@tuxera.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Sergei Antonov <saproj@gmail.com>,
	Viacheslav Dubeyko <slava@dubeyko.com>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"hch@infradead.org" <hch@infradead.org>,
	Hin-Tak Leung <htl10@users.sourceforge.net>,
	Sougata Santra <sougata@tuxera.com>
Subject: Re: [PATCH] hfsplus: release bnode pages after use, not before
Date: Tue, 9 Jun 2015 23:08:48 +0000	[thread overview]
Message-ID: <792FFF79-079C-4F6E-89DD-C196C9AFFFBF@tuxera.com> (raw)
In-Reply-To: <20150609151545.8a146bb5d29051e604d4d211@linux-foundation.org>

Hi Andrew,

Forgot to reply to one point you made:

> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.

Perhaps the patch description is lacking but here is what the current code does:

struct page *page = read_mapping_page();
page_cache_release(page);
u8 *kaddr = kmap(page);
memcpy(..., kaddr, ...);
kunmap(page);

Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().

The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.

Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.

And his patch changes the above to this instead:

struct page *page = read_mapping_page();
u8 *kaddr = kmap(page);
memcpy(..., kaddr, ...);
kunmap(page);
page_cache_release(page);

Which is the correct sequence of events.

Although perhaps there should also be a mark_page_accessed(page); thrown in there, too, before the page_cache_release() in the expectation that the B tree node is likely to be used again?

Best regards,

	Anton
-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


  parent reply	other threads:[~2015-06-09 23:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-07  0:42 [PATCH] hfsplus: release bnode pages after use, not before Sergei Antonov
2015-06-08 15:45 ` Vyacheslav Dubeyko
2015-06-08 16:32   ` Sergei Antonov
2015-06-08 16:45     ` Viacheslav Dubeyko
2015-06-08 16:50       ` Sergei Antonov
2015-06-09 22:15         ` Andrew Morton
2015-06-09 23:00           ` Anton Altaparmakov
2015-06-09 23:08           ` Anton Altaparmakov [this message]
2015-06-09 23:16             ` Andrew Morton
2015-06-09 23:34               ` Anton Altaparmakov
2015-06-09 23:23             ` Anton Altaparmakov
2015-06-09 23:40           ` Sergei Antonov
2015-06-14  2:27             ` Hin-Tak Leung
2015-06-14 14:18               ` Sergei Antonov
2015-06-17 23:26                 ` Hin-Tak Leung
2015-06-18 13:09                   ` Sergei Antonov
2015-06-09 18:06 ` Anton Altaparmakov
     [not found] <1433781680.24509.YahooMailBasic@web172301.mail.ir2.yahoo.com>
2015-06-08 16:47 ` Sergei Antonov
     [not found] <1434584504.1063.YahooMailBasic@web172304.mail.ir2.yahoo.com>
2015-06-18  2:51 ` Hin-Tak Leung
2015-06-18 12:58   ` Sergei Antonov
2015-06-18 15:37     ` Hin-Tak Leung
2015-06-18 16:19       ` Sergei Antonov
2015-06-18 17:16         ` Hin-Tak Leung
2015-06-18 20:51           ` Sergei Antonov
2015-06-18 22:16             ` Hin-Tak Leung
2015-06-19  1:30               ` Sergei Antonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=792FFF79-079C-4F6E-89DD-C196C9AFFFBF@tuxera.com \
    --to=anton@tuxera.com \
    --cc=aia21@cam.ac.uk \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=htl10@users.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=saproj@gmail.com \
    --cc=sasha.levin@oracle.com \
    --cc=slava@dubeyko.com \
    --cc=sougata@tuxera.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).