From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anton Altaparmakov Subject: Re: [PATCH] hfsplus: release bnode pages after use, not before Date: Tue, 9 Jun 2015 23:08:48 +0000 Message-ID: <792FFF79-079C-4F6E-89DD-C196C9AFFFBF@tuxera.com> References: <1433637776-3559-1-git-send-email-saproj@gmail.com> <1433778309.2513.11.camel@ubuntu-slavad-14.04> <1433781918.2659.3.camel@slavad-ubuntu-14.04> <20150609151545.8a146bb5d29051e604d4d211@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT Cc: Sergei Antonov , Viacheslav Dubeyko , Anton Altaparmakov , "linux-fsdevel@vger.kernel.org" , Sasha Levin , "Al Viro" , "hch@infradead.org" , Hin-Tak Leung , Sougata Santra To: Andrew Morton Return-path: Received: from nebula-exfe-01.nebula.fi ([83.145.198.163]:56130 "EHLO ex10.nebula.fi" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753352AbbFIXIu convert rfc822-to-8bit (ORCPT ); Tue, 9 Jun 2015 19:08:50 -0400 In-Reply-To: <20150609151545.8a146bb5d29051e604d4d211@linux-foundation.org> Content-Language: en-US Content-ID: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Hi Andrew, Forgot to reply to one point you made: > On 10 Jun 2015, at 01:15, Andrew Morton wrote: > Yes, I too would like to hear much more about your thinking on this, > and a detailed description of the bug and how the patch fixes it. Perhaps the patch description is lacking but here is what the current code does: struct page *page = read_mapping_page(); page_cache_release(page); u8 *kaddr = kmap(page); memcpy(..., kaddr, ...); kunmap(page); Now in what world is that a valid thing to do? When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap(). The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else. Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/resuing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount. And his patch changes the above to this instead: struct page *page = read_mapping_page(); u8 *kaddr = kmap(page); memcpy(..., kaddr, ...); kunmap(page); page_cache_release(page); Which is the correct sequence of events. Although perhaps there should also be a mark_page_accessed(page); thrown in there, too, before the page_cache_release() in the expectation that the B tree node is likely to be used again? Best regards, Anton -- Anton Altaparmakov (replace at with @) Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ Linux NTFS maintainer