Message-ID: <1449078237.31589.30.camel@hpe.com>
Subject: Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping
From: Toshi Kani
To: Dan Williams
Cc: Andrew Morton, "Kirill A. Shutemov", Matthew Wilcox, Ross Zwisler,
 mauricio.porto@hpe.com, Linux MM, linux-fsdevel,
 "linux-nvdimm@lists.01.org", "linux-kernel@vger.kernel.org"
Date: Wed, 02 Dec 2015 10:43:57 -0700
References: <1448309082-20851-1-git-send-email-toshi.kani@hpe.com>
 <1449022764.31589.24.camel@hpe.com>

On Tue, 2015-12-01 at 19:45 -0800, Dan Williams wrote:
> On Tue, Dec 1, 2015 at 6:19 PM, Toshi Kani wrote:
> > On Mon, 2015-11-30 at 14:08 -0800, Dan Williams wrote:
> > > On Mon, Nov 23, 2015 at 12:04 PM, Toshi Kani wrote:
> > > > The following oops was observed when mmap() with MAP_POPULATE
> > > > pre-faulted pmd mappings of a DAX file. follow_trans_huge_pmd()
> > > > expects that a target address has a struct page.
> > > >
> > > >   BUG: unable to handle kernel paging request at ffffea0012220000
> > > >   follow_trans_huge_pmd+0xba/0x390
> > > >   follow_page_mask+0x33d/0x420
> > > >   __get_user_pages+0xdc/0x800
> > > >   populate_vma_page_range+0xb5/0xe0
> > > >   __mm_populate+0xc5/0x150
> > > >   vm_mmap_pgoff+0xd5/0xe0
> > > >   SyS_mmap_pgoff+0x1c1/0x290
> > > >   SyS_mmap+0x1b/0x30
> > > >
> > > > Fix it by making the PMD pre-fault handling consistent with PTE.
> > > > After being pre-faulted in faultin_page(), follow_page_mask() calls
> > > > follow_trans_huge_pmd(), which is changed to call follow_pfn_pmd()
> > > > for VM_PFNMAP or VM_MIXEDMAP. follow_pfn_pmd() handles FOLL_TOUCH
> > > > and returns with -EEXIST.
> > > >
> > > > Reported-by: Mauricio Porto
> > > > Signed-off-by: Toshi Kani
> > > > Cc: Andrew Morton
> > > > Cc: Kirill A. Shutemov
> > > > Cc: Matthew Wilcox
> > > > Cc: Dan Williams
> > > > Cc: Ross Zwisler
> > > > ---
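
[ Aside for readers of the archive: the follow_pfn_pmd() helper described
  in the patch above is modeled on follow_pfn_pte() in mm/gup.c. A minimal
  sketch of the idea -- illustrative only, not the literal diff:

	static int follow_pfn_pmd(struct vm_area_struct *vma,
				  unsigned long address, pmd_t *pmd,
				  unsigned int flags)
	{
		/* No struct page backs this mapping; nothing to pin. */
		if (flags & FOLL_GET)
			return -EFAULT;

		if (flags & FOLL_TOUCH) {
			pmd_t entry = *pmd;

			if (flags & FOLL_WRITE)
				entry = pmd_mkdirty(entry);
			entry = pmd_mkyoung(entry);

			if (!pmd_same(*pmd, entry)) {
				set_pmd_at(vma->vm_mm, address, pmd, entry);
				update_mmu_cache_pmd(vma, address, pmd);
			}
		}

		/* A valid pmd exists, but there is no struct page to
		 * hand back to get_user_pages(). */
		return -EEXIST;
	}

  follow_trans_huge_pmd() would call this for VM_PFNMAP/VM_MIXEDMAP vmas,
  and __get_user_pages() already treats -EEXIST from the PTE path as
  "entry populated, nothing to return", so MAP_POPULATE can make forward
  progress instead of oopsing. ]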
> > >
> > > Hey Toshi,
> > >
> > > I ended up fixing this differently, with the follow_pmd_devmap()
> > > introduced in this series:
> > >
> > > https://lists.01.org/pipermail/linux-nvdimm/2015-November/003033.html
> > >
> > > Does the latest libnvdimm-pending branch [1] pass your test case?
> >
> > Hi Dan,
> >
> > I ran several test cases, and they all hit the "pfn not in memmap"
> > case in __dax_pmd_fault() during mmap(MAP_POPULATE). Looking at
> > dax.pfn, PFN_DEV is set but PFN_MAP is not. I have not looked into
> > why, but I thought I'd let you know first. I've also seen the test
> > thread hang at the end sometimes.
>
> That PFN_MAP flag will not be set by default for NFIT-defined
> persistent memory. See pmem_should_map_pages() for the pmem namespaces
> that will have it set by default; currently only e820 type-12 memory
> ranges qualify.
>
> NFIT-defined persistent memory can have a memmap array dynamically
> allocated by setting up a pfn device (similar to setting up a btt).
> We don't map it by default because the NFIT may describe hundreds of
> gigabytes of persistent memory, and the overhead of the memmap may be
> too large to locate the memmap in RAM.

Oh, I see. I will set up the memmap array and run the tests again.

But why does PMD mapping depend on the memmap array? We have observed
a major performance improvement with PMD mappings. This feature should
always be enabled with DAX, regardless of the option to allocate the
memmap array.

Thanks,
-Toshi
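
P.S. For anyone else hitting the "pfn not in memmap" case: per Dan's
explanation above, it boils down to requiring both PFN_DEV and PFN_MAP
in dax.pfn before a huge-page mapping is inserted. Roughly, assuming the
pfn_t helper shape from Dan's series (a sketch, not the exact source):

	/* PFN_DEV alone marks a device pfn with no memmap (struct page)
	 * array behind it; PFN_MAP additionally promises the memmap is
	 * present, which the pmd fault path depends on. */
	static inline bool pfn_t_devmap(pfn_t pfn)
	{
		const u64 flags = PFN_DEV | PFN_MAP;

		return (pfn.val & flags) == flags;
	}

so the pmd fault handler falls back to PTE mappings when the check fails:

	if (!pfn_t_devmap(dax.pfn))
		goto fallback;	/* logged as "pfn not in memmap" */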