From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753431AbbDAMty (ORCPT ); Wed, 1 Apr 2015 08:49:54 -0400 Received: from mail-wi0-f176.google.com ([209.85.212.176]:35822 "EHLO mail-wi0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753185AbbDAMtu (ORCPT ); Wed, 1 Apr 2015 08:49:50 -0400 Message-ID: <551BE96A.9060603@plexistor.com> Date: Wed, 01 Apr 2015 15:49:46 +0300 From: Boaz Harrosh User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Christoph Hellwig CC: linux-nvdimm@ml01.01.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, ross.zwisler@linux.intel.com, axboe@kernel.dk Subject: Re: another pmem variant V2 References: <1427358764-6126-1-git-send-email-hch@lst.de> <55143A8B.2060304@plexistor.com> <20150331092526.GA25958@lst.de> <551AB9C7.6020505@plexistor.com> <20150331161648.GA1318@lst.de> In-Reply-To: <20150331161648.GA1318@lst.de> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/31/2015 07:16 PM, Christoph Hellwig wrote: > On Tue, Mar 31, 2015 at 06:14:15PM +0300, Boaz Harrosh wrote: >> We can not accept it as is right now. > > Who is we? > >> We have conducted farther tests. And it messes up NUMA. > > Only you if you use the memmap option in weird ways. > No weird ways just the single range. I would be very happy if you can teach me the proper way, believe me I'm trying for 2 days. > Sounds like I should simply remove the memmap= option so people don't > abuse it. The main point is to parse the e820 tables, which works fine. > What abuse? the single range and the problem shows up. So you are just pasting over the problem sir. The patch that you are submitting has grave problems and covering it up with not allowing memmap=! will not fix it. The Kernel as is after your patch does not like this half baked beast as we defined it, the defined pmem-memory-range messes things up. > And those people having fake pmem, or pcie cards that they are too lazy > to submit proper drivers for can stick to their out of tree hacks? > I have now conducted more tests on real type-12 DDR3 system and the exact same problem as I reported exists with real type-12 chips! And who are we kidding? the "memmap=!" yes or no, makes no difference at all. All it does is edit the table as if it was the table the BIOS gave us. There is no extra processing done on memmap=. Your e820 patch trashes NUMA. But I fix it for good this time here is the fix below. After I apply below patch every thing boots and work just as expected. All the problems I reported disappear. Any configuration, any number of ranges, cross NUMA or not, just works, exactly as before with my patches. The fix is over your V2, I will post one later that fixes your V3 and adds back the memmap=! --- If you inspect my fix below You will see that what happened is that the original patch was too aggressive in making pmem look like ram. In fact it started the ARCH side memory initialization and was only skipping the generic initialization of memory. This messed up internal real-memory structures. With this fix below block/drivers/pmem.ko loads just fine finds its resources and maps them, and everything just works. including any "abuse" to memmap=! and any NUMA configurations. Both real HW, type-12 HW and NUMA Vms. (Preliminary testing we are conducting the full test rig as we speak) (I'm still going over it, I might send some more cleaning) Cheers --- git diff --stat -p -M HEAD arch/x86/kernel/e820.c | 7 ++++--- arch/x86/kernel/pmem.c | 17 ----------------- arch/x86/kernel/setup.c | 2 -- 3 files changed, 4 insertions(+), 22 deletions(-) diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 07246e5..dce0e84 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -963,9 +963,10 @@ void __init e820_reserve_resources(void) * pci device BAR resource and insert them later in * pcibios_resource_survey() */ - if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) { - if (e820.map[i].type != E820_PRAM) - res->flags |= IORESOURCE_BUSY; + if (((e820.map[i].type != E820_RESERVED) && + (e820.map[i].type != E820_PRAM)) || + res->start < (1ULL<<20)) { + res->flags |= IORESOURCE_BUSY; insert_resource(&iomem_resource, res); } res++; diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c index f970048..fcdbc20 100644 --- a/arch/x86/kernel/pmem.c +++ b/arch/x86/kernel/pmem.c @@ -9,23 +9,6 @@ #include #include -void __init reserve_pmem(void) -{ - int i; - - for (i = 0; i < e820.nr_map; i++) { - struct e820entry *ei = &e820.map[i]; - - if (ei->type != E820_PRAM) - continue; - - memblock_reserve(ei->addr, ei->addr + ei->size); - max_pfn_mapped = init_memory_mapping( - ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr, - ei->addr + ei->size); - } -} - static __init void register_pmem_device(struct resource *res) { struct platform_device *pdev; diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index f2bed2b..0a2421c 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1158,8 +1158,6 @@ void __init setup_arch(char **cmdline_p) early_acpi_boot_init(); - reserve_pmem(); - initmem_init(); dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);