From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Pasha Tatashin" <pavel.tatashin@microsoft.com>, osalvador@techadventures.net, linux-nvdimm <linux-nvdimm@lists.01.org>, "Dave Hansen" <dave.hansen@intel.com>, "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>, "Ingo Molnar" <mingo@kernel.org>, "Linux MM" <linux-mm@kvack.org>, "Jérôme Glisse" <jglisse@redhat.com>, rppt@linux.vnet.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Date: Mon, 29 Oct 2018 12:59:11 -0700
Message-ID: <3281f3044fa231bbc1b02d5c5efca3502a0d05a8.camel@linux.intel.com>
In-Reply-To: <20181029181827.GO32673@dhcp22.suse.cz>

On Mon, 2018-10-29 at 19:18 +0100, Michal Hocko wrote:
> On Mon 29-10-18 10:42:33, Alexander Duyck wrote:
> > On Mon, 2018-10-29 at 18:24 +0100, Michal Hocko wrote:
> > > On Mon 29-10-18 10:01:28, Alexander Duyck wrote:
> > [...]
> > > > So there end up being a few different issues with constructors. First
> > > > in my mind is that it means we have to initialize the region of memory
> > > > and cannot assume what the constructors are going to do for us. As a
> > > > result we will have to initialize the LRU pointers, and then overwrite
> > > > them with the pgmap and hmm_data.
> > >
> > > Why we would do that? What does really prevent you from making a fully
> > > customized constructor?
> >
> > It is more an argument of complexity. Do I just pass a single pointer
> > and write that value, or the LRU values in init, or do I have to pass a
> > function pointer, some abstracted data, and then call said function
> > pointer while passing the page and the abstracted data?
>
> I though you have said that pgmap is the current common denominator for
> zone device users.
> I really do not see what is the problem to do
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 89d2a2ab3fe6..9105a4ed2c96 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5516,7 +5516,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>
>  not_early:
>  		page = pfn_to_page(pfn);
> -		__init_single_page(page, pfn, zone, nid);
> +		if (pgmap && pgmap->init_page)
> +			pgmap->init_page(page, pfn, zone, nid, pgmap);
> +		else
> +			__init_single_page(page, pfn, zone, nid);
>  		if (context == MEMMAP_HOTPLUG)
>  			SetPageReserved(page);
>
> that would require to replace altmap throughout the call chain and
> replace it by pgmap. Altmap could be then renamed to something more
> clear

So, as I pointed out earlier, doing per-page init is much slower than initializing pages in bulk. Ideally I would like to see us separate the memmap_init_zone function into two pieces, one for handling hotplug and another for everything else. As it is, having to jump over a bunch of tests to reach the "not_early" case is quite ugly in my opinion.

I could probably take your patch and test it. I suspect it would be a significant slowdown in general, since an indirect function call per page would come into play. The "init_page" function would also end up being much more complex than it really needs to be in this design, since I would have to get the altmap and figure out whether the page was used for vmemmap storage or is an actual DAX page. I might just see if I could add an additional test for the pfn being before the end of the vmemmap when a pgmap is present.
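The following is an illustrative userspace sketch in C, not kernel code, of the two approaches compared above. Variant A models the per-page indirect init_page callback from the quoted diff; variant B is a single direct loop with an inline test for whether the pfn lies before the end of the memmap storage. All types and names (sk_page, sk_altmap, sk_pgmap, memmap_end_pfn and so on) are hypothetical simplifications, not the kernel's struct page, struct dev_pagemap or struct vmem_altmap. The sketch measures nothing; it only shows that both variants classify pages the same way, while variant A pays one indirect call per page and the callback has to re-derive the altmap check itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for illustration only. These
 * are NOT the kernel's struct page, struct dev_pagemap or struct vmem_altmap.
 */
struct sk_page {
	void *pgmap;   /* owning pagemap, set only for actual device (DAX) pages */
	bool reserved; /* models SetPageReserved() in the hotplug path */
	bool init;     /* true once the page has been initialized */
};

struct sk_altmap {
	unsigned long base_pfn;       /* first pfn of the device range */
	unsigned long reserve;        /* pfns the driver asked us to skip */
	unsigned long memmap_end_pfn; /* assumed: first pfn past memmap storage */
};

struct sk_pgmap;
typedef void (*sk_init_fn)(struct sk_page *page, unsigned long pfn,
			   struct sk_pgmap *pgmap);

struct sk_pgmap {
	struct sk_altmap *altmap; /* may be NULL: no memmap carved from the range */
	sk_init_fn init_page;     /* optional per-page constructor */
};

/* Generic init: stands in for __init_single_page() + SetPageReserved(). */
static void sk_init_single_page(struct sk_page *page)
{
	page->pgmap = NULL;
	page->reserved = true;
	page->init = true;
}

/* Skip the driver reservation, mirroring the quoted altmap->reserve check. */
static unsigned long sk_first_pfn(const struct sk_pgmap *pgmap,
				  unsigned long start_pfn)
{
	const struct sk_altmap *a = pgmap->altmap;

	if (a && start_pfn == a->base_pfn)
		start_pfn += a->reserve;
	return start_pfn;
}

/* Variant A callback: must look up the altmap to classify every pfn. */
static void sk_pgmap_init_page(struct sk_page *page, unsigned long pfn,
			       struct sk_pgmap *pgmap)
{
	sk_init_single_page(page);
	if (!pgmap->altmap || pfn >= pgmap->altmap->memmap_end_pfn)
		page->pgmap = pgmap; /* actual device page */
}

/* Variant A: one indirect call per page when a constructor is provided. */
static void sk_init_range_callback(struct sk_page *map, unsigned long start_pfn,
				   unsigned long end_pfn, struct sk_pgmap *pgmap)
{
	assert(start_pfn <= end_pfn);
	for (unsigned long pfn = sk_first_pfn(pgmap, start_pfn); pfn < end_pfn; pfn++) {
		struct sk_page *page = &map[pfn - start_pfn];

		if (pgmap->init_page)
			pgmap->init_page(page, pfn, pgmap);
		else
			sk_init_single_page(page);
	}
}

/* Variant B: a single direct loop with an inline end-of-memmap test. */
static void sk_init_range_bulk(struct sk_page *map, unsigned long start_pfn,
			       unsigned long end_pfn, struct sk_pgmap *pgmap)
{
	unsigned long memmap_end = pgmap->altmap ? pgmap->altmap->memmap_end_pfn : 0;

	assert(start_pfn <= end_pfn);
	for (unsigned long pfn = sk_first_pfn(pgmap, start_pfn); pfn < end_pfn; pfn++) {
		struct sk_page *page = &map[pfn - start_pfn];

		sk_init_single_page(page);
		if (pfn >= memmap_end)
			page->pgmap = pgmap; /* actual device page */
	}
}
```

Under these assumptions, initializing pfns 100 through 115 with base_pfn = 100, reserve = 2 and memmap_end_pfn = 105 leaves pfns 100 and 101 untouched, marks 102 through 104 as plain reserved (memmap storage) pages, and attaches the pgmap to 105 through 115, identically in both variants.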
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 89d2a2ab3fe6..048e4cc72fdf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5474,8 +5474,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  	 * Honor reservation requested by the driver for this ZONE_DEVICE
>  	 * memory
>  	 */
> -	if (altmap && start_pfn == altmap->base_pfn)
> -		start_pfn += altmap->reserve;
> +	if (pgmap && pgmap->get_memmap)
> +		start_pfn = pgmap->get_memmap(pgmap, start_pfn);
>
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		/*
>
> [...]

The only reason I hadn't bothered with these bits is that I was trying to keep this generic. I thought I had seen other discussions about hotplug scenarios where memory may want to change where the vmemmap is initialized, beyond just the case of ZONE_DEVICE pages. So I was expecting that at some point we may see an altmap without a pgmap.

> > If I have to implement the code to verify the slowdown I will, but I
> > really feel like it is just going to be time wasted since we have seen
> > this in other spots within the kernel.
>
> Please try to understand that I am not trying to force you write some
> artificial benchmarks. All I really do care about is that we have sane
> interfaces with reasonable performance. Especially for one-off things
> in relattively slow paths. I fully recognize that ZONE_DEVICE begs for a
> better integration but really, try to go incremental and try to unify
> the code first and microptimize on top. Is that way too much to ask for?

No, but I thought the patches I had already proposed were heading in that direction. I had unified memmap_init_zone, memmap_init_zone_device, and the deferred page initialization into a small set of shared functions, and all of them showed improved performance as a result.

> Anyway we have gone into details while the primary problem here was that
> the hotplug lock doesn't scale AFAIR.
> And my question was why cannot we
> pull move_pfn_range_to_zone and what has to be done to achieve that.
> That is a fundamental thing to address first. Then you can microptimize
> on top.

Yes, the hotplug lock was part of the original issue. However, that starts to drift into the area I believe Oscar was working on as part of his patch set, which encapsulates move_pfn_range_to_zone and the other calls made under the hotplug lock into their own functions. Most of the changes in my follow-on patch set can work regardless of how we deal with the lock issue. I just feel like what you are pushing for is going to be a massive patch set by the time we are done, and I really need to be able to work on this a piece at a time.

The patches Andrew pushed addressed the immediate issue, so systems with nvdimm/DAX memory can now at least initialize quickly enough that systemd doesn't refuse to mount the root file system due to a timeout. The next patch set I have refactors things to reduce code and lets us reuse some of the hotplug code for the deferred page init:
https://lore.kernel.org/lkml/20181017235043.17213.92459.stgit@localhost.localdomain/

After that I was planning to work on sorting out the PageReserved flag. I was hoping to wait until Dan's HMM patches and Oscar's changes had been sorted out before getting into any further refactoring of this specific code.

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm