From: Michal Hocko <mhocko@kernel.org>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: "Pasha Tatashin" <pavel.tatashin@microsoft.com>,
	osalvador@techadventures.net,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>, "Linux MM" <linux-mm@kvack.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	rppt@linux.vnet.ibm.com,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Date: Tue, 30 Oct 2018 07:29:15 +0100
Message-ID: <20181030062915.GT32673@dhcp22.suse.cz>
In-Reply-To: <3281f3044fa231bbc1b02d5c5efca3502a0d05a8.camel@linux.intel.com>

On Mon 29-10-18 12:59:11, Alexander Duyck wrote:
> On Mon, 2018-10-29 at 19:18 +0100, Michal Hocko wrote:
[...]

I will try to get to your other points later.

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 89d2a2ab3fe6..048e4cc72fdf 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5474,8 +5474,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >  	 * Honor reservation requested by the driver for this ZONE_DEVICE
> >  	 * memory
> >  	 */
> > -	if (altmap && start_pfn == altmap->base_pfn)
> > -		start_pfn += altmap->reserve;
> > +	if (pgmap && pgmap->get_memmap)
> > +		start_pfn = pgmap->get_memmap(pgmap, start_pfn);
> >  
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >  		/*
> > 
> > [...]
> 
> The only reason why I hadn't bothered with these bits is that I was
> actually trying to leave this generic since I thought I had seen other
> discussions about hotplug scenarios where memory may want to change
> where the vmemmap is initialized other than just the case of
> ZONE_DEVICE pages. So I was thinking at some point we may see altmap
> without the pgmap.

I wanted to abuse altmap to allocate struct pages from the physical
range to be added. In that case I would abstract the
allocation/initialization part of pgmap into a more generic type.
That is something that can be done trivially without affecting end
users of the hotplug API.
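
To illustrate the idea, here is a minimal user-space sketch of such a
callback-based abstraction. It is not kernel code: struct
memmap_provider, skip_reserved() and init_range() are hypothetical names
invented for this example, and only the get_memmap callback name is
borrowed from the quoted diff. The caller no longer needs to know
whether the reservation comes from an altmap or from something else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the abstraction discussed above. */
struct memmap_provider {
	unsigned long base_pfn;	/* first pfn of the range being added */
	unsigned long reserve;	/* pfns set aside at the start of the range */
	/* Returns the first pfn whose struct page should be initialized. */
	unsigned long (*get_memmap)(const struct memmap_provider *mp,
				    unsigned long start_pfn);
};

/* Same logic as the open-coded altmap check removed in the quoted diff. */
static unsigned long skip_reserved(const struct memmap_provider *mp,
				   unsigned long start_pfn)
{
	if (start_pfn == mp->base_pfn)
		return start_pfn + mp->reserve;
	return start_pfn;
}

/* Caller side: returns how many pfns would have been initialized. */
static unsigned long init_range(const struct memmap_provider *mp,
				unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn, count = 0;

	assert(start_pfn <= end_pfn);

	/* The provider is optional; without one the whole range is used. */
	if (mp && mp->get_memmap)
		start_pfn = mp->get_memmap(mp, start_pfn);

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		count++;	/* real code would initialize a struct page here */

	return count;
}
```

A different backing store for the memmap would only need a different
get_memmap implementation; init_range() stays untouched.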

[...]
> > Anyway we have gone into details while the primary problem here was that
> > the hotplug lock doesn't scale AFAIR. And my question was why cannot we
> > pull move_pfn_range_to_zone and what has to be done to achieve that.
> > That is a fundamental thing to address first. Then you can micro-optimize
> > on top.
> 
> Yes, the hotplug lock was part of the original issue. However that
> starts to drift into the area I believe Oscar was working on as a part
> of his patch set in encapsulating the move_pfn_range_to_zone and other
> calls that were contained in the hotplug lock into their own functions.

Well, I would really love to keep the external API as simple as
possible. That means that we need arch_add_memory/add_pages and
move_pfn_range_to_zone to associate pages with a zone. The hotplug lock
should preferably be hidden from callers of those two and ideally it
shouldn't be a global lock. We should be good with a range lock.
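
As a rough user-space illustration of why a range lock scales better
than a global one, the sketch below lets operations on disjoint pfn
ranges proceed concurrently and only rejects overlapping ones. It is an
assumption-laden toy, not a proposal for the kernel implementation:
struct range_lock, range_trylock() and range_unlock() are hypothetical
names, the table size is arbitrary, and a pthread mutex protects only
the small table of held ranges.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_HELD 16	/* arbitrary cap on concurrently held ranges */

struct pfn_range {
	unsigned long start, end;	/* half-open interval [start, end) */
	bool used;
};

struct range_lock {
	pthread_mutex_t guard;		/* protects the table below, nothing else */
	struct pfn_range held[MAX_HELD];
};

#define RANGE_LOCK_INIT { .guard = PTHREAD_MUTEX_INITIALIZER }

static bool overlaps(const struct pfn_range *r,
		     unsigned long start, unsigned long end)
{
	return r->used && start < r->end && r->start < end;
}

/* Take [start, end); fails on overlap with a held range or a full table. */
static bool range_trylock(struct range_lock *rl,
			  unsigned long start, unsigned long end)
{
	int i, free_slot = -1;
	bool ok = true;

	assert(start < end);

	pthread_mutex_lock(&rl->guard);
	for (i = 0; i < MAX_HELD; i++) {
		if (overlaps(&rl->held[i], start, end)) {
			ok = false;
			break;
		}
		if (!rl->held[i].used && free_slot < 0)
			free_slot = i;
	}
	if (ok && free_slot >= 0)
		rl->held[free_slot] = (struct pfn_range){ start, end, true };
	else
		ok = false;
	pthread_mutex_unlock(&rl->guard);

	return ok;
}

/* Release a range previously taken with exactly the same bounds. */
static void range_unlock(struct range_lock *rl,
			 unsigned long start, unsigned long end)
{
	int i;

	pthread_mutex_lock(&rl->guard);
	for (i = 0; i < MAX_HELD; i++) {
		struct pfn_range *r = &rl->held[i];

		if (r->used && r->start == start && r->end == end) {
			r->used = false;
			break;
		}
	}
	pthread_mutex_unlock(&rl->guard);
}
```

With this, two hotplug operations on [0, 100) and [100, 200) would both
succeed, while a third one on [50, 150) would have to wait or retry.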
 
> The patches Andrew pushed addressed the immediate issue so that now
> systems with nvdimm/DAX memory can at least initialize quickly enough
> that systemd doesn't refuse to mount the root file system due to a
> timeout.

This is about the first time you actually mention that. I have re-read
the cover letter and all changelogs of patches in this series. Unless I
have missed something there is nothing about real users hitting issues
out there. nvdimm is still considered a toy because there is no real HW
users can play with.

And hence my complaints about half-baked solutions rushed in just to fix
a performance regression. I can certainly understand that a pressing
problem might justify rushing things a bit but this should always be
carefully justified.

> The next patch set I have refactors things to reduce code and
> allow us to reuse some of the hotplug code for the deferred page init, 
> https://lore.kernel.org/lkml/20181017235043.17213.92459.stgit@localhost.localdomain/
> . After that I was planning to work on dealing with the PageReserved
> flag and trying to get that sorted out.
> 
> I was hoping to wait until after Dan's HMM patches and Oscar's changes
> had been sorted before I get into any further refactor of this specific
> code.

Yes, there is quite a lot going on here. I would really appreciate it if
we all sat down and actually tried to come up with something robust rather
than hacking here and there. I haven't yet seen your follow-up series
completely so maybe you are indeed heading in the correct direction.

-- 
Michal Hocko
SUSE Labs
