All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: alexander.h.duyck@linux.intel.com
Cc: "Pasha Tatashin" <pavel.tatashin@microsoft.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	rppt@linux.vnet.ibm.com,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Date: Wed, 26 Sep 2018 11:52:56 -0700	[thread overview]
Message-ID: <CAPcyv4iVnodai0bB74yeSCD2H+hoLsZYUk4sR9jV0pPAE+Zorw@mail.gmail.com> (raw)
In-Reply-To: <6f87a5d7-05e2-00f4-8568-bb3521869cea@linux.intel.com>

On Wed, Sep 26, 2018 at 11:25 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
>
>
> On 9/26/2018 12:55 AM, Michal Hocko wrote:
> > On Tue 25-09-18 13:21:24, Alexander Duyck wrote:
> >> The ZONE_DEVICE pages were being initialized in two locations. One was with
> >> the memory_hotplug lock held and another was outside of that lock. The
> >> problem with this is that it was nearly doubling the memory initialization
> >> time. Instead of doing this twice, once while holding a global lock and
> >> once without, I am opting to defer the initialization to the one outside of
> >> the lock. This allows us to avoid serializing the overhead for memory init
> >> and we can instead focus on per-node init times.
> >>
> >> One issue I encountered is that devm_memremap_pages and
> >> hmm_devmmem_pages_create were initializing only the pgmap field the same
> >> way. One wasn't initializing hmm_data, and the other was initializing it to
> >> a poison value. Since this is something that is exposed to the driver in
> >> the case of hmm I am opting for a third option and just initializing
> >> hmm_data to 0 since this is going to be exposed to unknown third party
> >> drivers.
> >
> > Why cannot you pull move_pfn_range_to_zone out of the hotplug lock? In
> > other words why are you making zone device even more special in the
> > generic hotplug code when it already has its own means to initialize the
> > pfn range by calling move_pfn_range_to_zone. Not to mention the code
> > duplication.
>
> So there were a few things I wasn't sure we could pull outside of the
> hotplug lock. One specific example is the bits related to resizing the
> pgdat and zone. I wanted to avoid pulling those bits outside of the
> hotplug lock.
>
> The other bit that I left inside the hot-plug lock with this approach
> was the initialization of the pages that contain the vmemmap.
>
> > That being said I really dislike this patch.
>
> In my mind this was a patch that "killed two birds with one stone". I
> had two issues to address, the first one being the fact that we were
> performing the memmap_init_zone while holding the hotplug lock, and the
> other being the loop that was going through and initializing pgmap in
> the hmm and memremap calls essentially added another 20 seconds
> (measured for 3TB of memory per node) to the init time. With this patch
> I was able to cut my init time per node by that 20 seconds, and then
> made it so that we could scale as we added nodes as they could run in
> parallel.

Yeah, at the very least there is no reason for devm_memremap_pages()
to do another loop through all pages, the core should handle this, but
cleaning up the scope of the hotplug lock is needed.

> With that said I am open to suggestions if you still feel like I need to
> follow this up with some additional work. I just want to avoid
> introducing any regressions in regards to functionality or performance.

Could we push the hotplug lock deeper to the places that actually need
it? What I found with my initial investigation is that we don't even
need the hotplug lock for the vmemmap initialization with this patch
[1].

Alternatively it seems the hotplug lock wants to synchronize changes
to the zone and the page init work. If the hotplug lock was an rwsem
the zone changes would be a write lock, but the init work could be
done as a read lock to allow parallelism. I.e. still provide a sync
point to be able to assert that no hotplug work is in-flight will
holding the write lock, but otherwise allow threads that are touching
independent parts of the memmap to run at the same time.

[1]: https://patchwork.kernel.org/patch/10527229/ just focus on the
mm/sparse-vmemmap.c changes at the end.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: alexander.h.duyck@linux.intel.com
Cc: "Michal Hocko" <mhocko@kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Pasha Tatashin" <pavel.tatashin@microsoft.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	rppt@linux.vnet.ibm.com, "Logan Gunthorpe" <logang@deltatee.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Date: Wed, 26 Sep 2018 11:52:56 -0700	[thread overview]
Message-ID: <CAPcyv4iVnodai0bB74yeSCD2H+hoLsZYUk4sR9jV0pPAE+Zorw@mail.gmail.com> (raw)
In-Reply-To: <6f87a5d7-05e2-00f4-8568-bb3521869cea@linux.intel.com>

On Wed, Sep 26, 2018 at 11:25 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
>
>
> On 9/26/2018 12:55 AM, Michal Hocko wrote:
> > On Tue 25-09-18 13:21:24, Alexander Duyck wrote:
> >> The ZONE_DEVICE pages were being initialized in two locations. One was with
> >> the memory_hotplug lock held and another was outside of that lock. The
> >> problem with this is that it was nearly doubling the memory initialization
> >> time. Instead of doing this twice, once while holding a global lock and
> >> once without, I am opting to defer the initialization to the one outside of
> >> the lock. This allows us to avoid serializing the overhead for memory init
> >> and we can instead focus on per-node init times.
> >>
> >> One issue I encountered is that devm_memremap_pages and
> >> hmm_devmmem_pages_create were initializing only the pgmap field the same
> >> way. One wasn't initializing hmm_data, and the other was initializing it to
> >> a poison value. Since this is something that is exposed to the driver in
> >> the case of hmm I am opting for a third option and just initializing
> >> hmm_data to 0 since this is going to be exposed to unknown third party
> >> drivers.
> >
> > Why cannot you pull move_pfn_range_to_zone out of the hotplug lock? In
> > other words why are you making zone device even more special in the
> > generic hotplug code when it already has its own means to initialize the
> > pfn range by calling move_pfn_range_to_zone. Not to mention the code
> > duplication.
>
> So there were a few things I wasn't sure we could pull outside of the
> hotplug lock. One specific example is the bits related to resizing the
> pgdat and zone. I wanted to avoid pulling those bits outside of the
> hotplug lock.
>
> The other bit that I left inside the hot-plug lock with this approach
> was the initialization of the pages that contain the vmemmap.
>
> > That being said I really dislike this patch.
>
> In my mind this was a patch that "killed two birds with one stone". I
> had two issues to address, the first one being the fact that we were
> performing the memmap_init_zone while holding the hotplug lock, and the
> other being the loop that was going through and initializing pgmap in
> the hmm and memremap calls essentially added another 20 seconds
> (measured for 3TB of memory per node) to the init time. With this patch
> I was able to cut my init time per node by that 20 seconds, and then
> made it so that we could scale as we added nodes as they could run in
> parallel.

Yeah, at the very least there is no reason for devm_memremap_pages()
to do another loop through all pages, the core should handle this, but
cleaning up the scope of the hotplug lock is needed.

> With that said I am open to suggestions if you still feel like I need to
> follow this up with some additional work. I just want to avoid
> introducing any regressions in regards to functionality or performance.

Could we push the hotplug lock deeper to the places that actually need
it? What I found with my initial investigation is that we don't even
need the hotplug lock for the vmemmap initialization with this patch
[1].

Alternatively it seems the hotplug lock wants to synchronize changes
to the zone and the page init work. If the hotplug lock was an rwsem
the zone changes would be a write lock, but the init work could be
done as a read lock to allow parallelism. I.e. still provide a sync
point to be able to assert that no hotplug work is in-flight will
holding the write lock, but otherwise allow threads that are touching
independent parts of the memmap to run at the same time.

[1]: https://patchwork.kernel.org/patch/10527229/ just focus on the
mm/sparse-vmemmap.c changes at the end.

  reply	other threads:[~2018-09-26 18:53 UTC|newest]

Thread overview: 144+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-25 20:18 [PATCH v5 0/4] Address issues slowing persistent memory initialization Alexander Duyck
2018-09-25 20:18 ` Alexander Duyck
2018-09-25 20:19 ` [PATCH v5 1/4] mm: Remove now defunct NO_BOOTMEM from depends list for deferred init Alexander Duyck
2018-09-25 20:19   ` Alexander Duyck
2018-09-25 21:05   ` Mike Rapoport
2018-09-25 21:05     ` Mike Rapoport
2018-09-25 20:20 ` [PATCH v5 2/4] mm: Provide kernel parameter to allow disabling page init poisoning Alexander Duyck
2018-09-25 20:20   ` Alexander Duyck
2018-09-25 20:26   ` Dave Hansen
2018-09-25 20:26     ` Dave Hansen
2018-09-25 20:38     ` Alexander Duyck
2018-09-25 20:38       ` Alexander Duyck
2018-09-25 22:14       ` Dave Hansen
2018-09-25 22:14         ` Dave Hansen
2018-09-25 22:14         ` Dave Hansen
2018-09-25 22:27         ` Alexander Duyck
2018-09-25 22:27           ` Alexander Duyck
2018-09-25 22:27           ` Alexander Duyck
2018-09-26  7:38   ` Michal Hocko
2018-09-26  7:38     ` Michal Hocko
2018-09-26 15:24     ` Alexander Duyck
2018-09-26 15:39       ` Michal Hocko
2018-09-26 15:39         ` Michal Hocko
2018-09-26 15:41       ` Dave Hansen
2018-09-26 15:41         ` Dave Hansen
2018-09-26 16:18         ` Alexander Duyck
2018-09-26 15:36     ` Dave Hansen
2018-09-26 22:36       ` Andrew Morton
2018-09-26 22:36         ` Andrew Morton
2018-09-25 20:20 ` [PATCH v5 3/4] mm: Create non-atomic version of SetPageReserved for init use Alexander Duyck
2018-09-25 20:20   ` Alexander Duyck
2018-09-25 20:21 ` [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap Alexander Duyck
2018-09-25 20:21   ` Alexander Duyck
2018-09-26  7:55   ` Michal Hocko
2018-09-26 18:25     ` Alexander Duyck
2018-09-26 18:25       ` Alexander Duyck
2018-09-26 18:52       ` Dan Williams [this message]
2018-09-26 18:52         ` Dan Williams
2018-09-27 11:20         ` Michal Hocko
2018-09-27 11:20           ` Michal Hocko
2018-09-27 11:09       ` Michal Hocko
2018-09-27 11:09         ` Michal Hocko
2018-09-27 12:25         ` Oscar Salvador
2018-09-27 13:13           ` Michal Hocko
2018-09-27 14:50             ` Oscar Salvador
2018-09-27 14:50               ` Oscar Salvador
2018-09-27 14:50               ` Oscar Salvador
2018-09-27 15:41               ` David Hildenbrand
2018-09-27 15:41                 ` David Hildenbrand
2018-09-28  8:12             ` Oscar Salvador
2018-09-28  8:12               ` Oscar Salvador
2018-09-28  8:44               ` Oscar Salvador
2018-09-28  8:44                 ` Oscar Salvador
2018-09-28 15:50                 ` Dan Williams
2018-09-28 15:50                   ` Dan Williams
2018-09-27 12:32       ` Oscar Salvador
2018-10-08 21:01   ` Dan Williams
2018-10-08 21:01     ` Dan Williams
2018-10-08 21:38     ` Alexander Duyck
2018-10-08 21:38       ` Alexander Duyck
2018-10-08 22:00       ` Dan Williams
2018-10-08 22:00         ` Dan Williams
2018-10-08 22:00         ` Dan Williams
2018-10-08 22:07         ` Alexander Duyck
2018-10-08 22:07           ` Alexander Duyck
2018-10-08 22:36         ` Alexander Duyck
2018-10-08 22:36           ` Alexander Duyck
2018-10-08 22:59           ` Dan Williams
2018-10-08 23:34     ` [mm PATCH] memremap: Fix reference count for pgmap in devm_memremap_pages Alexander Duyck
2018-10-08 23:34       ` Alexander Duyck
2018-10-09  0:20       ` Dan Williams
2018-10-09  0:20         ` Dan Williams
2018-10-09 17:00   ` [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap Yi Zhang
2018-10-09 17:00     ` Yi Zhang
2018-10-09 18:04     ` Dan Williams
2018-10-09 18:04       ` Dan Williams
2018-10-09 20:26       ` Alexander Duyck
2018-10-09 20:26         ` Alexander Duyck
2018-10-09 21:19         ` Dan Williams
2018-10-09 21:19           ` Dan Williams
2018-10-10 12:52           ` Yi Zhang
2018-10-10 12:52             ` Yi Zhang
2018-10-10 15:27             ` Alexander Duyck
2018-10-10 15:27               ` Alexander Duyck
2018-10-11  8:17               ` Yi Zhang
2018-10-11  8:17                 ` Yi Zhang
2018-10-10  9:58         ` Michal Hocko
2018-10-10 16:39           ` Alexander Duyck
2018-10-10 16:39             ` Alexander Duyck
2018-10-10 17:24             ` Michal Hocko
2018-10-10 17:24               ` Michal Hocko
2018-10-10 17:39               ` Alexander Duyck
2018-10-10 17:39                 ` Alexander Duyck
2018-10-10 17:53                 ` Michal Hocko
2018-10-10 17:53                   ` Michal Hocko
2018-10-10 18:13                   ` Alexander Duyck
2018-10-10 18:13                     ` Alexander Duyck
2018-10-10 18:52                 ` Michal Hocko
2018-10-10 18:52                   ` Michal Hocko
2018-10-11  8:55                   ` Michal Hocko
2018-10-11  8:55                     ` Michal Hocko
2018-10-11 17:38                     ` Alexander Duyck
2018-10-11 18:22                       ` Dan Williams
2018-10-11 18:22                         ` Dan Williams
2018-10-17  7:52                       ` Michal Hocko
2018-10-17  7:52                         ` Michal Hocko
2018-10-17 15:02                         ` Alexander Duyck
2018-10-17 15:02                           ` Alexander Duyck
2018-10-29 14:12                           ` Michal Hocko
2018-10-29 14:12                             ` Michal Hocko
2018-10-29 15:59                             ` Alexander Duyck
2018-10-29 15:59                               ` Alexander Duyck
2018-10-29 15:59                               ` Alexander Duyck
2018-10-29 16:35                               ` Michal Hocko
2018-10-29 16:35                                 ` Michal Hocko
2018-10-29 17:01                                 ` Alexander Duyck
2018-10-29 17:24                                   ` Michal Hocko
2018-10-29 17:24                                     ` Michal Hocko
2018-10-29 17:34                                     ` Dan Williams
2018-10-29 17:34                                       ` Dan Williams
2018-10-29 17:45                                       ` Michal Hocko
2018-10-29 17:45                                         ` Michal Hocko
2018-10-29 17:42                                     ` Alexander Duyck
2018-10-29 17:42                                       ` Alexander Duyck
2018-10-29 18:18                                       ` Michal Hocko
2018-10-29 18:18                                         ` Michal Hocko
2018-10-29 19:59                                         ` Alexander Duyck
2018-10-29 19:59                                           ` Alexander Duyck
2018-10-30  6:29                                           ` Michal Hocko
2018-10-30  6:29                                             ` Michal Hocko
2018-10-30  6:55                                             ` Dan Williams
2018-10-30  8:17                                               ` Michal Hocko
2018-10-30  8:17                                                 ` Michal Hocko
2018-10-30 15:57                                                 ` Dan Williams
2018-10-30  8:05                                           ` Oscar Salvador
2018-10-29 15:49                           ` Dan Williams
2018-10-29 15:49                             ` Dan Williams
2018-10-29 15:56                             ` Michal Hocko
2018-10-10 18:18               ` Dan Williams
2018-10-10 18:18                 ` Dan Williams
2018-10-11  8:39                 ` Yi Zhang
2018-10-11  8:39                   ` Yi Zhang
2018-10-11 15:38                   ` Alexander Duyck
2018-10-11 15:38                     ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4iVnodai0bB74yeSCD2H+hoLsZYUk4sR9jV0pPAE+Zorw@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dave.hansen@intel.com \
    --cc=jglisse@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=pavel.tatashin@microsoft.com \
    --cc=rppt@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.