All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@techadventures.net>
To: Michal Hocko <mhocko@kernel.org>
Cc: pavel.tatashin@microsoft.com, linux-nvdimm@lists.01.org,
	dave.hansen@intel.com, linux-kernel@vger.kernel.org,
	mingo@kernel.org, linux-mm@kvack.org, jglisse@redhat.com,
	rppt@linux.vnet.ibm.com, kirill.shutemov@linux.intel.com,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	akpm@linux-foundation.org
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Date: Fri, 28 Sep 2018 10:12:24 +0200	[thread overview]
Message-ID: <20180928081224.GA25561@techadventures.net> (raw)
In-Reply-To: <20180927131329.GI6278@dhcp22.suse.cz>

On Thu, Sep 27, 2018 at 03:13:29PM +0200, Michal Hocko wrote:
> I would have to double check but is the hotplug lock really serializing
> access to the state initialized by init_currently_empty_zone? E.g.
> zone_start_pfn is a nice example of a state that is used outside of the
> lock. zone's free lists are similar. So do we really need the hoptlug
> lock? And more broadly, what does the hotplug lock is supposed to
> serialize in general. A proper documentation would surely help to answer
> these questions. There is way too much of "do not touch this code and
> just make my particular hack" mindset which made the whole memory
> hotplug a giant pile of mess. We really should start with some proper
> engineering here finally.

* Locking rules:
*
* zone_start_pfn and spanned_pages are protected by span_seqlock.
* It is a seqlock because it has to be read outside of zone->lock,
* and it is done in the main allocator path.  But, it is written
* quite infrequently.
*
* Write access to present_pages at runtime should be protected by
* mem_hotplug_begin/end(). Any reader who can't tolerant drift of
* present_pages should get_online_mems() to get a stable value.

IIUC, looks like zone_start_pfn should be envolved with
zone_span_writelock/zone_span_writeunlock, and since zone_start_pfn is changed
in init_currently_empty_zone, I guess that the whole function should be within
that lock.

So, a blind shot, but could we do something like the following?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e1f816821..49f87252f1b1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -764,14 +764,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
        int nid = pgdat->node_id;
        unsigned long flags;
 
-       if (zone_is_empty(zone))
-               init_currently_empty_zone(zone, start_pfn, nr_pages);
-
        clear_zone_contiguous(zone);
 
        /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
        pgdat_resize_lock(pgdat, &flags);
        zone_span_writelock(zone);
+       if (zone_is_empty(zone))
+               init_currently_empty_zone(zone, start_pfn, nr_pages);
        resize_zone_range(zone, start_pfn, nr_pages);
        zone_span_writeunlock(zone);
        resize_pgdat_range(pgdat, start_pfn, nr_pages);

Then, we could take move_pfn_range_to_zone out of the hotplug lock.

Although I am not sure about leaving memmap_init_zone unprotected.
For the normal memory, that is not a problem since the memblock's lock
protects us from touching the same pages at the same time in online/offline_pages,
but for HMM/devm the story is different.

I am totally unaware of HMM/devm, so I am not sure if its protected somehow.
e.g: what happens if devm_memremap_pages and devm_memremap_pages_release are running
at the same time for the same memory-range (with the assumption that the hotplug-lock
does not protect move_pfn_range_to_zone anymore).

-- 
Oscar Salvador
SUSE L3
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Oscar Salvador <osalvador@techadventures.net>
To: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
	pavel.tatashin@microsoft.com, dave.jiang@intel.com,
	dave.hansen@intel.com, jglisse@redhat.com,
	rppt@linux.vnet.ibm.com, dan.j.williams@intel.com,
	logang@deltatee.com, mingo@kernel.org,
	kirill.shutemov@linux.intel.com
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Date: Fri, 28 Sep 2018 10:12:24 +0200	[thread overview]
Message-ID: <20180928081224.GA25561@techadventures.net> (raw)
In-Reply-To: <20180927131329.GI6278@dhcp22.suse.cz>

On Thu, Sep 27, 2018 at 03:13:29PM +0200, Michal Hocko wrote:
> I would have to double check but is the hotplug lock really serializing
> access to the state initialized by init_currently_empty_zone? E.g.
> zone_start_pfn is a nice example of a state that is used outside of the
> lock. zone's free lists are similar. So do we really need the hoptlug
> lock? And more broadly, what does the hotplug lock is supposed to
> serialize in general. A proper documentation would surely help to answer
> these questions. There is way too much of "do not touch this code and
> just make my particular hack" mindset which made the whole memory
> hotplug a giant pile of mess. We really should start with some proper
> engineering here finally.

* Locking rules:
*
* zone_start_pfn and spanned_pages are protected by span_seqlock.
* It is a seqlock because it has to be read outside of zone->lock,
* and it is done in the main allocator path.  But, it is written
* quite infrequently.
*
* Write access to present_pages at runtime should be protected by
* mem_hotplug_begin/end(). Any reader who can't tolerant drift of
* present_pages should get_online_mems() to get a stable value.

IIUC, looks like zone_start_pfn should be envolved with
zone_span_writelock/zone_span_writeunlock, and since zone_start_pfn is changed
in init_currently_empty_zone, I guess that the whole function should be within
that lock.

So, a blind shot, but could we do something like the following?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e1f816821..49f87252f1b1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -764,14 +764,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
        int nid = pgdat->node_id;
        unsigned long flags;
 
-       if (zone_is_empty(zone))
-               init_currently_empty_zone(zone, start_pfn, nr_pages);
-
        clear_zone_contiguous(zone);
 
        /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
        pgdat_resize_lock(pgdat, &flags);
        zone_span_writelock(zone);
+       if (zone_is_empty(zone))
+               init_currently_empty_zone(zone, start_pfn, nr_pages);
        resize_zone_range(zone, start_pfn, nr_pages);
        zone_span_writeunlock(zone);
        resize_pgdat_range(pgdat, start_pfn, nr_pages);

Then, we could take move_pfn_range_to_zone out of the hotplug lock.

Although I am not sure about leaving memmap_init_zone unprotected.
For the normal memory, that is not a problem since the memblock's lock
protects us from touching the same pages at the same time in online/offline_pages,
but for HMM/devm the story is different.

I am totally unaware of HMM/devm, so I am not sure if its protected somehow.
e.g: what happens if devm_memremap_pages and devm_memremap_pages_release are running
at the same time for the same memory-range (with the assumption that the hotplug-lock
does not protect move_pfn_range_to_zone anymore).

-- 
Oscar Salvador
SUSE L3

  parent reply	other threads:[~2018-09-28  8:12 UTC|newest]

Thread overview: 144+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-25 20:18 [PATCH v5 0/4] Address issues slowing persistent memory initialization Alexander Duyck
2018-09-25 20:18 ` Alexander Duyck
2018-09-25 20:19 ` [PATCH v5 1/4] mm: Remove now defunct NO_BOOTMEM from depends list for deferred init Alexander Duyck
2018-09-25 20:19   ` Alexander Duyck
2018-09-25 21:05   ` Mike Rapoport
2018-09-25 21:05     ` Mike Rapoport
2018-09-25 20:20 ` [PATCH v5 2/4] mm: Provide kernel parameter to allow disabling page init poisoning Alexander Duyck
2018-09-25 20:20   ` Alexander Duyck
2018-09-25 20:26   ` Dave Hansen
2018-09-25 20:26     ` Dave Hansen
2018-09-25 20:38     ` Alexander Duyck
2018-09-25 20:38       ` Alexander Duyck
2018-09-25 22:14       ` Dave Hansen
2018-09-25 22:14         ` Dave Hansen
2018-09-25 22:14         ` Dave Hansen
2018-09-25 22:27         ` Alexander Duyck
2018-09-25 22:27           ` Alexander Duyck
2018-09-25 22:27           ` Alexander Duyck
2018-09-26  7:38   ` Michal Hocko
2018-09-26  7:38     ` Michal Hocko
2018-09-26 15:24     ` Alexander Duyck
2018-09-26 15:39       ` Michal Hocko
2018-09-26 15:39         ` Michal Hocko
2018-09-26 15:41       ` Dave Hansen
2018-09-26 15:41         ` Dave Hansen
2018-09-26 16:18         ` Alexander Duyck
2018-09-26 15:36     ` Dave Hansen
2018-09-26 22:36       ` Andrew Morton
2018-09-26 22:36         ` Andrew Morton
2018-09-25 20:20 ` [PATCH v5 3/4] mm: Create non-atomic version of SetPageReserved for init use Alexander Duyck
2018-09-25 20:20   ` Alexander Duyck
2018-09-25 20:21 ` [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap Alexander Duyck
2018-09-25 20:21   ` Alexander Duyck
2018-09-26  7:55   ` Michal Hocko
2018-09-26 18:25     ` Alexander Duyck
2018-09-26 18:25       ` Alexander Duyck
2018-09-26 18:52       ` Dan Williams
2018-09-26 18:52         ` Dan Williams
2018-09-27 11:20         ` Michal Hocko
2018-09-27 11:20           ` Michal Hocko
2018-09-27 11:09       ` Michal Hocko
2018-09-27 11:09         ` Michal Hocko
2018-09-27 12:25         ` Oscar Salvador
2018-09-27 13:13           ` Michal Hocko
2018-09-27 14:50             ` Oscar Salvador
2018-09-27 14:50               ` Oscar Salvador
2018-09-27 14:50               ` Oscar Salvador
2018-09-27 15:41               ` David Hildenbrand
2018-09-27 15:41                 ` David Hildenbrand
2018-09-28  8:12             ` Oscar Salvador [this message]
2018-09-28  8:12               ` Oscar Salvador
2018-09-28  8:44               ` Oscar Salvador
2018-09-28  8:44                 ` Oscar Salvador
2018-09-28 15:50                 ` Dan Williams
2018-09-28 15:50                   ` Dan Williams
2018-09-27 12:32       ` Oscar Salvador
2018-10-08 21:01   ` Dan Williams
2018-10-08 21:01     ` Dan Williams
2018-10-08 21:38     ` Alexander Duyck
2018-10-08 21:38       ` Alexander Duyck
2018-10-08 22:00       ` Dan Williams
2018-10-08 22:00         ` Dan Williams
2018-10-08 22:00         ` Dan Williams
2018-10-08 22:07         ` Alexander Duyck
2018-10-08 22:07           ` Alexander Duyck
2018-10-08 22:36         ` Alexander Duyck
2018-10-08 22:36           ` Alexander Duyck
2018-10-08 22:59           ` Dan Williams
2018-10-08 23:34     ` [mm PATCH] memremap: Fix reference count for pgmap in devm_memremap_pages Alexander Duyck
2018-10-08 23:34       ` Alexander Duyck
2018-10-09  0:20       ` Dan Williams
2018-10-09  0:20         ` Dan Williams
2018-10-09 17:00   ` [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap Yi Zhang
2018-10-09 17:00     ` Yi Zhang
2018-10-09 18:04     ` Dan Williams
2018-10-09 18:04       ` Dan Williams
2018-10-09 20:26       ` Alexander Duyck
2018-10-09 20:26         ` Alexander Duyck
2018-10-09 21:19         ` Dan Williams
2018-10-09 21:19           ` Dan Williams
2018-10-10 12:52           ` Yi Zhang
2018-10-10 12:52             ` Yi Zhang
2018-10-10 15:27             ` Alexander Duyck
2018-10-10 15:27               ` Alexander Duyck
2018-10-11  8:17               ` Yi Zhang
2018-10-11  8:17                 ` Yi Zhang
2018-10-10  9:58         ` Michal Hocko
2018-10-10 16:39           ` Alexander Duyck
2018-10-10 16:39             ` Alexander Duyck
2018-10-10 17:24             ` Michal Hocko
2018-10-10 17:24               ` Michal Hocko
2018-10-10 17:39               ` Alexander Duyck
2018-10-10 17:39                 ` Alexander Duyck
2018-10-10 17:53                 ` Michal Hocko
2018-10-10 17:53                   ` Michal Hocko
2018-10-10 18:13                   ` Alexander Duyck
2018-10-10 18:13                     ` Alexander Duyck
2018-10-10 18:52                 ` Michal Hocko
2018-10-10 18:52                   ` Michal Hocko
2018-10-11  8:55                   ` Michal Hocko
2018-10-11  8:55                     ` Michal Hocko
2018-10-11 17:38                     ` Alexander Duyck
2018-10-11 18:22                       ` Dan Williams
2018-10-11 18:22                         ` Dan Williams
2018-10-17  7:52                       ` Michal Hocko
2018-10-17  7:52                         ` Michal Hocko
2018-10-17 15:02                         ` Alexander Duyck
2018-10-17 15:02                           ` Alexander Duyck
2018-10-29 14:12                           ` Michal Hocko
2018-10-29 14:12                             ` Michal Hocko
2018-10-29 15:59                             ` Alexander Duyck
2018-10-29 15:59                               ` Alexander Duyck
2018-10-29 15:59                               ` Alexander Duyck
2018-10-29 16:35                               ` Michal Hocko
2018-10-29 16:35                                 ` Michal Hocko
2018-10-29 17:01                                 ` Alexander Duyck
2018-10-29 17:24                                   ` Michal Hocko
2018-10-29 17:24                                     ` Michal Hocko
2018-10-29 17:34                                     ` Dan Williams
2018-10-29 17:34                                       ` Dan Williams
2018-10-29 17:45                                       ` Michal Hocko
2018-10-29 17:45                                         ` Michal Hocko
2018-10-29 17:42                                     ` Alexander Duyck
2018-10-29 17:42                                       ` Alexander Duyck
2018-10-29 18:18                                       ` Michal Hocko
2018-10-29 18:18                                         ` Michal Hocko
2018-10-29 19:59                                         ` Alexander Duyck
2018-10-29 19:59                                           ` Alexander Duyck
2018-10-30  6:29                                           ` Michal Hocko
2018-10-30  6:29                                             ` Michal Hocko
2018-10-30  6:55                                             ` Dan Williams
2018-10-30  8:17                                               ` Michal Hocko
2018-10-30  8:17                                                 ` Michal Hocko
2018-10-30 15:57                                                 ` Dan Williams
2018-10-30  8:05                                           ` Oscar Salvador
2018-10-29 15:49                           ` Dan Williams
2018-10-29 15:49                             ` Dan Williams
2018-10-29 15:56                             ` Michal Hocko
2018-10-10 18:18               ` Dan Williams
2018-10-10 18:18                 ` Dan Williams
2018-10-11  8:39                 ` Yi Zhang
2018-10-11  8:39                   ` Yi Zhang
2018-10-11 15:38                   ` Alexander Duyck
2018-10-11 15:38                     ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180928081224.GA25561@techadventures.net \
    --to=osalvador@techadventures.net \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dave.hansen@intel.com \
    --cc=jglisse@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=pavel.tatashin@microsoft.com \
    --cc=rppt@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.