From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Michal Hocko" <mhocko@suse.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	stable <stable@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Vlastimil Babka" <vbabka@suse.cz>
Subject: Re: [PATCH v5 00/10] mm: Sub-section memory hotplug support
Date: Thu, 28 Mar 2019 22:54:34 +0100	[thread overview]
Message-ID: <b76b3a91-a0b5-460d-df5c-9358e6219915@redhat.com> (raw)
In-Reply-To: <CAPcyv4ivBagzsZ1fCDb2Cr3scz+R8ZVgyie5c=LWNd6QZuw36g@mail.gmail.com>

>>>> The reason I am asking is that I wonder how this would interact with the
>>>> memory block device infrastructure and hotplugging of system RAM via
>>>> add_memory()/add_memory_resource(). I *assume* you are not changing the
>>>> add_memory() interface, so that one still only works with whole sections
>>>> (or rather, memory_block_size_bytes()) - see check_hotplug_memory_range().
>>>
>>> Like you found below, the implementation enforces that add_memory_*()
>>> interfaces maintain section alignment for @start and @size.
>>>
>>>> In general, I am not a fan of mixing and matching system RAM and
>>>> persistent memory within the same section.
>>>
>>> You have no choice. The platform may decide to map PMEM and System RAM
>>> in the same section because the Linux section is too large compared to
>>> the typical mapping granularity of memory controllers.
>>
>> I might be very wrong here, but do we actually care about something like
>> 64MB getting lost in the cracks? I mean, if it simplifies core MM, let go
>> of the couple of MB of system RAM and handle the PMEM part only. Treat
>> the system RAM parts like the memory holes we already have in ordinary
>> sections (well, there we simply set the relevant struct pages to
>> PG_reserved). Of course, if we have hundreds of unaligned devices, things
>> will start to add up ... but I assume this is not the case?
> 
> That's precisely what we do today, and it has become untenable as the
> collision scenarios pile up. This thread [1] is worth a read if you
> care about some of the gory details of why I'm back to pushing for
> sub-section support, but most of it has already been summarized in the
> current discussion on this thread.

Thanks, exactly what I am interested in, will have a look!
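
(For context, the alignment enforcement Dan refers to boils down to a
simple check that gates add_memory()/add_memory_resource(). A simplified
sketch of what check_hotplug_memory_range() in mm/memory_hotplug.c does;
not the verbatim kernel code, and expressed here against
memory_block_size_bytes() rather than raw sections:

static int check_hotplug_memory_range(u64 start, u64 size)
{
	unsigned long block_sz = memory_block_size_bytes();

	/* Reject ranges that are not whole, aligned memory blocks. */
	if (!size || !IS_ALIGNED(start, block_sz) ||
	    !IS_ALIGNED(size, block_sz))
		return -EINVAL;

	return 0;
}

Anything smaller or unaligned never makes it past this check, which is
why the sub-section capability has to live below this interface, in
arch_add_memory().)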

>>>
>>> I don't see a strong reason why not, as long as it does not regress
>>> existing use cases. It might need to be an opt-in for new tooling that
>>> is aware of finer granularity hotplug. That said, I have no pressing
>>> need to go there and just care about the arch_add_memory() capability
>>> for now.
>>
>> Especially onlining/offlining of memory might end up very ugly. And that
>> goes hand in hand with memory block devices: they are either online or
>> offline, not something in between. (I went down that path and Michal
>> correctly told me why it is not a good idea.)
> 
> Thread reference?

Sure:

https://marc.info/?l=linux-mm&m=152362539714432&w=2

Onlining/offlining sub-sections was what I tried (while still
adding/removing whole sections). But with the memory block device model
(online/offline memory blocks), this was in some sense dirty, although it
worked.
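
(To illustrate the "either online or offline" point: a memory block
device spans 1..X sections but carries exactly one state. A simplified
sketch of the structure, loosely based on include/linux/memory.h, not the
verbatim definition:

struct memory_block {
	unsigned long start_section_nr;	/* first section in the block */
	unsigned long end_section_nr;	/* last section in the block */
	unsigned long state;		/* MEM_ONLINE or MEM_OFFLINE */
	int section_count;
	struct device dev;	/* /sys/devices/system/memory/memoryN */
};

There is simply no place to record "half of this block is online", which
is where my sub-section onlining attempt got dirty.)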

> 
>> I was recently trying to teach memory block devices who their owner is /
>> of which type they are. Right now I am looking into the option of using
>> drivers. Memory block devices that could belong to different drivers at
>> the same time are, well ... totally broken.
> 
> Sub-section support is aimed at a similar case, where different
> portions of a 128MB span need to be handed out to devices / drivers
> with independent lifetimes.

Right, but we are stuck here with memory block devices having a certain,
rather coarse granularity. We already went from 128MB to 2048MB because
"there were too many". Modeling this at the 2MB level (i.e., sub-sections)
is out of the question. And as I said, multiple users for one memory block
device would be very ugly.

What would be interesting is having memory block devices of variable
size (64MB, 1024MB, 6GB, ...), maybe even representing the unit in which
e.g. add_memory() was performed. But that would also have downsides when
it comes to changing the zone of memory blocks: memory would be
onlined/offlined in much bigger chunks.

E.g. one DIMM = one memory block device.
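
(To put rough numbers on the "too many" point, simple arithmetic for a
1TB machine:

	1TB /    2MB (sub-section)   -> 524288 memory block devices
	1TB /  128MB (one section)   ->   8192 memory block devices
	1TB / 2048MB (current large) ->    512 memory block devices

Half a million sysfs devices is clearly not going to fly, hence "out of
the question" for 2MB-sized blocks.)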

> 
>> I assume it would still be a special case, though; but conceptually,
>> as far as the interface is concerned, it would be allowed.
>>
>> Memory block devices (and therefore 1..X sections) should have one owner
>> only. Anything else just does not fit.
> 
> Yes, but I would say the problem there is that the
> memory-block-devices interface design is showing its age and is being
> pressured by how systems want to deploy and use memory today.

Maybe. I guess the main "issue" started to pop up when different things
(RAM vs. PMEM) started being mapped into memory side by side. But it is
ABI, and basic kdump would completely break if it were removed; so would
memory unplug and much more. It is a crucial part of how system RAM is
handled today and might not be at all easy to replace.

-- 

Thanks,

David / dhildenb