From: Michal Hocko <mhocko@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v1] drivers/base/memory.c: Don't store end_section_nr in memory blocks
Date: Thu, 1 Aug 2019 10:27:41 +0200
Message-ID: <20190801082741.GK11627@dhcp22.suse.cz>
In-Reply-To: <f8767e9a-034d-dca6-05e6-dc6bbcb4d005@redhat.com>

On Thu 01-08-19 09:00:45, David Hildenbrand wrote:
> On 01.08.19 08:13, Michal Hocko wrote:
> > On Wed 31-07-19 16:43:58, David Hildenbrand wrote:
> >> On 31.07.19 16:37, Michal Hocko wrote:
> >>> On Wed 31-07-19 16:21:46, David Hildenbrand wrote:
> >>> [...]
> >>>>> Thinking about it some more, I believe that we can reasonably provide
> >>>>> both APIs, controllable by a command line parameter, for backwards
> >>>>> compatibility. It is the hotplug code that controls the sysfs APIs; e.g.
> >>>>> it could create one sysfs entry per add_memory_resource for the new semantic.
> >>>>
> >>>> Yeah, but the real question is: who needs it? I can only think of
> >>>> some DIMM scenarios (some, not all). I would be interested in more use
> >>>> cases. Of course, to provide and maintain two APIs we need a good reason.
> >>>
> >>> Well, my 3TB machine with 7 movable nodes could really do with fewer
> >>> than
> >>> $ find /sys/devices/system/memory -name "memory*" | wc -l
> >>> 1729
> >>
> >> The question is whether it would be sufficient to increase the memory
> >> block size even further for these kinds of systems (e.g., via a boot
> >> parameter - I think we have that on UV systems) instead of having blocks
> >> of different sizes. Say, 128GB blocks, because you're not going to
> >> hotplug 128MB DIMMs into such a system - at least that's my guess ;)
> > 
> > The system has
> > [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x10000000000-0x17fffffffff]
> > [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x80000000000-0x87fffffffff]
> > [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x90000000000-0x97fffffffff]
> > [    0.000000] ACPI: SRAT: Node 4 PXM 4 [mem 0x100000000000-0x107fffffffff]
> > [    0.000000] ACPI: SRAT: Node 5 PXM 5 [mem 0x110000000000-0x117fffffffff]
> > [    0.000000] ACPI: SRAT: Node 6 PXM 6 [mem 0x180000000000-0x183fffffffff]
> > [    0.000000] ACPI: SRAT: Node 7 PXM 7 [mem 0x190000000000-0x191fffffffff]
> > 
> > hotpluggable memory. I would love to have just those 7 memory blocks to
> > work with. Any finer-grained split is not really helping, as the platform
> > will not be able to hot-remove anything smaller anyway.
> > 
> 
> So the smallest granularity in your system is indeed 128GB (btw, nice
> system, I wish I had something like that), and the biggest one 512GB.
> 
> Using a memory block size of 128GB on a 3TB system would mean 24 memory
> blocks - which is tolerable IMHO. In particular, performance-wise there
> shouldn't be a real difference compared to 7 blocks. Hot-unplug triggered
> via ACPI will take care of offlining the right DIMMs.
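Just to put numbers on the ranges quoted above (sizes derived from the
SRAT start/end addresses):

  nodes 1-5: 0x8000000000 bytes = 512 GiB each
  node 6:    0x4000000000 bytes = 256 GiB
  node 7:    0x2000000000 bytes = 128 GiB
  hotpluggable total: 5 * 512 + 256 + 128 = 2944 GiB (~2.9 TiB)

and 3 TiB (3072 GiB) / 128 GiB does indeed give the 24 blocks you mention.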

The problem with a fixed-size memory block is that you might not know how
much memory you will have until much later after boot. For example, it is
quite reasonable to expect that this particular machine could boot with
node 0 only and have additional boards with memory added at runtime. How
big should the memory block be then? And I believe the virtualization use
case is similar in that regard: you get memory on demand.
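
Side note on the 1729 above: with the current scheme the number of memoryN
directories is simply the present memory divided by the architecture's
fixed block size (2GB on big x86 boxes IIRC). A rough userspace sketch -
nothing official, just reading the sysfs files that exist today - to
illustrate:

/*
 * Rough illustration only: count the memoryN entries and relate them to
 * block_size_bytes.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/sys/devices/system/memory/block_size_bytes", "r");
	DIR *d = opendir("/sys/devices/system/memory");
	unsigned long long block_size = 0;
	unsigned long blocks = 0;
	struct dirent *de;

	if (!f || !d)
		return 1;
	if (fscanf(f, "%llx", &block_size) != 1)	/* exported as hex */
		return 1;
	while ((de = readdir(d)))
		if (!strncmp(de->d_name, "memory", 6) &&
		    de->d_name[6] >= '0' && de->d_name[6] <= '9')
			blocks++;
	printf("%lu blocks x %llu MiB = %llu GiB visible via sysfs\n",
	       blocks, block_size >> 20, (blocks * block_size) >> 30);
	fclose(f);
	closedir(d);
	return 0;
}

So whatever fixed size is picked at boot is a pure trade-off between sysfs
clutter and hotplug granularity, and it cannot adapt to memory that only
shows up later.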
 
> Of course, 7 blocks would be nicer, but as discussed, not possible with
> the current ABI.

As I've said, if we want to move forward we have to change the API we have
right now, with a backward-compatible option of course.
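
To be a bit more concrete about the backward-compatible option: I am
thinking of something as trivial as a boot parameter that selects the
sysfs layout. A purely hypothetical sketch (parameter and variable names
made up, this is not a patch):

#include <linux/init.h>
#include <linux/string.h>
#include <linux/types.h>

/*
 * Hypothetical only.  Default (false) keeps the existing
 * one-sysfs-entry-per-memory-block layout; "memory_block_mode=per-resource"
 * would tell the hotplug code to create a single sysfs entry per
 * add_memory_resource() call instead.
 */
static bool memory_block_per_resource;

static int __init parse_memory_block_mode(char *arg)
{
	if (arg && !strcmp(arg, "per-resource"))
		memory_block_per_resource = true;
	return 0;
}
early_param("memory_block_mode", parse_memory_block_mode);

Existing tooling would keep seeing exactly what it sees today unless the
admin explicitly opts in.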

-- 
Michal Hocko
SUSE Labs

Thread overview: 18+ messages
2019-07-31 12:22 [PATCH v1] drivers/base/memory.c: Don't store end_section_nr in memory blocks David Hildenbrand
2019-07-31 12:43 ` Michal Hocko
2019-07-31 13:12   ` David Hildenbrand
2019-07-31 13:25     ` Michal Hocko
2019-07-31 13:42       ` David Hildenbrand
2019-07-31 14:04         ` David Hildenbrand
2019-07-31 14:15           ` Michal Hocko
2019-07-31 14:23             ` David Hildenbrand
2019-07-31 14:14         ` Michal Hocko
2019-07-31 14:21           ` David Hildenbrand
2019-07-31 14:37             ` Michal Hocko
2019-07-31 14:43               ` David Hildenbrand
2019-08-01  6:13                 ` Michal Hocko
2019-08-01  7:00                   ` David Hildenbrand
2019-08-01  8:27                     ` Michal Hocko [this message]
2019-08-01  8:36                       ` David Hildenbrand
2019-07-31 20:57 ` Andrew Morton
2019-08-01  6:48   ` David Hildenbrand
