linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	Tyler Hicks <tyhicks@linux.microsoft.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Marc Zyngier <maz@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Will Deacon <will.deacon@arm.com>,
	James Morse <james.morse@arm.com>,
	James Morris <jmorris@namei.org>
Subject: Re: dax alignment problem on arm64 (and other achitectures)
Date: Thu, 28 Jan 2021 16:03:07 +0100
Message-ID: <db692fcd-40e8-9c2b-d63b-9803f4bf9d5e@redhat.com>
In-Reply-To: <CA+CK2bCjD7PujEwWMT32p4e6x6hZ-f5QOKXir10mT8RfijvnUA@mail.gmail.com>

>> One issue usually is that often firmware can allocate from available
>> system RAM and/or modify/initialize it. I assume you're running some
>> custom firmware :)
> 
> We have a special firmware that does not touch the last 2G of physical
> memory for its allocations :)
> 

Fancy :)

[...]

>> Personally, I think the future is 4k, especially for smaller machines.
>> (also, imagine right now how many 512MB THP you can actually use in your
>> 8GB VM ..., simply not suitable for small machines).
> 
> Um, this is not really about 512MB THP. Yes, this is a smaller machine,
> but performance is very important to us. Boot budget for the kernel is
> under half a second. With 64K we save 0.2s: 0.35s vs 0.55s. This is
> because fewer struct pages need to be initialized. Also, fewer TLB
> misses, and 3-level page tables add up as performance benefits.
>
> For larger servers 64K pages make total sense: Less memory is wasted as
> metadata.

Yes, indeed, for very large servers it might make sense in that regard. 
However, once we can eventually free vmemmap of hugetlbfs things could 
change; assuming user space will be consuming huge pages (which large 
machines better be doing ... databases, hypervisors ... ).
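
To put rough numbers on the "memory wasted as metadata" point, here is a minimal sketch of the memmap arithmetic. It assumes 64 bytes per struct page (a common value on 64-bit kernels, but config dependent) and reuses the 8GB machine size from this thread; the helper name memmap_overhead is made up for illustration.

```python
# Back-of-the-envelope memmap cost for different base page sizes.
# ASSUMPTION: sizeof(struct page) == 64 bytes; the real value depends on
# the kernel configuration.
STRUCT_PAGE_BYTES = 64
GIB = 1 << 30
MIB = 1 << 20


def memmap_overhead(ram_bytes, base_page_bytes,
                    struct_page_bytes=STRUCT_PAGE_BYTES):
    """Return (number of struct pages, bytes of memmap) for the given RAM."""
    pages = ram_bytes // base_page_bytes
    return pages, pages * struct_page_bytes


if __name__ == "__main__":
    for page_kib in (4, 64):
        pages, meta = memmap_overhead(8 * GIB, page_kib * 1024)
        print(f"{page_kib:>2}K base pages: {pages} struct pages, "
              f"{meta // MIB} MiB of memmap")
```

Under these assumptions an 8GB machine needs about 2 million struct pages (128 MiB of memmap) with 4K pages versus 131072 (8 MiB) with 64K pages, which is also why there is less to initialize at boot.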

Also, some hypervisors try allocating the memmap completely ... but I 
consider that rather a special case.

Personally, I consider being able to use THP/huge pages more important 
than having 64k base pages and saving some TLB space there. Also, with 
64k you have other drawbacks: for example, each stack, each TLS for 
threads in applications suddenly consumes 16 times more memory as "minimum".
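
The trade-off can be sketched with a little arithmetic. The snippet below assumes 8-byte page table entries and that a PMD-level huge page covers one full last-level table, which matches the 2MB (4K base pages) and 512MB (64K base pages) THP sizes on arm64; the function names are hypothetical.

```python
# Huge page size and minimum allocation granularity vs. base page size.
# ASSUMPTION: 8-byte page table entries, and a PMD-sized huge page maps
# exactly one full table of last-level entries.
PTE_BYTES = 8
GIB = 1 << 30
MIB = 1 << 20


def pmd_huge_page_bytes(base_page_bytes):
    """Size of a PMD-level huge page for the given base page size."""
    entries_per_table = base_page_bytes // PTE_BYTES
    return entries_per_table * base_page_bytes


def max_huge_pages(ram_bytes, base_page_bytes):
    """Upper bound on PMD-sized huge pages that fit in ram_bytes."""
    return ram_bytes // pmd_huge_page_bytes(base_page_bytes)


def granularity_factor(large_page_bytes, small_page_bytes):
    """How much larger the minimum per-mapping footprint becomes."""
    return large_page_bytes // small_page_bytes


if __name__ == "__main__":
    for size in (4096, 65536):
        print(size, pmd_huge_page_bytes(size) // MIB,
              max_huge_pages(8 * GIB, size))
    print(granularity_factor(65536, 4096))
```

With these assumptions an 8GB VM can hold at most 16 huge pages of 512MB, compared with 4096 of 2MB, and anything that occupies at least one page (a small stack, a TLS area) has a minimum footprint 16 times larger with 64K base pages.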

Optimizing boot time/memmap initialization further is certainly an 
interesting topic.

Anyhow, you know your use case best, just sharing my thoughts :)

[...]

>>>
>>> Right, but I do not think it is possible to do for dax devices (as of
>>> right now). I assume, it contains information about what kind of
>>> device it is: devdax, fsdax, sector, uuid etc.
>>> See [1] namespaces table. It contains a summary of pmem device types,
>>> and which of them have a label (all except for raw).
>>
>> Interesting, I wonder if the label is really required to get this
>> special use case running. I mean, all you want is to have dax/kmem
>> expose the whole thing as system RAM. You don't want to lose even 2MB if
>> it's just for the sake of unnecessary metadata - this is not a real
>> device, it's "fake" already.
> 
> Hm, would it not essentially mean allowing memory hot-plug for raw
> pmem devices? Something like create mmap, and hot-add raw pmem?

Theoretically yes, but I have no idea if that would make sense for real 
"raw pmem" as well. Hope some of the pmem/nvdimm experts can clarify 
what's possible and what's not :)


-- 
Thanks,

David / dhildenb


