From: David Hildenbrand <david@redhat.com>
To: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Sasha Levin <sashal@kernel.org>,
Tyler Hicks <tyhicks@linux.microsoft.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Michal Hocko <mhocko@suse.com>,
Oscar Salvador <osalvador@suse.de>,
Vlastimil Babka <vbabka@suse.cz>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Marc Zyngier <maz@kernel.org>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Will Deacon <will.deacon@arm.com>,
James Morse <james.morse@arm.com>,
James Morris <jmorris@namei.org>
Subject: Re: dax alignment problem on arm64 (and other achitectures)
Date: Thu, 28 Jan 2021 16:03:07 +0100
Message-ID: <db692fcd-40e8-9c2b-d63b-9803f4bf9d5e@redhat.com>
In-Reply-To: <CA+CK2bCjD7PujEwWMT32p4e6x6hZ-f5QOKXir10mT8RfijvnUA@mail.gmail.com>
>> One issue usually is that often firmware can allocate from available
>> system RAM and/or modify/initialize it. I assume you're running some
>> custom firmware :)
>
> We have a special firmware that does not touch the last 2G of physical
> memory for its allocations :)
>
Fancy :)
[...]
>> Personally, I think the future is 4k, especially for smaller machines.
>> (also, imagine right now how many 512MB THP you can actually use in your
>> 8GB VM ..., simply not suitable for small machines).
>
> Um, this is not really about 512MB THP. Yes, this is a smaller machine, but
> performance is very important to us. Boot budget for the kernel is
> under half a second. With 64K we save 0.2s: 0.35s vs 0.55s. This is
> because fewer struct pages need to be initialized. Also, fewer TLB
> misses, and 3-level page tables add up as performance benefits.
>
> For larger servers 64K pages make total sense: less memory is wasted as metadata.
Yes, indeed, for very large servers it might make sense in that regard.
However, once we can eventually free vmemmap of hugetlbfs things could
change; assuming user space will be consuming huge pages (which large
machines better be doing ... databases, hypervisors ... ).
Also, some hypervisors try allocating the memmap completely ... but I
consider that rather a special case.
Personally, I consider being able to use THP/huge pages more important
than having 64k base pages and saving some TLB space there. Also, with
64k you have other drawbacks: for example, each stack and each TLS area for
threads in applications suddenly consumes 16 times as much memory as a "minimum".
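To put rough numbers on the 16x figure and on the 512MB THP size mentioned earlier, here is a minimal Python sketch. It is illustrative arithmetic only, not something taken from this thread: it assumes 8-byte page-table entries, that one base page is the smallest per-thread allocation, and the helper names are made up for the example.

```python
# Illustrative arithmetic only; the constants below are assumptions.
KIB = 1024
MIB = 1024 * KIB

PTE_SIZE = 8  # bytes per page-table entry (assumed, typical for 64-bit)


def min_alloc_ratio(base_page, reference_page=4 * KIB):
    """Factor by which a one-page minimum allocation grows vs. 4K pages."""
    return base_page // reference_page


def pmd_huge_page_size(base_page):
    """Bytes mapped by one PMD entry: entries per table page * base page size."""
    return (base_page // PTE_SIZE) * base_page


for page in (4 * KIB, 16 * KIB, 64 * KIB):
    print(f"{page // KIB:>2}K base pages: "
          f"minimum allocation x{min_alloc_ratio(page)}, "
          f"PMD-sized huge page {pmd_huge_page_size(page) // MIB} MiB")
```

Under these assumptions, 64K base pages give a 16x larger one-page minimum and a 512 MiB PMD-sized huge page, while 4K base pages give 2 MiB.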
Optimizing boot time/memmap initialization further is certainly an
interesting topic.
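As a rough illustration of why the base page size matters for memmap initialization, the following Python sketch counts struct pages for the 8GB example above. The 64-byte struct page size is an assumption (it depends on the kernel configuration), and `memmap_cost` is a hypothetical helper, not a kernel interface.

```python
# Illustrative arithmetic only; STRUCT_PAGE_SIZE is an assumption.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

STRUCT_PAGE_SIZE = 64  # bytes per struct page (assumed, config dependent)


def memmap_cost(ram_bytes, base_page):
    """Return (number of struct pages, memmap size in bytes) for ram_bytes."""
    nr_pages = ram_bytes // base_page
    return nr_pages, nr_pages * STRUCT_PAGE_SIZE


for page in (4 * KIB, 64 * KIB):
    nr_pages, size = memmap_cost(8 * GIB, page)
    print(f"{page // KIB:>2}K base pages: {nr_pages} struct pages, "
          f"memmap {size // MIB} MiB")
```

With these assumed values, 8 GiB needs 2097152 struct pages (128 MiB of memmap) at 4K and 131072 struct pages (8 MiB) at 64K, so 16 times fewer entries have to be initialized at boot.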
Anyhow, you know your use case best, just sharing my thoughts :)
[...]
>>>
>>> Right, but I do not think it is possible to do for dax devices (as of
>>> right now). I assume, it contains information about what kind of
>>> device it is: devdax, fsdax, sector, uuid etc.
>>> See the namespaces table in [1]. It contains a summary of pmem device types,
>>> and which of them have a label (all except for raw).
>>
>> Interesting, I wonder if the label is really required to get this
>> special use case running. I mean, all you want is to have dax/kmem
>> expose the whole thing as system RAM. You don't want to lose even 2MB if
>> it's just for the sake of unnecessary metadata - this is not a real
>> device, it's "fake" already.
>
> Hm, would it not essentially mean allowing memory hot-plug for raw
> pmem devices? Something like create mmap, and hot-add raw pmem?
Theoretically yes, but I have no idea if that would make sense for real
"raw pmem" as well. Hope some of the pmem/nvdimm experts can clarify
what's possible and what's not :)
--
Thanks,
David / dhildenb