From: Joao Martins <joao.m.martins@oracle.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Christoph Hellwig <hch@lst.de>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
linux-s390 <linux-s390@vger.kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Alex Sierra <alex.sierra@amd.com>,
"Kuehling, Felix" <Felix.Kuehling@amd.com>,
Linux MM <linux-mm@kvack.org>,
Ralph Campbell <rcampbell@nvidia.com>,
Alistair Popple <apopple@nvidia.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: can we finally kill off CONFIG_FS_DAX_LIMITED
Date: Tue, 19 Oct 2021 16:20:16 +0100
Message-ID: <a0001855-4f08-78b7-64ae-80ebbbb04f8d@oracle.com>
In-Reply-To: <20211019142032.GT2744544@nvidia.com>
On 10/19/21 15:20, Jason Gunthorpe wrote:
> On Mon, Oct 18, 2021 at 09:26:24PM -0700, Dan Williams wrote:
>> On Mon, Oct 18, 2021 at 4:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>> On Fri, Oct 15, 2021 at 01:22:41AM +0100, Joao Martins wrote:
>>> I'm not sure the comment is correct anyhow:
>>>
>>>		/*
>>>		 * Unmap the largest mapping to avoid breaking up
>>>		 * device-dax mappings which are constant size. The
>>>		 * actual size of the mapping being torn down is
>>>		 * communicated in siginfo, see kill_proc()
>>>		 */
>>>		unmap_mapping_range(page->mapping, start, size, 0);
>>>
>>> Because for non-PageAnon pages unmap_mapping_range() does either
>>> zap_huge_pud(), __split_huge_pmd(), or zap_huge_pmd().
>>>
>>> Despite its name, __split_huge_pmd() does not actually split; it will
>>> call __split_huge_pmd_locked():
>>>
>>>	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>>>		goto out;
>>>	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
>>>
>>> Which does
>>>	if (!vma_is_anonymous(vma)) {
>>>		old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
>>>
>>> Which is a zap, not split.
>>>
>>> So I wonder if there is a reason to use anything other than 4k here
>>> for DAX?
>>>
>>>> tk->size_shift = page_shift(compound_head(p));
>>>>
>>>> ... as page_shift() would just return PAGE_SHIFT (as compound_order() is 0).
>>>
>>> And what would be so wrong with memory failure doing this as a 4k
>>> page?
>>
>> device-dax does not support misaligned mappings. It makes hard
>> guarantees for applications that can not afford the page table
>> allocation overhead of sub-1GB mappings.
>
> memory-failure is the wrong layer to enforce this anyhow - if someday
> unmap_mapping_range() did learn to break up the 1GB pages then we'd
> want to put the condition to preserve device-dax mappings there, not
> way up in memory-failure.
>
> So we can just delete the detection of the page size and rely on the
> zap code to wipe out the entire level, not split it. Which is what we
> have today already.

On a quick note regarding @size_shift: memory-failure reflects it back to
userspace as contextual information of the signal (::si_addr_lsb in siginfo)
when delivering the intended SIGBUS (si_code = BUS_MCEERR_*). So the size
needs to be reported somehow.
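
To illustrate why the reported size matters, here is a minimal userspace
sketch of how a consumer might turn that siginfo into an affected range.
It assumes Linux with glibc, which exposes si_addr_lsb and the
BUS_MCEERR_AR/BUS_MCEERR_AO codes. The helper names (is_mce_sigbus,
poison_start, poison_len, on_sigbus, install_sigbus_handler) are made up
for this example and are not taken from any existing application.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if this siginfo describes a memory-failure SIGBUS. */
int is_mce_sigbus(const siginfo_t *info)
{
	return info->si_signo == SIGBUS &&
	       (info->si_code == BUS_MCEERR_AR ||
		info->si_code == BUS_MCEERR_AO);
}

/* Length of the affected range: 1 << si_addr_lsb bytes. */
size_t poison_len(const siginfo_t *info)
{
	return (size_t)1 << info->si_addr_lsb;
}

/* Start of the affected range: si_addr rounded down to that size. */
uintptr_t poison_start(const siginfo_t *info)
{
	uintptr_t addr = (uintptr_t)info->si_addr;

	return addr & ~((uintptr_t)poison_len(info) - 1);
}

/* Last range seen by the handler (illustrative bookkeeping only). */
volatile uintptr_t last_start;
volatile size_t last_len;

/*
 * SA_SIGINFO handler: only records the range. A real application
 * would have to recover or exit here, since returning from a
 * BUS_MCEERR_AR fault retries the poisoned access.
 */
void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	if (is_mce_sigbus(info)) {
		last_start = poison_start(info);
		last_len = poison_len(info);
	}
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGBUS, &sa, NULL);
}
```

With this sketch, an si_addr_lsb of 12 yields a 4k range and 21 yields a
2M range, so whatever value the kernel puts in size_shift directly
determines how much memory userspace believes it has lost.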