On 5/18/21 8:56 PM, Jane Chu wrote:
> On 5/18/2021 10:27 AM, Joao Martins wrote:
>
>> On 5/5/21 11:36 PM, Joao Martins wrote:
>>> On 5/5/21 11:20 PM, Dan Williams wrote:
>>>> On Wed, May 5, 2021 at 12:50 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>>>> On 5/5/21 7:44 PM, Dan Williams wrote:
>>>>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>>>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>>>>>> index b46f63dcaed3..bb28d82dda5e 100644
>>>>>>> --- a/include/linux/memremap.h
>>>>>>> +++ b/include/linux/memremap.h
>>>>>>> @@ -114,6 +114,7 @@ struct dev_pagemap {
>>>>>>> struct completion done;
>>>>>>> enum memory_type type;
>>>>>>> unsigned int flags;
>>>>>>> + unsigned long align;
>>>>>> I think this wants some kernel-doc above to indicate that non-zero
>>>>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
>>>>>> means "use non-compound base pages".
>> [...]
>>
>>>>>> The non-zero value must be
>>>>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
>>>>>> Hmm, maybe it should be an
>>>>>> enum:
>>>>>>
>>>>>> enum devmap_geometry {
>>>>>> DEVMAP_PTE,
>>>>>> DEVMAP_PMD,
>>>>>> DEVMAP_PUD,
>>>>>> }
>>>>>>
>>>>> I suppose a converter between devmap_geometry and page_size would be needed too? And maybe
>>>>> the whole dax/nvdimm align values change meanwhile (as a followup improvement)?
>>>> I think it is ok for dax/nvdimm to continue to maintain their align
>>>> value because it should be ok to have 4MB align if the device really
>>>> wanted. However, when it goes to map that alignment with
>>>> memremap_pages() it can pick a mode. For example, it's already the
>>>> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
>>>> they're already separate concepts that can stay separate.
>>>>
>>> Gotcha.
>> I am reconsidering part of the above. In general, yes, the meaning of devmap @align
>> represents a slightly different variation of the device @align, i.e. how the metadata is
>> laid out **but** regardless of what kind of page table entries we use for the vmemmap.
>>
>> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax already
>> validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE and PUD_SIZE),
>> 2) tying the geometry of the metadata very much to the value we pick for @align at namespace
>> provisioning -- not the "align" we might use at mmap(), perhaps that's what you referred
>> to above? -- and 3) the value of geometry actually derives from the dax device @align, because
>> we will need to create compound pages representing a page size of the @align value.
>>
>> Using your example above: you're saying that dax->align == 1G is mapped with DEVMAP_PTEs,
>> but in reality the vmemmap is populated with PMD/PUD page tables (depending on what archs
>> decide to do at vmemmap_populate()) and uses base pages as its metadata regardless of the
>> device @align. What we really want to convey in @geometry is not page table sizes, but
>> just the page size used for the vmemmap of the dax device. Additionally, limiting its
>> value might not be desirable... if tomorrow Linux on some arch supports dax/nvdimm
>> devices with 4M align or 64K align, the value of @geometry will have to reflect the 4M to
>> create compound pages of order 10 for the said vmemmap.
>>
>> I am going to wait until you finish reviewing the remaining four patches of this series,
>> but maybe this is a simple misnomer (s/align/geometry/) with a comment but without the
>> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value, and a
>> setter/getter to audit its value? Thoughts?
>
> Good points there.
>
> My understanding is that dax->align conveys the granularity of size while
> carving out a namespace; it's a geometry attribute loosely akin to the sector size of a spinning
> disk. I tend to think that the device pagesize has almost no relation to "align" in that it's
> possible to have 1G "align" and 4K pagesize, or vice versa. That is, with the advent of compound page
> support, it is possible to totally separate the two concepts.
>
> How about adding a new option to "ndctl create-namespace" that describes the
> device creator's desired pagesize, and another parameter to describe whether the pagesize shall
> be fixed or allowed to be split up, such that, if the intention is to never split up a 2M pagesize, it
> would be possible to save a lot of metadata space on the device?
Maybe that can be selected by the driver too, but it's an interesting point you raise,
should we settle on the geometry (e.g. a geometry sysfs entry, IIUC your
suggestion?). device-dax, for example, would use geometry == align and therefore save space
(like what I propose in patch 10). But fsdax would retain the default, that is, geometry =
PAGE_SIZE and align = PMD_SIZE, should it want to split pages.

Interestingly, devmap poisoning always occurs at the @align level regardless of @geometry.

What I am not sure about is what value (vs added complexity) it brings to allow the geometry
*value* to be selectable by the user, given that so far we seem to only ever initialize metadata
as either sets of base pages [*] or sets of compound pages (of a size). And the difference
between the two can possibly be summarized as split-ability, like you say.

[*] which can optionally be morphed into compound pages by the driver
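For reference, the enum Dan suggests above, paired with the converter Joao asks about,
could look something like the following. This is only a sketch of the idea under
discussion, not code from the posted series; the DEVMAP_* names come from Dan's
suggestion and geometry_to_page_size() is a hypothetical helper:

    /*
     * Sketch only: the proposed devmap geometry enum plus a converter back
     * to the page size used for the device's vmemmap. PMD_SIZE/PUD_SIZE
     * are the usual arch constants (2M/1G on x86-64 with 4K base pages).
     */
    enum devmap_geometry {
            DEVMAP_PTE,     /* vmemmap made of base pages */
            DEVMAP_PMD,     /* compound pages of PMD_SIZE */
            DEVMAP_PUD,     /* compound pages of PUD_SIZE */
    };

    static inline unsigned long geometry_to_page_size(enum devmap_geometry g)
    {
            switch (g) {
            case DEVMAP_PMD:
                    return PMD_SIZE;
            case DEVMAP_PUD:
                    return PUD_SIZE;
            case DEVMAP_PTE:
            default:
                    return PAGE_SIZE;
            }
    }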
On Tue, May 18, 2021 at 01:08:27PM +0200, Michal Hocko wrote:
> On Tue 18-05-21 12:35:36, David Hildenbrand wrote:
> > On 18.05.21 12:31, Michal Hocko wrote:
> > >
> > > Although I have to say openly that I am not a great fan of VM_FAULT_OOM
> > > in general. It is usually a wrong way to handle the failure
> > > because it happens outside of the allocation context so you lose all the
> > > details (e.g. allocation constraints, numa policy etc.). Also whenever
> > > there is ENOMEM then the allocation itself has already made sure that
> > > all the reclaim attempts have been already depleted. Just consider an
> > > allocation with GFP_NOWAIT/NO_RETRY or similar to fail and propagate
> > > ENOMEM up the call stack. Turning that into the OOM killer sounds like a
> > > bad idea to me. But that is a more general topic. I have tried to bring
> > > this up in the past but there was not much of an interest to fix it as
> > > it was not a pressing problem...
> >
> > I'm certainly interested; it would mean that we actually want to try
> > recovering from VM_FAULT_OOM in various cases, and as you state, we might
> > have to supply more information to make that work reliably.
>
> Or maybe we want to get rid of VM_FAULT_OOM altogether... But this is
> really tangent to this discussion. The only relation is that this would
> be another place to check when somebody wants to go that direction.

If we are to get rid of VM_FAULT_OOM, vmf_error() would be updated and
this place will get the update automagically.

> > Having that said, I guess what we have here is just the same as when our
> > process fails to allocate a generic page table in __handle_mm_fault(), when
> > we fail p4d_alloc() and friends ...
>
> From a quick look it is really similar in a sense that it effectively never
> happens and if it does then it certainly does the wrong thing. The point
> I was trying to make is that there is likely no need to go that way.

As David pointed out, failure to handle the direct map in
secretmem_fault() is like any other allocation failure in page fault
handling, and most of them result in VM_FAULT_OOM, so I think that having
vmf_error() in secretmem_fault() is more consistent with the rest of the
code than using VM_FAULT_SIGBUS.

Besides, if the direct map manipulation failures were to result in errors
other than -ENOMEM, having vmf_error() may prove useful.

--
Sincerely yours,
Mike.
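For context on the point about the update propagating automatically: vmf_error() is a
tiny helper in include/linux/mm.h that funnels errno values into fault codes, roughly
as below (paraphrased; details may vary across kernel versions), so retiring
VM_FAULT_OOM would indeed only require changing this one spot:

    /* Paraphrase of the mainline helper; every caller inherits its mapping. */
    static inline vm_fault_t vmf_error(int err)
    {
            if (err == -ENOMEM)
                    return VM_FAULT_OOM;
            return VM_FAULT_SIGBUS;
    }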
On 5/8/21 10:06 AM, Vaibhav Jain wrote:
> Currently drc_pmem_query_stats() generates a dev_err in case the
> "Enable Performance Information Collection" feature is disabled from
> the HMC or performance stats are not available for an nvdimm. The error
> is of the form below:
>
> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
>  performance stats, Err:-10
>
> This error message confuses users as it implies a possible problem
> with the nvdimm even though it's due to a disabled/unavailable
> feature. We fix this by explicitly handling the H_AUTHORITY and
> H_UNSUPPORTED errors from the H_SCM_PERFORMANCE_STATS hcall.
>
> In case of an H_AUTHORITY error, an info message is logged instead of an
> error, saying "Permission denied while accessing performance
> stats", and an EPERM error is returned back.
>
> In case of an H_UNSUPPORTED error, we return an EOPNOTSUPP error back
> from drc_pmem_query_stats() indicating that the performance stats-query
> operation is not supported on this nvdimm.
>
> Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog
>
> v3:
> * Return EOPNOTSUPP error in case of H_UNSUPPORTED [ Ira ]
> * Return EPERM in case of H_AUTHORITY [ Ira ]
> * Updated patch description

Patch looks good to me.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>

Thanks,
Kajol Jain

> v2:
> * Updated the message logged in case of H_AUTHORITY error [ Ira ]
> * Switched from dev_warn to dev_info in case of H_AUTHORITY error.
> * Instead of -EPERM return -EACCESS for H_AUTHORITY error.
> * Added explicit handling of H_UNSUPPORTED error.
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index ef26fe40efb0..e2b69cc3beaf 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -310,6 +310,13 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
>  		dev_err(&p->pdev->dev,
>  			"Unknown performance stats, Err:0x%016lX\n", ret[0]);
>  		return -ENOENT;
> +	} else if (rc == H_AUTHORITY) {
> +		dev_info(&p->pdev->dev,
> +			 "Permission denied while accessing performance stats");
> +		return -EPERM;
> +	} else if (rc == H_UNSUPPORTED) {
> +		dev_dbg(&p->pdev->dev, "Performance stats unsupported\n");
> +		return -EOPNOTSUPP;
>  	} else if (rc != H_SUCCESS) {
>  		dev_err(&p->pdev->dev,
>  			"Failed to query performance stats, Err:%lld\n", rc);
Introduce xfs_mmaplock_two_inodes_and_break_dax_layout() for dax files
that are going to be deduped. After that, call the compare range function
only when the two files are both DAX or both non-DAX.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/xfs/xfs_file.c    |  2 +-
 fs/xfs/xfs_inode.c   | 57 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h   |  1 +
 fs/xfs/xfs_reflink.c |  4 ++--
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 38d8eca05aee..bd5002d38df4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -823,7 +823,7 @@ xfs_wait_dax_page(
 	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
 }

-static int
+int
 xfs_break_dax_layouts(
 	struct inode		*inode,
 	bool			*retry)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..d5e2791969ba 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3711,6 +3711,59 @@ xfs_iolock_two_inodes_and_break_layout(
 	return 0;
 }

+static int
+xfs_mmaplock_two_inodes_and_break_dax_layout(
+	struct xfs_inode	*ip1,
+	struct xfs_inode	*ip2)
+{
+	int			error, attempts = 0;
+	bool			retry;
+	struct page		*page;
+	struct xfs_log_item	*lp;
+
+	if (ip1->i_ino > ip2->i_ino)
+		swap(ip1, ip2);
+
+again:
+	retry = false;
+	/* Lock the first inode */
+	xfs_ilock(ip1, XFS_MMAPLOCK_EXCL);
+	error = xfs_break_dax_layouts(VFS_I(ip1), &retry);
+	if (error || retry) {
+		xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL);
+		goto again;
+	}
+
+	if (ip1 == ip2)
+		return 0;
+
+	/* Nested lock the second inode */
+	lp = &ip1->i_itemp->ili_item;
+	if (lp && test_bit(XFS_LI_IN_AIL, &lp->li_flags)) {
+		if (!xfs_ilock_nowait(ip2,
+		    xfs_lock_inumorder(XFS_MMAPLOCK_EXCL, 1))) {
+			xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL);
+			if ((++attempts % 5) == 0)
+				delay(1); /* Don't just spin the CPU */
+			goto again;
+		}
+	} else
+		xfs_ilock(ip2, xfs_lock_inumorder(XFS_MMAPLOCK_EXCL, 1));
+	/*
+	 * We cannot use xfs_break_dax_layouts() directly here because it may
+	 * need to unlock & lock the XFS_MMAPLOCK_EXCL which is not suitable
+	 * for this nested lock case.
+	 */
+	page = dax_layout_busy_page(VFS_I(ip2)->i_mapping);
+	if (page && page_ref_count(page) != 1) {
+		xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL);
+		xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL);
+		goto again;
+	}
+
+	return 0;
+}
+
 /*
  * Lock two inodes so that userspace cannot initiate I/O via file syscalls or
  * mmap activity.
@@ -3725,6 +3778,10 @@ xfs_ilock2_io_mmap(
 	ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2));
 	if (ret)
 		return ret;
+
+	if (IS_DAX(VFS_I(ip1)) && IS_DAX(VFS_I(ip2)))
+		return xfs_mmaplock_two_inodes_and_break_dax_layout(ip1, ip2);
+
 	if (ip1 == ip2)
 		xfs_ilock(ip1, XFS_MMAPLOCK_EXCL);
 	else
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ca826cfba91c..2d0b344fb100 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -457,6 +457,7 @@ enum xfs_prealloc_flags {
 int	xfs_update_prealloc_flags(struct xfs_inode *ip,
 		enum xfs_prealloc_flags flags);
+int	xfs_break_dax_layouts(struct inode *inode, bool *retry);
 int	xfs_break_layouts(struct inode *inode, uint *iolock,
 		enum layout_break_reason reason);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 9a780948dbd0..ff308304c5cd 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1324,8 +1324,8 @@ xfs_reflink_remap_prep(
 	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
 		goto out_unlock;

-	/* Don't share DAX file data for now. */
-	if (IS_DAX(inode_in) || IS_DAX(inode_out))
+	/* Don't share DAX file data with non-DAX file. */
+	if (IS_DAX(inode_in) != IS_DAX(inode_out))
 		goto out_unlock;

 	if (!IS_DAX(inode_in))
--
2.31.1
In fsdax mode, WRITE and ZERO on a shared extent need CoW performed.
After CoW, the newly allocated extents need to be remapped to the file.
So, add an iomap_end for the dax write ops to do the remapping work.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/xfs/xfs_bmap_util.c |  3 +--
 fs/xfs/xfs_file.c      |  9 +++------
 fs/xfs/xfs_iomap.c     | 38 +++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_iomap.h     | 24 ++++++++++++++++++++++++
 fs/xfs/xfs_iops.c      |  7 +++----
 fs/xfs/xfs_reflink.c   |  3 +--
 6 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index a5e9d7d34023..2a36dc93ff27 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -965,8 +965,7 @@ xfs_free_file_space(
 		return 0;
 	if (offset + len > XFS_ISIZE(ip))
 		len = XFS_ISIZE(ip) - offset;
-	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
-			&xfs_buffered_write_iomap_ops);
+	error = xfs_iomap_zero_range(ip, offset, len, NULL);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 396ef36dcd0a..38d8eca05aee 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -684,11 +684,8 @@ xfs_file_dax_write(
 	pos = iocb->ki_pos;

 	trace_xfs_file_dax_write(iocb, from);
-	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
-	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
-		i_size_write(inode, iocb->ki_pos);
-		error = xfs_setfilesize(ip, pos, ret);
-	}
+	ret = dax_iomap_rw(iocb, from, &xfs_dax_write_iomap_ops);
+
 out:
 	if (iolock)
 		xfs_iunlock(ip, iolock);
@@ -1309,7 +1306,7 @@ __xfs_filemap_fault(
 		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
 				(write_fault && !vmf->cow_page) ?
-				&xfs_direct_write_iomap_ops :
+				&xfs_dax_write_iomap_ops :
 				&xfs_read_iomap_ops);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d154f42e2dc6..938723aa137d 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -761,7 +761,8 @@ xfs_direct_write_iomap_begin(
 	/* may drop and re-acquire the ilock */
 	error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
-			&lockmode, flags & IOMAP_DIRECT);
+			&lockmode,
+			(flags & IOMAP_DIRECT) || IS_DAX(inode));
 	if (error)
 		goto out_unlock;
 	if (shared)
@@ -854,6 +855,41 @@ const struct iomap_ops xfs_direct_write_iomap_ops = {
 	.iomap_begin		= xfs_direct_write_iomap_begin,
 };

+static int
+xfs_dax_write_iomap_end(
+	struct inode		*inode,
+	loff_t			pos,
+	loff_t			length,
+	ssize_t			written,
+	unsigned int		flags,
+	struct iomap		*iomap)
+{
+	int			error = 0;
+	struct xfs_inode	*ip = XFS_I(inode);
+	bool			cow = xfs_is_cow_inode(ip);
+
+	if (!written)
+		return 0;
+
+	if (pos + written > i_size_read(inode) && !(flags & IOMAP_FAULT)) {
+		i_size_write(inode, pos + written);
+		error = xfs_setfilesize(ip, pos, written);
+		if (error && cow) {
+			xfs_reflink_cancel_cow_range(ip, pos, written, true);
+			return error;
+		}
+	}
+
+	if (cow)
+		error = xfs_reflink_end_cow(ip, pos, written);
+
+	return error;
+}
+
+const struct iomap_ops xfs_dax_write_iomap_ops = {
+	.iomap_begin		= xfs_direct_write_iomap_begin,
+	.iomap_end		= xfs_dax_write_iomap_end,
+};
+
 static int
 xfs_buffered_write_iomap_begin(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 7d3703556d0e..fbacf638ab21 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -42,8 +42,32 @@ xfs_aligned_fsb_count(
 extern const struct iomap_ops xfs_buffered_write_iomap_ops;
 extern const struct iomap_ops xfs_direct_write_iomap_ops;
+extern const struct iomap_ops xfs_dax_write_iomap_ops;
 extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;

+static inline int
+xfs_iomap_zero_range(
+	struct xfs_inode	*ip,
+	loff_t			offset,
+	loff_t			len,
+	bool			*did_zero)
+{
+	return iomap_zero_range(VFS_I(ip), offset, len, did_zero,
+			IS_DAX(VFS_I(ip)) ? &xfs_dax_write_iomap_ops
+					  : &xfs_buffered_write_iomap_ops);
+}
+
+static inline int
+xfs_iomap_truncate_page(
+	struct xfs_inode	*ip,
+	loff_t			pos,
+	bool			*did_zero)
+{
+	return iomap_truncate_page(VFS_I(ip), pos, did_zero,
+			IS_DAX(VFS_I(ip)) ? &xfs_dax_write_iomap_ops
+					  : &xfs_buffered_write_iomap_ops);
+}
+
 #endif /* __XFS_IOMAP_H__*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index dfe24b7f26e5..6d936c3e1a6e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -911,8 +911,8 @@ xfs_setattr_size(
 	 */
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_buffered_write_iomap_ops);
+		error = xfs_iomap_zero_range(ip, oldsize, newsize - oldsize,
+				&did_zeroing);
 	} else {
 		/*
 		 * iomap won't detect a dirty page over an unwritten block (or a
@@ -924,8 +924,7 @@ xfs_setattr_size(
 				newsize);
 		if (error)
 			return error;
-		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_buffered_write_iomap_ops);
+		error = xfs_iomap_truncate_page(ip, newsize, &did_zeroing);
 	}

 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index d25434f93235..9a780948dbd0 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1266,8 +1266,7 @@ xfs_reflink_zero_posteof(
 		return 0;

 	trace_xfs_zero_eof(ip, isize, pos - isize);
-	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
-			&xfs_buffered_write_iomap_ops);
+	return xfs_iomap_zero_range(ip, isize, pos - isize, NULL);
 }

 /*
--
2.31.1
With dax we cannot deal with readpage() etc. So, we create a dax
comparison function which is similar to vfs_dedupe_file_range_compare(),
and introduce dax_remap_file_range_prep() for filesystem use.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/dax.c             | 66 ++++++++++++++++++++++++++++++++++++++++++++
 fs/remap_range.c     | 36 ++++++++++++++++++------
 fs/xfs/xfs_reflink.c |  8 ++++--
 include/linux/dax.h  |  8 ++++++
 include/linux/fs.h   | 12 +++++---
 5 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index baee584cb8ae..93f16210847b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1864,3 +1864,69 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 	return dax_insert_pfn_mkwrite(vmf, pfn, order);
 }
 EXPORT_SYMBOL_GPL(dax_finish_sync_fault);
+
+static loff_t dax_range_compare_actor(struct inode *ino1, loff_t pos1,
+		struct inode *ino2, loff_t pos2, loff_t len, void *data,
+		struct iomap *smap, struct iomap *dmap)
+{
+	void *saddr, *daddr;
+	bool *same = data;
+	int ret;
+
+	if (smap->type == IOMAP_HOLE && dmap->type == IOMAP_HOLE) {
+		*same = true;
+		return len;
+	}
+
+	if (smap->type == IOMAP_HOLE || dmap->type == IOMAP_HOLE) {
+		*same = false;
+		return 0;
+	}
+
+	ret = dax_iomap_direct_access(smap, pos1, ALIGN(pos1 + len, PAGE_SIZE),
+				      &saddr, NULL);
+	if (ret < 0)
+		return -EIO;
+
+	ret = dax_iomap_direct_access(dmap, pos2, ALIGN(pos2 + len, PAGE_SIZE),
+				      &daddr, NULL);
+	if (ret < 0)
+		return -EIO;
+
+	*same = !memcmp(saddr, daddr, len);
+	return len;
+}
+
+int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+		struct inode *dest, loff_t destoff, loff_t len, bool *is_same,
+		const struct iomap_ops *ops)
+{
+	int id, ret = 0;
+
+	id = dax_read_lock();
+	while (len) {
+		ret = iomap_apply2(src, srcoff, dest, destoff, len, 0, ops,
+				   is_same, dax_range_compare_actor);
+		if (ret < 0 || !*is_same)
+			goto out;
+
+		len -= ret;
+		srcoff += ret;
+		destoff += ret;
+	}
+	ret = 0;
+out:
+	dax_read_unlock(id);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dax_dedupe_file_range_compare);
+
+int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in,
+			      struct file *file_out, loff_t pos_out,
+			      loff_t *len, unsigned int remap_flags,
+			      const struct iomap_ops *ops)
+{
+	return __generic_remap_file_range_prep(file_in, pos_in, file_out,
+					       pos_out, len, remap_flags, ops);
+}
+EXPORT_SYMBOL(dax_remap_file_range_prep);
diff --git a/fs/remap_range.c b/fs/remap_range.c
index e4a5fdd7ad7b..4cfc1553f3bf 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -14,6 +14,7 @@
 #include <linux/compat.h>
 #include <linux/mount.h>
 #include <linux/fs.h>
+#include <linux/dax.h>
 #include "internal.h"

 #include <linux/uaccess.h>
@@ -199,9 +200,9 @@ static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
  * Compare extents of two files to see if they are the same.
  * Caller must have locked both inodes to prevent write races.
  */
-static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
-					 struct inode *dest, loff_t destoff,
-					 loff_t len, bool *is_same)
+int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+				  struct inode *dest, loff_t destoff,
+				  loff_t len, bool *is_same)
 {
 	loff_t src_poff;
 	loff_t dest_poff;
@@ -280,6 +281,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 out_error:
 	return error;
 }
+EXPORT_SYMBOL(vfs_dedupe_file_range_compare);

 /*
  * Check that the two inodes are eligible for cloning, the ranges make
@@ -289,9 +291,11 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
  * If there's an error, then the usual negative error code is returned.
  * Otherwise returns 0 with *len set to the request length.
  */
-int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
-				  struct file *file_out, loff_t pos_out,
-				  loff_t *len, unsigned int remap_flags)
+int
+__generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
+				struct file *file_out, loff_t pos_out,
+				loff_t *len, unsigned int remap_flags,
+				const struct iomap_ops *dax_read_ops)
 {
 	struct inode *inode_in = file_inode(file_in);
 	struct inode *inode_out = file_inode(file_out);
@@ -351,8 +355,15 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	if (remap_flags & REMAP_FILE_DEDUP) {
 		bool		is_same = false;

-		ret = vfs_dedupe_file_range_compare(inode_in, pos_in,
-				inode_out, pos_out, *len, &is_same);
+		if (!IS_DAX(inode_in))
+			ret = vfs_dedupe_file_range_compare(inode_in, pos_in,
+					inode_out, pos_out, *len, &is_same);
+		else if (dax_read_ops)
+			ret = dax_dedupe_file_range_compare(inode_in, pos_in,
+					inode_out, pos_out, *len, &is_same,
+					dax_read_ops);
+		else
+			return -EINVAL;
 		if (ret)
 			return ret;
 		if (!is_same)
@@ -370,6 +381,15 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,

 	return ret;
 }
+EXPORT_SYMBOL(__generic_remap_file_range_prep);
+
+int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
+				  struct file *file_out, loff_t pos_out,
+				  loff_t *len, unsigned int remap_flags)
+{
+	return __generic_remap_file_range_prep(file_in, pos_in, file_out,
+					       pos_out, len, remap_flags, NULL);
+}
 EXPORT_SYMBOL(generic_remap_file_range_prep);

 loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 060695d6d56a..d25434f93235 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1329,8 +1329,12 @@ xfs_reflink_remap_prep(
 	if (IS_DAX(inode_in) || IS_DAX(inode_out))
 		goto out_unlock;

-	ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
-			len, remap_flags);
+	if (!IS_DAX(inode_in))
+		ret = generic_remap_file_range_prep(file_in, pos_in, file_out,
+				pos_out, len, remap_flags);
+	else
+		ret = dax_remap_file_range_prep(file_in, pos_in, file_out,
+				pos_out, len, remap_flags, &xfs_read_iomap_ops);
 	if (ret || *len == 0)
 		goto out_unlock;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 3275e01ed33d..106d1f033a78 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -239,6 +239,14 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap,
 		   struct iomap *srcmap);
+int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+				  struct inode *dest, loff_t destoff,
+				  loff_t len, bool *is_same,
+				  const struct iomap_ops *ops);
+int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in,
+			      struct file *file_out, loff_t pos_out,
+			      loff_t *len, unsigned int remap_flags,
+			      const struct iomap_ops *ops);
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..deed4371f34f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -71,6 +71,7 @@ struct fsverity_operations;
 struct fs_context;
 struct fs_parameter_spec;
 struct fileattr;
+struct iomap_ops;

 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -2126,10 +2127,13 @@ extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
 				       struct file *file_out, loff_t pos_out,
 				       size_t len, unsigned int flags);
-extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
-					 struct file *file_out, loff_t pos_out,
-					 loff_t *count,
-					 unsigned int remap_flags);
+int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
+				    struct file *file_out, loff_t pos_out,
+				    loff_t *len, unsigned int remap_flags,
+				    const struct iomap_ops *dax_read_ops);
+int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
+				  struct file *file_out, loff_t pos_out,
+				  loff_t *count, unsigned int remap_flags);
 extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
 				  struct file *file_out, loff_t pos_out,
 				  loff_t len, unsigned int remap_flags);
--
2.31.1
Some operations, such as comparing a range of data in two files under
fsdax mode, require nested iomap_begin()/iomap_end() calls on two files.
Thus, we introduce iomap_apply2() to accept arguments from two files,
and iomap_actor2_t for actions on two files.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/apply.c      | 52 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/iomap.h |  7 +++++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 26ab6563181f..0493da5286ad 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -97,3 +97,55 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	return written ? written : ret;
 }
+
+loff_t
+iomap_apply2(struct inode *ino1, loff_t pos1, struct inode *ino2, loff_t pos2,
+		loff_t length, unsigned int flags, const struct iomap_ops *ops,
+		void *data, iomap_actor2_t actor)
+{
+	struct iomap smap = { .type = IOMAP_HOLE };
+	struct iomap dmap = { .type = IOMAP_HOLE };
+	loff_t written = 0, ret, ret2 = 0;
+	loff_t len1 = length, len2, min_len;
+
+	ret = ops->iomap_begin(ino1, pos1, len1, flags, &smap, NULL);
+	if (ret)
+		goto out;
+	if (WARN_ON(smap.offset > pos1)) {
+		written = -EIO;
+		goto out_src;
+	}
+	if (WARN_ON(smap.length == 0)) {
+		written = -EIO;
+		goto out_src;
+	}
+	len2 = min_t(loff_t, len1, smap.length);
+
+	ret = ops->iomap_begin(ino2, pos2, len2, flags, &dmap, NULL);
+	if (ret)
+		goto out_src;
+	if (WARN_ON(dmap.offset > pos2)) {
+		written = -EIO;
+		goto out_dest;
+	}
+	if (WARN_ON(dmap.length == 0)) {
+		written = -EIO;
+		goto out_dest;
+	}
+	min_len = min_t(loff_t, len2, dmap.length);
+
+	written = actor(ino1, pos1, ino2, pos2, min_len, data, &smap, &dmap);
+
+out_dest:
+	if (ops->iomap_end)
+		ret2 = ops->iomap_end(ino2, pos2, len2,
+				      written > 0 ? written : 0, flags, &dmap);
+out_src:
+	if (ops->iomap_end)
+		ret = ops->iomap_end(ino1, pos1, len1,
+				     written > 0 ? written : 0, flags, &smap);
+out:
+	if (written)
+		return written;
+
+	return ret ?: ret2;
+}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c87d0cb0de6d..95562f863ad0 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -150,10 +150,15 @@ struct iomap_ops {
  */
 typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
 		void *data, struct iomap *iomap, struct iomap *srcmap);
-
+typedef loff_t (*iomap_actor2_t)(struct inode *ino1, loff_t pos1,
+		struct inode *ino2, loff_t pos2, loff_t len, void *data,
+		struct iomap *smap, struct iomap *dmap);
 loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 		unsigned flags, const struct iomap_ops *ops, void *data,
 		iomap_actor_t actor);
+loff_t iomap_apply2(struct inode *ino1, loff_t pos1, struct inode *ino2,
+		loff_t pos2, loff_t length, unsigned int flags,
+		const struct iomap_ops *ops, void *data, iomap_actor2_t actor);

 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
--
2.31.1
In the case where the iomap is a write operation and the iomap is not
equal to the srcmap after iomap_begin, we consider it a CoW operation.
The destination extent which the iomap indicates is a newly allocated
extent, so the data needs to be copied from the srcmap to the newly
allocated extent.

In theory, it is better to copy only the head and tail ranges that lie
outside the unaligned write range, instead of copying the whole aligned
range. But in a dax page fault, the range will always be aligned, so we
have to copy the whole range in that case.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/dax.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 81 insertions(+), 5 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f661227b49cd..6396f091e60b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1049,6 +1049,61 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
 	return rc;
 }

+/**
+ * dax_iomap_cow_copy(): Copy the data from source to destination before write.
+ * @pos:	address to do copy from.
+ * @length:	size of copy operation.
+ * @align_size:	aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
+ * @srcmap:	iomap srcmap
+ * @daddr:	destination address to copy to.
+ *
+ * This can be called from two places. Either during DAX write fault, to copy
+ * the length size data to daddr. Or, while doing normal DAX write operation,
+ * dax_iomap_actor() might call this to do the copy of either start or end
+ * unaligned address. In this case the rest of the copy of aligned ranges is
+ * taken care by dax_iomap_actor() itself.
+ * Also, note DAX fault will always result in aligned pos and pos + length.
+ */
+static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size,
+		struct iomap *srcmap, void *daddr)
+{
+	loff_t head_off = pos & (align_size - 1);
+	size_t size = ALIGN(head_off + length, align_size);
+	loff_t end = pos + length;
+	loff_t pg_end = round_up(end, align_size);
+	bool copy_all = head_off == 0 && end == pg_end;
+	void *saddr = 0;
+	int ret = 0;
+
+	ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
+	if (ret)
+		return ret;
+
+	if (copy_all) {
+		ret = copy_mc_to_kernel(daddr, saddr, length);
+		return ret ? -EIO : 0;
+	}
+
+	/* Copy the head part of the range.  Note: we pass offset as length. */
+	if (head_off) {
+		ret = copy_mc_to_kernel(daddr, saddr, head_off);
+		if (ret)
+			return -EIO;
+	}
+
+	/* Copy the tail part of the range */
+	if (end < pg_end) {
+		loff_t tail_off = head_off + length;
+		loff_t tail_len = pg_end - end;
+
+		ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
+					tail_len);
+		if (ret)
+			return -EIO;
+	}
+	return 0;
+}
+
 /*
  * The user has performed a load from a hole in the file.  Allocating a new
  * page in the file would cause excessive storage usage for workloads with
@@ -1178,11 +1233,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct dax_device *dax_dev = iomap->dax_dev;
 	struct iov_iter *iter = data;
 	loff_t end = pos + length, done = 0;
+	bool write = iov_iter_rw(iter) == WRITE;
 	ssize_t ret = 0;
 	size_t xfer;
 	int id;

-	if (iov_iter_rw(iter) == READ) {
+	if (!write) {
 		end = min(end, i_size_read(inode));
 		if (pos >= end)
 			return 0;
@@ -1191,7 +1247,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			return iov_iter_zero(min(length, end - pos), iter);
 	}

-	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
+	/*
+	 * In DAX mode, we allow either pure overwrites of written extents, or
+	 * writes to unwritten extents as part of a copy-on-write operation.
+	 */
+	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
+			!(iomap->flags & IOMAP_F_SHARED)))
 		return -EIO;

 	/*
@@ -1230,6 +1291,13 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}

+		if (write && srcmap->addr != iomap->addr) {
+			ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
+						 kaddr);
+			if (ret)
+				break;
+		}
+
 		map_len = PFN_PHYS(map_len);
 		kaddr += offset;
 		map_len -= offset;
@@ -1241,7 +1309,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		 * validated via access_ok() in either vfs_read() or
 		 * vfs_write(), depending on which operation we are doing.
 		 */
-		if (iov_iter_rw(iter) == WRITE)
+		if (write)
 			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
 					map_len, iter);
 		else
@@ -1393,6 +1461,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 	unsigned long entry_flags = pmd ? DAX_PMD : 0;
 	int err = 0;
 	pfn_t pfn;
+	void *kaddr;

 	/* if we are reading UNWRITTEN and HOLE, return a hole. */
 	if (!write &&
@@ -1403,18 +1472,25 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 		return dax_pmd_load_hole(xas, vmf, iomap, entry);
 	}

-	if (iomap->type != IOMAP_MAPPED) {
+	if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) {
 		WARN_ON_ONCE(1);
 		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
 	}

-	err = dax_iomap_direct_access(iomap, pos, size, NULL, &pfn);
+	err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
 	if (err)
 		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);

 	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
 				  write && !sync);

+	if (write &&
+	    srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
+		err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
+		if (err)
+			return dax_fault_return(err);
+	}
+
 	if (sync)
 		return dax_fault_synchronous_pfnp(pfnp, pfn);
--
2.31.1
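To make the head/tail arithmetic above concrete, here is a worked example of the
offsets dax_iomap_cow_copy() computes for an unaligned write; the numbers are
illustrative only, with align_size = 4096:

    /*
     * Worked example for dax_iomap_cow_copy() with align_size = 0x1000:
     *
     *   pos    = 0x1200  ->  head_off = pos & (align_size - 1)        = 0x200
     *   length = 0x1800  ->  end      = pos + length                  = 0x2a00
     *                        pg_end   = round_up(end, 0x1000)         = 0x3000
     *                        size     = ALIGN(head_off + len, 0x1000) = 0x2000
     *
     * copy_all is false (head_off != 0 and end != pg_end), so:
     *   - head copy: head_off (0x200) bytes from saddr to daddr, i.e. the
     *     bytes of the first page that precede pos;
     *   - tail copy: tail_len = pg_end - end = 0x600 bytes at offset
     *     tail_off = head_off + length = 0x1a00, i.e. the bytes after end
     *     up to the next page boundary.
     * The middle [pos, end) is not copied here; dax_iomap_actor() writes it.
     */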
Punching a hole on a reflinked file needs the CoW copy too; otherwise,
data in the unaligned area will not be correct. So, add the srcmap to
dax_iomap_zero() and replace the memset() with dax_iomap_cow_copy().

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/dax.c               | 25 +++++++++++++++----------
 fs/iomap/buffered-io.c |  2 +-
 include/linux/dax.h    |  3 ++-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 98531c53d613..baee584cb8ae 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1197,7 +1197,8 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 }
 #endif /* CONFIG_FS_DAX_PMD */

-s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
+s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
 	pgoff_t pgoff;
@@ -1219,19 +1220,23 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)

 	if (page_aligned)
 		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
-	else
+	else {
 		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
-	if (rc < 0) {
-		dax_read_unlock(id);
-		return rc;
-	}
-
-	if (!page_aligned) {
-		memset(kaddr + offset, 0, size);
+		if (rc < 0)
+			goto out;
+		if (iomap->addr != srcmap->addr) {
+			rc = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap,
+						kaddr);
+			if (rc < 0)
+				goto out;
+		} else
+			memset(kaddr + offset, 0, size);
 		dax_flush(iomap->dax_dev, kaddr + offset, size);
 	}
+
+out:
 	dax_read_unlock(id);
-	return size;
+	return rc < 0 ? rc : size;
 }

 static loff_t
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9023717c5188..fdaac4ba9b9d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -933,7 +933,7 @@ static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
 		s64 bytes;

 		if (IS_DAX(inode))
-			bytes = dax_iomap_zero(pos, length, iomap);
+			bytes = dax_iomap_zero(pos, length, iomap, srcmap);
 		else
 			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
 		if (bytes < 0)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b52f084aa643..3275e01ed33d 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -237,7 +237,8 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
+s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap,
+		   struct iomap *srcmap);
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
--
2.31.1
We replace the existing entry with the newly allocated one in case of
CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE so writeback marks
this entry as write-protected. This helps us with snapshots: new write
pagefaults after a snapshot trigger a CoW.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/dax.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6396f091e60b..98531c53d613 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -733,6 +733,10 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
 	return 0;
 }

+/* DAX Insert Flag: The state of the entry we insert */
+#define DAX_IF_DIRTY		(1 << 0)
+#define DAX_IF_COW		(1 << 1)
+
 /*
  * By this point grab_mapping_entry() has ensured that we have a locked entry
  * of the appropriate size so we don't have to worry about downgrading PMDs to
@@ -740,16 +744,19 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
  * already in the tree, we will skip the insertion and just dirty the PMD as
  * appropriate.
  */
-static void *dax_insert_entry(struct xa_state *xas,
-		struct address_space *mapping, struct vm_fault *vmf,
-		void *entry, pfn_t pfn, unsigned long flags, bool dirty)
+static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
+		void *entry, pfn_t pfn, unsigned long flags,
+		unsigned int insert_flags)
 {
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
 	void *new_entry = dax_make_entry(pfn, flags);
+	bool dirty = insert_flags & DAX_IF_DIRTY;
+	bool cow = insert_flags & DAX_IF_COW;

 	if (dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

-	if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
+	if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
 		unsigned long index = xas->xa_index;
 		/* we are replacing a zero page with block mapping */
 		if (dax_is_pmd_entry(entry))
@@ -761,7 +768,7 @@ static void *dax_insert_entry(struct xa_state *xas,

 	xas_reset(xas);
 	xas_lock_irq(xas);
-	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
+	if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
 		void *old;

 		dax_disassociate_entry(entry, mapping, false);
@@ -785,6 +792,9 @@ static void *dax_insert_entry(struct xa_state *xas,
 	if (dirty)
 		xas_set_mark(xas, PAGECACHE_TAG_DIRTY);

+	if (cow)
+		xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
+
 	xas_unlock_irq(xas);
 	return entry;
 }
@@ -1120,8 +1130,7 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr));
 	vm_fault_t ret;

-	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
-			DAX_ZERO_PAGE, false);
+	*entry = dax_insert_entry(xas, vmf, *entry, pfn, DAX_ZERO_PAGE, 0);

 	ret = vmf_insert_mixed(vmf->vma, vaddr, pfn);
 	trace_dax_load_hole(inode, vmf, ret);
@@ -1148,8 +1157,8 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 		goto fallback;

 	pfn = page_to_pfn_t(zero_page);
-	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
-			DAX_PMD | DAX_ZERO_PAGE, false);
+	*entry = dax_insert_entry(xas, vmf, *entry, pfn,
+			DAX_PMD | DAX_ZERO_PAGE, 0);

 	if (arch_needs_pgtable_deposit()) {
 		pgtable = pte_alloc_one(vma->vm_mm);
@@ -1459,6 +1468,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool sync = dax_fault_is_synchronous(flags, vmf->vma, iomap);
 	unsigned long entry_flags = pmd ? DAX_PMD : 0;
+	unsigned int insert_flags = 0;
 	int err = 0;
 	pfn_t pfn;
 	void *kaddr;
@@ -1481,8 +1491,15 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 	if (err)
 		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);

-	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
-				  write && !sync);
+	if (write) {
+		if (!sync)
+			insert_flags |= DAX_IF_DIRTY;
+		if (iomap->flags & IOMAP_F_SHARED)
+			insert_flags |= DAX_IF_COW;
+	}
+
+	*entry = dax_insert_entry(xas, vmf, *entry, pfn, entry_flags,
+				  insert_flags);

 	if (write &&
 	    srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
--
2.31.1
This patchset is an attempt to add CoW support for fsdax, taking XFS,
which has both the reflink and fsdax features, as an example.

Changes from V5:
 - Fix the lock order of xfs_inode in
   xfs_mmaplock_two_inodes_and_break_dax_layout()
 - Move dax_remap_file_range_prep() to fs/dax.c
 - Change the type of length to uint64_t in dax_iomap_cow_copy()
 - Fix a mistake in dax_iomap_zero()

Changes from V4:
 - Fix the mistake of breaking the dax layout for two inodes
 - Add a CONFIG_FS_DAX judgement for fsdax code in remap_range.c
 - Fix other small problems and mistakes

One of the key mechanisms that needs to be implemented in fsdax is CoW:
copy the data from the srcmap before we actually write data to the
destination iomap, and copy only the range in which data won't be
changed.

Another mechanism is range comparison. In the page cache case,
readpage() is used to load data on disk into the page cache in order to
be able to compare data. In the fsdax case, readpage() does not work,
so we need another way to compare data, with direct access support.

With the two mechanisms implemented in fsdax, we are able to make
reflink and fsdax work together in XFS.

Some of the patches are picked up from Goldwyn's patchset. I made some
changes to adapt them to this patchset.

(Rebased on v5.13-rc2 and patchset[1])
[1]: https://lkml.org/lkml/2021/4/22/575

Shiyang Ruan (7):
  fsdax: Introduce dax_iomap_cow_copy()
  fsdax: Replace mmap entry in case of CoW
  fsdax: Add dax_iomap_cow_copy() for dax_iomap_zero
  iomap: Introduce iomap_apply2() for operations on two files
  fsdax: Dedup file range to use a compare function
  fs/xfs: Handle CoW for fsdax write() path
  fs/xfs: Add dax dedupe support

 fs/dax.c               | 216 ++++++++++++++++++++++++++++++++++++-----
 fs/iomap/apply.c       |  52 ++++++++++
 fs/iomap/buffered-io.c |   2 +-
 fs/remap_range.c       |  36 +++++--
 fs/xfs/xfs_bmap_util.c |   3 +-
 fs/xfs/xfs_file.c      |  11 +--
 fs/xfs/xfs_inode.c     |  57 +++++++++++
 fs/xfs/xfs_inode.h     |   1 +
 fs/xfs/xfs_iomap.c     |  38 +++++++-
 fs/xfs/xfs_iomap.h     |  24 +++++
 fs/xfs/xfs_iops.c      |   7 +-
 fs/xfs/xfs_reflink.c   |  15 +--
 include/linux/dax.h    |  11 ++-
 include/linux/fs.h     |  12 ++-
 include/linux/iomap.h  |   7 +-
 15 files changed, 431 insertions(+), 61 deletions(-)
--
2.31.1
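As a reading aid, the write-path pieces of the series compose roughly as follows;
this is a condensed sketch in comment form using the function names introduced by
the patches above, with locking and error handling omitted:

    /*
     * fsdax CoW write path, condensed from the series:
     *
     * xfs_file_dax_write()
     *   dax_iomap_rw(iocb, from, &xfs_dax_write_iomap_ops)
     *     xfs_direct_write_iomap_begin()   // allocates a CoW fork extent;
     *                                      // srcmap != iomap when shared
     *     dax_iomap_actor()
     *       dax_iomap_cow_copy()           // copy unaligned head/tail from
     *                                      // srcmap into the new extent
     *       dax_copy_from_iter()           // the write itself
     *     xfs_dax_write_iomap_end()
     *       xfs_reflink_end_cow()          // remap new extent into the file
     */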
On Tue, 2021-05-18 at 18:49 -0700, Dan Williams wrote:
> On Tue, May 18, 2021 at 6:33 PM James Bottomley <jejb@linux.ibm.com>
> wrote:
> > On Tue, 2021-05-18 at 11:24 +0100, Mark Rutland wrote:
> > > On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > >
> > > > It is unsafe to allow saving of secretmem areas to the
> > > > hibernation snapshot as they would be visible after the resume
> > > > and this essentially will defeat the purpose of secret memory
> > > > mappings.
> > > >
> > > > Prevent hibernation whenever there are active secret memory
> > > > users.
> > >
> > > Have we thought about how this is going to work in practice, e.g.
> > > on mobile systems? It seems to me that there are a variety of
> > > common applications which might want to use this which people
> > > don't expect to inhibit hibernate (e.g. authentication agents,
> > > web browsers).
> >
> > If mobile systems require hibernate, then the choice is to disable
> > this functionality or implement a secure hibernation store. I
> > also thought most mobile hibernation was basically equivalent to
> > S3, in which case there's no actual writing of ram into storage, in
> > which case there's no security barrier and likely the inhibition
> > needs to be made a bit more specific to the suspend to disk case?
> >
> > > Are we happy to say that any userspace application can
> > > incidentally inhibit hibernate?
> >
> > Well, yes, for the laptop use case because we don't want suspend to
> > disk to be able to compromise the secret area. You can disable
> > this for mobile if you like, or work out how to implement hibernate
> > securely if you're really suspending to disk.
>
> Forgive me if this was already asked and answered. Why not document
> that secretmem is ephemeral in the case of hibernate and push the
> problem to userspace to disable hibernation? In other words
> hibernation causes applications to need to reload their secretmem, it
> will be destroyed on the way down and SIGBUS afterwards. That at
> least gives a system the flexibility to either sacrifice hibernate
> for secretmem (with a userspace controlled policy), or sacrifice
> secretmem using processes for hibernate.
Well, realistically, there are many possibilities for embedded if it
wants to use secret memory. However, not really having much of an
interest in the use cases, it's not really for Mike or me to be acting
as armchair fly half. I think the best we can do is demonstrate the
system for our use cases and let embedded kick the tyres for theirs if
they care, and if not they can disable the feature.
James
On Tue, May 18, 2021 at 05:44:22PM -0700, Andrew Morton wrote:
> On Tue, 18 May 2021 10:20:31 +0300 Mike Rapoport <rppt@kernel.org> wrote:
>
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > Introduce the "memfd_secret" system call, with the ability to create memory
> > areas visible only in the context of the owning process and not mapped, not
> > only to other processes, but in the kernel page tables as well.
> >
> > ...
> >
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -901,4 +901,9 @@ config KMAP_LOCAL
> >  # struct io_mapping based helper. Selected by drivers that need them
> >  config IO_MAPPING
> >  	bool
> > +
> > +config SECRETMEM
> > +	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
> > +	select STRICT_DEVMEM
> > +
> >  endmenu
>
> WARNING: unmet direct dependencies detected for STRICT_DEVMEM
>   Depends on [n]: MMU [=y] && DEVMEM [=n] && (ARCH_HAS_DEVMEM_IS_ALLOWED [=y] || GENERIC_LIB_DEVMEM_IS_ALLOWED [=n])
>   Selected by [y]:
>   - SECRETMEM [=y]
>
> so I went back to the v19 version, with

Ouch, sorry, I forgot to remove that hunk, v19 is the correct version.

> --- a/mm/Kconfig~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix
> +++ a/mm/Kconfig
> @@ -907,6 +907,5 @@ config IO_MAPPING
>
>  config SECRETMEM
>  	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
> -	select STRICT_DEVMEM
>
>  endmenu
> _

--
Sincerely yours,
Mike.
On Tue, May 18, 2021 at 6:33 PM James Bottomley <jejb@linux.ibm.com> wrote:
>
> On Tue, 2021-05-18 at 11:24 +0100, Mark Rutland wrote:
> > On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@linux.ibm.com>
> > >
> > > It is unsafe to allow saving of secretmem areas to the hibernation
> > > snapshot as they would be visible after the resume and this
> > > essentially will defeat the purpose of secret memory mappings.
> > >
> > > Prevent hibernation whenever there are active secret memory users.
> >
> > Have we thought about how this is going to work in practice, e.g. on
> > mobile systems? It seems to me that there are a variety of common
> > applications which might want to use this which people don't expect
> > to inhibit hibernate (e.g. authentication agents, web browsers).
>
> If mobile systems require hibernate, then the choice is to disable this
> functionality or implement a secure hibernation store. I also thought
> most mobile hibernation was basically equivalent to S3, in which case
> there's no actual writing of ram into storage, in which case there's no
> security barrier and likely the inhibition needs to be made a bit more
> specific to the suspend to disk case?
>
> > Are we happy to say that any userspace application can incidentally
> > inhibit hibernate?
>
> Well, yes, for the laptop use case because we don't want suspend to
> disk to be able to compromise the secret area. You can disable this
> for mobile if you like, or work out how to implement hibernate securely
> if you're really suspending to disk.
Forgive me if this was already asked and answered. Why not document
that secretmem is ephemeral in the case of hibernate and push the
problem to userspace to disable hibernation? In other words
hibernation causes applications to need to reload their secretmem, it
will be destroyed on the way down and SIGBUS afterwards. That at least
gives a system the flexibility to either sacrifice hibernate for
secretmem (with a userspace controlled policy), or sacrifice secretmem
using processes for hibernate.
On Tue, 2021-05-18 at 11:24 +0100, Mark Rutland wrote:
> On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > It is unsafe to allow saving of secretmem areas to the hibernation
> > snapshot as they would be visible after the resume and this
> > essentially will defeat the purpose of secret memory mappings.
> >
> > Prevent hibernation whenever there are active secret memory users.
>
> Have we thought about how this is going to work in practice, e.g. on
> mobile systems? It seems to me that there are a variety of common
> applications which might want to use this which people don't expect
> to inhibit hibernate (e.g. authentication agents, web browsers).

If mobile systems require hibernate, then the choice is to disable this
functionality or implement a secure hibernation store. I also thought
most mobile hibernation was basically equivalent to S3, in which case
there's no actual writing of ram into storage, in which case there's no
security barrier and likely the inhibition needs to be made a bit more
specific to the suspend to disk case?

> Are we happy to say that any userspace application can incidentally
> inhibit hibernate?

Well, yes, for the laptop use case because we don't want suspend to
disk to be able to compromise the secret area. You can disable this
for mobile if you like, or work out how to implement hibernate securely
if you're really suspending to disk.

James
On Tue, 18 May 2021 10:20:31 +0300 Mike Rapoport <rppt@kernel.org> wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Introduce "memfd_secret" system call with the ability to create memory
> areas visible only in the context of the owning process and not mapped not
> only to other processes but in the kernel page tables as well.
>
> ...
>
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -901,4 +901,9 @@ config KMAP_LOCAL
> # struct io_mapping based helper. Selected by drivers that need them
> config IO_MAPPING
> bool
> +
> +config SECRETMEM
> + def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
> + select STRICT_DEVMEM
> +
> endmenu
WARNING: unmet direct dependencies detected for STRICT_DEVMEM
Depends on [n]: MMU [=y] && DEVMEM [=n] && (ARCH_HAS_DEVMEM_IS_ALLOWED [=y] || GENERIC_LIB_DEVMEM_IS_ALLOWED [=n])
Selected by [y]:
- SECRETMEM [=y]
so I went back to the v19 version, with
--- a/mm/Kconfig~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix
+++ a/mm/Kconfig
@@ -907,6 +907,5 @@ config IO_MAPPING
config SECRETMEM
def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
- select STRICT_DEVMEM
endmenu
_
On Tue, May 18, 2021 at 3:26 PM Vishal Verma <vishal.l.verma@intel.com> wrote:
>
> The 'nvdimm' mailing list has moved from lists.01.org to
> lists.linux.dev. Update CONTRIBUTING.md and configure.ac to reflect
> this.
LGTM
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
The 'nvdimm' mailing list has moved from lists.01.org to
lists.linux.dev. Update CONTRIBUTING.md and configure.ac to reflect
this.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 configure.ac    | 2 +-
 CONTRIBUTING.md | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/configure.ac b/configure.ac
index 5ec8d2f..dc39dbe 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2,7 +2,7 @@ AC_PREREQ(2.60)
 m4_include([version.m4])
 AC_INIT([ndctl],
 	GIT_VERSION,
-	[linux-nvdimm@lists.01.org],
+	[nvdimm@lists.linux.dev],
 	[ndctl],
 	[https://github.com/pmem/ndctl])
 AC_CONFIG_SRCDIR([ndctl/lib/libndctl.c])
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4c29d31..4f4865d 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,13 +6,14 @@ The following is a set of guidelines that we adhere to, and request that
 contributors follow.

 1. The libnvdimm (kernel subsystem) and ndctl developers primarily use
-   the [linux-nvdimm](https://lists.01.org/postorius/lists/linux-nvdimm.lists.01.org/)
+   the [nvdimm](https://subspace.kernel.org/lists.linux.dev.html)
    mailing list for everything. It is recommended to send patches to
-   **```linux-nvdimm@lists.01.org```**
+   **```nvdimm@lists.linux.dev```**
+   An archive is available on [lore](https://lore.kernel.org/nvdimm/)

 1. Github [issues](https://github.com/pmem/ndctl/issues) are an acceptable
    way to report a problem, but if you just have a question,
-   [email](mailto:linux-nvdimm@lists.01.org) the above list.
+   [email](mailto:nvdimm@lists.linux.dev) the above list.

 1. We follow the Linux Kernel [Coding Style Guide][cs] as applicable.

base-commit: a2a6fda4d7e93044fca4c67870d2ff7e193d3cf1
prerequisite-patch-id: 8fc5baaf64b312b2459acea255740f79a23b76cd
--
2.31.1
On 5/18/2021 10:27 AM, Joao Martins wrote:
> [...]
> I am reconsidering part of the above. In general, yes, the meaning of devmap @align
> represents a slightly different variation of the device @align i.e. how the metadata is
> laid out **but** regardless of what kind of page table entries we use vmemmap.
>
> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax already
> validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE and PUD_SIZE)
> 2) the geometry of metadata is very much tied to the value we pick to @align at namespace
> provisioning -- not the "align" we might use at mmap() perhaps that's what you referred
> above? -- and 3) the value of geometry actually derives from dax device @align because we
> will need to create compound pages representing a page size of @align value.
>
> Using your example above: you're saying that dax->align == 1G is mapped with DEVMAP_PTEs,
> in reality the vmemmap is populated with PMDs/PUDs page tables (depending on what archs
> decide to do at vmemmap_populate()) and uses base pages as its metadata regardless of what
> device @align. In reality what we want to convey in @geometry is not page table sizes, but
> just the page size used for the vmemmap of the dax device. Additionally, limiting its
> value might not be desirable... if tomorrow Linux for some arch supports dax/nvdimm
> devices with 4M align or 64K align, the value of @geometry will have to reflect the 4M to
> create compound pages of order 10 for the said vmemmap.
>
> I am going to wait until you finish reviewing the remaining four patches of this series,
> but maybe this is a simple misnomer (s/align/geometry/) with a comment but without
> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and enum a
> setter/getter to audit its value? Thoughts?
>
> Joao
Good points there.
My understanding is that dax->align conveys the size granularity used
when carving out a namespace; it is a geometry attribute loosely akin
to the sector size of a spinning disk. I tend to think that the device
pagesize has almost no relation to "align": it is possible to have a
1G "align" with a 4K pagesize, or vice versa. That is, with the advent
of compound page support, the two concepts can be separated entirely.

How about adding a new option to "ndctl create-namespace" that
describes the device creator's desired pagesize, and another parameter
that says whether that pagesize is fixed or may be split up? That way,
if the intention is to never split up a 2M pagesize, it would be
possible to save a lot of metadata space on the device.
thanks,
-jane
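[Editor's note: to make the @geometry idea above concrete, here is a
minimal C sketch with hypothetical names; this is not the posted
series. @geometry would simply be the page size used for the device's
vmemmap, and the compound page order falls out of it, e.g. a 4M
geometry on 4K base pages gives order 10, as noted above.

    #include <linux/mm.h>	/* get_order() */
    #include <linux/sizes.h>	/* SZ_2M and friends */

    /* Hypothetical: the vmemmap page size of a dev_pagemap, kept
     * separate from the dax device @align used at provisioning. */
    struct pgmap_geometry_sketch {
    	unsigned long geometry;	/* e.g. SZ_4K, SZ_2M, SZ_4M, SZ_1G */
    };

    /* Order of the compound pages backing the device memory:
     * 2M -> 9, 4M -> 10, 1G -> 18 (with 4K base pages). */
    static inline unsigned int
    pgmap_geometry_order(struct pgmap_geometry_sketch *p)
    {
    	return get_order(p->geometry);
    }

Keeping the value as a page size rather than a DEVMAP_{PTE,PMD,PUD}
enum sidesteps the concern above about new aligns such as 4M or 64K.]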
On Tue 18-05-21 12:35:36, David Hildenbrand wrote:
> [...]
> I'm certainly interested; it would mean that we actually want to try
> recovering from VM_FAULT_OOM in various cases, and as you state, we
> might have to supply more information to make that work reliably.

Or maybe we want to get rid of VM_FAULT_OOM altogether... But this is
really tangent to this discussion. The only relation is that this
would be another place to check when somebody wants to go that
direction.

> Having that said, I guess what we have here is just the same as when
> our process fails to allocate a generic page table in
> __handle_mm_fault(), when we fail p4d_alloc() and friends ...

From a quick look it is really similar, in the sense that it
effectively never happens, and if it does then it certainly does the
wrong thing. The point I was trying to make is that there is likely no
need to go that way. Fundamentally, not being able to manipulate the
direct map for the page fault sounds like what SIGBUS should be used
for. From my POV it is similar to ENOSPC when a filesystem cannot
allocate metadata on the storage.
--
Michal Hocko
SUSE Labs
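[Editor's note: a minimal sketch of the fault-path behavior Michal is
arguing for, with illustrative names; it mirrors the shape of the
secretmem fault path quoted below, not the merged code. VM_FAULT_OOM
is reserved for a genuine page allocation failure, while a direct-map
manipulation failure becomes SIGBUS instead of being routed through
vmf_error():

    static vm_fault_t secretmem_fault_sketch(struct vm_fault *vmf)
    {
    	struct page *page;

    	page = alloc_page(GFP_HIGHUSER | __GFP_ZERO);
    	if (!page)
    		return VM_FAULT_OOM;	/* a real allocation failure */

    	if (set_direct_map_invalid_noflush(page, 1)) {
    		put_page(page);
    		/* Cannot unmap from the direct map: fail this
    		 * mapping with SIGBUS, do not invoke the OOM killer. */
    		return VM_FAULT_SIGBUS;
    	}

    	vmf->page = page;
    	return 0;
    }

The two-argument set_direct_map_invalid_noflush() follows the
signature used in the series under review.]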
On 18.05.21 12:31, Michal Hocko wrote:
> On Tue 18-05-21 12:06:42, David Hildenbrand wrote:
>> On 18.05.21 11:59, Michal Hocko wrote:
>>> On Sun 16-05-21 10:29:24, Mike Rapoport wrote:
>>>> On Fri, May 14, 2021 at 11:25:43AM +0200, David Hildenbrand wrote:
>>> [...]
>>>>>> +	if (!page)
>>>>>> +		return VM_FAULT_OOM;
>>>>>> +
>>>>>> +	err = set_direct_map_invalid_noflush(page, 1);
>>>>>> +	if (err) {
>>>>>> +		put_page(page);
>>>>>> +		return vmf_error(err);
>>>>>
>>>>> Would we want to translate that to a proper VM_FAULT_..., which would most
>>>>> probably be VM_FAULT_OOM when we fail to allocate a pagetable?
>>>>
>>>> That's what vmf_error does, it translates -ESOMETHING to VM_FAULT_XYZ.
>>>
>>> I haven't read through the rest but this has just caught my attention.
>>> Is it really reasonable to trigger the oom killer when you cannot
>>> invalidate the direct mapping. From a quick look at the code it is quite
>>> unlikely to see ENOMEM from that path (it allocates small pages) but this
>>> can become quite subtle over time. Shouldn't this simply SIGBUS if it
>>> cannot manipulate the direct mapping regardless of the underlying reason
>>> for that?
>>>
>>
>> OTOH, it means our kernel zones are depleted, so we'd better reclaim somehow
>> ...
>
> Killing a userspace process seems like a bad way around that.
>
> Although I have to say openly that I am not a great fan of VM_FAULT_OOM
> in general. It is usually the wrong way to handle the failure, because
> it happens outside of the allocation context, so you lose all the
> details (e.g. allocation constraints, NUMA policy, etc.). Also,
> whenever there is ENOMEM, the allocation itself has already made sure
> that all reclaim attempts have been exhausted. Just consider an
> allocation with GFP_NOWAIT/NO_RETRY or similar failing and propagating
> ENOMEM up the call stack. Turning that into the OOM killer sounds like
> a bad idea to me. But that is a more general topic. I have tried to
> bring this up in the past but there was not much interest in fixing
> it, as it was not a pressing problem...
>
I'm certainly interested; it would mean that we actually want to try
recovering from VM_FAULT_OOM in various cases, and as you state, we
might have to supply more information to make that work reliably.
Having that said, I guess what we have here is just the same as when our
process fails to allocate a generic page table in __handle_mm_fault(),
when we fail p4d_alloc() and friends ...
--
Thanks,
David / dhildenb
On Tue 18-05-21 12:06:42, David Hildenbrand wrote:
> On 18.05.21 11:59, Michal Hocko wrote:
> > On Sun 16-05-21 10:29:24, Mike Rapoport wrote:
> > > On Fri, May 14, 2021 at 11:25:43AM +0200, David Hildenbrand wrote:
> > [...]
> > > > > +	if (!page)
> > > > > +		return VM_FAULT_OOM;
> > > > > +
> > > > > +	err = set_direct_map_invalid_noflush(page, 1);
> > > > > +	if (err) {
> > > > > +		put_page(page);
> > > > > +		return vmf_error(err);
> > > >
> > > > Would we want to translate that to a proper VM_FAULT_..., which would most
> > > > probably be VM_FAULT_OOM when we fail to allocate a pagetable?
> > >
> > > That's what vmf_error does, it translates -ESOMETHING to VM_FAULT_XYZ.
> >
> > I haven't read through the rest but this has just caught my attention.
> > Is it really reasonable to trigger the oom killer when you cannot
> > invalidate the direct mapping. From a quick look at the code it is quite
> > > unlikely to see ENOMEM from that path (it allocates small pages) but this
> > > can become quite subtle over time. Shouldn't this simply SIGBUS if it
> > cannot manipulate the direct mapping regardless of the underlying reason
> > for that?
> >
>
> OTOH, it means our kernel zones are depleted, so we'd better reclaim somehow
> ...
Killing a userspace process seems like a bad way around that.
Although I have to say openly that I am not a great fan of VM_FAULT_OOM
in general. It is usually the wrong way to handle the failure, because
it happens outside of the allocation context, so you lose all the
details (e.g. allocation constraints, NUMA policy, etc.). Also,
whenever there is ENOMEM, the allocation itself has already made sure
that all reclaim attempts have been exhausted. Just consider an
allocation with GFP_NOWAIT/NO_RETRY or similar failing and propagating
ENOMEM up the call stack. Turning that into the OOM killer sounds like
a bad idea to me. But that is a more general topic. I have tried to
bring this up in the past but there was not much interest in fixing
it, as it was not a pressing problem...
--
Michal Hocko
SUSE Labs
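[Editor's note: for reference, vmf_error() in include/linux/mm.h at
the time of this discussion is essentially the following, which is why
an ENOMEM coming back from the direct-map path becomes VM_FAULT_OOM
while every other errno already becomes SIGBUS:

    static inline vm_fault_t vmf_error(int err)
    {
    	if (err == -ENOMEM)
    		return VM_FAULT_OOM;
    	return VM_FAULT_SIGBUS;
    }

So Michal's suggestion amounts to not feeding the direct-map errno
through this helper at all and returning VM_FAULT_SIGBUS directly.]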
On 18.05.21 12:24, Mark Rutland wrote:
> On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
>> From: Mike Rapoport <rppt@linux.ibm.com>
>>
>> It is unsafe to allow saving of secretmem areas to the hibernation
>> snapshot as they would be visible after the resume and this essentially
>> will defeat the purpose of secret memory mappings.
>>
>> Prevent hibernation whenever there are active secret memory users.
>
> Have we thought about how this is going to work in practice, e.g. on
> mobile systems? It seems to me that there are a variety of common
> applications which might want to use this which people don't expect to
> inhibit hibernate (e.g. authentication agents, web browsers).
>
> Are we happy to say that any userspace application can incidentally
> inhibit hibernate?
It's worth noting that secretmem has to be explicitly enabled by the
admin to even work.
--
Thanks,
David / dhildenb