Linux-NVDIMM Archive on lore.kernel.org
 help / color / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Dan Williams <dan.j.williams@intel.com>, Jeff Moyer <jmoyer@redhat.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2 1/4] mm/memremap_pages: Introduce memremap_compat_align()
Date: Fri, 14 Feb 2020 08:56:22 +0530
Message-ID: <878sl5x31d.fsf@linux.ibm.com> (raw)
In-Reply-To: <CAPcyv4i8xNEsdX=8c2+ehf24U2AFcc-sKmAPS9UoVvm8z0aRng@mail.gmail.com>

Dan Williams <dan.j.williams@intel.com> writes:

> On Thu, Feb 13, 2020 at 8:58 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > The "sub-section memory hotplug" facility allows memremap_pages() users
>> > like libnvdimm to compensate for hardware platforms like x86 that have a
>> > section size larger than their hardware memory mapping granularity.  The
>> > compensation that sub-section support affords is being tolerant of
>> > physical memory resources shifting by units smaller (64MiB on x86) than
>> > the memory-hotplug section size (128 MiB). Where the platform
>> > physical-memory mapping granularity is limited by the number and
>> > capability of address-decode-registers in the memory controller.
>> >
>> > While the sub-section support allows memremap_pages() to operate on
>> > sub-section (2MiB) granularity, the Power architecture may still
>> > require 16MiB alignment on "!radix_enabled()" platforms.
>> >
>> > In order for libnvdimm to be able to detect and manage this per-arch
>> > limitation, introduce memremap_compat_align() as a common minimum
>> > alignment across all driver-facing memory-mapping interfaces, and let
>> > Power override it to 16MiB in the "!radix_enabled()" case.
>> >
>> > The assumption / requirement for 16MiB to be a viable
>> > memremap_compat_align() value is that Power does not have platforms
>> > where its equivalent of address-decode-registers never hardware remaps a
>> > persistent memory resource on smaller than 16MiB boundaries. Note that I
>> > tried my best to not add a new Kconfig symbol, but header include
>> > entanglements defeated the #ifndef memremap_compat_align design pattern
>> > and the need to export it defeats the __weak design pattern for arch
>> > overrides.
>> >
>> > Based on an initial patch by Aneesh.
>>
>> I have just a couple of questions.
>>
>> First, can you please add a comment above the generic implementation of
>> memremap_compat_align describing its purpose, and why a platform might
>> want to override it?
>
> Sure, how about:
>
> /*
>  * The memremap() and memremap_pages() interfaces are alternately used
>  * to map persistent memory namespaces. These interfaces place different
>  * constraints on the alignment and size of the mapping (namespace).
>  * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
>  * only map subsections (2MB), and at least one architecture (PowerPC)
>  * the minimum mapping granularity of memremap_pages() is 16MB.
>  *
>  * The role of memremap_compat_align() is to communicate the minimum
>  * arch supported alignment of a namespace such that it can freely
>  * switch modes without violating the arch constraint. Namely, do not
>  * allow a namespace to be PAGE_SIZE aligned since that namespace may be
>  * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
>  */
>
>> Second, I will take it at face value that the power architecture
>> requires a 16MB alignment, but it's not clear to me why mmu_linear_psize
>> was chosen to represent that.  What's the relationship, there, and can
>> we please have a comment explaining it?
>
> Aneesh, can you help here?

With hash translation, we map the direct-map range with just one page
size. Based on different restrictions as described in htab_init_page_sizes
we can end up choosing 16M, 64K or even 4K. We use the variable
mmu_linear_psize to indicate which page size we used for direct-map
range. 

ie we should do. 

 +unsigned long arch_namespace_align_size(void)
 +{
 +	unsigned long sub_section_size = (1UL << SUBSECTION_SHIFT);
 +
 +	if (radix_enabled())
 +		return sub_section_size;
 +	return max(sub_section_size, (1UL << mmu_psize_defs[mmu_linear_psize].shift));
 +
 +}
 +EXPORT_SYMBOL_GPL(arch_namespace_align_size);

as done here

https://lore.kernel.org/linux-nvdimm/20200120140749.69549-4-aneesh.kumar@linux.ibm.com/

Dan can you update the powerpc definition?

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

  reply index

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-13  0:48 [PATCH v2 0/4] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
2020-02-13  0:48 ` [PATCH v2 1/4] mm/memremap_pages: Introduce memremap_compat_align() Dan Williams
2020-02-13 16:57   ` Jeff Moyer
2020-02-13 18:26     ` Dan Williams
2020-02-14  3:26       ` Aneesh Kumar K.V [this message]
2020-02-14 20:59       ` Jeff Moyer
2020-02-14 23:05         ` Dan Williams
2020-02-13  0:48 ` [PATCH v2 2/4] libnvdimm/namespace: Enforce memremap_compat_align() Dan Williams
2020-02-13 19:16   ` Jeff Moyer
2020-02-13 21:55   ` Jeff Moyer
2020-02-13 22:43     ` Dan Williams
2020-02-14 16:44       ` Jeff Moyer
2020-02-14 16:55         ` Aneesh Kumar K.V
2020-02-13  0:48 ` [PATCH v2 3/4] libnvdimm/region: Introduce NDD_LABELING Dan Williams
2020-02-13 19:12   ` Jeff Moyer
2020-02-13  0:48 ` [PATCH v2 4/4] libnvdimm/region: Introduce an 'align' attribute Dan Williams
2020-02-14 20:19   ` Jeff Moyer
2020-02-14 21:03 ` [PATCH v2 0/4] libnvdimm: Cross-arch compatible namespace alignment Jeff Moyer

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878sl5x31d.fsf@linux.ibm.com \
    --to=aneesh.kumar@linux.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=dan.j.williams@intel.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NVDIMM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvdimm/0 linux-nvdimm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvdimm linux-nvdimm/ https://lore.kernel.org/linux-nvdimm \
		linux-nvdimm@lists.01.org
	public-inbox-index linux-nvdimm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.01.lists.linux-nvdimm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git