From: Justin He <Justin.He@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Steve Capper <Steve.Capper@arm.com>,
	Mark Rutland <Mark.Rutland@arm.com>,
	Anshuman Khandual <Anshuman.Khandual@arm.com>,
	Hsin-Yi Wang <hsinyi@chromium.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Kees Cook <keescook@chromium.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	Kaly Xin <Kaly.Xin@arm.com>
Subject: RE: [RFC PATCH 0/6] decrease unnecessary gap due to pmem kmem alignment
Date: Wed, 29 Jul 2020 08:27:58 +0000
Message-ID: <AM6PR08MB40690714A2E77A7128B2B2ADF7700@AM6PR08MB4069.eurprd08.prod.outlook.com>
In-Reply-To: <D1981D47-61F1-42E9-A426-6FEF0EC310C8@redhat.com>

Hi David

> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Wednesday, July 29, 2020 2:37 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> <vishal.l.verma@intel.com>; Mike Rapoport <rppt@linux.ibm.com>; David
> Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>;
> Will Deacon <will@kernel.org>; Greg Kroah-Hartman
> <gregkh@linuxfoundation.org>; Rafael J. Wysocki <rafael@kernel.org>; Dave
> Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux-foundation.org>;
> Steve Capper <Steve.Capper@arm.com>; Mark Rutland <Mark.Rutland@arm.com>;
> Logan Gunthorpe <logang@deltatee.com>; Anshuman Khandual
> <Anshuman.Khandual@arm.com>; Hsin-Yi Wang <hsinyi@chromium.org>; Jason
> Gunthorpe <jgg@ziepe.ca>; Dave Hansen <dave.hansen@linux.intel.com>; Kees
> Cook <keescook@chromium.org>; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-nvdimm@lists.01.org; linux-mm@kvack.org; Wei
> Yang <richardw.yang@linux.intel.com>; Pankaj Gupta
> <pankaj.gupta.linux@gmail.com>; Ira Weiny <ira.weiny@intel.com>; Kaly Xin
> <Kaly.Xin@arm.com>
> Subject: Re: [RFC PATCH 0/6] decrease unnecessary gap due to pmem kmem
> alignment
> 
> 
> 
> > On 29.07.2020 at 05:35, Jia He <justin.he@arm.com> wrote:
> >
> > When enabling dax pmem as RAM device on arm64, I noticed that kmem_start
> > addr in dev_dax_kmem_probe() should be aligned w/ SECTION_SIZE_BITS(30),
> > i.e. 1G memblock size. Even though Dan Williams' sub-section patch series
> > [1] had been merged upstream, it was not helpful due to the hard limitation
> > of kmem_start:
> > $ndctl create-namespace -e namespace0.0 --mode=devdax --map=dev -s 2g -f -a 2M
> > $echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> > $echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> > $cat /proc/iomem
> > ...
> > 23c000000-23fffffff : System RAM
> >  23dd40000-23fecffff : reserved
> >  23fed0000-23fffffff : reserved
> > 240000000-33fdfffff : Persistent Memory
> >  240000000-2403fffff : namespace0.0
> >  280000000-2bfffffff : dax0.0          <- aligned with 1G boundary
> >    280000000-2bfffffff : System RAM
> > Hence there is a big gap between 0x2403fffff and 0x280000000 due to the 1G
> > alignment.
> >
> > Without this series, if qemu creates a 4G bytes nvdimm device, we can only
> > use 2G bytes for dax pmem(kmem) in the worst case.
> > e.g.
> > 240000000-33fdfffff : Persistent Memory
> > We can only use the memblock between [240000000, 2ffffffff] due to the hard
> > limitation. It wastes too much memory space.
> >
> > Decreasing the SECTION_SIZE_BITS on arm64 might be an alternative, but there
> > are too many concerns from other constraints, e.g. PAGE_SIZE, hugetlb,
> > SPARSEMEM_VMEMMAP, page bits in struct page ...
> >
> > Besides decreasing the SECTION_SIZE_BITS, we can also relax the kmem
> > alignment with memory_block_size_bytes().
> >
> > Tested on arm64 guest and x86 guest, qemu creates a 4G pmem device. dax pmem
> > can be used as ram with smaller gap. Also the kmem hotplug add/remove are
> > both tested on arm64/x86 guest.
> >
> 
> Hi,
> 
> I am not convinced this use case is worth such hacks (that’s what it is)
> for now. On real machines pmem is big - your example (losing 50%) is
> extreme.
> 
> I would much rather want to see the section size on arm64 reduced. I
> remember there were patches and that at least with a base page size of 4k
> it can be reduced drastically (64k base pages are more problematic due to
> the ridiculous THP size of 512M). But it could be that a section size of 512
> is possible on all configs right now.

Yes, I once investigated thoroughly how to reduce the section size on arm64.
There are several constraints on reducing SECTION_SIZE_BITS:
1. Since the number of bits in page->flags is limited, SECTION_SIZE_BITS
   can't be reduced too much.
2. Once CONFIG_SPARSEMEM_VMEMMAP is enabled, the section id is no longer
   stored in page->flags.
3. MAX_ORDER is bounded by SECTION_SIZE_BITS
 - 3.1 mmzone.h
#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
#error Allocator MAX_ORDER exceeds SECTION_SIZE
#endif
 - 3.2 hugepage_init()
MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER >= MAX_ORDER);

Hence when ARM64_4K_PAGES && CONFIG_SPARSEMEM_VMEMMAP are enabled,
SECTION_SIZE_BITS can be reduced to 27.
But with ARM64_64K_PAGES, 3.2 requires MAX_ORDER > HPAGE_PMD_ORDER = 29-16 = 13.
Then 3.1 requires SECTION_SIZE_BITS >= MAX_ORDER - 1 + 16 = MAX_ORDER + 15 > 28,
so SECTION_SIZE_BITS cannot be reduced to 27.

In short, if we were to reduce SECTION_SIZE_BITS on arm64, the Kconfig
might become very complicated, e.g. we would still need to consider the
ARM64_16K_PAGES case.

> 
> In the long term we might want to rework the memory block device model
> (eventually supporting old/new as discussed with Michal some time ago
> using a kernel parameter), dropping the fixed sizes

Has this been posted to the linux-mm mailing list? Sorry, I searched but didn't find it.


--
Cheers,
Justin (Jia He)



> - allowing sizes / addresses aligned with subsection size
> - drastically reducing the number of devices for boot memory to only a
> handful (e.g., one per resource / DIMM we can actually unplug again).
> 
> Long story short, I don’t like this hack.
> 
> 
> > This patch series (mainly patch6/6) is based on the fixing patch, ~v5.8-rc5 [2].
> >
> > [1] https://lkml.org/lkml/2019/6/19/67
> > [2] https://lkml.org/lkml/2020/7/8/1546
> > Jia He (6):
> >  mm/memory_hotplug: remove redundant memory block size alignment check
> >  resource: export find_next_iomem_res() helper
> >  mm/memory_hotplug: allow pmem kmem not to align with memory_block_size
> >  mm/page_alloc: adjust the start,end in dax pmem kmem case
> >  device-dax: relax the memblock size alignment for kmem_start
> >  arm64: fall back to vmemmap_populate_basepages if not aligned  with
> >    PMD_SIZE
> >
> > arch/arm64/mm/mmu.c    |  4 ++++
> > drivers/base/memory.c  | 24 ++++++++++++++++--------
> > drivers/dax/kmem.c     | 22 +++++++++++++---------
> > include/linux/ioport.h |  3 +++
> > kernel/resource.c      |  3 ++-
> > mm/memory_hotplug.c    | 39 ++++++++++++++++++++++++++++++++++++++-
> > mm/page_alloc.c        | 14 ++++++++++++++
> > 7 files changed, 90 insertions(+), 19 deletions(-)
> >
> > --
> > 2.17.1
> >

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org


