* [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P
@ 2019-03-27 21:36 Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 1/7] RISC-V: Implement sparsemem Logan Gunthorpe
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe

Hi,

This patchset enables P2P on the RISC-V architecture. To do this on the
current kernel, we only need to be able to back IO memory with struct
pages using devm_memremap_pages(). This requires ARCH_HAS_ZONE_DEVICE,
ARCH_ENABLE_MEMORY_HOTPLUG, and ARCH_ENABLE_MEMORY_HOTREMOVE; which in
turn requires ARCH_SPARSEMEM_ENABLE. We also need to ensure that the
hardware's IO memory regions can be covered by the linear mapping so
that there is a linear relationship between a virtual address and the
address of its struct page in the vmemmap region.
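
For some context, here is a rough sketch of the driver-side usage that
motivates all of this. It is only an illustration with a hypothetical
helper name, abbreviated from the v5.1 dev_pagemap API (which also
wants a percpu ref and a kill callback), not code from this series:

	/* Hypothetical driver snippet: ask the core to create struct
	 * pages for a PCI BAR so its memory can be used for P2P DMA.
	 */
	static int p2p_map_bar(struct pci_dev *pdev, struct dev_pagemap *pgmap)
	{
		void *addr;

		pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;	/* IO memory, not RAM */
		pgmap->res = pdev->resource[0];		/* BAR to back with pages */

		/*
		 * Hotplugs the BAR into ZONE_DEVICE and allocates struct
		 * pages in the vmemmap for it -- which is exactly what
		 * needs sparsemem, hotplug and pte_devmap below.
		 */
		addr = devm_memremap_pages(&pdev->dev, pgmap);
		return PTR_ERR_OR_ZERO(addr);
	}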

While our reason for doing this work is P2P, these features are more
generally useful and also enable other kernel features.

The first patch in the series implements sparsemem. It was already
submitted and reviewed last cycle but was never picked up; it has been
rebased onto v5.1-rc2.

Patches 2 through 4 rework the architecture's virtual address space
mapping, trying to get as much of the IO regions as possible covered by
the linear mapping. With Sv39 we do not have enough address space to
cover all the typical hardware regions, but we can cover the majority
of them.

Patches 5 and 6 implement memory hotplug and hot remove. These are
relatively straightforward additions, similar to those in other arches.

Patch 7 implements pte_devmap which allows us to set
ARCH_HAS_ZONE_DEVICE.

The patchset was tested in QEMU and on a HiFive Unleashed board.
However, we were unable to actually test P2P transactions with this
exact set because we have been unable to get PCI working with v5.1-rc2.
We were able to get it running on a 4.19 era kernel (with a bunch of
out-of-tree patches for PCI on a Microsemi PolarFire board).

This series is based on v5.1-rc2 and a git tree is available here:

https://github.com/sbates130272/linux-p2pmem riscv-p2p-v1

Thanks,

Logan

--

Logan Gunthorpe (7):
  RISC-V: Implement sparsemem
  RISC-V: doc: Add file describing the virtual memory map
  RISC-V: Rework kernel's virtual address space mapping
  RISC-V: Update page tables to cover the whole linear mapping
  RISC-V: Implement memory hotplug
  RISC-V: Implement memory hot remove
  RISC-V: Implement pte_devmap()

 Documentation/riscv/mm.txt            |  24 +++
 arch/riscv/Kconfig                    |  32 +++-
 arch/riscv/include/asm/page.h         |   2 -
 arch/riscv/include/asm/pgtable-64.h   |   2 +
 arch/riscv/include/asm/pgtable-bits.h |   8 +-
 arch/riscv/include/asm/pgtable.h      |  45 ++++-
 arch/riscv/include/asm/sparsemem.h    |  11 ++
 arch/riscv/kernel/setup.c             |   1 -
 arch/riscv/mm/init.c                  | 251 ++++++++++++++++++++++++--
 9 files changed, 354 insertions(+), 22 deletions(-)
 create mode 100644 Documentation/riscv/mm.txt
 create mode 100644 arch/riscv/include/asm/sparsemem.h

--
2.20.1


* [PATCH 1/7] RISC-V: Implement sparsemem
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map Logan Gunthorpe
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Andrew Waterman, Olof Johansson, Michael Clark,
	Rob Herring, Zong Li

This patch implements sparsemem support for RISC-V, which helps pave
the way for memory hotplug and eventually P2P support.

We introduce Kconfig options for virtual and physical address bits which
are used to calculate the size of the vmemmap and set the
MAX_PHYSMEM_BITS.

The vmemmap is located directly before the VMALLOC region and sized
such that we can allocate enough struct pages to cover all the virtual
address space in the system (similar to the way it's done in arm64).
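
As a rough sanity check on that sizing (assuming the Sv39 defaults used
here, i.e. CONFIG_VA_BITS=39, PAGE_SHIFT=12 and a 64-byte struct page,
so STRUCT_PAGE_MAX_SHIFT=6): half the virtual address space is 2^38
bytes, or 2^26 pages, and 2^26 struct pages of at most 2^6 bytes each
need 2^(39 - 12 - 1 + 6) = 2^32 bytes, i.e. a 4GB vmemmap region.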

During initialization, call memblocks_present() and sparse_init(),
and provide a stub for vmemmap_populate() (all of which is similar to
arm64).

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andrew Waterman <andrew@sifive.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Michael Clark <michaeljclark@mac.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Zong Li <zong@andestech.com>
---
 arch/riscv/Kconfig                 | 23 +++++++++++++++++++++++
 arch/riscv/include/asm/pgtable.h   | 21 +++++++++++++++++----
 arch/riscv/include/asm/sparsemem.h | 11 +++++++++++
 arch/riscv/mm/init.c               | 11 +++++++++++
 4 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/include/asm/sparsemem.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index eb56c82d8aa1..76fc340ae38e 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,12 +57,32 @@ config ZONE_DMA32
 	bool
 	default y if 64BIT
 
+config VA_BITS
+	int
+	default 32 if 32BIT
+	default 39 if 64BIT
+
+config PA_BITS
+	int
+	default 34 if 32BIT
+	default 56 if 64BIT
+
 config PAGE_OFFSET
 	hex
 	default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
 	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
 	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
 
+config ARCH_FLATMEM_ENABLE
+	def_bool y
+
+config ARCH_SPARSEMEM_ENABLE
+	def_bool y
+	select SPARSEMEM_VMEMMAP_ENABLE
+
+config ARCH_SELECT_MEMORY_MODEL
+	def_bool ARCH_SPARSEMEM_ENABLE
+
 config STACKTRACE_SUPPORT
 	def_bool y
 
@@ -97,6 +117,9 @@ config PGTABLE_LEVELS
 	default 3 if 64BIT
 	default 2
 
+config HAVE_ARCH_PFN_VALID
+	def_bool y
+
 menu "Platform type"
 
 choice
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1141364d990e..5a9fea00ba09 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -89,6 +89,23 @@ extern pgd_t swapper_pg_dir[];
 #define __S110	PAGE_SHARED_EXEC
 #define __S111	PAGE_SHARED_EXEC
 
+#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
+#define VMALLOC_END      (PAGE_OFFSET - 1)
+#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
+
+/*
+ * Roughly size the vmemmap space to be large enough to fit enough
+ * struct pages to map half the virtual address space. Then
+ * position vmemmap directly below the VMALLOC region.
+ */
+#define VMEMMAP_SHIFT \
+	(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
+#define VMEMMAP_SIZE	(1UL << VMEMMAP_SHIFT)
+#define VMEMMAP_END	(VMALLOC_START - 1)
+#define VMEMMAP_START	(VMALLOC_START - VMEMMAP_SIZE)
+
+#define vmemmap		((struct page *)VMEMMAP_START)
+
 /*
  * ZERO_PAGE is a global shared page that is always zero,
  * used for zero-mapped memory areas, etc.
@@ -412,10 +429,6 @@ static inline void pgtable_cache_init(void)
 	/* No page table caches to initialize */
 }
 
-#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END      (PAGE_OFFSET - 1)
-#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
-
 /*
  * Task size is 0x40000000000 for RV64 or 0xb800000 for RV32.
  * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
diff --git a/arch/riscv/include/asm/sparsemem.h b/arch/riscv/include/asm/sparsemem.h
new file mode 100644
index 000000000000..b58ba2d9ed6e
--- /dev/null
+++ b/arch/riscv/include/asm/sparsemem.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_SPARSEMEM_H
+#define __ASM_SPARSEMEM_H
+
+#ifdef CONFIG_SPARSEMEM
+#define MAX_PHYSMEM_BITS	CONFIG_PA_BITS
+#define SECTION_SIZE_BITS	27
+#endif /* CONFIG_SPARSEMEM */
+
+#endif /* __ASM_SPARSEMEM_H */
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b379a75ac6a6..b9d50031e78f 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -141,6 +141,9 @@ void __init setup_bootmem(void)
 				  PFN_PHYS(end_pfn - start_pfn),
 				  &memblock.memory, 0);
 	}
+
+	memblocks_present();
+	sparse_init();
 }
 
 pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
@@ -224,3 +227,11 @@ asmlinkage void __init setup_vm(void)
 				__pgprot(_PAGE_TABLE));
 #endif
 }
+
+#ifdef CONFIG_SPARSEMEM
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+			       struct vmem_altmap *altmap)
+{
+	return vmemmap_populate_basepages(start, end, node);
+}
+#endif
-- 
2.20.1



* [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 1/7] RISC-V: Implement sparsemem Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-03-28 11:49   ` Mike Rapoport
  2019-03-27 21:36 ` [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping Logan Gunthorpe
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Jonathan Corbet

This file is similar to the x86_64 equivalent (in
Documentation/x86/x86_64/mm.txt) and describes the virtual address
space usage for RISC-V.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
---
 Documentation/riscv/mm.txt | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/riscv/mm.txt

diff --git a/Documentation/riscv/mm.txt b/Documentation/riscv/mm.txt
new file mode 100644
index 000000000000..725dc85f2c65
--- /dev/null
+++ b/Documentation/riscv/mm.txt
@@ -0,0 +1,24 @@
+Sv32:
+
+00000000 - 7fffffff   user space, different per mm (2G)
+80000000 - 81ffffff   virtual memory map (32MB)
+82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
+c0000000 - ffffffff   direct mapping of lower phys. memory (1GB)
+
+Sv39:
+
+0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
+hole caused by [38:63] sign extension
+ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
+ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap space (64GB)
+ffffffd100000000 - ffffffffffffffff  linear mapping of physical space (188GB)
+  ffffffd200000000 - fffffff200000000 linear mapping of all physical memory
+
+The RISC-V architecture defines virtual address bits in multiples of nine
+starting from 39. These are referred to as Sv39, Sv48, Sv57 and Sv64.
+Currently only Sv39 is supported. Bits 63 through to the most-significant
+implemented bit are sign extended. This causes a hole between user space
+and kernel addresses if you interpret them as unsigned.
+
+The direct mapping covers as much of the physical memory space as
+possible so that it may cover some IO memory.
-- 
2.20.1



* [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 1/7] RISC-V: Implement sparsemem Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-03-28  5:39   ` Palmer Dabbelt
  2019-03-27 21:36 ` [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping Logan Gunthorpe
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Antony Pavlov, Stefan O'Rear, Anup Patel

The motivation for this is to support P2P transactions. P2P requires
having struct pages for IO memory which means the linear mapping must
be able to cover all of the IO regions. Unfortunately with Sv39 we are
not able to cover all the IO regions available on existing hardware,
but we can do better than what we currently do (which only covers
physical memory).

To this end, we restructure the kernel's virtual address space region.
We position the vmemmap at the beginning of the region (based on how
many virtual address bits we have) and the VMALLOC region comes
immediately after. The linear mapping then takes up the remaining space.
PAGE_OFFSET will need to be within the linear mapping but may not be
the start of the mapping, seeing as many machines don't have RAM at
address zero and we may still want to access lower addresses through
the linear mapping.

With these changes, the virtual memory map (with sparsemem enabled)
will be:

32-bit:

00000000 - 7fffffff   user space, different per mm (2G)
80000000 - 81ffffff   virtual memory map (32MB)
82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
c0000000 - ffffffff   direct mapping of all phys. memory (1GB)

64-bit, Sv39:

0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
hole caused by [38:63] sign extension
ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap space (64GB)
ffffffd100000000 - ffffffffffffffff  linear mapping of phys. space (188GB)
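
(As a cross-check of the sizes above: Sv39 leaves 2^38 = 256GB for the
kernel half of the address space, and 4GB of vmemmap plus 64GB of
vmalloc/ioremap leaves 256 - 4 - 64 = 188GB for the linear mapping.)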

On the SiFive hardware this allows us to provide struct pages for
the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas
and 172GB of the 240GB high I/O TileLink region. Once we progress to
Sv48 we should be able to cover all the available memory regions.

For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
2GB of address space, so we cannot cover any of the I/O regions above
it, but we do cover the lower I/O TileLink range.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Antony Pavlov <antonynpavlov@gmail.com>
Cc: "Stefan O'Rear" <sorear2@gmail.com>
Cc: Anup Patel <anup.patel@wdc.com>
---
 arch/riscv/Kconfig               |  2 +-
 arch/riscv/include/asm/page.h    |  2 --
 arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 76fc340ae38e..d21e6a12e8b6 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -71,7 +71,7 @@ config PAGE_OFFSET
 	hex
 	default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
 	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
-	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
+	default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
 
 config ARCH_FLATMEM_ENABLE
 	def_bool y
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2a546a52f02a..fa0b8058a246 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,8 +31,6 @@
  */
 #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
 
-#define KERN_VIRT_SIZE (-PAGE_OFFSET)
-
 #ifndef __ASSEMBLY__
 
 #define PAGE_UP(addr)	(((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 5a9fea00ba09..2a5070540996 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
 #define __S110	PAGE_SHARED_EXEC
 #define __S111	PAGE_SHARED_EXEC
 
-#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END      (PAGE_OFFSET - 1)
-#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
+#define KERN_SPACE_START	(-1UL << (CONFIG_VA_BITS - 1))
 
 /*
  * Roughly size the vmemmap space to be large enough to fit enough
  * struct pages to map half the virtual address space. Then
  * position vmemmap directly below the VMALLOC region.
  */
-#define VMEMMAP_SHIFT \
-	(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
-#define VMEMMAP_SIZE	(1UL << VMEMMAP_SHIFT)
-#define VMEMMAP_END	(VMALLOC_START - 1)
-#define VMEMMAP_START	(VMALLOC_START - VMEMMAP_SIZE)
-
+#ifdef CONFIG_SPARSEMEM
+#define VMEMMAP_SIZE	(UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
+				   STRUCT_PAGE_MAX_SHIFT))
+#define VMEMMAP_START	(KERN_SPACE_START)
+#define VMEMMAP_END	(VMEMMAP_START + VMEMMAP_SIZE - 1)
 #define vmemmap		((struct page *)VMEMMAP_START)
+#else
+#define VMEMMAP_END	KERN_SPACE_START
+#endif
+
+#ifdef CONFIG_32BIT
+#define VMALLOC_SIZE	((1UL << 30) - VMEMMAP_SIZE)
+#else
+#define VMALLOC_SIZE	(64UL << 30)
+#endif
+
+#define VMALLOC_START	(VMEMMAP_END + 1)
+#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE - 1)
 
 /*
  * ZERO_PAGE is a global shared page that is always zero,
-- 
2.20.1



* [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
                   ` (2 preceding siblings ...)
  2019-03-27 21:36 ` [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-03-28 10:03   ` Anup Patel
  2019-03-27 21:36 ` [PATCH 5/7] RISC-V: Implement memory hotplug Logan Gunthorpe
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Anup Patel, Atish Patra, Paul Walmsley, Zong Li,
	Mike Rapoport

With the new virtual address layout from the earlier patch, we want the
page tables to cover more of the linear mapping region. Instead of only
mapping from PAGE_OFFSET and up, we map starting from a PGDIR-aligned
version of va_pa_offset such that all of the physical address space
will be mapped.
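
A rough worked example with made-up numbers (not taken from any
particular board): suppose _start is at physical address 0x80000000 and
PAGE_OFFSET is 0xffffffd200000000. Then va_pa_offset is
0xffffffd180000000, which happens to already be PGDIR_SIZE (1GB)
aligned, so linear_start equals va_pa_offset and off is 0. The loops
then populate entries starting at VA linear_start, which corresponds to
physical address 0, so the IO regions below the kernel get linear-map
coverage as well as the RAM above it.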

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Zong Li <zongbox@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/riscv/kernel/setup.c |  1 -
 arch/riscv/mm/init.c      | 27 +++++++++++++++------------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index ecb654f6a79e..8286df8be31a 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -59,7 +59,6 @@ EXPORT_SYMBOL(empty_zero_page);
 /* The lucky hart to first increment this variable will boot the other cores */
 atomic_t hart_lottery;
 unsigned long boot_cpu_hartid;
-
 void __init parse_dtb(unsigned int hartid, void *dtb)
 {
 	if (early_init_dt_scan(__va(dtb)))
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b9d50031e78f..315194557c3d 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -150,8 +150,8 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 pgd_t trampoline_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 
 #ifndef __PAGETABLE_PMD_FOLDED
-#define NUM_SWAPPER_PMDS ((uintptr_t)-PAGE_OFFSET >> PGDIR_SHIFT)
-pmd_t swapper_pmd[PTRS_PER_PMD*((-PAGE_OFFSET)/PGDIR_SIZE)] __page_aligned_bss;
+#define NUM_SWAPPER_PMDS ((uintptr_t)-VMALLOC_END >> PGDIR_SHIFT)
+pmd_t swapper_pmd[PTRS_PER_PMD*((-VMALLOC_END)/PGDIR_SIZE)] __page_aligned_bss;
 pmd_t trampoline_pmd[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
 #endif
@@ -180,13 +180,18 @@ asmlinkage void __init setup_vm(void)
 	extern char _start;
 	uintptr_t i;
 	uintptr_t pa = (uintptr_t) &_start;
+	uintptr_t linear_start;
+	uintptr_t off;
 	pgprot_t prot = __pgprot(pgprot_val(PAGE_KERNEL) | _PAGE_EXEC);
 
 	va_pa_offset = PAGE_OFFSET - pa;
 	pfn_base = PFN_DOWN(pa);
 
+	linear_start = ALIGN_DOWN(va_pa_offset, PGDIR_SIZE);
+	off = linear_start - va_pa_offset;
+
 	/* Sanity check alignment and size */
-	BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
+	BUG_ON(linear_start <= VMALLOC_END);
 	BUG_ON((pa % (PAGE_SIZE * PTRS_PER_PTE)) != 0);
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -195,15 +200,14 @@ asmlinkage void __init setup_vm(void)
 			__pgprot(_PAGE_TABLE));
 	trampoline_pmd[0] = pfn_pmd(PFN_DOWN(pa), prot);
 
-	for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
-		size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
-
+	for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
+		size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
 		swapper_pg_dir[o] =
 			pfn_pgd(PFN_DOWN((uintptr_t)swapper_pmd) + i,
 				__pgprot(_PAGE_TABLE));
 	}
 	for (i = 0; i < ARRAY_SIZE(swapper_pmd); i++)
-		swapper_pmd[i] = pfn_pmd(PFN_DOWN(pa + i * PMD_SIZE), prot);
+		swapper_pmd[i] = pfn_pmd(PFN_DOWN(off + i * PMD_SIZE), prot);
 
 	swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
 		pfn_pgd(PFN_DOWN((uintptr_t)fixmap_pmd),
@@ -215,11 +219,10 @@ asmlinkage void __init setup_vm(void)
 	trampoline_pg_dir[(PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD] =
 		pfn_pgd(PFN_DOWN(pa), prot);
 
-	for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
-		size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
-
-		swapper_pg_dir[o] =
-			pfn_pgd(PFN_DOWN(pa + i * PGDIR_SIZE), prot);
+	for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
+		size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
+		swapper_pg_dir[o] = pfn_pgd(PFN_DOWN(off + i * PGDIR_SIZE),
+					    prot);
 	}
 
 	swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
-- 
2.20.1



* [PATCH 5/7] RISC-V: Implement memory hotplug
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
                   ` (3 preceding siblings ...)
  2019-03-27 21:36 ` [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 6/7] RISC-V: Implement memory hot remove Logan Gunthorpe
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Mike Rapoport, Anup Patel, Atish Patra,
	Paul Walmsley, Zong Li, Guo Ren

In order to define ARCH_ENABLE_MEMORY_HOTPLUG we need to implement
arch_add_memory() and vmemmap_free().

arch_add_memory() is very similar to the x86 version, except we don't
need to fuss with the mapping because we've already mapped the entire
linear region in riscv.

For now, vmemmap_free() is empty, which is similar to other arches that
don't implement hot remove.
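
(For reference, the reasoning behind the range check in the patch: in
the linear map VA = PA + va_pa_offset, so -va_pa_offset is the largest
physical address the mapping can reach before wrapping past the top of
the address space; any hotplugged range ending above that cannot be
given a linear-map address and is rejected with -EFAULT.)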

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Zong Li <zong@andestech.com>
Cc: Guo Ren <ren_guo@c-sky.com>
---
 arch/riscv/Kconfig   |  3 +++
 arch/riscv/mm/init.c | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d21e6a12e8b6..9477214a00e7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -83,6 +83,9 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool ARCH_SPARSEMEM_ENABLE
 
+config ARCH_ENABLE_MEMORY_HOTPLUG
+	def_bool y
+
 config STACKTRACE_SUPPORT
 	def_bool y
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 315194557c3d..0a54c3adf0ac 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -238,3 +238,30 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return vmemmap_populate_basepages(start, end, node);
 }
 #endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+void vmemmap_free(unsigned long start, unsigned long end,
+		  struct vmem_altmap *altmap)
+{
+}
+
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		    bool want_memblock)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	int ret;
+
+	if ((start + size) > -va_pa_offset) {
+		pr_err("Cannot hotplug memory from %08llx to %08llx as it doesn't fall within the linear mapping\n",
+		       start, start + size);
+		return -EFAULT;
+	}
+
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	WARN_ON_ONCE(ret);
+
+	return ret;
+}
+
+#endif
-- 
2.20.1



* [PATCH 6/7] RISC-V: Implement memory hot remove
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
                   ` (4 preceding siblings ...)
  2019-03-27 21:36 ` [PATCH 5/7] RISC-V: Implement memory hotplug Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-03-27 21:36 ` [PATCH 7/7] RISC-V: Implement pte_devmap() Logan Gunthorpe
  2019-04-24 23:23 ` [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Palmer Dabbelt
  7 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Mike Rapoport, Stefan O'Rear, Anup Patel,
	Zong Li, Guo Ren

Implementing arch_remove_memory() and filling in vmemmap_free() allows
us to declare ARCH_ENABLE_MEMORY_HOTREMOVE.

arch_remove_memory() is very similar to x86's, and we roughly copy the
remove_pagetable() function from x86, with the features we don't need
stripped out.
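
(For reference, the call path we expect to exercise this: hot remove
goes through arch_remove_memory() -> __remove_pages(), and the
sparsemem code then calls vmemmap_free() on the section's vmemmap
range; remove_pagetable() walks just that range and frees the kernel
page tables backing it.)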

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Stefan O'Rear" <sorear2@gmail.com>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: Zong Li <zong@andestech.com>
Cc: Guo Ren <ren_guo@c-sky.com>
---
 arch/riscv/Kconfig                  |   3 +
 arch/riscv/include/asm/pgtable-64.h |   2 +
 arch/riscv/include/asm/pgtable.h    |   5 +
 arch/riscv/mm/init.c                | 186 ++++++++++++++++++++++++++++
 4 files changed, 196 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9477214a00e7..2cb39b4d6d6b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -86,6 +86,9 @@ config ARCH_SELECT_MEMORY_MODEL
 config ARCH_ENABLE_MEMORY_HOTPLUG
 	def_bool y
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
 config STACKTRACE_SUPPORT
 	def_bool y
 
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 7aa0ea9bd8bb..d369be5467cf 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -67,6 +67,8 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
 }
 
 #define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
 
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 2a5070540996..e071e2be3a6c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -173,6 +173,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 	return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
 }
 
+static inline struct page *pud_page(pud_t pud)
+{
+	return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
+}
+
 /* Yields the page frame number (PFN) of a page table entry */
 static inline unsigned long pte_pfn(pte_t pte)
 {
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 0a54c3adf0ac..fffe1238434e 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -240,9 +240,175 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static void __meminit free_pagetable(struct page *page, int order)
+{
+	unsigned long magic;
+	unsigned int nr_pages = 1 << order;
+
+	/* bootmem page has reserved flag */
+	if (PageReserved(page)) {
+		__ClearPageReserved(page);
+
+		magic = (unsigned long)page->freelist;
+		if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+			while (nr_pages--)
+				put_page_bootmem(page++);
+		} else {
+			while (nr_pages--)
+				free_reserved_page(page++);
+		}
+	} else {
+		free_pages((unsigned long)page_address(page), order);
+	}
+}
+
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (!pte_none(*pte))
+			return;
+	}
+
+	/* free a pte table */
+	free_pagetable(pmd_page(*pmd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pmd_clear(pmd);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (!pmd_none(*pmd))
+			return;
+	}
+
+	/* free a pmd table */
+	free_pagetable(pud_page(*pud), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pud_clear(pud);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit
+remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte;
+
+	pte = pte_start + pte_index(addr);
+	for (; addr < end; addr = next, pte++) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		if (next > end)
+			next = end;
+
+		if (!pte_present(*pte))
+			continue;
+
+		free_pagetable(pte_page(*pte), 0);
+
+		spin_lock(&init_mm.page_table_lock);
+		pte_clear(&init_mm, addr, pte);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+
+	flush_tlb_all();
+}
+
+static void __meminit
+remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pte_t *pte_base;
+	pmd_t *pmd;
+
+	pmd = pmd_start + pmd_index(addr);
+	for (; addr < end; addr = next, pmd++) {
+		next = pmd_addr_end(addr, end);
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
+		remove_pte_table(pte_base, addr, next);
+		free_pte_table(pte_base, pmd);
+	}
+}
+
+static void __meminit
+remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pmd_t *pmd_base;
+	pud_t *pud;
+
+	pud = pud_start + pud_index(addr);
+	for (; addr < end; addr = next, pud++) {
+		next = pud_addr_end(addr, end);
+
+		if (!pud_present(*pud))
+			continue;
+
+		pmd_base = pmd_offset(pud, 0);
+		remove_pmd_table(pmd_base, addr, next);
+		free_pmd_table(pmd_base, pud);
+	}
+}
+
+static void __meminit
+remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pud_t *pud_base;
+	p4d_t *p4d;
+
+	p4d = p4d_start + p4d_index(addr);
+	for (; addr < end; addr = next, p4d++) {
+		next = p4d_addr_end(addr, end);
+
+		if (!p4d_present(*p4d))
+			continue;
+
+		pud_base = pud_offset(p4d, 0);
+		remove_pud_table(pud_base, addr, next);
+	}
+}
+
+/* start and end are both virtual address. */
+static void __meminit
+remove_pagetable(unsigned long start, unsigned long end)
+{
+	unsigned long next;
+	unsigned long addr;
+	pgd_t *pgd;
+	p4d_t *p4d;
+
+	for (addr = start; addr < end; addr = next) {
+		next = pgd_addr_end(addr, end);
+
+		pgd = pgd_offset_k(addr);
+		if (!pgd_present(*pgd))
+			continue;
+
+		p4d = p4d_offset(pgd, 0);
+		remove_p4d_table(p4d, addr, next);
+	}
+
+	flush_tlb_all();
+}
+
 void vmemmap_free(unsigned long start, unsigned long end,
 		  struct vmem_altmap *altmap)
 {
+	remove_pagetable(start, end);
 }
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
@@ -264,4 +430,24 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 	return ret;
 }
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int __ref arch_remove_memory(int nid, u64 start, u64 size,
+			     struct vmem_altmap *altmap)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct page *page = pfn_to_page(start_pfn);
+	struct zone *zone;
+	int ret;
+
+	if (altmap)
+		page += vmem_altmap_offset(altmap);
+	zone = page_zone(page);
+	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
+	WARN_ON_ONCE(ret);
+
+	return ret;
+}
+
+#endif
 #endif
-- 
2.20.1



* [PATCH 7/7] RISC-V: Implement pte_devmap()
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
                   ` (5 preceding siblings ...)
  2019-03-27 21:36 ` [PATCH 6/7] RISC-V: Implement memory hot remove Logan Gunthorpe
@ 2019-03-27 21:36 ` Logan Gunthorpe
  2019-04-24 23:23 ` [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P Palmer Dabbelt
  7 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-27 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-riscv
  Cc: Stephen Bates, Palmer Dabbelt, Christoph Hellwig, Albert Ou,
	Logan Gunthorpe, Laurent Dufour, Mike Rapoport, Anup Patel,
	Zong Li, Guo Ren, Stefan O'Rear

Use the 2nd software bit in the PTE as the devmap bit and add the
appropriate accessors.

This also allows us to set ARCH_HAS_ZONE_DEVICE.
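
A small illustration of what the new accessors do (hypothetical usage,
not code from this patch; the generic GUP and DAX paths are the real
consumers):

	pte_t pte = pfn_pte(pfn, PAGE_KERNEL);

	pte = pte_mkdevmap(pte);	/* sets _PAGE_SPECIAL | _PAGE_DEVMAP */
	WARN_ON(!pte_devmap(pte));	/* bit 9 (_PAGE_SOFT2) is now set */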

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: Zong Li <zong@andestech.com>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: "Stefan O'Rear" <sorear2@gmail.com>
---
 arch/riscv/Kconfig                    |  1 +
 arch/riscv/include/asm/pgtable-bits.h |  8 ++++++--
 arch/riscv/include/asm/pgtable.h      | 10 ++++++++++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 2cb39b4d6d6b..d365d7e17ed2 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -49,6 +49,7 @@ config RISCV
 	select GENERIC_IRQ_MULTI_HANDLER
 	select ARCH_HAS_PTE_SPECIAL
 	select HAVE_EBPF_JIT if 64BIT
+	select ARCH_HAS_ZONE_DEVICE
 
 config MMU
 	def_bool y
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index 470755cb7558..9555d419a46f 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -30,9 +30,11 @@
 #define _PAGE_GLOBAL    (1 << 5)    /* Global */
 #define _PAGE_ACCESSED  (1 << 6)    /* Set by hardware on any access */
 #define _PAGE_DIRTY     (1 << 7)    /* Set by hardware on any write */
-#define _PAGE_SOFT      (1 << 8)    /* Reserved for software */
+#define _PAGE_SOFT1     (1 << 8)    /* Reserved for software */
+#define _PAGE_SOFT2     (1 << 9)    /* Reserved for software */
 
-#define _PAGE_SPECIAL   _PAGE_SOFT
+#define _PAGE_SPECIAL   _PAGE_SOFT1
+#define _PAGE_DEVMAP	_PAGE_SOFT2
 #define _PAGE_TABLE     _PAGE_PRESENT
 
 /*
@@ -41,6 +43,8 @@
  */
 #define _PAGE_PROT_NONE _PAGE_READ
 
+#define __HAVE_ARCH_PTE_DEVMAP
+
 #define _PAGE_PFN_SHIFT 10
 
 /* Set of bits to preserve across pte_modify() */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index e071e2be3a6c..a0e6a5f8bbb5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -248,6 +248,11 @@ static inline int pte_special(pte_t pte)
 	return pte_val(pte) & _PAGE_SPECIAL;
 }
 
+static inline int pte_devmap(pte_t pte)
+{
+	return pte_val(pte) & _PAGE_DEVMAP;
+}
+
 /* static inline pte_t pte_rdprotect(pte_t pte) */
 
 static inline pte_t pte_wrprotect(pte_t pte)
@@ -289,6 +294,11 @@ static inline pte_t pte_mkspecial(pte_t pte)
 	return __pte(pte_val(pte) | _PAGE_SPECIAL);
 }
 
+static inline pte_t pte_mkdevmap(pte_t pte)
+{
+	return __pte(pte_val(pte) | _PAGE_SPECIAL | _PAGE_DEVMAP);
+}
+
 /* Modify page protection bits */
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-- 
2.20.1



* Re: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping
  2019-03-27 21:36 ` [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping Logan Gunthorpe
@ 2019-03-28  5:39   ` Palmer Dabbelt
  2019-03-28  6:28     ` Anup Patel
  0 siblings, 1 reply; 17+ messages in thread
From: Palmer Dabbelt @ 2019-03-28  5:39 UTC (permalink / raw)
  To: logang
  Cc: linux-kernel, linux-riscv, sbates, Christoph Hellwig, aou,
	logang, antonynpavlov, sorear2, anup.patel

On Wed, 27 Mar 2019 14:36:39 PDT (-0700), logang@deltatee.com wrote:
> The motivation for this is to support P2P transactions. P2P requires
> having struct pages for IO memory which means the linear mapping must
> be able to cover all of the IO regions. Unfortunately with Sv39 we are
> not able to cover all the IO regions available on existing hardware,
> but we can do better than what we currently do (which only cover's
> physical memory).
>
> To this end, we restructure the kernel's virtual address space region.
> We position the vmemmap at the beginning of the region (based on how
> many virtual address bits we have) and the VMALLOC region comes
> immediately after. The linear mapping then takes up the remaining space.
> PAGE_OFFSET will need to be within the linear mapping but may not be
> the start of the mapping seeing many machines don't have RAM at address
> zero and we may still want to access lower addresses through the
> linear mapping.
>
> With these changes, on a 64-bit system the virtual memory map (with
> sparsemem enabled) will be:
>
> 32-bit:
>
> 00000000 - 7fffffff   user space, different per mm (2G)
> 80000000 - 81ffffff   virtual memory map (32MB)
> 82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
> c0000000 - ffffffff   direct mapping of all phys. memory (1GB)
>
> 64-bit, Sv39:
>
> 0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
> hole caused by [38:63] sign extension
> ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
> ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap spac (64GB)
> ffffffd100000000 - ffffffffffffffff  linear mapping of phys. space (188GB)
>
> On the Sifive hardware this allows us to provide struct pages for
> the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas
> and 172GB of 240GB of the high I/O TileLink region. Once we progress to
> Sv48 we should be able to cover all the available memory regions..
>
> For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
> 2GB of address space, so we cannot cover the any of the I/O regions
> that are higher than it but we do cover the lower I/O TileLink range.

IIRC there was another patch floating around to fix an issue with
overlapping regions in the 32-bit port; did you also fix that issue?
It's somewhere in my email queue...

> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: Antony Pavlov <antonynpavlov@gmail.com>
> Cc: "Stefan O'Rear" <sorear2@gmail.com>
> Cc: Anup Patel <anup.patel@wdc.com>
> ---
>  arch/riscv/Kconfig               |  2 +-
>  arch/riscv/include/asm/page.h    |  2 --
>  arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
>  3 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 76fc340ae38e..d21e6a12e8b6 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -71,7 +71,7 @@ config PAGE_OFFSET
>  	hex
>  	default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
>  	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> -	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> +	default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
>
>  config ARCH_FLATMEM_ENABLE
>  	def_bool y
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 2a546a52f02a..fa0b8058a246 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,8 +31,6 @@
>   */
>  #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
>
> -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> -
>  #ifndef __ASSEMBLY__
>
>  #define PAGE_UP(addr)	(((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 5a9fea00ba09..2a5070540996 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
>  #define __S110	PAGE_SHARED_EXEC
>  #define __S111	PAGE_SHARED_EXEC
>
> -#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
> -#define VMALLOC_END      (PAGE_OFFSET - 1)
> -#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
> +#define KERN_SPACE_START	(-1UL << (CONFIG_VA_BITS - 1))
>
>  /*
>   * Roughly size the vmemmap space to be large enough to fit enough
>   * struct pages to map half the virtual address space. Then
>   * position vmemmap directly below the VMALLOC region.
>   */
> -#define VMEMMAP_SHIFT \
> -	(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> -#define VMEMMAP_SIZE	(1UL << VMEMMAP_SHIFT)
> -#define VMEMMAP_END	(VMALLOC_START - 1)
> -#define VMEMMAP_START	(VMALLOC_START - VMEMMAP_SIZE)
> -
> +#ifdef CONFIG_SPARSEMEM
> +#define VMEMMAP_SIZE	(UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> +				   STRUCT_PAGE_MAX_SHIFT))
> +#define VMEMMAP_START	(KERN_SPACE_START)
> +#define VMEMMAP_END	(VMEMMAP_START + VMEMMAP_SIZE - 1)
>  #define vmemmap		((struct page *)VMEMMAP_START)
> +#else
> +#define VMEMMAP_END	KERN_SPACE_START
> +#endif
> +
> +#ifdef CONFIG_32BIT
> +#define VMALLOC_SIZE	((1UL << 30) - VMEMMAP_SIZE)
> +#else
> +#define VMALLOC_SIZE	(64UL << 30)
> +#endif
> +
> +#define VMALLOC_START	(VMEMMAP_END + 1)
> +#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE - 1)
>
>  /*
>   * ZERO_PAGE is a global shared page that is always zero,


* Re: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping
  2019-03-28  5:39   ` Palmer Dabbelt
@ 2019-03-28  6:28     ` Anup Patel
  2019-03-28 15:54       ` Logan Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Anup Patel @ 2019-03-28  6:28 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: logang, linux-kernel@vger.kernel.org List, linux-riscv, sbates,
	Christoph Hellwig, Albert Ou, antonynpavlov, sorear2, Anup Patel

On Thu, Mar 28, 2019 at 11:09 AM Palmer Dabbelt <palmer@sifive.com> wrote:
>
> On Wed, 27 Mar 2019 14:36:39 PDT (-0700), logang@deltatee.com wrote:
> > The motivation for this is to support P2P transactions. P2P requires
> > having struct pages for IO memory which means the linear mapping must
> > be able to cover all of the IO regions. Unfortunately with Sv39 we are
> > not able to cover all the IO regions available on existing hardware,
> > but we can do better than what we currently do (which only cover's
> > physical memory).
> >
> > To this end, we restructure the kernel's virtual address space region.
> > We position the vmemmap at the beginning of the region (based on how
> > many virtual address bits we have) and the VMALLOC region comes
> > immediately after. The linear mapping then takes up the remaining space.
> > PAGE_OFFSET will need to be within the linear mapping but may not be
> > the start of the mapping seeing many machines don't have RAM at address
> > zero and we may still want to access lower addresses through the
> > linear mapping.
> >
> > With these changes, on a 64-bit system the virtual memory map (with
> > sparsemem enabled) will be:
> >
> > 32-bit:
> >
> > 00000000 - 7fffffff   user space, different per mm (2G)
> > 80000000 - 81ffffff   virtual memory map (32MB)
> > 82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
> > c0000000 - ffffffff   direct mapping of all phys. memory (1GB)
> >
> > 64-bit, Sv39:
> >
> > 0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
> > hole caused by [38:63] sign extension
> > ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
> > ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap spac (64GB)
> > ffffffd100000000 - ffffffffffffffff  linear mapping of phys. space (188GB)
> >
> > On the Sifive hardware this allows us to provide struct pages for
> > the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas
> > and 172GB of 240GB of the high I/O TileLink region. Once we progress to
> > Sv48 we should be able to cover all the available memory regions..
> >
> > For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
> > 2GB of address space, so we cannot cover the any of the I/O regions
> > that are higher than it but we do cover the lower I/O TileLink range.
>
> IIRC there was another patch floating around to fix an issue with overlapping
> regions in the 32-bit port, did you also fix that issue?  It's somewhere in my
> email queue...

That was a patch I submitted to fix overlapping FIXMAP and VMALLOC
regions.

This patch does not consider the FIXMAP region.

I suggest we introduce asm/memory.h containing all the critical defines
related to the virtual memory layout. This header should also have
detailed comments describing that layout.

>
> > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> > Cc: Palmer Dabbelt <palmer@sifive.com>
> > Cc: Albert Ou <aou@eecs.berkeley.edu>
> > Cc: Antony Pavlov <antonynpavlov@gmail.com>
> > Cc: "Stefan O'Rear" <sorear2@gmail.com>
> > Cc: Anup Patel <anup.patel@wdc.com>
> > ---
> >  arch/riscv/Kconfig               |  2 +-
> >  arch/riscv/include/asm/page.h    |  2 --
> >  arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
> >  3 files changed, 19 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 76fc340ae38e..d21e6a12e8b6 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -71,7 +71,7 @@ config PAGE_OFFSET
> >       hex
> >       default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
> >       default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> > -     default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> > +     default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
> >
> >  config ARCH_FLATMEM_ENABLE
> >       def_bool y
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 2a546a52f02a..fa0b8058a246 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -31,8 +31,6 @@
> >   */
> >  #define PAGE_OFFSET          _AC(CONFIG_PAGE_OFFSET, UL)
> >
> > -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> > -
> >  #ifndef __ASSEMBLY__
> >
> >  #define PAGE_UP(addr)        (((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 5a9fea00ba09..2a5070540996 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
> >  #define __S110       PAGE_SHARED_EXEC
> >  #define __S111       PAGE_SHARED_EXEC
> >
> > -#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
> > -#define VMALLOC_END      (PAGE_OFFSET - 1)
> > -#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
> > +#define KERN_SPACE_START     (-1UL << (CONFIG_VA_BITS - 1))
> >
> >  /*
> >   * Roughly size the vmemmap space to be large enough to fit enough
> >   * struct pages to map half the virtual address space. Then
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> > -#define VMEMMAP_SHIFT \
> > -     (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> > -#define VMEMMAP_SIZE (1UL << VMEMMAP_SHIFT)
> > -#define VMEMMAP_END  (VMALLOC_START - 1)
> > -#define VMEMMAP_START        (VMALLOC_START - VMEMMAP_SIZE)
> > -
> > +#ifdef CONFIG_SPARSEMEM
> > +#define VMEMMAP_SIZE (UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> > +                                STRUCT_PAGE_MAX_SHIFT))
> > +#define VMEMMAP_START        (KERN_SPACE_START)
> > +#define VMEMMAP_END  (VMEMMAP_START + VMEMMAP_SIZE - 1)
> >  #define vmemmap              ((struct page *)VMEMMAP_START)
> > +#else
> > +#define VMEMMAP_END  KERN_SPACE_START
> > +#endif
> > +
> > +#ifdef CONFIG_32BIT
> > +#define VMALLOC_SIZE ((1UL << 30) - VMEMMAP_SIZE)
> > +#else
> > +#define VMALLOC_SIZE (64UL << 30)
> > +#endif
> > +
> > +#define VMALLOC_START        (VMEMMAP_END + 1)
> > +#define VMALLOC_END  (VMALLOC_START + VMALLOC_SIZE - 1)
> >
> >  /*
> >   * ZERO_PAGE is a global shared page that is always zero,

Regards,
Anup


* Re: [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping
  2019-03-27 21:36 ` [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping Logan Gunthorpe
@ 2019-03-28 10:03   ` Anup Patel
  2019-03-28 18:24     ` Logan Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Anup Patel @ 2019-03-28 10:03 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel@vger.kernel.org List, linux-riscv, Stephen Bates,
	Palmer Dabbelt, Christoph Hellwig, Albert Ou, Anup Patel,
	Atish Patra, Paul Walmsley, Zong Li, Mike Rapoport

On Thu, Mar 28, 2019 at 3:06 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> With the new virtual address changes in an earlier patch, we want the
> page tables to cover more of the linear mapping region. Instead of
> only mapping from PAGE_OFFSET and up, we instead map starting
> from an aligned version of va_pa_offset such that all of the physical
> address space will be mapped.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: Anup Patel <anup.patel@wdc.com>
> Cc: Atish Patra <atish.patra@wdc.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Zong Li <zongbox@gmail.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/riscv/kernel/setup.c |  1 -
>  arch/riscv/mm/init.c      | 27 +++++++++++++++------------
>  2 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index ecb654f6a79e..8286df8be31a 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -59,7 +59,6 @@ EXPORT_SYMBOL(empty_zero_page);
>  /* The lucky hart to first increment this variable will boot the other cores */
>  atomic_t hart_lottery;
>  unsigned long boot_cpu_hartid;
> -
>  void __init parse_dtb(unsigned int hartid, void *dtb)
>  {
>         if (early_init_dt_scan(__va(dtb)))
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index b9d50031e78f..315194557c3d 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -150,8 +150,8 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>  pgd_t trampoline_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>
>  #ifndef __PAGETABLE_PMD_FOLDED
> -#define NUM_SWAPPER_PMDS ((uintptr_t)-PAGE_OFFSET >> PGDIR_SHIFT)
> -pmd_t swapper_pmd[PTRS_PER_PMD*((-PAGE_OFFSET)/PGDIR_SIZE)] __page_aligned_bss;
> +#define NUM_SWAPPER_PMDS ((uintptr_t)-VMALLOC_END >> PGDIR_SHIFT)
> +pmd_t swapper_pmd[PTRS_PER_PMD*((-VMALLOC_END)/PGDIR_SIZE)] __page_aligned_bss;
>  pmd_t trampoline_pmd[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>  pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
>  #endif
> @@ -180,13 +180,18 @@ asmlinkage void __init setup_vm(void)
>         extern char _start;
>         uintptr_t i;
>         uintptr_t pa = (uintptr_t) &_start;
> +       uintptr_t linear_start;
> +       uintptr_t off;
>         pgprot_t prot = __pgprot(pgprot_val(PAGE_KERNEL) | _PAGE_EXEC);
>
>         va_pa_offset = PAGE_OFFSET - pa;
>         pfn_base = PFN_DOWN(pa);
>
> +       linear_start = ALIGN_DOWN(va_pa_offset, PGDIR_SIZE);
> +       off = linear_start - va_pa_offset;
> +
>         /* Sanity check alignment and size */
> -       BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
> +       BUG_ON(linear_start <= VMALLOC_END);
>         BUG_ON((pa % (PAGE_SIZE * PTRS_PER_PTE)) != 0);
>
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -195,15 +200,14 @@ asmlinkage void __init setup_vm(void)
>                         __pgprot(_PAGE_TABLE));
>         trampoline_pmd[0] = pfn_pmd(PFN_DOWN(pa), prot);
>
> -       for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
> -               size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> -
> +       for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
> +               size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
>                 swapper_pg_dir[o] =
>                         pfn_pgd(PFN_DOWN((uintptr_t)swapper_pmd) + i,
>                                 __pgprot(_PAGE_TABLE));
>         }
>         for (i = 0; i < ARRAY_SIZE(swapper_pmd); i++)
> -               swapper_pmd[i] = pfn_pmd(PFN_DOWN(pa + i * PMD_SIZE), prot);
> +               swapper_pmd[i] = pfn_pmd(PFN_DOWN(off + i * PMD_SIZE), prot);
>
>         swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
>                 pfn_pgd(PFN_DOWN((uintptr_t)fixmap_pmd),
> @@ -215,11 +219,10 @@ asmlinkage void __init setup_vm(void)
>         trampoline_pg_dir[(PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD] =
>                 pfn_pgd(PFN_DOWN(pa), prot);
>
> -       for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
> -               size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> -
> -               swapper_pg_dir[o] =
> -                       pfn_pgd(PFN_DOWN(pa + i * PGDIR_SIZE), prot);
> +       for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
> +               size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> +               swapper_pg_dir[o] = pfn_pgd(PFN_DOWN(off + i * PGDIR_SIZE),
> +                                           prot);
>         }
>
>         swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
> --
> 2.20.1
>

I understand that this patch is in line with your virtual memory layout
cleanup, but the way we map virtual memory in swapper_pg_dir is bound
to change.

We should not be mapping the complete virtual memory range in
swapper_pg_dir; rather, we should only map based on the amount of RAM
available.

Refer to https://www.lkml.org/lkml/2019/3/24/3

setup_vm() should only map vmlinux_start to vmlinux_end plus the FDT.
The complete virtual memory mapping should be done in setup_vm_final()
(called from paging_init()), after early parsing of the FDT tells us
which memory banks are available.

Regards,
Anup


* Re: [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map
  2019-03-27 21:36 ` [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map Logan Gunthorpe
@ 2019-03-28 11:49   ` Mike Rapoport
  2019-03-28 15:51     ` Logan Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2019-03-28 11:49 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-riscv, Albert Ou, Jonathan Corbet,
	Palmer Dabbelt, Stephen Bates, Christoph Hellwig

Hi,

On Wed, Mar 27, 2019 at 03:36:38PM -0600, Logan Gunthorpe wrote:
> This file is similar to the x86_64 equivalent (in
> Documentation/x86/x86_64/mm.txt) and describes the virtuas address space
> usage for RISC-V.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> ---
>  Documentation/riscv/mm.txt | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>  create mode 100644 Documentation/riscv/mm.txt
> 
> diff --git a/Documentation/riscv/mm.txt b/Documentation/riscv/mm.txt
> new file mode 100644
> index 000000000000..725dc85f2c65
> --- /dev/null
> +++ b/Documentation/riscv/mm.txt
> @@ -0,0 +1,24 @@
> +Sv32:
> +
> +00000000 - 7fffffff   user space, different per mm (2G)
> +80000000 - 81ffffff   virtual memory map (32MB)
> +82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
> +c0000000 - ffffffff   direct mapping of lower phys. memory (1GB)
> +
> +Sv39:
> +
> +0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
> +hole caused by [38:63] sign extension
> +ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
> +ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap spac (64GB)
> +ffffffd100000000 - ffffffffffffffff  linear mapping of physical space (188GB)
> +  ffffffd200000000 - fffffff200000000   linear mapping of all physical memory
> +
> +The RISC-V architecture defines virtual address bits in multiples of nine
> +starting from 39. These are referred to as Sv39, Sv48, Sv57 and Sv64.
> +Currently only Sv39 is supported. Bits 63 through to the most-significant
> +implemented bit are sign extended. This causes a hole between user space
> +and kernel addresses if you interpret them as unsigned.
> +
> +The direct mapping covers as much of the physical memory space as
> +possible so that it may cover some IO memory.

Please move the text before the tables, so that the meaning of Sv32 and
Sv39 is clear.

> -- 
> 2.20.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map
  2019-03-28 11:49   ` Mike Rapoport
@ 2019-03-28 15:51     ` Logan Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-28 15:51 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Albert Ou, Jonathan Corbet, Palmer Dabbelt, linux-kernel,
	Stephen Bates, linux-riscv, Christoph Hellwig



On 2019-03-28 5:49 a.m., Mike Rapoport wrote:
>> +
>> +The direct mapping covers as much of the physical memory space as
>> +possible so that it may cover some IO memory.
> 
> Please move the text before the tables, so that meaning of Sv32 and Sv39
> would be clear.
> 

Ok, thanks. I've queued up this change for a v2.
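
One thing the v2 text could perhaps spell out is that the direct mapping
is just a fixed offset between physical and virtual addresses, i.e.
something like this (illustrative only, not the exact arch/riscv macros):

/* Illustrative only -- not the exact definitions in asm/page.h. */
extern unsigned long va_pa_offset;	/* PAGE_OFFSET minus the physical base */

#define linear_virt(pa)	((void *)((unsigned long)(pa) + va_pa_offset))
#define linear_phys(va)	((unsigned long)(va) - va_pa_offset)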

Logan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping
  2019-03-28  6:28     ` Anup Patel
@ 2019-03-28 15:54       ` Logan Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-28 15:54 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt
  Cc: sorear2, Albert Ou, Anup Patel,
	linux-kernel@vger.kernel.org List, sbates, antonynpavlov,
	linux-riscv, Christoph Hellwig



On 2019-03-28 12:28 a.m., Anup Patel wrote:
>>> For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
>>> 2GB of address space, so we cannot cover any of the I/O regions that
>>> are higher than it, but we do cover the lower I/O TileLink range.
>>
>> IIRC there was another patch floating around to fix an issue with overlapping
>> regions in the 32-bit port; did you also fix that issue?  It's somewhere in my
>> email queue...
> 
> That was a patch I submitted to fix overlapping FIXMAP and VMALLOC
> regions.
> 
> This patch does not consider FIXMAP region.

Correct.

> I suggest we introduce asm/memory.h holding all the critical defines
> related to the virtual memory layout. This header should also have
> detailed comments describing the layout.

Seems like a sensible cleanup; the defines for this stuff do seem to be
scattered all over the place. I'm not entirely clear on what would belong
in asm/memory.h, so I'll leave that cleanup to you.
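
For reference, I imagine something roughly like the following (purely a
sketch using the Sv39 numbers from the documentation patch; the header
and the macro names are made up, not existing code):

/* Hypothetical asm/memory.h -- a sketch, not existing kernel code. */
#ifndef _ASM_RISCV_MEMORY_H
#define _ASM_RISCV_MEMORY_H

#include <linux/const.h>

/*
 * Sv39 kernel virtual memory layout (see Documentation/riscv/mm.txt):
 *
 *   ffffffc000000000 - ffffffc0ffffffff   vmemmap          (4GB)
 *   ffffffc100000000 - ffffffd0ffffffff   vmalloc/ioremap  (64GB)
 *   ffffffd100000000 - ffffffffffffffff   linear mapping   (188GB)
 */
#define KERN_VMEMMAP_START	_AC(0xffffffc000000000, UL)
#define KERN_VMALLOC_START	_AC(0xffffffc100000000, UL)
#define KERN_VMALLOC_END	_AC(0xffffffd100000000, UL)
#define KERN_LINEAR_START	_AC(0xffffffd100000000, UL)

#endif /* _ASM_RISCV_MEMORY_H */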

The second patch in this series added documentation to describe the
virtual memory layout which matches how it was done in x86.

Logan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping
  2019-03-28 10:03   ` Anup Patel
@ 2019-03-28 18:24     ` Logan Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-03-28 18:24 UTC (permalink / raw)
  To: Anup Patel
  Cc: Albert Ou, Palmer Dabbelt, linux-kernel@vger.kernel.org List,
	Stephen Bates, Atish Patra, Anup Patel, Paul Walmsley,
	linux-riscv, Mike Rapoport, Christoph Hellwig, Zong Li



On 2019-03-28 4:03 a.m., Anup Patel wrote:
> I understand that this patch is in line with your virtual memory layout cleanup,
> but the way we map virtual memory in swapper_pg_dir is bound to change.
> 
> We should not be mapping the complete virtual address space in swapper_pg_dir;
> rather, we should only map it based on the amount of RAM available.
> 
> Refer, https://www.lkml.org/lkml/2019/3/24/3
> 
> setup_vm() should only map vmlinux_start to vmlinux_end plus the FDT. The
> complete virtual memory mapping should be done in setup_vm_final() (called
> from paging_init()), after early parsing of the FDT tells us which memory
> banks are available.

That makes sense, but I think a lot of it is out of the scope of what I'm
doing in this patch set.

I could attempt to update my patchset so that, instead of expanding the
linear region at boot, we add the page tables in arch_add_memory(). That
would make more sense given the direction you want to take setup_vm().
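
Something along these lines, roughly (a sketch only -- this uses the
v5.1-era arch_add_memory() prototype, which has changed in later kernels,
and create_linear_mapping() is a hypothetical helper):

#include <linux/memory_hotplug.h>
#include <linux/mm.h>

/* Sketch: build the linear mapping at hotplug time rather than at boot. */
int arch_add_memory(int nid, u64 start, u64 size,
		    struct vmem_altmap *altmap, bool want_memblock)
{
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;
	int ret;

	/* Map the new physical range into the kernel's linear region. */
	ret = create_linear_mapping(start, start + size);
	if (ret)
		return ret;

	/* Let the generic hotplug code create the sections and memmap. */
	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
}

That way the boot-time page tables would only need to cover RAM, and the
IO regions would get mapped when devm_memremap_pages() hotplugs them.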

Logan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P
  2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
                   ` (6 preceding siblings ...)
  2019-03-27 21:36 ` [PATCH 7/7] RISC-V: Implement pte_devmap() Logan Gunthorpe
@ 2019-04-24 23:23 ` Palmer Dabbelt
  2019-04-26 16:37   ` Logan Gunthorpe
  7 siblings, 1 reply; 17+ messages in thread
From: Palmer Dabbelt @ 2019-04-24 23:23 UTC (permalink / raw)
  To: logang; +Cc: linux-kernel, linux-riscv, sbates, Christoph Hellwig, aou, logang

On Wed, 27 Mar 2019 14:36:36 PDT (-0700), logang@deltatee.com wrote:
> Hi,
>
> This patchset enables P2P on the RISC-V architecture. To do this on the
> current kernel, we only need to be able to back IO memory with struct
> pages using devm_memremap_pages(). This requires ARCH_HAS_ZONE_DEVICE,
> ARCH_ENABLE_MEMORY_HOTPLUG, and ARCH_ENABLE_MEMORY_HOTREMOVE; which in
> turn requires ARCH_SPARSEMEM_ENABLE. We also need to ensure that the
> IO memory regions in hardware can be covered by the linear region
> so that there is a linear relationship between the virtual address and
> the struct page address in the vmemmap region.
>
> While our reason to do this work is for P2P, these features are all
> useful, more generally, and also enable other kernel features.
>
> The first patch in the series implements sparse mem. It was already
> submitted and reviewed last cycle, only forgotten. It has been rebased
> onto v5.1-rc2.
>
> Patches 2 through 4 rework the architecture's virtual address space
> mapping trying to get as much of the IO regions covered by the linear
> mapping. With Sv39, we do not have enough address space to cover all the
> typical hardware regions but we can get the majority of it.
>
> Patches 5 and 6 implement memory hotplug and remove. These are relatively
> straightforward additions similar to other arches.
>
> Patch 7 implements pte_devmap which allows us to set
> ARCH_HAS_ZONE_DEVICE.
>
> The patchset was tested in QEMU and on a HiFive Unleashed board.
> However, we were unable to actually test P2P transactions with this
> exact set because we have been unable to get PCI working with v5.1-rc2.
> We were able to get it running on a 4.19 era kernel (with a bunch of
> out-of-tree patches for PCI on a Microsemi PolarFire board).
>
> This series is based on v5.1-rc2 and a git tree is available here:
>
> https://github.com/sbates130272/linux-p2pmem riscv-p2p-v1

Looks like these don't build on rv32 when applied on top of 5.1-rc6.  We now
have rv32_defconfig, which should make it easier to test these sorts of
things.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P
  2019-04-24 23:23 ` [PATCH 0/7] RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P Palmer Dabbelt
@ 2019-04-26 16:37   ` Logan Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Logan Gunthorpe @ 2019-04-26 16:37 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: linux-kernel, linux-riscv, sbates, Christoph Hellwig, aou



On 2019-04-24 5:23 p.m., Palmer Dabbelt wrote:
> On Wed, 27 Mar 2019 14:36:36 PDT (-0700), logang@deltatee.com wrote:
>> Hi,
>>
>> This patchset enables P2P on the RISC-V architecture. To do this on the
>> current kernel, we only need to be able to back IO memory with struct
>> pages using devm_memremap_pages(). This requires ARCH_HAS_ZONE_DEVICE,
>> ARCH_ENABLE_MEMORY_HOTPLUG, and ARCH_ENABLE_MEMORY_HOTREMOVE; which in
>> turn requires ARCH_SPARSEMEM_ENABLE. We also need to ensure that the
>> IO memory regions in hardware can be covered by the linear region
>> so that there is a linear relationship between the virtual address and
>> the struct page address in the vmemmap region.
>>
>> While our reason to do this work is for P2P, these features are all
>> useful, more generally, and also enable other kernel features.
>>
>> The first patch in the series implements sparse mem. It was already
>> submitted and reviewed last cycle, only forgotten. It has been rebased
>> onto v5.1-rc2.
>>
>> Patches 2 through 4 rework the architecture's virtual address space
>> mapping trying to get as much of the IO regions covered by the linear
>> mapping. With Sv39, we do not have enough address space to cover all the
>> typical hardware regions but we can get the majority of it.
>>
>> Patches 5 and 6 implement memory hotplug and remove. These are relatively
>> straightforward additions similar to other arches.
>>
>> Patch 7 implements pte_devmap which allows us to set
>> ARCH_HAS_ZONE_DEVICE.
>>
>> The patchset was tested in QEMU and on a HiFive Unleashed board.
>> However, we were unable to actually test P2P transactions with this
>> exact set because we have been unable to get PCI working with v5.1-rc2.
>> We were able to get it running on a 4.19 era kernel (with a bunch of
>> out-of-tree patches for PCI on a Microsemi PolarFire board).
>>
>> This series is based on v5.1-rc2 and a git tree is available here:
>>
>> https://github.com/sbates130272/linux-p2pmem riscv-p2p-v1
> 
> Looks like these don't build on rv32 when applied on top of 5.1-rc6.  We now
> have rv32_defconfig, which should make it easier to test these sorts of
> things.

Thanks for the note. I've queued up fixes for this. However, I'm still a
bit stuck on the memory hot remove work: I'm waiting for the similar work
in arm64 to be done so I can reuse some of it[1].

The first patch in this series, which implements sparsemem, builds on rv32
and was generally accepted in previous cycles, so I'd appreciate it if you
could pick that one up for v5.2.

Thanks.

Logan


[1] 
https://lore.kernel.org/lkml/1555221553-18845-1-git-send-email-anshuman.khandual@arm.com/T/#u

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-04-26 16:37 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-27 21:36 [PATCH 0/7] RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 1/7] RISC-V: Implement sparsemem Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map Logan Gunthorpe
2019-03-28 11:49   ` Mike Rapoport
2019-03-28 15:51     ` Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping Logan Gunthorpe
2019-03-28  5:39   ` Palmer Dabbelt
2019-03-28  6:28     ` Anup Patel
2019-03-28 15:54       ` Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping Logan Gunthorpe
2019-03-28 10:03   ` Anup Patel
2019-03-28 18:24     ` Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 5/7] RISC-V: Implement memory hotplug Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 6/7] RISC-V: Implement memory hot remove Logan Gunthorpe
2019-03-27 21:36 ` [PATCH 7/7] RISC-V: Implement pte_devmap() Logan Gunthorpe
2019-04-24 23:23 ` [PATCH 0/7] RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P Palmer Dabbelt
2019-04-26 16:37   ` Logan Gunthorpe
