linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation
@ 2018-12-05 12:28 David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 1/7] agp: efficeon: no need to set PG_reserved on GATT tables David Hildenbrand
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Albert Ou, Alexander Duyck, Andrew Morton, Anthony Yznaga,
	Arnd Bergmann, Benjamin Herrenschmidt, Bhupesh Sharma,
	Catalin Marinas, Christophe Leroy, Dan Williams, Dave Kleikamp,
	David Airlie, Geert Uytterhoeven, Greg Kroah-Hartman,
	Heiko Carstens, James Morse, Kees Cook, Marc Zyngier,
	Mark Rutland, Martin Schwidefsky, Matthew Wilcox,
	Michael Ellerman, Michal Hocko, Michal Hocko, Mike Rapoport,
	Miles Chen, Palmer Dabbelt, Paul Mackerras, Pavel Tatashin,
	Souptick Joarder, Stephen Rothwell, Tobias Klauser,
	Vasily Gorbik, Will Deacon

I was recently going over all users of PG_reserved. Short story: it is
difficult and sometimes not really clear if setting/checking for
PG_reserved is only a relict from the past. Easy to break things.

I had way more cleanups in this series inititally,
but some architectures take PG_reserved as a way to apply a different
caching strategy (for MMIO pages). So I decided to only include the most
obvious changes (that are less likely to break something).

So let's see if the documentation update for PG_reserved I crafted
actually covers most cases or if there is plenty more.

Most notably, for device memory we can hopefully soon stop setting
it PG_reserved

I only briefly tested this on s390x.

David Hildenbrand (7):
  agp: efficeon: no need to set PG_reserved on GATT tables
  s390/vdso: don't clear PG_reserved
  powerpc/vdso: don't clear PG_reserved
  riscv/vdso: don't clear PG_reserved
  m68k/mm: use __ClearPageReserved()
  arm64: kexec: no need to ClearPageReserved()
  mm: better document PG_reserved

 arch/arm64/kernel/machine_kexec.c |  1 -
 arch/m68k/mm/memory.c             |  2 +-
 arch/powerpc/kernel/vdso.c        |  2 --
 arch/riscv/kernel/vdso.c          |  1 -
 arch/s390/kernel/vdso.c           |  2 --
 drivers/char/agp/efficeon-agp.c   |  2 --
 include/linux/page-flags.h        | 18 ++++++++++++++++--
 7 files changed, 17 insertions(+), 11 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC 1/7] agp: efficeon: no need to set PG_reserved on GATT tables
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 2/7] s390/vdso: don't clear PG_reserved David Hildenbrand
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	David Airlie, Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
	Michal Hocko, Matthew Wilcox

The l1 GATT page table is kept in a special on-chip page with 64 entries.
We allocate the l2 page table pages via get_zeroed_page() and enter them
into the table. These l2 pages are modified accordingly when
inserting/removing memory via efficeon_insert_memory and
efficeon_remove_memory.

Apart from that, these pages are not exposed or ioremap'ed. We can stop
setting them reserved (propably copied from generic code).

Cc: David Airlie <airlied@linux.ie>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/char/agp/efficeon-agp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/char/agp/efficeon-agp.c b/drivers/char/agp/efficeon-agp.c
index 7f88490b5479..c53f0f9ef5b0 100644
--- a/drivers/char/agp/efficeon-agp.c
+++ b/drivers/char/agp/efficeon-agp.c
@@ -163,7 +163,6 @@ static int efficeon_free_gatt_table(struct agp_bridge_data *bridge)
 		unsigned long page = efficeon_private.l1_table[index];
 		if (page) {
 			efficeon_private.l1_table[index] = 0;
-			ClearPageReserved(virt_to_page((char *)page));
 			free_page(page);
 			freed++;
 		}
@@ -219,7 +218,6 @@ static int efficeon_create_gatt_table(struct agp_bridge_data *bridge)
 			efficeon_free_gatt_table(agp_bridge);
 			return -ENOMEM;
 		}
-		SetPageReserved(virt_to_page((char *)page));
 
 		for (offset = 0; offset < PAGE_SIZE; offset += clflush_chunk)
 			clflush((char *)page+offset);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 2/7] s390/vdso: don't clear PG_reserved
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 1/7] agp: efficeon: no need to set PG_reserved on GATT tables David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 3/7] powerpc/vdso: " David Hildenbrand
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Martin Schwidefsky, Heiko Carstens, Matthew Wilcox,
	Mike Rapoport, Michal Hocko, Vasily Gorbik, Kees Cook,
	Souptick Joarder, Andrew Morton, Michal Hocko

The VDSO is part of the kernel image and therefore the struct pages are
marked as reserved during boot.

As we install a special mapping, the actual struct pages will never be
exposed to MM via the page tables. We can therefore leave the pages
marked as reserved.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/kernel/vdso.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index ebe748a9f472..9e24d23c26c0 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -292,7 +292,6 @@ static int __init vdso_init(void)
 	BUG_ON(vdso32_pagelist == NULL);
 	for (i = 0; i < vdso32_pages - 1; i++) {
 		struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
-		ClearPageReserved(pg);
 		get_page(pg);
 		vdso32_pagelist[i] = pg;
 	}
@@ -310,7 +309,6 @@ static int __init vdso_init(void)
 	BUG_ON(vdso64_pagelist == NULL);
 	for (i = 0; i < vdso64_pages - 1; i++) {
 		struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
-		ClearPageReserved(pg);
 		get_page(pg);
 		vdso64_pagelist[i] = pg;
 	}
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 3/7] powerpc/vdso: don't clear PG_reserved
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 1/7] agp: efficeon: no need to set PG_reserved on GATT tables David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 2/7] s390/vdso: don't clear PG_reserved David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 4/7] riscv/vdso: " David Hildenbrand
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Christophe Leroy, Kees Cook, Andrew Morton, Michal Hocko,
	Matthew Wilcox

The VDSO is part of the kernel image and therefore the struct pages are
marked as reserved during boot.

As we install a special mapping, the actual struct pages will never be
exposed to MM via the page tables. We can therefore leave the pages
marked as reserved.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/kernel/vdso.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 65b3bdb99f0b..d59dc2e9a695 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -795,7 +795,6 @@ static int __init vdso_init(void)
 	BUG_ON(vdso32_pagelist == NULL);
 	for (i = 0; i < vdso32_pages; i++) {
 		struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
-		ClearPageReserved(pg);
 		get_page(pg);
 		vdso32_pagelist[i] = pg;
 	}
@@ -809,7 +808,6 @@ static int __init vdso_init(void)
 	BUG_ON(vdso64_pagelist == NULL);
 	for (i = 0; i < vdso64_pages; i++) {
 		struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
-		ClearPageReserved(pg);
 		get_page(pg);
 		vdso64_pagelist[i] = pg;
 	}
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 4/7] riscv/vdso: don't clear PG_reserved
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
                   ` (2 preceding siblings ...)
  2018-12-05 12:28 ` [PATCH RFC 3/7] powerpc/vdso: " David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-07 18:45   ` Palmer Dabbelt
  2018-12-05 12:28 ` [PATCH RFC 5/7] m68k/mm: use __ClearPageReserved() David Hildenbrand
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Palmer Dabbelt, Albert Ou, Tobias Klauser, Andrew Morton,
	Michal Hocko, Matthew Wilcox

The VDSO is part of the kernel image and therefore the struct pages are
marked as reserved during boot.

As we install a special mapping, the actual struct pages will never be
exposed to MM via the page tables. We can therefore leave the pages
marked as reserved.

Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/riscv/kernel/vdso.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index 582cb153eb24..0cd044122234 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -54,7 +54,6 @@ static int __init vdso_init(void)
 		struct page *pg;
 
 		pg = virt_to_page(vdso_start + (i << PAGE_SHIFT));
-		ClearPageReserved(pg);
 		vdso_pagelist[i] = pg;
 	}
 	vdso_pagelist[i] = virt_to_page(vdso_data);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 5/7] m68k/mm: use __ClearPageReserved()
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
                   ` (3 preceding siblings ...)
  2018-12-05 12:28 ` [PATCH RFC 4/7] riscv/vdso: " David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-05 12:28 ` [PATCH RFC 6/7] arm64: kexec: no need to ClearPageReserved() David Hildenbrand
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Geert Uytterhoeven, Andrew Morton, Michal Hocko, Matthew Wilcox

The PG_reserved flag is cleared from memory that is part of the kernel
image (and therefore marked as PG_reserved). Avoid using PG_reserved
directly.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/m68k/mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/m68k/mm/memory.c b/arch/m68k/mm/memory.c
index b86a2e21693b..227c04fe60d2 100644
--- a/arch/m68k/mm/memory.c
+++ b/arch/m68k/mm/memory.c
@@ -51,7 +51,7 @@ void __init init_pointer_table(unsigned long ptable)
 	pr_debug("init_pointer_table: %lx, %x\n", ptable, PD_MARKBITS(dp));
 
 	/* unreserve the page so it's possible to free that page */
-	PD_PAGE(dp)->flags &= ~(1 << PG_reserved);
+	__ClearPageReserved(PD_PAGE(dp));
 	init_page_count(PD_PAGE(dp));
 
 	return;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 6/7] arm64: kexec: no need to ClearPageReserved()
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
                   ` (4 preceding siblings ...)
  2018-12-05 12:28 ` [PATCH RFC 5/7] m68k/mm: use __ClearPageReserved() David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-05 14:00   ` James Morse
  2018-12-05 12:28 ` [PATCH RFC 7/7] mm: better document PG_reserved David Hildenbrand
  2018-12-05 12:56 ` [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation Michal Hocko
  7 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Catalin Marinas, Will Deacon, Bhupesh Sharma, James Morse,
	Marc Zyngier, Dave Kleikamp, Mark Rutland, Andrew Morton,
	Michal Hocko, Matthew Wilcox

This will already be done by free_reserved_page().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/arm64/kernel/machine_kexec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 922add8adb74..0ef4ea73aa54 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -353,7 +353,6 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
 
 	for (addr = begin; addr < end; addr += PAGE_SIZE) {
 		page = phys_to_page(addr);
-		ClearPageReserved(page);
 		free_reserved_page(page);
 	}
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
                   ` (5 preceding siblings ...)
  2018-12-05 12:28 ` [PATCH RFC 6/7] arm64: kexec: no need to ClearPageReserved() David Hildenbrand
@ 2018-12-05 12:28 ` David Hildenbrand
  2018-12-05 12:59   ` Michal Hocko
  2018-12-05 14:35   ` Matthew Wilcox
  2018-12-05 12:56 ` [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation Michal Hocko
  7 siblings, 2 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 12:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, linux-m68k, linuxppc-dev,
	linux-riscv, linux-s390, linux-mediatek, David Hildenbrand,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Michal Hocko,
	Alexander Duyck, Matthew Wilcox, Anthony Yznaga, Miles Chen,
	yi.z.zhang, Dan Williams

The usage of PG_reserved and how PG_reserved pages are to be treated is
burried deep down in different parts of the kernel. Let's shine some light
onto these details by documenting (most?) current users and expected
behavior.

I don't see a reason why we have to document "Some of them might not even
exist". If there is a user, we should document it. E.g. for balloon
drivers we now use PG_offline to indicate that a page might currently
not be backed by memory in the hypervisor. And that is independent from
PG_reserved.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: yi.z.zhang@linux.intel.com
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-flags.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 68b8495e2fbc..112526f5ba61 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -17,8 +17,22 @@
 /*
  * Various page->flags bits:
  *
- * PG_reserved is set for special pages, which can never be swapped out. Some
- * of them might not even exist...
+ * PG_reserved is set for special pages. The "struct page" of such a page
+ * should in general not be touched (e.g. set dirty) except by their owner.
+ * Pages marked as PG_reserved include:
+ * - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
+ * - Pages allocated early during boot (bootmem, memblock)
+ * - Zero pages
+ * - Pages that have been associated with a zone but are not available for
+ *   the page allocator (e.g. excluded via online_page_callback())
+ * - Pages to exclude from the hibernation image (e.g. loaded kexec images)
+ * - MMIO pages (communicate with a device, special caching strategy needed)
+ * - MCA pages on ia64 (pages with memory errors)
+ * - Device memory (e.g. PMEM, DAX, HMM)
+ * Some architectures don't allow to ioremap pages that are not marked
+ * PG_reserved (as they might be in use by somebody else who does not respect
+ * the caching strategy). Consequently, PG_reserved for a page mapped into
+ * user space can indicate the zero page, the vDSO, MMIO pages or device memory.
  *
  * The PG_private bitflag is set on pagecache pages if they contain filesystem
  * specific data (which is normally at page->private). It can be used by
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation
  2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
                   ` (6 preceding siblings ...)
  2018-12-05 12:28 ` [PATCH RFC 7/7] mm: better document PG_reserved David Hildenbrand
@ 2018-12-05 12:56 ` Michal Hocko
  2018-12-05 13:04   ` David Hildenbrand
  7 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-12-05 12:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek, Albert Ou,
	Alexander Duyck, Andrew Morton, Anthony Yznaga, Arnd Bergmann,
	Benjamin Herrenschmidt, Bhupesh Sharma, Catalin Marinas,
	Christophe Leroy, Dan Williams, Dave Kleikamp, David Airlie,
	Geert Uytterhoeven, Greg Kroah-Hartman, Heiko Carstens,
	James Morse, Kees Cook, Marc Zyngier, Mark Rutland,
	Martin Schwidefsky, Matthew Wilcox, Michael Ellerman,
	Mike Rapoport, Miles Chen, Palmer Dabbelt, Paul Mackerras,
	Pavel Tatashin, Souptick Joarder, Stephen Rothwell,
	Tobias Klauser, Vasily Gorbik, Will Deacon

On Wed 05-12-18 13:28:44, David Hildenbrand wrote:
[...]
> Most notably, for device memory we can hopefully soon stop setting
> it PG_reserved

I am busy as hell so I am not likely to look at specific patche this
week. But could you be more specific why we need to get rid of other
PG_reserved users before we can do so for device memory?

I am all for removing relicts because they just confuse people but I
fail to see any relation here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 12:28 ` [PATCH RFC 7/7] mm: better document PG_reserved David Hildenbrand
@ 2018-12-05 12:59   ` Michal Hocko
  2018-12-05 14:35   ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2018-12-05 12:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Alexander Duyck,
	Matthew Wilcox, Anthony Yznaga, Miles Chen, yi.z.zhang,
	Dan Williams

On Wed 05-12-18 13:28:51, David Hildenbrand wrote:
> The usage of PG_reserved and how PG_reserved pages are to be treated is
> burried deep down in different parts of the kernel. Let's shine some light
> onto these details by documenting (most?) current users and expected
> behavior.
> 
> I don't see a reason why we have to document "Some of them might not even
> exist". If there is a user, we should document it. E.g. for balloon
> drivers we now use PG_offline to indicate that a page might currently
> not be backed by memory in the hypervisor. And that is independent from
> PG_reserved.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
> Cc: Miles Chen <miles.chen@mediatek.com>
> Cc: yi.z.zhang@linux.intel.com
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

This looks like an improvement. The essential part is that PG_reserved
page belongs to its user and no generic code should touch it. The rest
is a description of current users which I haven't checked due to to lack
of time but yeah, I like the updated wording because I have seen
multiple people confused from the swapped out part which is not true for
many many years. I have tried to dig out when it was actually the case
but failed.

So I cannot give my Ack because I didn't really do a real review but I
like this FWIW.

> ---
>  include/linux/page-flags.h | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 68b8495e2fbc..112526f5ba61 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -17,8 +17,22 @@
>  /*
>   * Various page->flags bits:
>   *
> - * PG_reserved is set for special pages, which can never be swapped out. Some
> - * of them might not even exist...
> + * PG_reserved is set for special pages. The "struct page" of such a page
> + * should in general not be touched (e.g. set dirty) except by their owner.
> + * Pages marked as PG_reserved include:
> + * - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
> + * - Pages allocated early during boot (bootmem, memblock)
> + * - Zero pages
> + * - Pages that have been associated with a zone but are not available for
> + *   the page allocator (e.g. excluded via online_page_callback())
> + * - Pages to exclude from the hibernation image (e.g. loaded kexec images)
> + * - MMIO pages (communicate with a device, special caching strategy needed)
> + * - MCA pages on ia64 (pages with memory errors)
> + * - Device memory (e.g. PMEM, DAX, HMM)
> + * Some architectures don't allow to ioremap pages that are not marked
> + * PG_reserved (as they might be in use by somebody else who does not respect
> + * the caching strategy). Consequently, PG_reserved for a page mapped into
> + * user space can indicate the zero page, the vDSO, MMIO pages or device memory.
>   *
>   * The PG_private bitflag is set on pagecache pages if they contain filesystem
>   * specific data (which is normally at page->private). It can be used by
> -- 
> 2.17.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation
  2018-12-05 12:56 ` [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation Michal Hocko
@ 2018-12-05 13:04   ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 13:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek, Albert Ou,
	Alexander Duyck, Andrew Morton, Anthony Yznaga, Arnd Bergmann,
	Benjamin Herrenschmidt, Bhupesh Sharma, Catalin Marinas,
	Christophe Leroy, Dan Williams, Dave Kleikamp, David Airlie,
	Geert Uytterhoeven, Greg Kroah-Hartman, Heiko Carstens,
	James Morse, Kees Cook, Marc Zyngier, Mark Rutland,
	Martin Schwidefsky, Matthew Wilcox, Michael Ellerman,
	Mike Rapoport, Miles Chen, Palmer Dabbelt, Paul Mackerras,
	Pavel Tatashin, Souptick Joarder, Stephen Rothwell,
	Tobias Klauser, Vasily Gorbik, Will Deacon

On 05.12.18 13:56, Michal Hocko wrote:
> On Wed 05-12-18 13:28:44, David Hildenbrand wrote:
> [...]
>> Most notably, for device memory we can hopefully soon stop setting
>> it PG_reserved
> 
> I am busy as hell so I am not likely to look at specific patche this
> week. But could you be more specific why we need to get rid of other
> PG_reserved users before we can do so for device memory?
> 

No worries, this has time.

For device memory, nothing should really be needed. I am only collecting
and docuumenting users and this is one user soon to go (eventually) :)

> I am all for removing relicts because they just confuse people but I
> fail to see any relation here.
> 

It's really only "why is this patch set not bigger", nothing related to
device memory actually :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 6/7] arm64: kexec: no need to ClearPageReserved()
  2018-12-05 12:28 ` [PATCH RFC 6/7] arm64: kexec: no need to ClearPageReserved() David Hildenbrand
@ 2018-12-05 14:00   ` James Morse
  0 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2018-12-05 14:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Catalin Marinas, Will Deacon, Bhupesh Sharma, Marc Zyngier,
	Dave Kleikamp, Mark Rutland, Andrew Morton, Michal Hocko,
	Matthew Wilcox, AKASHI Takahiro

Hi David,

(CC: +Akashi)

On 05/12/2018 12:28, David Hildenbrand wrote:
> This will already be done by free_reserved_page().

(will already be -> will be ?)

So it does!


> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index 922add8adb74..0ef4ea73aa54 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -353,7 +353,6 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
>  
>  	for (addr = begin; addr < end; addr += PAGE_SIZE) {
>  		page = phys_to_page(addr);
> -		ClearPageReserved(page);
>  		free_reserved_page(page);
>  	}
>  }
> 

Acked-by: James Morse <james.morse@arm.com>


Thanks,

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 12:28 ` [PATCH RFC 7/7] mm: better document PG_reserved David Hildenbrand
  2018-12-05 12:59   ` Michal Hocko
@ 2018-12-05 14:35   ` Matthew Wilcox
  2018-12-05 15:05     ` David Hildenbrand
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2018-12-05 14:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Michal Hocko,
	Alexander Duyck, Anthony Yznaga, Miles Chen, yi.z.zhang,
	Dan Williams

On Wed, Dec 05, 2018 at 01:28:51PM +0100, David Hildenbrand wrote:
> I don't see a reason why we have to document "Some of them might not even
> exist". If there is a user, we should document it. E.g. for balloon
> drivers we now use PG_offline to indicate that a page might currently
> not be backed by memory in the hypervisor. And that is independent from
> PG_reserved.

I think you're confused by the meaning of "some of them might not even
exist".  What this means is that there might not be memory there; maybe
writes to that memory will be discarded, or maybe they'll cause a machine
check.  Maybe reads will return ~0, or 0, or cause a machine check.
We just don't know what's there, and we shouldn't try touching the memory.

> +++ b/include/linux/page-flags.h
> @@ -17,8 +17,22 @@
>  /*
>   * Various page->flags bits:
>   *
> - * PG_reserved is set for special pages, which can never be swapped out. Some
> - * of them might not even exist...
> + * PG_reserved is set for special pages. The "struct page" of such a page
> + * should in general not be touched (e.g. set dirty) except by their owner.
> + * Pages marked as PG_reserved include:
> + * - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
> + * - Pages allocated early during boot (bootmem, memblock)
> + * - Zero pages
> + * - Pages that have been associated with a zone but are not available for
> + *   the page allocator (e.g. excluded via online_page_callback())
> + * - Pages to exclude from the hibernation image (e.g. loaded kexec images)
> + * - MMIO pages (communicate with a device, special caching strategy needed)
> + * - MCA pages on ia64 (pages with memory errors)
> + * - Device memory (e.g. PMEM, DAX, HMM)
> + * Some architectures don't allow to ioremap pages that are not marked
> + * PG_reserved (as they might be in use by somebody else who does not respect
> + * the caching strategy). Consequently, PG_reserved for a page mapped into
> + * user space can indicate the zero page, the vDSO, MMIO pages or device memory.

So maybe just add one more option to the list.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 14:35   ` Matthew Wilcox
@ 2018-12-05 15:05     ` David Hildenbrand
  2018-12-05 17:32       ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 15:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Michal Hocko,
	Alexander Duyck, Anthony Yznaga, Miles Chen, yi.z.zhang,
	Dan Williams

On 05.12.18 15:35, Matthew Wilcox wrote:
> On Wed, Dec 05, 2018 at 01:28:51PM +0100, David Hildenbrand wrote:
>> I don't see a reason why we have to document "Some of them might not even
>> exist". If there is a user, we should document it. E.g. for balloon
>> drivers we now use PG_offline to indicate that a page might currently
>> not be backed by memory in the hypervisor. And that is independent from
>> PG_reserved.
> 
> I think you're confused by the meaning of "some of them might not even
> exist".  What this means is that there might not be memory there; maybe
> writes to that memory will be discarded, or maybe they'll cause a machine
> check.  Maybe reads will return ~0, or 0, or cause a machine check.
> We just don't know what's there, and we shouldn't try touching the memory.

If there are users, let's document it. And I need more details for that :)

1. machine check: if there is a HW error, we set PG_hwpoison (except
ia64 MCA, see the list)

2. Writes to that memory will be discarded

Who is the user of that? When will we have such pages right now?

3. Reads will return ~0, / 0?

I think this is a special case of e.g. x86? But where do we have that,
are there any user?


In summary: When can we have memory sections that are online but pages
reserved and not accessible? (one example is ballooning I mention here)

(I classify this as dangerous as dump tools will happily dump
PG_reserved pages (unless PG_hwpoison/PG_offline) and that's the right
thing to do).

I want to avoid documenting things that are not actually getting used.

> 
>> +++ b/include/linux/page-flags.h
>> @@ -17,8 +17,22 @@
>>  /*
>>   * Various page->flags bits:
>>   *
>> - * PG_reserved is set for special pages, which can never be swapped out. Some
>> - * of them might not even exist...
>> + * PG_reserved is set for special pages. The "struct page" of such a page
>> + * should in general not be touched (e.g. set dirty) except by their owner.
>> + * Pages marked as PG_reserved include:
>> + * - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
>> + * - Pages allocated early during boot (bootmem, memblock)
>> + * - Zero pages
>> + * - Pages that have been associated with a zone but are not available for
>> + *   the page allocator (e.g. excluded via online_page_callback())
>> + * - Pages to exclude from the hibernation image (e.g. loaded kexec images)
>> + * - MMIO pages (communicate with a device, special caching strategy needed)
>> + * - MCA pages on ia64 (pages with memory errors)
>> + * - Device memory (e.g. PMEM, DAX, HMM)
>> + * Some architectures don't allow to ioremap pages that are not marked
>> + * PG_reserved (as they might be in use by somebody else who does not respect
>> + * the caching strategy). Consequently, PG_reserved for a page mapped into
>> + * user space can indicate the zero page, the vDSO, MMIO pages or device memory.
> 
> So maybe just add one more option to the list.
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 15:05     ` David Hildenbrand
@ 2018-12-05 17:32       ` Matthew Wilcox
  2018-12-05 18:13         ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2018-12-05 17:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Michal Hocko,
	Alexander Duyck, Anthony Yznaga, Miles Chen, yi.z.zhang,
	Dan Williams

On Wed, Dec 05, 2018 at 04:05:12PM +0100, David Hildenbrand wrote:
> On 05.12.18 15:35, Matthew Wilcox wrote:
> > On Wed, Dec 05, 2018 at 01:28:51PM +0100, David Hildenbrand wrote:
> >> I don't see a reason why we have to document "Some of them might not even
> >> exist". If there is a user, we should document it. E.g. for balloon
> >> drivers we now use PG_offline to indicate that a page might currently
> >> not be backed by memory in the hypervisor. And that is independent from
> >> PG_reserved.
> > 
> > I think you're confused by the meaning of "some of them might not even
> > exist".  What this means is that there might not be memory there; maybe
> > writes to that memory will be discarded, or maybe they'll cause a machine
> > check.  Maybe reads will return ~0, or 0, or cause a machine check.
> > We just don't know what's there, and we shouldn't try touching the memory.
> 
> If there are users, let's document it. And I need more details for that :)
> 
> 1. machine check: if there is a HW error, we set PG_hwpoison (except
> ia64 MCA, see the list)
> 
> 2. Writes to that memory will be discarded
> 
> Who is the user of that? When will we have such pages right now?
> 
> 3. Reads will return ~0, / 0?
> 
> I think this is a special case of e.g. x86? But where do we have that,
> are there any user?

When there are gaps in the physical memory.  As in, if you put that
physical address on the bus (or in a packet), no device will respond
to it.  Look:

00000000-00000fff : Reserved
00001000-00057fff : System RAM
00058000-00058fff : Reserved
00059000-0009dfff : System RAM
0009e000-000fffff : Reserved

Those examples I gave are examples of how various different architectures
respond to "no device responded to this memory access".


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 17:32       ` Matthew Wilcox
@ 2018-12-05 18:13         ` David Hildenbrand
  2018-12-06 10:46           ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2018-12-05 18:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Michal Hocko,
	Alexander Duyck, Anthony Yznaga, Miles Chen, yi.z.zhang,
	Dan Williams

On 05.12.18 18:32, Matthew Wilcox wrote:
> On Wed, Dec 05, 2018 at 04:05:12PM +0100, David Hildenbrand wrote:
>> On 05.12.18 15:35, Matthew Wilcox wrote:
>>> On Wed, Dec 05, 2018 at 01:28:51PM +0100, David Hildenbrand wrote:
>>>> I don't see a reason why we have to document "Some of them might not even
>>>> exist". If there is a user, we should document it. E.g. for balloon
>>>> drivers we now use PG_offline to indicate that a page might currently
>>>> not be backed by memory in the hypervisor. And that is independent from
>>>> PG_reserved.
>>>
>>> I think you're confused by the meaning of "some of them might not even
>>> exist".  What this means is that there might not be memory there; maybe
>>> writes to that memory will be discarded, or maybe they'll cause a machine
>>> check.  Maybe reads will return ~0, or 0, or cause a machine check.
>>> We just don't know what's there, and we shouldn't try touching the memory.
>>
>> If there are users, let's document it. And I need more details for that :)
>>
>> 1. machine check: if there is a HW error, we set PG_hwpoison (except
>> ia64 MCA, see the list)
>>
>> 2. Writes to that memory will be discarded
>>
>> Who is the user of that? When will we have such pages right now?
>>
>> 3. Reads will return ~0, / 0?
>>
>> I think this is a special case of e.g. x86? But where do we have that,
>> are there any user?
> 
> When there are gaps in the physical memory.  As in, if you put that
> physical address on the bus (or in a packet), no device will respond
> to it.  Look:
> 
> 00000000-00000fff : Reserved
> 00001000-00057fff : System RAM
> 00058000-00058fff : Reserved
> 00059000-0009dfff : System RAM
> 0009e000-000fffff : Reserved
> 
> Those examples I gave are examples of how various different architectures
> respond to "no device responded to this memory access".
> 

Okay, so for this memory we will have
a) vmmaps
b) Memory block devices
c) A sections that is online

So essentially "Gaps in physical memory" which is part of a online section.

This might be a candidate for PG_offline as well.

Thanks for the info, I'll try to find out how such things are handled.
In general I assume this memory has to be readable, because otherwise
kdump and friends would crash already when trying to dump?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 7/7] mm: better document PG_reserved
  2018-12-05 18:13         ` David Hildenbrand
@ 2018-12-06 10:46           ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2018-12-06 10:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek,
	Andrew Morton, Stephen Rothwell, Pavel Tatashin, Michal Hocko,
	Alexander Duyck, Anthony Yznaga, Miles Chen, yi.z.zhang,
	Dan Williams

On 05.12.18 19:13, David Hildenbrand wrote:
> On 05.12.18 18:32, Matthew Wilcox wrote:
>> On Wed, Dec 05, 2018 at 04:05:12PM +0100, David Hildenbrand wrote:
>>> On 05.12.18 15:35, Matthew Wilcox wrote:
>>>> On Wed, Dec 05, 2018 at 01:28:51PM +0100, David Hildenbrand wrote:
>>>>> I don't see a reason why we have to document "Some of them might not even
>>>>> exist". If there is a user, we should document it. E.g. for balloon
>>>>> drivers we now use PG_offline to indicate that a page might currently
>>>>> not be backed by memory in the hypervisor. And that is independent from
>>>>> PG_reserved.
>>>>
>>>> I think you're confused by the meaning of "some of them might not even
>>>> exist".  What this means is that there might not be memory there; maybe
>>>> writes to that memory will be discarded, or maybe they'll cause a machine
>>>> check.  Maybe reads will return ~0, or 0, or cause a machine check.
>>>> We just don't know what's there, and we shouldn't try touching the memory.
>>>
>>> If there are users, let's document it. And I need more details for that :)
>>>
>>> 1. machine check: if there is a HW error, we set PG_hwpoison (except
>>> ia64 MCA, see the list)
>>>
>>> 2. Writes to that memory will be discarded
>>>
>>> Who is the user of that? When will we have such pages right now?
>>>
>>> 3. Reads will return ~0, / 0?
>>>
>>> I think this is a special case of e.g. x86? But where do we have that,
>>> are there any user?
>>
>> When there are gaps in the physical memory.  As in, if you put that
>> physical address on the bus (or in a packet), no device will respond
>> to it.  Look:
>>
>> 00000000-00000fff : Reserved
>> 00001000-00057fff : System RAM
>> 00058000-00058fff : Reserved
>> 00059000-0009dfff : System RAM
>> 0009e000-000fffff : Reserved
>>
>> Those examples I gave are examples of how various different architectures
>> respond to "no device responded to this memory access".
>>
> 
> Okay, so for this memory we will have
> a) vmmaps
> b) Memory block devices
> c) A sections that is online
> 
> So essentially "Gaps in physical memory" which is part of a online section.
> 
> This might be a candidate for PG_offline as well.
> 
> Thanks for the info, I'll try to find out how such things are handled.
> In general I assume this memory has to be readable, because otherwise
> kdump and friends would crash already when trying to dump?
> 

So I finally understood how physical memory holes in online sections are
handled when dumping. They won't be dumped because the list of dumpable
chunks (contained in /proc/kcore and after a crash /proc/vmcore) is
built using walk_system_ram_range(). So anything not listed as RAM will
be ignored.

I will update the documentation, describing that if we have an online
section that is not completely IORESOURCE_SYSTEM_RAM, that the physical
memory gaps will also be set to PG_reserved.

Trying to touch this memory is indeed dangerous, luckily dumping is
properly handled.

I'll think about if marking these ranges as PG_offline might make sense
(and if it can be easily added). Then we directly know when seeing that
page type that we should not touch it. Ever.

That hint was really helpful :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 4/7] riscv/vdso: don't clear PG_reserved
  2018-12-05 12:28 ` [PATCH RFC 4/7] riscv/vdso: " David Hildenbrand
@ 2018-12-07 18:45   ` Palmer Dabbelt
  0 siblings, 0 replies; 18+ messages in thread
From: Palmer Dabbelt @ 2018-12-07 18:45 UTC (permalink / raw)
  To: david
  Cc: linux-mm, linux-kernel, linux-arm-kernel, linux-m68k,
	linuxppc-dev, linux-riscv, linux-s390, linux-mediatek, david,
	aou, tklauser, akpm, mhocko, willy

On Wed, 05 Dec 2018 04:28:48 PST (-0800), david@redhat.com wrote:
> The VDSO is part of the kernel image and therefore the struct pages are
> marked as reserved during boot.
>
> As we install a special mapping, the actual struct pages will never be
> exposed to MM via the page tables. We can therefore leave the pages
> marked as reserved.
>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: Tobias Klauser <tklauser@distanz.ch>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/riscv/kernel/vdso.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
> index 582cb153eb24..0cd044122234 100644
> --- a/arch/riscv/kernel/vdso.c
> +++ b/arch/riscv/kernel/vdso.c
> @@ -54,7 +54,6 @@ static int __init vdso_init(void)
>  		struct page *pg;
>
>  		pg = virt_to_page(vdso_start + (i << PAGE_SHIFT));
> -		ClearPageReserved(pg);
>  		vdso_pagelist[i] = pg;
>  	}
>  	vdso_pagelist[i] = virt_to_page(vdso_data);

I'm going to assume this will go in through another tree along with the rest of 
the set assuming everyone else is happy with it.

Acked-by: Palmer Dabbelt <palmer@sifive.com>

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-12-07 18:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 12:28 [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation David Hildenbrand
2018-12-05 12:28 ` [PATCH RFC 1/7] agp: efficeon: no need to set PG_reserved on GATT tables David Hildenbrand
2018-12-05 12:28 ` [PATCH RFC 2/7] s390/vdso: don't clear PG_reserved David Hildenbrand
2018-12-05 12:28 ` [PATCH RFC 3/7] powerpc/vdso: " David Hildenbrand
2018-12-05 12:28 ` [PATCH RFC 4/7] riscv/vdso: " David Hildenbrand
2018-12-07 18:45   ` Palmer Dabbelt
2018-12-05 12:28 ` [PATCH RFC 5/7] m68k/mm: use __ClearPageReserved() David Hildenbrand
2018-12-05 12:28 ` [PATCH RFC 6/7] arm64: kexec: no need to ClearPageReserved() David Hildenbrand
2018-12-05 14:00   ` James Morse
2018-12-05 12:28 ` [PATCH RFC 7/7] mm: better document PG_reserved David Hildenbrand
2018-12-05 12:59   ` Michal Hocko
2018-12-05 14:35   ` Matthew Wilcox
2018-12-05 15:05     ` David Hildenbrand
2018-12-05 17:32       ` Matthew Wilcox
2018-12-05 18:13         ` David Hildenbrand
2018-12-06 10:46           ` David Hildenbrand
2018-12-05 12:56 ` [PATCH RFC 0/7] mm: PG_reserved cleanups and documentation Michal Hocko
2018-12-05 13:04   ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).