linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
@ 2018-11-14 21:16 David Hildenbrand
  2018-11-14 21:16 ` [PATCH RFC 1/6] mm: balloon: update comment about isolation/migration/compaction David Hildenbrand
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:16 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, Alexander Duyck, Alexey Dobriyan,
	Andrew Morton, Arnd Bergmann, Baoquan He, Boris Ostrovsky,
	Christian Hansen, Dave Young, David Rientjes, Haiyang Zhang,
	Jonathan Corbet, Juergen Gross, Kairui Song, Kirill A. Shutemov,
	K. Y. Srinivasan, Len Brown, Matthew Wilcox, Michael S. Tsirkin,
	Michal Hocko, Mike Rapoport, Miles Chen, Naoya Horiguchi,
	Omar Sandoval, Pavel Machek, Pavel Tatashin, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Stephen Rothwell,
	Vitaly Kuznetsov, Vlastimil Babka

Right now, pages inflated as part of a balloon driver will be dumped
by dump tools like makedumpfile. While XEN is able to check in the
crash kernel whether a certain pfn is actuall backed by memory in the
hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of
virtio-balloon and hv-balloon inflated memory will essentially result in
zero pages getting allocated by the hypervisor and the dump getting
filled with this data.

The allocation and reading of zero pages can directly be avoided if a
dumping tool could know which pages only contain stale information not to
be dumped.

Also for XEN, calling into the kernel and asking the hypervisor if a
pfn is backed can be avoided if the duming tool would skip such pages
right from the beginning.

Dumping tools have no idea whether a given page is part of a balloon driver
and shall not be dumped. Esp. PG_reserved cannot be used for that purpose
as all memory allocated during early boot is also PG_reserved, see
discussion at [1]. So some other way of indication is required and a new
page flag is frowned upon.

We have PG_balloon (MAPCOUNT value), which is essentially unused now. I
suggest renaming it to something more generic (PG_offline) to mark pages as
logically offline. This flag can than e.g. also be used by virtio-mem in
the future to mark subsections as offline. Or by other code that wants to
put pages logically offline (e.g. later maybe poisoned pages that shall
no longer be used).

This series converts PG_balloon to PG_offline, allows dumping tools to
query the value to detect such pages and marks pages in the hv-balloon
and XEN balloon properly as PG_offline. Note that virtio-balloon already
set pages to PG_balloon (and now PG_offline).

Please note that this is also helpful for a problem we were seeing under
Hyper-V: Dumping logically offline memory (pages kept fake offline while
onlining a section via online_page_callback) would under some condicions
result in a kernel panic when dumping them.

As I don't have access to neither XEN nor Hyper-V installation, this was
not tested yet (and a makedumpfile change will be required to skip
dumping these pages).

[1] https://lkml.org/lkml/2018/7/20/566

David Hildenbrand (6):
  mm: balloon: update comment about isolation/migration/compaction
  mm: convert PG_balloon to PG_offline
  kexec: export PG_offline to VMCOREINFO
  xen/balloon: mark inflated pages PG_offline
  hv_balloon: mark inflated pages PG_offline
  PM / Hibernate: exclude all PageOffline() pages

 Documentation/admin-guide/mm/pagemap.rst |  6 +++++
 drivers/hv/hv_balloon.c                  | 14 ++++++++--
 drivers/xen/balloon.c                    |  3 +++
 fs/proc/page.c                           |  4 +--
 include/linux/balloon_compaction.h       | 34 +++++++++---------------
 include/linux/page-flags.h               | 11 +++++---
 include/uapi/linux/kernel-page-flags.h   |  1 +
 kernel/crash_core.c                      |  2 ++
 kernel/power/snapshot.c                  |  5 +++-
 tools/vm/page-types.c                    |  1 +
 10 files changed, 51 insertions(+), 30 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFC 1/6] mm: balloon: update comment about isolation/migration/compaction
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
@ 2018-11-14 21:16 ` David Hildenbrand
  2018-11-14 21:17 ` [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline David Hildenbrand
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:16 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, Andrew Morton, Matthew Wilcox,
	Michal Hocko, Michael S. Tsirkin

Commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page
feature") reworked balloon handling to make use of the general
non-lru movable page feature. The big comment block in
balloon_compaction.h contains quite some outdated information. Let's fix
this.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/balloon_compaction.h | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 53051f3d8f25..cbe50da5a59d 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -4,15 +4,18 @@
  *
  * Common interface definitions for making balloon pages movable by compaction.
  *
- * Despite being perfectly possible to perform ballooned pages migration, they
- * make a special corner case to compaction scans because balloon pages are not
- * enlisted at any LRU list like the other pages we do compact / migrate.
+ * Balloon page migration makes use of the general non-lru movable page
+ * feature.
+ *
+ * page->private is used to reference the responsible balloon device.
+ * page->mapping is used in context of non-lru page migration to reference
+ * the address space operations for page isolation/migration/compaction.
  *
  * As the page isolation scanning step a compaction thread does is a lockless
  * procedure (from a page standpoint), it might bring some racy situations while
  * performing balloon page compaction. In order to sort out these racy scenarios
  * and safely perform balloon's page compaction and migration we must, always,
- * ensure following these three simple rules:
+ * ensure following these simple rules:
  *
  *   i. when updating a balloon's page ->mapping element, strictly do it under
  *      the following lock order, independently of the far superior
@@ -21,19 +24,8 @@
  *	      +--spin_lock_irq(&b_dev_info->pages_lock);
  *	            ... page->mapping updates here ...
  *
- *  ii. before isolating or dequeueing a balloon page from the balloon device
- *      pages list, the page reference counter must be raised by one and the
- *      extra refcount must be dropped when the page is enqueued back into
- *      the balloon device page list, thus a balloon page keeps its reference
- *      counter raised only while it is under our special handling;
- *
- * iii. after the lockless scan step have selected a potential balloon page for
- *      isolation, re-test the PageBalloon mark and the PagePrivate flag
- *      under the proper page lock, to ensure isolating a valid balloon page
- *      (not yet isolated, nor under release procedure)
- *
- *  iv. isolation or dequeueing procedure must clear PagePrivate flag under
- *      page lock together with removing page from balloon device page list.
+ *  ii. isolation or dequeueing procedure must remove the page from balloon
+ *      device page list under b_dev_info->pages_lock.
  *
  * The functions provided by this interface are placed to help on coping with
  * the aforementioned balloon page corner case, as well as to ensure the simple
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
  2018-11-14 21:16 ` [PATCH RFC 1/6] mm: balloon: update comment about isolation/migration/compaction David Hildenbrand
@ 2018-11-14 21:17 ` David Hildenbrand
  2018-11-14 22:23   ` Matthew Wilcox
  2018-11-14 21:17 ` [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO David Hildenbrand
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:17 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, Jonathan Corbet, Alexey Dobriyan,
	Mike Rapoport, Andrew Morton, Christian Hansen, Vlastimil Babka,
	Kirill A. Shutemov, Stephen Rothwell, Matthew Wilcox,
	Michael S. Tsirkin, Michal Hocko, Pavel Tatashin,
	Alexander Duyck, Naoya Horiguchi, Miles Chen, David Rientjes

PG_balloon was introduced to implement page migration/compaction for pages
inflated in virtio-balloon. Nowadays, it is only a marker that a page is
part of virtio-balloon and therefore logically offline.

We also want to make use of this flag in other balloon drivers - for
inflated pages or when onlining a section but keeping some pages offline
(e.g. used right now by XEN and Hyper-V via set_online_page_callback()).

We are going to expose this flag to dump tools like makedumpfile. But
instead of exposing PG_balloon, let's generalize the concept of marking
pages as logically offline, so it can be reused for other purposes
later on.

Rename PG_balloon to PG_offline. This is an indicator that the page is
logically offline, the content stale and that it should not be touched
(e.g. a hypervisor would have to allocate backing storage in order for the
guest to dump an unused page).  We can then e.g. exclude such pages from
dumps.

In following patches, we will make use of this bit also in other balloon
drivers.  While at it, document PGTABLE.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/admin-guide/mm/pagemap.rst |  6 ++++++
 fs/proc/page.c                           |  4 ++--
 include/linux/balloon_compaction.h       |  8 ++++----
 include/linux/page-flags.h               | 11 +++++++----
 include/uapi/linux/kernel-page-flags.h   |  1 +
 tools/vm/page-types.c                    |  1 +
 6 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index 3f7bade2c231..9afd6bdc424b 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -78,6 +78,8 @@ number of times a page is mapped.
     23. BALLOON
     24. ZERO_PAGE
     25. IDLE
+    26. PGTABLE
+    27. OFFLINE
 
  * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
    memory cgroup each page is charged to, indexed by PFN. Only available when
@@ -128,6 +130,10 @@ Short descriptions to the page flags
     Note that this flag may be stale in case the page was accessed via
     a PTE. To make sure the flag is up-to-date one has to read
     ``/sys/kernel/mm/page_idle/bitmap`` first.
+26 - PGTABLE
+    page is in use as a page table
+27 - OFFLINE
+    page is logically offline
 
 IO related page flags
 ---------------------
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 6c517b11acf8..378401af4d9d 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -152,8 +152,8 @@ u64 stable_page_flags(struct page *page)
 	else if (page_count(page) == 0 && is_free_buddy_page(page))
 		u |= 1 << KPF_BUDDY;
 
-	if (PageBalloon(page))
-		u |= 1 << KPF_BALLOON;
+	if (PageOffline(page))
+		u |= 1 << KPF_OFFLINE;
 	if (PageTable(page))
 		u |= 1 << KPF_PGTABLE;
 
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index cbe50da5a59d..f111c780ef1d 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -95,7 +95,7 @@ extern int balloon_page_migrate(struct address_space *mapping,
 static inline void balloon_page_insert(struct balloon_dev_info *balloon,
 				       struct page *page)
 {
-	__SetPageBalloon(page);
+	__SetPageOffline(page);
 	__SetPageMovable(page, balloon->inode->i_mapping);
 	set_page_private(page, (unsigned long)balloon);
 	list_add(&page->lru, &balloon->pages);
@@ -111,7 +111,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
  */
 static inline void balloon_page_delete(struct page *page)
 {
-	__ClearPageBalloon(page);
+	__ClearPageOffline(page);
 	__ClearPageMovable(page);
 	set_page_private(page, 0);
 	/*
@@ -141,13 +141,13 @@ static inline gfp_t balloon_mapping_gfp_mask(void)
 static inline void balloon_page_insert(struct balloon_dev_info *balloon,
 				       struct page *page)
 {
-	__SetPageBalloon(page);
+	__SetPageOffline(page);
 	list_add(&page->lru, &balloon->pages);
 }
 
 static inline void balloon_page_delete(struct page *page)
 {
-	__ClearPageBalloon(page);
+	__ClearPageOffline(page);
 	list_del(&page->lru);
 }
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 50ce1bddaf56..f91da3d0a67e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -670,7 +670,7 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PAGE_TYPE_BASE	0xf0000000
 /* Reserve		0x0000007f to catch underflows of page_mapcount */
 #define PG_buddy	0x00000080
-#define PG_balloon	0x00000100
+#define PG_offline	0x00000100
 #define PG_kmemcg	0x00000200
 #define PG_table	0x00000400
 
@@ -700,10 +700,13 @@ static __always_inline void __ClearPage##uname(struct page *page)	\
 PAGE_TYPE_OPS(Buddy, buddy)
 
 /*
- * PageBalloon() is true for pages that are on the balloon page list
- * (see mm/balloon_compaction.c).
+ * PageOffline() indicates that the pages is logically offline although the
+ * containing section is online. (e.g. inflated in a balloon driver or
+ * not onlined when onlining the section).
+ * The content of these pages is effectively stale. Such pages should not
+ * be touched (read/write/dump/save) except by their owner.
  */
-PAGE_TYPE_OPS(Balloon, balloon)
+PAGE_TYPE_OPS(Offline, offline)
 
 /*
  * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index 21b9113c69da..6c662eb0dab8 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -36,5 +36,6 @@
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
 #define KPF_PGTABLE		26
+#define KPF_OFFLINE		27
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 37908a83ddc2..b219c2eafd6a 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -137,6 +137,7 @@ static const char * const page_flag_names[] = {
 	[KPF_PGTABLE]		= "g:pgtable",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",
+	[KPF_OFFLINE]		= "o:offline",
 
 	[KPF_RESERVED]		= "r:reserved",
 	[KPF_MLOCKED]		= "m:mlocked",
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
  2018-11-14 21:16 ` [PATCH RFC 1/6] mm: balloon: update comment about isolation/migration/compaction David Hildenbrand
  2018-11-14 21:17 ` [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline David Hildenbrand
@ 2018-11-14 21:17 ` David Hildenbrand
  2018-11-15  6:19   ` Dave Young
  2018-11-14 21:17 ` [PATCH RFC 4/6] xen/balloon: mark inflated pages PG_offline David Hildenbrand
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:17 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, Andrew Morton, Dave Young,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Michal Hocko, Michael S. Tsirkin

Let's export PG_offline via PAGE_OFFLINE_MAPCOUNT_VALUE, so
makedumpfile can directly skip pages that are logically offline and the
content therefore stale.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 kernel/crash_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 933cb3e45b98..093c9f917ed0 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -464,6 +464,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 #ifdef CONFIG_HUGETLB_PAGE
 	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
+#define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
+	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
 #endif
 
 	arch_crash_save_vmcoreinfo();
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 4/6] xen/balloon: mark inflated pages PG_offline
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
                   ` (2 preceding siblings ...)
  2018-11-14 21:17 ` [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO David Hildenbrand
@ 2018-11-14 21:17 ` David Hildenbrand
  2018-11-14 21:17 ` [PATCH RFC 5/6] hv_balloon: " David Hildenbrand
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:17 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Andrew Morton, Matthew Wilcox, Michal Hocko,
	Michael S. Tsirkin

Mark inflated and never onlined pages PG_offline, to tell the world that
the content is stale and should not be dumped.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/xen/balloon.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 12148289debd..14dd6b814db3 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -425,6 +425,7 @@ static int xen_bring_pgs_online(struct page *pg, unsigned int order)
 	for (i = 0; i < size; i++) {
 		p = pfn_to_page(start_pfn + i);
 		__online_page_set_limits(p);
+		__SetPageOffline(p);
 		__balloon_append(p);
 	}
 	mutex_unlock(&balloon_mutex);
@@ -493,6 +494,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 		xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
 
 		/* Relinquish the page back to the allocator. */
+		__ClearPageOffline(page);
 		free_reserved_page(page);
 	}
 
@@ -519,6 +521,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 			state = BP_EAGAIN;
 			break;
 		}
+		__SetPageOffline(page);
 		adjust_managed_page_count(page, -1);
 		xenmem_reservation_scrub_page(page);
 		list_add(&page->lru, &pages);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 5/6] hv_balloon: mark inflated pages PG_offline
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
                   ` (3 preceding siblings ...)
  2018-11-14 21:17 ` [PATCH RFC 4/6] xen/balloon: mark inflated pages PG_offline David Hildenbrand
@ 2018-11-14 21:17 ` David Hildenbrand
  2018-11-14 21:17 ` [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages David Hildenbrand
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:17 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Kairui Song, Vitaly Kuznetsov, Andrew Morton,
	Matthew Wilcox, Michal Hocko, Michael S. Tsirkin

Mark inflated and never onlined pages PG_offline, to tell the world that
the content is stale and should not be dumped.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/hv/hv_balloon.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5728dc470eeb..778b6f879d1c 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -681,8 +681,13 @@ static struct notifier_block hv_memory_nb = {
 /* Check if the particular page is backed and can be onlined and online it. */
 static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg)
 {
-	if (!has_pfn_is_backed(has, page_to_pfn(pg)))
+	if (!has_pfn_is_backed(has, page_to_pfn(pg))) {
+		if (!PageOffline(pg))
+			__SetPageOffline(pg);
 		return;
+	}
+	if (PageOffline(pg))
+		__ClearPageOffline(pg);
 
 	/* This frame is currently backed; online the page. */
 	__online_page_set_limits(pg);
@@ -1200,6 +1205,7 @@ static void free_balloon_pages(struct hv_dynmem_device *dm,
 
 	for (i = 0; i < num_pages; i++) {
 		pg = pfn_to_page(i + start_frame);
+		__ClearPageOffline(pg);
 		__free_page(pg);
 		dm->num_pages_ballooned--;
 	}
@@ -1212,7 +1218,7 @@ static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
 					struct dm_balloon_response *bl_resp,
 					int alloc_unit)
 {
-	unsigned int i = 0;
+	unsigned int i, j;
 	struct page *pg;
 
 	if (num_pages < alloc_unit)
@@ -1244,6 +1250,10 @@ static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
 		if (alloc_unit != 1)
 			split_page(pg, get_order(alloc_unit << PAGE_SHIFT));
 
+		/* mark all pages offline */
+		for (j = 0; j < (1 << get_order(alloc_unit << PAGE_SHIFT)); j++)
+			__SetPageOffline(pg + j);
+
 		bl_resp->range_count++;
 		bl_resp->range_array[i].finfo.start_page =
 			page_to_pfn(pg);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
                   ` (4 preceding siblings ...)
  2018-11-14 21:17 ` [PATCH RFC 5/6] hv_balloon: " David Hildenbrand
@ 2018-11-14 21:17 ` David Hildenbrand
  2018-11-15  7:48   ` Pavel Machek
  2018-11-15 12:23   ` Michal Hocko
  2018-11-14 22:57 ` [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline Nadav Amit
  2018-11-16 18:23 ` David Hildenbrand
  7 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 21:17 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, David Hildenbrand, Rafael J. Wysocki, Pavel Machek,
	Len Brown, Andrew Morton, Matthew Wilcox, Michal Hocko,
	Michael S. Tsirkin

The content of pages that are marked PG_offline is not of interest
(e.g. inflated by a balloon driver), let's skip these pages.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 kernel/power/snapshot.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index b0308a2c6000..01db1d13481a 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1222,7 +1222,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
 	BUG_ON(!PageHighMem(page));
 
 	if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page) ||
-	    PageReserved(page))
+	    PageReserved(page) || PageOffline(page))
 		return NULL;
 
 	if (page_is_guard(page))
@@ -1286,6 +1286,9 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
 	if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
 		return NULL;
 
+	if (PageOffline(page))
+		return NULL;
+
 	if (PageReserved(page)
 	    && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
 		return NULL;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline
  2018-11-14 21:17 ` [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline David Hildenbrand
@ 2018-11-14 22:23   ` Matthew Wilcox
  2018-11-14 22:49     ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2018-11-14 22:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Jonathan Corbet, Alexey Dobriyan,
	Mike Rapoport, Andrew Morton, Christian Hansen, Vlastimil Babka,
	Kirill A. Shutemov, Stephen Rothwell, Michael S. Tsirkin,
	Michal Hocko, Pavel Tatashin, Alexander Duyck, Naoya Horiguchi,
	Miles Chen, David Rientjes

On Wed, Nov 14, 2018 at 10:17:00PM +0100, David Hildenbrand wrote:
> Rename PG_balloon to PG_offline. This is an indicator that the page is
> logically offline, the content stale and that it should not be touched
> (e.g. a hypervisor would have to allocate backing storage in order for the
> guest to dump an unused page).  We can then e.g. exclude such pages from
> dumps.
> 
> In following patches, we will make use of this bit also in other balloon
> drivers.  While at it, document PGTABLE.

Thank you for documenting PGTABLE.  I didn't realise I also had this
document to update when I added PGTABLE.

> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -78,6 +78,8 @@ number of times a page is mapped.
>      23. BALLOON
>      24. ZERO_PAGE
>      25. IDLE
> +    26. PGTABLE
> +    27. OFFLINE

So the offline *user* bit is new ... even though the *kernel* bit
just renames the balloon bit.  I'm not sure how I feel about this.
I'm going to think about it some more.  Could you share your decision
process with us?

I have no objection to renaming the balloon bit inside the kernel; I
think that's a wise idea.  I'm just not sure whether we should rename
the user balloon bit rather than adding a new bit.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline
  2018-11-14 22:23   ` Matthew Wilcox
@ 2018-11-14 22:49     ` David Hildenbrand
  2018-11-15  2:07       ` Mike Rapoport
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 22:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Jonathan Corbet, Alexey Dobriyan,
	Mike Rapoport, Andrew Morton, Christian Hansen, Vlastimil Babka,
	Kirill A. Shutemov, Stephen Rothwell, Michael S. Tsirkin,
	Michal Hocko, Pavel Tatashin, Alexander Duyck, Naoya Horiguchi,
	Miles Chen, David Rientjes

On 14.11.18 23:23, Matthew Wilcox wrote:
> On Wed, Nov 14, 2018 at 10:17:00PM +0100, David Hildenbrand wrote:
>> Rename PG_balloon to PG_offline. This is an indicator that the page is
>> logically offline, the content stale and that it should not be touched
>> (e.g. a hypervisor would have to allocate backing storage in order for the
>> guest to dump an unused page).  We can then e.g. exclude such pages from
>> dumps.
>>
>> In following patches, we will make use of this bit also in other balloon
>> drivers.  While at it, document PGTABLE.
> 
> Thank you for documenting PGTABLE.  I didn't realise I also had this
> document to update when I added PGTABLE.

Thank you for looking into this :)

> 
>> +++ b/Documentation/admin-guide/mm/pagemap.rst
>> @@ -78,6 +78,8 @@ number of times a page is mapped.
>>      23. BALLOON
>>      24. ZERO_PAGE
>>      25. IDLE
>> +    26. PGTABLE
>> +    27. OFFLINE
> 
> So the offline *user* bit is new ... even though the *kernel* bit
> just renames the balloon bit.  I'm not sure how I feel about this.
> I'm going to think about it some more.  Could you share your decision
> process with us?

BALLOON was/is documented as

"23 - BALLOON
    balloon compaction page
"

and only includes all virtio-ballon pages after the non-lru migration
feature has been implemented for ballooned pages. Since then, this flag
does basically no longer stands for what it actually was supposed to do.

To not break uapi I decided to not rename it but instead to add a new flag.

> 
> I have no objection to renaming the balloon bit inside the kernel; I
> think that's a wise idea.  I'm just not sure whether we should rename
> the user balloon bit rather than adding a new bit.
> 

Can we rename without breaking uapi?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
                   ` (5 preceding siblings ...)
  2018-11-14 21:17 ` [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages David Hildenbrand
@ 2018-11-14 22:57 ` Nadav Amit
  2018-11-14 23:05   ` David Hildenbrand
  2018-11-16 18:23 ` David Hildenbrand
  7 siblings, 1 reply; 29+ messages in thread
From: Nadav Amit @ 2018-11-14 22:57 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: LKML, open list:DOCUMENTATION, devel, linux-fsdevel, linux-pm,
	xen-devel, Alexander Duyck, Alexey Dobriyan, Andrew Morton,
	Arnd Bergmann, Baoquan He, Boris Ostrovsky, Christian Hansen,
	Dave Young, David Rientjes, Haiyang Zhang, Jonathan Corbet,
	Juergen Gross, Kairui Song, Kirill A. Shutemov, K. Y. Srinivasan,
	Len Brown, Matthew Wilcox, Michael S. Tsirkin, Michal Hocko,
	Mike Rapoport, Miles Chen, Naoya Horiguchi, Omar Sandoval,
	Pavel Machek, Pavel Tatashin, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Stephen Rothwell,
	Vitaly Kuznetsov, Vlastimil Babka, Julien Freche

From: David Hildenbrand
Sent: November 14, 2018 at 9:16:58 PM GMT
> Subject: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
> 
> 
> Right now, pages inflated as part of a balloon driver will be dumped
> by dump tools like makedumpfile. While XEN is able to check in the
> crash kernel whether a certain pfn is actuall backed by memory in the
> hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of
> virtio-balloon and hv-balloon inflated memory will essentially result in
> zero pages getting allocated by the hypervisor and the dump getting
> filled with this data.

Is there any reason that VMware balloon driver is not mentioned?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
  2018-11-14 22:57 ` [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline Nadav Amit
@ 2018-11-14 23:05   ` David Hildenbrand
  2018-11-14 23:41     ` Nadav Amit
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-14 23:05 UTC (permalink / raw)
  To: Nadav Amit, linux-mm
  Cc: LKML, open list:DOCUMENTATION, devel, linux-fsdevel, linux-pm,
	xen-devel, Alexander Duyck, Alexey Dobriyan, Andrew Morton,
	Arnd Bergmann, Baoquan He, Boris Ostrovsky, Christian Hansen,
	Dave Young, David Rientjes, Haiyang Zhang, Jonathan Corbet,
	Juergen Gross, Kairui Song, Kirill A. Shutemov, K. Y. Srinivasan,
	Len Brown, Matthew Wilcox, Michael S. Tsirkin, Michal Hocko,
	Mike Rapoport, Miles Chen, Naoya Horiguchi, Omar Sandoval,
	Pavel Machek, Pavel Tatashin, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Stephen Rothwell,
	Vitaly Kuznetsov, Vlastimil Babka, Julien Freche

On 14.11.18 23:57, Nadav Amit wrote:
> From: David Hildenbrand
> Sent: November 14, 2018 at 9:16:58 PM GMT
>> Subject: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
>>
>>
>> Right now, pages inflated as part of a balloon driver will be dumped
>> by dump tools like makedumpfile. While XEN is able to check in the
>> crash kernel whether a certain pfn is actuall backed by memory in the
>> hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of
>> virtio-balloon and hv-balloon inflated memory will essentially result in
>> zero pages getting allocated by the hypervisor and the dump getting
>> filled with this data.
> 
> Is there any reason that VMware balloon driver is not mentioned?

Definitely ...

... not ;) . I haven't looked at vmware's balloon driver yet (I only saw
that there was quite some activity recently). I guess it should have
similar problems. (I mean reading and dumping data nobody cares about is
certainly not desired)

Can you share if something like this is also desired for vmware's
implementation? (I tagged this as RFC to get some more feedback)

It should in theory be as simple as adding a handful of
_SetPageOffline()/_ClearPageOffline() at the right spots.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
  2018-11-14 23:05   ` David Hildenbrand
@ 2018-11-14 23:41     ` Nadav Amit
  2018-11-15  1:42       ` Julien Freche
  0 siblings, 1 reply; 29+ messages in thread
From: Nadav Amit @ 2018-11-14 23:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm, Julien Freche
  Cc: LKML, open list:DOCUMENTATION, devel, linux-fsdevel, linux-pm,
	xen-devel, Alexander Duyck, Alexey Dobriyan, Andrew Morton,
	Arnd Bergmann, Baoquan He, Boris Ostrovsky, Christian Hansen,
	Dave Young, David Rientjes, Haiyang Zhang, Jonathan Corbet,
	Juergen Gross, Kairui Song, Kirill A. Shutemov, K. Y. Srinivasan,
	Len Brown, Matthew Wilcox, Michael S. Tsirkin, Michal Hocko,
	Mike Rapoport, Miles Chen, Naoya Horiguchi, Omar Sandoval,
	Pavel Machek, Pavel Tatashin, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Stephen Rothwell,
	Vitaly Kuznetsov, Vlastimil Babka

From: David Hildenbrand
Sent: November 14, 2018 at 11:05:38 PM GMT
> Subject: Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
> 
> 
> On 14.11.18 23:57, Nadav Amit wrote:
>> From: David Hildenbrand
>> Sent: November 14, 2018 at 9:16:58 PM GMT
>>> Subject: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
>>> 
>>> 
>>> Right now, pages inflated as part of a balloon driver will be dumped
>>> by dump tools like makedumpfile. While XEN is able to check in the
>>> crash kernel whether a certain pfn is actuall backed by memory in the
>>> hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of
>>> virtio-balloon and hv-balloon inflated memory will essentially result in
>>> zero pages getting allocated by the hypervisor and the dump getting
>>> filled with this data.
>> 
>> Is there any reason that VMware balloon driver is not mentioned?
> 
> Definitely ...
> 
> ... not ;) . I haven't looked at vmware's balloon driver yet (I only saw
> that there was quite some activity recently). I guess it should have
> similar problems. (I mean reading and dumping data nobody cares about is
> certainly not desired)
> 
> Can you share if something like this is also desired for vmware's
> implementation? (I tagged this as RFC to get some more feedback)
> 
> It should in theory be as simple as adding a handful of
> _SetPageOffline()/_ClearPageOffline() at the right spots.

Thanks, I was just suspecting it is personal ;-)

Actually, some patches that I sent for 4.20 to use the balloon-compaction
infrastructure by the VMware balloon fell between the cracks, and I need
to resend them.

I would obviously prefer that your changes would be done on top of those
that were skipped. This patch-set sounds very reasonable to me, but I prefer
that Julien (cc’d) would also give his opinion.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
  2018-11-14 23:41     ` Nadav Amit
@ 2018-11-15  1:42       ` Julien Freche
  0 siblings, 0 replies; 29+ messages in thread
From: Julien Freche @ 2018-11-15  1:42 UTC (permalink / raw)
  To: Nadav Amit, David Hildenbrand, linux-mm
  Cc: LKML, open list:DOCUMENTATION, devel, linux-fsdevel, linux-pm,
	xen-devel, Alexander Duyck, Alexey Dobriyan, Andrew Morton,
	Arnd Bergmann, Baoquan He, Boris Ostrovsky, Christian Hansen,
	Dave Young, David Rientjes, Haiyang Zhang, Jonathan Corbet,
	Juergen Gross, Kairui Song, Kirill A. Shutemov, K. Y. Srinivasan,
	Len Brown, Matthew Wilcox, Michael S. Tsirkin, Michal Hocko,
	Mike Rapoport, Miles Chen, Naoya Horiguchi, Omar Sandoval,
	Pavel Machek, Pavel Tatashin, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Stephen Rothwell,
	Vitaly Kuznetsov, Vlastimil Babka

>On 11/14/18, 3:41 PM, "Nadav Amit" <namit@vmware.com> wrote:
>>From: David Hildenbrand
>>Sent: November 14, 2018 at 11:05:38 PM GMT
>> Subject: Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
>> 
>> 
>> Can you share if something like this is also desired for vmware's
>> implementation? (I tagged this as RFC to get some more feedback)
>> 
>> It should in theory be as simple as adding a handful of
>> _SetPageOffline()/_ClearPageOffline() at the right spots.
>    
> Thanks, I was just suspecting it is personal ;-)
>
> I would obviously prefer that your changes would be done on top of those
> that were skipped. This patch-set sounds very reasonable to me, but I prefer
> that Julien (cc’d) would also give his opinion.
    
I think this is desirable for VMware's implementation also. You are right,
dumping data that is not relevant is a waste :-) 
I haven't heard of any panic/issue due to this but that's still a good optimization.

Nadav or I could help to test that on ESX if required.

Regards,

-- 
Julien Freche


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline
  2018-11-14 22:49     ` David Hildenbrand
@ 2018-11-15  2:07       ` Mike Rapoport
  2018-11-15  9:21         ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Rapoport @ 2018-11-15  2:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, linux-mm, linux-kernel, linux-doc, devel,
	linux-fsdevel, linux-pm, xen-devel, Jonathan Corbet,
	Alexey Dobriyan, Mike Rapoport, Andrew Morton, Christian Hansen,
	Vlastimil Babka, Kirill A. Shutemov, Stephen Rothwell,
	Michael S. Tsirkin, Michal Hocko, Pavel Tatashin,
	Alexander Duyck, Naoya Horiguchi, Miles Chen, David Rientjes

On Wed, Nov 14, 2018 at 11:49:15PM +0100, David Hildenbrand wrote:
> On 14.11.18 23:23, Matthew Wilcox wrote:
> > On Wed, Nov 14, 2018 at 10:17:00PM +0100, David Hildenbrand wrote:
> >> Rename PG_balloon to PG_offline. This is an indicator that the page is
> >> logically offline, the content stale and that it should not be touched
> >> (e.g. a hypervisor would have to allocate backing storage in order for the
> >> guest to dump an unused page).  We can then e.g. exclude such pages from
> >> dumps.
> >>
> >> In following patches, we will make use of this bit also in other balloon
> >> drivers.  While at it, document PGTABLE.
> > 
> > Thank you for documenting PGTABLE.  I didn't realise I also had this
> > document to update when I added PGTABLE.
> 
> Thank you for looking into this :)
> 
> > 
> >> +++ b/Documentation/admin-guide/mm/pagemap.rst
> >> @@ -78,6 +78,8 @@ number of times a page is mapped.
> >>      23. BALLOON
> >>      24. ZERO_PAGE
> >>      25. IDLE
> >> +    26. PGTABLE
> >> +    27. OFFLINE
> > 
> > So the offline *user* bit is new ... even though the *kernel* bit
> > just renames the balloon bit.  I'm not sure how I feel about this.
> > I'm going to think about it some more.  Could you share your decision
> > process with us?
> 
> BALLOON was/is documented as
> 
> "23 - BALLOON
>     balloon compaction page
> "
> 
> and only includes all virtio-ballon pages after the non-lru migration
> feature has been implemented for ballooned pages. Since then, this flag
> does basically no longer stands for what it actually was supposed to do.

Perhaps I missing something, but how the user should interpret "23" when he
reads /proc/kpageflags?

> To not break uapi I decided to not rename it but instead to add a new flag.

I've got a feeling that uapi was anyway changed for the BALLON flag
meaning.
 
> > 
> > I have no objection to renaming the balloon bit inside the kernel; I
> > think that's a wise idea.  I'm just not sure whether we should rename
> > the user balloon bit rather than adding a new bit.
> > 
> 
> Can we rename without breaking uapi?
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-14 21:17 ` [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO David Hildenbrand
@ 2018-11-15  6:19   ` Dave Young
  2018-11-15  9:23     ` David Hildenbrand
  2018-11-15 11:10     ` Borislav Petkov
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Young @ 2018-11-15  6:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Andrew Morton, Kirill A. Shutemov,
	Baoquan He, Omar Sandoval, Arnd Bergmann, Matthew Wilcox,
	Michal Hocko, Lianbo Jiang, Borislav Petkov, Michael S. Tsirkin

Hi David,

On 11/14/18 at 10:17pm, David Hildenbrand wrote:
> Let's export PG_offline via PAGE_OFFLINE_MAPCOUNT_VALUE, so
> makedumpfile can directly skip pages that are logically offline and the
> content therefore stale.

It would be good to copy some background info from cover letter to the
patch description so that we can get better understanding why this is
needed now.

BTW, Lianbo is working on a documentation of the vmcoreinfo exported
fields. Ccing him so that he is aware of this.

Also cc Boris,  although I do not want the doc changes blocks this
he might have different opinion :)

> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  kernel/crash_core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 933cb3e45b98..093c9f917ed0 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -464,6 +464,8 @@ static int __init crash_save_vmcoreinfo_init(void)
>  	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
>  #ifdef CONFIG_HUGETLB_PAGE
>  	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
> +#define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
> +	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
>  #endif
>  
>  	arch_crash_save_vmcoreinfo();
> -- 
> 2.17.2
> 

Thanks
Dave

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages
  2018-11-14 21:17 ` [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages David Hildenbrand
@ 2018-11-15  7:48   ` Pavel Machek
  2018-11-15 12:23   ` Michal Hocko
  1 sibling, 0 replies; 29+ messages in thread
From: Pavel Machek @ 2018-11-15  7:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Rafael J. Wysocki, Len Brown, Andrew Morton,
	Matthew Wilcox, Michal Hocko, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]

On Wed 2018-11-14 22:17:04, David Hildenbrand wrote:
> The content of pages that are marked PG_offline is not of interest
> (e.g. inflated by a balloon driver), let's skip these pages.
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>

Acked-by: Pavel Machek <pavel@ucw.cz>

> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index b0308a2c6000..01db1d13481a 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1222,7 +1222,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
>  	BUG_ON(!PageHighMem(page));
>  
>  	if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page) ||
> -	    PageReserved(page))
> +	    PageReserved(page) || PageOffline(page))
>  		return NULL;
>  
>  	if (page_is_guard(page))
> @@ -1286,6 +1286,9 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
>  	if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
>  		return NULL;
>  
> +	if (PageOffline(page))
> +		return NULL;
> +
>  	if (PageReserved(page)
>  	    && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
>  		return NULL;

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline
  2018-11-15  2:07       ` Mike Rapoport
@ 2018-11-15  9:21         ` David Hildenbrand
  2018-11-15 12:19           ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-15  9:21 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Matthew Wilcox, linux-mm, linux-kernel, linux-doc, devel,
	linux-fsdevel, linux-pm, xen-devel, Jonathan Corbet,
	Alexey Dobriyan, Mike Rapoport, Andrew Morton, Christian Hansen,
	Vlastimil Babka, Kirill A. Shutemov, Stephen Rothwell,
	Michael S. Tsirkin, Michal Hocko, Pavel Tatashin,
	Alexander Duyck, Naoya Horiguchi, Miles Chen, David Rientjes

On 15.11.18 03:07, Mike Rapoport wrote:
> On Wed, Nov 14, 2018 at 11:49:15PM +0100, David Hildenbrand wrote:
>> On 14.11.18 23:23, Matthew Wilcox wrote:
>>> On Wed, Nov 14, 2018 at 10:17:00PM +0100, David Hildenbrand wrote:
>>>> Rename PG_balloon to PG_offline. This is an indicator that the page is
>>>> logically offline, the content stale and that it should not be touched
>>>> (e.g. a hypervisor would have to allocate backing storage in order for the
>>>> guest to dump an unused page).  We can then e.g. exclude such pages from
>>>> dumps.
>>>>
>>>> In following patches, we will make use of this bit also in other balloon
>>>> drivers.  While at it, document PGTABLE.
>>>
>>> Thank you for documenting PGTABLE.  I didn't realise I also had this
>>> document to update when I added PGTABLE.
>>
>> Thank you for looking into this :)
>>
>>>
>>>> +++ b/Documentation/admin-guide/mm/pagemap.rst
>>>> @@ -78,6 +78,8 @@ number of times a page is mapped.
>>>>      23. BALLOON
>>>>      24. ZERO_PAGE
>>>>      25. IDLE
>>>> +    26. PGTABLE
>>>> +    27. OFFLINE
>>>
>>> So the offline *user* bit is new ... even though the *kernel* bit
>>> just renames the balloon bit.  I'm not sure how I feel about this.
>>> I'm going to think about it some more.  Could you share your decision
>>> process with us?
>>
>> BALLOON was/is documented as
>>
>> "23 - BALLOON
>>     balloon compaction page
>> "
>>
>> and only includes all virtio-ballon pages after the non-lru migration
>> feature has been implemented for ballooned pages. Since then, this flag
>> does basically no longer stands for what it actually was supposed to do.
> 
> Perhaps I missing something, but how the user should interpret "23" when he
> reads /proc/kpageflags?

Looking at the history in more detail:

commit 09316c09dde33aae14f34489d9e3d243ec0d5938
Author: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Date:   Thu Oct 9 15:29:32 2014 -0700

    mm/balloon_compaction: add vmstat counters and kpageflags bit

    Always mark pages with PageBalloon even if balloon compaction is
disabled
    and expose this mark in /proc/kpageflags as KPF_BALLOON.


So KPF_BALLOON was exposed when virtio-balloon pages were always marked
with PG_balloon. So the documentation is actually wrong ("balloon page"
vs. "balloon compaction page").

I have no idea who actually used that information. I suspect this was
just some debugging aid.

> 
>> To not break uapi I decided to not rename it but instead to add a new flag.
> 
> I've got a feeling that uapi was anyway changed for the BALLON flag
> meaning.

Yes. If we *replace* KPF_BALLOON by KPF_OFFLINE

a) Some applications might no longer compile (I guess that's ok)
b) Some old applications will treat KPF_OFFLINE like KPF_BALLOON (which
should at least for virtio-balloon usage until now be fine - it is just
more generic)

So I guess it's up to Maintainers/Matthew to decide :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15  6:19   ` Dave Young
@ 2018-11-15  9:23     ` David Hildenbrand
  2018-11-15 11:10     ` Borislav Petkov
  1 sibling, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-15  9:23 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Andrew Morton, Kirill A. Shutemov,
	Baoquan He, Omar Sandoval, Arnd Bergmann, Matthew Wilcox,
	Michal Hocko, Lianbo Jiang, Borislav Petkov, Michael S. Tsirkin

On 15.11.18 07:19, Dave Young wrote:
> Hi David,
> 
> On 11/14/18 at 10:17pm, David Hildenbrand wrote:
>> Let's export PG_offline via PAGE_OFFLINE_MAPCOUNT_VALUE, so
>> makedumpfile can directly skip pages that are logically offline and the
>> content therefore stale.
> 
> It would be good to copy some background info from cover letter to the
> patch description so that we can get better understanding why this is
> needed now.

Yes, will add more detail!

> 
> BTW, Lianbo is working on a documentation of the vmcoreinfo exported
> fields. Ccing him so that he is aware of this.
> 
> Also cc Boris,  although I do not want the doc changes blocks this
> he might have different opinion :)

I'll be happy to help updating documentation (or updating it myself if
the doc updates go in first).

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15  6:19   ` Dave Young
  2018-11-15  9:23     ` David Hildenbrand
@ 2018-11-15 11:10     ` Borislav Petkov
  2018-11-15 11:20       ` David Hildenbrand
  1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2018-11-15 11:10 UTC (permalink / raw)
  To: Dave Young
  Cc: David Hildenbrand, linux-mm, linux-kernel, linux-doc, devel,
	linux-fsdevel, linux-pm, xen-devel, Andrew Morton,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Michal Hocko, Lianbo Jiang, Michael S. Tsirkin

On Thu, Nov 15, 2018 at 02:19:23PM +0800, Dave Young wrote:
> It would be good to copy some background info from cover letter to the
> patch description so that we can get better understanding why this is
> needed now.
> 
> BTW, Lianbo is working on a documentation of the vmcoreinfo exported
> fields. Ccing him so that he is aware of this.
> 
> Also cc Boris,  although I do not want the doc changes blocks this
> he might have different opinion :)

Yeah, my initial reaction is that exporting an mm-internal flag to
userspace is a no-no.

What would be better, IMHO, is having a general method of telling the
kdump kernel - maybe ranges of physical addresses - which to skip.

Because the moment there's a set of pages which do not have PG_offline
set but kdump would still like to skip, this breaks.

But that's mm guys' call.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15 11:10     ` Borislav Petkov
@ 2018-11-15 11:20       ` David Hildenbrand
  2018-11-15 11:52         ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-15 11:20 UTC (permalink / raw)
  To: Borislav Petkov, Dave Young
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Andrew Morton, Kirill A. Shutemov,
	Baoquan He, Omar Sandoval, Arnd Bergmann, Matthew Wilcox,
	Michal Hocko, Lianbo Jiang, Michael S. Tsirkin

On 15.11.18 12:10, Borislav Petkov wrote:
> On Thu, Nov 15, 2018 at 02:19:23PM +0800, Dave Young wrote:
>> It would be good to copy some background info from cover letter to the
>> patch description so that we can get better understanding why this is
>> needed now.
>>
>> BTW, Lianbo is working on a documentation of the vmcoreinfo exported
>> fields. Ccing him so that he is aware of this.
>>
>> Also cc Boris,  although I do not want the doc changes blocks this
>> he might have different opinion :)
> 
> Yeah, my initial reaction is that exporting an mm-internal flag to
> userspace is a no-no.

Sorry to say, but that is the current practice without which
makedumpfile would not be able to work at all. (exclude user pages,
exclude page cache, exclude buddy pages). Let's not reinvent the wheel
here. This is how dumping works forever.

Also see how hwpoisoned pages are handled.

(please note that most distributions only support dumping via
makedumpfile, for said reasons)

> 
> What would be better, IMHO, is having a general method of telling the
> kdump kernel - maybe ranges of physical addresses - which to skip.

And that has to be updated whenever we change a page flag. But then the
dump kernel would have to be aware about "struct page" location and
format of some old kernel to be dump. Let's just not even discuss going
down that path.

> 
> Because the moment there's a set of pages which do not have PG_offline
> set but kdump would still like to skip, this breaks.

I don't understand your concern. PG_offline is only an optimization for
sections that are online. Offline sections can already be completely
ignored when dumping.

I don't see how there should be "set of pages which do not have
PG_offline". I mean if they don't have PG_offline they will simply be
dumped like before. Which worked forever. Sorry, I don't get your point.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15 11:20       ` David Hildenbrand
@ 2018-11-15 11:52         ` Borislav Petkov
  2018-11-15 12:01           ` David Hildenbrand
  2018-11-15 12:11           ` Michal Hocko
  0 siblings, 2 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-11-15 11:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dave Young, linux-mm, linux-kernel, linux-doc, devel,
	linux-fsdevel, linux-pm, xen-devel, Andrew Morton,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Michal Hocko, Lianbo Jiang, Michael S. Tsirkin

On Thu, Nov 15, 2018 at 12:20:40PM +0100, David Hildenbrand wrote:
> Sorry to say, but that is the current practice without which
> makedumpfile would not be able to work at all. (exclude user pages,
> exclude page cache, exclude buddy pages). Let's not reinvent the wheel
> here. This is how dumping works forever.

Sorry, but "we've always done this in the past" doesn't make it better.

> I don't see how there should be "set of pages which do not have
> PG_offline".

It doesn't have to be a set of pages. Think a (mmconfig perhaps) region
which the kdump kernel should completely skip because poking in it in
the kdump kernel, causes all kinds of havoc like machine checks. etc.
We've had and still have one issue like that.

But let me clarify my note: I don't want to be discussing with you the
design of makedumpfile and how it should or should not work - that ship
has already sailed. Apparently there are valid reasons to do it this
way.

I was *simply* stating that it feels wrong to export mm flags like that.

But as I said already, that is mm guys' call and looking at how we're
already exporting a bunch of stuff in the vmcoreinfo - including other
mm flags - I guess one more flag doesn't matter anymore.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15 11:52         ` Borislav Petkov
@ 2018-11-15 12:01           ` David Hildenbrand
  2018-11-15 17:58             ` Borislav Petkov
  2018-11-15 12:11           ` Michal Hocko
  1 sibling, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2018-11-15 12:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, linux-mm, linux-kernel, linux-doc, devel,
	linux-fsdevel, linux-pm, xen-devel, Andrew Morton,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Michal Hocko, Lianbo Jiang, Michael S. Tsirkin

On 15.11.18 12:52, Borislav Petkov wrote:
> On Thu, Nov 15, 2018 at 12:20:40PM +0100, David Hildenbrand wrote:
>> Sorry to say, but that is the current practice without which
>> makedumpfile would not be able to work at all. (exclude user pages,
>> exclude page cache, exclude buddy pages). Let's not reinvent the wheel
>> here. This is how dumping works forever.
> 
> Sorry, but "we've always done this in the past" doesn't make it better.

Just saying that "I'm not the first to do it, don't hit me with a stick" :)

> 
>> I don't see how there should be "set of pages which do not have
>> PG_offline".
> 
> It doesn't have to be a set of pages. Think a (mmconfig perhaps) region
> which the kdump kernel should completely skip because poking in it in
> the kdump kernel, causes all kinds of havoc like machine checks. etc.
> We've had and still have one issue like that.

Indeed. And we still have without makedumpfile. I think you are aware of
this, but I'll explain it just for consistency: PG_hwpoison

At some point we detect a HW error and mask a page as PG_hwpoison.

makedumpfile knows how to treat that flag and can exclude it from the
dump (== not access it). No crash.

kdump itself has no clue about old "struct pages". Especially:
a) Where they are located in memory (e.g. SPARSE)
b) What their format is ("where are the flags")
c) What the meaning of flags is ("what does bit X mean")

In order to know such information, we would have to do parsing of quite
some information inside the kernel in kdump. Basically what makedumpfile
does just now. Is this feasible? I don't think so.

So we would need another approach to communicate such information as you
said. I can't think of any, but if anybody reading this has an idea,
please speak up. I am interested.


The *only* way right now we would have to handle such scenarios:

1. While dumping memory and we get a machine check, fake reading a zero
page instead of crashing.
2. While dumping memory and we get a fault, fake reading a zero page
instead of crashing.

> 
> But let me clarify my note: I don't want to be discussing with you the
> design of makedumpfile and how it should or should not work - that ship
> has already sailed. Apparently there are valid reasons to do it this
> way.

Indeed, and the basic design is to export these flags. (let's say
"unfortunately", being able to handle such stuff in kdump directly would
be the dream).

> I was *simply* stating that it feels wrong to export mm flags like that.
> 
> But as I said already, that is mm guys' call and looking at how we're
> already exporting a bunch of stuff in the vmcoreinfo - including other
> mm flags - I guess one more flag doesn't matter anymore.

Fair enough, noted. If you have an idea how to handle this in kdump,
please let me know.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15 11:52         ` Borislav Petkov
  2018-11-15 12:01           ` David Hildenbrand
@ 2018-11-15 12:11           ` Michal Hocko
  2018-11-15 14:13             ` Borislav Petkov
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2018-11-15 12:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: David Hildenbrand, Dave Young, linux-mm, linux-kernel, linux-doc,
	devel, linux-fsdevel, linux-pm, xen-devel, Andrew Morton,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Lianbo Jiang, Michael S. Tsirkin

On Thu 15-11-18 12:52:13, Borislav Petkov wrote:
> On Thu, Nov 15, 2018 at 12:20:40PM +0100, David Hildenbrand wrote:
> > Sorry to say, but that is the current practice without which
> > makedumpfile would not be able to work at all. (exclude user pages,
> > exclude page cache, exclude buddy pages). Let's not reinvent the wheel
> > here. This is how dumping works forever.
> 
> Sorry, but "we've always done this in the past" doesn't make it better.
> 
> > I don't see how there should be "set of pages which do not have
> > PG_offline".
> 
> It doesn't have to be a set of pages. Think a (mmconfig perhaps) region
> which the kdump kernel should completely skip because poking in it in
> the kdump kernel, causes all kinds of havoc like machine checks. etc.
> We've had and still have one issue like that.
> 
> But let me clarify my note: I don't want to be discussing with you the
> design of makedumpfile and how it should or should not work - that ship
> has already sailed. Apparently there are valid reasons to do it this
> way.
> 
> I was *simply* stating that it feels wrong to export mm flags like that.
> 
> But as I said already, that is mm guys' call and looking at how we're
> already exporting a bunch of stuff in the vmcoreinfo - including other
> mm flags - I guess one more flag doesn't matter anymore.

I am not familiar with kexec to judge this particular patch but we
cannot simply define any range for these pages (same as for hwpoison
ones) because they can be almost anywhere in the available memory range.
Then there can be countless of them. There is no other way to rule them
out but to check the page state.

I am not really sure what is the concern here exactly. Kdump is so
closly tight to the specific kernel version that the api exported
specifically for its purpose cannot be seriously considered an ABI.
Kdump has to adopt all the time.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline
  2018-11-15  9:21         ` David Hildenbrand
@ 2018-11-15 12:19           ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-11-15 12:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Matthew Wilcox, linux-mm, linux-kernel, linux-doc,
	devel, linux-fsdevel, linux-pm, xen-devel, Jonathan Corbet,
	Alexey Dobriyan, Mike Rapoport, Andrew Morton, Christian Hansen,
	Vlastimil Babka, Kirill A. Shutemov, Stephen Rothwell,
	Michael S. Tsirkin, Pavel Tatashin, Alexander Duyck,
	Naoya Horiguchi, Miles Chen, David Rientjes,
	Konstantin Khlebnikov

[Cc Konstantin - the patch is http://lkml.kernel.org/r/20181114211704.6381-3-david@redhat.com]

On Thu 15-11-18 10:21:13, David Hildenbrand wrote:
> On 15.11.18 03:07, Mike Rapoport wrote:
> > On Wed, Nov 14, 2018 at 11:49:15PM +0100, David Hildenbrand wrote:
> >> On 14.11.18 23:23, Matthew Wilcox wrote:
> >>> On Wed, Nov 14, 2018 at 10:17:00PM +0100, David Hildenbrand wrote:
> >>>> Rename PG_balloon to PG_offline. This is an indicator that the page is
> >>>> logically offline, the content stale and that it should not be touched
> >>>> (e.g. a hypervisor would have to allocate backing storage in order for the
> >>>> guest to dump an unused page).  We can then e.g. exclude such pages from
> >>>> dumps.
> >>>>
> >>>> In following patches, we will make use of this bit also in other balloon
> >>>> drivers.  While at it, document PGTABLE.
> >>>
> >>> Thank you for documenting PGTABLE.  I didn't realise I also had this
> >>> document to update when I added PGTABLE.
> >>
> >> Thank you for looking into this :)
> >>
> >>>
> >>>> +++ b/Documentation/admin-guide/mm/pagemap.rst
> >>>> @@ -78,6 +78,8 @@ number of times a page is mapped.
> >>>>      23. BALLOON
> >>>>      24. ZERO_PAGE
> >>>>      25. IDLE
> >>>> +    26. PGTABLE
> >>>> +    27. OFFLINE
> >>>
> >>> So the offline *user* bit is new ... even though the *kernel* bit
> >>> just renames the balloon bit.  I'm not sure how I feel about this.
> >>> I'm going to think about it some more.  Could you share your decision
> >>> process with us?
> >>
> >> BALLOON was/is documented as
> >>
> >> "23 - BALLOON
> >>     balloon compaction page
> >> "
> >>
> >> and only includes all virtio-ballon pages after the non-lru migration
> >> feature has been implemented for ballooned pages. Since then, this flag
> >> does basically no longer stands for what it actually was supposed to do.
> > 
> > Perhaps I missing something, but how the user should interpret "23" when he
> > reads /proc/kpageflags?
> 
> Looking at the history in more detail:
> 
> commit 09316c09dde33aae14f34489d9e3d243ec0d5938
> Author: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
> Date:   Thu Oct 9 15:29:32 2014 -0700
> 
>     mm/balloon_compaction: add vmstat counters and kpageflags bit
> 
>     Always mark pages with PageBalloon even if balloon compaction is
> disabled
>     and expose this mark in /proc/kpageflags as KPF_BALLOON.
> 
> 
> So KPF_BALLOON was exposed when virtio-balloon pages were always marked
> with PG_balloon. So the documentation is actually wrong ("balloon page"
> vs. "balloon compaction page").
> 
> I have no idea who actually used that information. I suspect this was
> just some debugging aid.
> 
> > 
> >> To not break uapi I decided to not rename it but instead to add a new flag.
> > 
> > I've got a feeling that uapi was anyway changed for the BALLON flag
> > meaning.
> 
> Yes. If we *replace* KPF_BALLOON by KPF_OFFLINE
> 
> a) Some applications might no longer compile (I guess that's ok)
> b) Some old applications will treat KPF_OFFLINE like KPF_BALLOON (which
> should at least for virtio-balloon usage until now be fine - it is just
> more generic)

I do not think any compilation could break. If the semantic of the flag
is preserved then everything should work as expected.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages
  2018-11-14 21:17 ` [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages David Hildenbrand
  2018-11-15  7:48   ` Pavel Machek
@ 2018-11-15 12:23   ` Michal Hocko
  2018-11-15 12:29     ` David Hildenbrand
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2018-11-15 12:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Andrew Morton, Matthew Wilcox, Michael S. Tsirkin

On Wed 14-11-18 22:17:04, David Hildenbrand wrote:
[...]
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index b0308a2c6000..01db1d13481a 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1222,7 +1222,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
>  	BUG_ON(!PageHighMem(page));
>  
>  	if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page) ||
> -	    PageReserved(page))
> +	    PageReserved(page) || PageOffline(page))
>  		return NULL;
>  
>  	if (page_is_guard(page))
> @@ -1286,6 +1286,9 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
>  	if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
>  		return NULL;
>  
> +	if (PageOffline(page))
> +		return NULL;
> +
>  	if (PageReserved(page)
>  	    && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
>  		return NULL;

Btw. now that you are touching this file could you also make
s@pfn_to_page@pfn_to_online_page@ please? We really do not want to touch
offline pfn ranges in general. A separate patch for that of course.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages
  2018-11-15 12:23   ` Michal Hocko
@ 2018-11-15 12:29     ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-15 12:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, devel, linux-fsdevel,
	linux-pm, xen-devel, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Andrew Morton, Matthew Wilcox, Michael S. Tsirkin

On 15.11.18 13:23, Michal Hocko wrote:
> On Wed 14-11-18 22:17:04, David Hildenbrand wrote:
> [...]
>> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
>> index b0308a2c6000..01db1d13481a 100644
>> --- a/kernel/power/snapshot.c
>> +++ b/kernel/power/snapshot.c
>> @@ -1222,7 +1222,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
>>  	BUG_ON(!PageHighMem(page));
>>  
>>  	if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page) ||
>> -	    PageReserved(page))
>> +	    PageReserved(page) || PageOffline(page))
>>  		return NULL;
>>  
>>  	if (page_is_guard(page))
>> @@ -1286,6 +1286,9 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
>>  	if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
>>  		return NULL;
>>  
>> +	if (PageOffline(page))
>> +		return NULL;
>> +
>>  	if (PageReserved(page)
>>  	    && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
>>  		return NULL;
> 
> Btw. now that you are touching this file could you also make
> s@pfn_to_page@pfn_to_online_page@ please? We really do not want to touch
> offline pfn ranges in general. A separate patch for that of course.
> 
> Thanks!
> 

Sure thing, will look into that!

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15 12:11           ` Michal Hocko
@ 2018-11-15 14:13             ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-11-15 14:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, Dave Young, linux-mm, linux-kernel, linux-doc,
	devel, linux-fsdevel, linux-pm, xen-devel, Andrew Morton,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Lianbo Jiang, Michael S. Tsirkin

On Thu, Nov 15, 2018 at 01:11:02PM +0100, Michal Hocko wrote:
> I am not familiar with kexec to judge this particular patch but we
> cannot simply define any range for these pages (same as for hwpoison
> ones) because they can be almost anywhere in the available memory range.
> Then there can be countless of them. There is no other way to rule them
> out but to check the page state.

I guess, especially if it is a monster box with a lot of memory in it.

> I am not really sure what is the concern here exactly. Kdump is so
> closly tight to the specific kernel version that the api exported
> specifically for its purpose cannot be seriously considered an ABI.
> Kdump has to adopt all the time.

Right...

Except, when people start ogling vmcoreinfo for other things and start
exporting all kinds of kernel internals in there, my alarm bells start
ringing.

But ok, kdump *is* special and I guess that's fine.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO
  2018-11-15 12:01           ` David Hildenbrand
@ 2018-11-15 17:58             ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-11-15 17:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dave Young, linux-mm, linux-kernel, linux-doc, devel,
	linux-fsdevel, linux-pm, xen-devel, Andrew Morton,
	Kirill A. Shutemov, Baoquan He, Omar Sandoval, Arnd Bergmann,
	Matthew Wilcox, Michal Hocko, Lianbo Jiang, Michael S. Tsirkin

On Thu, Nov 15, 2018 at 01:01:17PM +0100, David Hildenbrand wrote:
> Just saying that "I'm not the first to do it, don't hit me with a stick" :)

:-)

> Indeed. And we still have without makedumpfile. I think you are aware of
> this, but I'll explain it just for consistency: PG_hwpoison

No, I appreciate an explanation very much! So thanks for that. :)

> At some point we detect a HW error and mask a page as PG_hwpoison.
> 
> makedumpfile knows how to treat that flag and can exclude it from the
> dump (== not access it). No crash.
> 
> kdump itself has no clue about old "struct pages". Especially:
> a) Where they are located in memory (e.g. SPARSE)
> b) What their format is ("where are the flags")
> c) What the meaning of flags is ("what does bit X mean")
> 
> In order to know such information, we would have to do parsing of quite
> some information inside the kernel in kdump. Basically what makedumpfile
> does just now. Is this feasible? I don't think so.
> 
> So we would need another approach to communicate such information as you
> said. I can't think of any, but if anybody reading this has an idea,
> please speak up. I am interested.

Yeah but that ship has sailed. And even if we had a great idea, we'd
have to support kdump before and after the idea. And that would be a
serious mess.

And if you have a huge box with gazillion piles of memory and an alpha
particle passes through a bunch of them on its way down to the earth's
core, and while doing so, flips a bunch of bits, you need to go and
collect all those regions and update some list which you then need to
shove into the second kernel.

And you probably need to do all that through perhaps a piece of memory
which is used for communication between first and second kernel and that
list better fit in there, or you need to realloc. And that piece of
memory's layout needs to be properly defined so that the second kernel
can parse it correctly.

And so on...

> The *only* way right now we would have to handle such scenarios:
> 
> 1. While dumping memory and we get a machine check, fake reading a zero
> page instead of crashing.
> 2. While dumping memory and we get a fault, fake reading a zero page
> instead of crashing.

Yap.

> Indeed, and the basic design is to export these flags. (let's say
> "unfortunately", being able to handle such stuff in kdump directly would
> be the dream).

Well, AFAICT, the minimum work you need to always do before starting the
dumping is somehow generate that list of pages or ranges to not dump.
And that work needs to be done by the first or the second kernel, I'd
say.

If the first kernel would do it, then you'd have to probably have
callbacks to certain operations which go and add ranges or pages to
exclude, to a list which is then readily accessible to the second
kernel. Which means, when you reserve memory for the second kernel,
you'd have to reserve memory also for such a list.

But then what do you do when that memory gets filled up...?

So I guess exporting those things in vmcoreinfo is probably the only
thing we *can* do in the end.

Oh well, enough rambling... :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline
  2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
                   ` (6 preceding siblings ...)
  2018-11-14 22:57 ` [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline Nadav Amit
@ 2018-11-16 18:23 ` David Hildenbrand
  7 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2018-11-16 18:23 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, devel, linux-fsdevel, linux-pm,
	xen-devel, Alexander Duyck, Alexey Dobriyan, Andrew Morton,
	Arnd Bergmann, Baoquan He, Boris Ostrovsky, Christian Hansen,
	Dave Young, David Rientjes, Haiyang Zhang, Jonathan Corbet,
	Juergen Gross, Kairui Song, Kirill A. Shutemov, K. Y. Srinivasan,
	Len Brown, Matthew Wilcox, Michael S. Tsirkin, Michal Hocko,
	Mike Rapoport, Miles Chen, Naoya Horiguchi, Omar Sandoval,
	Pavel Machek, Pavel Tatashin, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Stephen Rothwell,
	Vitaly Kuznetsov, Vlastimil Babka

On 14.11.18 22:16, David Hildenbrand wrote:
> Right now, pages inflated as part of a balloon driver will be dumped
> by dump tools like makedumpfile. While XEN is able to check in the
> crash kernel whether a certain pfn is actuall backed by memory in the
> hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of
> virtio-balloon and hv-balloon inflated memory will essentially result in
> zero pages getting allocated by the hypervisor and the dump getting
> filled with this data.
> 
> The allocation and reading of zero pages can directly be avoided if a
> dumping tool could know which pages only contain stale information not to
> be dumped.
> 
> Also for XEN, calling into the kernel and asking the hypervisor if a
> pfn is backed can be avoided if the duming tool would skip such pages
> right from the beginning.
> 
> Dumping tools have no idea whether a given page is part of a balloon driver
> and shall not be dumped. Esp. PG_reserved cannot be used for that purpose
> as all memory allocated during early boot is also PG_reserved, see
> discussion at [1]. So some other way of indication is required and a new
> page flag is frowned upon.
> 
> We have PG_balloon (MAPCOUNT value), which is essentially unused now. I
> suggest renaming it to something more generic (PG_offline) to mark pages as
> logically offline. This flag can than e.g. also be used by virtio-mem in
> the future to mark subsections as offline. Or by other code that wants to
> put pages logically offline (e.g. later maybe poisoned pages that shall
> no longer be used).
> 
> This series converts PG_balloon to PG_offline, allows dumping tools to
> query the value to detect such pages and marks pages in the hv-balloon
> and XEN balloon properly as PG_offline. Note that virtio-balloon already
> set pages to PG_balloon (and now PG_offline).
> 
> Please note that this is also helpful for a problem we were seeing under
> Hyper-V: Dumping logically offline memory (pages kept fake offline while
> onlining a section via online_page_callback) would under some condicions
> result in a kernel panic when dumping them.
> 
> As I don't have access to neither XEN nor Hyper-V installation, this was
> not tested yet (and a makedumpfile change will be required to skip
> dumping these pages).
> 
> [1] https://lkml.org/lkml/2018/7/20/566
> 
> David Hildenbrand (6):
>   mm: balloon: update comment about isolation/migration/compaction
>   mm: convert PG_balloon to PG_offline
>   kexec: export PG_offline to VMCOREINFO
>   xen/balloon: mark inflated pages PG_offline
>   hv_balloon: mark inflated pages PG_offline
>   PM / Hibernate: exclude all PageOffline() pages
> 
>  Documentation/admin-guide/mm/pagemap.rst |  6 +++++
>  drivers/hv/hv_balloon.c                  | 14 ++++++++--
>  drivers/xen/balloon.c                    |  3 +++
>  fs/proc/page.c                           |  4 +--
>  include/linux/balloon_compaction.h       | 34 +++++++++---------------
>  include/linux/page-flags.h               | 11 +++++---
>  include/uapi/linux/kernel-page-flags.h   |  1 +
>  kernel/crash_core.c                      |  2 ++
>  kernel/power/snapshot.c                  |  5 +++-
>  tools/vm/page-types.c                    |  1 +
>  10 files changed, 51 insertions(+), 30 deletions(-)
> 

I just did a test with virtio-balloon (and a very simple makedumpfile
patch which I can supply on demand).

1. Guest with 8GB. Inflate balloon to 4GB via
 sudo virsh setmem f29 --size 4096M --live

2. Trigger a kernel panic in the guest
    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger

Original pages  : 0x00000000001e1da8
  Excluded pages   : 0x00000000001c9221
    Pages filled with zero  : 0x00000000000050b0
    Non-private cache pages : 0x0000000000046547
    Private cache pages     : 0x0000000000002165
    User process data pages : 0x00000000000048cf
    Free pages              : 0x00000000000771f6
    Hwpoison pages          : 0x0000000000000000
    Offline pages           : 0x0000000000100000
  Remaining pages  : 0x0000000000018b87
  (The number of pages is reduced to 5%.)
Memory Hole     : 0x000000000009e258
--------------------------------------------------
Total pages     : 0x0000000000280000

(Offline patches matches the 4GB)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-11-16 18:23 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-14 21:16 [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline David Hildenbrand
2018-11-14 21:16 ` [PATCH RFC 1/6] mm: balloon: update comment about isolation/migration/compaction David Hildenbrand
2018-11-14 21:17 ` [PATCH RFC 2/6] mm: convert PG_balloon to PG_offline David Hildenbrand
2018-11-14 22:23   ` Matthew Wilcox
2018-11-14 22:49     ` David Hildenbrand
2018-11-15  2:07       ` Mike Rapoport
2018-11-15  9:21         ` David Hildenbrand
2018-11-15 12:19           ` Michal Hocko
2018-11-14 21:17 ` [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO David Hildenbrand
2018-11-15  6:19   ` Dave Young
2018-11-15  9:23     ` David Hildenbrand
2018-11-15 11:10     ` Borislav Petkov
2018-11-15 11:20       ` David Hildenbrand
2018-11-15 11:52         ` Borislav Petkov
2018-11-15 12:01           ` David Hildenbrand
2018-11-15 17:58             ` Borislav Petkov
2018-11-15 12:11           ` Michal Hocko
2018-11-15 14:13             ` Borislav Petkov
2018-11-14 21:17 ` [PATCH RFC 4/6] xen/balloon: mark inflated pages PG_offline David Hildenbrand
2018-11-14 21:17 ` [PATCH RFC 5/6] hv_balloon: " David Hildenbrand
2018-11-14 21:17 ` [PATCH RFC 6/6] PM / Hibernate: exclude all PageOffline() pages David Hildenbrand
2018-11-15  7:48   ` Pavel Machek
2018-11-15 12:23   ` Michal Hocko
2018-11-15 12:29     ` David Hildenbrand
2018-11-14 22:57 ` [PATCH RFC 0/6] mm/kdump: allow to exclude pages that are logically offline Nadav Amit
2018-11-14 23:05   ` David Hildenbrand
2018-11-14 23:41     ` Nadav Amit
2018-11-15  1:42       ` Julien Freche
2018-11-16 18:23 ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).