* [PATCHSET v3 0/4] pagemap: make usable for non-privileged users
@ 2015-06-09 20:00 ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-06-09 20:00 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: linux-api, Mark Williamson, linux-kernel, Kirill A. Shutemov

This patchset makes pagemap usable again in a safe way. It adds an
'mmap-exclusive' bit, which is set if the page is mapped only here, and
restores access for non-privileged users while hiding the PFN from them.

The last patch removes the page-shift bits and completes the migration to
the new pagemap format: the soft-dirty and mmap-exclusive flags are
available only in the new format.
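
As an illustration only (a sketch, not part of this series): a minimal
userspace reader that fetches the pagemap entry for one virtual address and
tests the bits described above could look like this. With this series,
non-privileged readers still get the flag bits, but the PFN field of a
present entry reads back as zero.

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define PM_PFRAME_MASK		((1ULL << 55) - 1)	/* bits 0-54 */
	#define PM_SOFT_DIRTY		(1ULL << 55)
	#define PM_MMAP_EXCLUSIVE	(1ULL << 56)	/* added by this series */
	#define PM_FILE			(1ULL << 61)
	#define PM_SWAP			(1ULL << 62)
	#define PM_PRESENT		(1ULL << 63)

	int main(void)
	{
		long psize = sysconf(_SC_PAGESIZE);
		char *buf = malloc(psize);
		uint64_t entry;
		int fd;

		buf[0] = 1;	/* fault the page in */

		fd = open("/proc/self/pagemap", O_RDONLY);
		if (fd < 0)
			return 1;
		/* one 64-bit entry per virtual page */
		if (pread(fd, &entry, sizeof(entry),
			  (uint64_t)((uintptr_t)buf / psize) * sizeof(entry)) !=
		    sizeof(entry))
			return 1;

		printf("present=%d exclusive=%d file=%d soft-dirty=%d pfn=%llu\n",
		       !!(entry & PM_PRESENT), !!(entry & PM_MMAP_EXCLUSIVE),
		       !!(entry & PM_FILE), !!(entry & PM_SOFT_DIRTY),
		       (unsigned long long)(entry & PM_PFRAME_MASK));
		close(fd);
		return 0;
	}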

v3: check permissions in ->open

---

Konstantin Khlebnikov (4):
      pagemap: check permissions and capabilities at open time
      pagemap: add mmap-exclusive bit for marking pages mapped only here
      pagemap: hide physical addresses from non-privileged users
      pagemap: switch to the new format and do some cleanup


 Documentation/vm/pagemap.txt |    3 -
 fs/proc/task_mmu.c           |  219 +++++++++++++++++++-----------------------
 tools/vm/page-types.c        |   35 +++----
 3 files changed, 118 insertions(+), 139 deletions(-)

--
Signature

* [PATCH v3 1/4] pagemap: check permissions and capabilities at open time
  2015-06-09 20:00 ` Konstantin Khlebnikov
@ 2015-06-09 20:00   ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-06-09 20:00 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: linux-api, Mark Williamson, linux-kernel, Kirill A. Shutemov

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

This patch moves permission checks from pagemap_read() into pagemap_open().

A pointer to the mm is saved in file->private_data. This reference pins
only the mm_struct itself; /proc/*/mem, maps, and smaps already work in
the same way.
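
The resulting reference-counting pattern, sketched here for clarity (the
diff below is authoritative):

	/*
	 * open:    mm = proc_mem_open(inode, PTRACE_MODE_READ);
	 *          // pins only mm_count, does not hold the address space
	 * read:    if (atomic_inc_not_zero(&mm->mm_users)) {
	 *                  ... walk page tables ...
	 *                  mmput(mm);
	 *          }
	 * release: mmdrop(mm);   // drops the open-time mm_count pin
	 */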

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Link: http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com
---
 fs/proc/task_mmu.c |   48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6dee68d..21bc251 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1227,40 +1227,33 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 static ssize_t pagemap_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
-	struct task_struct *task = get_proc_task(file_inode(file));
-	struct mm_struct *mm;
+	struct mm_struct *mm = file->private_data;
 	struct pagemapread pm;
-	int ret = -ESRCH;
 	struct mm_walk pagemap_walk = {};
 	unsigned long src;
 	unsigned long svpfn;
 	unsigned long start_vaddr;
 	unsigned long end_vaddr;
-	int copied = 0;
+	int ret = 0, copied = 0;
 
-	if (!task)
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
 		goto out;
 
 	ret = -EINVAL;
 	/* file position must be aligned */
 	if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES))
-		goto out_task;
+		goto out_mm;
 
 	ret = 0;
 	if (!count)
-		goto out_task;
+		goto out_mm;
 
 	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
 	if (!pm.buffer)
-		goto out_task;
-
-	mm = mm_access(task, PTRACE_MODE_READ);
-	ret = PTR_ERR(mm);
-	if (!mm || IS_ERR(mm))
-		goto out_free;
+		goto out_mm;
 
 	pagemap_walk.pmd_entry = pagemap_pte_range;
 	pagemap_walk.pte_hole = pagemap_pte_hole;
@@ -1273,10 +1266,10 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	src = *ppos;
 	svpfn = src / PM_ENTRY_BYTES;
 	start_vaddr = svpfn << PAGE_SHIFT;
-	end_vaddr = TASK_SIZE_OF(task);
+	end_vaddr = mm->task_size;
 
 	/* watch out for wraparound */
-	if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT)
+	if (svpfn > mm->task_size >> PAGE_SHIFT)
 		start_vaddr = end_vaddr;
 
 	/*
@@ -1303,7 +1296,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 		len = min(count, PM_ENTRY_BYTES * pm.pos);
 		if (copy_to_user(buf, pm.buffer, len)) {
 			ret = -EFAULT;
-			goto out_mm;
+			goto out_free;
 		}
 		copied += len;
 		buf += len;
@@ -1313,24 +1306,38 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!ret || ret == PM_END_OF_BUFFER)
 		ret = copied;
 
-out_mm:
-	mmput(mm);
 out_free:
 	kfree(pm.buffer);
-out_task:
-	put_task_struct(task);
+out_mm:
+	mmput(mm);
 out:
 	return ret;
 }
 
 static int pagemap_open(struct inode *inode, struct file *file)
 {
+	struct mm_struct *mm;
+
 	/* do not disclose physical addresses: attack vector */
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
 			"to stop being page-shift some time soon. See the "
 			"linux/Documentation/vm/pagemap.txt for details.\n");
+
+	mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	if (IS_ERR(mm))
+		return PTR_ERR(mm);
+	file->private_data = mm;
+	return 0;
+}
+
+static int pagemap_release(struct inode *inode, struct file *file)
+{
+	struct mm_struct *mm = file->private_data;
+
+	if (mm)
+		mmdrop(mm);
 	return 0;
 }
 
@@ -1338,6 +1345,7 @@ const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
 	.open		= pagemap_open,
+	.release	= pagemap_release,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 


* [PATCH v3 2/4] pagemap: add mmap-exclusive bit for marking pages mapped only here
  2015-06-09 20:00 ` Konstantin Khlebnikov
@ 2015-06-09 20:00   ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-06-09 20:00 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: linux-api, Mark Williamson, linux-kernel, Kirill A. Shutemov

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

This patch sets bit 56 in a pagemap entry if the page is mapped only once.
It allows detecting exclusively used pages without exposing the PFN:

present file exclusive state
0       0    0         non-present
1       1    0         file page mapped somewhere else
1       1    1         file page mapped only here
1       0    0         anon non-CoWed page (shared with parent/child)
1       0    1         anon CoWed page (or never forked)

CoWed pages in MAP_FILE|MAP_PRIVATE areas are anon in this context.

The mmap-exclusive bit doesn't reflect potential page sharing via the
swapcache: a page could be mapped only once yet still have several swap
ptes pointing to it. An application can detect that case via the swap bit
in the pagemap entry and touch the pte via /proc/pid/mem to get the real
state.
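
For illustration (a sketch, not part of the patch), the table above maps
onto entry bits like this:

	#include <stdint.h>

	#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
	#define PM_FILE			(1ULL << 61)
	#define PM_PRESENT		(1ULL << 63)

	static const char *pagemap_state(uint64_t entry)
	{
		if (!(entry & PM_PRESENT))
			return "non-present";
		if (entry & PM_FILE)
			return (entry & PM_MMAP_EXCLUSIVE) ?
				"file page mapped only here" :
				"file page mapped somewhere else";
		return (entry & PM_MMAP_EXCLUSIVE) ?
			"anon CoWed page (or never forked)" :
			"anon non-CoWed page (shared with parent/child)";
	}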

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Link: http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com

---

v2:
* handle transparent huge pages
* invert bit and rename shared -> exclusive (less confusing name)
---
 Documentation/vm/pagemap.txt |    3 ++-
 fs/proc/task_mmu.c           |   10 ++++++++++
 tools/vm/page-types.c        |   12 ++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 6bfbc17..3cfbbb3 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -16,7 +16,8 @@ There are three components to pagemap:
     * Bits 0-4   swap type if swapped
     * Bits 5-54  swap offset if swapped
     * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
-    * Bits 56-60 zero
+    * Bit  56    page exclusively mapped
+    * Bits 57-60 zero
     * Bit  61    page is file-page or shared-anon
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 21bc251..b02e38f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -982,6 +982,7 @@ struct pagemapread {
 #define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
 
 #define __PM_SOFT_DIRTY      (1LL)
+#define __PM_MMAP_EXCLUSIVE  (2LL)
 #define PM_PRESENT          PM_STATUS(4LL)
 #define PM_SWAP             PM_STATUS(2LL)
 #define PM_FILE             PM_STATUS(1LL)
@@ -1074,6 +1075,8 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
+	if (page && page_mapcount(page) == 1)
+		flags2 |= __PM_MMAP_EXCLUSIVE;
 	if ((vma->vm_flags & VM_SOFTDIRTY))
 		flags2 |= __PM_SOFT_DIRTY;
 
@@ -1119,6 +1122,13 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		else
 			pmd_flags2 = 0;
 
+		if (pmd_present(*pmd)) {
+			struct page *page = pmd_page(*pmd);
+
+			if (page_mapcount(page) == 1)
+				pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
+		}
+
 		for (; addr != end; addr += PAGE_SIZE) {
 			unsigned long offset;
 			pagemap_entry_t pme;
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 8bdf16b..3a9f193 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -70,9 +70,12 @@
 #define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
 
 #define __PM_SOFT_DIRTY      (1LL)
+#define __PM_MMAP_EXCLUSIVE  (2LL)
 #define PM_PRESENT          PM_STATUS(4LL)
 #define PM_SWAP             PM_STATUS(2LL)
+#define PM_FILE             PM_STATUS(1LL)
 #define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
+#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
 
 
 /*
@@ -100,6 +103,8 @@
 #define KPF_SLOB_FREE		49
 #define KPF_SLUB_FROZEN		50
 #define KPF_SLUB_DEBUG		51
+#define KPF_FILE		62
+#define KPF_MMAP_EXCLUSIVE	63
 
 #define KPF_ALL_BITS		((uint64_t)~0ULL)
 #define KPF_HACKERS_BITS	(0xffffULL << 32)
@@ -149,6 +154,9 @@ static const char * const page_flag_names[] = {
 	[KPF_SLOB_FREE]		= "P:slob_free",
 	[KPF_SLUB_FROZEN]	= "A:slub_frozen",
 	[KPF_SLUB_DEBUG]	= "E:slub_debug",
+
+	[KPF_FILE]		= "F:file",
+	[KPF_MMAP_EXCLUSIVE]	= "1:mmap_exclusive",
 };
 
 
@@ -452,6 +460,10 @@ static uint64_t expand_overloaded_flags(uint64_t flags, uint64_t pme)
 
 	if (pme & PM_SOFT_DIRTY)
 		flags |= BIT(SOFTDIRTY);
+	if (pme & PM_FILE)
+		flags |= BIT(FILE);
+	if (pme & PM_MMAP_EXCLUSIVE)
+		flags |= BIT(MMAP_EXCLUSIVE);
 
 	return flags;
 }


* [PATCH v3 3/4] pagemap: hide physical addresses from non-privileged users
  2015-06-09 20:00 ` Konstantin Khlebnikov
@ 2015-06-09 20:00   ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-06-09 20:00 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: linux-api, Mark Williamson, linux-kernel, Kirill A. Shutemov

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

This patch makes pagemap readable for normal users again, but hides physical
addresses from them. For some use cases the PFN isn't required at all: the
flags give information about presence, page type (anon/file/swap), the
soft-dirty mark, and a hint about the page mapcount state: exclusive
(mapcount == 1) or shared (mapcount > 1).
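
As a hedged illustration (not part of the patch): with this change an
unprivileged reader still sees valid flag bits, only the PFN field of a
present entry reads back as zero:

	#include <stdint.h>

	#define PM_PFRAME_MASK	((1ULL << 55) - 1)	/* bits 0-54 */
	#define PM_PRESENT	(1ULL << 63)

	/* True only when pagemap was opened with CAP_SYS_ADMIN. */
	static int entry_shows_pfn(uint64_t entry)
	{
		return (entry & PM_PRESENT) && (entry & PM_PFRAME_MASK);
	}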

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Link: http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name

---

v3: get capabilities from file
---
 fs/proc/task_mmu.c |   36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b02e38f..f1b9ae8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -962,6 +962,7 @@ struct pagemapread {
 	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
 	pagemap_entry_t *buffer;
 	bool v2;
+	bool show_pfn;
 };
 
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
@@ -1046,12 +1047,13 @@ out:
 static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
 		struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
-	u64 frame, flags;
+	u64 frame = 0, flags;
 	struct page *page = NULL;
 	int flags2 = 0;
 
 	if (pte_present(pte)) {
-		frame = pte_pfn(pte);
+		if (pm->show_pfn)
+			frame = pte_pfn(pte);
 		flags = PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
@@ -1087,15 +1089,19 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
 static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
 		pmd_t pmd, int offset, int pmd_flags2)
 {
+	u64 frame = 0;
+
 	/*
 	 * Currently pmd for thp is always present because thp can not be
 	 * swapped-out, migrated, or HWPOISONed (split in such cases instead.)
 	 * This if-check is just to prepare for future implementation.
 	 */
-	if (pmd_present(pmd))
-		*pme = make_pme(PM_PFRAME(pmd_pfn(pmd) + offset)
-				| PM_STATUS2(pm->v2, pmd_flags2) | PM_PRESENT);
-	else
+	if (pmd_present(pmd)) {
+		if (pm->show_pfn)
+			frame = pmd_pfn(pmd) + offset;
+		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
+				PM_STATUS2(pm->v2, pmd_flags2));
+	} else
 		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
 }
 #else
@@ -1171,11 +1177,14 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
 					pte_t pte, int offset, int flags2)
 {
-	if (pte_present(pte))
-		*pme = make_pme(PM_PFRAME(pte_pfn(pte) + offset)	|
-				PM_STATUS2(pm->v2, flags2)		|
-				PM_PRESENT);
-	else
+	u64 frame = 0;
+
+	if (pte_present(pte)) {
+		if (pm->show_pfn)
+			frame = pte_pfn(pte) + offset;
+		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
+				PM_STATUS2(pm->v2, flags2));
+	} else
 		*pme = make_pme(PM_NOT_PRESENT(pm->v2)			|
 				PM_STATUS2(pm->v2, flags2));
 }
@@ -1258,6 +1267,8 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!count)
 		goto out_mm;
 
+	/* do not disclose physical addresses: attack vector */
+	pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
 	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
@@ -1328,9 +1339,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
 {
 	struct mm_struct *mm;
 
-	/* do not disclose physical addresses: attack vector */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
 	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
 			"to stop being page-shift some time soon. See the "
 			"linux/Documentation/vm/pagemap.txt for details.\n");


* [PATCH v3 4/4] pagemap: switch to the new format and do some cleanup
  2015-06-09 20:00 ` Konstantin Khlebnikov
@ 2015-06-09 20:00   ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-06-09 20:00 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: linux-api, Mark Williamson, linux-kernel, Kirill A. Shutemov

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

This patch removes the page-shift bits (scheduled for removal since 3.11)
and completes the migration to the new bit layout. It also cleans up the
messy macros.
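
For illustration (a sketch under the final bit layout, not part of the
patch): a swapped entry carries the swap type in bits 0-4 and the swap
offset in bits 5-54, so userspace can decode it like this:

	#include <stdint.h>

	#define PM_PFRAME_MASK		((1ULL << 55) - 1)	/* bits 0-54 */
	#define PM_SWAP			(1ULL << 62)
	#define MAX_SWAPFILES_SHIFT	5

	static int decode_swap_entry(uint64_t entry, unsigned int *type,
				     uint64_t *offset)
	{
		if (!(entry & PM_SWAP))
			return 0;
		*type = entry & ((1U << MAX_SWAPFILES_SHIFT) - 1);	    /* bits 0-4  */
		*offset = (entry & PM_PFRAME_MASK) >> MAX_SWAPFILES_SHIFT;  /* bits 5-54 */
		return 1;
	}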

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/proc/task_mmu.c    |  147 ++++++++++++++++---------------------------------
 tools/vm/page-types.c |   29 +++-------
 2 files changed, 58 insertions(+), 118 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f1b9ae8..0e134bf 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -710,23 +710,6 @@ const struct file_operations proc_tid_smaps_operations = {
 	.release	= proc_map_release,
 };
 
-/*
- * We do not want to have constant page-shift bits sitting in
- * pagemap entries and are about to reuse them some time soon.
- *
- * Here's the "migration strategy":
- * 1. when the system boots these bits remain what they are,
- *    but a warning about future change is printed in log;
- * 2. once anyone clears soft-dirty bits via clear_refs file,
- *    these flag is set to denote, that user is aware of the
- *    new API and those page-shift bits change their meaning.
- *    The respective warning is printed in dmesg;
- * 3. In a couple of releases we will remove all the mentions
- *    of page-shift in pagemap entries.
- */
-
-static bool soft_dirty_cleared __read_mostly;
-
 enum clear_refs_types {
 	CLEAR_REFS_ALL = 1,
 	CLEAR_REFS_ANON,
@@ -887,13 +870,6 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 	if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
 		return -EINVAL;
 
-	if (type == CLEAR_REFS_SOFT_DIRTY) {
-		soft_dirty_cleared = true;
-		pr_warn_once("The pagemap bits 55-60 has changed their meaning!"
-			     " See the linux/Documentation/vm/pagemap.txt for "
-			     "details.\n");
-	}
-
 	task = get_proc_task(file_inode(file));
 	if (!task)
 		return -ESRCH;
@@ -961,38 +937,26 @@ typedef struct {
 struct pagemapread {
 	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
 	pagemap_entry_t *buffer;
-	bool v2;
 	bool show_pfn;
 };
 
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 #define PAGEMAP_WALK_MASK	(PMD_MASK)
 
-#define PM_ENTRY_BYTES      sizeof(pagemap_entry_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-/* in "new" pagemap pshift bits are occupied with more status bits */
-#define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
+#define PM_ENTRY_BYTES		sizeof(pagemap_entry_t)
+#define PM_PFRAME_BITS		54
+#define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
+#define PM_SOFT_DIRTY		BIT_ULL(55)
+#define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
+#define PM_FILE			BIT_ULL(61)
+#define PM_SWAP			BIT_ULL(62)
+#define PM_PRESENT		BIT_ULL(63)
+
 #define PM_END_OF_BUFFER    1
 
-static inline pagemap_entry_t make_pme(u64 val)
+static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
 {
-	return (pagemap_entry_t) { .pme = val };
+	return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
 }
 
 static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
@@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 	while (addr < end) {
 		struct vm_area_struct *vma = find_vma(walk->mm, addr);
-		pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
+		pagemap_entry_t pme = make_pme(0, 0);
 		/* End of address space hole, which we mark as non-present. */
 		unsigned long hole_end;
 
@@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 		/* Addresses in the VMA. */
 		if (vma->vm_flags & VM_SOFTDIRTY)
-			pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
+			pme = make_pme(0, PM_SOFT_DIRTY);
 		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
@@ -1044,50 +1008,44 @@ out:
 	return err;
 }
 
-static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
+static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
-	u64 frame = 0, flags;
+	u64 frame = 0, flags = 0;
 	struct page *page = NULL;
-	int flags2 = 0;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte);
-		flags = PM_PRESENT;
+		flags |= PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 	} else if (is_swap_pte(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 		entry = pte_to_swp_entry(pte);
 		frame = swp_type(entry) |
 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
-		flags = PM_SWAP;
+		flags |= PM_SWAP;
 		if (is_migration_entry(entry))
 			page = migration_entry_to_page(entry);
-	} else {
-		if (vma->vm_flags & VM_SOFTDIRTY)
-			flags2 |= __PM_SOFT_DIRTY;
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
-		return;
 	}
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
 	if (page && page_mapcount(page) == 1)
-		flags2 |= __PM_MMAP_EXCLUSIVE;
-	if ((vma->vm_flags & VM_SOFTDIRTY))
-		flags2 |= __PM_SOFT_DIRTY;
+		flags |= PM_MMAP_EXCLUSIVE;
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		flags |= PM_SOFT_DIRTY;
 
-	*pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
+	return make_pme(frame, flags);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
 	u64 frame = 0;
 
@@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
 	if (pmd_present(pmd)) {
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) + offset;
-		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
-				PM_STATUS2(pm->v2, pmd_flags2));
-	} else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 #else
-static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
+	return make_pme(0, 0);
 }
 #endif
 
@@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	int err = 0;
 
 	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		int pmd_flags2;
+		u64 flags = 0;
 
 		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
-			pmd_flags2 = __PM_SOFT_DIRTY;
-		else
-			pmd_flags2 = 0;
+			flags |= PM_SOFT_DIRTY;
 
 		if (pmd_present(*pmd)) {
 			struct page *page = pmd_page(*pmd);
 
 			if (page_mapcount(page) == 1)
-				pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
+				flags |= PM_MMAP_EXCLUSIVE;
 		}
 
 		for (; addr != end; addr += PAGE_SIZE) {
@@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 			offset = (addr & ~PAGEMAP_WALK_MASK) >>
 					PAGE_SHIFT;
-			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
+			pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
 				break;
@@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		pagemap_entry_t pme;
 
-		pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
+		pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			break;
@@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-					pte_t pte, int offset, int flags2)
+static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
+					pte_t pte, int offset, u64 flags)
 {
 	u64 frame = 0;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) + offset;
-		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
-				PM_STATUS2(pm->v2, flags2));
-	} else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2)			|
-				PM_STATUS2(pm->v2, flags2));
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 
 /* This function walks within one hugetlb entry in the single call */
@@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	int err = 0;
-	int flags2;
+	u64 flags = 0;
 	pagemap_entry_t pme;
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
-		flags2 = __PM_SOFT_DIRTY;
-	else
-		flags2 = 0;
+		flags |= PM_SOFT_DIRTY;
 
 	for (; addr != end; addr += PAGE_SIZE) {
 		int offset = (addr & ~hmask) >> PAGE_SHIFT;
-		huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
+		pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			return err;
@@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
  * Bits 0-54  page frame number (PFN) if present
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
- * Bits 55-60 page shift (page size = 1<<page shift)
+ * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
+ * Bit  56    page exclusively mapped
+ * Bits 57-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
@@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 
 	/* do not disclose physical addresses: attack vector */
 	pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
-	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
@@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
 {
 	struct mm_struct *mm;
 
-	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
-			"to stop being page-shift some time soon. See the "
-			"linux/Documentation/vm/pagemap.txt for details.\n");
-
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 3a9f193..1fa872e 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -57,26 +57,15 @@
  * pagemap kernel ABI bits
  */
 
-#define PM_ENTRY_BYTES      sizeof(uint64_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
-#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
-
+#define PM_ENTRY_BYTES		8
+#define PM_PFRAME_BITS		54
+#define PM_PFRAME_MASK		((1LL << PM_PFRAME_BITS) - 1)
+#define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
+#define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
+#define PM_FILE			(1ULL << 61)
+#define PM_SWAP			(1ULL << 62)
+#define PM_PRESENT		(1ULL << 63)
 
 /*
  * kernel page flags


 {
 	struct mm_struct *mm;
 
-	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
-			"to stop being page-shift some time soon. See the "
-			"linux/Documentation/vm/pagemap.txt for details.\n");
-
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 3a9f193..1fa872e 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -57,26 +57,15 @@
  * pagemap kernel ABI bits
  */
 
-#define PM_ENTRY_BYTES      sizeof(uint64_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
-#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
-
+#define PM_ENTRY_BYTES		8
+#define PM_PFEAME_BITS		54
+#define PM_PFRAME_MASK		((1LL << PM_PFEAME_BITS) - 1)
+#define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
+#define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
+#define PM_FILE			(1ULL << 61)
+#define PM_SWAP			(1ULL << 62)
+#define PM_PRESENT		(1ULL << 63)
 
 /*
  * kernel page flags
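
As a quick illustration (not part of the patch), here is a minimal userspace
sketch that reads and decodes a single /proc/self/pagemap entry using the new
bit layout quoted above. The macro values mirror the tools/vm/page-types.c
definitions from this series; the program structure and names are only an
assumption about how a consumer might use the ABI.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define PM_PFRAME_MASK		((1ULL << 54) - 1)
#define PM_SOFT_DIRTY		(1ULL << 55)
#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
#define PM_FILE			(1ULL << 61)
#define PM_SWAP			(1ULL << 62)
#define PM_PRESENT		(1ULL << 63)

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	char *buf = malloc(psize);	/* any page mapped in this process */
	uint64_t pme;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0 || !buf)
		return 1;
	memset(buf, 0, psize);		/* touch the page so it is present */

	/* one 8-byte entry per virtual page */
	off_t off = (uintptr_t)buf / psize * sizeof(pme);
	if (pread(fd, &pme, sizeof(pme), off) != sizeof(pme))
		return 1;

	printf("present=%d swap=%d file=%d exclusive=%d soft-dirty=%d pfn=%llu\n",
	       !!(pme & PM_PRESENT), !!(pme & PM_SWAP), !!(pme & PM_FILE),
	       !!(pme & PM_MMAP_EXCLUSIVE), !!(pme & PM_SOFT_DIRTY),
	       (unsigned long long)(pme & PM_PFRAME_MASK)); /* zero without CAP_SYS_ADMIN */
	close(fd);
	free(buf);
	return 0;
}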

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/4] pagemap: check permissions and capabilities at open time
  2015-06-09 20:00   ` Konstantin Khlebnikov
@ 2015-06-12 18:44     ` Mark Williamson
  -1 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

This looks good from our side - thanks!

Reviewed-by: mwilliamson@undo-software.com
Tested-by: mwilliamson@undo-software.com

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> This patch moves permission checks from pagemap_read() into pagemap_open().
>
> Pointer to mm is saved in file->private_data. This reference pins only
> mm_struct itself. /proc/*/mem, maps, smaps already work in the same way.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Link: http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com
> ---
>  fs/proc/task_mmu.c |   48 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 28 insertions(+), 20 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 6dee68d..21bc251 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1227,40 +1227,33 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>  static ssize_t pagemap_read(struct file *file, char __user *buf,
>                             size_t count, loff_t *ppos)
>  {
> -       struct task_struct *task = get_proc_task(file_inode(file));
> -       struct mm_struct *mm;
> +       struct mm_struct *mm = file->private_data;
>         struct pagemapread pm;
> -       int ret = -ESRCH;
>         struct mm_walk pagemap_walk = {};
>         unsigned long src;
>         unsigned long svpfn;
>         unsigned long start_vaddr;
>         unsigned long end_vaddr;
> -       int copied = 0;
> +       int ret = 0, copied = 0;
>
> -       if (!task)
> +       if (!mm || !atomic_inc_not_zero(&mm->mm_users))
>                 goto out;
>
>         ret = -EINVAL;
>         /* file position must be aligned */
>         if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES))
> -               goto out_task;
> +               goto out_mm;
>
>         ret = 0;
>         if (!count)
> -               goto out_task;
> +               goto out_mm;
>
>         pm.v2 = soft_dirty_cleared;
>         pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
>         pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
>         ret = -ENOMEM;
>         if (!pm.buffer)
> -               goto out_task;
> -
> -       mm = mm_access(task, PTRACE_MODE_READ);
> -       ret = PTR_ERR(mm);
> -       if (!mm || IS_ERR(mm))
> -               goto out_free;
> +               goto out_mm;
>
>         pagemap_walk.pmd_entry = pagemap_pte_range;
>         pagemap_walk.pte_hole = pagemap_pte_hole;
> @@ -1273,10 +1266,10 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>         src = *ppos;
>         svpfn = src / PM_ENTRY_BYTES;
>         start_vaddr = svpfn << PAGE_SHIFT;
> -       end_vaddr = TASK_SIZE_OF(task);
> +       end_vaddr = mm->task_size;
>
>         /* watch out for wraparound */
> -       if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT)
> +       if (svpfn > mm->task_size >> PAGE_SHIFT)
>                 start_vaddr = end_vaddr;
>
>         /*
> @@ -1303,7 +1296,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>                 len = min(count, PM_ENTRY_BYTES * pm.pos);
>                 if (copy_to_user(buf, pm.buffer, len)) {
>                         ret = -EFAULT;
> -                       goto out_mm;
> +                       goto out_free;
>                 }
>                 copied += len;
>                 buf += len;
> @@ -1313,24 +1306,38 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>         if (!ret || ret == PM_END_OF_BUFFER)
>                 ret = copied;
>
> -out_mm:
> -       mmput(mm);
>  out_free:
>         kfree(pm.buffer);
> -out_task:
> -       put_task_struct(task);
> +out_mm:
> +       mmput(mm);
>  out:
>         return ret;
>  }
>
>  static int pagemap_open(struct inode *inode, struct file *file)
>  {
> +       struct mm_struct *mm;
> +
>         /* do not disclose physical addresses: attack vector */
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EPERM;
>         pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
>                         "to stop being page-shift some time soon. See the "
>                         "linux/Documentation/vm/pagemap.txt for details.\n");
> +
> +       mm = proc_mem_open(inode, PTRACE_MODE_READ);
> +       if (IS_ERR(mm))
> +               return PTR_ERR(mm);
> +       file->private_data = mm;
> +       return 0;
> +}
> +
> +static int pagemap_release(struct inode *inode, struct file *file)
> +{
> +       struct mm_struct *mm = file->private_data;
> +
> +       if (mm)
> +               mmdrop(mm);
>         return 0;
>  }
>
> @@ -1338,6 +1345,7 @@ const struct file_operations proc_pagemap_operations = {
>         .llseek         = mem_lseek, /* borrow this */
>         .read           = pagemap_read,
>         .open           = pagemap_open,
> +       .release        = pagemap_release,
>  };
>  #endif /* CONFIG_PROC_PAGE_MONITOR */
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/4] pagemap: add mmap-exclusive bit for marking pages mapped only here
  2015-06-09 20:00   ` Konstantin Khlebnikov
@ 2015-06-12 18:46     ` Mark Williamson
  -1 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:46 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

This looks good from our side - thanks!

Reviewed-by: mwilliamson@undo-software.com
Tested-by: mwilliamson@undo-software.com

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> This patch sets bit 56 in pagemap if this page is mapped only once.
> It allows to detect exclusively used pages without exposing PFN:
>
> present file exclusive state
> 0       0    0         non-present
> 1       1    0         file page mapped somewhere else
> 1       1    1         file page mapped only here
> 1       0    0         anon non-CoWed page (shared with parent/child)
> 1       0    1         anon CoWed page (or never forked)
>
> CoWed pages in MAP_FILE|MAP_PRIVATE areas are anon in this context.
>
> Mmap-exclusive bit doesn't reflect potential page-sharing via swapcache:
> page could be mapped once but has several swap-ptes which point to it.
> Application could detect that by swap bit in pagemap entry and touch
> that pte via /proc/pid/mem to get real information.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Link: http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com
>
> ---
>
> v2:
> * handle transparent huge pages
> * invert bit and rename shared -> exclusive (less confusing name)
> ---
>  Documentation/vm/pagemap.txt |    3 ++-
>  fs/proc/task_mmu.c           |   10 ++++++++++
>  tools/vm/page-types.c        |   12 ++++++++++++
>  3 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
> index 6bfbc17..3cfbbb3 100644
> --- a/Documentation/vm/pagemap.txt
> +++ b/Documentation/vm/pagemap.txt
> @@ -16,7 +16,8 @@ There are three components to pagemap:
>      * Bits 0-4   swap type if swapped
>      * Bits 5-54  swap offset if swapped
>      * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> -    * Bits 56-60 zero
> +    * Bit  56    page exlusively mapped
> +    * Bits 57-60 zero
>      * Bit  61    page is file-page or shared-anon
>      * Bit  62    page swapped
>      * Bit  63    page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 21bc251..b02e38f 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -982,6 +982,7 @@ struct pagemapread {
>  #define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
>
>  #define __PM_SOFT_DIRTY      (1LL)
> +#define __PM_MMAP_EXCLUSIVE  (2LL)
>  #define PM_PRESENT          PM_STATUS(4LL)
>  #define PM_SWAP             PM_STATUS(2LL)
>  #define PM_FILE             PM_STATUS(1LL)
> @@ -1074,6 +1075,8 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
>
>         if (page && !PageAnon(page))
>                 flags |= PM_FILE;
> +       if (page && page_mapcount(page) == 1)
> +               flags2 |= __PM_MMAP_EXCLUSIVE;
>         if ((vma->vm_flags & VM_SOFTDIRTY))
>                 flags2 |= __PM_SOFT_DIRTY;
>
> @@ -1119,6 +1122,13 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>                 else
>                         pmd_flags2 = 0;
>
> +               if (pmd_present(*pmd)) {
> +                       struct page *page = pmd_page(*pmd);
> +
> +                       if (page_mapcount(page) == 1)
> +                               pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> +               }
> +
>                 for (; addr != end; addr += PAGE_SIZE) {
>                         unsigned long offset;
>                         pagemap_entry_t pme;
> diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
> index 8bdf16b..3a9f193 100644
> --- a/tools/vm/page-types.c
> +++ b/tools/vm/page-types.c
> @@ -70,9 +70,12 @@
>  #define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
>
>  #define __PM_SOFT_DIRTY      (1LL)
> +#define __PM_MMAP_EXCLUSIVE  (2LL)
>  #define PM_PRESENT          PM_STATUS(4LL)
>  #define PM_SWAP             PM_STATUS(2LL)
> +#define PM_FILE             PM_STATUS(1LL)
>  #define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
> +#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
>
>
>  /*
> @@ -100,6 +103,8 @@
>  #define KPF_SLOB_FREE          49
>  #define KPF_SLUB_FROZEN                50
>  #define KPF_SLUB_DEBUG         51
> +#define KPF_FILE               62
> +#define KPF_MMAP_EXCLUSIVE     63
>
>  #define KPF_ALL_BITS           ((uint64_t)~0ULL)
>  #define KPF_HACKERS_BITS       (0xffffULL << 32)
> @@ -149,6 +154,9 @@ static const char * const page_flag_names[] = {
>         [KPF_SLOB_FREE]         = "P:slob_free",
>         [KPF_SLUB_FROZEN]       = "A:slub_frozen",
>         [KPF_SLUB_DEBUG]        = "E:slub_debug",
> +
> +       [KPF_FILE]              = "F:file",
> +       [KPF_MMAP_EXCLUSIVE]    = "1:mmap_exclusive",
>  };
>
>
> @@ -452,6 +460,10 @@ static uint64_t expand_overloaded_flags(uint64_t flags, uint64_t pme)
>
>         if (pme & PM_SOFT_DIRTY)
>                 flags |= BIT(SOFTDIRTY);
> +       if (pme & PM_FILE)
> +               flags |= BIT(FILE);
> +       if (pme & PM_MMAP_EXCLUSIVE)
> +               flags |= BIT(MMAP_EXCLUSIVE);
>
>         return flags;
>  }
>
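
As a quick, hedged sketch (not part of the patch): the state table in the
changelog above can be folded back out of an entry with three bit tests. The
bit positions follow Documentation/vm/pagemap.txt as updated here; the helper
name is made up for illustration.

#include <stdint.h>

#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
#define PM_FILE			(1ULL << 61)
#define PM_PRESENT		(1ULL << 63)

static const char *pme_state(uint64_t pme)
{
	if (!(pme & PM_PRESENT))
		return "non-present";
	if (pme & PM_FILE)
		return (pme & PM_MMAP_EXCLUSIVE) ?
			"file page mapped only here" :
			"file page mapped somewhere else";
	return (pme & PM_MMAP_EXCLUSIVE) ?
		"anon CoWed page (or never forked)" :
		"anon non-CoWed page (shared with parent/child)";
}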

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/4] pagemap: hide physical addresses from non-privileged users
  2015-06-09 20:00   ` Konstantin Khlebnikov
  (?)
@ 2015-06-12 18:47     ` Mark Williamson
  -1 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:47 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

This looks good from our side - thanks!

Reviewed-by: mwilliamson@undo-software.com
Tested-by: mwilliamson@undo-software.com

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> This patch makes pagemap readable for normal users back but hides physical
> addresses from them. For some use cases PFN isn't required at all: flags
> give information about presence, page type (anon/file/swap), soft-dirty mark,
> and hint about page mapcount state: exclusive(mapcount = 1) or (mapcount > 1).
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
> Link: http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
>
> ---
>
> v3: get capabilities from file
> ---
>  fs/proc/task_mmu.c |   36 ++++++++++++++++++++++--------------
>  1 file changed, 22 insertions(+), 14 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b02e38f..f1b9ae8 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -962,6 +962,7 @@ struct pagemapread {
>         int pos, len;           /* units: PM_ENTRY_BYTES, not bytes */
>         pagemap_entry_t *buffer;
>         bool v2;
> +       bool show_pfn;
>  };
>
>  #define PAGEMAP_WALK_SIZE      (PMD_SIZE)
> @@ -1046,12 +1047,13 @@ out:
>  static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
>                 struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  {
> -       u64 frame, flags;
> +       u64 frame = 0, flags;
>         struct page *page = NULL;
>         int flags2 = 0;
>
>         if (pte_present(pte)) {
> -               frame = pte_pfn(pte);
> +               if (pm->show_pfn)
> +                       frame = pte_pfn(pte);
>                 flags = PM_PRESENT;
>                 page = vm_normal_page(vma, addr, pte);
>                 if (pte_soft_dirty(pte))
> @@ -1087,15 +1089,19 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
>  static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
>                 pmd_t pmd, int offset, int pmd_flags2)
>  {
> +       u64 frame = 0;
> +
>         /*
>          * Currently pmd for thp is always present because thp can not be
>          * swapped-out, migrated, or HWPOISONed (split in such cases instead.)
>          * This if-check is just to prepare for future implementation.
>          */
> -       if (pmd_present(pmd))
> -               *pme = make_pme(PM_PFRAME(pmd_pfn(pmd) + offset)
> -                               | PM_STATUS2(pm->v2, pmd_flags2) | PM_PRESENT);
> -       else
> +       if (pmd_present(pmd)) {
> +               if (pm->show_pfn)
> +                       frame = pmd_pfn(pmd) + offset;
> +               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> +                               PM_STATUS2(pm->v2, pmd_flags2));
> +       } else
>                 *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
>  }
>  #else
> @@ -1171,11 +1177,14 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
>                                         pte_t pte, int offset, int flags2)
>  {
> -       if (pte_present(pte))
> -               *pme = make_pme(PM_PFRAME(pte_pfn(pte) + offset)        |
> -                               PM_STATUS2(pm->v2, flags2)              |
> -                               PM_PRESENT);
> -       else
> +       u64 frame = 0;
> +
> +       if (pte_present(pte)) {
> +               if (pm->show_pfn)
> +                       frame = pte_pfn(pte) + offset;
> +               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> +                               PM_STATUS2(pm->v2, flags2));
> +       } else
>                 *pme = make_pme(PM_NOT_PRESENT(pm->v2)                  |
>                                 PM_STATUS2(pm->v2, flags2));
>  }
> @@ -1258,6 +1267,8 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>         if (!count)
>                 goto out_mm;
>
> +       /* do not disclose physical addresses: attack vector */
> +       pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
>         pm.v2 = soft_dirty_cleared;
>         pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
>         pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
> @@ -1328,9 +1339,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
>  {
>         struct mm_struct *mm;
>
> -       /* do not disclose physical addresses: attack vector */
> -       if (!capable(CAP_SYS_ADMIN))
> -               return -EPERM;
>         pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
>                         "to stop being page-shift some time soon. See the "
>                         "linux/Documentation/vm/pagemap.txt for details.\n");
>
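
A small, hedged illustration of the user-visible effect (not from the patch):
an unprivileged reader still gets all of the flag bits, but the page frame
number field of a present entry is simply left as zero unless the file was
opened with CAP_SYS_ADMIN. The helper name below is made up; the bit
positions follow Documentation/vm/pagemap.txt.

#include <stdint.h>

#define PM_PFRAME_MASK	((1ULL << 54) - 1)
#define PM_PRESENT	(1ULL << 63)

/* Returns non-zero only when the kernel actually disclosed a PFN. */
static inline int pme_has_visible_pfn(uint64_t pme)
{
	return (pme & PM_PRESENT) && (pme & PM_PFRAME_MASK) != 0;
}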

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 4/4] pagemap: switch to the new format and do some cleanup
  2015-06-09 20:00   ` Konstantin Khlebnikov
  (?)
@ 2015-06-12 18:49     ` Mark Williamson
  -1 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:49 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

One tiny nitpick / typo, inline below - functionally, this looks good
from our side...

Reviewed-by: mwilliamson@undo-software.com
Tested-by: mwilliamson@undo-software.com

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

<...snip...>

> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
> +#define PM_ENTRY_BYTES         sizeof(pagemap_entry_t)
> +#define PM_PFEAME_BITS         54
> +#define PM_PFRAME_MASK         GENMASK_ULL(PM_PFEAME_BITS - 1, 0)

s/PM_PFEAME_BITS/PM_PFRAME_BITS/ I presume?

> +#define PM_SOFT_DIRTY          BIT_ULL(55)
> +#define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
> +#define PM_FILE                        BIT_ULL(61)
> +#define PM_SWAP                        BIT_ULL(62)
> +#define PM_PRESENT             BIT_ULL(63)
> +
>  #define PM_END_OF_BUFFER    1
>
> -static inline pagemap_entry_t make_pme(u64 val)
> +static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
>  {
> -       return (pagemap_entry_t) { .pme = val };
> +       return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
>  }
>
>  static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
> @@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>         while (addr < end) {
>                 struct vm_area_struct *vma = find_vma(walk->mm, addr);
> -               pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> +               pagemap_entry_t pme = make_pme(0, 0);
>                 /* End of address space hole, which we mark as non-present. */
>                 unsigned long hole_end;
>
> @@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>                 /* Addresses in the VMA. */
>                 if (vma->vm_flags & VM_SOFTDIRTY)
> -                       pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
> +                       pme = make_pme(0, PM_SOFT_DIRTY);
>                 for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
> @@ -1044,50 +1008,44 @@ out:
>         return err;
>  }
>
> -static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> +static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>                 struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  {
> -       u64 frame = 0, flags;
> +       u64 frame = 0, flags = 0;
>         struct page *page = NULL;
> -       int flags2 = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte);
> -               flags = PM_PRESENT;
> +               flags |= PM_PRESENT;
>                 page = vm_normal_page(vma, addr, pte);
>                 if (pte_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>         } else if (is_swap_pte(pte)) {
>                 swp_entry_t entry;
>                 if (pte_swp_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>                 entry = pte_to_swp_entry(pte);
>                 frame = swp_type(entry) |
>                         (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> -               flags = PM_SWAP;
> +               flags |= PM_SWAP;
>                 if (is_migration_entry(entry))
>                         page = migration_entry_to_page(entry);
> -       } else {
> -               if (vma->vm_flags & VM_SOFTDIRTY)
> -                       flags2 |= __PM_SOFT_DIRTY;
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
> -               return;
>         }
>
>         if (page && !PageAnon(page))
>                 flags |= PM_FILE;
>         if (page && page_mapcount(page) == 1)
> -               flags2 |= __PM_MMAP_EXCLUSIVE;
> -       if ((vma->vm_flags & VM_SOFTDIRTY))
> -               flags2 |= __PM_SOFT_DIRTY;
> +               flags |= PM_MMAP_EXCLUSIVE;
> +       if (vma->vm_flags & VM_SOFTDIRTY)
> +               flags |= PM_SOFT_DIRTY;
>
> -       *pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
> +       return make_pme(frame, flags);
>  }
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
> @@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
>         if (pmd_present(pmd)) {
>                 if (pm->show_pfn)
>                         frame = pmd_pfn(pmd) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, pmd_flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>  #else
> -static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
> +       return make_pme(0, 0);
>  }
>  #endif
>
> @@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         int err = 0;
>
>         if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -               int pmd_flags2;
> +               u64 flags = 0;
>
>                 if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> -                       pmd_flags2 = __PM_SOFT_DIRTY;
> -               else
> -                       pmd_flags2 = 0;
> +                       flags |= PM_SOFT_DIRTY;
>
>                 if (pmd_present(*pmd)) {
>                         struct page *page = pmd_page(*pmd);
>
>                         if (page_mapcount(page) == 1)
> -                               pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> +                               flags |= PM_MMAP_EXCLUSIVE;
>                 }
>
>                 for (; addr != end; addr += PAGE_SIZE) {
> @@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
>                         offset = (addr & ~PAGEMAP_WALK_MASK) >>
>                                         PAGE_SHIFT;
> -                       thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> +                       pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
>                                 break;
> @@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         for (; addr < end; pte++, addr += PAGE_SIZE) {
>                 pagemap_entry_t pme;
>
> -               pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
> +               pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         break;
> @@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  }
>
>  #ifdef CONFIG_HUGETLB_PAGE
> -static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -                                       pte_t pte, int offset, int flags2)
> +static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
> +                                       pte_t pte, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2)                  |
> -                               PM_STATUS2(pm->v2, flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>
>  /* This function walks within one hugetlb entry in the single call */
> @@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>         struct pagemapread *pm = walk->private;
>         struct vm_area_struct *vma = walk->vma;
>         int err = 0;
> -       int flags2;
> +       u64 flags = 0;
>         pagemap_entry_t pme;
>
>         if (vma->vm_flags & VM_SOFTDIRTY)
> -               flags2 = __PM_SOFT_DIRTY;
> -       else
> -               flags2 = 0;
> +               flags |= PM_SOFT_DIRTY;
>
>         for (; addr != end; addr += PAGE_SIZE) {
>                 int offset = (addr & ~hmask) >> PAGE_SHIFT;
> -               huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
> +               pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         return err;
> @@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>   * Bits 0-54  page frame number (PFN) if present
>   * Bits 0-4   swap type if swapped
>   * Bits 5-54  swap offset if swapped
> - * Bits 55-60 page shift (page size = 1<<page shift)
> + * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> + * Bit  56    page exclusively mapped
> + * Bits 57-60 zero
>   * Bit  61    page is file-page or shared-anon
>   * Bit  62    page swapped
>   * Bit  63    page present
> @@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>
>         /* do not disclose physical addresses: attack vector */
>         pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
> -       pm.v2 = soft_dirty_cleared;
>         pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
>         pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
>         ret = -ENOMEM;
> @@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
>  {
>         struct mm_struct *mm;
>
> -       pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
> -                       "to stop being page-shift some time soon. See the "
> -                       "linux/Documentation/vm/pagemap.txt for details.\n");
> -
>         mm = proc_mem_open(inode, PTRACE_MODE_READ);
>         if (IS_ERR(mm))
>                 return PTR_ERR(mm);
> diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
> index 3a9f193..1fa872e 100644
> --- a/tools/vm/page-types.c
> +++ b/tools/vm/page-types.c
> @@ -57,26 +57,15 @@
>   * pagemap kernel ABI bits
>   */
>
> -#define PM_ENTRY_BYTES      sizeof(uint64_t)
> -#define PM_STATUS_BITS      3
> -#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
> -#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
> -#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
> -#define PM_PSHIFT_BITS      6
> -#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
> -#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
> -#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
> -#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
> -#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
> -
> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
> -#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
> -
> +#define PM_ENTRY_BYTES         8
> +#define PM_PFEAME_BITS         54
> +#define PM_PFRAME_MASK         ((1LL << PM_PFEAME_BITS) - 1)
> +#define PM_PFRAME(x)           ((x) & PM_PFRAME_MASK)
> +#define PM_SOFT_DIRTY          (1ULL << 55)
> +#define PM_MMAP_EXCLUSIVE      (1ULL << 56)
> +#define PM_FILE                        (1ULL << 61)
> +#define PM_SWAP                        (1ULL << 62)
> +#define PM_PRESENT             (1ULL << 63)
>
>  /*
>   * kernel page flags
>

^ permalink raw reply	[flat|nested] 42+ messages in thread
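For reference, a minimal userspace sketch (not part of this series; error
handling is trimmed and the macro values simply mirror the patch above) that
reads /proc/self/pagemap and decodes one entry under the bit layout documented
in the quoted patch. Note that with this series applied the PFN field reads
back as zero unless the opener has CAP_SYS_ADMIN.

#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define PM_PFRAME_MASK    ((1ULL << 54) - 1)	/* PFN field, as defined in the patch */
#define PM_SOFT_DIRTY     (1ULL << 55)
#define PM_MMAP_EXCLUSIVE (1ULL << 56)
#define PM_FILE           (1ULL << 61)
#define PM_SWAP           (1ULL << 62)
#define PM_PRESENT        (1ULL << 63)

int main(void)
{
	char probe = 0;		/* decode the page holding this variable */
	unsigned long vaddr = (unsigned long)&probe;
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t pme = 0;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return 1;
	/* one 64-bit entry per virtual page */
	if (pread(fd, &pme, sizeof(pme),
		  (vaddr / page_size) * sizeof(pme)) != (ssize_t)sizeof(pme))
		return 1;
	close(fd);

	printf("present=%d swap=%d file/shared=%d exclusive=%d soft-dirty=%d pfn=0x%llx\n",
	       !!(pme & PM_PRESENT), !!(pme & PM_SWAP), !!(pme & PM_FILE),
	       !!(pme & PM_MMAP_EXCLUSIVE), !!(pme & PM_SOFT_DIRTY),
	       (unsigned long long)(pme & PM_PFRAME_MASK));
	return 0;
}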

* Re: [PATCH v3 4/4] pagemap: switch to the new format and do some cleanup
@ 2015-06-12 18:49     ` Mark Williamson
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:49 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrew Morton, Naoya Horiguchi,
	Linux API, kernel list, Kirill A. Shutemov

One tiny nitpick / typo, inline below - functionally, this looks good
from our side...

Reviewed-by: mwilliamson-/4lU09Eg6ahx67MzidHQgQC/G2K4zDHf@public.gmane.org
Tested-by: mwilliamson-/4lU09Eg6ahx67MzidHQgQC/G2K4zDHf@public.gmane.org

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> From: Konstantin Khlebnikov <khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>

<...snip...>

> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
> +#define PM_ENTRY_BYTES         sizeof(pagemap_entry_t)
> +#define PM_PFEAME_BITS         54
> +#define PM_PFRAME_MASK         GENMASK_ULL(PM_PFEAME_BITS - 1, 0)

s/PM_PFEAME_BITS/PM_PFRAME_BITS/ I presume?

> +#define PM_SOFT_DIRTY          BIT_ULL(55)
> +#define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
> +#define PM_FILE                        BIT_ULL(61)
> +#define PM_SWAP                        BIT_ULL(62)
> +#define PM_PRESENT             BIT_ULL(63)
> +
>  #define PM_END_OF_BUFFER    1
>
> -static inline pagemap_entry_t make_pme(u64 val)
> +static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
>  {
> -       return (pagemap_entry_t) { .pme = val };
> +       return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
>  }
>
>  static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
> @@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>         while (addr < end) {
>                 struct vm_area_struct *vma = find_vma(walk->mm, addr);
> -               pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> +               pagemap_entry_t pme = make_pme(0, 0);
>                 /* End of address space hole, which we mark as non-present. */
>                 unsigned long hole_end;
>
> @@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>                 /* Addresses in the VMA. */
>                 if (vma->vm_flags & VM_SOFTDIRTY)
> -                       pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
> +                       pme = make_pme(0, PM_SOFT_DIRTY);
>                 for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
> @@ -1044,50 +1008,44 @@ out:
>         return err;
>  }
>
> -static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> +static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>                 struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  {
> -       u64 frame = 0, flags;
> +       u64 frame = 0, flags = 0;
>         struct page *page = NULL;
> -       int flags2 = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte);
> -               flags = PM_PRESENT;
> +               flags |= PM_PRESENT;
>                 page = vm_normal_page(vma, addr, pte);
>                 if (pte_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>         } else if (is_swap_pte(pte)) {
>                 swp_entry_t entry;
>                 if (pte_swp_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>                 entry = pte_to_swp_entry(pte);
>                 frame = swp_type(entry) |
>                         (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> -               flags = PM_SWAP;
> +               flags |= PM_SWAP;
>                 if (is_migration_entry(entry))
>                         page = migration_entry_to_page(entry);
> -       } else {
> -               if (vma->vm_flags & VM_SOFTDIRTY)
> -                       flags2 |= __PM_SOFT_DIRTY;
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
> -               return;
>         }
>
>         if (page && !PageAnon(page))
>                 flags |= PM_FILE;
>         if (page && page_mapcount(page) == 1)
> -               flags2 |= __PM_MMAP_EXCLUSIVE;
> -       if ((vma->vm_flags & VM_SOFTDIRTY))
> -               flags2 |= __PM_SOFT_DIRTY;
> +               flags |= PM_MMAP_EXCLUSIVE;
> +       if (vma->vm_flags & VM_SOFTDIRTY)
> +               flags |= PM_SOFT_DIRTY;
>
> -       *pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
> +       return make_pme(frame, flags);
>  }
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
> @@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
>         if (pmd_present(pmd)) {
>                 if (pm->show_pfn)
>                         frame = pmd_pfn(pmd) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, pmd_flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>  #else
> -static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
> +       return make_pme(0, 0);
>  }
>  #endif
>
> @@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         int err = 0;
>
>         if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -               int pmd_flags2;
> +               u64 flags = 0;
>
>                 if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> -                       pmd_flags2 = __PM_SOFT_DIRTY;
> -               else
> -                       pmd_flags2 = 0;
> +                       flags |= PM_SOFT_DIRTY;
>
>                 if (pmd_present(*pmd)) {
>                         struct page *page = pmd_page(*pmd);
>
>                         if (page_mapcount(page) == 1)
> -                               pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> +                               flags |= PM_MMAP_EXCLUSIVE;
>                 }
>
>                 for (; addr != end; addr += PAGE_SIZE) {
> @@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
>                         offset = (addr & ~PAGEMAP_WALK_MASK) >>
>                                         PAGE_SHIFT;
> -                       thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> +                       pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
>                                 break;
> @@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         for (; addr < end; pte++, addr += PAGE_SIZE) {
>                 pagemap_entry_t pme;
>
> -               pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
> +               pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         break;
> @@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  }
>
>  #ifdef CONFIG_HUGETLB_PAGE
> -static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -                                       pte_t pte, int offset, int flags2)
> +static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
> +                                       pte_t pte, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2)                  |
> -                               PM_STATUS2(pm->v2, flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>
>  /* This function walks within one hugetlb entry in the single call */
> @@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>         struct pagemapread *pm = walk->private;
>         struct vm_area_struct *vma = walk->vma;
>         int err = 0;
> -       int flags2;
> +       u64 flags = 0;
>         pagemap_entry_t pme;
>
>         if (vma->vm_flags & VM_SOFTDIRTY)
> -               flags2 = __PM_SOFT_DIRTY;
> -       else
> -               flags2 = 0;
> +               flags |= PM_SOFT_DIRTY;
>
>         for (; addr != end; addr += PAGE_SIZE) {
>                 int offset = (addr & ~hmask) >> PAGE_SHIFT;
> -               huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
> +               pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         return err;
> @@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>   * Bits 0-54  page frame number (PFN) if present
>   * Bits 0-4   swap type if swapped
>   * Bits 5-54  swap offset if swapped
> - * Bits 55-60 page shift (page size = 1<<page shift)
> + * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> + * Bit  56    page exclusively mapped
> + * Bits 57-60 zero
>   * Bit  61    page is file-page or shared-anon
>   * Bit  62    page swapped
>   * Bit  63    page present
> @@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>
>         /* do not disclose physical addresses: attack vector */
>         pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
> -       pm.v2 = soft_dirty_cleared;
>         pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
>         pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
>         ret = -ENOMEM;
> @@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
>  {
>         struct mm_struct *mm;
>
> -       pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
> -                       "to stop being page-shift some time soon. See the "
> -                       "linux/Documentation/vm/pagemap.txt for details.\n");
> -
>         mm = proc_mem_open(inode, PTRACE_MODE_READ);
>         if (IS_ERR(mm))
>                 return PTR_ERR(mm);
> diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
> index 3a9f193..1fa872e 100644
> --- a/tools/vm/page-types.c
> +++ b/tools/vm/page-types.c
> @@ -57,26 +57,15 @@
>   * pagemap kernel ABI bits
>   */
>
> -#define PM_ENTRY_BYTES      sizeof(uint64_t)
> -#define PM_STATUS_BITS      3
> -#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
> -#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
> -#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
> -#define PM_PSHIFT_BITS      6
> -#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
> -#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
> -#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
> -#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
> -#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
> -
> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
> -#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
> -
> +#define PM_ENTRY_BYTES         8
> +#define PM_PFEAME_BITS         54
> +#define PM_PFRAME_MASK         ((1LL << PM_PFEAME_BITS) - 1)
> +#define PM_PFRAME(x)           ((x) & PM_PFRAME_MASK)
> +#define PM_SOFT_DIRTY          (1ULL << 55)
> +#define PM_MMAP_EXCLUSIVE      (1ULL << 56)
> +#define PM_FILE                        (1ULL << 61)
> +#define PM_SWAP                        (1ULL << 62)
> +#define PM_PRESENT             (1ULL << 63)
>
>  /*
>   * kernel page flags
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 4/4] pagemap: switch to the new format and do some cleanup
@ 2015-06-12 18:49     ` Mark Williamson
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:49 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

One tiny nitpick / typo, inline below - functionally, this looks good
from our side...

Reviewed-by: mwilliamson@undo-software.com
Tested-by: mwilliamson@undo-software.com

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

<...snip...>

> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
> +#define PM_ENTRY_BYTES         sizeof(pagemap_entry_t)
> +#define PM_PFEAME_BITS         54
> +#define PM_PFRAME_MASK         GENMASK_ULL(PM_PFEAME_BITS - 1, 0)

s/PM_PFEAME_BITS/PM_PFRAME_BITS/ I presume?

> +#define PM_SOFT_DIRTY          BIT_ULL(55)
> +#define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
> +#define PM_FILE                        BIT_ULL(61)
> +#define PM_SWAP                        BIT_ULL(62)
> +#define PM_PRESENT             BIT_ULL(63)
> +
>  #define PM_END_OF_BUFFER    1
>
> -static inline pagemap_entry_t make_pme(u64 val)
> +static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
>  {
> -       return (pagemap_entry_t) { .pme = val };
> +       return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
>  }
>
>  static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
> @@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>         while (addr < end) {
>                 struct vm_area_struct *vma = find_vma(walk->mm, addr);
> -               pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> +               pagemap_entry_t pme = make_pme(0, 0);
>                 /* End of address space hole, which we mark as non-present. */
>                 unsigned long hole_end;
>
> @@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>                 /* Addresses in the VMA. */
>                 if (vma->vm_flags & VM_SOFTDIRTY)
> -                       pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
> +                       pme = make_pme(0, PM_SOFT_DIRTY);
>                 for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
> @@ -1044,50 +1008,44 @@ out:
>         return err;
>  }
>
> -static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> +static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>                 struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  {
> -       u64 frame = 0, flags;
> +       u64 frame = 0, flags = 0;
>         struct page *page = NULL;
> -       int flags2 = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte);
> -               flags = PM_PRESENT;
> +               flags |= PM_PRESENT;
>                 page = vm_normal_page(vma, addr, pte);
>                 if (pte_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>         } else if (is_swap_pte(pte)) {
>                 swp_entry_t entry;
>                 if (pte_swp_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>                 entry = pte_to_swp_entry(pte);
>                 frame = swp_type(entry) |
>                         (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> -               flags = PM_SWAP;
> +               flags |= PM_SWAP;
>                 if (is_migration_entry(entry))
>                         page = migration_entry_to_page(entry);
> -       } else {
> -               if (vma->vm_flags & VM_SOFTDIRTY)
> -                       flags2 |= __PM_SOFT_DIRTY;
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
> -               return;
>         }
>
>         if (page && !PageAnon(page))
>                 flags |= PM_FILE;
>         if (page && page_mapcount(page) == 1)
> -               flags2 |= __PM_MMAP_EXCLUSIVE;
> -       if ((vma->vm_flags & VM_SOFTDIRTY))
> -               flags2 |= __PM_SOFT_DIRTY;
> +               flags |= PM_MMAP_EXCLUSIVE;
> +       if (vma->vm_flags & VM_SOFTDIRTY)
> +               flags |= PM_SOFT_DIRTY;
>
> -       *pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
> +       return make_pme(frame, flags);
>  }
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
> @@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
>         if (pmd_present(pmd)) {
>                 if (pm->show_pfn)
>                         frame = pmd_pfn(pmd) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, pmd_flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>  #else
> -static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
> +       return make_pme(0, 0);
>  }
>  #endif
>
> @@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         int err = 0;
>
>         if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -               int pmd_flags2;
> +               u64 flags = 0;
>
>                 if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> -                       pmd_flags2 = __PM_SOFT_DIRTY;
> -               else
> -                       pmd_flags2 = 0;
> +                       flags |= PM_SOFT_DIRTY;
>
>                 if (pmd_present(*pmd)) {
>                         struct page *page = pmd_page(*pmd);
>
>                         if (page_mapcount(page) == 1)
> -                               pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> +                               flags |= PM_MMAP_EXCLUSIVE;
>                 }
>
>                 for (; addr != end; addr += PAGE_SIZE) {
> @@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
>                         offset = (addr & ~PAGEMAP_WALK_MASK) >>
>                                         PAGE_SHIFT;
> -                       thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> +                       pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
>                                 break;
> @@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         for (; addr < end; pte++, addr += PAGE_SIZE) {
>                 pagemap_entry_t pme;
>
> -               pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
> +               pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         break;
> @@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  }
>
>  #ifdef CONFIG_HUGETLB_PAGE
> -static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -                                       pte_t pte, int offset, int flags2)
> +static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
> +                                       pte_t pte, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2)                  |
> -                               PM_STATUS2(pm->v2, flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>
>  /* This function walks within one hugetlb entry in the single call */
> @@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>         struct pagemapread *pm = walk->private;
>         struct vm_area_struct *vma = walk->vma;
>         int err = 0;
> -       int flags2;
> +       u64 flags = 0;
>         pagemap_entry_t pme;
>
>         if (vma->vm_flags & VM_SOFTDIRTY)
> -               flags2 = __PM_SOFT_DIRTY;
> -       else
> -               flags2 = 0;
> +               flags |= PM_SOFT_DIRTY;
>
>         for (; addr != end; addr += PAGE_SIZE) {
>                 int offset = (addr & ~hmask) >> PAGE_SHIFT;
> -               huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
> +               pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         return err;
> @@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>   * Bits 0-54  page frame number (PFN) if present
>   * Bits 0-4   swap type if swapped
>   * Bits 5-54  swap offset if swapped
> - * Bits 55-60 page shift (page size = 1<<page shift)
> + * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> + * Bit  56    page exclusively mapped
> + * Bits 57-60 zero
>   * Bit  61    page is file-page or shared-anon
>   * Bit  62    page swapped
>   * Bit  63    page present
> @@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>
>         /* do not disclose physical addresses: attack vector */
>         pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
> -       pm.v2 = soft_dirty_cleared;
>         pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
>         pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
>         ret = -ENOMEM;
> @@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
>  {
>         struct mm_struct *mm;
>
> -       pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
> -                       "to stop being page-shift some time soon. See the "
> -                       "linux/Documentation/vm/pagemap.txt for details.\n");
> -
>         mm = proc_mem_open(inode, PTRACE_MODE_READ);
>         if (IS_ERR(mm))
>                 return PTR_ERR(mm);
> diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
> index 3a9f193..1fa872e 100644
> --- a/tools/vm/page-types.c
> +++ b/tools/vm/page-types.c
> @@ -57,26 +57,15 @@
>   * pagemap kernel ABI bits
>   */
>
> -#define PM_ENTRY_BYTES      sizeof(uint64_t)
> -#define PM_STATUS_BITS      3
> -#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
> -#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
> -#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
> -#define PM_PSHIFT_BITS      6
> -#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
> -#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
> -#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
> -#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
> -#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
> -
> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
> -#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
> -
> +#define PM_ENTRY_BYTES         8
> +#define PM_PFEAME_BITS         54
> +#define PM_PFRAME_MASK         ((1LL << PM_PFEAME_BITS) - 1)
> +#define PM_PFRAME(x)           ((x) & PM_PFRAME_MASK)
> +#define PM_SOFT_DIRTY          (1ULL << 55)
> +#define PM_MMAP_EXCLUSIVE      (1ULL << 56)
> +#define PM_FILE                        (1ULL << 61)
> +#define PM_SWAP                        (1ULL << 62)
> +#define PM_PRESENT             (1ULL << 63)
>
>  /*
>   * kernel page flags
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHSET v3 0/4] pagemap: make useable for non-privilege users
  2015-06-09 20:00 ` Konstantin Khlebnikov
  (?)
@ 2015-06-12 18:59   ` Mark Williamson
  -1 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

Hi Konstantin,

Thanks very much for your help on this.

From our side, I've tested our application against a patched kernel
and I confirm that the functionality can replace what we lost when
PFNs were removed from /proc/PID/pagemap.  This addresses the
functionality regression from our PoV (just requires minor userspace
changes on our part, which is fine).

I also reviewed the patch content and everything seemed good to me.

We're keen to see these get into mainline, so let us know if there's
anything we can do to help.

Cheers,
Mark

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> This patchset makes pagemap useable again in the safe way. It adds bit
> 'map-exlusive' which is set if page is mapped only here and restores
> access for non-privileged users but hides pfn from them.
>
> Last patch removes page-shift bits and completes migration to the new
> pagemap format: flags soft-dirty and mmap-exlusive are available only
> in the new format.
>
> v3: check permissions in ->open
>
> ---
>
> Konstantin Khlebnikov (4):
>       pagemap: check permissions and capabilities at open time
>       pagemap: add mmap-exclusive bit for marking pages mapped only here
>       pagemap: hide physical addresses from non-privileged users
>       pagemap: switch to the new format and do some cleanup
>
>
>  Documentation/vm/pagemap.txt |    3 -
>  fs/proc/task_mmu.c           |  219 +++++++++++++++++++-----------------------
>  tools/vm/page-types.c        |   35 +++----
>  3 files changed, 118 insertions(+), 139 deletions(-)
>
> --
> Signature

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHSET v3 0/4] pagemap: make useable for non-privilege users
@ 2015-06-12 18:59   ` Mark Williamson
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrew Morton, Naoya Horiguchi,
	Linux API, kernel list, Kirill A. Shutemov

Hi Konstantin,

Thanks very much for your help on this.

From our side, I've tested our application against a patched kernel
and I confirm that the functionality can replace what we lost when
PFNs were removed from /proc/PID/pagemap.  This addresses the
functionality regression from our PoV (just requires minor userspace
changes on our part, which is fine).

I also reviewed the patch content and everything seemed good to me.

We're keen to see these get into mainline, so let us know if there's
anything we can do to help.

Cheers,
Mark

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> This patchset makes pagemap useable again in the safe way. It adds bit
> 'map-exlusive' which is set if page is mapped only here and restores
> access for non-privileged users but hides pfn from them.
>
> Last patch removes page-shift bits and completes migration to the new
> pagemap format: flags soft-dirty and mmap-exlusive are available only
> in the new format.
>
> v3: check permissions in ->open
>
> ---
>
> Konstantin Khlebnikov (4):
>       pagemap: check permissions and capabilities at open time
>       pagemap: add mmap-exclusive bit for marking pages mapped only here
>       pagemap: hide physical addresses from non-privileged users
>       pagemap: switch to the new format and do some cleanup
>
>
>  Documentation/vm/pagemap.txt |    3 -
>  fs/proc/task_mmu.c           |  219 +++++++++++++++++++-----------------------
>  tools/vm/page-types.c        |   35 +++----
>  3 files changed, 118 insertions(+), 139 deletions(-)
>
> --
> Signature

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHSET v3 0/4] pagemap: make useable for non-privilege users
@ 2015-06-12 18:59   ` Mark Williamson
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-12 18:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

Hi Konstantin,

Thanks very much for your help on this.

From our side, I've tested our application against a patched kernel
and I confirm that the functionality can replace what we lost when
PFNs were removed from /proc/PID/pagemap.  This addresses the
functionality regression from our PoV (just requires minor userspace
changes on our part, which is fine).

I also reviewed the patch content and everything seemed good to me.

We're keen to see these get into mainline, so let us know if there's
anything we can do to help.

Cheers,
Mark

On Tue, Jun 9, 2015 at 9:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> This patchset makes pagemap useable again in the safe way. It adds bit
> 'map-exlusive' which is set if page is mapped only here and restores
> access for non-privileged users but hides pfn from them.
>
> Last patch removes page-shift bits and completes migration to the new
> pagemap format: flags soft-dirty and mmap-exlusive are available only
> in the new format.
>
> v3: check permissions in ->open
>
> ---
>
> Konstantin Khlebnikov (4):
>       pagemap: check permissions and capabilities at open time
>       pagemap: add mmap-exclusive bit for marking pages mapped only here
>       pagemap: hide physical addresses from non-privileged users
>       pagemap: switch to the new format and do some cleanup
>
>
>  Documentation/vm/pagemap.txt |    3 -
>  fs/proc/task_mmu.c           |  219 +++++++++++++++++++-----------------------
>  tools/vm/page-types.c        |   35 +++----
>  3 files changed, 118 insertions(+), 139 deletions(-)
>
> --
> Signature


^ permalink raw reply	[flat|nested] 42+ messages in thread
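As a sketch of the kind of minor userspace change mentioned above (the helper
name and calling convention are purely illustrative): rather than comparing
PFNs, which this series hides from non-privileged users, a tool can test the
exclusive-mapping bit of a present pagemap entry.

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
#define PM_PRESENT		(1ULL << 63)

/*
 * Returns true if the page at vaddr is present and mapped only by the mm
 * behind pagemap_fd (an already-open /proc/<pid>/pagemap).
 */
static bool page_is_exclusive(int pagemap_fd, unsigned long vaddr,
			      long page_size)
{
	uint64_t pme = 0;

	if (pread(pagemap_fd, &pme, sizeof(pme),
		  (vaddr / page_size) * sizeof(pme)) != (ssize_t)sizeof(pme))
		return false;

	return (pme & PM_PRESENT) && (pme & PM_MMAP_EXCLUSIVE);
}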

* [PATCH v4] pagemap: switch to the new format and do some cleanup
  2015-06-09 20:00   ` Konstantin Khlebnikov
@ 2015-06-15  5:56     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 42+ messages in thread
From: Konstantin Khlebnikov @ 2015-06-15  5:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Mark Williamson, Naoya Horiguchi
  Cc: linux-api, linux-kernel, Kirill A. Shutemov

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

This patch removes the page-shift bits (scheduled for removal since 3.11) and
completes the migration to the new bit layout. It also cleans up the messy macros.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

---

v4: fix misprint PM_PFEAME_BITS -> PM_PFRAME_BITS
---
 fs/proc/task_mmu.c    |  147 ++++++++++++++++---------------------------------
 tools/vm/page-types.c |   29 +++-------
 2 files changed, 58 insertions(+), 118 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f1b9ae8..99fa2ae 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -710,23 +710,6 @@ const struct file_operations proc_tid_smaps_operations = {
 	.release	= proc_map_release,
 };
 
-/*
- * We do not want to have constant page-shift bits sitting in
- * pagemap entries and are about to reuse them some time soon.
- *
- * Here's the "migration strategy":
- * 1. when the system boots these bits remain what they are,
- *    but a warning about future change is printed in log;
- * 2. once anyone clears soft-dirty bits via clear_refs file,
- *    these flag is set to denote, that user is aware of the
- *    new API and those page-shift bits change their meaning.
- *    The respective warning is printed in dmesg;
- * 3. In a couple of releases we will remove all the mentions
- *    of page-shift in pagemap entries.
- */
-
-static bool soft_dirty_cleared __read_mostly;
-
 enum clear_refs_types {
 	CLEAR_REFS_ALL = 1,
 	CLEAR_REFS_ANON,
@@ -887,13 +870,6 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 	if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
 		return -EINVAL;
 
-	if (type == CLEAR_REFS_SOFT_DIRTY) {
-		soft_dirty_cleared = true;
-		pr_warn_once("The pagemap bits 55-60 has changed their meaning!"
-			     " See the linux/Documentation/vm/pagemap.txt for "
-			     "details.\n");
-	}
-
 	task = get_proc_task(file_inode(file));
 	if (!task)
 		return -ESRCH;
@@ -961,38 +937,26 @@ typedef struct {
 struct pagemapread {
 	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
 	pagemap_entry_t *buffer;
-	bool v2;
 	bool show_pfn;
 };
 
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 #define PAGEMAP_WALK_MASK	(PMD_MASK)
 
-#define PM_ENTRY_BYTES      sizeof(pagemap_entry_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-/* in "new" pagemap pshift bits are occupied with more status bits */
-#define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
+#define PM_ENTRY_BYTES		sizeof(pagemap_entry_t)
+#define PM_PFRAME_BITS		54
+#define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
+#define PM_SOFT_DIRTY		BIT_ULL(55)
+#define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
+#define PM_FILE			BIT_ULL(61)
+#define PM_SWAP			BIT_ULL(62)
+#define PM_PRESENT		BIT_ULL(63)
+
 #define PM_END_OF_BUFFER    1
 
-static inline pagemap_entry_t make_pme(u64 val)
+static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
 {
-	return (pagemap_entry_t) { .pme = val };
+	return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
 }
 
 static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
@@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 	while (addr < end) {
 		struct vm_area_struct *vma = find_vma(walk->mm, addr);
-		pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
+		pagemap_entry_t pme = make_pme(0, 0);
 		/* End of address space hole, which we mark as non-present. */
 		unsigned long hole_end;
 
@@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 		/* Addresses in the VMA. */
 		if (vma->vm_flags & VM_SOFTDIRTY)
-			pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
+			pme = make_pme(0, PM_SOFT_DIRTY);
 		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
@@ -1044,50 +1008,44 @@ out:
 	return err;
 }
 
-static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
+static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
-	u64 frame = 0, flags;
+	u64 frame = 0, flags = 0;
 	struct page *page = NULL;
-	int flags2 = 0;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte);
-		flags = PM_PRESENT;
+		flags |= PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 	} else if (is_swap_pte(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 		entry = pte_to_swp_entry(pte);
 		frame = swp_type(entry) |
 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
-		flags = PM_SWAP;
+		flags |= PM_SWAP;
 		if (is_migration_entry(entry))
 			page = migration_entry_to_page(entry);
-	} else {
-		if (vma->vm_flags & VM_SOFTDIRTY)
-			flags2 |= __PM_SOFT_DIRTY;
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
-		return;
 	}
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
 	if (page && page_mapcount(page) == 1)
-		flags2 |= __PM_MMAP_EXCLUSIVE;
-	if ((vma->vm_flags & VM_SOFTDIRTY))
-		flags2 |= __PM_SOFT_DIRTY;
+		flags |= PM_MMAP_EXCLUSIVE;
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		flags |= PM_SOFT_DIRTY;
 
-	*pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
+	return make_pme(frame, flags);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
 	u64 frame = 0;
 
@@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
 	if (pmd_present(pmd)) {
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) + offset;
-		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
-				PM_STATUS2(pm->v2, pmd_flags2));
-	} else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 #else
-static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
+	return make_pme(0, 0);
 }
 #endif
 
@@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	int err = 0;
 
 	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		int pmd_flags2;
+		u64 flags = 0;
 
 		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
-			pmd_flags2 = __PM_SOFT_DIRTY;
-		else
-			pmd_flags2 = 0;
+			flags |= PM_SOFT_DIRTY;
 
 		if (pmd_present(*pmd)) {
 			struct page *page = pmd_page(*pmd);
 
 			if (page_mapcount(page) == 1)
-				pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
+				flags |= PM_MMAP_EXCLUSIVE;
 		}
 
 		for (; addr != end; addr += PAGE_SIZE) {
@@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 			offset = (addr & ~PAGEMAP_WALK_MASK) >>
 					PAGE_SHIFT;
-			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
+			pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
 				break;
@@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		pagemap_entry_t pme;
 
-		pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
+		pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			break;
@@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-					pte_t pte, int offset, int flags2)
+static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
+					pte_t pte, int offset, u64 flags)
 {
 	u64 frame = 0;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) + offset;
-		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
-				PM_STATUS2(pm->v2, flags2));
-	} else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2)			|
-				PM_STATUS2(pm->v2, flags2));
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 
 /* This function walks within one hugetlb entry in the single call */
@@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	int err = 0;
-	int flags2;
+	u64 flags = 0;
 	pagemap_entry_t pme;
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
-		flags2 = __PM_SOFT_DIRTY;
-	else
-		flags2 = 0;
+		flags |= PM_SOFT_DIRTY;
 
 	for (; addr != end; addr += PAGE_SIZE) {
 		int offset = (addr & ~hmask) >> PAGE_SHIFT;
-		huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
+		pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			return err;
@@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
  * Bits 0-54  page frame number (PFN) if present
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
- * Bits 55-60 page shift (page size = 1<<page shift)
+ * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
+ * Bit  56    page exclusively mapped
+ * Bits 57-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
@@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 
 	/* do not disclose physical addresses: attack vector */
 	pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
-	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
@@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
 {
 	struct mm_struct *mm;
 
-	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
-			"to stop being page-shift some time soon. See the "
-			"linux/Documentation/vm/pagemap.txt for details.\n");
-
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 3a9f193..e1d5ff8 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -57,26 +57,15 @@
  * pagemap kernel ABI bits
  */
 
-#define PM_ENTRY_BYTES      sizeof(uint64_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
-#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
-
+#define PM_ENTRY_BYTES		8
+#define PM_PFRAME_BITS		54
+#define PM_PFRAME_MASK		((1LL << PM_PFRAME_BITS) - 1)
+#define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
+#define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
+#define PM_FILE			(1ULL << 61)
+#define PM_SWAP			(1ULL << 62)
+#define PM_PRESENT		(1ULL << 63)
 
 /*
  * kernel page flags


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4] pagemap: switch to the new format and do some cleanup
@ 2015-06-15  5:56     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 42+ messages in thread
From: Konstantin Khlebnikov @ 2015-06-15  5:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Mark Williamson, Naoya Horiguchi
  Cc: linux-api, linux-kernel, Kirill A. Shutemov

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

This patch removes the page-shift bits (scheduled for removal since 3.11) and
completes the migration to the new bit layout. It also cleans up the messy macros.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

---

v4: fix misprint PM_PFEAME_BITS -> PM_PFRAME_BITS
---
 fs/proc/task_mmu.c    |  147 ++++++++++++++++---------------------------------
 tools/vm/page-types.c |   29 +++-------
 2 files changed, 58 insertions(+), 118 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f1b9ae8..99fa2ae 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -710,23 +710,6 @@ const struct file_operations proc_tid_smaps_operations = {
 	.release	= proc_map_release,
 };
 
-/*
- * We do not want to have constant page-shift bits sitting in
- * pagemap entries and are about to reuse them some time soon.
- *
- * Here's the "migration strategy":
- * 1. when the system boots these bits remain what they are,
- *    but a warning about future change is printed in log;
- * 2. once anyone clears soft-dirty bits via clear_refs file,
- *    these flag is set to denote, that user is aware of the
- *    new API and those page-shift bits change their meaning.
- *    The respective warning is printed in dmesg;
- * 3. In a couple of releases we will remove all the mentions
- *    of page-shift in pagemap entries.
- */
-
-static bool soft_dirty_cleared __read_mostly;
-
 enum clear_refs_types {
 	CLEAR_REFS_ALL = 1,
 	CLEAR_REFS_ANON,
@@ -887,13 +870,6 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 	if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
 		return -EINVAL;
 
-	if (type == CLEAR_REFS_SOFT_DIRTY) {
-		soft_dirty_cleared = true;
-		pr_warn_once("The pagemap bits 55-60 has changed their meaning!"
-			     " See the linux/Documentation/vm/pagemap.txt for "
-			     "details.\n");
-	}
-
 	task = get_proc_task(file_inode(file));
 	if (!task)
 		return -ESRCH;
@@ -961,38 +937,26 @@ typedef struct {
 struct pagemapread {
 	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
 	pagemap_entry_t *buffer;
-	bool v2;
 	bool show_pfn;
 };
 
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 #define PAGEMAP_WALK_MASK	(PMD_MASK)
 
-#define PM_ENTRY_BYTES      sizeof(pagemap_entry_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-/* in "new" pagemap pshift bits are occupied with more status bits */
-#define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
+#define PM_ENTRY_BYTES		sizeof(pagemap_entry_t)
+#define PM_PFRAME_BITS		54
+#define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
+#define PM_SOFT_DIRTY		BIT_ULL(55)
+#define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
+#define PM_FILE			BIT_ULL(61)
+#define PM_SWAP			BIT_ULL(62)
+#define PM_PRESENT		BIT_ULL(63)
+
 #define PM_END_OF_BUFFER    1
 
-static inline pagemap_entry_t make_pme(u64 val)
+static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
 {
-	return (pagemap_entry_t) { .pme = val };
+	return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
 }
 
 static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
@@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 	while (addr < end) {
 		struct vm_area_struct *vma = find_vma(walk->mm, addr);
-		pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
+		pagemap_entry_t pme = make_pme(0, 0);
 		/* End of address space hole, which we mark as non-present. */
 		unsigned long hole_end;
 
@@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 		/* Addresses in the VMA. */
 		if (vma->vm_flags & VM_SOFTDIRTY)
-			pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
+			pme = make_pme(0, PM_SOFT_DIRTY);
 		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
@@ -1044,50 +1008,44 @@ out:
 	return err;
 }
 
-static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
+static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
-	u64 frame = 0, flags;
+	u64 frame = 0, flags = 0;
 	struct page *page = NULL;
-	int flags2 = 0;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte);
-		flags = PM_PRESENT;
+		flags |= PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 	} else if (is_swap_pte(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 		entry = pte_to_swp_entry(pte);
 		frame = swp_type(entry) |
 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
-		flags = PM_SWAP;
+		flags |= PM_SWAP;
 		if (is_migration_entry(entry))
 			page = migration_entry_to_page(entry);
-	} else {
-		if (vma->vm_flags & VM_SOFTDIRTY)
-			flags2 |= __PM_SOFT_DIRTY;
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
-		return;
 	}
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
 	if (page && page_mapcount(page) == 1)
-		flags2 |= __PM_MMAP_EXCLUSIVE;
-	if ((vma->vm_flags & VM_SOFTDIRTY))
-		flags2 |= __PM_SOFT_DIRTY;
+		flags |= PM_MMAP_EXCLUSIVE;
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		flags |= PM_SOFT_DIRTY;
 
-	*pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
+	return make_pme(frame, flags);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
 	u64 frame = 0;
 
@@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
 	if (pmd_present(pmd)) {
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) + offset;
-		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
-				PM_STATUS2(pm->v2, pmd_flags2));
-	} else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 #else
-static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
+	return make_pme(0, 0);
 }
 #endif
 
@@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	int err = 0;
 
 	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		int pmd_flags2;
+		u64 flags = 0;
 
 		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
-			pmd_flags2 = __PM_SOFT_DIRTY;
-		else
-			pmd_flags2 = 0;
+			flags |= PM_SOFT_DIRTY;
 
 		if (pmd_present(*pmd)) {
 			struct page *page = pmd_page(*pmd);
 
 			if (page_mapcount(page) == 1)
-				pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
+				flags |= PM_MMAP_EXCLUSIVE;
 		}
 
 		for (; addr != end; addr += PAGE_SIZE) {
@@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 			offset = (addr & ~PAGEMAP_WALK_MASK) >>
 					PAGE_SHIFT;
-			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
+			pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
 				break;
@@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		pagemap_entry_t pme;
 
-		pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
+		pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			break;
@@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-					pte_t pte, int offset, int flags2)
+static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
+					pte_t pte, int offset, u64 flags)
 {
 	u64 frame = 0;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) + offset;
-		*pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
-				PM_STATUS2(pm->v2, flags2));
-	} else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2)			|
-				PM_STATUS2(pm->v2, flags2));
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 
 /* This function walks within one hugetlb entry in the single call */
@@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	int err = 0;
-	int flags2;
+	u64 flags = 0;
 	pagemap_entry_t pme;
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
-		flags2 = __PM_SOFT_DIRTY;
-	else
-		flags2 = 0;
+		flags |= PM_SOFT_DIRTY;
 
 	for (; addr != end; addr += PAGE_SIZE) {
 		int offset = (addr & ~hmask) >> PAGE_SHIFT;
-		huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
+		pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			return err;
@@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
  * Bits 0-54  page frame number (PFN) if present
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
- * Bits 55-60 page shift (page size = 1<<page shift)
+ * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
+ * Bit  56    page exclusively mapped
+ * Bits 57-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
@@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 
 	/* do not disclose physical addresses: attack vector */
 	pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
-	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
@@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
 {
 	struct mm_struct *mm;
 
-	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
-			"to stop being page-shift some time soon. See the "
-			"linux/Documentation/vm/pagemap.txt for details.\n");
-
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 3a9f193..e1d5ff8 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -57,26 +57,15 @@
  * pagemap kernel ABI bits
  */
 
-#define PM_ENTRY_BYTES      sizeof(uint64_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define __PM_MMAP_EXCLUSIVE  (2LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
-#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
-
+#define PM_ENTRY_BYTES		8
+#define PM_PFRAME_BITS		54
+#define PM_PFRAME_MASK		((1LL << PM_PFRAME_BITS) - 1)
+#define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
+#define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
+#define PM_FILE			(1ULL << 61)
+#define PM_SWAP			(1ULL << 62)
+#define PM_PRESENT		(1ULL << 63)
 
 /*
  * kernel page flags


^ permalink raw reply related	[flat|nested] 42+ messages in thread
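
For reference, here is a minimal userspace sketch of the ABI described in the hunk above. It is not part of the series: the program, its argument handling, and its output format are purely illustrative, and only the bit positions (mirroring the documentation comment and the macros added to tools/vm/page-types.c) come from the patch. It reads one /proc/self/pagemap entry and prints the flag bits:

/*
 * Illustrative only, not from the series: decode a single pagemap entry
 * for the page containing 'vaddr' using the new bit layout.
 */
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PM_PFRAME_MASK		((1ULL << 55) - 1)	/* bits 0-54 */
#define PM_SOFT_DIRTY		(1ULL << 55)
#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
#define PM_FILE			(1ULL << 61)
#define PM_SWAP			(1ULL << 62)
#define PM_PRESENT		(1ULL << 63)

int main(int argc, char **argv)
{
	unsigned long vaddr = argc > 1 ? strtoul(argv[1], NULL, 0)
				       : (unsigned long)&argc;
	long psize = sysconf(_SC_PAGESIZE);
	uint64_t pme;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* one 64-bit entry per virtual page */
	if (pread(fd, &pme, sizeof(pme),
		  (off_t)(vaddr / psize) * sizeof(pme)) != sizeof(pme)) {
		perror("pread");
		return 1;
	}
	close(fd);

	printf("%#lx: present=%d swap=%d file/shared=%d exclusive=%d soft-dirty=%d\n",
	       vaddr,
	       !!(pme & PM_PRESENT), !!(pme & PM_SWAP), !!(pme & PM_FILE),
	       !!(pme & PM_MMAP_EXCLUSIVE), !!(pme & PM_SOFT_DIRTY));
	if (pme & PM_PRESENT)
		/* reads back as 0 for unprivileged openers with pfn hiding */
		printf("  pfn: %" PRIu64 "\n", pme & PM_PFRAME_MASK);
	return 0;
}

With the earlier patches in the series applied, the PFN field reads back as zero unless the file was opened with CAP_SYS_ADMIN, so unprivileged callers should rely only on the flag bits.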

* Re: [PATCH v4] pagemap: switch to the new format and do some cleanup
  2015-06-15  5:56     ` Konstantin Khlebnikov
  (?)
@ 2015-06-15 14:57       ` Mark Williamson
  -1 siblings, 0 replies; 42+ messages in thread
From: Mark Williamson @ 2015-06-15 14:57 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Linux API, kernel list,
	Kirill A. Shutemov

Thanks!  No outstanding issues with the patchset, from our side.

Reviewed-by: mwilliamson@undo-software.com

On Mon, Jun 15, 2015 at 6:56 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> This patch removes page-shift bits (scheduled to remove since 3.11) and
> completes migration to the new bit layout. Also it cleans messy macro.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> ---
>
> v4: fix misprint PM_PFEAME_BITS -> PM_PFRAME_BITS
> ---
>  fs/proc/task_mmu.c    |  147 ++++++++++++++++---------------------------------
>  tools/vm/page-types.c |   29 +++-------
>  2 files changed, 58 insertions(+), 118 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f1b9ae8..99fa2ae 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -710,23 +710,6 @@ const struct file_operations proc_tid_smaps_operations = {
>         .release        = proc_map_release,
>  };
>
> -/*
> - * We do not want to have constant page-shift bits sitting in
> - * pagemap entries and are about to reuse them some time soon.
> - *
> - * Here's the "migration strategy":
> - * 1. when the system boots these bits remain what they are,
> - *    but a warning about future change is printed in log;
> - * 2. once anyone clears soft-dirty bits via clear_refs file,
> - *    these flag is set to denote, that user is aware of the
> - *    new API and those page-shift bits change their meaning.
> - *    The respective warning is printed in dmesg;
> - * 3. In a couple of releases we will remove all the mentions
> - *    of page-shift in pagemap entries.
> - */
> -
> -static bool soft_dirty_cleared __read_mostly;
> -
>  enum clear_refs_types {
>         CLEAR_REFS_ALL = 1,
>         CLEAR_REFS_ANON,
> @@ -887,13 +870,6 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
>         if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
>                 return -EINVAL;
>
> -       if (type == CLEAR_REFS_SOFT_DIRTY) {
> -               soft_dirty_cleared = true;
> -               pr_warn_once("The pagemap bits 55-60 has changed their meaning!"
> -                            " See the linux/Documentation/vm/pagemap.txt for "
> -                            "details.\n");
> -       }
> -
>         task = get_proc_task(file_inode(file));
>         if (!task)
>                 return -ESRCH;
> @@ -961,38 +937,26 @@ typedef struct {
>  struct pagemapread {
>         int pos, len;           /* units: PM_ENTRY_BYTES, not bytes */
>         pagemap_entry_t *buffer;
> -       bool v2;
>         bool show_pfn;
>  };
>
>  #define PAGEMAP_WALK_SIZE      (PMD_SIZE)
>  #define PAGEMAP_WALK_MASK      (PMD_MASK)
>
> -#define PM_ENTRY_BYTES      sizeof(pagemap_entry_t)
> -#define PM_STATUS_BITS      3
> -#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
> -#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
> -#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
> -#define PM_PSHIFT_BITS      6
> -#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
> -#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
> -#define __PM_PSHIFT(x)      (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
> -#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
> -#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
> -/* in "new" pagemap pshift bits are occupied with more status bits */
> -#define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
> -
> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
> +#define PM_ENTRY_BYTES         sizeof(pagemap_entry_t)
> +#define PM_PFRAME_BITS         54
> +#define PM_PFRAME_MASK         GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
> +#define PM_SOFT_DIRTY          BIT_ULL(55)
> +#define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
> +#define PM_FILE                        BIT_ULL(61)
> +#define PM_SWAP                        BIT_ULL(62)
> +#define PM_PRESENT             BIT_ULL(63)
> +
>  #define PM_END_OF_BUFFER    1
>
> -static inline pagemap_entry_t make_pme(u64 val)
> +static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
>  {
> -       return (pagemap_entry_t) { .pme = val };
> +       return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
>  }
>
>  static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
> @@ -1013,7 +977,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>         while (addr < end) {
>                 struct vm_area_struct *vma = find_vma(walk->mm, addr);
> -               pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
> +               pagemap_entry_t pme = make_pme(0, 0);
>                 /* End of address space hole, which we mark as non-present. */
>                 unsigned long hole_end;
>
> @@ -1033,7 +997,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>
>                 /* Addresses in the VMA. */
>                 if (vma->vm_flags & VM_SOFTDIRTY)
> -                       pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
> +                       pme = make_pme(0, PM_SOFT_DIRTY);
>                 for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
> @@ -1044,50 +1008,44 @@ out:
>         return err;
>  }
>
> -static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> +static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>                 struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>  {
> -       u64 frame = 0, flags;
> +       u64 frame = 0, flags = 0;
>         struct page *page = NULL;
> -       int flags2 = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte);
> -               flags = PM_PRESENT;
> +               flags |= PM_PRESENT;
>                 page = vm_normal_page(vma, addr, pte);
>                 if (pte_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>         } else if (is_swap_pte(pte)) {
>                 swp_entry_t entry;
>                 if (pte_swp_soft_dirty(pte))
> -                       flags2 |= __PM_SOFT_DIRTY;
> +                       flags |= PM_SOFT_DIRTY;
>                 entry = pte_to_swp_entry(pte);
>                 frame = swp_type(entry) |
>                         (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
> -               flags = PM_SWAP;
> +               flags |= PM_SWAP;
>                 if (is_migration_entry(entry))
>                         page = migration_entry_to_page(entry);
> -       } else {
> -               if (vma->vm_flags & VM_SOFTDIRTY)
> -                       flags2 |= __PM_SOFT_DIRTY;
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
> -               return;
>         }
>
>         if (page && !PageAnon(page))
>                 flags |= PM_FILE;
>         if (page && page_mapcount(page) == 1)
> -               flags2 |= __PM_MMAP_EXCLUSIVE;
> -       if ((vma->vm_flags & VM_SOFTDIRTY))
> -               flags2 |= __PM_SOFT_DIRTY;
> +               flags |= PM_MMAP_EXCLUSIVE;
> +       if (vma->vm_flags & VM_SOFTDIRTY)
> +               flags |= PM_SOFT_DIRTY;
>
> -       *pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
> +       return make_pme(frame, flags);
>  }
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
> @@ -1099,15 +1057,16 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
>         if (pmd_present(pmd)) {
>                 if (pm->show_pfn)
>                         frame = pmd_pfn(pmd) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, pmd_flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>  #else
> -static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -               pmd_t pmd, int offset, int pmd_flags2)
> +static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
> +               pmd_t pmd, int offset, u64 flags)
>  {
> +       return make_pme(0, 0);
>  }
>  #endif
>
> @@ -1121,18 +1080,16 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         int err = 0;
>
>         if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -               int pmd_flags2;
> +               u64 flags = 0;
>
>                 if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
> -                       pmd_flags2 = __PM_SOFT_DIRTY;
> -               else
> -                       pmd_flags2 = 0;
> +                       flags |= PM_SOFT_DIRTY;
>
>                 if (pmd_present(*pmd)) {
>                         struct page *page = pmd_page(*pmd);
>
>                         if (page_mapcount(page) == 1)
> -                               pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> +                               flags |= PM_MMAP_EXCLUSIVE;
>                 }
>
>                 for (; addr != end; addr += PAGE_SIZE) {
> @@ -1141,7 +1098,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
>                         offset = (addr & ~PAGEMAP_WALK_MASK) >>
>                                         PAGE_SHIFT;
> -                       thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
> +                       pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
>                         err = add_to_pagemap(addr, &pme, pm);
>                         if (err)
>                                 break;
> @@ -1161,7 +1118,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>         for (; addr < end; pte++, addr += PAGE_SIZE) {
>                 pagemap_entry_t pme;
>
> -               pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
> +               pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         break;
> @@ -1174,19 +1131,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  }
>
>  #ifdef CONFIG_HUGETLB_PAGE
> -static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
> -                                       pte_t pte, int offset, int flags2)
> +static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
> +                                       pte_t pte, int offset, u64 flags)
>  {
>         u64 frame = 0;
>
>         if (pte_present(pte)) {
>                 if (pm->show_pfn)
>                         frame = pte_pfn(pte) + offset;
> -               *pme = make_pme(PM_PFRAME(frame) | PM_PRESENT |
> -                               PM_STATUS2(pm->v2, flags2));
> -       } else
> -               *pme = make_pme(PM_NOT_PRESENT(pm->v2)                  |
> -                               PM_STATUS2(pm->v2, flags2));
> +               flags |= PM_PRESENT;
> +       }
> +
> +       return make_pme(frame, flags);
>  }
>
>  /* This function walks within one hugetlb entry in the single call */
> @@ -1197,17 +1153,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>         struct pagemapread *pm = walk->private;
>         struct vm_area_struct *vma = walk->vma;
>         int err = 0;
> -       int flags2;
> +       u64 flags = 0;
>         pagemap_entry_t pme;
>
>         if (vma->vm_flags & VM_SOFTDIRTY)
> -               flags2 = __PM_SOFT_DIRTY;
> -       else
> -               flags2 = 0;
> +               flags |= PM_SOFT_DIRTY;
>
>         for (; addr != end; addr += PAGE_SIZE) {
>                 int offset = (addr & ~hmask) >> PAGE_SHIFT;
> -               huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
> +               pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
>                 err = add_to_pagemap(addr, &pme, pm);
>                 if (err)
>                         return err;
> @@ -1228,7 +1182,9 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
>   * Bits 0-54  page frame number (PFN) if present
>   * Bits 0-4   swap type if swapped
>   * Bits 5-54  swap offset if swapped
> - * Bits 55-60 page shift (page size = 1<<page shift)
> + * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> + * Bit  56    page exclusively mapped
> + * Bits 57-60 zero
>   * Bit  61    page is file-page or shared-anon
>   * Bit  62    page swapped
>   * Bit  63    page present
> @@ -1269,7 +1225,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>
>         /* do not disclose physical addresses: attack vector */
>         pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
> -       pm.v2 = soft_dirty_cleared;
>         pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
>         pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
>         ret = -ENOMEM;
> @@ -1339,10 +1294,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
>  {
>         struct mm_struct *mm;
>
> -       pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
> -                       "to stop being page-shift some time soon. See the "
> -                       "linux/Documentation/vm/pagemap.txt for details.\n");
> -
>         mm = proc_mem_open(inode, PTRACE_MODE_READ);
>         if (IS_ERR(mm))
>                 return PTR_ERR(mm);
> diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
> index 3a9f193..e1d5ff8 100644
> --- a/tools/vm/page-types.c
> +++ b/tools/vm/page-types.c
> @@ -57,26 +57,15 @@
>   * pagemap kernel ABI bits
>   */
>
> -#define PM_ENTRY_BYTES      sizeof(uint64_t)
> -#define PM_STATUS_BITS      3
> -#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
> -#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
> -#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
> -#define PM_PSHIFT_BITS      6
> -#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
> -#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
> -#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
> -#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
> -#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
> -
> -#define __PM_SOFT_DIRTY      (1LL)
> -#define __PM_MMAP_EXCLUSIVE  (2LL)
> -#define PM_PRESENT          PM_STATUS(4LL)
> -#define PM_SWAP             PM_STATUS(2LL)
> -#define PM_FILE             PM_STATUS(1LL)
> -#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
> -#define PM_MMAP_EXCLUSIVE   __PM_PSHIFT(__PM_MMAP_EXCLUSIVE)
> -
> +#define PM_ENTRY_BYTES         8
> +#define PM_PFRAME_BITS         54
> +#define PM_PFRAME_MASK         ((1LL << PM_PFRAME_BITS) - 1)
> +#define PM_PFRAME(x)           ((x) & PM_PFRAME_MASK)
> +#define PM_SOFT_DIRTY          (1ULL << 55)
> +#define PM_MMAP_EXCLUSIVE      (1ULL << 56)
> +#define PM_FILE                        (1ULL << 61)
> +#define PM_SWAP                        (1ULL << 62)
> +#define PM_PRESENT             (1ULL << 63)
>
>  /*
>   * kernel page flags
>

^ permalink raw reply	[flat|nested] 42+ messages in thread
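
The same layout keeps the swap encoding in the low bits (type in bits 0-4, offset in bits 5-54, matching the swp_type/swp_offset packing by MAX_SWAPFILES_SHIFT in the quoted pte_to_pagemap_entry()). Below is a hedged sketch of how a consumer could split a swapped entry; PM_SWAP_TYPE_BITS, PM_SWAP_TYPE_MASK, PM_SWAP_OFFSET_MASK and decode_swap() are names invented here for illustration, not kernel or page-types.c symbols:

#include <stdint.h>

#define PM_SWAP			(1ULL << 62)
#define PM_SWAP_TYPE_BITS	5	/* mirrors MAX_SWAPFILES_SHIFT */
#define PM_SWAP_TYPE_MASK	((1ULL << PM_SWAP_TYPE_BITS) - 1)
#define PM_SWAP_OFFSET_MASK	(((1ULL << 55) - 1) >> PM_SWAP_TYPE_BITS)

/* Returns 1 and fills *type/*offset if the entry describes a swapped page. */
static int decode_swap(uint64_t pme, unsigned int *type, uint64_t *offset)
{
	if (!(pme & PM_SWAP))
		return 0;
	*type	= pme & PM_SWAP_TYPE_MASK;			     /* bits 0-4  */
	*offset	= (pme >> PM_SWAP_TYPE_BITS) & PM_SWAP_OFFSET_MASK;  /* bits 5-54 */
	return 1;
}

Note that in the hunks as quoted, the swap fields are filled in regardless of pm->show_pfn; only the PFN of present pages is gated on CAP_SYS_ADMIN.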

* Re: [PATCH v4] pagemap: switch to the new format and do some cleanup
  2015-06-15  5:56     ` Konstantin Khlebnikov
  (?)
@ 2015-06-16 21:29       ` Andrew Morton
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2015-06-16 21:29 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Mark Williamson, Naoya Horiguchi, linux-api,
	linux-kernel, Kirill A. Shutemov

On Mon, 15 Jun 2015 08:56:49 +0300 Konstantin Khlebnikov <koct9i@gmail.com> wrote:

> This patch removes page-shift bits (scheduled to remove since 3.11) and
> completes migration to the new bit layout. Also it cleans messy macro.

hm, I can't find any kernel version to which this patch comes close to
applying.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] pagemap: switch to the new format and do some cleanup
@ 2015-06-17  4:59         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 42+ messages in thread
From: Konstantin Khlebnikov @ 2015-06-17  4:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mark Williamson, Naoya Horiguchi, Linux API,
	Linux Kernel Mailing List, Kirill A. Shutemov

On Wed, Jun 17, 2015 at 12:29 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 15 Jun 2015 08:56:49 +0300 Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>
>> This patch removes the page-shift bits (scheduled for removal since 3.11) and
>> completes the migration to the new bit layout. It also cleans up the messy macros.
>
> hm, I can't find any kernel version to which this patch comes close to
> applying.

This patchset applies to 4.1-rc8 and current mmotm without problems.
I guess you've tried to pick this patch alone, without the previous changes.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] pagemap: switch to the new format and do some cleanup
  2015-06-17  4:59         ` Konstantin Khlebnikov
@ 2015-06-17  6:40           ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 42+ messages in thread
From: Konstantin Khlebnikov @ 2015-06-17  6:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mark Williamson, Naoya Horiguchi, Linux API,
	Linux Kernel Mailing List, Kirill A. Shutemov

On Wed, Jun 17, 2015 at 7:59 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 12:29 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 15 Jun 2015 08:56:49 +0300 Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>
>>> This patch removes the page-shift bits (scheduled for removal since 3.11) and
>>> completes the migration to the new bit layout. It also cleans up the messy macros.
>>
>> hm, I can't find any kernel version to which this patch comes close to
>> applying.
>
> This patchset applies to  4.1-rc8 and current mmotm without problems.
> I guess you've tried pick this patch alone without previous changes.

My bad. I've sent the single v4 patch as a reply to the v3 patch and forgot
'4/4' in the subject.
It's the fourth patch in the patchset.

Here is v3 patchset cover letter: https://lkml.org/lkml/2015/6/9/804
"[PATCHSET v3 0/4] pagemap: make useable for non-privilege users"

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/4] pagemap: check permissions and capabilities at open time
  2015-06-09 20:00   ` Konstantin Khlebnikov
@ 2015-06-17  7:58     ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2015-06-17  7:58 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, linux-api, Mark Williamson,
	linux-kernel, Kirill A. Shutemov

On Tue, Jun 09, 2015 at 11:00:15PM +0300, Konstantin Khlebnikov wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> This patch moves permission checks from pagemap_read() into pagemap_open().
> 
> Pointer to mm is saved in file->private_data. This reference pins only
> mm_struct itself. /proc/*/mem, maps, smaps already work in the same way.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Link: http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com
                                                           
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>   

^ permalink raw reply	[flat|nested] 42+ messages in thread
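As a reading aid for the commit message above, a rough sketch of the general shape such an ->open/->release pair takes; this is an editorial illustration rather than the patch itself, and it assumes a permission-checking helper along the lines of proc_mem_open(), so the real code in fs/proc/task_mmu.c may differ in details:

static int pagemap_open(struct inode *inode, struct file *file)
{
        struct mm_struct *mm;

        /* All permission/capability checks happen here, once, at open time. */
        mm = proc_mem_open(inode, PTRACE_MODE_READ);
        if (IS_ERR(mm))
                return PTR_ERR(mm);
        /* Pins only the mm_struct itself, not the whole address space. */
        file->private_data = mm;
        return 0;
}

static int pagemap_release(struct inode *inode, struct file *file)
{
        struct mm_struct *mm = file->private_data;

        if (mm)
                mmdrop(mm);     /* drop the reference taken at open */
        return 0;
}

With something of this shape in place, pagemap_read() can use the saved mm directly instead of re-resolving and re-checking the target task on every read, which is the same pattern /proc/*/mem, maps and smaps already follow.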

* Re: [PATCH v3 2/4] pagemap: add mmap-exclusive bit for marking pages mapped only here
  2015-06-09 20:00   ` Konstantin Khlebnikov
@ 2015-06-17  8:11     ` Naoya Horiguchi
  -1 siblings, 0 replies; 42+ messages in thread
From: Naoya Horiguchi @ 2015-06-17  8:11 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, linux-api, Mark Williamson,
	linux-kernel, Kirill A. Shutemov

On Tue, Jun 09, 2015 at 11:00:17PM +0300, Konstantin Khlebnikov wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> This patch sets bit 56 in pagemap if this page is mapped only once.
> It allows detecting exclusively used pages without exposing the PFN:
> 
> present file exclusive state
> 0       0    0         non-present
> 1       1    0         file page mapped somewhere else
> 1       1    1         file page mapped only here
> 1       0    0         anon non-CoWed page (shared with parent/child)
> 1       0    1         anon CoWed page (or never forked)
> 
> CoWed pages in MAP_FILE|MAP_PRIVATE areas are anon in this context.
> 
> The mmap-exclusive bit doesn't reflect potential page-sharing via swapcache:
> a page could be mapped once but have several swap-ptes which point to it.
> An application could detect that via the swap bit in the pagemap entry and
> touch that pte via /proc/pid/mem to get the real information.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Link: http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com
> 
> ---
> 
> v2:
> * handle transparent huge pages
> * invert bit and rename shared -> exclusive (less confusing name)
> ---
...

> @@ -1119,6 +1122,13 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  		else
>  			pmd_flags2 = 0;
>  
> +		if (pmd_present(*pmd)) {
> +			struct page *page = pmd_page(*pmd);
> +
> +			if (page_mapcount(page) == 1)
> +				pmd_flags2 |= __PM_MMAP_EXCLUSIVE;
> +		}
> +

Could you do the same thing for huge_pte_to_pagemap_entry(), too?

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 42+ messages in thread
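To make the state table above concrete, a small userspace sketch (an illustration, not code from the patchset) that classifies a single 64-bit pagemap entry; the present and swap bits sit at 63 and 62, the new exclusive bit at 56 as described above, and the 'file-page or shared-anon' bit is assumed to be bit 61 as in the then-current pagemap format:

#include <stdint.h>
#include <stdio.h>

#define PM_MMAP_EXCLUSIVE       (1ULL << 56)
#define PM_FILE                 (1ULL << 61)    /* file-page or shared-anon */
#define PM_SWAP                 (1ULL << 62)
#define PM_PRESENT              (1ULL << 63)

/* Mirror the present/file/exclusive table from the commit message. */
static const char *classify(uint64_t entry)
{
        if (!(entry & PM_PRESENT))
                return (entry & PM_SWAP) ? "swapped out" : "non-present";
        if (entry & PM_FILE)
                return (entry & PM_MMAP_EXCLUSIVE) ?
                        "file page mapped only here" :
                        "file page mapped somewhere else";
        return (entry & PM_MMAP_EXCLUSIVE) ?
                "anon CoWed page (or never forked)" :
                "anon non-CoWed page (shared with parent/child)";
}

int main(void)
{
        printf("%s\n", classify(0));
        printf("%s\n", classify(PM_PRESENT | PM_FILE | PM_MMAP_EXCLUSIVE));
        printf("%s\n", classify(PM_PRESENT));
        return 0;
}

As the commit message notes, a swap-backed page can still be shared through the swapcache even when it is mapped only once here, so the exclusive bit alone is not a guarantee against all sharing.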

end of thread, other threads:[~2015-06-17  8:13 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-09 20:00 [PATCHSET v3 0/4] pagemap: make useable for non-privilege users Konstantin Khlebnikov
2015-06-09 20:00 ` Konstantin Khlebnikov
2015-06-09 20:00 ` Konstantin Khlebnikov
2015-06-09 20:00 ` [PATCH v3 1/4] pagemap: check permissions and capabilities at open time Konstantin Khlebnikov
2015-06-09 20:00   ` Konstantin Khlebnikov
2015-06-12 18:44   ` Mark Williamson
2015-06-12 18:44     ` Mark Williamson
2015-06-17  7:58   ` Naoya Horiguchi
2015-06-17  7:58     ` Naoya Horiguchi
2015-06-09 20:00 ` [PATCH v3 2/4] pagemap: add mmap-exclusive bit for marking pages mapped only here Konstantin Khlebnikov
2015-06-09 20:00   ` Konstantin Khlebnikov
2015-06-12 18:46   ` Mark Williamson
2015-06-12 18:46     ` Mark Williamson
2015-06-17  8:11   ` Naoya Horiguchi
2015-06-17  8:11     ` Naoya Horiguchi
2015-06-09 20:00 ` [PATCH v3 3/4] pagemap: hide physical addresses from non-privileged users Konstantin Khlebnikov
2015-06-09 20:00   ` Konstantin Khlebnikov
2015-06-09 20:00   ` Konstantin Khlebnikov
2015-06-12 18:47   ` Mark Williamson
2015-06-12 18:47     ` Mark Williamson
2015-06-12 18:47     ` Mark Williamson
2015-06-09 20:00 ` [PATCH v3 4/4] pagemap: switch to the new format and do some cleanup Konstantin Khlebnikov
2015-06-09 20:00   ` Konstantin Khlebnikov
2015-06-12 18:49   ` Mark Williamson
2015-06-12 18:49     ` Mark Williamson
2015-06-12 18:49     ` Mark Williamson
2015-06-15  5:56   ` [PATCH v4] " Konstantin Khlebnikov
2015-06-15  5:56     ` Konstantin Khlebnikov
2015-06-15 14:57     ` Mark Williamson
2015-06-15 14:57       ` Mark Williamson
2015-06-15 14:57       ` Mark Williamson
2015-06-16 21:29     ` Andrew Morton
2015-06-16 21:29       ` Andrew Morton
2015-06-16 21:29       ` Andrew Morton
2015-06-17  4:59       ` Konstantin Khlebnikov
2015-06-17  4:59         ` Konstantin Khlebnikov
2015-06-17  4:59         ` Konstantin Khlebnikov
2015-06-17  6:40         ` Konstantin Khlebnikov
2015-06-17  6:40           ` Konstantin Khlebnikov
2015-06-12 18:59 ` [PATCHSET v3 0/4] pagemap: make useable for non-privilege users Mark Williamson
2015-06-12 18:59   ` Mark Williamson
2015-06-12 18:59   ` Mark Williamson
