* [PATCHSET v4 0/5] pagemap: make usable for non-privileged users
From: Konstantin Khlebnikov @ 2015-07-14 15:37 UTC
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

This patchset makes pagemap usable again in a safe way: after the rowhammer
bug it was made CAP_SYS_ADMIN-only. This series restores access for
non-privileged users but hides PFNs from them.

It also adds an 'mmap-exclusive' bit which is set if the page is mapped only
here: this helps estimate the working set without exposing PFNs and makes it
possible to distinguish CoWed from non-CoWed private anonymous pages.

The second patch removes the page-shift bits and completes the migration to
the new pagemap format: the soft-dirty and mmap-exclusive flags are available
only in the new format.
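
For illustration only (not part of the series), a minimal userspace sketch
that reads and decodes one pagemap entry in the new format. It assumes 4 KiB
pages; with this series applied, the PFN field reads back as zero for
non-privileged users:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	uint64_t entry;
	char *page = malloc(4096);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	page[0] = 1;	/* fault the page in so "present" is set */
	if (fd < 0 || pread(fd, &entry, 8, ((uintptr_t)page >> 12) * 8) != 8)
		return 1;
	printf("present=%d swap=%d file=%d exclusive=%d soft-dirty=%d pfn=%llu\n",
	       !!(entry & (1ULL << 63)), !!(entry & (1ULL << 62)),
	       !!(entry & (1ULL << 61)), !!(entry & (1ULL << 56)),
	       !!(entry & (1ULL << 55)),
	       (unsigned long long)(entry & ((1ULL << 55) - 1)));
	close(fd);
	return 0;
}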

Changes since v3:
* patches reordered: cleanup now in second patch
* update pagemap for hugetlb, add missing 'FILE' bit
* fix PM_PFRAME_BITS: it's 55, not 54 as in previous versions

---

Konstantin Khlebnikov (5):
      pagemap: check permissions and capabilities at open time
      pagemap: switch to the new format and do some cleanup
      pagemap: rework hugetlb and thp report
      pagemap: hide physical addresses from non-privileged users
      pagemap: add mmap-exclusive bit for marking pages mapped only here


 Documentation/vm/pagemap.txt |    3 
 fs/proc/task_mmu.c           |  267 ++++++++++++++++++------------------------
 tools/vm/page-types.c        |   35 +++---
 3 files changed, 137 insertions(+), 168 deletions(-)

--
Konstantin


* [PATCH v4 1/5] pagemap: check permissions and capabilities at open time
From: Konstantin Khlebnikov @ 2015-07-14 15:37 UTC
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

This patch moves permission checks from pagemap_read() into pagemap_open().

A pointer to the mm is saved in file->private_data. This reference pins only
the mm_struct itself, not the address space; /proc/*/mem, maps and smaps
already work the same way.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Link: http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com
---
 fs/proc/task_mmu.c |   48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ca1e091881d4..270bf7cbc8a5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1227,40 +1227,33 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 static ssize_t pagemap_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
-	struct task_struct *task = get_proc_task(file_inode(file));
-	struct mm_struct *mm;
+	struct mm_struct *mm = file->private_data;
 	struct pagemapread pm;
-	int ret = -ESRCH;
 	struct mm_walk pagemap_walk = {};
 	unsigned long src;
 	unsigned long svpfn;
 	unsigned long start_vaddr;
 	unsigned long end_vaddr;
-	int copied = 0;
+	int ret = 0, copied = 0;
 
-	if (!task)
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
 		goto out;
 
 	ret = -EINVAL;
 	/* file position must be aligned */
 	if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES))
-		goto out_task;
+		goto out_mm;
 
 	ret = 0;
 	if (!count)
-		goto out_task;
+		goto out_mm;
 
 	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
 	if (!pm.buffer)
-		goto out_task;
-
-	mm = mm_access(task, PTRACE_MODE_READ);
-	ret = PTR_ERR(mm);
-	if (!mm || IS_ERR(mm))
-		goto out_free;
+		goto out_mm;
 
 	pagemap_walk.pmd_entry = pagemap_pte_range;
 	pagemap_walk.pte_hole = pagemap_pte_hole;
@@ -1273,10 +1266,10 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	src = *ppos;
 	svpfn = src / PM_ENTRY_BYTES;
 	start_vaddr = svpfn << PAGE_SHIFT;
-	end_vaddr = TASK_SIZE_OF(task);
+	end_vaddr = mm->task_size;
 
 	/* watch out for wraparound */
-	if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT)
+	if (svpfn > mm->task_size >> PAGE_SHIFT)
 		start_vaddr = end_vaddr;
 
 	/*
@@ -1303,7 +1296,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 		len = min(count, PM_ENTRY_BYTES * pm.pos);
 		if (copy_to_user(buf, pm.buffer, len)) {
 			ret = -EFAULT;
-			goto out_mm;
+			goto out_free;
 		}
 		copied += len;
 		buf += len;
@@ -1313,24 +1306,38 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!ret || ret == PM_END_OF_BUFFER)
 		ret = copied;
 
-out_mm:
-	mmput(mm);
 out_free:
 	kfree(pm.buffer);
-out_task:
-	put_task_struct(task);
+out_mm:
+	mmput(mm);
 out:
 	return ret;
 }
 
 static int pagemap_open(struct inode *inode, struct file *file)
 {
+	struct mm_struct *mm;
+
 	/* do not disclose physical addresses: attack vector */
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
 			"to stop being page-shift some time soon. See the "
 			"linux/Documentation/vm/pagemap.txt for details.\n");
+
+	mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	if (IS_ERR(mm))
+		return PTR_ERR(mm);
+	file->private_data = mm;
+	return 0;
+}
+
+static int pagemap_release(struct inode *inode, struct file *file)
+{
+	struct mm_struct *mm = file->private_data;
+
+	if (mm)
+		mmdrop(mm);
 	return 0;
 }
 
@@ -1338,6 +1345,7 @@ const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
 	.open		= pagemap_open,
+	.release	= pagemap_release,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 



* [PATCH v4 2/5] pagemap: switch to the new format and do some cleanup
From: Konstantin Khlebnikov @ 2015-07-14 15:37 UTC
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

This patch removes the page-shift bits (scheduled for removal since 3.11) and
completes the migration to the new bit layout. It also cleans up the messy
macros.
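
For reference, a sketch of how userspace could decode the frame field under
the final layout (pme_decode is a hypothetical helper, not from this patch;
the 5-bit swap type matches MAX_SWAPFILES_SHIFT):

#include <stdint.h>

#define PM_PFRAME_MASK	((1ULL << 55) - 1)
#define PM_SWAP		(1ULL << 62)
#define PM_PRESENT	(1ULL << 63)

/* Decode one entry: PFN if present, swap type/offset if swapped. */
void pme_decode(uint64_t pme, uint64_t *pfn,
		unsigned int *swp_type, uint64_t *swp_off)
{
	uint64_t frame = pme & PM_PFRAME_MASK;

	*pfn = (pme & PM_PRESENT) ? frame : 0;
	if (pme & PM_SWAP) {
		*swp_type = frame & 0x1f;	/* bits 0-4  */
		*swp_off  = frame >> 5;		/* bits 5-54 */
	}
}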

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/proc/task_mmu.c    |  150 +++++++++++++++++--------------------------------
 tools/vm/page-types.c |   25 +++-----
 2 files changed, 61 insertions(+), 114 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 270bf7cbc8a5..c05db6acdc35 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -710,23 +710,6 @@ const struct file_operations proc_tid_smaps_operations = {
 	.release	= proc_map_release,
 };
 
-/*
- * We do not want to have constant page-shift bits sitting in
- * pagemap entries and are about to reuse them some time soon.
- *
- * Here's the "migration strategy":
- * 1. when the system boots these bits remain what they are,
- *    but a warning about future change is printed in log;
- * 2. once anyone clears soft-dirty bits via clear_refs file,
- *    these flag is set to denote, that user is aware of the
- *    new API and those page-shift bits change their meaning.
- *    The respective warning is printed in dmesg;
- * 3. In a couple of releases we will remove all the mentions
- *    of page-shift in pagemap entries.
- */
-
-static bool soft_dirty_cleared __read_mostly;
-
 enum clear_refs_types {
 	CLEAR_REFS_ALL = 1,
 	CLEAR_REFS_ANON,
@@ -887,13 +870,6 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 	if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
 		return -EINVAL;
 
-	if (type == CLEAR_REFS_SOFT_DIRTY) {
-		soft_dirty_cleared = true;
-		pr_warn_once("The pagemap bits 55-60 has changed their meaning!"
-			     " See the linux/Documentation/vm/pagemap.txt for "
-			     "details.\n");
-	}
-
 	task = get_proc_task(file_inode(file));
 	if (!task)
 		return -ESRCH;
@@ -961,36 +937,24 @@ typedef struct {
 struct pagemapread {
 	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
 	pagemap_entry_t *buffer;
-	bool v2;
 };
 
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 #define PAGEMAP_WALK_MASK	(PMD_MASK)
 
-#define PM_ENTRY_BYTES      sizeof(pagemap_entry_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-/* in "new" pagemap pshift bits are occupied with more status bits */
-#define PM_STATUS2(v2, x)   (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_FILE             PM_STATUS(1LL)
-#define PM_NOT_PRESENT(v2)  PM_STATUS2(v2, 0)
+#define PM_ENTRY_BYTES		sizeof(pagemap_entry_t)
+#define PM_PFRAME_BITS		55
+#define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
+#define PM_SOFT_DIRTY		BIT_ULL(55)
+#define PM_FILE			BIT_ULL(61)
+#define PM_SWAP			BIT_ULL(62)
+#define PM_PRESENT		BIT_ULL(63)
+
 #define PM_END_OF_BUFFER    1
 
-static inline pagemap_entry_t make_pme(u64 val)
+static inline pagemap_entry_t make_pme(u64 frame, u64 flags)
 {
-	return (pagemap_entry_t) { .pme = val };
+	return (pagemap_entry_t) { .pme = (frame & PM_PFRAME_MASK) | flags };
 }
 
 static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
@@ -1011,7 +975,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 	while (addr < end) {
 		struct vm_area_struct *vma = find_vma(walk->mm, addr);
-		pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
+		pagemap_entry_t pme = make_pme(0, 0);
 		/* End of address space hole, which we mark as non-present. */
 		unsigned long hole_end;
 
@@ -1031,7 +995,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
 
 		/* Addresses in the VMA. */
 		if (vma->vm_flags & VM_SOFTDIRTY)
-			pme.pme |= PM_STATUS2(pm->v2, __PM_SOFT_DIRTY);
+			pme = make_pme(0, PM_SOFT_DIRTY);
 		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
@@ -1042,63 +1006,61 @@ out:
 	return err;
 }
 
-static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
+static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
-	u64 frame, flags;
+	u64 frame = 0, flags = 0;
 	struct page *page = NULL;
-	int flags2 = 0;
 
 	if (pte_present(pte)) {
 		frame = pte_pfn(pte);
-		flags = PM_PRESENT;
+		flags |= PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 	} else if (is_swap_pte(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
-			flags2 |= __PM_SOFT_DIRTY;
+			flags |= PM_SOFT_DIRTY;
 		entry = pte_to_swp_entry(pte);
 		frame = swp_type(entry) |
 			(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
-		flags = PM_SWAP;
+		flags |= PM_SWAP;
 		if (is_migration_entry(entry))
 			page = migration_entry_to_page(entry);
-	} else {
-		if (vma->vm_flags & VM_SOFTDIRTY)
-			flags2 |= __PM_SOFT_DIRTY;
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
-		return;
 	}
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
-	if ((vma->vm_flags & VM_SOFTDIRTY))
-		flags2 |= __PM_SOFT_DIRTY;
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		flags |= PM_SOFT_DIRTY;
 
-	*pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
+	return make_pme(frame, flags);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
+	u64 frame = 0;
+
 	/*
 	 * Currently pmd for thp is always present because thp can not be
 	 * swapped-out, migrated, or HWPOISONed (split in such cases instead.)
 	 * This if-check is just to prepare for future implementation.
 	 */
-	if (pmd_present(pmd))
-		*pme = make_pme(PM_PFRAME(pmd_pfn(pmd) + offset)
-				| PM_STATUS2(pm->v2, pmd_flags2) | PM_PRESENT);
-	else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, pmd_flags2));
+	if (pmd_present(pmd)) {
+		frame = pmd_pfn(pmd) + offset;
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 #else
-static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-		pmd_t pmd, int offset, int pmd_flags2)
+static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
+		pmd_t pmd, int offset, u64 flags)
 {
+	return make_pme(0, 0);
 }
 #endif
 
@@ -1112,12 +1074,10 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	int err = 0;
 
 	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		int pmd_flags2;
+		u64 flags = 0;
 
 		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
-			pmd_flags2 = __PM_SOFT_DIRTY;
-		else
-			pmd_flags2 = 0;
+			flags |= PM_SOFT_DIRTY;
 
 		for (; addr != end; addr += PAGE_SIZE) {
 			unsigned long offset;
@@ -1125,7 +1085,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 			offset = (addr & ~PAGEMAP_WALK_MASK) >>
 					PAGE_SHIFT;
-			thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
+			pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
 				break;
@@ -1145,7 +1105,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		pagemap_entry_t pme;
 
-		pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
+		pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			break;
@@ -1158,16 +1118,17 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-					pte_t pte, int offset, int flags2)
+static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
+					pte_t pte, int offset, u64 flags)
 {
-	if (pte_present(pte))
-		*pme = make_pme(PM_PFRAME(pte_pfn(pte) + offset)	|
-				PM_STATUS2(pm->v2, flags2)		|
-				PM_PRESENT);
-	else
-		*pme = make_pme(PM_NOT_PRESENT(pm->v2)			|
-				PM_STATUS2(pm->v2, flags2));
+	u64 frame = 0;
+
+	if (pte_present(pte)) {
+		frame = pte_pfn(pte) + offset;
+		flags |= PM_PRESENT;
+	}
+
+	return make_pme(frame, flags);
 }
 
 /* This function walks within one hugetlb entry in the single call */
@@ -1178,17 +1139,15 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	int err = 0;
-	int flags2;
+	u64 flags = 0;
 	pagemap_entry_t pme;
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
-		flags2 = __PM_SOFT_DIRTY;
-	else
-		flags2 = 0;
+		flags |= PM_SOFT_DIRTY;
 
 	for (; addr != end; addr += PAGE_SIZE) {
 		int offset = (addr & ~hmask) >> PAGE_SHIFT;
-		huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2);
+		pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			return err;
@@ -1209,7 +1168,8 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
  * Bits 0-54  page frame number (PFN) if present
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
- * Bits 55-60 page shift (page size = 1<<page shift)
+ * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
+ * Bits 56-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
@@ -1248,7 +1208,6 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!count)
 		goto out_mm;
 
-	pm.v2 = soft_dirty_cleared;
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
@@ -1321,9 +1280,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
 	/* do not disclose physical addresses: attack vector */
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
-			"to stop being page-shift some time soon. See the "
-			"linux/Documentation/vm/pagemap.txt for details.\n");
 
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 8bdf16b8ba60..603ec916716b 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -57,23 +57,14 @@
  * pagemap kernel ABI bits
  */
 
-#define PM_ENTRY_BYTES      sizeof(uint64_t)
-#define PM_STATUS_BITS      3
-#define PM_STATUS_OFFSET    (64 - PM_STATUS_BITS)
-#define PM_STATUS_MASK      (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
-#define PM_STATUS(nr)       (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
-#define PM_PSHIFT_BITS      6
-#define PM_PSHIFT_OFFSET    (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
-#define PM_PSHIFT_MASK      (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
-#define __PM_PSHIFT(x)      (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
-#define PM_PFRAME_MASK      ((1LL << PM_PSHIFT_OFFSET) - 1)
-#define PM_PFRAME(x)        ((x) & PM_PFRAME_MASK)
-
-#define __PM_SOFT_DIRTY      (1LL)
-#define PM_PRESENT          PM_STATUS(4LL)
-#define PM_SWAP             PM_STATUS(2LL)
-#define PM_SOFT_DIRTY       __PM_PSHIFT(__PM_SOFT_DIRTY)
-
+#define PM_ENTRY_BYTES		8
+#define PM_PFRAME_BITS		55
+#define PM_PFRAME_MASK		((1LL << PM_PFRAME_BITS) - 1)
+#define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
+#define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_FILE			(1ULL << 61)
+#define PM_SWAP			(1ULL << 62)
+#define PM_PRESENT		(1ULL << 63)
 
 /*
  * kernel page flags



* [PATCH v4 3/5] pagemap: rework hugetlb and thp report
From: Konstantin Khlebnikov @ 2015-07-14 15:37 UTC
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

This patch moves pmd dissection out of the reporting loop: huge pages are
reported as a run of normal pages with contiguous PFNs.

It also adds the missing 'FILE' bit for hugetlb vmas.
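
The userspace-visible effect: all 512 entries covering a 2 MB THP (x86,
4 KiB pages) now carry consecutive PFNs. A hypothetical check, which must
run with CAP_SYS_ADMIN since PFNs are hidden otherwise:

#include <stdint.h>
#include <unistd.h>

/* Return 1 if a 2MB-aligned vaddr is backed by 512 contiguous PFNs. */
int thp_pfns_contiguous(int pagemap_fd, uintptr_t vaddr)
{
	uint64_t pme[512], base;
	int i;

	if (pread(pagemap_fd, pme, sizeof(pme), (vaddr >> 12) * 8)
	    != sizeof(pme))
		return -1;
	base = pme[0] & ((1ULL << 55) - 1);
	for (i = 0; i < 512; i++)
		if ((pme[i] & ((1ULL << 55) - 1)) != base + i)
			return 0;
	return 1;
}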

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/proc/task_mmu.c |  100 +++++++++++++++++++++++-----------------------------
 1 file changed, 44 insertions(+), 56 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c05db6acdc35..040721fa405a 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1038,33 +1038,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 	return make_pme(frame, flags);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
-		pmd_t pmd, int offset, u64 flags)
-{
-	u64 frame = 0;
-
-	/*
-	 * Currently pmd for thp is always present because thp can not be
-	 * swapped-out, migrated, or HWPOISONed (split in such cases instead.)
-	 * This if-check is just to prepare for future implementation.
-	 */
-	if (pmd_present(pmd)) {
-		frame = pmd_pfn(pmd) + offset;
-		flags |= PM_PRESENT;
-	}
-
-	return make_pme(frame, flags);
-}
-#else
-static pagemap_entry_t thp_pmd_to_pagemap_entry(struct pagemapread *pm,
-		pmd_t pmd, int offset, u64 flags)
-{
-	return make_pme(0, 0);
-}
-#endif
-
-static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 			     struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
@@ -1073,35 +1047,48 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte_t *pte, *orig_pte;
 	int err = 0;
 
-	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		u64 flags = 0;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (pmd_trans_huge_lock(pmdp, vma, &ptl) == 1) {
+		u64 flags = 0, frame = 0;
+		pmd_t pmd = *pmdp;
 
-		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
+		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(pmd))
 			flags |= PM_SOFT_DIRTY;
 
+		/*
+		 * Currently pmd for thp is always present because thp
+		 * can not be swapped-out, migrated, or HWPOISONed
+		 * (split in such cases instead.)
+		 * This if-check is just to prepare for future implementation.
+		 */
+		if (pmd_present(pmd)) {
+			flags |= PM_PRESENT;
+			frame = pmd_pfn(pmd) +
+				((addr & ~PMD_MASK) >> PAGE_SHIFT);
+		}
+
 		for (; addr != end; addr += PAGE_SIZE) {
-			unsigned long offset;
-			pagemap_entry_t pme;
+			pagemap_entry_t pme = make_pme(frame, flags);
 
-			offset = (addr & ~PAGEMAP_WALK_MASK) >>
-					PAGE_SHIFT;
-			pme = thp_pmd_to_pagemap_entry(pm, *pmd, offset, flags);
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
 				break;
+			if (flags & PM_PRESENT)
+				frame++;
 		}
 		spin_unlock(ptl);
 		return err;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmdp))
 		return 0;
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 	/*
 	 * We can assume that @vma always points to a valid one and @end never
 	 * goes beyond vma->vm_end.
 	 */
-	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	orig_pte = pte = pte_offset_map_lock(walk->mm, pmdp, addr, &ptl);
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		pagemap_entry_t pme;
 
@@ -1118,39 +1105,40 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static pagemap_entry_t huge_pte_to_pagemap_entry(struct pagemapread *pm,
-					pte_t pte, int offset, u64 flags)
-{
-	u64 frame = 0;
-
-	if (pte_present(pte)) {
-		frame = pte_pfn(pte) + offset;
-		flags |= PM_PRESENT;
-	}
-
-	return make_pme(frame, flags);
-}
-
 /* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
+static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 				 unsigned long addr, unsigned long end,
 				 struct mm_walk *walk)
 {
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
+	u64 flags = 0, frame = 0;
 	int err = 0;
-	u64 flags = 0;
-	pagemap_entry_t pme;
+	pte_t pte;
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
+	pte = huge_ptep_get(ptep);
+	if (pte_present(pte)) {
+		struct page *page = pte_page(pte);
+
+		if (!PageAnon(page))
+			flags |= PM_FILE;
+
+		flags |= PM_PRESENT;
+		frame = pte_pfn(pte) +
+			((addr & ~hmask) >> PAGE_SHIFT);
+	}
+
 	for (; addr != end; addr += PAGE_SIZE) {
-		int offset = (addr & ~hmask) >> PAGE_SHIFT;
-		pme = huge_pte_to_pagemap_entry(pm, *pte, offset, flags);
+		pagemap_entry_t pme = make_pme(frame, flags);
+
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			return err;
+		if (flags & PM_PRESENT)
+			frame++;
 	}
 
 	cond_resched();
@@ -1214,7 +1202,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!pm.buffer)
 		goto out_mm;
 
-	pagemap_walk.pmd_entry = pagemap_pte_range;
+	pagemap_walk.pmd_entry = pagemap_pmd_range;
 	pagemap_walk.pte_hole = pagemap_pte_hole;
 #ifdef CONFIG_HUGETLB_PAGE
 	pagemap_walk.hugetlb_entry = pagemap_hugetlb_range;



* [PATCH v4 4/5] pagemap: hide physical addresses from non-privileged users
From: Konstantin Khlebnikov @ 2015-07-14 15:37 UTC
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

This patch makes pagemap readable for normal users and hides physical
addresses from them. For some use cases the PFN isn't required at all.
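
Note that file_ns_capable() checks the credentials the file was opened with
(file->f_cred), not the reader's current ones, so PFNs stay visible on a
descriptor opened with CAP_SYS_ADMIN even after privileges are dropped.
A hypothetical illustration (run as root; assumes 4 KiB pages):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	uint64_t pme = 0;	/* writing it faults the stack page in */
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0 || setuid(65534))	/* drop to nobody */
		return 1;
	if (pread(fd, &pme, 8, ((uintptr_t)&pme >> 12) * 8) != 8)
		return 1;
	/* PFN is still populated: show_pfn comes from f_cred on each read */
	printf("pfn=%llu\n", (unsigned long long)(pme & ((1ULL << 55) - 1)));
	return 0;
}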

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Link: http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
---
 fs/proc/task_mmu.c |   25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 040721fa405a..3a5d338ea219 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -937,6 +937,7 @@ typedef struct {
 struct pagemapread {
 	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
 	pagemap_entry_t *buffer;
+	bool show_pfn;
 };
 
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
@@ -1013,7 +1014,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 	struct page *page = NULL;
 
 	if (pte_present(pte)) {
-		frame = pte_pfn(pte);
+		if (pm->show_pfn)
+			frame = pte_pfn(pte);
 		flags |= PM_PRESENT;
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
@@ -1063,8 +1065,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		 */
 		if (pmd_present(pmd)) {
 			flags |= PM_PRESENT;
-			frame = pmd_pfn(pmd) +
-				((addr & ~PMD_MASK) >> PAGE_SHIFT);
+			if (pm->show_pfn)
+				frame = pmd_pfn(pmd) +
+					((addr & ~PMD_MASK) >> PAGE_SHIFT);
 		}
 
 		for (; addr != end; addr += PAGE_SIZE) {
@@ -1073,7 +1076,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
 				break;
-			if (flags & PM_PRESENT)
+			if (pm->show_pfn && (flags & PM_PRESENT))
 				frame++;
 		}
 		spin_unlock(ptl);
@@ -1127,8 +1130,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 			flags |= PM_FILE;
 
 		flags |= PM_PRESENT;
-		frame = pte_pfn(pte) +
-			((addr & ~hmask) >> PAGE_SHIFT);
+		if (pm->show_pfn)
+			frame = pte_pfn(pte) +
+				((addr & ~hmask) >> PAGE_SHIFT);
 	}
 
 	for (; addr != end; addr += PAGE_SIZE) {
@@ -1137,7 +1141,7 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		err = add_to_pagemap(addr, &pme, pm);
 		if (err)
 			return err;
-		if (flags & PM_PRESENT)
+		if (pm->show_pfn && (flags & PM_PRESENT))
 			frame++;
 	}
 
@@ -1196,6 +1200,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!count)
 		goto out_mm;
 
+	/* do not disclose physical addresses: attack vector */
+	pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
+
 	pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
 	pm.buffer = kmalloc(pm.len * PM_ENTRY_BYTES, GFP_TEMPORARY);
 	ret = -ENOMEM;
@@ -1265,10 +1272,6 @@ static int pagemap_open(struct inode *inode, struct file *file)
 {
 	struct mm_struct *mm;
 
-	/* do not disclose physical addresses: attack vector */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
 	mm = proc_mem_open(inode, PTRACE_MODE_READ);
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);



* [PATCH v4 5/5] pagemap: add mmap-exclusive bit for marking pages mapped only here
From: Konstantin Khlebnikov @ 2015-07-14 15:37 UTC
  To: linux-mm, Andrew Morton, Naoya Horiguchi
  Cc: Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

This patch sets bit 56 in a pagemap entry if the page is mapped only once.
It allows detecting exclusively used pages without exposing PFNs:

present file exclusive state
0       0    0         non-present
1       1    0         file page mapped somewhere else
1       1    1         file page mapped only here
1       0    0         anon non-CoWed page (shared with parent/child)
1       0    1         anon CoWed page (or never forked)

CoWed pages in (MAP_FILE | MAP_PRIVATE) areas are anon in this context.

The mmap-exclusive bit doesn't reflect potential page sharing via the
swapcache: a page could be mapped once yet have several swap ptes pointing
to it. An application can detect that case by the swap bit in the pagemap
entry and touch the pte via /proc/pid/mem to get the real information.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Requested-by: Mark Williamson <mwilliamson@undo-software.com>
Link: http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com
---
 Documentation/vm/pagemap.txt |    3 ++-
 fs/proc/task_mmu.c           |   14 +++++++++++++-
 tools/vm/page-types.c        |   10 ++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 6bfbc172cdb9..3cfbbb333ea1 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -16,7 +16,8 @@ There are three components to pagemap:
     * Bits 0-4   swap type if swapped
     * Bits 5-54  swap offset if swapped
     * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
-    * Bits 56-60 zero
+    * Bit  56    page exlusively mapped
+    * Bits 57-60 zero
     * Bit  61    page is file-page or shared-anon
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3a5d338ea219..bac4c97f8ff8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -947,6 +947,7 @@ struct pagemapread {
 #define PM_PFRAME_BITS		55
 #define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
 #define PM_SOFT_DIRTY		BIT_ULL(55)
+#define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1034,6 +1035,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
+	if (page && page_mapcount(page) == 1)
+		flags |= PM_MMAP_EXCLUSIVE;
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
@@ -1064,6 +1067,11 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		 * This if-check is just to prepare for future implementation.
 		 */
 		if (pmd_present(pmd)) {
+			struct page *page = pmd_page(pmd);
+
+			if (page_mapcount(page) == 1)
+				flags |= PM_MMAP_EXCLUSIVE;
+
 			flags |= PM_PRESENT;
 			if (pm->show_pfn)
 				frame = pmd_pfn(pmd) +
@@ -1129,6 +1137,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (!PageAnon(page))
 			flags |= PM_FILE;
 
+		if (page_mapcount(page) == 1)
+			flags |= PM_MMAP_EXCLUSIVE;
+
 		flags |= PM_PRESENT;
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
@@ -1161,7 +1172,8 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
  * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
- * Bits 56-60 zero
+ * Bit  56    page exclusively mapped
+ * Bits 57-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 603ec916716b..7f73fa32a590 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -62,6 +62,7 @@
 #define PM_PFRAME_MASK		((1LL << PM_PFRAME_BITS) - 1)
 #define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
 #define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
 #define PM_FILE			(1ULL << 61)
 #define PM_SWAP			(1ULL << 62)
 #define PM_PRESENT		(1ULL << 63)
@@ -91,6 +92,8 @@
 #define KPF_SLOB_FREE		49
 #define KPF_SLUB_FROZEN		50
 #define KPF_SLUB_DEBUG		51
+#define KPF_FILE		62
+#define KPF_MMAP_EXCLUSIVE	63
 
 #define KPF_ALL_BITS		((uint64_t)~0ULL)
 #define KPF_HACKERS_BITS	(0xffffULL << 32)
@@ -140,6 +143,9 @@ static const char * const page_flag_names[] = {
 	[KPF_SLOB_FREE]		= "P:slob_free",
 	[KPF_SLUB_FROZEN]	= "A:slub_frozen",
 	[KPF_SLUB_DEBUG]	= "E:slub_debug",
+
+	[KPF_FILE]		= "F:file",
+	[KPF_MMAP_EXCLUSIVE]	= "1:mmap_exclusive",
 };
 
 
@@ -443,6 +449,10 @@ static uint64_t expand_overloaded_flags(uint64_t flags, uint64_t pme)
 
 	if (pme & PM_SOFT_DIRTY)
 		flags |= BIT(SOFTDIRTY);
+	if (pme & PM_FILE)
+		flags |= BIT(FILE);
+	if (pme & PM_MMAP_EXCLUSIVE)
+		flags |= BIT(MMAP_EXCLUSIVE);
 
 	return flags;
 }
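
A usage sketch for the new bit (illustration only, not part of the series;
the PM_* values follow the patch, and classify() is a made-up helper that
decodes the present/file/exclusive table above):

  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  #define PM_MMAP_EXCLUSIVE	(1ULL << 56)
  #define PM_FILE		(1ULL << 61)
  #define PM_PRESENT		(1ULL << 63)

  /* Decode one pagemap entry according to the present/file/exclusive table. */
  static const char *classify(int fd, uintptr_t vaddr, long psize)
  {
  	uint64_t e;

  	if (pread(fd, &e, 8, (vaddr / psize) * 8) != 8)
  		return "read error";
  	if (!(e & PM_PRESENT))
  		return "non-present";
  	if (e & PM_FILE)
  		return e & PM_MMAP_EXCLUSIVE ? "file page mapped only here"
  					     : "file page mapped somewhere else";
  	return e & PM_MMAP_EXCLUSIVE ? "anon CoWed page (or never forked)"
  				     : "anon non-CoWed page (shared with parent/child)";
  }

  int main(void)
  {
  	static char page[4096];
  	int fd = open("/proc/self/pagemap", O_RDONLY);

  	page[0] = 1;	/* make the page present */
  	if (fd < 0)
  		return 1;
  	puts(classify(fd, (uintptr_t)page, sysconf(_SC_PAGESIZE)));
  	return 0;
  }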


^ permalink raw reply related	[flat|nested] 55+ messages in thread


* Re: [PATCHSET v4 0/5] pagemap: make useable for non-privilege users
  2015-07-14 15:37 ` Konstantin Khlebnikov
  (?)
@ 2015-07-14 18:52   ` Andrew Morton
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2015-07-14 18:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Naoya Horiguchi, Kirill A. Shutemov, Mark Williamson,
	linux-kernel, linux-api

On Tue, 14 Jul 2015 18:37:34 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:

> This patchset makes pagemap useable again in the safe way (after row hammer
> bug it was made CAP_SYS_ADMIN-only). This patchset restores access for
> non-privileged users but hides PFNs from them.

Documentation/vm/pagemap.txt hasn't been updated to describe these
privilege issues?

> Also it adds bit 'map-exlusive' which is set if page is mapped only here:
> it helps in estimation of working set without exposing pfns and allows to
> distinguish CoWed and non-CoWed private anonymous pages.
> 
> Second patch removes page-shift bits and completes migration to the new
> pagemap format: flags soft-dirty and mmap-exlusive are available only
> in the new format.

I'm not really seeing a description of the new format in these
changelogs.  Precisely what got removed, what got added and which
capabilities change the output in what manner?



^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCHSET v4 0/5] pagemap: make useable for non-privilege users
@ 2015-07-14 20:15     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 55+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-14 20:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, linux-mm, Naoya Horiguchi,
	Kirill A. Shutemov, Mark Williamson, Linux Kernel Mailing List,
	Linux API

On Tue, Jul 14, 2015 at 9:52 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 14 Jul 2015 18:37:34 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:
>
>> This patchset makes pagemap useable again in the safe way (after row hammer
>> bug it was made CAP_SYS_ADMIN-only). This patchset restores access for
>> non-privileged users but hides PFNs from them.
>
> Documentation/vm/pagemap.txt hasn't been updated to describe these
> privilege issues?

Will do. Too much time passed between versions; I planned to but then forgot about it.

>
>> Also it adds bit 'map-exlusive' which is set if page is mapped only here:
>> it helps in estimation of working set without exposing pfns and allows to
>> distinguish CoWed and non-CoWed private anonymous pages.
>>
>> Second patch removes page-shift bits and completes migration to the new
>> pagemap format: flags soft-dirty and mmap-exlusive are available only
>> in the new format.
>
> I'm not really seeing a description of the new format in these
> changelogs.  Precisely what got removed, what got added and which
> capabilities change the output in what manner?

Now the pfn field (bits 0-54) is zero if the task that opened pagemap
has no CAP_SYS_ADMIN (system-wide).

In the v2 format the page-shift bits (55-60) are now used for flags:
55 - soft-dirty (added for checkpoint-restore, I guess)
56 - mmap-exclusive (added in the last patch)
57-60 - free for use

I'll document the history of these changes.
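
Restated as C constants for readability (a summary of the above, not a
kernel header):

  /* v2 pagemap entry layout, per the description above */
  #define PM_PFN_MASK		((1ULL << 55) - 1)	/* bits 0-54: pfn, zeroed without CAP_SYS_ADMIN */
  #define PM_SOFT_DIRTY		(1ULL << 55)		/* soft-dirty (checkpoint-restore) */
  #define PM_MMAP_EXCLUSIVE	(1ULL << 56)		/* mapped only here (last patch) */
  							/* bits 57-60: free for use */
  #define PM_FILE		(1ULL << 61)
  #define PM_SWAP		(1ULL << 62)
  #define PM_PRESENT		(1ULL << 63)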


^ permalink raw reply	[flat|nested] 55+ messages in thread


* [PATCH] pagemap: update documentation
  2015-07-14 15:37 ` Konstantin Khlebnikov
  (?)
@ 2015-07-16 18:47   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 55+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-16 18:47 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Naoya Horiguchi; +Cc: linux-api, linux-kernel

Notes about recent changes.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 Documentation/vm/pagemap.txt |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 3cfbbb333ea1..aab39aa7dd8f 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -16,12 +16,17 @@ There are three components to pagemap:
     * Bits 0-4   swap type if swapped
     * Bits 5-54  swap offset if swapped
     * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
-    * Bit  56    page exlusively mapped
+    * Bit  56    page exclusively mapped (since 4.2)
     * Bits 57-60 zero
-    * Bit  61    page is file-page or shared-anon
+    * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
 
+   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs:
+   for unprivileged users from 4.0 till 4.2 open fails with -EPERM, starting
+   from from 4.2 PFN field is zeroed if user has no CAP_SYS_ADMIN capability.
+   Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
+
    If the page is not present but in swap, then the PFN contains an
    encoding of the swap file number and the page's offset into the
    swap. Unmapped pages return a null PFN. This allows determining
@@ -160,3 +165,8 @@ Other notes:
 Reading from any of the files will return -EINVAL if you are not starting
 the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
 into the file), or if the size of the read is not a multiple of 8 bytes.
+
+Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
+always 12 at most architectures). Since Linux 3.11 their meaning changes
+after first clear of soft-dirty bits. Since Linux 4.2 they are used for
+flags unconditionally.
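
The alignment rule in the last hunk can be demonstrated with a trivial
check (illustrative snippet, not part of the patch):

  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
  	char c;
  	int fd = open("/proc/self/pagemap", O_RDONLY);

  	if (fd < 0)
  		return 1;
  	/* offset 3, length 1: neither 8-byte aligned nor a multiple of 8 */
  	if (pread(fd, &c, 1, 3) < 0 && errno == EINVAL)
  		puts("misaligned pagemap read fails with EINVAL");
  	return 0;
  }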


^ permalink raw reply related	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 3/5] pagemap: rework hugetlb and thp report
  2015-07-14 15:37   ` Konstantin Khlebnikov
@ 2015-07-19 11:10     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 55+ messages in thread
From: Kirill A. Shutemov @ 2015-07-19 11:10 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Mark Williamson,
	linux-kernel, linux-api

On Tue, Jul 14, 2015 at 06:37:39PM +0300, Konstantin Khlebnikov wrote:
> @@ -1073,35 +1047,48 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	pte_t *pte, *orig_pte;
>  	int err = 0;
>  
> -	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
> -		u64 flags = 0;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	if (pmd_trans_huge_lock(pmdp, vma, &ptl) == 1) {

The #ifdef is redundant: pmd_trans_huge_lock() always returns 0 for !THP.
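
(For reference, the !THP stub is roughly the following, so the whole
branch is already dead code there; quoted from memory, not verbatim:)

  /* include/linux/huge_mm.h, !CONFIG_TRANSPARENT_HUGEPAGE (roughly) */
  static inline int pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
  				      spinlock_t **ptl)
  {
  	return 0;	/* never 1, so the THP branch is unreachable */
  }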

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 2/5] pagemap: switch to the new format and do some cleanup
  2015-07-14 15:37   ` Konstantin Khlebnikov
@ 2015-07-21  7:44     ` Naoya Horiguchi
  -1 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2015-07-21  7:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Mark Williamson,
	linux-kernel, linux-api

On Tue, Jul 14, 2015 at 06:37:37PM +0300, Konstantin Khlebnikov wrote:
> This patch removes page-shift bits (scheduled to remove since 3.11) and
> completes migration to the new bit layout. Also it cleans messy macro.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 3/5] pagemap: rework hugetlb and thp report
  2015-07-14 15:37   ` Konstantin Khlebnikov
@ 2015-07-21  8:00     ` Naoya Horiguchi
  -1 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2015-07-21  8:00 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Mark Williamson,
	linux-kernel, linux-api

On Tue, Jul 14, 2015 at 06:37:39PM +0300, Konstantin Khlebnikov wrote:
> This patch moves pmd dissection out of reporting loop: huge pages
> are reported as bunch of normal pages with contiguous PFNs.
> 
> Add missing "FILE" bit in hugetlb vmas.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

With Kirill's comment about the #ifdef reflected, I'm OK with this patch.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 1/5] pagemap: check permissions and capabilities at open time
  2015-07-14 15:37   ` Konstantin Khlebnikov
@ 2015-07-21  8:06     ` Naoya Horiguchi
  -1 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2015-07-21  8:06 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Mark Williamson,
	linux-kernel, linux-api

On Tue, Jul 14, 2015 at 06:37:35PM +0300, Konstantin Khlebnikov wrote:
> This patch moves permission checks from pagemap_read() into pagemap_open().
> 
> Pointer to mm is saved in file->private_data. This reference pins only
> mm_struct itself. /proc/*/mem, maps, smaps already work in the same way.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Link: http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 4/5] pagemap: hide physical addresses from non-privileged users
  2015-07-14 15:37   ` Konstantin Khlebnikov
@ 2015-07-21  8:11     ` Naoya Horiguchi
  -1 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2015-07-21  8:11 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Mark Williamson,
	linux-kernel, linux-api

On Tue, Jul 14, 2015 at 06:37:47PM +0300, Konstantin Khlebnikov wrote:
> This patch makes pagemap readable for normal users and hides physical
> addresses from them. For some use-cases PFN isn't required at all.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
> Link: http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
> ---
>  fs/proc/task_mmu.c |   25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 040721fa405a..3a5d338ea219 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -937,6 +937,7 @@ typedef struct {
>  struct pagemapread {
>  	int pos, len;		/* units: PM_ENTRY_BYTES, not bytes */
>  	pagemap_entry_t *buffer;
> +	bool show_pfn;
>  };
>  
>  #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
> @@ -1013,7 +1014,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>  	struct page *page = NULL;
>  
>  	if (pte_present(pte)) {
> -		frame = pte_pfn(pte);
> +		if (pm->show_pfn)
> +			frame = pte_pfn(pte);
>  		flags |= PM_PRESENT;
>  		page = vm_normal_page(vma, addr, pte);
>  		if (pte_soft_dirty(pte))

Don't you need the same if (pm->show_pfn) check in the is_swap_pte path, too?
(although I don't think it can be exploited by a Rowhammer attack ...)

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 5/5] pagemap: add mmap-exclusive bit for marking pages mapped only here
  2015-07-14 15:37   ` Konstantin Khlebnikov
  (?)
@ 2015-07-21  8:17     ` Naoya Horiguchi
  -1 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2015-07-21  8:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Mark Williamson,
	linux-kernel, linux-api

On Tue, Jul 14, 2015 at 06:37:49PM +0300, Konstantin Khlebnikov wrote:
> This patch sets bit 56 in pagemap if the page is mapped only once.
> It allows detecting exclusively used pages without exposing the PFN:
> 
> present file exclusive state
> 0       0    0         non-present
> 1       1    0         file page mapped somewhere else
> 1       1    1         file page mapped only here
> 1       0    0         anon non-CoWed page (shared with parent/child)
> 1       0    1         anon CoWed page (or never forked)
> 
> CoWed pages in (MAP_FILE | MAP_PRIVATE) areas are anon in this context.
> 
> The mmap-exclusive bit doesn't reflect potential page sharing via the
> swapcache: a page could be mapped once yet have several swap ptes pointing
> to it. An application can detect that case by the swap bit in the pagemap
> entry and touch the pte via /proc/pid/mem to get the real information.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Requested-by: Mark Williamson <mwilliamson@undo-software.com>
> Link: http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH] pagemap: update documentation
  2015-07-16 18:47   ` Konstantin Khlebnikov
@ 2015-07-21  8:35     ` Naoya Horiguchi
  -1 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2015-07-21  8:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm, Andrew Morton, linux-api, linux-kernel

On Thu, Jul 16, 2015 at 09:47:42PM +0300, Konstantin Khlebnikov wrote:
> Notes about recent changes.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  Documentation/vm/pagemap.txt |   14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
> index 3cfbbb333ea1..aab39aa7dd8f 100644
> --- a/Documentation/vm/pagemap.txt
> +++ b/Documentation/vm/pagemap.txt
> @@ -16,12 +16,17 @@ There are three components to pagemap:
>      * Bits 0-4   swap type if swapped
>      * Bits 5-54  swap offset if swapped
>      * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
> -    * Bit  56    page exlusively mapped
> +    * Bit  56    page exclusively mapped (since 4.2)
>      * Bits 57-60 zero
> -    * Bit  61    page is file-page or shared-anon
> +    * Bit  61    page is file-page or shared-anon (since 3.5)
>      * Bit  62    page swapped
>      * Bit  63    page present
>  
> +   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs:
> +   for unprivileged users from 4.0 till 4.2 open fails with -EPERM, starting

I'm expecting that this patch will be merged before 4.2 is released, so if that's
right, stating "till 4.2" might be incorrect.

> +   from from 4.2 PFN field is zeroed if user has no CAP_SYS_ADMIN capability.

"from" duplicates ...

Thanks,
Naoya Horiguchi

> +   Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
> +
>     If the page is not present but in swap, then the PFN contains an
>     encoding of the swap file number and the page's offset into the
>     swap. Unmapped pages return a null PFN. This allows determining
> @@ -160,3 +165,8 @@ Other notes:
>  Reading from any of the files will return -EINVAL if you are not starting
>  the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
>  into the file), or if the size of the read is not a multiple of 8 bytes.
> +
> +Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
> +always 12 at most architectures). Since Linux 3.11 their meaning changes
> +after first clear of soft-dirty bits. Since Linux 4.2 they are used for
> +flags unconditionally.
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 4/5] pagemap: hide physical addresses from non-privileged users
  2015-07-21  8:11     ` Naoya Horiguchi
@ 2015-07-21  8:39       ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 55+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-21  8:39 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

On Tue, Jul 21, 2015 at 11:11 AM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> On Tue, Jul 14, 2015 at 06:37:47PM +0300, Konstantin Khlebnikov wrote:
>> This patch makes pagemap readable for normal users and hides physical
>> addresses from them. For some use-cases PFN isn't required at all.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
>> Link: http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
>> ---
>>  fs/proc/task_mmu.c |   25 ++++++++++++++-----------
>>  1 file changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 040721fa405a..3a5d338ea219 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -937,6 +937,7 @@ typedef struct {
>>  struct pagemapread {
>>       int pos, len;           /* units: PM_ENTRY_BYTES, not bytes */
>>       pagemap_entry_t *buffer;
>> +     bool show_pfn;
>>  };
>>
>>  #define PAGEMAP_WALK_SIZE    (PMD_SIZE)
>> @@ -1013,7 +1014,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>>       struct page *page = NULL;
>>
>>       if (pte_present(pte)) {
>> -             frame = pte_pfn(pte);
>> +             if (pm->show_pfn)
>> +                     frame = pte_pfn(pte);
>>               flags |= PM_PRESENT;
>>               page = vm_normal_page(vma, addr, pte);
>>               if (pte_soft_dirty(pte))
>
> Don't you need the same if (pm->show_pfn) check in the is_swap_pte path, too?
> (although I don't think it can be exploited by a Rowhammer attack ...)

Yeah, but I see no reason for that.
Probably except for swap on a ramdrive, but that's too weird =)
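
(If we did gate it, the swap branch would look roughly like this; just a
sketch of the suggestion, not a hunk from this series:)

  	} else if (is_swap_pte(pte)) {
  		swp_entry_t entry = pte_to_swp_entry(pte);

  		if (pm->show_pfn)	/* hide the swap entry too */
  			frame = swp_type(entry) |
  				(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
  		flags |= PM_SWAP;
  	}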


^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v4 3/5] pagemap: rework hugetlb and thp report
  2015-07-21  8:00     ` Naoya Horiguchi
@ 2015-07-21  8:43       ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 55+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-21  8:43 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, Mark Williamson, linux-kernel, linux-api

On Tue, Jul 21, 2015 at 11:00 AM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> On Tue, Jul 14, 2015 at 06:37:39PM +0300, Konstantin Khlebnikov wrote:
>> This patch moves pmd dissection out of the reporting loop: huge pages
>> are reported as a bunch of normal pages with contiguous PFNs.
>>
>> Add missing "FILE" bit in hugetlb vmas.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> With Kirill's comment about the #ifdef reflected, I'm OK with this patch.

That ifdef works mostly as documentation: "all THP magic happens here".
I'd like to keep it, if two redundant lines aren't a big deal.
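
For readers without the patch in front of them, the pattern in question is
roughly the following (a sketch, not the exact hunk; the two "redundant"
lines are the #ifdef/#endif pair, since pmd_trans_huge() compiles to
constant false when THP is off):

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
        /*
         * All THP magic happens here: a huge pmd is reported as
         * PMD_SIZE/PAGE_SIZE entries with contiguous PFNs.
         */
        if (pmd_trans_huge(*pmd)) {
                /* ... fill pagemap entries for the whole pmd ... */
        }
#endif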

>
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCHSET v4 0/5] pagemap: make usable for non-privileged users
  2015-07-14 15:37 ` Konstantin Khlebnikov
  (?)
@ 2015-07-24 17:34   ` Mark Williamson
  -1 siblings, 0 replies; 55+ messages in thread
From: Mark Williamson @ 2015-07-24 17:34 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, Kirill A. Shutemov,
	kernel list, Linux API

Hi Konstantin,

Thank you for the further update - I tested this patchset against our
code and it allows our software to work correctly (with minor userland
changes, as before).

I'll follow up with review messages but there aren't really any
concerns that I can see.

Cheers,
Mark

On Tue, Jul 14, 2015 at 4:37 PM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> This patchset makes pagemap usable again in a safe way (after the rowhammer
> bug it was made CAP_SYS_ADMIN-only). This patchset restores access for
> non-privileged users but hides PFNs from them.
>
> It also adds a 'map-exclusive' bit, which is set if the page is mapped only
> here: it helps in estimating the working set without exposing PFNs, and it
> allows distinguishing CoWed from non-CoWed private anonymous pages.
>
> The second patch removes the page-shift bits and completes the migration to
> the new pagemap format: the soft-dirty and mmap-exclusive flags are
> available only in the new format.
>
> Changes since v3:
> * patches reordered: the cleanup is now in the second patch
> * update pagemap for hugetlb, add the missing 'FILE' bit
> * fix PM_PFRAME_BITS: it's 55, not 54 as in previous versions
>
> ---
>
> Konstantin Khlebnikov (5):
>       pagemap: check permissions and capabilities at open time
>       pagemap: switch to the new format and do some cleanup
>       pagemap: rework hugetlb and thp report
>       pagemap: hide physical addresses from non-privileged users
>       pagemap: add mmap-exclusive bit for marking pages mapped only here
>
>
>  Documentation/vm/pagemap.txt |    3
>  fs/proc/task_mmu.c           |  267 ++++++++++++++++++------------------------
>  tools/vm/page-types.c        |   35 +++---
>  3 files changed, 137 insertions(+), 168 deletions(-)
>
> --
> Konstantin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 1/5] pagemap: check permissions and capabilities at open time
@ 2015-07-24 18:16       ` Mark Williamson
  0 siblings, 0 replies; 55+ messages in thread
From: Mark Williamson @ 2015-07-24 18:16 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, linux-kernel, linux-api

(within the limits of my understanding of the mm code)
Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>

On Tue, Jul 21, 2015 at 9:06 AM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> On Tue, Jul 14, 2015 at 06:37:35PM +0300, Konstantin Khlebnikov wrote:
>> This patch moves permission checks from pagemap_read() into pagemap_open().
>>
>> A pointer to the mm is saved in file->private_data. This reference pins
>> only the mm_struct itself. /proc/*/mem, maps and smaps already work the same way.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Link: http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com
>
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
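
For context, a minimal sketch of the open-time pattern being reviewed,
assuming the proc_mem_open() helper that /proc/*/mem already goes through
(a sketch, not the verbatim hunk):

static int pagemap_open(struct inode *inode, struct file *file)
{
        struct mm_struct *mm;

        /* do the ptrace-style permission check once, at open time */
        mm = proc_mem_open(inode, PTRACE_MODE_READ);
        if (IS_ERR(mm))
                return PTR_ERR(mm);
        /* this reference pins the mm_struct itself, not the address space */
        file->private_data = mm;
        return 0;
}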

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 3/5] pagemap: rework hugetlb and thp report
@ 2015-07-24 18:17         ` Mark Williamson
  0 siblings, 0 replies; 55+ messages in thread
From: Mark Williamson @ 2015-07-24 18:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Naoya Horiguchi, Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, linux-kernel, linux-api

Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>

On Tue, Jul 21, 2015 at 9:43 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Tue, Jul 21, 2015 at 11:00 AM, Naoya Horiguchi
> <n-horiguchi@ah.jp.nec.com> wrote:
>> On Tue, Jul 14, 2015 at 06:37:39PM +0300, Konstantin Khlebnikov wrote:
>>> This patch moves pmd dissection out of the reporting loop: huge pages
>>> are reported as a bunch of normal pages with contiguous PFNs.
>>>
>>> Add missing "FILE" bit in hugetlb vmas.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>
>> With Kirill's comment about the #ifdef reflected, I'm OK with this patch.
>
> That ifdef works mostly as documentation: "all THP magic happens here".
> I'd like to keep it, if two redundant lines aren't a big deal.
>
>>
>> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 4/5] pagemap: hide physical addresses from non-privileged users
  2015-07-21  8:39       ` Konstantin Khlebnikov
@ 2015-07-24 18:18         ` Mark Williamson
  -1 siblings, 0 replies; 55+ messages in thread
From: Mark Williamson @ 2015-07-24 18:18 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Naoya Horiguchi, Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, linux-kernel, linux-api

Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>

On Tue, Jul 21, 2015 at 9:39 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Tue, Jul 21, 2015 at 11:11 AM, Naoya Horiguchi
> <n-horiguchi@ah.jp.nec.com> wrote:
>> On Tue, Jul 14, 2015 at 06:37:47PM +0300, Konstantin Khlebnikov wrote:
>>> This patch makes pagemap readable for normal users and hides physical
>>> addresses from them. For some use cases the PFN isn't required at all.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
>>> Link: http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
>>> ---
>>>  fs/proc/task_mmu.c |   25 ++++++++++++++-----------
>>>  1 file changed, 14 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>> index 040721fa405a..3a5d338ea219 100644
>>> --- a/fs/proc/task_mmu.c
>>> +++ b/fs/proc/task_mmu.c
>>> @@ -937,6 +937,7 @@ typedef struct {
>>>  struct pagemapread {
>>>       int pos, len;           /* units: PM_ENTRY_BYTES, not bytes */
>>>       pagemap_entry_t *buffer;
>>> +     bool show_pfn;
>>>  };
>>>
>>>  #define PAGEMAP_WALK_SIZE    (PMD_SIZE)
>>> @@ -1013,7 +1014,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>>>       struct page *page = NULL;
>>>
>>>       if (pte_present(pte)) {
>>> -             frame = pte_pfn(pte);
>>> +             if (pm->show_pfn)
>>> +                     frame = pte_pfn(pte);
>>>               flags |= PM_PRESENT;
>>>               page = vm_normal_page(vma, addr, pte);
>>>               if (pte_soft_dirty(pte))
>>
>> Don't you need the same if (pm->show_pfn) check in the is_swap_pte path, too?
>> (although I don't think it can be exploited by a rowhammer attack ...)
>
> Yeah, but I see no reason for that.
> Probably except for swap on a ramdrive, but that's too weird =)
>
>>
>> Thanks,
>> Naoya Horiguchi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 5/5] pagemap: add mmap-exclusive bit for marking pages mapped only here
  2015-07-21  8:17     ` Naoya Horiguchi
@ 2015-07-24 18:18       ` Mark Williamson
  -1 siblings, 0 replies; 55+ messages in thread
From: Mark Williamson @ 2015-07-24 18:18 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, linux-kernel, linux-api

Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>

On Tue, Jul 21, 2015 at 9:17 AM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> On Tue, Jul 14, 2015 at 06:37:49PM +0300, Konstantin Khlebnikov wrote:
>> This patch sets bit 56 in pagemap if the page is mapped only once.
>> It allows detecting exclusively used pages without exposing the PFN:
>>
>> present file exclusive state
>> 0       0    0         non-present
>> 1       1    0         file page mapped somewhere else
>> 1       1    1         file page mapped only here
>> 1       0    0         anon non-CoWed page (shared with parent/child)
>> 1       0    1         anon CoWed page (or never forked)
>>
>> CoWed pages in (MAP_FILE | MAP_PRIVATE) areas are anon in this context.
>>
>> The mmap-exclusive bit doesn't reflect potential page sharing via the
>> swapcache: a page could be mapped once but have several swap ptes which
>> point to it. An application can detect that via the swap bit in the
>> pagemap entry and touch the pte through /proc/pid/mem to get the real
>> information.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Requested-by: Mark Williamson <mwilliamson@undo-software.com>
>> Link: http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com
>
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
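
To make the table above concrete, here is a minimal userspace sketch
(page_exclusive() is a hypothetical helper, assuming the bit layout from
this series: bit 63 = present, bit 56 = mmap-exclusive, one 64-bit entry
per page in /proc/<pid>/pagemap):

#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if the page backing vaddr is mapped only here, 0 if shared or
 * not present, -1 on a short read */
static int page_exclusive(int pagemap_fd, uintptr_t vaddr)
{
        uint64_t entry;
        off_t off = (off_t)(vaddr / sysconf(_SC_PAGESIZE)) * sizeof(entry);

        if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
                return -1;
        if (!(entry & (1ULL << 63)))            /* PM_PRESENT */
                return 0;
        return !!(entry & (1ULL << 56));        /* mmap-exclusive */
}

Open the fd with open("/proc/self/pagemap", O_RDONLY); with this series a
non-privileged reader still gets the flag bits, only the PFN field reads
back as zero.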

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 3/5] pagemap: rework hugetlb and thp report
  2015-07-24 18:17         ` Mark Williamson
@ 2015-07-24 18:19           ` Mark Williamson
  -1 siblings, 0 replies; 55+ messages in thread
From: Mark Williamson @ 2015-07-24 18:19 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Naoya Horiguchi, Konstantin Khlebnikov, linux-mm, Andrew Morton,
	Kirill A. Shutemov, linux-kernel, linux-api

(my review of this patch comes with the caveat that the specifics of
hugetlb / thp are a bit outside my experience)

On Fri, Jul 24, 2015 at 7:17 PM, Mark Williamson
<mwilliamson@undo-software.com> wrote:
> Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
>
> On Tue, Jul 21, 2015 at 9:43 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> On Tue, Jul 21, 2015 at 11:00 AM, Naoya Horiguchi
>> <n-horiguchi@ah.jp.nec.com> wrote:
>>> On Tue, Jul 14, 2015 at 06:37:39PM +0300, Konstantin Khlebnikov wrote:
>>>> This patch moves pmd dissection out of the reporting loop: huge pages
>>>> are reported as a bunch of normal pages with contiguous PFNs.
>>>>
>>>> Add missing "FILE" bit in hugetlb vmas.
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>
>>> With Kirill's comment about the #ifdef reflected, I'm OK with this patch.
>>
>> That ifdef works mostly as documentation: "all THP magic happens here".
>> I'd like to keep it, if two redundant lines aren't a big deal.
>>
>>>
>>> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2015-07-24 18:19 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-14 15:37 [PATCHSET v4 0/5] pagemap: make usable for non-privileged users Konstantin Khlebnikov
2015-07-14 15:37 ` [PATCH v4 1/5] pagemap: check permissions and capabilities at open time Konstantin Khlebnikov
2015-07-21  8:06   ` Naoya Horiguchi
2015-07-24 18:16     ` Mark Williamson
2015-07-14 15:37 ` [PATCH v4 2/5] pagemap: switch to the new format and do some cleanup Konstantin Khlebnikov
2015-07-21  7:44   ` Naoya Horiguchi
2015-07-14 15:37 ` [PATCH v4 3/5] pagemap: rework hugetlb and thp report Konstantin Khlebnikov
2015-07-19 11:10   ` Kirill A. Shutemov
2015-07-21  8:00   ` Naoya Horiguchi
2015-07-21  8:43     ` Konstantin Khlebnikov
2015-07-24 18:17       ` Mark Williamson
2015-07-24 18:19         ` Mark Williamson
2015-07-14 15:37 ` [PATCH v4 4/5] pagemap: hide physical addresses from non-privileged users Konstantin Khlebnikov
2015-07-21  8:11   ` Naoya Horiguchi
2015-07-21  8:39     ` Konstantin Khlebnikov
2015-07-24 18:18       ` Mark Williamson
2015-07-14 15:37 ` [PATCH v4 5/5] pagemap: add mmap-exclusive bit for marking pages mapped only here Konstantin Khlebnikov
2015-07-21  8:17   ` Naoya Horiguchi
2015-07-24 18:18     ` Mark Williamson
2015-07-14 18:52 ` [PATCHSET v4 0/5] pagemap: make usable for non-privileged users Andrew Morton
2015-07-14 20:15   ` Konstantin Khlebnikov
2015-07-16 18:47 ` [PATCH] pagemap: update documentation Konstantin Khlebnikov
2015-07-21  8:35   ` Naoya Horiguchi
2015-07-24 17:34 ` [PATCHSET v4 0/5] pagemap: make usable for non-privileged users Mark Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.