linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jann Horn <jannh@google.com>, Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
	Jiri Kosina <jikos@kernel.org>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Dave Chinner <david@fromorbit.com>,
	Kevin Easton <kevin@guarana.org>,
	Matthew Wilcox <willy@infradead.org>,
	Cyril Hrubis <chrubis@suse.cz>, Tejun Heo <tj@kernel.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Daniel Gruss <daniel@gruss.cc>, Jiri Kosina <jkosina@suse.cz>,
	Josh Snyder <joshs@netflix.com>, Michal Hocko <mhocko@suse.com>
Subject: [PATCH v2 2/2] mm/mincore: provide mapped status when cached status is not allowed
Date: Tue, 12 Mar 2019 15:17:08 +0100	[thread overview]
Message-ID: <20190312141708.6652-3-vbabka@suse.cz> (raw)
In-Reply-To: <20190312141708.6652-1-vbabka@suse.cz>

After "mm/mincore: make mincore() more conservative" we sometimes restrict the
information about page cache residency, which needs to be done without breaking
existing userspace, as much as possible. Instead of returning with error, we
thus fake the results. For that we return residency values as 1, which should
be safer than faking them as 0, as there might theoretically exist code that
would try to fault in the page(s) in a loop until mincore() returns 1 for them.

Faking 1 however means that such code would not fault in a page even if it was
not truly in page cache, with possibly unwanted performance implications. We
can improve the situation by revisting the approach of 574823bfab82 ("Change
mincore() to count "mapped" pages rather than "cached" pages"), later reverted
by 30bac164aca7 and replaced by restricting/faking the results. In this patch
we apply the approach only to cases where page cache residency check is
restricted. Thus mincore() will return 0 for an unmapped page (which may or may
not be resident in a pagecache), and 1 after the process faults it in.

One potential downside is that mincore() users will be now able to recognize
when a previously mapped page was reclaimed. While that might be useful for
some attack scenarios, it is not as crucial as recognizing that somebody else
faulted the page in, which is the main reason we are making mincore() more
conservative. For detecting that pages being reclaimed, there are also other
existing ways anyway.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/mincore.c | 67 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 16 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index c3f058bd0faf..c9a265abc631 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -21,12 +21,23 @@
 #include <linux/uaccess.h>
 #include <asm/pgtable.h>
 
+/*
+ * mincore() page walk's private structure. Contains pointer to the array
+ * of return values to be set, and whether the current vma passed the
+ * can_do_mincore() check.
+ */
+struct mincore_walk_private {
+	unsigned char *vec;
+	bool can_check_pagecache;
+};
+
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	unsigned char present;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 
 	/*
 	 * Hugepages under user process are always in RAM and never
@@ -35,7 +46,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 	present = pte && !huge_pte_none(huge_ptep_get(pte));
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
-	walk->private = vec;
+	walk_private->vec = vec;
 #else
 	BUG();
 #endif
@@ -85,7 +96,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 }
 
 static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
-				struct vm_area_struct *vma, unsigned char *vec)
+				struct vm_area_struct *vma, unsigned char *vec,
+				bool can_check_pagecache)
 {
 	unsigned long nr = (end - addr) >> PAGE_SHIFT;
 	int i;
@@ -95,7 +107,14 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
 		pgoff = linear_page_index(vma, addr);
 		for (i = 0; i < nr; i++, pgoff++)
-			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+			/*
+			 * Return page cache residency state if we are allowed
+			 * to, otherwise return mapping state, which is 0 for
+			 * an unmapped range.
+			 */
+			vec[i] = can_check_pagecache ?
+				 mincore_page(vma->vm_file->f_mapping, pgoff)
+				 : 0;
 	} else {
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
@@ -106,8 +125,11 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 static int mincore_unmapped_range(unsigned long addr, unsigned long end,
 				   struct mm_walk *walk)
 {
-	walk->private += __mincore_unmapped_range(addr, end,
-						  walk->vma, walk->private);
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
+
+	walk_private->vec += __mincore_unmapped_range(addr, end, walk->vma,
+				vec, walk_private->can_check_pagecache);
 	return 0;
 }
 
@@ -117,7 +139,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	spinlock_t *ptl;
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *ptep;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 	int nr = (end - addr) >> PAGE_SHIFT;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
@@ -128,7 +151,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	}
 
 	if (pmd_trans_unstable(pmd)) {
-		__mincore_unmapped_range(addr, end, vma, vec);
+		__mincore_unmapped_range(addr, end, vma, vec,
+					walk_private->can_check_pagecache);
 		goto out;
 	}
 
@@ -138,7 +162,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 		if (pte_none(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
-						 vma, vec);
+				 vma, vec, walk_private->can_check_pagecache);
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
@@ -152,8 +176,20 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 				*vec = 1;
 			} else {
 #ifdef CONFIG_SWAP
-				*vec = mincore_page(swap_address_space(entry),
+				/*
+				 * If tmpfs pages are being swapped out, treat
+				 * it with same restrictions on mincore() as
+				 * the page cache so we don't expose that
+				 * somebody else brought them back from swap.
+				 * In the restricted case return 0 as swap
+				 * entry means the page is not mapped.
+				 */
+				if (walk_private->can_check_pagecache)
+					*vec = mincore_page(
+						    swap_address_space(entry),
 						    swp_offset(entry));
+				else
+					*vec = 0;
 #else
 				WARN_ON(1);
 				*vec = 1;
@@ -195,22 +231,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	struct vm_area_struct *vma;
 	unsigned long end;
 	int err;
+	struct mincore_walk_private walk_private = {
+		.vec = vec
+	};
 	struct mm_walk mincore_walk = {
 		.pmd_entry = mincore_pte_range,
 		.pte_hole = mincore_unmapped_range,
 		.hugetlb_entry = mincore_hugetlb,
-		.private = vec,
+		.private = &walk_private
 	};
 
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
-	if (!can_do_mincore(vma)) {
-		unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
-		memset(vec, 1, pages);
-		return pages;
-	}
+	walk_private.can_check_pagecache = can_do_mincore(vma);
 	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
-- 
2.20.1


      parent reply	other threads:[~2019-03-12 14:17 UTC|newest]

Thread overview: 215+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-05 17:27 [PATCH] mm/mincore: allow for making sys_mincore() privileged Jiri Kosina
2019-01-05 17:27 ` Jiri Kosina
2019-01-05 19:14 ` Vlastimil Babka
2019-01-05 19:24   ` Jiri Kosina
2019-01-05 19:24     ` Jiri Kosina
2019-01-05 19:38     ` Vlastimil Babka
2019-01-08  9:14       ` Bernd Petrovitsch
2019-01-08 11:37         ` Jiri Kosina
2019-01-08 11:37           ` Jiri Kosina
2019-01-08 13:53           ` Bernd Petrovitsch
2019-01-08 14:08             ` Kirill A. Shutemov
2019-01-05 19:44 ` kbuild test robot
2019-01-05 19:46 ` Linus Torvalds
2019-01-05 19:46   ` Linus Torvalds
2019-01-05 20:12   ` Jiri Kosina
2019-01-05 20:12     ` Jiri Kosina
2019-01-05 20:17     ` Linus Torvalds
2019-01-05 20:17       ` Linus Torvalds
2019-01-05 20:43       ` Jiri Kosina
2019-01-05 20:43         ` Jiri Kosina
2019-01-05 21:54         ` Linus Torvalds
2019-01-05 21:54           ` Linus Torvalds
2019-01-06 11:33           ` Kevin Easton
2019-01-08  8:50           ` Kevin Easton
2019-01-18 14:23           ` Tejun Heo
2019-01-05 20:13   ` Linus Torvalds
2019-01-05 20:13     ` Linus Torvalds
2019-01-05 19:56 ` kbuild test robot
2019-01-05 22:54 ` Jann Horn
2019-01-05 22:54   ` Jann Horn
2019-01-05 23:05   ` Linus Torvalds
2019-01-05 23:05     ` Linus Torvalds
2019-01-05 23:16     ` Linus Torvalds
2019-01-05 23:16       ` Linus Torvalds
2019-01-05 23:28       ` Linus Torvalds
2019-01-05 23:28         ` Linus Torvalds
2019-01-05 23:39       ` Linus Torvalds
2019-01-05 23:39         ` Linus Torvalds
2019-01-06  0:11         ` Matthew Wilcox
2019-01-06  0:22           ` Linus Torvalds
2019-01-06  0:22             ` Linus Torvalds
2019-01-06  1:50             ` Linus Torvalds
2019-01-06  1:50               ` Linus Torvalds
2019-01-06 21:46               ` Linus Torvalds
2019-01-06 21:46                 ` Linus Torvalds
2019-01-08  4:43                 ` Dave Chinner
2019-01-08 17:57                   ` Linus Torvalds
2019-01-08 17:57                     ` Linus Torvalds
2019-01-09  2:24                     ` Dave Chinner
2019-01-09  2:31                       ` Jiri Kosina
2019-01-09  2:31                         ` Jiri Kosina
2019-01-09  4:39                         ` Dave Chinner
2019-01-09 10:08                           ` Jiri Kosina
2019-01-09 10:08                             ` Jiri Kosina
2019-01-10  1:15                             ` Dave Chinner
2019-01-10  7:54                               ` Jiri Kosina
2019-01-10  7:54                                 ` Jiri Kosina
2019-01-09 18:25                           ` Linus Torvalds
2019-01-09 18:25                             ` Linus Torvalds
2019-01-10  0:44                             ` Dave Chinner
2019-01-10  1:18                               ` Linus Torvalds
2019-01-10  1:18                                 ` Linus Torvalds
2019-01-10  5:26                                 ` Andy Lutomirski
2019-01-10  5:26                                   ` Andy Lutomirski
2019-01-10 14:47                                   ` Matthew Wilcox
2019-01-10 21:44                                     ` Dave Chinner
2019-01-10 21:59                                       ` Linus Torvalds
2019-01-10 21:59                                         ` Linus Torvalds
2019-01-11  1:47                                   ` Dave Chinner
2019-01-10  7:03                                 ` Dave Chinner
2019-01-10 11:47                                   ` Linus Torvalds
2019-01-10 11:47                                     ` Linus Torvalds
2019-01-10 12:24                                     ` Dominique Martinet
2019-01-10 12:24                                       ` Dominique Martinet
2019-01-10 22:11                                       ` Linus Torvalds
2019-01-10 22:11                                         ` Linus Torvalds
2019-01-11  2:03                                         ` Dave Chinner
2019-01-11  2:18                                           ` Linus Torvalds
2019-01-11  2:18                                             ` Linus Torvalds
2019-01-11  4:04                                             ` Dave Chinner
2019-01-11  4:08                                               ` Andy Lutomirski
2019-01-11  7:20                                                 ` Dave Chinner
2019-01-11  7:08                                               ` Linus Torvalds
2019-01-11  7:08                                                 ` Linus Torvalds
2019-01-11  7:36                                                 ` Dave Chinner
2019-01-11 16:26                                                   ` Linus Torvalds
2019-01-11 16:26                                                     ` Linus Torvalds
2019-01-15 23:45                                                     ` Dave Chinner
2019-01-16  4:54                                                       ` Linus Torvalds
2019-01-16  4:54                                                         ` Linus Torvalds
2019-01-16  5:49                                                         ` Linus Torvalds
2019-01-16  5:49                                                           ` Linus Torvalds
2019-01-17  1:26                                                         ` Dave Chinner
2019-02-20 15:49                                                     ` Nicolai Stange
2019-01-11  4:57                                         ` Dominique Martinet
2019-01-11  7:11                                           ` Linus Torvalds
2019-01-11  7:11                                             ` Linus Torvalds
2019-01-11  7:32                                             ` Dominique Martinet
2019-01-16  0:42                                         ` Josh Snyder
2019-01-16  5:00                                           ` Linus Torvalds
2019-01-16  5:00                                             ` Linus Torvalds
2019-01-16  5:25                                             ` Andy Lutomirski
2019-01-16  5:34                                               ` Linus Torvalds
2019-01-16  5:34                                                 ` Linus Torvalds
2019-01-16  5:46                                                 ` Dominique Martinet
2019-01-16  5:58                                                   ` Linus Torvalds
2019-01-16  5:58                                                     ` Linus Torvalds
2019-01-16  6:34                                                     ` Dominique Martinet
2019-01-16  7:52                                                       ` Josh Snyder
2019-01-16  7:52                                                         ` Josh Snyder
2019-01-16 12:18                                                         ` Kevin Easton
2019-01-17 21:45                                                         ` Vlastimil Babka
2019-01-18  4:49                                                           ` Linus Torvalds
2019-01-18  4:49                                                             ` Linus Torvalds
2019-01-18 18:58                                                             ` Vlastimil Babka
2019-01-16 16:12                                                     ` Jiri Kosina
2019-01-16 16:12                                                       ` Jiri Kosina
2019-01-16 17:48                                                       ` Linus Torvalds
2019-01-16 17:48                                                         ` Linus Torvalds
2019-01-16 20:23                                                         ` Jiri Kosina
2019-01-16 20:23                                                           ` Jiri Kosina
2019-01-16 21:37                                                           ` Matthew Wilcox
2019-01-16 21:41                                                             ` Jiri Kosina
2019-01-16 21:41                                                               ` Jiri Kosina
2019-01-17  9:52                                                               ` Cyril Hrubis
2019-01-28 13:49                                                               ` Cyril Hrubis
2019-01-17  4:51                                                             ` Linus Torvalds
2019-01-17  4:51                                                               ` Linus Torvalds
2019-01-18  4:54                                                               ` Linus Torvalds
2019-01-18  4:54                                                                 ` Linus Torvalds
2019-01-17  1:49                                                           ` Dominique Martinet
2019-01-23 20:27                                                           ` Linus Torvalds
2019-01-23 20:27                                                             ` Linus Torvalds
2019-01-23 20:35                                                             ` Linus Torvalds
2019-01-23 20:35                                                               ` Linus Torvalds
2019-01-23 23:12                                                               ` Jiri Kosina
2019-01-23 23:12                                                                 ` Jiri Kosina
2019-01-24  0:20                                                                 ` Linus Torvalds
2019-01-24  0:20                                                                   ` Linus Torvalds
2019-01-24  0:24                                                             ` Dominique Martinet
2019-01-24 12:45                                                               ` Dominique Martinet
2019-01-24 14:25                                                                 ` Jiri Kosina
2019-01-24 14:25                                                                   ` Jiri Kosina
2019-01-27 22:35                                                                   ` Jiri Kosina
2019-01-27 22:35                                                                     ` Jiri Kosina
2019-01-28  0:05                                                                     ` Dominique Martinet
2019-01-29 23:52                                                                       ` Jiri Kosina
2019-01-30  9:09                                                                         ` Michal Hocko
2019-01-30 12:29                                                                           ` Jiri Kosina
2019-01-16 12:36                                             ` Matthew Wilcox
2019-01-10 14:50                               ` Matthew Wilcox
2019-01-11  7:36                               ` Jiri Kosina
2019-01-11  7:36                                 ` Jiri Kosina
2019-01-17  2:22                                 ` Dave Chinner
2019-01-17  8:18                                   ` Jiri Kosina
2019-01-17  8:18                                     ` Jiri Kosina
2019-01-17 21:06                                     ` Dave Chinner
2019-01-07  4:32             ` Dominique Martinet
2019-01-07 10:33               ` Vlastimil Babka
2019-01-07 11:08                 ` Dominique Martinet
2019-01-07 11:59                   ` Vlastimil Babka
2019-01-07 13:29                   ` Daniel Gruss
2019-01-07 10:10         ` Michael Ellerman
2019-01-05 23:09   ` Jiri Kosina
2019-01-05 23:09     ` Jiri Kosina
2019-01-30 12:44 ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Vlastimil Babka
2019-01-30 12:44   ` [PATCH 1/3] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-01-31  9:43     ` Michal Hocko
2019-01-31  9:51       ` Dominique Martinet
2019-01-31 17:46       ` Josh Snyder
2019-02-01  8:56     ` Vlastimil Babka
2019-03-06 23:13     ` Andrew Morton
2019-03-07  0:01       ` Jiri Kosina
2019-03-07  0:40         ` Dominique Martinet
2019-03-07  5:46           ` Jiri Kosina
2019-01-30 12:44   ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Vlastimil Babka
2019-01-30 15:04     ` Florian Weimer
2019-01-30 15:15       ` Jiri Kosina
2019-01-31 10:47         ` Florian Weimer
2019-01-31 11:34           ` Jiri Kosina
2019-01-31  9:56     ` Michal Hocko
2019-01-31 10:15       ` Jiri Kosina
2019-01-31 10:23         ` Michal Hocko
2019-01-31 10:30           ` Jiri Kosina
2019-01-31 11:32             ` Michal Hocko
2019-01-31 17:54           ` Linus Torvalds
2019-02-01  5:13             ` Dave Chinner
2019-02-01  7:05               ` Linus Torvalds
2019-02-01  7:21                 ` Linus Torvalds
2019-02-01  1:44       ` Dave Chinner
2019-02-12 15:48         ` Jiri Kosina
2019-01-31 12:04     ` Daniel Gruss
2019-01-31 12:06       ` Vlastimil Babka
2019-01-31 12:08       ` Jiri Kosina
2019-01-31 12:57         ` Daniel Gruss
2019-01-30 12:44   ` [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka
2019-01-31 10:09     ` Michal Hocko
2019-02-01  9:04       ` Vlastimil Babka
2019-02-01  9:11         ` Michal Hocko
2019-02-01  9:27           ` Vlastimil Babka
2019-02-06 20:14             ` Jiri Kosina
2019-02-12  3:44         ` Jiri Kosina
2019-02-12  6:36           ` Michal Hocko
2019-02-12 13:09             ` Jiri Kosina
2019-02-12 14:01               ` Michal Hocko
2019-03-06 12:11   ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Jiri Kosina
2019-03-06 22:35     ` Andrew Morton
2019-03-06 22:48       ` Jiri Kosina
2019-03-06 23:23         ` Andrew Morton
2019-03-06 23:32           ` Dominique Martinet
2019-03-06 23:38             ` Andrew Morton
2019-03-09 16:53               ` Linus Torvalds
2019-03-12 14:17   ` [PATCH v2 0/2] prevent mincore() page cache leaks Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 1/2] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-03-12 14:17     ` Vlastimil Babka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190312141708.6652-3-vbabka@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=asmadeus@codewreck.org \
    --cc=chrubis@suse.cz \
    --cc=daniel@gruss.cc \
    --cc=david@fromorbit.com \
    --cc=jannh@google.com \
    --cc=jikos@kernel.org \
    --cc=jkosina@suse.cz \
    --cc=joshs@netflix.com \
    --cc=kevin@guarana.org \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).