linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] special_mapping_fault() is broken
@ 2015-07-10 16:51 Oleg Nesterov
  2015-07-10 16:51 ` [PATCH v2 1/3] mm: introduce vma_is_anonymous(vma) helper Oleg Nesterov
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-07-10 16:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Hugh Dickins, Kirill Shutemov, Linus Torvalds,
	Pavel Emelyanov, linux-kernel

Hello,

special_mapping_fault() is absolutely broken. It seems it was always
wrong, but this didn't matter until vdso/vvar started to use more than
one page.

The patches are the same, just 1/3 was re-diffed on top of the recent
6b7339f4c31ad "mm: avoid setting up anonymous pages into file mapping"
from Kirill.

And after this change vma_is_anonymous() becomes really trivial, it
simply checks vm_ops == NULL. However, I do think the helper makes
sense. There are a lot of ->vm_ops != NULL checks, the helper makes
the caller's code more understandable (self-documented) and this is
more grep-friendly.

Oleg.

 include/linux/mm.h |    5 +++++
 mm/memory.c        |    8 ++++----
 mm/mmap.c          |   14 +++-----------
 3 files changed, 12 insertions(+), 15 deletions(-)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] mm: introduce vma_is_anonymous(vma) helper
  2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
@ 2015-07-10 16:51 ` Oleg Nesterov
  2015-07-10 16:51 ` [PATCH v2 2/3] mmap: fix the usage of ->vm_pgoff in special_mapping paths Oleg Nesterov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-07-10 16:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Hugh Dickins, Kirill Shutemov, Linus Torvalds,
	Pavel Emelyanov, linux-kernel

Preparation. Add the new simple helper, vma_is_anonymous(vma), and
change handle_pte_fault() to use it. It will have more users.

The name is not accurate, say a hpet_mmap()'ed vma is not anonymous.
Perhaps it should be named vma_has_fault() instead. But it matches
the logic in mmap.c/memory.c (see next changes). "True" just means
that a page fault will use do_anonymous_page().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/mm.h |    5 +++++
 mm/memory.c        |    8 ++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0755b9f..0207ffa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1225,6 +1225,11 @@ static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr)
 	return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
 }
 
+static inline bool vma_is_anonymous(struct vm_area_struct *vma)
+{
+	return !vma->vm_ops;
+}
+
 static inline int stack_guard_page_start(struct vm_area_struct *vma,
 					     unsigned long addr)
 {
diff --git a/mm/memory.c b/mm/memory.c
index 2a9e098..87c9285 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3249,12 +3249,12 @@ static int handle_pte_fault(struct mm_struct *mm,
 	barrier();
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
-			if (vma->vm_ops)
+			if (vma_is_anonymous(vma))
+				return do_anonymous_page(mm, vma, address,
+							 pte, pmd, flags);
+			else
 				return do_fault(mm, vma, address, pte, pmd,
 						flags, entry);
-
-			return do_anonymous_page(mm, vma, address, pte, pmd,
-					flags);
 		}
 		return do_swap_page(mm, vma, address,
 					pte, pmd, flags, entry);
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/3] mmap: fix the usage of ->vm_pgoff in special_mapping paths
  2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
  2015-07-10 16:51 ` [PATCH v2 1/3] mm: introduce vma_is_anonymous(vma) helper Oleg Nesterov
@ 2015-07-10 16:51 ` Oleg Nesterov
  2015-07-10 16:51 ` [PATCH v2 3/3] mremap: fix the wrong !vma->vm_file check in copy_vma() Oleg Nesterov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-07-10 16:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Hugh Dickins, Kirill Shutemov, Linus Torvalds,
	Pavel Emelyanov, linux-kernel

Test-case:

	#include <stdio.h>
	#include <unistd.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <assert.h>

	void *find_vdso_vaddr(void)
	{
		FILE *perl;
		char buf[32] = {};

		perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;"
				"/^(.*?)-.*vdso/ && print hex $1 while <>'", "r");
		fread(buf, sizeof(buf), 1, perl);
		fclose(perl);

		return (void *)atol(buf);
	}

	#define PAGE_SIZE	4096

	int main(void)
	{
		void *vdso = find_vdso_vaddr();
		assert(vdso);

		// of course they should differ, and they do so far
		printf("vdso pages differ: %d\n",
			!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));

		// split into 2 vma's
		assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0);

		// force another fault on the next check
		assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);

		// now they no longer differ, the 2nd vm_pgoff is wrong
		printf("vdso pages differ: %d\n",
			!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));

		return 0;
	}

Output:

	vdso pages differ: 1
	vdso pages differ: 0

This is because split_vma() correctly updates ->vm_pgoff, but the logic
in insert_vm_struct() and special_mapping_fault() is absolutely broken,
so the fault at vdso + PAGE_SIZE return the 1st page. The same happens
if you simply unmap the 1st page.

special_mapping_fault() does:

	pgoff = vmf->pgoff - vma->vm_pgoff;

and this is _only_ correct if vma->vm_start mmaps the first page from
->vm_private_data array.

vdso or any other user of install_special_mapping() is not anonymous,
it has the "backing storage" even if it is just the array of pages.
So we actually need to make vm_pgoff work as an offset in this array.

Note: this also allows to fix another problem: currently gdb can't access
"[vvar]" memory because in this case special_mapping_fault() doesn't work.
Now that we can use ->vm_pgoff we can implement ->access() and fix this.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 mm/mmap.c |   12 ++----------
 1 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index bb50cac..992417f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2871,7 +2871,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 	 * using the existing file pgoff checks and manipulations.
 	 * Similarly in do_mmap_pgoff and in do_brk.
 	 */
-	if (!vma->vm_file) {
+	if (vma_is_anonymous(vma)) {
 		BUG_ON(vma->anon_vma);
 		vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
 	}
@@ -3013,21 +3013,13 @@ static int special_mapping_fault(struct vm_area_struct *vma,
 	pgoff_t pgoff;
 	struct page **pages;
 
-	/*
-	 * special mappings have no vm_file, and in that case, the mm
-	 * uses vm_pgoff internally. So we have to subtract it from here.
-	 * We are allowed to do this because we are the mm; do not copy
-	 * this code into drivers!
-	 */
-	pgoff = vmf->pgoff - vma->vm_pgoff;
-
 	if (vma->vm_ops == &legacy_special_mapping_vmops)
 		pages = vma->vm_private_data;
 	else
 		pages = ((struct vm_special_mapping *)vma->vm_private_data)->
 			pages;
 
-	for (; pgoff && *pages; ++pages)
+	for (pgoff = vmf->pgoff; pgoff && *pages; ++pages)
 		pgoff--;
 
 	if (*pages) {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/3] mremap: fix the wrong !vma->vm_file check in copy_vma()
  2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
  2015-07-10 16:51 ` [PATCH v2 1/3] mm: introduce vma_is_anonymous(vma) helper Oleg Nesterov
  2015-07-10 16:51 ` [PATCH v2 2/3] mmap: fix the usage of ->vm_pgoff in special_mapping paths Oleg Nesterov
@ 2015-07-10 16:51 ` Oleg Nesterov
  2015-07-10 17:08 ` [PATCH v2 0/3] special_mapping_fault() is broken Kirill A. Shutemov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-07-10 16:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Hugh Dickins, Kirill Shutemov, Linus Torvalds,
	Pavel Emelyanov, linux-kernel

Test-case:

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <unistd.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <assert.h>

	void *find_vdso_vaddr(void)
	{
		FILE *perl;
		char buf[32] = {};

		perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;"
				"/^(.*?)-.*vdso/ && print hex $1 while <>'", "r");
		fread(buf, sizeof(buf), 1, perl);
		fclose(perl);

		return (void *)atol(buf);
	}

	#define PAGE_SIZE	4096

	void *get_unmapped_area(void)
	{
		void *p = mmap(0, PAGE_SIZE, PROT_NONE,
				MAP_PRIVATE|MAP_ANONYMOUS, -1,0);
		assert(p != MAP_FAILED);
		munmap(p, PAGE_SIZE);
		return p;
	}

	char save[2][PAGE_SIZE];

	int main(void)
	{
		void *vdso = find_vdso_vaddr();
		void *page[2];

		assert(vdso);
		memcpy(save, vdso, sizeof (save));
		// force another fault on the next check
		assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);

		page[0] = mremap(vdso,
				PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE,
				get_unmapped_area());
		page[1] = mremap(vdso + PAGE_SIZE,
				PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE,
				get_unmapped_area());

		assert(page[0] != MAP_FAILED && page[1] != MAP_FAILED);
		printf("match: %d %d\n",
			!memcmp(save[0], page[0], PAGE_SIZE),
			!memcmp(save[1], page[1], PAGE_SIZE));

		return 0;
	}

fails without this patch. Before the previous commit it gets the wrong
page, now it segfaults (which is imho better).

This is because copy_vma() wrongly assumes that if vma->vm_file == NULL
is irrelevant until the first fault which will use do_anonymous_page().
This is obviously wrong for the special mapping.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 mm/mmap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 992417f..2185cd9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2905,7 +2905,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	 * If anonymous vma has not yet been faulted, update new pgoff
 	 * to match new location, to increase its chance of merging.
 	 */
-	if (unlikely(!vma->vm_file && !vma->anon_vma)) {
+	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) {
 		pgoff = addr >> PAGE_SHIFT;
 		faulted_in_anon_vma = false;
 	}
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] special_mapping_fault() is broken
  2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
                   ` (2 preceding siblings ...)
  2015-07-10 16:51 ` [PATCH v2 3/3] mremap: fix the wrong !vma->vm_file check in copy_vma() Oleg Nesterov
@ 2015-07-10 17:08 ` Kirill A. Shutemov
  2015-07-10 18:20 ` Davidlohr Bueso
  2015-07-10 21:52 ` Andrew Morton
  5 siblings, 0 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2015-07-10 17:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Andy Lutomirski, Hugh Dickins, Kirill Shutemov,
	Linus Torvalds, Pavel Emelyanov, linux-kernel

On Fri, Jul 10, 2015 at 06:51:21PM +0200, Oleg Nesterov wrote:
> Hello,
> 
> special_mapping_fault() is absolutely broken. It seems it was always
> wrong, but this didn't matter until vdso/vvar started to use more than
> one page.
> 
> The patches are the same, just 1/3 was re-diffed on top of the recent
> 6b7339f4c31ad "mm: avoid setting up anonymous pages into file mapping"
> from Kirill.
> 
> And after this change vma_is_anonymous() becomes really trivial, it
> simply checks vm_ops == NULL. However, I do think the helper makes
> sense. There are a lot of ->vm_ops != NULL checks, the helper makes
> the caller's code more understandable (self-documented) and this is
> more grep-friendly.

For all three:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] special_mapping_fault() is broken
  2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
                   ` (3 preceding siblings ...)
  2015-07-10 17:08 ` [PATCH v2 0/3] special_mapping_fault() is broken Kirill A. Shutemov
@ 2015-07-10 18:20 ` Davidlohr Bueso
  2015-07-10 21:52 ` Andrew Morton
  5 siblings, 0 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2015-07-10 18:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Andy Lutomirski, Hugh Dickins, Kirill Shutemov,
	Linus Torvalds, Pavel Emelyanov, linux-kernel

On Fri, 2015-07-10 at 18:51 +0200, Oleg Nesterov wrote:
> And after this change vma_is_anonymous() becomes really trivial, it
> simply checks vm_ops == NULL. However, I do think the helper makes
> sense. There are a lot of ->vm_ops != NULL checks, the helper makes
> the caller's code more understandable (self-documented) and this is
> more grep-friendly.

Yes, I for one, very much welcome this helper.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] special_mapping_fault() is broken
  2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
                   ` (4 preceding siblings ...)
  2015-07-10 18:20 ` Davidlohr Bueso
@ 2015-07-10 21:52 ` Andrew Morton
  2015-07-11 23:43   ` Oleg Nesterov
  5 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2015-07-10 21:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Hugh Dickins, Kirill Shutemov, Linus Torvalds,
	Pavel Emelyanov, linux-kernel

On Fri, 10 Jul 2015 18:51:21 +0200 Oleg Nesterov <oleg@redhat.com> wrote:

> special_mapping_fault() is absolutely broken. It seems it was always
> wrong, but this didn't matter until vdso/vvar started to use more than
> one page.
> 
> The patches are the same, just 1/3 was re-diffed on top of the recent
> 6b7339f4c31ad "mm: avoid setting up anonymous pages into file mapping"
> from Kirill.
> 
> And after this change vma_is_anonymous() becomes really trivial, it
> simply checks vm_ops == NULL. However, I do think the helper makes
> sense. There are a lot of ->vm_ops != NULL checks, the helper makes
> the caller's code more understandable (self-documented) and this is
> more grep-friendly.

I'm trying to work out which kernel version(s) this should go into,
without a lot of success.

What do we think the worst-case effects of the bug?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] special_mapping_fault() is broken
  2015-07-10 21:52 ` Andrew Morton
@ 2015-07-11 23:43   ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-07-11 23:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Hugh Dickins, Kirill Shutemov, Linus Torvalds,
	Pavel Emelyanov, linux-kernel

On 07/10, Andrew Morton wrote:
>
> On Fri, 10 Jul 2015 18:51:21 +0200 Oleg Nesterov <oleg@redhat.com> wrote:
>
> > special_mapping_fault() is absolutely broken. It seems it was always
> > wrong, but this didn't matter until vdso/vvar started to use more than
> > one page.
> >
> > The patches are the same, just 1/3 was re-diffed on top of the recent
> > 6b7339f4c31ad "mm: avoid setting up anonymous pages into file mapping"
> > from Kirill.
> >
> > And after this change vma_is_anonymous() becomes really trivial, it
> > simply checks vm_ops == NULL. However, I do think the helper makes
> > sense. There are a lot of ->vm_ops != NULL checks, the helper makes
> > the caller's code more understandable (self-documented) and this is
> > more grep-friendly.
>
> I'm trying to work out which kernel version(s) this should go into,
> without a lot of success.
>
> What do we think the worst-case effects of the bug?

Ah, I should have mentioned this. And when I re-read my messages I see
that "absolutely broken" looks like "should be fixed asap". Sorry for
confusion.

No, this bug is not serious. Nothing bad can happen from the kernel
perspective. And I doubt that some application will ever unmap/remap
the part of vdso or any other install_special_mapping() user. So this
is just correctness fix. In fact, to me the main problem is that I
was totally confused when I tried to read/understand this code ;)

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-07-11 23:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-10 16:51 [PATCH v2 0/3] special_mapping_fault() is broken Oleg Nesterov
2015-07-10 16:51 ` [PATCH v2 1/3] mm: introduce vma_is_anonymous(vma) helper Oleg Nesterov
2015-07-10 16:51 ` [PATCH v2 2/3] mmap: fix the usage of ->vm_pgoff in special_mapping paths Oleg Nesterov
2015-07-10 16:51 ` [PATCH v2 3/3] mremap: fix the wrong !vma->vm_file check in copy_vma() Oleg Nesterov
2015-07-10 17:08 ` [PATCH v2 0/3] special_mapping_fault() is broken Kirill A. Shutemov
2015-07-10 18:20 ` Davidlohr Bueso
2015-07-10 21:52 ` Andrew Morton
2015-07-11 23:43   ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).