linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: blaisorblade@yahoo.it
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: [patch 10/14] remap_file_pages protection support: fix get_user_pages() on VM_MANYPROTS vmas
Date: Sun, 30 Apr 2006 19:30:03 +0200	[thread overview]
Message-ID: <20060430173025.365179000@zion.home.lan> (raw)
In-Reply-To: 20060430172953.409399000@zion.home.lan

[-- Attachment #1: rfp/10_2-rfp-fix-get_user_pages-revamp.diff --]
[-- Type: text/plain, Size: 7606 bytes --]

From: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>

*Untested patch* - I've not written a test case to verify functionality of
ptrace on VM_MANYPROTS area.

get_user_pages may well call __handle_mm_fault() wanting to override protections...
so in this case __handle_mm_fault() should still avoid checking VM access rights.

Also, get_user_pages() may give write faults on present readonly PTEs in
VM_MANYPROTS areas (think of PTRACE_POKETEXT), so we must still do do_wp_page
even on VM_MANYPROTS areas.

So, possibly use VM_MAYWRITE and/or VM_MAYREAD in the access_mask and check
VM_MANYPROTS in maybe_mkwrite_file (new variant of maybe_mkwrite).

API Note: there are many flags parameter which can be constructed but which
don't make any sense, but the code very freely interprets them too.
For instance VM_MAYREAD|VM_WRITE is interpreted as VM_MAYWRITE|VM_WRITE.

XXX: Todo: add checking to reject all meaningless flag combination.

Index: linux-2.6.git/mm/memory.c
===================================================================
--- linux-2.6.git.orig/mm/memory.c
+++ linux-2.6.git/mm/memory.c
@@ -1045,16 +1045,17 @@ int get_user_pages(struct task_struct *t
 			cond_resched();
 			while (!(page = follow_page(vma, start, foll_flags))) {
 				int ret;
-				ret = __handle_mm_fault(mm, vma, start,
-						foll_flags & FOLL_WRITE);
+				ret = __handle_mm_fault(mm, vma, start, vm_flags);
 				/*
 				 * The VM_FAULT_WRITE bit tells us that do_wp_page has
 				 * broken COW when necessary, even if maybe_mkwrite
 				 * decided not to set pte_write. We can thus safely do
 				 * subsequent page lookups as if they were reads.
 				 */
-				if (ret & VM_FAULT_WRITE)
+				if (ret & VM_FAULT_WRITE) {
 					foll_flags &= ~FOLL_WRITE;
+					vm_flags &= ~(VM_WRITE|VM_MAYWRITE);
+				}
 				
 				switch (ret & ~VM_FAULT_WRITE) {
 				case VM_FAULT_MINOR:
@@ -1389,7 +1390,20 @@ static inline int pte_unmap_same(struct 
  * servicing faults for write access.  In the normal case, do always want
  * pte_mkwrite.  But get_user_pages can cause write faults for mappings
  * that do not have writing enabled, when used by access_process_vm.
+ *
+ * Also, we must never change protections on VM_MANYPROTS pages; that's only
+ * allowed in do_no_page(), so test only VMA protections there. For other cases
+ * we *know* that VM_MANYPROTS is clear, such as anonymous/swap pages, and in
+ * that case using plain maybe_mkwrite() is an optimization.
+ * Instead, when we may be mapping a file, we must use maybe_mkwrite_file.
  */
+static inline pte_t maybe_mkwrite_file(pte_t pte, struct vm_area_struct *vma)
+{
+	if (likely((vma->vm_flags & (VM_WRITE | VM_MANYPROTS)) == VM_WRITE))
+		pte = pte_mkwrite(pte);
+	return pte;
+}
+
 static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
@@ -1441,6 +1455,9 @@ static inline void cow_user_page(struct 
  * We enter with non-exclusive mmap_sem (to exclude vma changes,
  * but allow concurrent faults), with pte both mapped and locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
+ *
+ * Note that a page here can be a shared readonly page where
+ * get_user_pages() (for instance for ptrace()) wants to write to it!
  */
 static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long address, pte_t *page_table, pmd_t *pmd,
@@ -1460,7 +1477,8 @@ static int do_wp_page(struct mm_struct *
 		if (reuse) {
 			flush_cache_page(vma, address, pte_pfn(orig_pte));
 			entry = pte_mkyoung(orig_pte);
-			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+			/* Since it can be shared, it can be VM_MANYPROTS! */
+			entry = maybe_mkwrite_file(pte_mkdirty(entry), vma);
 			ptep_set_access_flags(vma, address, page_table, entry, 1);
 			update_mmu_cache(vma, address, entry);
 			lazy_mmu_prot_update(entry);
@@ -1504,7 +1522,7 @@ gotten:
 			inc_mm_counter(mm, anon_rss);
 		flush_cache_page(vma, address, pte_pfn(orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		entry = maybe_mkwrite_file(pte_mkdirty(entry), vma);
 		ptep_establish(vma, address, page_table, entry);
 		update_mmu_cache(vma, address, entry);
 		lazy_mmu_prot_update(entry);
@@ -1930,7 +1948,7 @@ again:
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && can_share_swap_page(page)) {
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+		pte = maybe_mkwrite_file(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
 
@@ -2231,15 +2249,15 @@ static inline int handle_pte_fault(struc
 	pte_t entry;
 	pte_t old_entry;
 	spinlock_t *ptl;
-	int write_access = access_mask & VM_WRITE;
+	int write_access = access_mask & (VM_WRITE|VM_MAYWRITE);
 
 	old_entry = entry = *pte;
 	if (!pte_present(entry)) {
 		/* when pte_file(), the VMA protections are useless.  Otherwise,
 		 * we need to check VM_MANYPROTS, because in that case the arch
 		 * fault handler skips the VMA protection check. */
-		if (!pte_file(entry) && check_perms(vma, access_mask))
-			goto out_segv;
+		if (!pte_file(entry) && unlikely(check_perms(vma, access_mask)))
+			goto segv;
 
 		if (pte_none(entry)) {
 			if (!vma->vm_ops || !vma->vm_ops->nopage)
@@ -2269,12 +2287,14 @@ static inline int handle_pte_fault(struc
 		pgprot_t pgprot = pte_to_pgprot(*pte);
 		pte_t test_entry = pfn_pte(0, pgprot);
 
-		if (unlikely((access_mask & VM_WRITE) && !pte_write(test_entry)))
-			goto out_segv;
-		if (unlikely((access_mask & VM_READ) && !pte_read(test_entry)))
-			goto out_segv;
-		if (unlikely((access_mask & VM_EXEC) && !pte_exec(test_entry)))
-			goto out_segv;
+		if (likely(!(access_mask & (VM_MAYWRITE|VM_MAYREAD)))) {
+			if (unlikely((access_mask & VM_WRITE) && !pte_write(test_entry)))
+				goto segv_unlock;
+			if (unlikely((access_mask & VM_READ) && !pte_read(test_entry)))
+				goto segv_unlock;
+			if (unlikely((access_mask & VM_EXEC) && !pte_exec(test_entry)))
+				goto segv_unlock;
+		}
 	}
 
 	if (write_access) {
@@ -2301,8 +2321,10 @@ static inline int handle_pte_fault(struc
 unlock:
 	pte_unmap_unlock(pte, ptl);
 	return VM_FAULT_MINOR;
-out_segv:
+
+segv_unlock:
 	pte_unmap_unlock(pte, ptl);
+segv:
 	return VM_FAULT_SIGSEGV;
 }
 
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -734,7 +734,22 @@ extern int install_file_pte(struct mm_st
 
 #ifdef CONFIG_MMU
 
-/* We reuse VM_READ, VM_WRITE and VM_EXEC for the @access_mask. */
+/* We reuse VM_READ, VM_WRITE and (optionally) VM_EXEC for the @access_mask, to
+ * report the kind of access we request for permission checking, in case the VMA
+ * is VM_MANYPROTS.
+ *
+ * get_user_pages( force == 1 ) is a special case. It's allowed to override
+ * protection checks, even on VM_MANYPROTS vma.
+ *
+ * To express that, it must replace VM_READ / VM_WRITE with the corresponding
+ * MAY flags.
+ * This allows to force copying COW pages to break sharing even on read-only
+ * page table entries.
+ * PITFALL: you're not allowed to override only part of the checks, and in
+ * general specifying strange combinations of flags may lead to unspecified
+ * results.
+ */
+
 extern int __handle_mm_fault(struct mm_struct *mm,struct vm_area_struct *vma,
 			unsigned long address, int access_mask);
 

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

  parent reply	other threads:[~2006-04-30 17:35 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-30 17:29 [patch 00/14] remap_file_pages protection support blaisorblade
2006-04-30 17:29 ` [patch 01/14] Fix comment about remap_file_pages blaisorblade
2006-04-30 17:29 ` [patch 02/14] remap_file_pages protection support: add needed macros blaisorblade
2006-04-30 17:29 ` [patch 03/14] remap_file_pages protection support: handle MANYPROTS VMAs blaisorblade
2006-04-30 17:29 ` [patch 04/14] remap_file_pages protection support: disallow mprotect() on manyprots mappings blaisorblade
2006-04-30 17:29 ` [patch 05/14] remap_file_pages protection support: cleanup syscall checks blaisorblade
2006-04-30 17:29 ` [patch 06/14] remap_file_pages protection support: enhance syscall interface blaisorblade
2006-04-30 17:30 ` [patch 07/14] remap_file_pages protection support: support private vma for MAP_POPULATE blaisorblade
2006-04-30 17:30 ` [patch 08/14] remap_file_pages protection support: use FAULT_SIGSEGV for protection checking blaisorblade
2006-04-30 17:30 ` [patch 09/14] remap_file_pages protection support: fix race condition with concurrent faults on same address space blaisorblade
2006-04-30 17:30 ` blaisorblade [this message]
2006-04-30 17:30 ` [patch 11/14] remap_file_pages protection support: pte_present should not trigger on PTE_FILE PROTNONE ptes blaisorblade
2006-05-02  3:53   ` Nick Piggin
2006-05-03  1:29     ` Blaisorblade
2006-05-06 10:03       ` Nick Piggin
2006-05-07 17:50         ` Blaisorblade
2006-04-30 17:30 ` [patch 12/14] remap_file_pages protection support: also set VM_NONLINEAR on nonuniform VMAs blaisorblade
2006-04-30 17:30 ` [patch 13/14] remap_file_pages protection support: uml, i386, x64 bits blaisorblade
2006-04-30 17:30 ` [patch 14/14] remap_file_pages protection support: adapt to uml peculiarities blaisorblade
2006-05-02  3:45 ` [patch 00/14] remap_file_pages protection support Nick Piggin
2006-05-02  3:56   ` Nick Piggin
2006-05-02 11:24     ` Ingo Molnar
2006-05-02 12:19       ` Nick Piggin
2006-05-02 17:16   ` Lee Schermerhorn
2006-05-03  1:20     ` Blaisorblade
2006-05-03 14:35       ` Lee Schermerhorn
2006-05-03  0:25   ` Blaisorblade
2006-05-06 16:05     ` Ulrich Drepper
2006-05-07  4:22       ` Nick Piggin
2006-05-13 14:13         ` Nick Piggin
2006-05-13 18:19           ` Valerie Henson
2006-05-13 22:54             ` Valerie Henson
2006-05-16 13:30             ` Nick Piggin
2006-05-16 13:51               ` Andreas Mohr
2006-05-16 16:31                 ` Valerie Henson
2006-05-16 16:47                   ` Andreas Mohr
2006-05-17  3:25                     ` Nick Piggin
2006-05-17  6:10                     ` Blaisorblade
2006-05-16 16:33               ` Valerie Henson
2006-05-03  0:44   ` Blaisorblade
2006-05-06  9:06     ` Nick Piggin
2006-05-06 15:26       ` Ulrich Drepper
2006-05-02 10:21 ` Arjan van de Ven
2006-05-02 23:46   ` Valerie Henson
2006-05-03  0:26   ` Blaisorblade
2006-05-03  1:44     ` Ulrich Drepper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060430173025.365179000@zion.home.lan \
    --to=blaisorblade@yahoo.it \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).