From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matt Mackall <mpm@selenic.com>, Gerald Schaefer <gerald.schaefer@de.ibm.com>, akpm@linux-foundation.org
Subject: [PATCH] fix/improve generic page table walker
Date: Wed, 11 Mar 2009 14:49:51 +0100
Message-ID: <20090311144951.58c6ab60@skybase>

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

On s390 the /proc/pid/pagemap interface is currently broken. This is
caused by the unconditional loop over all pgd/pud entries as specified
by the address range passed to walk_page_range. The tricky bit here
is that the pgd++ in the outer loop may only be done if the page table
really has 4 levels. For the pud++ in the second loop the page table needs
to have at least 3 levels. With the dynamic page tables on s390 we can have
page tables with 2, 3 or 4 levels. This means that the pgd and/or the
pud pointer can run out of bounds, causing all kinds of mayhem.

The proposed solution is to fast-forward over the hole between the start
address and the first vma and the hole between the last vma and the end
address. The pgd/pud/pmd/pte loops are used only for the address range
between the first and last vma. This guarantees that the page table
pointers stay in range for s390. For the other architectures this is
a small optimization.

As the page walker now accesses the vma list, the mmap_sem is required.
All callers of the walk_page_range function need to acquire the semaphore.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 fs/proc/task_mmu.c |    2 ++
 mm/pagewalk.c      |   28 ++++++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff -urpN linux-2.6/fs/proc/task_mmu.c linux-2.6-patched/fs/proc/task_mmu.c
--- linux-2.6/fs/proc/task_mmu.c	2009-03-11 13:38:53.000000000 +0100
+++ linux-2.6-patched/fs/proc/task_mmu.c	2009-03-11 13:39:45.000000000 +0100
@@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file
 	 * user buffer is tracked in "pm", and the walk
 	 * will stop when we hit the end of the buffer.
 	 */
+	down_read(&mm->mmap_sem);
 	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
+	up_read(&mm->mmap_sem);
 	if (ret == PM_END_OF_BUFFER)
 		ret = 0;
 	/* don't need mmap_sem for these, but this looks cleaner */
diff -urpN linux-2.6/mm/pagewalk.c linux-2.6-patched/mm/pagewalk.c
--- linux-2.6/mm/pagewalk.c	2008-12-25 00:26:37.000000000 +0100
+++ linux-2.6-patched/mm/pagewalk.c	2009-03-11 13:39:45.000000000 +0100
@@ -104,6 +104,8 @@ static int walk_pud_range(pgd_t *pgd, un
 int walk_page_range(unsigned long addr, unsigned long end,
 		    struct mm_walk *walk)
 {
+	struct vm_area_struct *vma, *prev;
+	unsigned long stop;
 	pgd_t *pgd;
 	unsigned long next;
 	int err = 0;
@@ -114,9 +116,28 @@ int walk_page_range(unsigned long addr,
 	if (!walk->mm)
 		return -EINVAL;
 
+	/* Find first valid address contained in a vma. */
+	vma = find_vma(walk->mm, addr);
+	if (!vma)
+		/* One big hole. */
+		return walk->pte_hole(addr, end, walk);
+	if (addr < vma->vm_start) {
+		/* Skip over all ptes in the area before the first vma. */
+		err = walk->pte_hole(addr, vma->vm_start, walk);
+		if (err)
+			return err;
+		addr = vma->vm_start;
+	}
+
+	/* Find last valid address contained in a vma. */
+	stop = end;
+	vma = find_vma_prev(walk->mm, end, &prev);
+	if (!vma)
+		stop = prev->vm_end;
+
 	pgd = pgd_offset(walk->mm, addr);
 	do {
-		next = pgd_addr_end(addr, end);
+		next = pgd_addr_end(addr, stop);
 		if (pgd_none_or_clear_bad(pgd)) {
 			if (walk->pte_hole)
 				err = walk->pte_hole(addr, next, walk);
@@ -131,7 +152,10 @@ int walk_page_range(unsigned long addr,
 			err = walk_pud_range(pgd, addr, next, walk);
 		if (err)
 			break;
-	} while (pgd++, addr = next, addr != end);
+	} while (pgd++, addr = next, addr != stop);
+	if (stop < end)
+		/* Skip over all ptes in the area after the last vma. */
+		err = walk->pte_hole(stop, end, walk);
 
 	return err;
 }
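
The range splitting described in the commit message can be modelled outside the kernel. What follows is only a user-space sketch under stated assumptions, not the kernel code: struct range, struct walk_plan and plan_walk() are hypothetical names invented here, a sorted array stands in for the vma list, and the lookup loop only imitates what find_vma() does. The sketch also clamps the leading hole to the requested end, which is a choice made for this model.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vma: a half-open range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/* Result of splitting [addr, end) into hole / walk / hole. */
struct walk_plan {
	struct range head_hole;	/* before the first mapping */
	struct range walk;	/* span handed to the table loops */
	struct range tail_hole;	/* after the last mapping */
};

/*
 * vmas must be sorted by address and must not overlap.
 * An empty range (start == end) means "nothing to do".
 */
static struct walk_plan plan_walk(const struct range *vmas, size_t n,
				  unsigned long addr, unsigned long end)
{
	struct walk_plan p = { {addr, addr}, {addr, addr}, {addr, addr} };
	size_t first = 0;
	unsigned long stop;

	/* Imitates find_vma(): first mapping whose end lies above addr. */
	while (first < n && vmas[first].end <= addr)
		first++;

	/* No mapping inside the requested range: one big hole. */
	if (first == n || vmas[first].start >= end) {
		p.head_hole.end = end;
		p.walk.start = p.walk.end = end;
		p.tail_hole.start = p.tail_hole.end = end;
		return p;
	}

	/* Skip the hole in front of the first mapping. */
	if (addr < vmas[first].start)
		addr = vmas[first].start;
	p.head_hole.end = addr;

	/* Stop at the end of the last mapping if it lies below end. */
	stop = end;
	if (vmas[n - 1].end < end)
		stop = vmas[n - 1].end;

	p.walk.start = addr;
	p.walk.end = stop;
	p.tail_hole.start = stop;
	p.tail_hole.end = end;
	return p;
}
```

With mappings at 0x4000-0x6000 and 0x9000-0xa000 and a request for 0x1000-0x10000, this yields a head hole of 0x1000-0x4000, a walked span of 0x4000-0xa000 and a tail hole of 0xa000-0x10000. Only the middle span would be handed to the pgd/pud/pmd/pte loops.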