* [PATCH] Mark thread stack correctly in proc/<pid>/maps @ 2012-01-14 12:35 Siddhesh Poyarekar 2012-01-16 11:28 ` Jamie Lokier 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-01-14 12:35 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man, Siddhesh Poyarekar Memory mmaped by glibc for a thread stack currently shows up as a simple anonymous map, which makes it difficult to differentiate between memory usage of the thread on stack and other dynamic allocation. Since glibc already uses MAP_STACK to request this mapping, the attached patch uses this flag to add additional VM_STACK_FLAGS to the resulting vma so that the mapping is treated as a stack and not any regular anonymous mapping. Also, one may use vm_flags to decide if a vma is a stack. There is an additional complication with posix threads where the stack guard for a thread stack may be larger than a page, unlike the case for process stack where the stack guard is a page long. glibc implements these guards by calling mprotect on the beginning page(s) to remove all permissions. I have used this to remove vmas that have the thread stack guard, from the /proc/maps output. If accepted, this should also reflect in the man page for mmap since MAP_STACK will no longer be a noop. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- fs/proc/task_mmu.c | 8 +++++--- include/linux/mm.h | 17 +++++++++++++++++ mm/mmap.c | 3 +++ 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e418c5a..98b5275 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -227,7 +227,10 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; } - /* We don't show the stack guard page in /proc/maps */ + /* We don't show the stack guard pages in /proc/maps */ + if (thread_stack_guard(vma)) + return; + start = vma->vm_start; if (stack_guard_page_start(vma, start)) start += PAGE_SIZE; @@ -259,8 +262,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vma_is_stack(vma)) { name = "[stack]"; } } else { diff --git a/include/linux/mm.h b/include/linux/mm.h index 17b27cd..9871e10 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1018,6 +1018,23 @@ static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr) return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN); } +static inline int vma_is_stack(struct vm_area_struct *vma) +{ + return vma && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)); +} + +/* + * POSIX thread stack guards may be more than a page long and access to it + * should return an error (possibly a SIGSEGV). The glibc implementation does + * an mprotect(..., ..., PROT_NONE), so our guard vma has no permissions. + */ +static inline int thread_stack_guard(struct vm_area_struct *vma) +{ + return vma_is_stack(vma) && + ((vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_MAYSHARE)) == 0) && + vma_is_stack((vma->vm_flags & VM_GROWSDOWN)?vma->vm_next:vma->vm_prev); +} + static inline int stack_guard_page_start(struct vm_area_struct *vma, unsigned long addr) { diff --git a/mm/mmap.c b/mm/mmap.c index 3f758c7..2f9f540 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + if (flags & MAP_STACK) + vm_flags |= VM_STACK_FLAGS; + if (flags & MAP_LOCKED) if (!can_do_mlock()) return -EPERM; -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-01-14 12:35 [PATCH] Mark thread stack correctly in proc/<pid>/maps Siddhesh Poyarekar @ 2012-01-16 11:28 ` Jamie Lokier 2012-01-16 13:08 ` Siddhesh Poyarekar 0 siblings, 1 reply; 42+ messages in thread From: Jamie Lokier @ 2012-01-16 11:28 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man Siddhesh Poyarekar wrote: > Memory mmaped by glibc for a thread stack currently shows up as a simple > anonymous map, which makes it difficult to differentiate between memory > usage of the thread on stack and other dynamic allocation. Since glibc > already uses MAP_STACK to request this mapping, the attached patch > uses this flag to add additional VM_STACK_FLAGS to the resulting vma > so that the mapping is treated as a stack and not any regular > anonymous mapping. Also, one may use vm_flags to decide if a vma is a > stack. I think this is fine. > There is an additional complication with posix threads where the stack > guard for a thread stack may be larger than a page, unlike the case > for process stack where the stack guard is a page long. glibc > implements these guards by calling mprotect on the beginning page(s) > to remove all permissions. I have used this to remove vmas that have > the thread stack guard, from the /proc/maps output. > - /* We don't show the stack guard page in /proc/maps */ > + /* We don't show the stack guard pages in /proc/maps */ > + if (thread_stack_guard(vma)) > + return; > + > start = vma->vm_start; > if (stack_guard_page_start(vma, start)) > start += PAGE_SIZE; Hmm, I see why you did this. The current code already hides one guard page, which is already dubious for programs that do things like read /proc/pid/maps to decide if MAP_FIXED would be not clobber an existing mapping. At least those programs _could_ know about the stack guard page address I wonder if it's a potential security hole: You've just allowed programs to use two MAP_GROWSUP/DOWN|PROT_NONE to hide vmas from the user. Sure, the memory isn't accessible, but it can still store data and be ephemerally made visible using mprotect() then hidden again. I would prefer a label like "[stack guard]" or just "[guard]", both for the thread stacks and the process stack. With a label like "[guard]" it needn't be limited to stacks; heap guard pages used by some programs would also be labelled. > +static inline int vma_is_stack(struct vm_area_struct *vma) > +{ > + return vma && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)); > +} > + > +/* > + * POSIX thread stack guards may be more than a page long and access to it > + * should return an error (possibly a SIGSEGV). The glibc implementation does > + * an mprotect(..., ..., PROT_NONE), so our guard vma has no permissions. > + */ > +static inline int thread_stack_guard(struct vm_area_struct *vma) Is there a reason the names aren't consistent - i.e. not vma_is_stack_guard()? > +{ > + return vma_is_stack(vma) && > + ((vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_MAYSHARE)) == 0) && > + vma_is_stack((vma->vm_flags & VM_GROWSDOWN)?vma->vm_next:vma->vm_prev); > +} > + That doesn't check if ->vm_next/prev is adjacent in address space. You can't assume the program is using Glibc, or that MAP_STACK mappings are all from Glibc, or that they are in the pattern you expect. How about simply calling it vma_is_guard(), return 1 if it's PROT_NONE without checking vma_is_stack() or ->vm_next/prev, and annotate the maps output like this: is_stack => "[stack]" is_guard & is_stack => "[stack guard]" is_guard & !is_stack => "[guard]" What do you think? -- Jamie ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-01-16 11:28 ` Jamie Lokier @ 2012-01-16 13:08 ` Siddhesh Poyarekar 2012-01-16 16:31 ` Jamie Lokier 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-01-16 13:08 UTC (permalink / raw) To: Jamie Lokier Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man On Mon, Jan 16, 2012 at 4:58 PM, Jamie Lokier <jamie@shareable.org> wrote: > Is there a reason the names aren't consistent - i.e. not vma_is_stack_guard()? Ah, that was an error on my part; I did not notice the naming convention. > How about simply calling it vma_is_guard(), return 1 if it's PROT_NONE > without checking vma_is_stack() or ->vm_next/prev, and annotate the > maps output like this: > > is_stack => "[stack]" > is_guard & is_stack => "[stack guard]" > is_guard & !is_stack => "[guard]" > > What do you think? Thanks for the review. We're already marking permissions in the maps output to convey protection, so isn't marking those vmas as [guard] redundant? Following that, we could just mark the thread stack guard as [stack] without any permissions. The process stack guard page probably deserves the [stack guard] label since it is marked differently from the thread stack guard and will otherwise have the permissions that the process stack has. Will that be good? -- Siddhesh Poyarekar ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-01-16 13:08 ` Siddhesh Poyarekar @ 2012-01-16 16:31 ` Jamie Lokier 2012-01-16 17:01 ` Siddhesh Poyarekar 2012-01-17 4:54 ` Siddhesh Poyarekar 0 siblings, 2 replies; 42+ messages in thread From: Jamie Lokier @ 2012-01-16 16:31 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man Siddhesh Poyarekar wrote: > On Mon, Jan 16, 2012 at 4:58 PM, Jamie Lokier <jamie@shareable.org> wrote: > > Is there a reason the names aren't consistent - i.e. not vma_is_stack_guard()? > > Ah, that was an error on my part; I did not notice the naming convention. > > > How about simply calling it vma_is_guard(), return 1 if it's PROT_NONE > > without checking vma_is_stack() or ->vm_next/prev, and annotate the > > maps output like this: > > > > is_stack => "[stack]" > > is_guard & is_stack => "[stack guard]" > > is_guard & !is_stack => "[guard]" > > > > What do you think? > > Thanks for the review. We're already marking permissions in the maps > output to convey protection, so isn't marking those vmas as [guard] > redundant? Yes it's redundant, I just think it's a bit clearer at showing the intent. After all that's also the reason for "[stack]", "[heap]" etc. It's not important though. > Following that, we could just mark the thread stack guard as [stack] > without any permissions. The process stack guard page probably > deserves the [stack guard] label since it is marked differently from > the thread stack guard and will otherwise have the permissions that > the process stack has. Will that be good? I don't have any strong opinions about what the test looks like; mainly I was pointing out the ->vm_next/prev check seemed dubious, that Glibc's layout shouldn't be assumed, and that hiding ranges from /maps may mislead some programs. Aesthetically I think if the main process stack has "[stack guard]", it makes sense for the thread stack guards to be labelled the same. One more technical thing: Now that you're using VM_STACK to change the text, why not set that flag for the process stack vma as well, when the stack is set up by exec, and get rid of the special case for process stack in printing? All the best, -- Jamie ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-01-16 16:31 ` Jamie Lokier @ 2012-01-16 17:01 ` Siddhesh Poyarekar 2012-01-17 4:54 ` Siddhesh Poyarekar 1 sibling, 0 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-01-16 17:01 UTC (permalink / raw) To: Jamie Lokier Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man On Mon, Jan 16, 2012 at 10:01 PM, Jamie Lokier <jamie@shareable.org> wrote: > Aesthetically I think if the main process stack has "[stack guard]", > it makes sense for the thread stack guards to be labelled the same. Right, I'll mark both stack guards alike. > One more technical thing: Now that you're using VM_STACK to change the > text, why not set that flag for the process stack vma as well, when > the stack is set up by exec, and get rid of the special case for > process stack in printing? I think the flag is already set: static int __bprm_mm_init(struct linux_binprm *bprm) { ... /* * Place the stack at the largest stack address the architecture * supports. Later, we'll move this to an appropriate place. We don't * use STACK_TOP because that can depend on attributes which aren't * configured yet. */ BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP); vma->vm_end = STACK_TOP_MAX; vma->vm_start = vma->vm_end - PAGE_SIZE; vma->vm_flags = VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP; vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); INIT_LIST_HEAD(&vma->anon_vma_chain); ... } The only special case in the printing code for the process stack is the skipping of the guard page. I'll modify that to mark and display the stack guard instead. I'll post an updated patch with these changes. Thanks! -- Siddhesh Poyarekar ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-01-16 16:31 ` Jamie Lokier 2012-01-16 17:01 ` Siddhesh Poyarekar @ 2012-01-17 4:54 ` Siddhesh Poyarekar 2012-02-02 6:24 ` [RESEND][PATCH] " Siddhesh Poyarekar 1 sibling, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-01-17 4:54 UTC (permalink / raw) To: Jamie Lokier Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man, Siddhesh Poyarekar [Take 2] Memory mmaped by glibc for a thread stack currently shows up as a simple anonymous map, which makes it difficult to differentiate between memory usage of the thread on stack and other dynamic allocation. Since glibc already uses MAP_STACK to request this mapping, the attached patch uses this flag to add additional VM_STACK_FLAGS to the resulting vma so that the mapping is treated as a stack and not any regular anonymous mapping. Also, one may use vm_flags to decide if a vma is a stack. This patch also changes the maps output to annotate stack guards for both the process stack as well as the thread stacks. Thus is born the [stack guard] annotation, which should be exactly a page long for the process stack and can be longer than a page (configurable in userspace) for POSIX compliant thread stacks. A thread stack guard is simply page(s) with PROT_NONE. If accepted, this should also reflect in the man page for mmap since MAP_STACK will no longer be a noop. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- fs/proc/task_mmu.c | 41 ++++++++++++++++++++++++++++++++++++----- include/linux/mm.h | 19 +++++++++++++++++-- mm/mmap.c | 3 +++ 3 files changed, 56 insertions(+), 7 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e418c5a..650330c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -227,13 +227,42 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; } - /* We don't show the stack guard page in /proc/maps */ + /* + * Mark the process stack guard, which is just one page at the + * beginning of the stack within the stack vma. + */ start = vma->vm_start; - if (stack_guard_page_start(vma, start)) + if (stack_guard_page_start(vma, start)) { + seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", + start, + start + PAGE_SIZE, + flags & VM_READ ? 'r' : '-', + flags & VM_WRITE ? 'w' : '-', + flags & VM_EXEC ? 'x' : '-', + flags & VM_MAYSHARE ? 's' : 'p', + pgoff, + MAJOR(dev), MINOR(dev), ino, &len); + + pad_len_spaces(m, len); + seq_puts(m, "[stack guard]\n"); start += PAGE_SIZE; + } end = vma->vm_end; - if (stack_guard_page_end(vma, end)) + if (stack_guard_page_end(vma, end)) { + seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", + end - PAGE_SIZE, + end, + flags & VM_READ ? 'r' : '-', + flags & VM_WRITE ? 'w' : '-', + flags & VM_EXEC ? 'x' : '-', + flags & VM_MAYSHARE ? 's' : 'p', + pgoff, + MAJOR(dev), MINOR(dev), ino, &len); + + pad_len_spaces(m, len); + seq_puts(m, "[stack guard]\n"); end -= PAGE_SIZE; + } seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", start, @@ -259,8 +288,10 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vma_is_stack(vma) && + vma_is_guard(vma)) { + name = "[stack guard]"; + } else if (vma_is_stack(vma)) { name = "[stack]"; } } else { diff --git a/include/linux/mm.h b/include/linux/mm.h index 17b27cd..4e57753 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1018,12 +1018,26 @@ static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr) return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN); } +static inline int vma_is_stack(struct vm_area_struct *vma) +{ + return vma && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)); +} + +/* + * Check guard set by userspace (PROT_NONE) + */ +static inline int vma_is_guard(struct vm_area_struct *vma) +{ + return (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) == 0; +} + static inline int stack_guard_page_start(struct vm_area_struct *vma, unsigned long addr) { return (vma->vm_flags & VM_GROWSDOWN) && (vma->vm_start == addr) && - !vma_growsdown(vma->vm_prev, addr); + !vma_growsdown(vma->vm_prev, addr) && + !vma_is_guard(vma); } /* Is the vma a continuation of the stack vma below it? */ @@ -1037,7 +1051,8 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, { return (vma->vm_flags & VM_GROWSUP) && (vma->vm_end == addr) && - !vma_growsup(vma->vm_next, addr); + !vma_growsup(vma->vm_next, addr) && + !vma_is_guard(vma); } extern unsigned long move_page_tables(struct vm_area_struct *vma, diff --git a/mm/mmap.c b/mm/mmap.c index 3f758c7..2f9f540 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + if (flags & MAP_STACK) + vm_flags |= VM_STACK_FLAGS; + if (flags & MAP_LOCKED) if (!can_do_mlock()) return -EPERM; -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-01-17 4:54 ` Siddhesh Poyarekar @ 2012-02-02 6:24 ` Siddhesh Poyarekar 2012-02-02 21:40 ` KOSAKI Motohiro 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-02 6:24 UTC (permalink / raw) To: Jamie Lokier Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man, Siddhesh Poyarekar Hi, Resending since I did not get any feedback on the second take of the patch. Thanks, Siddhesh ---------- Forwarded message ---------- From: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Date: Tue, Jan 17, 2012 at 10:24 AM Subject: [PATCH] Mark thread stack correctly in proc/<pid>/maps To: Jamie Lokier <jamie@shareable.org> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org, Michael Kerrisk <mtk.manpages@gmail.com>, linux-man@vger.kernel.org, Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> [Take 2] Memory mmaped by glibc for a thread stack currently shows up as a simple anonymous map, which makes it difficult to differentiate between memory usage of the thread on stack and other dynamic allocation. Since glibc already uses MAP_STACK to request this mapping, the attached patch uses this flag to add additional VM_STACK_FLAGS to the resulting vma so that the mapping is treated as a stack and not any regular anonymous mapping. Also, one may use vm_flags to decide if a vma is a stack. This patch also changes the maps output to annotate stack guards for both the process stack as well as the thread stacks. Thus is born the [stack guard] annotation, which should be exactly a page long for the process stack and can be longer than a page (configurable in userspace) for POSIX compliant thread stacks. A thread stack guard is simply page(s) with PROT_NONE. If accepted, this should also reflect in the man page for mmap since MAP_STACK will no longer be a noop. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- fs/proc/task_mmu.c | 41 ++++++++++++++++++++++++++++++++++++----- include/linux/mm.h | 19 +++++++++++++++++-- mm/mmap.c | 3 +++ 3 files changed, 56 insertions(+), 7 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e418c5a..650330c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -227,13 +227,42 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; } - /* We don't show the stack guard page in /proc/maps */ + /* + * Mark the process stack guard, which is just one page at the + * beginning of the stack within the stack vma. + */ start = vma->vm_start; - if (stack_guard_page_start(vma, start)) + if (stack_guard_page_start(vma, start)) { + seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", + start, + start + PAGE_SIZE, + flags & VM_READ ? 'r' : '-', + flags & VM_WRITE ? 'w' : '-', + flags & VM_EXEC ? 'x' : '-', + flags & VM_MAYSHARE ? 's' : 'p', + pgoff, + MAJOR(dev), MINOR(dev), ino, &len); + + pad_len_spaces(m, len); + seq_puts(m, "[stack guard]\n"); start += PAGE_SIZE; + } end = vma->vm_end; - if (stack_guard_page_end(vma, end)) + if (stack_guard_page_end(vma, end)) { + seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", + end - PAGE_SIZE, + end, + flags & VM_READ ? 'r' : '-', + flags & VM_WRITE ? 'w' : '-', + flags & VM_EXEC ? 'x' : '-', + flags & VM_MAYSHARE ? 's' : 'p', + pgoff, + MAJOR(dev), MINOR(dev), ino, &len); + + pad_len_spaces(m, len); + seq_puts(m, "[stack guard]\n"); end -= PAGE_SIZE; + } seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", start, @@ -259,8 +288,10 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vma_is_stack(vma) && + vma_is_guard(vma)) { + name = "[stack guard]"; + } else if (vma_is_stack(vma)) { name = "[stack]"; } } else { diff --git a/include/linux/mm.h b/include/linux/mm.h index 17b27cd..4e57753 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1018,12 +1018,26 @@ static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr) return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN); } +static inline int vma_is_stack(struct vm_area_struct *vma) +{ + return vma && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)); +} + +/* + * Check guard set by userspace (PROT_NONE) + */ +static inline int vma_is_guard(struct vm_area_struct *vma) +{ + return (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) == 0; +} + static inline int stack_guard_page_start(struct vm_area_struct *vma, unsigned long addr) { return (vma->vm_flags & VM_GROWSDOWN) && (vma->vm_start == addr) && - !vma_growsdown(vma->vm_prev, addr); + !vma_growsdown(vma->vm_prev, addr) && + !vma_is_guard(vma); } /* Is the vma a continuation of the stack vma below it? */ @@ -1037,7 +1051,8 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, { return (vma->vm_flags & VM_GROWSUP) && (vma->vm_end == addr) && - !vma_growsup(vma->vm_next, addr); + !vma_growsup(vma->vm_next, addr) && + !vma_is_guard(vma); } extern unsigned long move_page_tables(struct vm_area_struct *vma, diff --git a/mm/mmap.c b/mm/mmap.c index 3f758c7..2f9f540 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + if (flags & MAP_STACK) + vm_flags |= VM_STACK_FLAGS; + if (flags & MAP_LOCKED) if (!can_do_mlock()) return -EPERM; -- 1.7.7.4 -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-02 6:24 ` [RESEND][PATCH] " Siddhesh Poyarekar @ 2012-02-02 21:40 ` KOSAKI Motohiro 2012-02-03 7:09 ` Siddhesh Poyarekar 0 siblings, 1 reply; 42+ messages in thread From: KOSAKI Motohiro @ 2012-02-02 21:40 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man > extern unsigned long move_page_tables(struct vm_area_struct *vma, > diff --git a/mm/mmap.c b/mm/mmap.c > index 3f758c7..2f9f540 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file, > unsigned long addr, > vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | > mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; > > + if (flags& MAP_STACK) > + vm_flags |= VM_STACK_FLAGS; ?? MAP_STACK doesn't mean auto stack expansion. Why do you turn on VM_GROWSDOWN? Seems incorrect. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-02 21:40 ` KOSAKI Motohiro @ 2012-02-03 7:09 ` Siddhesh Poyarekar 2012-02-03 8:01 ` KOSAKI Motohiro 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-03 7:09 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man On Fri, Feb 3, 2012 at 3:10 AM, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: >> extern unsigned long move_page_tables(struct vm_area_struct *vma, >> diff --git a/mm/mmap.c b/mm/mmap.c >> index 3f758c7..2f9f540 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file, >> unsigned long addr, >> vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | >> mm->def_flags | VM_MAYREAD | VM_MAYWRITE | >> VM_MAYEXEC; >> >> + if (flags& MAP_STACK) >> + vm_flags |= VM_STACK_FLAGS; > > > ?? > MAP_STACK doesn't mean auto stack expansion. Why do you turn on > VM_GROWSDOWN? > Seems incorrect. > Right now MAP_STACK does not mean anything since it is ignored. The intention of this behaviour change is to make MAP_STACK mean that the map is going to be used as a stack and hence, set it up like a stack ought to be. I could not really think of a valid case for fixed size stacks; it looks like a limitation in the pthread implementation in glibc rather than a feature. So this patch will actually result in uniform behaviour across threads when it comes to stacks. This does change vm accounting since thread stacks were earlier accounted as anon memory. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-03 7:09 ` Siddhesh Poyarekar @ 2012-02-03 8:01 ` KOSAKI Motohiro 2012-02-03 9:49 ` Siddhesh Poyarekar ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: KOSAKI Motohiro @ 2012-02-03 8:01 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man > Right now MAP_STACK does not mean anything since it is ignored. The > intention of this behaviour change is to make MAP_STACK mean that the > map is going to be used as a stack and hence, set it up like a stack > ought to be. I could not really think of a valid case for fixed size > stacks; it looks like a limitation in the pthread implementation in > glibc rather than a feature. So this patch will actually result in > uniform behaviour across threads when it comes to stacks. > > This does change vm accounting since thread stacks were earlier > accounted as anon memory. The fact is, now process stack and pthread stack clearly behave different dance. libc don't expect pthread stack grow automatically. So, your patch will break userland. Just only change display thing. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-03 8:01 ` KOSAKI Motohiro @ 2012-02-03 9:49 ` Siddhesh Poyarekar 2012-02-03 10:29 ` Mike Frysinger 2012-02-03 18:34 ` Siddhesh Poyarekar 2 siblings, 0 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-03 9:49 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > The fact is, now process stack and pthread stack clearly behave > different dance. libc don't expect pthread stack grow automatically. > So, your patch will break userland. Just only change display thing. Thanks for your feedback. This attempt was to unify this behaviours, but I guess you're right; I need to check if glibc really has a problem with this than assuming that it should not. I will check with glibc maintainers on this and update here. Since this flag is specifically for glibc, it should not affect other applications or libraries. The proc changes won't make sense without the change to mark thread stacks unless we create yet another vm flag to reflect MAP_STACK in the vma and then use that for both process and its threads. I'll submit a patch with this (if acceptable of course) if glibc strictly requires fixed sized stacks. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-03 8:01 ` KOSAKI Motohiro 2012-02-03 9:49 ` Siddhesh Poyarekar @ 2012-02-03 10:29 ` Mike Frysinger 2012-02-03 18:34 ` Siddhesh Poyarekar 2 siblings, 0 replies; 42+ messages in thread From: Mike Frysinger @ 2012-02-03 10:29 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Siddhesh Poyarekar, Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man [-- Attachment #1: Type: Text/Plain, Size: 1247 bytes --] On Friday 03 February 2012 03:01:35 KOSAKI Motohiro wrote: > > Right now MAP_STACK does not mean anything since it is ignored. The > > intention of this behaviour change is to make MAP_STACK mean that the > > map is going to be used as a stack and hence, set it up like a stack > > ought to be. I could not really think of a valid case for fixed size > > stacks; it looks like a limitation in the pthread implementation in > > glibc rather than a feature. So this patch will actually result in > > uniform behaviour across threads when it comes to stacks. > > > > This does change vm accounting since thread stacks were earlier > > accounted as anon memory. > > The fact is, now process stack and pthread stack clearly behave > different dance. libc don't expect pthread stack grow automatically. > So, your patch will break userland. Just only change display thing. does it though ? glibc doesn't keep track of the unused address space ... that's what the kernel is for. pthread_attr_setstacksize explicitly operates on the *minimum* stack size, not the *exact* size. where exactly do you think userland would break ? http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_attr_setstacksize.html -mike [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-03 8:01 ` KOSAKI Motohiro 2012-02-03 9:49 ` Siddhesh Poyarekar 2012-02-03 10:29 ` Mike Frysinger @ 2012-02-03 18:34 ` Siddhesh Poyarekar 2012-02-08 4:00 ` Siddhesh Poyarekar 2 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-03 18:34 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > The fact is, now process stack and pthread stack clearly behave > different dance. libc don't expect pthread stack grow automatically. > So, your patch will break userland. Just only change display thing. The change should not make thread stacks (as implemented by glibc) grow automatically in the general case since it has to implement guard page(s) at the beginning of the mapped memory for stack using mprotect(top, size, PROT_NONE). Due to this, the program will crash before it ever has a chance to cause a page fault to make the stack grow. This is of course assuming that the program doesn't purposely jump over the guard page(s) by setting %sp beyond them and then causing a page fault. So the only case in which a thread stack will grow automatically is if the stackguard is set to 0: http://pubs.opengroup.org/onlinepubs/007904975/functions/pthread_attr_setguardsize.html I have also dropped an email on the libc-alpha list here to solicit comments from libc maintainers on this: http://sourceware.org/ml/libc-alpha/2012-02/msg00036.html -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-03 18:34 ` Siddhesh Poyarekar @ 2012-02-08 4:00 ` Siddhesh Poyarekar 2012-02-08 17:57 ` KOSAKI Motohiro 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-08 4:00 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man, Mike Frysinger On Sat, Feb 4, 2012 at 12:04 AM, Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> wrote: > On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro > <kosaki.motohiro@gmail.com> wrote: >> The fact is, now process stack and pthread stack clearly behave >> different dance. libc don't expect pthread stack grow automatically. >> So, your patch will break userland. Just only change display thing. <snip> > I have also dropped an email on the libc-alpha list here to solicit > comments from libc maintainers on this: > > http://sourceware.org/ml/libc-alpha/2012-02/msg00036.html > Kosaki-san, your suggestion of adding an extra flag seems like the right way to go about this based on the discussion on libc-alpha, specifically, your point about pthread_getattr_np() -- it may not be a standard, but it's a breakage anyway. However, looking at the vm_flags options in mm.h, it looks like the entire 32-bit space has been exhausted for the flag value. The vm_flags is an unsigned long, so it ought to take 8 bytes on a 64-bit system, but 32-bit systems will be left behind. So there are two options for this: 1) make vm_flags 64-bit for all arches. This will cause ABI breakage on 32-bit systems, so any external drivers will have to be rebuilt 2) Implement this patch for 64-bit only by defining the new flag only for 64-bit. 32-bit systems behave as is Which of these would be better? I prefer the latter because it looks like the path of least breakage. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-08 4:00 ` Siddhesh Poyarekar @ 2012-02-08 17:57 ` KOSAKI Motohiro 2012-02-11 10:19 ` Siddhesh Poyarekar 2012-02-11 15:03 ` [PATCH] " Siddhesh Poyarekar 0 siblings, 2 replies; 42+ messages in thread From: KOSAKI Motohiro @ 2012-02-08 17:57 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man, Mike Frysinger (2/7/12 11:00 PM), Siddhesh Poyarekar wrote: > On Sat, Feb 4, 2012 at 12:04 AM, Siddhesh Poyarekar > <siddhesh.poyarekar@gmail.com> wrote: >> On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro >> <kosaki.motohiro@gmail.com> wrote: >>> The fact is, now process stack and pthread stack clearly behave >>> different dance. libc don't expect pthread stack grow automatically. >>> So, your patch will break userland. Just only change display thing. > <snip> >> I have also dropped an email on the libc-alpha list here to solicit >> comments from libc maintainers on this: >> >> http://sourceware.org/ml/libc-alpha/2012-02/msg00036.html >> > > Kosaki-san, your suggestion of adding an extra flag seems like the > right way to go about this based on the discussion on libc-alpha, > specifically, your point about pthread_getattr_np() -- it may not be a > standard, but it's a breakage anyway. However, looking at the vm_flags > options in mm.h, it looks like the entire 32-bit space has been > exhausted for the flag value. The vm_flags is an unsigned long, so it > ought to take 8 bytes on a 64-bit system, but 32-bit systems will be > left behind. > > So there are two options for this: > > 1) make vm_flags 64-bit for all arches. This will cause ABI breakage > on 32-bit systems, so any external drivers will have to be rebuilt Several month ago, Linus NAKed this way. > 2) Implement this patch for 64-bit only by defining the new flag only > for 64-bit. 32-bit systems behave as is No. That's bad than status quo. Enduser may get inconsistent and bad user experience. Now, we are using some bit saving hack. example, 1) use ifdef #ifndef CONFIG_TRANSPARENT_HUGEPAGE #define VM_MAPPED_COPY 0x01000000 /* T if mapped copy of data (nommu mmap) */ #else #define VM_HUGEPAGE 0x01000000 /* MADV_HUGEPAGE marked this vma */ #endif 2) use bit combination #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ) Maybe you can take a similar way. And of course, you can ban some useless flag bits. thanks. > Which of these would be better? I prefer the latter because it looks > like the path of least breakage. > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-08 17:57 ` KOSAKI Motohiro @ 2012-02-11 10:19 ` Siddhesh Poyarekar 2012-02-11 15:03 ` [PATCH] " Siddhesh Poyarekar 1 sibling, 0 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-11 10:19 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Jamie Lokier, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Michael Kerrisk, linux-man, Mike Frysinger On Wed, Feb 8, 2012 at 11:27 PM, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > Now, we are using some bit saving hack. example, > > 1) use ifdef > > #ifndef CONFIG_TRANSPARENT_HUGEPAGE > #define VM_MAPPED_COPY 0x01000000 /* T if mapped copy of data (nommu > mmap) */ > #else > #define VM_HUGEPAGE 0x01000000 /* MADV_HUGEPAGE marked this vma */ > #endif > > 2) use bit combination > > #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ) > > > Maybe you can take a similar way. And of course, you can ban some useless > flag > bits. I found the thread in which Linus rejected the idea of expanding vm_flags: https://lkml.org/lkml/2011/11/10/522 and based on that, I don't think I can justify the need for a new flag for this patch since it is purely for display purposes and has nothing to do with the actual treatment of the vma. So I figured out another way to identify a thread stack without changing the way the vma properties (I should have done this in the first place I think) which is by checking if the vma contains the stack pointer of the task. With this change: /proc/pid/task/tid/maps: will only mark the stack that the task uses /proc/pid/maps: will mark all stacks. This path will be slower since it will iterate through the entire thread group for each vma. I'll test this and attach a new version of the patch. Regards, Siddhesh -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-08 17:57 ` KOSAKI Motohiro 2012-02-11 10:19 ` Siddhesh Poyarekar @ 2012-02-11 15:03 ` Siddhesh Poyarekar 2012-02-21 4:24 ` [RESEND][PATCH] " Siddhesh Poyarekar 2012-02-23 23:17 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps KOSAKI Motohiro 1 sibling, 2 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-11 15:03 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, vapier, Siddhesh Poyarekar Stack for a new thread is mapped by userspace code and passed via sys_clone. This memory is currently seen as anonymous in /proc/<pid>/maps, which makes it difficult to ascertain which mappings are being used for thread stacks. This patch uses the individual task stack pointers to determine which vmas are actually thread stacks. The display for maps, smaps and numa_maps is now different at the thread group (/proc/PID/maps) and thread (/proc/PID/task/TID/maps) levels. The idea is to give the mapping as the individual tasks see it in /proc/PID/task/TID/maps and then give an overview of the entire mm as it were, in /proc/PID/maps. At the thread group level, all vmas that are used as stacks are marked as such. At the thread level however, only the stack that the task in question uses is marked as such and all others (including the main stack) are marked as anonymous memory. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- Documentation/filesystems/proc.txt | 10 ++- fs/proc/base.c | 12 ++-- fs/proc/internal.h | 9 ++- fs/proc/task_mmu.c | 139 ++++++++++++++++++++++++++++++------ fs/proc/task_nommu.c | 57 ++++++++++++--- include/linux/mm.h | 9 +++ mm/memory.c | 22 ++++++ 7 files changed, 214 insertions(+), 44 deletions(-) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a76a26a..e0f9de3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) rsslim current limit in bytes on the rss start_code address above which program text can run end_code address below which program text can run - start_stack address of the start of the stack + start_stack address of the start of the main process stack esp current value of ESP eip current value of EIP pending bitmap of pending signals @@ -356,12 +356,18 @@ The "pathname" shows the name associated file for this mapping. If the mapping is not associated with a file: [heap] = the heap of the program - [stack] = the stack of the main process + [stack] = the mapping is used as a stack by one + of the threads of the process [vdso] = the "virtual dynamic shared object", the kernel system call handler or if empty, the mapping is anonymous. +The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint +of the individual tasks of a process. In this file you will see a mapping marked +as [stack] only if that task sees it as a stack. This is a key difference from +the content of /proc/PID/maps, where you will see all mappings that are being +used as stack by all of those tasks. The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff --git a/fs/proc/base.c b/fs/proc/base.c index d4548dd..558660a 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_pid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_tid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_tid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 2925775..c44efe1 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -53,9 +53,12 @@ extern int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); extern loff_t mem_lseek(struct file *file, loff_t offset, int orig); -extern const struct file_operations proc_maps_operations; -extern const struct file_operations proc_numa_maps_operations; -extern const struct file_operations proc_smaps_operations; +extern const struct file_operations proc_pid_maps_operations; +extern const struct file_operations proc_tid_maps_operations; +extern const struct file_operations proc_pid_numa_maps_operations; +extern const struct file_operations proc_tid_numa_maps_operations; +extern const struct file_operations proc_pid_smaps_operations; +extern const struct file_operations proc_tid_smaps_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; extern const struct file_operations proc_net_operations; diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 7dcd2a2..3e166f5 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -209,10 +209,12 @@ static int do_maps_open(struct inode *inode, struct file *file, return ret; } -static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) +static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) { struct mm_struct *mm = vma->vm_mm; struct file *file = vma->vm_file; + struct proc_maps_private *priv = m->private; + struct task_struct *task = priv->task; vm_flags_t flags = vma->vm_flags; unsigned long ino = 0; unsigned long long pgoff = 0; @@ -259,8 +261,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vm_is_stack(task, vma, is_pid)) { name = "[stack]"; } } else { @@ -275,13 +276,13 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) seq_putc(m, '\n'); } -static int show_map(struct seq_file *m, void *v) +static int show_map(struct seq_file *m, void *v, int is_pid) { struct vm_area_struct *vma = v; struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); if (m->count < m->size) /* vma is copied successfully */ m->version = (vma != get_gate_vma(task->mm)) @@ -289,20 +290,49 @@ static int show_map(struct seq_file *m, void *v) return 0; } +static int show_pid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 1); +} + +static int show_tid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 0); +} + static const struct seq_operations proc_pid_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map }; -static int maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map +}; + +static int pid_maps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_maps_op); } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_maps_op); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -422,7 +452,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, return 0; } -static int show_smap(struct seq_file *m, void *v) +static int show_smap(struct seq_file *m, void *v, int is_pid) { struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; @@ -440,7 +470,7 @@ static int show_smap(struct seq_file *m, void *v) if (vma->vm_mm && !is_vm_hugetlb_page(vma)) walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk); - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); seq_printf(m, "Size: %8lu kB\n" @@ -479,20 +509,49 @@ static int show_smap(struct seq_file *m, void *v) return 0; } +static int show_pid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 1); +} + +static int show_tid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 0); +} + static const struct seq_operations proc_pid_smaps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_smap + .show = show_pid_smap +}; + +static const struct seq_operations proc_tid_smaps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_smap }; -static int smaps_open(struct inode *inode, struct file *file) +static int pid_smaps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_smaps_op); } -const struct file_operations proc_smaps_operations = { - .open = smaps_open, +static int tid_smaps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_smaps_op); +} + +const struct file_operations proc_pid_smaps_operations = { + .open = pid_smaps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_smaps_operations = { + .open = tid_smaps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -1002,7 +1061,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, /* * Display pages allocated per node and memory policy via /proc. */ -static int show_numa_map(struct seq_file *m, void *v) +static int show_numa_map(struct seq_file *m, void *v, int is_pid) { struct numa_maps_private *numa_priv = m->private; struct proc_maps_private *proc_priv = &numa_priv->proc_maps; @@ -1039,8 +1098,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_path(m, &file->f_path, "\n\t= "); } else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { seq_printf(m, " heap"); - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vm_is_stack(proc_priv->task, vma, is_pid)) { seq_printf(m, " stack"); } @@ -1084,21 +1142,39 @@ out: return 0; } +static int show_pid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 1); +} + +static int show_tid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 0); +} + static const struct seq_operations proc_pid_numa_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_numa_map, + .show = show_pid_numa_map, }; -static int numa_maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_numa_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_numa_map, +}; + +static int numa_maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct numa_maps_private *priv; int ret = -ENOMEM; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->proc_maps.pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_numa_maps_op); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -1109,8 +1185,25 @@ static int numa_maps_open(struct inode *inode, struct file *file) return ret; } -const struct file_operations proc_numa_maps_operations = { - .open = numa_maps_open, +static int pid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_pid_numa_maps_op); +} + +static int tid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_tid_numa_maps_op); +} + +const struct file_operations proc_pid_numa_maps_operations = { + .open = pid_numa_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_numa_maps_operations = { + .open = tid_numa_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 980de54..bdfff69 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_file *m, int len) /* * display a single VMA to a sequenced file */ -static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) +static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, + int is_pid) { struct mm_struct *mm = vma->vm_mm; + struct proc_maps_private *priv = m->private; unsigned long ino = 0; struct file *file; dev_t dev = 0; @@ -168,8 +170,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) pad_len_spaces(m, len); seq_path(m, &file->f_path, ""); } else if (mm) { - if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + if (vm_is_stack(priv->task, vma, is_pid)) pad_len_spaces(m, len); seq_puts(m, "[stack]"); } @@ -182,11 +183,22 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) /* * display mapping lines for a particular process's /proc/pid/maps */ -static int show_map(struct seq_file *m, void *_p) +static int show_map(struct seq_file *m, void *_p, int is_pid) { struct rb_node *p = _p; - return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb)); + return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb), + is_pid); +} + +static int show_pid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 1); +} + +static int show_tid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 0); } static void *m_start(struct seq_file *m, loff_t *pos) @@ -240,10 +252,18 @@ static const struct seq_operations proc_pid_maps_ops = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map +}; + +static const struct seq_operations proc_tid_maps_ops = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map }; -static int maps_open(struct inode *inode, struct file *file) +static int maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct proc_maps_private *priv; int ret = -ENOMEM; @@ -251,7 +271,7 @@ static int maps_open(struct inode *inode, struct file *file) priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_maps_ops); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -262,8 +282,25 @@ static int maps_open(struct inode *inode, struct file *file) return ret; } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int pid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_pid_maps_ops); +} + +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_tid_maps_ops); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff --git a/include/linux/mm.h b/include/linux/mm.h index 17b27cd..b0fc583 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1040,6 +1040,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, !vma_growsup(vma->vm_next, addr); } +/* Check if the vma is being used as a stack by this task */ +static inline int vm_is_stack_for_task(struct task_struct *t, + struct vm_area_struct *vma) +{ + return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t)); +} + +extern int vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group); + extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len); diff --git a/mm/memory.c b/mm/memory.c index fa2f04e..601a920 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3909,6 +3909,28 @@ void print_vma_addr(char *prefix, unsigned long ip) up_read(¤t->mm->mmap_sem); } +/* + * Check if the vma is being used as a stack. + * If is_group is non-zero, check in the entire thread group or else + * just check in the current task. + */ +int vm_is_stack(struct task_struct *task, + struct vm_area_struct *vma, int in_group) +{ + if (vm_is_stack_for_task(task, vma)) + return 1; + + if (in_group) { + struct task_struct *t = task; + while_each_thread(task, t) { + if (vm_is_stack_for_task(t, vma)) + return 1; + } + } + + return 0; +} + #ifdef CONFIG_PROVE_LOCKING void might_fault(void) { -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-11 15:03 ` [PATCH] " Siddhesh Poyarekar @ 2012-02-21 4:24 ` Siddhesh Poyarekar 2012-02-22 23:00 ` Andrew Morton 2012-02-23 23:17 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps KOSAKI Motohiro 1 sibling, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-21 4:24 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, Mike Frysinger Hi, Resending patch. Regards, Siddhesh ---------- Forwarded message ---------- From: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Date: Sat, Feb 11, 2012 at 8:33 PM Subject: [PATCH] Mark thread stack correctly in proc/<pid>/maps To: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org, Jamie Lokier <jamie@shareable.org>, vapier@gentoo.org, Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Stack for a new thread is mapped by userspace code and passed via sys_clone. This memory is currently seen as anonymous in /proc/<pid>/maps, which makes it difficult to ascertain which mappings are being used for thread stacks. This patch uses the individual task stack pointers to determine which vmas are actually thread stacks. The display for maps, smaps and numa_maps is now different at the thread group (/proc/PID/maps) and thread (/proc/PID/task/TID/maps) levels. The idea is to give the mapping as the individual tasks see it in /proc/PID/task/TID/maps and then give an overview of the entire mm as it were, in /proc/PID/maps. At the thread group level, all vmas that are used as stacks are marked as such. At the thread level however, only the stack that the task in question uses is marked as such and all others (including the main stack) are marked as anonymous memory. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- Documentation/filesystems/proc.txt | 10 ++- fs/proc/base.c | 12 ++-- fs/proc/internal.h | 9 ++- fs/proc/task_mmu.c | 139 ++++++++++++++++++++++++++++++------ fs/proc/task_nommu.c | 57 ++++++++++++--- include/linux/mm.h | 9 +++ mm/memory.c | 22 ++++++ 7 files changed, 214 insertions(+), 44 deletions(-) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a76a26a..e0f9de3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) rsslim current limit in bytes on the rss start_code address above which program text can run end_code address below which program text can run - start_stack address of the start of the stack + start_stack address of the start of the main process stack esp current value of ESP eip current value of EIP pending bitmap of pending signals @@ -356,12 +356,18 @@ The "pathname" shows the name associated file for this mapping. If the mapping is not associated with a file: [heap] = the heap of the program - [stack] = the stack of the main process + [stack] = the mapping is used as a stack by one + of the threads of the process [vdso] = the "virtual dynamic shared object", the kernel system call handler or if empty, the mapping is anonymous. +The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint +of the individual tasks of a process. In this file you will see a mapping marked +as [stack] only if that task sees it as a stack. This is a key difference from +the content of /proc/PID/maps, where you will see all mappings that are being +used as stack by all of those tasks. The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff --git a/fs/proc/base.c b/fs/proc/base.c index d4548dd..558660a 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_pid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_tid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_tid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 2925775..c44efe1 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -53,9 +53,12 @@ extern int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); extern loff_t mem_lseek(struct file *file, loff_t offset, int orig); -extern const struct file_operations proc_maps_operations; -extern const struct file_operations proc_numa_maps_operations; -extern const struct file_operations proc_smaps_operations; +extern const struct file_operations proc_pid_maps_operations; +extern const struct file_operations proc_tid_maps_operations; +extern const struct file_operations proc_pid_numa_maps_operations; +extern const struct file_operations proc_tid_numa_maps_operations; +extern const struct file_operations proc_pid_smaps_operations; +extern const struct file_operations proc_tid_smaps_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; extern const struct file_operations proc_net_operations; diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 7dcd2a2..3e166f5 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -209,10 +209,12 @@ static int do_maps_open(struct inode *inode, struct file *file, return ret; } -static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) +static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) { struct mm_struct *mm = vma->vm_mm; struct file *file = vma->vm_file; + struct proc_maps_private *priv = m->private; + struct task_struct *task = priv->task; vm_flags_t flags = vma->vm_flags; unsigned long ino = 0; unsigned long long pgoff = 0; @@ -259,8 +261,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vm_is_stack(task, vma, is_pid)) { name = "[stack]"; } } else { @@ -275,13 +276,13 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) seq_putc(m, '\n'); } -static int show_map(struct seq_file *m, void *v) +static int show_map(struct seq_file *m, void *v, int is_pid) { struct vm_area_struct *vma = v; struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); if (m->count < m->size) /* vma is copied successfully */ m->version = (vma != get_gate_vma(task->mm)) @@ -289,20 +290,49 @@ static int show_map(struct seq_file *m, void *v) return 0; } +static int show_pid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 1); +} + +static int show_tid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 0); +} + static const struct seq_operations proc_pid_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map }; -static int maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map +}; + +static int pid_maps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_maps_op); } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_maps_op); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -422,7 +452,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, return 0; } -static int show_smap(struct seq_file *m, void *v) +static int show_smap(struct seq_file *m, void *v, int is_pid) { struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; @@ -440,7 +470,7 @@ static int show_smap(struct seq_file *m, void *v) if (vma->vm_mm && !is_vm_hugetlb_page(vma)) walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk); - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); seq_printf(m, "Size: %8lu kB\n" @@ -479,20 +509,49 @@ static int show_smap(struct seq_file *m, void *v) return 0; } +static int show_pid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 1); +} + +static int show_tid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 0); +} + static const struct seq_operations proc_pid_smaps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_smap + .show = show_pid_smap +}; + +static const struct seq_operations proc_tid_smaps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_smap }; -static int smaps_open(struct inode *inode, struct file *file) +static int pid_smaps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_smaps_op); } -const struct file_operations proc_smaps_operations = { - .open = smaps_open, +static int tid_smaps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_smaps_op); +} + +const struct file_operations proc_pid_smaps_operations = { + .open = pid_smaps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_smaps_operations = { + .open = tid_smaps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -1002,7 +1061,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, /* * Display pages allocated per node and memory policy via /proc. */ -static int show_numa_map(struct seq_file *m, void *v) +static int show_numa_map(struct seq_file *m, void *v, int is_pid) { struct numa_maps_private *numa_priv = m->private; struct proc_maps_private *proc_priv = &numa_priv->proc_maps; @@ -1039,8 +1098,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_path(m, &file->f_path, "\n\t= "); } else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { seq_printf(m, " heap"); - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vm_is_stack(proc_priv->task, vma, is_pid)) { seq_printf(m, " stack"); } @@ -1084,21 +1142,39 @@ out: return 0; } +static int show_pid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 1); +} + +static int show_tid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 0); +} + static const struct seq_operations proc_pid_numa_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_numa_map, + .show = show_pid_numa_map, }; -static int numa_maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_numa_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_numa_map, +}; + +static int numa_maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct numa_maps_private *priv; int ret = -ENOMEM; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->proc_maps.pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_numa_maps_op); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -1109,8 +1185,25 @@ static int numa_maps_open(struct inode *inode, struct file *file) return ret; } -const struct file_operations proc_numa_maps_operations = { - .open = numa_maps_open, +static int pid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_pid_numa_maps_op); +} + +static int tid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_tid_numa_maps_op); +} + +const struct file_operations proc_pid_numa_maps_operations = { + .open = pid_numa_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_numa_maps_operations = { + .open = tid_numa_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 980de54..bdfff69 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_file *m, int len) /* * display a single VMA to a sequenced file */ -static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) +static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, + int is_pid) { struct mm_struct *mm = vma->vm_mm; + struct proc_maps_private *priv = m->private; unsigned long ino = 0; struct file *file; dev_t dev = 0; @@ -168,8 +170,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) pad_len_spaces(m, len); seq_path(m, &file->f_path, ""); } else if (mm) { - if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + if (vm_is_stack(priv->task, vma, is_pid)) pad_len_spaces(m, len); seq_puts(m, "[stack]"); } @@ -182,11 +183,22 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) /* * display mapping lines for a particular process's /proc/pid/maps */ -static int show_map(struct seq_file *m, void *_p) +static int show_map(struct seq_file *m, void *_p, int is_pid) { struct rb_node *p = _p; - return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb)); + return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb), + is_pid); +} + +static int show_pid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 1); +} + +static int show_tid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 0); } static void *m_start(struct seq_file *m, loff_t *pos) @@ -240,10 +252,18 @@ static const struct seq_operations proc_pid_maps_ops = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map +}; + +static const struct seq_operations proc_tid_maps_ops = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map }; -static int maps_open(struct inode *inode, struct file *file) +static int maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct proc_maps_private *priv; int ret = -ENOMEM; @@ -251,7 +271,7 @@ static int maps_open(struct inode *inode, struct file *file) priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_maps_ops); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -262,8 +282,25 @@ static int maps_open(struct inode *inode, struct file *file) return ret; } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int pid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_pid_maps_ops); +} + +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_tid_maps_ops); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff --git a/include/linux/mm.h b/include/linux/mm.h index 17b27cd..b0fc583 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1040,6 +1040,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, !vma_growsup(vma->vm_next, addr); } +/* Check if the vma is being used as a stack by this task */ +static inline int vm_is_stack_for_task(struct task_struct *t, + struct vm_area_struct *vma) +{ + return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t)); +} + +extern int vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group); + extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len); diff --git a/mm/memory.c b/mm/memory.c index fa2f04e..601a920 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3909,6 +3909,28 @@ void print_vma_addr(char *prefix, unsigned long ip) up_read(¤t->mm->mmap_sem); } +/* + * Check if the vma is being used as a stack. + * If is_group is non-zero, check in the entire thread group or else + * just check in the current task. + */ +int vm_is_stack(struct task_struct *task, + struct vm_area_struct *vma, int in_group) +{ + if (vm_is_stack_for_task(task, vma)) + return 1; + + if (in_group) { + struct task_struct *t = task; + while_each_thread(task, t) { + if (vm_is_stack_for_task(t, vma)) + return 1; + } + } + + return 0; +} + #ifdef CONFIG_PROVE_LOCKING void might_fault(void) { -- 1.7.7.4 -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-21 4:24 ` [RESEND][PATCH] " Siddhesh Poyarekar @ 2012-02-22 23:00 ` Andrew Morton 2012-02-23 4:03 ` [PATCH] " Siddhesh Poyarekar 0 siblings, 1 reply; 42+ messages in thread From: Andrew Morton @ 2012-02-22 23:00 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, Mike Frysinger On Tue, 21 Feb 2012 09:54:04 +0530 Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> wrote: > Stack for a new thread is mapped by userspace code and passed via > sys_clone. This memory is currently seen as anonymous in > /proc/<pid>/maps, which makes it difficult to ascertain which mappings > are being used for thread stacks. This patch uses the individual task > stack pointers to determine which vmas are actually thread stacks. > > The display for maps, smaps and numa_maps is now different at the > thread group (/proc/PID/maps) and thread (/proc/PID/task/TID/maps) > levels. The idea is to give the mapping as the individual tasks see it > in /proc/PID/task/TID/maps and then give an overview of the entire mm > as it were, in /proc/PID/maps. > > At the thread group level, all vmas that are used as stacks are marked > as such. At the thread level however, only the stack that the task in > question uses is marked as such and all others (including the main > stack) are marked as anonymous memory. Please flesh this description out with specific examples of the before-and-after contents of all the applicable procfs files. This way we can clearly see the proposed interface changes, which is the thing we care most about with such a patch. The patch itself has been utterly and hopelessly mangled by gmail. Please fix that up when resending (as a last resort: use a text/plain attachment). ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-22 23:00 ` Andrew Morton @ 2012-02-23 4:03 ` Siddhesh Poyarekar 2012-02-23 20:22 ` Andrew Morton 2012-02-23 23:47 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps Mike Frysinger 0 siblings, 2 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-23 4:03 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, Mike Frysinger, Siddhesh Poyarekar Stack for a new thread is mapped by userspace code and passed via sys_clone. This memory is currently seen as anonymous in /proc/<pid>/maps, which makes it difficult to ascertain which mappings are being used for thread stacks. This patch uses the individual task stack pointers to determine which vmas are actually thread stacks. For a multithreaded program like the following: \#include <pthread.h> void *thread_main(void *foo) { while(1); } int main() { pthread_t t; pthread_create(&t, NULL, thread_main, NULL); pthread_join(t, NULL); } proc/PID/maps looks like the following: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but that is not always a reliable way to find out which vma is a thread stack. Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same content. With this patch in place, /proc/PID/task/TID/maps are treated as 'maps as the task would see it' and hence, only the vma that that task uses as stack is marked as [stack]. All other 'stack' vmas are marked as anonymous memory. /proc/PID/maps acts as a thread group level view, where all stack vmas are marked. So /proc/PID/maps will look like this: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Thus marking all vmas that are used as stacks in the thread group. The task level maps will however look like this: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] where only the vma that is being used as a stack by *that* task is marked as [stack]. Analogous changes have been made to /proc/PID/smaps, /proc/PID/numa_maps, /proc/PID/task/TID/smaps and /proc/PID/task/TID/numa_maps. Relevant snippets from smaps and numa_maps: [siddhesh@localhost ~ ]$ pgrep a.out 1441 [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack\]" 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack\]" 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack\]" 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack" 7f8a44492000 default stack anon=2 dirty=2 N0=2 7fff6273a000 default stack anon=3 dirty=3 N0=3 [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack" 7f8a44492000 default stack anon=2 dirty=2 N0=2 [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack" 7fff6273a000 default stack anon=3 dirty=3 N0=3 Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- Documentation/filesystems/proc.txt | 10 ++- fs/proc/base.c | 12 ++-- fs/proc/internal.h | 9 ++- fs/proc/task_mmu.c | 139 ++++++++++++++++++++++++++++++------ fs/proc/task_nommu.c | 57 ++++++++++++--- include/linux/mm.h | 9 +++ mm/memory.c | 22 ++++++ 7 files changed, 214 insertions(+), 44 deletions(-) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a76a26a..e0f9de3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) rsslim current limit in bytes on the rss start_code address above which program text can run end_code address below which program text can run - start_stack address of the start of the stack + start_stack address of the start of the main process stack esp current value of ESP eip current value of EIP pending bitmap of pending signals @@ -356,12 +356,18 @@ The "pathname" shows the name associated file for this mapping. If the mapping is not associated with a file: [heap] = the heap of the program - [stack] = the stack of the main process + [stack] = the mapping is used as a stack by one + of the threads of the process [vdso] = the "virtual dynamic shared object", the kernel system call handler or if empty, the mapping is anonymous. +The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint +of the individual tasks of a process. In this file you will see a mapping marked +as [stack] only if that task sees it as a stack. This is a key difference from +the content of /proc/PID/maps, where you will see all mappings that are being +used as stack by all of those tasks. The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff --git a/fs/proc/base.c b/fs/proc/base.c index d4548dd..558660a 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_pid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_stuff[] = { INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_tid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_tid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 2925775..c44efe1 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -53,9 +53,12 @@ extern int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); extern loff_t mem_lseek(struct file *file, loff_t offset, int orig); -extern const struct file_operations proc_maps_operations; -extern const struct file_operations proc_numa_maps_operations; -extern const struct file_operations proc_smaps_operations; +extern const struct file_operations proc_pid_maps_operations; +extern const struct file_operations proc_tid_maps_operations; +extern const struct file_operations proc_pid_numa_maps_operations; +extern const struct file_operations proc_tid_numa_maps_operations; +extern const struct file_operations proc_pid_smaps_operations; +extern const struct file_operations proc_tid_smaps_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; extern const struct file_operations proc_net_operations; diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 7dcd2a2..3e166f5 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -209,10 +209,12 @@ static int do_maps_open(struct inode *inode, struct file *file, return ret; } -static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) +static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) { struct mm_struct *mm = vma->vm_mm; struct file *file = vma->vm_file; + struct proc_maps_private *priv = m->private; + struct task_struct *task = priv->task; vm_flags_t flags = vma->vm_flags; unsigned long ino = 0; unsigned long long pgoff = 0; @@ -259,8 +261,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vm_is_stack(task, vma, is_pid)) { name = "[stack]"; } } else { @@ -275,13 +276,13 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) seq_putc(m, '\n'); } -static int show_map(struct seq_file *m, void *v) +static int show_map(struct seq_file *m, void *v, int is_pid) { struct vm_area_struct *vma = v; struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); if (m->count < m->size) /* vma is copied successfully */ m->version = (vma != get_gate_vma(task->mm)) @@ -289,20 +290,49 @@ static int show_map(struct seq_file *m, void *v) return 0; } +static int show_pid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 1); +} + +static int show_tid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 0); +} + static const struct seq_operations proc_pid_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map }; -static int maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map +}; + +static int pid_maps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_maps_op); } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_maps_op); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -422,7 +452,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, return 0; } -static int show_smap(struct seq_file *m, void *v) +static int show_smap(struct seq_file *m, void *v, int is_pid) { struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; @@ -440,7 +470,7 @@ static int show_smap(struct seq_file *m, void *v) if (vma->vm_mm && !is_vm_hugetlb_page(vma)) walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk); - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); seq_printf(m, "Size: %8lu kB\n" @@ -479,20 +509,49 @@ static int show_smap(struct seq_file *m, void *v) return 0; } +static int show_pid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 1); +} + +static int show_tid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 0); +} + static const struct seq_operations proc_pid_smaps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_smap + .show = show_pid_smap +}; + +static const struct seq_operations proc_tid_smaps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_smap }; -static int smaps_open(struct inode *inode, struct file *file) +static int pid_smaps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_smaps_op); } -const struct file_operations proc_smaps_operations = { - .open = smaps_open, +static int tid_smaps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_smaps_op); +} + +const struct file_operations proc_pid_smaps_operations = { + .open = pid_smaps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_smaps_operations = { + .open = tid_smaps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -1002,7 +1061,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, /* * Display pages allocated per node and memory policy via /proc. */ -static int show_numa_map(struct seq_file *m, void *v) +static int show_numa_map(struct seq_file *m, void *v, int is_pid) { struct numa_maps_private *numa_priv = m->private; struct proc_maps_private *proc_priv = &numa_priv->proc_maps; @@ -1039,8 +1098,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_path(m, &file->f_path, "\n\t= "); } else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { seq_printf(m, " heap"); - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + } else if (vm_is_stack(proc_priv->task, vma, is_pid)) { seq_printf(m, " stack"); } @@ -1084,21 +1142,39 @@ out: return 0; } +static int show_pid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 1); +} + +static int show_tid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 0); +} + static const struct seq_operations proc_pid_numa_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_numa_map, + .show = show_pid_numa_map, }; -static int numa_maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_numa_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_numa_map, +}; + +static int numa_maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct numa_maps_private *priv; int ret = -ENOMEM; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->proc_maps.pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_numa_maps_op); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -1109,8 +1185,25 @@ static int numa_maps_open(struct inode *inode, struct file *file) return ret; } -const struct file_operations proc_numa_maps_operations = { - .open = numa_maps_open, +static int pid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_pid_numa_maps_op); +} + +static int tid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_tid_numa_maps_op); +} + +const struct file_operations proc_pid_numa_maps_operations = { + .open = pid_numa_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_numa_maps_operations = { + .open = tid_numa_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 980de54..bdfff69 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_file *m, int len) /* * display a single VMA to a sequenced file */ -static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) +static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, + int is_pid) { struct mm_struct *mm = vma->vm_mm; + struct proc_maps_private *priv = m->private; unsigned long ino = 0; struct file *file; dev_t dev = 0; @@ -168,8 +170,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) pad_len_spaces(m, len); seq_path(m, &file->f_path, ""); } else if (mm) { - if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + if (vm_is_stack(priv->task, vma, is_pid)) pad_len_spaces(m, len); seq_puts(m, "[stack]"); } @@ -182,11 +183,22 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) /* * display mapping lines for a particular process's /proc/pid/maps */ -static int show_map(struct seq_file *m, void *_p) +static int show_map(struct seq_file *m, void *_p, int is_pid) { struct rb_node *p = _p; - return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb)); + return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb), + is_pid); +} + +static int show_pid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 1); +} + +static int show_tid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 0); } static void *m_start(struct seq_file *m, loff_t *pos) @@ -240,10 +252,18 @@ static const struct seq_operations proc_pid_maps_ops = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map +}; + +static const struct seq_operations proc_tid_maps_ops = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map }; -static int maps_open(struct inode *inode, struct file *file) +static int maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct proc_maps_private *priv; int ret = -ENOMEM; @@ -251,7 +271,7 @@ static int maps_open(struct inode *inode, struct file *file) priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_maps_ops); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -262,8 +282,25 @@ static int maps_open(struct inode *inode, struct file *file) return ret; } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int pid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_pid_maps_ops); +} + +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_tid_maps_ops); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff --git a/include/linux/mm.h b/include/linux/mm.h index 17b27cd..b0fc583 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1040,6 +1040,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, !vma_growsup(vma->vm_next, addr); } +/* Check if the vma is being used as a stack by this task */ +static inline int vm_is_stack_for_task(struct task_struct *t, + struct vm_area_struct *vma) +{ + return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t)); +} + +extern int vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group); + extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len); diff --git a/mm/memory.c b/mm/memory.c index fa2f04e..601a920 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3909,6 +3909,28 @@ void print_vma_addr(char *prefix, unsigned long ip) up_read(¤t->mm->mmap_sem); } +/* + * Check if the vma is being used as a stack. + * If is_group is non-zero, check in the entire thread group or else + * just check in the current task. + */ +int vm_is_stack(struct task_struct *task, + struct vm_area_struct *vma, int in_group) +{ + if (vm_is_stack_for_task(task, vma)) + return 1; + + if (in_group) { + struct task_struct *t = task; + while_each_thread(task, t) { + if (vm_is_stack_for_task(t, vma)) + return 1; + } + } + + return 0; +} + #ifdef CONFIG_PROVE_LOCKING void might_fault(void) { -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-23 4:03 ` [PATCH] " Siddhesh Poyarekar @ 2012-02-23 20:22 ` Andrew Morton 2012-02-24 13:05 ` Siddhesh Poyarekar 2012-02-23 23:47 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps Mike Frysinger 1 sibling, 1 reply; 42+ messages in thread From: Andrew Morton @ 2012-02-23 20:22 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, Mike Frysinger On Thu, 23 Feb 2012 09:33:31 +0530 Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> wrote: > With this patch in place, /proc/PID/task/TID/maps are treated as 'maps > as the task would see it' and hence, only the vma that that task uses > as stack is marked as [stack]. All other 'stack' vmas are marked as > anonymous memory. /proc/PID/maps acts as a thread group level view, > where all stack vmas are marked. Looks OK to me, thanks. I doubt if those interface changes will cause significant disruption. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-23 20:22 ` Andrew Morton @ 2012-02-24 13:05 ` Siddhesh Poyarekar 2012-02-26 16:17 ` [PATCH] x86_64: Record stack pointer before task execution begins Siddhesh Poyarekar 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-24 13:05 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, Mike Frysinger On Fri, Feb 24, 2012 at 1:52 AM, Andrew Morton <akpm@linux-foundation.org> wrote: > Looks OK to me, thanks. I doubt if those interface changes will cause > significant disruption. > I just found one breakage due to this patch: `cat /proc/self/maps` does not always get the stack marked right. I think this is because it gets the $esp a little to early, even before the vma is sent to its randomized space. That is why /proc/self/smaps works just ok as it always wins the race due to the sheer volume of data it prints. Similarly numa_maps always fails since its write volume is lower than maps. I'll try to fix this. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH] x86_64: Record stack pointer before task execution begins 2012-02-24 13:05 ` Siddhesh Poyarekar @ 2012-02-26 16:17 ` Siddhesh Poyarekar 2012-02-27 6:17 ` [tip:x86/process] " tip-bot for Siddhesh Poyarekar 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-26 16:17 UTC (permalink / raw) To: x86 Cc: KOSAKI Motohiro, linux-kernel, Jamie Lokier, Mike Frysinger, Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Siddhesh Poyarekar task->thread.usersp is unusable immediately after a binary is exec()'d until it undergoes a context switch cycle. The start_thread() function called during execve() saves the stack pointer into pt_regs and into old_rsp, but fails to record it into task->thread.usersp. Because of this, KSTK_ESP(task) returns an incorrect value for a 64-bit program until the task is switched out and back in since switch_to swaps %rsp values in and out into task->thread.usersp. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- arch/x86/kernel/process_64.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 9b9fe4a..702a3b9 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -341,6 +341,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, loadsegment(es, _ds); loadsegment(ds, _ds); load_gs_index(0); + current->thread.usersp = new_sp; regs->ip = new_ip; regs->sp = new_sp; percpu_write(old_rsp, new_sp); -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [tip:x86/process] x86_64: Record stack pointer before task execution begins 2012-02-26 16:17 ` [PATCH] x86_64: Record stack pointer before task execution begins Siddhesh Poyarekar @ 2012-02-27 6:17 ` tip-bot for Siddhesh Poyarekar 0 siblings, 0 replies; 42+ messages in thread From: tip-bot for Siddhesh Poyarekar @ 2012-02-27 6:17 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, siddhesh.poyarekar, tglx Commit-ID: 42dfc43ee5999ac64284476ea0ac6c937587cf2b Gitweb: http://git.kernel.org/tip/42dfc43ee5999ac64284476ea0ac6c937587cf2b Author: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> AuthorDate: Sun, 26 Feb 2012 21:47:55 +0530 Committer: H. Peter Anvin <hpa@zytor.com> CommitDate: Sun, 26 Feb 2012 12:59:04 -0800 x86_64: Record stack pointer before task execution begins task->thread.usersp is unusable immediately after a binary is exec()'d until it undergoes a context switch cycle. The start_thread() function called during execve() saves the stack pointer into pt_regs and into old_rsp, but fails to record it into task->thread.usersp. Because of this, KSTK_ESP(task) returns an incorrect value for a 64-bit program until the task is switched out and back in since switch_to swaps %rsp values in and out into task->thread.usersp. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Link: http://lkml.kernel.org/r/1330273075-2949-1-git-send-email-siddhesh.poyarekar@gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> --- arch/x86/kernel/process_64.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 1fd94bc..eb54dd0 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -341,6 +341,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, loadsegment(es, _ds); loadsegment(ds, _ds); load_gs_index(0); + current->thread.usersp = new_sp; regs->ip = new_ip; regs->sp = new_sp; percpu_write(old_rsp, new_sp); ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-23 4:03 ` [PATCH] " Siddhesh Poyarekar 2012-02-23 20:22 ` Andrew Morton @ 2012-02-23 23:47 ` Mike Frysinger 2012-02-24 5:47 ` Siddhesh Poyarekar 1 sibling, 1 reply; 42+ messages in thread From: Mike Frysinger @ 2012-02-23 23:47 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier [-- Attachment #1: Type: Text/Plain, Size: 545 bytes --] On Wednesday 22 February 2012 23:03:31 Siddhesh Poyarekar wrote: > With this patch in place, /proc/PID/task/TID/maps are treated as 'maps > as the task would see it' and hence, only the vma that that task uses > as stack is marked as [stack]. All other 'stack' vmas are marked as > anonymous memory. /proc/PID/maps acts as a thread group level view, > where all stack vmas are marked. i don't suppose we could have it say "[tid stack]" rather than "[stack]" ? or perhaps even "[stack tid:%u]" with replacing %u with the tid ? -mike [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-23 23:47 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps Mike Frysinger @ 2012-02-24 5:47 ` Siddhesh Poyarekar 2012-02-24 16:12 ` Mike Frysinger 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-24 5:47 UTC (permalink / raw) To: Mike Frysinger Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier On Fri, Feb 24, 2012 at 5:17 AM, Mike Frysinger <vapier@gentoo.org> wrote: > i don't suppose we could have it say "[tid stack]" rather than "[stack]" ? or > perhaps even "[stack tid:%u]" with replacing %u with the tid ? Why do we need to differentiate a thread stack from a process stack? If someone really wants to know, the main stack is the last one since it doesn't look like mmap allocates anything above the stack right now. I like the idea of marking all stack vmas with their task ids but it will most likely break procps. Besides, I think it could be done within procps with this change rather than having the kernel do it. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-24 5:47 ` Siddhesh Poyarekar @ 2012-02-24 16:12 ` Mike Frysinger 2012-02-24 18:23 ` Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Siddhesh Poyarekar 0 siblings, 2 replies; 42+ messages in thread From: Mike Frysinger @ 2012-02-24 16:12 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier [-- Attachment #1: Type: Text/Plain, Size: 1055 bytes --] On Friday 24 February 2012 00:47:48 Siddhesh Poyarekar wrote: > On Fri, Feb 24, 2012 at 5:17 AM, Mike Frysinger wrote: > > i don't suppose we could have it say "[tid stack]" rather than "[stack]" > > ? or perhaps even "[stack tid:%u]" with replacing %u with the tid ? > > Why do we need to differentiate a thread stack from a process stack? if it's trivial to display, it'd be nice to coordinate things when investigating issues > If someone really wants to know, the main stack is the last one since > it doesn't look like mmap allocates anything above the stack right > now. you can't rely on that. you're describing arch-specific details that happen to work. > I like the idea of marking all stack vmas with their task ids but it > will most likely break procps. how ? > Besides, I think it could be done within procps with this change rather than > having the kernel do it. how exactly is procps supposed to figure this out ? /proc/<pid>/maps shows the pid's main stack, as does /proc/<pid>/tid/*/maps. -mike [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-24 16:12 ` Mike Frysinger @ 2012-02-24 18:23 ` Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Siddhesh Poyarekar 1 sibling, 0 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-24 18:23 UTC (permalink / raw) To: Mike Frysinger Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier On Fri, Feb 24, 2012 at 9:42 PM, Mike Frysinger <vapier@gentoo.org> wrote: >> I like the idea of marking all stack vmas with their task ids but it >> will most likely break procps. > > how ? I don't know yet, since I haven't looked at the procps code. I intend to do that once the patch is stable. But I imagine it would look for "[stack]" or something similar in the output. It ought to be easy enough to change I guess. >> Besides, I think it could be done within procps with this change rather than >> having the kernel do it. > > how exactly is procps supposed to figure this out ? /proc/<pid>/maps shows the > pid's main stack, as does /proc/<pid>/tid/*/maps. Since the maps are essentially the same, it would require pmap for example, to read through the PID/maps as well as TID/maps and associate them. I understand now that this may be a little racy. I'll include thread ids and see how procps copes with it. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 1/2] Take rcu read lock when iterating through thread group 2012-02-24 16:12 ` Mike Frysinger 2012-02-24 18:23 ` Siddhesh Poyarekar @ 2012-03-01 5:20 ` Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 2/2] procfs: Mark stack vma with pid of the owning task Siddhesh Poyarekar ` (2 more replies) 1 sibling, 3 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-03-01 5:20 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, linux-kernel, Oleg Nesterov, Siddhesh Poyarekar Protect the iteration through thread group with rcu_read_lock when looking for tasks in the group that use the current vma as stack. Thanks KOSAKI Motohiro (kosaki.motohiro@gmail.com) for pointing it out. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- mm/memory.c | 15 +++++++++++---- 1 files changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 601a920..a88b764 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3915,20 +3915,27 @@ void print_vma_addr(char *prefix, unsigned long ip) * just check in the current task. */ int vm_is_stack(struct task_struct *task, - struct vm_area_struct *vma, int in_group) + struct vm_area_struct *vma, int in_group) { + int ret = 0; + if (vm_is_stack_for_task(task, vma)) return 1; if (in_group) { struct task_struct *t = task; + rcu_read_lock(); while_each_thread(task, t) { - if (vm_is_stack_for_task(t, vma)) - return 1; + if (vm_is_stack_for_task(t, vma)) { + ret = 1; + goto done; + } } } - return 0; +done: + rcu_read_unlock(); + return ret; } #ifdef CONFIG_PROVE_LOCKING -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 2/2] procfs: Mark stack vma with pid of the owning task 2012-03-01 5:20 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Siddhesh Poyarekar @ 2012-03-01 5:20 ` Siddhesh Poyarekar 2012-03-01 23:17 ` Andrew Morton 2012-03-01 16:51 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Oleg Nesterov 2012-03-01 23:21 ` Andrew Morton 2 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-03-01 5:20 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, linux-kernel, Oleg Nesterov, Siddhesh Poyarekar This is a usability tweak to the earlier patch to mark thread stack vmas based on which vma a task is using as stack. With this patch, each marked stack is also marked with the pid of the task to make things that little bit easier while troubleshooting issues. I have tested to confirm that this does not break current procps behaviour. Thanks Mike Frysinger <vapier@gentoo.org> for the idea. An example output of cat /proc/self/maps with this patch: $ cat /proc/self/maps 00400000-0040b000 r-xp 00000000 fd:00 1048598 /bin/cat 0060a000-0060b000 r--p 0000a000 fd:00 1048598 /bin/cat 0060b000-0060c000 rw-p 0000b000 fd:00 1048598 /bin/cat 02370000-02391000 rw-p 00000000 00:00 0 [heap] 7fc8fbf93000-7fc9023b6000 r--p 00000000 fd:00 1714381 /usr/lib/locale/locale-archive 7fc9023b6000-7fc902561000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7fc902561000-7fc902761000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7fc902761000-7fc902765000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7fc902765000-7fc902767000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7fc902767000-7fc90276c000 rw-p 00000000 00:00 0 7fc90276c000-7fc90278e000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7fc90295f000-7fc902962000 rw-p 00000000 00:00 0 7fc90298c000-7fc90298d000 rw-p 00000000 00:00 0 7fc90298d000-7fc90298e000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7fc90298e000-7fc90298f000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7fc90298f000-7fc902990000 rw-p 00000000 00:00 0 7fff5fb8d000-7fff5fbae000 rw-p 00000000 00:00 0 [stack:2663] 7fff5fbff000-7fff5fc00000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- Documentation/filesystems/proc.txt | 12 ++++++------ fs/proc/task_mmu.c | 16 ++++++++++++---- fs/proc/task_nommu.c | 5 +++-- include/linux/mm.h | 2 +- mm/memory.c | 13 +++++++------ 5 files changed, 29 insertions(+), 19 deletions(-) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index e0f9de3..a31434e 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -337,7 +337,7 @@ a8024000-a8027000 rw-p 00000000 00:00 0 a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2 a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2 a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2 -aff35000-aff4a000 rw-p 00000000 00:00 0 [stack] +aff35000-aff4a000 rw-p 00000000 00:00 0 [stack:1001] ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso] where "address" is the address space in the process that it occupies, "perms" @@ -356,8 +356,8 @@ The "pathname" shows the name associated file for this mapping. If the mapping is not associated with a file: [heap] = the heap of the program - [stack] = the mapping is used as a stack by one - of the threads of the process + [stack:1001] = the mapping is used as a stack by the thread + with tid 1001 [vdso] = the "virtual dynamic shared object", the kernel system call handler @@ -365,9 +365,9 @@ is not associated with a file: The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint of the individual tasks of a process. In this file you will see a mapping marked -as [stack] only if that task sees it as a stack. This is a key difference from -the content of /proc/PID/maps, where you will see all mappings that are being -used as stack by all of those tasks. +as [stack:TID] only if that task sees it as a stack. This is a key difference +from the content of /proc/PID/maps, where you will see all mappings that are +being used as stack by all of those tasks. The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f94a12a..8e32e28 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -262,8 +262,14 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]"; - } else if (vm_is_stack(task, vma, is_pid)) { - name = "[stack]"; + } else { + pid_t tid = + vm_is_stack(task, vma, is_pid); + if (tid != 0) { + pad_len_spaces(m, len); + seq_printf(m, "[stack:%d]", + tid); + } } } else { name = "[vdso]"; @@ -1099,8 +1105,10 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid) seq_path(m, &file->f_path, "\n\t= "); } else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { seq_printf(m, " heap"); - } else if (vm_is_stack(proc_priv->task, vma, is_pid)) { - seq_printf(m, " stack"); + } else { + pid_t tid = vm_is_stack(proc_priv->task, vma, is_pid); + if (tid != 0) + seq_printf(m, " stack:%d", tid); } if (is_vm_hugetlb_page(vma)) diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index bdfff69..8aaba8c 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -170,9 +170,10 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, pad_len_spaces(m, len); seq_path(m, &file->f_path, ""); } else if (mm) { - if (vm_is_stack(priv->task, vma, is_pid)) + pid_t tid = vm_is_stack(priv->task, vma, is_pid); + if (tid != 0) { pad_len_spaces(m, len); - seq_puts(m, "[stack]"); + seq_printf(m, "[stack:%d]", tid); } } diff --git a/include/linux/mm.h b/include/linux/mm.h index 58d47ae..53c3e75 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1040,7 +1040,7 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma, !vma_growsup(vma->vm_next, addr); } -extern int +extern pid_t vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group); extern unsigned long move_page_tables(struct vm_area_struct *vma, diff --git a/mm/memory.c b/mm/memory.c index d44b180..2533d9f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3919,22 +3919,23 @@ void print_vma_addr(char *prefix, unsigned long ip) /* * Check if the vma is being used as a stack. * If is_group is non-zero, check in the entire thread group or else - * just check in the current task. + * just check in the current task. Returns the pid of the task that + * the vma is stack for. */ -int vm_is_stack(struct task_struct *task, - struct vm_area_struct *vma, int in_group) +pid_t vm_is_stack(struct task_struct *task, + struct vm_area_struct *vma, int in_group) { - int ret = 0; + pid_t ret = 0; if (vm_is_stack_for_task(task, vma)) - return 1; + return task->pid; if (in_group) { struct task_struct *t = task; rcu_read_lock(); while_each_thread(task, t) { if (vm_is_stack_for_task(t, vma)) { - ret = 1; + ret = t->pid; goto done; } } -- 1.7.7.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 2/2] procfs: Mark stack vma with pid of the owning task 2012-03-01 5:20 ` [PATCH 2/2] procfs: Mark stack vma with pid of the owning task Siddhesh Poyarekar @ 2012-03-01 23:17 ` Andrew Morton 0 siblings, 0 replies; 42+ messages in thread From: Andrew Morton @ 2012-03-01 23:17 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, linux-kernel, Oleg Nesterov On Thu, 1 Mar 2012 10:50:59 +0530 Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> wrote: > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -262,8 +262,14 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) > if (vma->vm_start <= mm->brk && > vma->vm_end >= mm->start_brk) { > name = "[heap]"; > - } else if (vm_is_stack(task, vma, is_pid)) { > - name = "[stack]"; > + } else { > + pid_t tid = > + vm_is_stack(task, vma, is_pid); > + if (tid != 0) { > + pad_len_spaces(m, len); > + seq_printf(m, "[stack:%d]", > + tid); > + } Well, the 80-column police police will get you there, and it is pretty silly-looking. We can avoid the first line break by doing pid_t tid; tid = vm_is_stack(task, vma, is_pid); nice and easy! And we can avoid the other by saving a whole tabstop: --- a/fs/proc/task_mmu.c~procfs-mark-stack-vma-with-pid-of-the-owning-task-fix +++ a/fs/proc/task_mmu.c @@ -222,6 +222,7 @@ show_map_vma(struct seq_file *m, struct unsigned long start, end; dev_t dev = 0; int len; + const char *name; if (file) { struct inode *inode = vma->vm_file->f_path.dentry->d_inode; @@ -255,31 +256,33 @@ show_map_vma(struct seq_file *m, struct if (file) { pad_len_spaces(m, len); seq_path(m, &file->f_path, "\n"); - } else { - const char *name = arch_vma_name(vma); - if (!name) { - if (mm) { - if (vma->vm_start <= mm->brk && - vma->vm_end >= mm->start_brk) { - name = "[heap]"; - } else { - pid_t tid = - vm_is_stack(task, vma, is_pid); - if (tid != 0) { - pad_len_spaces(m, len); - seq_printf(m, "[stack:%d]", - tid); - } - } + goto out; + } + + name = arch_vma_name(vma); + if (!name) { + if (mm) { + if (vma->vm_start <= mm->brk && + vma->vm_end >= mm->start_brk) { + name = "[heap]"; } else { - name = "[vdso]"; + pid_t tid; + + tid = vm_is_stack(task, vma, is_pid); + if (tid != 0) { + pad_len_spaces(m, len); + seq_printf(m, "[stack:%d]", tid); + } } - } - if (name) { - pad_len_spaces(m, len); - seq_puts(m, name); + } else { + name = "[vdso]"; } } + if (name) { + pad_len_spaces(m, len); + seq_puts(m, name); + } +out: seq_putc(m, '\n'); } Which is not the prettiest thing in the world, but it gets the job done. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] Take rcu read lock when iterating through thread group 2012-03-01 5:20 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 2/2] procfs: Mark stack vma with pid of the owning task Siddhesh Poyarekar @ 2012-03-01 16:51 ` Oleg Nesterov 2012-03-01 23:21 ` Andrew Morton 2 siblings, 0 replies; 42+ messages in thread From: Oleg Nesterov @ 2012-03-01 16:51 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Andrew Morton, KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, linux-kernel On 03/01, Siddhesh Poyarekar wrote: > > Protect the iteration through thread group with rcu_read_lock when > looking for tasks in the group that use the current vma as > stack. Thanks KOSAKI Motohiro (kosaki.motohiro@gmail.com) for pointing > it out. > > Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> > --- > mm/memory.c | 15 +++++++++++---- > 1 files changed, 11 insertions(+), 4 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 601a920..a88b764 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3915,20 +3915,27 @@ void print_vma_addr(char *prefix, unsigned long ip) > * just check in the current task. > */ > int vm_is_stack(struct task_struct *task, > - struct vm_area_struct *vma, int in_group) > + struct vm_area_struct *vma, int in_group) > { > + int ret = 0; > + > if (vm_is_stack_for_task(task, vma)) > return 1; > > if (in_group) { > struct task_struct *t = task; > + rcu_read_lock(); > while_each_thread(task, t) { This is the commont mistake. rcu_read_lock() can not help unless you verify that ->thread_group.next still points to the rcu-protected memory. Just suppose that this task exits, then next_thread() exits too. Now you take rcu_read_lock() but it is too late, ->next points to nowhere. Also. In fact while_each_thread() is not safe under rcu. We are going to fix this, but only for the case when while_each_thread() starts at the thread group leader. Oleg. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] Take rcu read lock when iterating through thread group 2012-03-01 5:20 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 2/2] procfs: Mark stack vma with pid of the owning task Siddhesh Poyarekar 2012-03-01 16:51 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Oleg Nesterov @ 2012-03-01 23:21 ` Andrew Morton 2012-03-04 20:04 ` Siddhesh Poyarekar 2 siblings, 1 reply; 42+ messages in thread From: Andrew Morton @ 2012-03-01 23:21 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, linux-kernel, Oleg Nesterov On Thu, 1 Mar 2012 10:50:58 +0530 Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> wrote: > Protect the iteration through thread group with rcu_read_lock when > looking for tasks in the group that use the current vma as > stack. Thanks KOSAKI Motohiro (kosaki.motohiro@gmail.com) for pointing > it out. It will help reviwers a lot if you tell them that this is a fix again a patch which is presently in -mm (and linux-next). This patch has got a bit out of control. I've folded all six patches back into a single one, which is below. The changelog which I have is now a little out of date - can you please send an updated one? From: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Subject: procfs: mark thread stack correctly in proc/<pid>/maps Stack for a new thread is mapped by userspace code and passed via sys_clone. This memory is currently seen as anonymous in /proc/<pid>/maps, which makes it difficult to ascertain which mappings are being used for thread stacks. This patch uses the individual task stack pointers to determine which vmas are actually thread stacks. For a multithreaded program like the following: #include <pthread.h> void *thread_main(void *foo) { while(1); } int main() { pthread_t t; pthread_create(&t, NULL, thread_main, NULL); pthread_join(t, NULL); } proc/PID/maps looks like the following: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but that is not always a reliable way to find out which vma is a thread stack. Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same content. With this patch in place, /proc/PID/task/TID/maps are treated as 'maps as the task would see it' and hence, only the vma that that task uses as stack is marked as [stack]. All other 'stack' vmas are marked as anonymous memory. /proc/PID/maps acts as a thread group level view, where all stack vmas are marked. So /proc/PID/maps will look like this: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Thus marking all vmas that are used as stacks in the thread group. The task level maps will however look like this: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] where only the vma that is being used as a stack by *that* task is marked as [stack]. Analogous changes have been made to /proc/PID/smaps, /proc/PID/numa_maps, /proc/PID/task/TID/smaps and /proc/PID/task/TID/numa_maps. Relevant snippets from smaps and numa_maps: [siddhesh@localhost ~ ]$ pgrep a.out 1441 [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack\]" 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack\]" 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack] [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack\]" 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack" 7f8a44492000 default stack anon=2 dirty=2 N0=2 7fff6273a000 default stack anon=3 dirty=3 N0=3 [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack" 7f8a44492000 default stack anon=2 dirty=2 N0=2 [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack" 7fff6273a000 default stack anon=3 dirty=3 N0=3 [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix build] Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jamie Lokier <jamie@shareable.org> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/filesystems/proc.txt | 12 + fs/proc/base.c | 12 - fs/proc/internal.h | 9 - fs/proc/task_mmu.c | 187 +++++++++++++++++++++------ fs/proc/task_nommu.c | 60 +++++++- include/linux/mm.h | 3 mm/memory.c | 37 +++++ 7 files changed, 256 insertions(+), 64 deletions(-) diff -puN Documentation/filesystems/proc.txt~procfs-mark-thread-stack-correctly-in-proc-pid-maps Documentation/filesystems/proc.txt --- a/Documentation/filesystems/proc.txt~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/Documentation/filesystems/proc.txt @@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (a rsslim current limit in bytes on the rss start_code address above which program text can run end_code address below which program text can run - start_stack address of the start of the stack + start_stack address of the start of the main process stack esp current value of ESP eip current value of EIP pending bitmap of pending signals @@ -337,7 +337,7 @@ a8024000-a8027000 rw-p 00000000 00:00 0 a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2 a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2 a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2 -aff35000-aff4a000 rw-p 00000000 00:00 0 [stack] +aff35000-aff4a000 rw-p 00000000 00:00 0 [stack:1001] ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso] where "address" is the address space in the process that it occupies, "perms" @@ -356,12 +356,18 @@ The "pathname" shows the name associated is not associated with a file: [heap] = the heap of the program - [stack] = the stack of the main process + [stack:1001] = the mapping is used as a stack by the thread + with tid 1001 [vdso] = the "virtual dynamic shared object", the kernel system call handler or if empty, the mapping is anonymous. +The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint +of the individual tasks of a process. In this file you will see a mapping marked +as [stack:TID] only if that task sees it as a stack. This is a key difference +from the content of /proc/PID/maps, where you will see all mappings that are +being used as stack by all of those tasks. The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff -puN fs/proc/base.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/base.c --- a/fs/proc/base.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/base.c @@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_ INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_pid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_ REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_s INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_tid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_s REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_tid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY diff -puN fs/proc/internal.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/internal.h --- a/fs/proc/internal.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/internal.h @@ -56,9 +56,12 @@ extern int proc_pid_statm(struct seq_fil struct pid *pid, struct task_struct *task); extern loff_t mem_lseek(struct file *file, loff_t offset, int orig); -extern const struct file_operations proc_maps_operations; -extern const struct file_operations proc_numa_maps_operations; -extern const struct file_operations proc_smaps_operations; +extern const struct file_operations proc_pid_maps_operations; +extern const struct file_operations proc_tid_maps_operations; +extern const struct file_operations proc_pid_numa_maps_operations; +extern const struct file_operations proc_tid_numa_maps_operations; +extern const struct file_operations proc_pid_smaps_operations; +extern const struct file_operations proc_tid_smaps_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; extern const struct file_operations proc_net_operations; diff -puN fs/proc/task_mmu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/task_mmu.c --- a/fs/proc/task_mmu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/task_mmu.c @@ -209,16 +209,20 @@ static int do_maps_open(struct inode *in return ret; } -static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) +static void +show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) { struct mm_struct *mm = vma->vm_mm; struct file *file = vma->vm_file; + struct proc_maps_private *priv = m->private; + struct task_struct *task = priv->task; vm_flags_t flags = vma->vm_flags; unsigned long ino = 0; unsigned long long pgoff = 0; unsigned long start, end; dev_t dev = 0; int len; + const char *name; if (file) { struct inode *inode = vma->vm_file->f_path.dentry->d_inode; @@ -252,36 +256,43 @@ static void show_map_vma(struct seq_file if (file) { pad_len_spaces(m, len); seq_path(m, &file->f_path, "\n"); - } else { - const char *name = arch_vma_name(vma); - if (!name) { - if (mm) { - if (vma->vm_start <= mm->brk && - vma->vm_end >= mm->start_brk) { - name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { - name = "[stack]"; - } + goto out; + } + + name = arch_vma_name(vma); + if (!name) { + if (mm) { + if (vma->vm_start <= mm->brk && + vma->vm_end >= mm->start_brk) { + name = "[heap]"; } else { - name = "[vdso]"; + pid_t tid; + + tid = vm_is_stack(task, vma, is_pid); + if (tid != 0) { + pad_len_spaces(m, len); + seq_printf(m, "[stack:%d]", tid); + } } - } - if (name) { - pad_len_spaces(m, len); - seq_puts(m, name); + } else { + name = "[vdso]"; } } + if (name) { + pad_len_spaces(m, len); + seq_puts(m, name); + } +out: seq_putc(m, '\n'); } -static int show_map(struct seq_file *m, void *v) +static int show_map(struct seq_file *m, void *v, int is_pid) { struct vm_area_struct *vma = v; struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); if (m->count < m->size) /* vma is copied successfully */ m->version = (vma != get_gate_vma(task->mm)) @@ -289,20 +300,49 @@ static int show_map(struct seq_file *m, return 0; } +static int show_pid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 1); +} + +static int show_tid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 0); +} + static const struct seq_operations proc_pid_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map }; -static int maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map +}; + +static int pid_maps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_maps_op); } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_maps_op); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -413,7 +453,7 @@ static int smaps_pte_range(pmd_t *pmd, u return 0; } -static int show_smap(struct seq_file *m, void *v) +static int show_smap(struct seq_file *m, void *v, int is_pid) { struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; @@ -431,7 +471,7 @@ static int show_smap(struct seq_file *m, if (vma->vm_mm && !is_vm_hugetlb_page(vma)) walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk); - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); seq_printf(m, "Size: %8lu kB\n" @@ -470,20 +510,49 @@ static int show_smap(struct seq_file *m, return 0; } +static int show_pid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 1); +} + +static int show_tid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 0); +} + static const struct seq_operations proc_pid_smaps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_smap + .show = show_pid_smap }; -static int smaps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_smaps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_smap +}; + +static int pid_smaps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_smaps_op); } -const struct file_operations proc_smaps_operations = { - .open = smaps_open, +static int tid_smaps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_smaps_op); +} + +const struct file_operations proc_pid_smaps_operations = { + .open = pid_smaps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_smaps_operations = { + .open = tid_smaps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -1026,7 +1095,7 @@ static int gather_hugetbl_stats(pte_t *p /* * Display pages allocated per node and memory policy via /proc. */ -static int show_numa_map(struct seq_file *m, void *v) +static int show_numa_map(struct seq_file *m, void *v, int is_pid) { struct numa_maps_private *numa_priv = m->private; struct proc_maps_private *proc_priv = &numa_priv->proc_maps; @@ -1063,9 +1132,10 @@ static int show_numa_map(struct seq_file seq_path(m, &file->f_path, "\n\t= "); } else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { seq_printf(m, " heap"); - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { - seq_printf(m, " stack"); + } else { + pid_t tid = vm_is_stack(proc_priv->task, vma, is_pid); + if (tid != 0) + seq_printf(m, " stack:%d", tid); } if (is_vm_hugetlb_page(vma)) @@ -1108,21 +1178,39 @@ out: return 0; } +static int show_pid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 1); +} + +static int show_tid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 0); +} + static const struct seq_operations proc_pid_numa_maps_op = { - .start = m_start, - .next = m_next, - .stop = m_stop, - .show = show_numa_map, + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_pid_numa_map, }; -static int numa_maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_numa_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_numa_map, +}; + +static int numa_maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct numa_maps_private *priv; int ret = -ENOMEM; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->proc_maps.pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_numa_maps_op); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -1133,8 +1221,25 @@ static int numa_maps_open(struct inode * return ret; } -const struct file_operations proc_numa_maps_operations = { - .open = numa_maps_open, +static int pid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_pid_numa_maps_op); +} + +static int tid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_tid_numa_maps_op); +} + +const struct file_operations proc_pid_numa_maps_operations = { + .open = pid_numa_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_numa_maps_operations = { + .open = tid_numa_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff -puN fs/proc/task_nommu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/task_nommu.c --- a/fs/proc/task_nommu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/task_nommu.c @@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_fi /* * display a single VMA to a sequenced file */ -static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) +static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, + int is_pid) { struct mm_struct *mm = vma->vm_mm; + struct proc_maps_private *priv = m->private; unsigned long ino = 0; struct file *file; dev_t dev = 0; @@ -168,10 +170,10 @@ static int nommu_vma_show(struct seq_fil pad_len_spaces(m, len); seq_path(m, &file->f_path, ""); } else if (mm) { - if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + pid_t tid = vm_is_stack(priv->task, vma, is_pid); + if (tid != 0) { pad_len_spaces(m, len); - seq_puts(m, "[stack]"); + seq_printf(m, "[stack:%d]", tid); } } @@ -182,11 +184,22 @@ static int nommu_vma_show(struct seq_fil /* * display mapping lines for a particular process's /proc/pid/maps */ -static int show_map(struct seq_file *m, void *_p) +static int show_map(struct seq_file *m, void *_p, int is_pid) { struct rb_node *p = _p; - return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb)); + return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb), + is_pid); +} + +static int show_pid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 1); +} + +static int show_tid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 0); } static void *m_start(struct seq_file *m, loff_t *pos) @@ -240,10 +253,18 @@ static const struct seq_operations proc_ .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map +}; + +static const struct seq_operations proc_tid_maps_ops = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map }; -static int maps_open(struct inode *inode, struct file *file) +static int maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct proc_maps_private *priv; int ret = -ENOMEM; @@ -251,7 +272,7 @@ static int maps_open(struct inode *inode priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_maps_ops); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -262,8 +283,25 @@ static int maps_open(struct inode *inode return ret; } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int pid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_pid_maps_ops); +} + +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_tid_maps_ops); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff -puN include/linux/mm.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps include/linux/mm.h --- a/include/linux/mm.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/include/linux/mm.h @@ -1041,6 +1041,9 @@ static inline int stack_guard_page_end(s !vma_growsup(vma->vm_next, addr); } +extern pid_t +vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group); + extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len); diff -puN mm/memory.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps mm/memory.c --- a/mm/memory.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/mm/memory.c @@ -112,6 +112,13 @@ __setup("norandmaps", disable_randmaps); unsigned long zero_pfn __read_mostly; unsigned long highest_memmap_pfn __read_mostly; +/* Check if the vma is being used as a stack by this task */ +static int vm_is_stack_for_task(struct task_struct *t, + struct vm_area_struct *vma) +{ + return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t)); +} + /* * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init() */ @@ -3891,6 +3898,36 @@ void print_vma_addr(char *prefix, unsign up_read(¤t->mm->mmap_sem); } +/* + * Check if the vma is being used as a stack. + * If is_group is non-zero, check in the entire thread group or else + * just check in the current task. Returns the pid of the task that + * the vma is stack for. + */ +pid_t vm_is_stack(struct task_struct *task, + struct vm_area_struct *vma, int in_group) +{ + pid_t ret = 0; + + if (vm_is_stack_for_task(task, vma)) + return task->pid; + + if (in_group) { + struct task_struct *t = task; + rcu_read_lock(); + while_each_thread(task, t) { + if (vm_is_stack_for_task(t, vma)) { + ret = t->pid; + goto done; + } + } + } + +done: + rcu_read_unlock(); + return ret; +} + #ifdef CONFIG_PROVE_LOCKING void might_fault(void) { _ ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] Take rcu read lock when iterating through thread group 2012-03-01 23:21 ` Andrew Morton @ 2012-03-04 20:04 ` Siddhesh Poyarekar 0 siblings, 0 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-03-04 20:04 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, linux-kernel, Oleg Nesterov On Fri, Mar 2, 2012 at 4:51 AM, Andrew Morton <akpm@linux-foundation.org> wrote: > It will help reviwers a lot if you tell them that this is a fix again a > patch which is presently in -mm (and linux-next). I'll keep this in mind for the future, thanks and sorry about the trouble. > This patch has got a bit out of control. I've folded all six patches > back into a single one, which is below. > > The changelog which I have is now a little out of date - can you please > send an updated one? Updated ChangeLog below. I think I still need to fix Oleg's observation. I'll send a patch on top of this and mark it correctly as a fix on top of mm and linux-next. Regards, Siddhesh From: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Subject: procfs: mark thread stack correctly in proc/<pid>/maps Stack for a new thread is mapped by userspace code and passed via sys_clone. This memory is currently seen as anonymous in /proc/<pid>/maps, which makes it difficult to ascertain which mappings are being used for thread stacks. This patch uses the individual task stack pointers to determine which vmas are actually thread stacks. For a multithreaded program like the following: #include <pthread.h> void *thread_main(void *foo) { while(1); } int main() { pthread_t t; pthread_create(&t, NULL, thread_main, NULL); pthread_join(t, NULL); } proc/PID/maps looks like the following: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack] 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but that is not always a reliable way to find out which vma is a thread stack. Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same content. With this patch in place, /proc/PID/task/TID/maps are treated as 'maps as the task would see it' and hence, only the vma that that task uses as stack is marked as [stack:TID] where TID is the process ID of the task that is currently using this vma as stack. All other 'stack' vmas are marked as anonymous memory. /proc/PID/maps acts as a thread group level view, where all stack vmas are marked. So /proc/PID/maps will look like this: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack:1442] 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack:1441] 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Thus marking all vmas that are used as stacks in the thread group. The task level maps will however look like this: 00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out 00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out 019ef000-01a10000 rw-p 00000000 00:00 0 [heap] 7f8a44491000-7f8a44492000 ---p 00000000 00:00 0 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack:1442] 7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so 7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0 7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so 7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0 7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0 7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0 7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so 7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] where only the vma that is being used as a stack by *that* task is marked as [stack:TID]. Analogous changes have been made to /proc/PID/smaps, /proc/PID/numa_maps, /proc/PID/task/TID/smaps and /proc/PID/task/TID/numa_maps. Relevant snippets from smaps and numa_maps: [siddhesh@localhost ~ ]$ pgrep a.out 1441 [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack" 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack:1442] 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack:1441] [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack" 7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack:1442] [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack" 7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack:1441] [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack" 7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2 7fff6273a000 default stack:1441 anon=3 dirty=3 N0=3 [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack" 7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2 [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack" 7fff6273a000 default stack:1441 anon=3 dirty=3 N0=3 [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix build] Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jamie Lokier <jamie@shareable.org> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/filesystems/proc.txt | 12 + fs/proc/base.c | 12 - fs/proc/internal.h | 9 - fs/proc/task_mmu.c | 187 +++++++++++++++++++++------ fs/proc/task_nommu.c | 60 +++++++- include/linux/mm.h | 3 mm/memory.c | 37 +++++ 7 files changed, 256 insertions(+), 64 deletions(-) diff -puN Documentation/filesystems/proc.txt~procfs-mark-thread-stack-correctly-in-proc-pid-maps Documentation/filesystems/proc.txt --- a/Documentation/filesystems/proc.txt~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/Documentation/filesystems/proc.txt @@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (a rsslim current limit in bytes on the rss start_code address above which program text can run end_code address below which program text can run - start_stack address of the start of the stack + start_stack address of the start of the main process stack esp current value of ESP eip current value of EIP pending bitmap of pending signals @@ -337,7 +337,7 @@ a8024000-a8027000 rw-p 00000000 00:00 0 a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2 a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2 a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2 -aff35000-aff4a000 rw-p 00000000 00:00 0 [stack] +aff35000-aff4a000 rw-p 00000000 00:00 0 [stack:1001] ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso] where "address" is the address space in the process that it occupies, "perms" @@ -356,12 +356,18 @@ The "pathname" shows the name associated is not associated with a file: [heap] = the heap of the program - [stack] = the stack of the main process + [stack:1001] = the mapping is used as a stack by the thread + with tid 1001 [vdso] = the "virtual dynamic shared object", the kernel system call handler or if empty, the mapping is anonymous. +The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint +of the individual tasks of a process. In this file you will see a mapping marked +as [stack:TID] only if that task sees it as a stack. This is a key difference +from the content of /proc/PID/maps, where you will see all mappings that are +being used as stack by all of those tasks. The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff -puN fs/proc/base.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/base.c --- a/fs/proc/base.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/base.c @@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_ INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_pid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_ REG("mountstats", S_IRUSR, proc_mountstats_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_pid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY @@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_s INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), - REG("maps", S_IRUGO, proc_maps_operations), + REG("maps", S_IRUGO, proc_tid_maps_operations), #ifdef CONFIG_NUMA - REG("numa_maps", S_IRUGO, proc_numa_maps_operations), + REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), @@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_s REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), - REG("smaps", S_IRUGO, proc_smaps_operations), + REG("smaps", S_IRUGO, proc_tid_smaps_operations), REG("pagemap", S_IRUGO, proc_pagemap_operations), #endif #ifdef CONFIG_SECURITY diff -puN fs/proc/internal.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/internal.h --- a/fs/proc/internal.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/internal.h @@ -56,9 +56,12 @@ extern int proc_pid_statm(struct seq_fil struct pid *pid, struct task_struct *task); extern loff_t mem_lseek(struct file *file, loff_t offset, int orig); -extern const struct file_operations proc_maps_operations; -extern const struct file_operations proc_numa_maps_operations; -extern const struct file_operations proc_smaps_operations; +extern const struct file_operations proc_pid_maps_operations; +extern const struct file_operations proc_tid_maps_operations; +extern const struct file_operations proc_pid_numa_maps_operations; +extern const struct file_operations proc_tid_numa_maps_operations; +extern const struct file_operations proc_pid_smaps_operations; +extern const struct file_operations proc_tid_smaps_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; extern const struct file_operations proc_net_operations; diff -puN fs/proc/task_mmu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/task_mmu.c --- a/fs/proc/task_mmu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/task_mmu.c @@ -209,16 +209,20 @@ static int do_maps_open(struct inode *in return ret; } -static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) +static void +show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) { struct mm_struct *mm = vma->vm_mm; struct file *file = vma->vm_file; + struct proc_maps_private *priv = m->private; + struct task_struct *task = priv->task; vm_flags_t flags = vma->vm_flags; unsigned long ino = 0; unsigned long long pgoff = 0; unsigned long start, end; dev_t dev = 0; int len; + const char *name; if (file) { struct inode *inode = vma->vm_file->f_path.dentry->d_inode; @@ -252,36 +256,43 @@ static void show_map_vma(struct seq_file if (file) { pad_len_spaces(m, len); seq_path(m, &file->f_path, "\n"); - } else { - const char *name = arch_vma_name(vma); - if (!name) { - if (mm) { - if (vma->vm_start <= mm->brk && - vma->vm_end >= mm->start_brk) { - name = "[heap]"; - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { - name = "[stack]"; - } + goto out; + } + + name = arch_vma_name(vma); + if (!name) { + if (mm) { + if (vma->vm_start <= mm->brk && + vma->vm_end >= mm->start_brk) { + name = "[heap]"; } else { - name = "[vdso]"; + pid_t tid; + + tid = vm_is_stack(task, vma, is_pid); + if (tid != 0) { + pad_len_spaces(m, len); + seq_printf(m, "[stack:%d]", tid); + } } - } - if (name) { - pad_len_spaces(m, len); - seq_puts(m, name); + } else { + name = "[vdso]"; } } + if (name) { + pad_len_spaces(m, len); + seq_puts(m, name); + } +out: seq_putc(m, '\n'); } -static int show_map(struct seq_file *m, void *v) +static int show_map(struct seq_file *m, void *v, int is_pid) { struct vm_area_struct *vma = v; struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); if (m->count < m->size) /* vma is copied successfully */ m->version = (vma != get_gate_vma(task->mm)) @@ -289,20 +300,49 @@ static int show_map(struct seq_file *m, return 0; } +static int show_pid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 1); +} + +static int show_tid_map(struct seq_file *m, void *v) +{ + return show_map(m, v, 0); +} + static const struct seq_operations proc_pid_maps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map }; -static int maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map +}; + +static int pid_maps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_maps_op); } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_maps_op); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -413,7 +453,7 @@ static int smaps_pte_range(pmd_t *pmd, u return 0; } -static int show_smap(struct seq_file *m, void *v) +static int show_smap(struct seq_file *m, void *v, int is_pid) { struct proc_maps_private *priv = m->private; struct task_struct *task = priv->task; @@ -431,7 +471,7 @@ static int show_smap(struct seq_file *m, if (vma->vm_mm && !is_vm_hugetlb_page(vma)) walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk); - show_map_vma(m, vma); + show_map_vma(m, vma, is_pid); seq_printf(m, "Size: %8lu kB\n" @@ -470,20 +510,49 @@ static int show_smap(struct seq_file *m, return 0; } +static int show_pid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 1); +} + +static int show_tid_smap(struct seq_file *m, void *v) +{ + return show_smap(m, v, 0); +} + static const struct seq_operations proc_pid_smaps_op = { .start = m_start, .next = m_next, .stop = m_stop, - .show = show_smap + .show = show_pid_smap }; -static int smaps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_smaps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_smap +}; + +static int pid_smaps_open(struct inode *inode, struct file *file) { return do_maps_open(inode, file, &proc_pid_smaps_op); } -const struct file_operations proc_smaps_operations = { - .open = smaps_open, +static int tid_smaps_open(struct inode *inode, struct file *file) +{ + return do_maps_open(inode, file, &proc_tid_smaps_op); +} + +const struct file_operations proc_pid_smaps_operations = { + .open = pid_smaps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_smaps_operations = { + .open = tid_smaps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, @@ -1026,7 +1095,7 @@ static int gather_hugetbl_stats(pte_t *p /* * Display pages allocated per node and memory policy via /proc. */ -static int show_numa_map(struct seq_file *m, void *v) +static int show_numa_map(struct seq_file *m, void *v, int is_pid) { struct numa_maps_private *numa_priv = m->private; struct proc_maps_private *proc_priv = &numa_priv->proc_maps; @@ -1063,9 +1132,10 @@ static int show_numa_map(struct seq_file seq_path(m, &file->f_path, "\n\t= "); } else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { seq_printf(m, " heap"); - } else if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { - seq_printf(m, " stack"); + } else { + pid_t tid = vm_is_stack(proc_priv->task, vma, is_pid); + if (tid != 0) + seq_printf(m, " stack:%d", tid); } if (is_vm_hugetlb_page(vma)) @@ -1108,21 +1178,39 @@ out: return 0; } +static int show_pid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 1); +} + +static int show_tid_numa_map(struct seq_file *m, void *v) +{ + return show_numa_map(m, v, 0); +} + static const struct seq_operations proc_pid_numa_maps_op = { - .start = m_start, - .next = m_next, - .stop = m_stop, - .show = show_numa_map, + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_pid_numa_map, }; -static int numa_maps_open(struct inode *inode, struct file *file) +static const struct seq_operations proc_tid_numa_maps_op = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_numa_map, +}; + +static int numa_maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct numa_maps_private *priv; int ret = -ENOMEM; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->proc_maps.pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_numa_maps_op); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -1133,8 +1221,25 @@ static int numa_maps_open(struct inode * return ret; } -const struct file_operations proc_numa_maps_operations = { - .open = numa_maps_open, +static int pid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_pid_numa_maps_op); +} + +static int tid_numa_maps_open(struct inode *inode, struct file *file) +{ + return numa_maps_open(inode, file, &proc_tid_numa_maps_op); +} + +const struct file_operations proc_pid_numa_maps_operations = { + .open = pid_numa_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_numa_maps_operations = { + .open = tid_numa_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff -puN fs/proc/task_nommu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps fs/proc/task_nommu.c --- a/fs/proc/task_nommu.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/fs/proc/task_nommu.c @@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_fi /* * display a single VMA to a sequenced file */ -static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) +static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, + int is_pid) { struct mm_struct *mm = vma->vm_mm; + struct proc_maps_private *priv = m->private; unsigned long ino = 0; struct file *file; dev_t dev = 0; @@ -168,10 +170,10 @@ static int nommu_vma_show(struct seq_fil pad_len_spaces(m, len); seq_path(m, &file->f_path, ""); } else if (mm) { - if (vma->vm_start <= mm->start_stack && - vma->vm_end >= mm->start_stack) { + pid_t tid = vm_is_stack(priv->task, vma, is_pid); + if (tid != 0) { pad_len_spaces(m, len); - seq_puts(m, "[stack]"); + seq_printf(m, "[stack:%d]", tid); } } @@ -182,11 +184,22 @@ static int nommu_vma_show(struct seq_fil /* * display mapping lines for a particular process's /proc/pid/maps */ -static int show_map(struct seq_file *m, void *_p) +static int show_map(struct seq_file *m, void *_p, int is_pid) { struct rb_node *p = _p; - return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb)); + return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb), + is_pid); +} + +static int show_pid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 1); +} + +static int show_tid_map(struct seq_file *m, void *_p) +{ + return show_map(m, _p, 0); } static void *m_start(struct seq_file *m, loff_t *pos) @@ -240,10 +253,18 @@ static const struct seq_operations proc_ .start = m_start, .next = m_next, .stop = m_stop, - .show = show_map + .show = show_pid_map +}; + +static const struct seq_operations proc_tid_maps_ops = { + .start = m_start, + .next = m_next, + .stop = m_stop, + .show = show_tid_map }; -static int maps_open(struct inode *inode, struct file *file) +static int maps_open(struct inode *inode, struct file *file, + const struct seq_operations *ops) { struct proc_maps_private *priv; int ret = -ENOMEM; @@ -251,7 +272,7 @@ static int maps_open(struct inode *inode priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (priv) { priv->pid = proc_pid(inode); - ret = seq_open(file, &proc_pid_maps_ops); + ret = seq_open(file, ops); if (!ret) { struct seq_file *m = file->private_data; m->private = priv; @@ -262,8 +283,25 @@ static int maps_open(struct inode *inode return ret; } -const struct file_operations proc_maps_operations = { - .open = maps_open, +static int pid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_pid_maps_ops); +} + +static int tid_maps_open(struct inode *inode, struct file *file) +{ + return maps_open(inode, file, &proc_tid_maps_ops); +} + +const struct file_operations proc_pid_maps_operations = { + .open = pid_maps_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + +const struct file_operations proc_tid_maps_operations = { + .open = tid_maps_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release_private, diff -puN include/linux/mm.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps include/linux/mm.h --- a/include/linux/mm.h~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/include/linux/mm.h @@ -1041,6 +1041,9 @@ static inline int stack_guard_page_end(s !vma_growsup(vma->vm_next, addr); } +extern pid_t +vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group); + extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len); diff -puN mm/memory.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps mm/memory.c --- a/mm/memory.c~procfs-mark-thread-stack-correctly-in-proc-pid-maps +++ a/mm/memory.c @@ -112,6 +112,13 @@ __setup("norandmaps", disable_randmaps); unsigned long zero_pfn __read_mostly; unsigned long highest_memmap_pfn __read_mostly; +/* Check if the vma is being used as a stack by this task */ +static int vm_is_stack_for_task(struct task_struct *t, + struct vm_area_struct *vma) +{ + return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t)); +} + /* * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init() */ @@ -3891,6 +3898,36 @@ void print_vma_addr(char *prefix, unsign up_read(¤t->mm->mmap_sem); } +/* + * Check if the vma is being used as a stack. + * If is_group is non-zero, check in the entire thread group or else + * just check in the current task. Returns the pid of the task that + * the vma is stack for. + */ +pid_t vm_is_stack(struct task_struct *task, + struct vm_area_struct *vma, int in_group) +{ + pid_t ret = 0; + + if (vm_is_stack_for_task(task, vma)) + return task->pid; + + if (in_group) { + struct task_struct *t = task; + rcu_read_lock(); + while_each_thread(task, t) { + if (vm_is_stack_for_task(t, vma)) { + ret = t->pid; + goto done; + } + } + } + +done: + rcu_read_unlock(); + return ret; +} + #ifdef CONFIG_PROVE_LOCKING void might_fault(void) { _ ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-11 15:03 ` [PATCH] " Siddhesh Poyarekar 2012-02-21 4:24 ` [RESEND][PATCH] " Siddhesh Poyarekar @ 2012-02-23 23:17 ` KOSAKI Motohiro 2012-02-24 0:49 ` KOSAKI Motohiro 2012-02-24 5:29 ` Siddhesh Poyarekar 1 sibling, 2 replies; 42+ messages in thread From: KOSAKI Motohiro @ 2012-02-23 23:17 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, vapier Hi This version makes sense to me. and I verified this change don't break procps tools. But, > +int vm_is_stack(struct task_struct *task, > + struct vm_area_struct *vma, int in_group) > +{ > + if (vm_is_stack_for_task(task, vma)) > + return 1; > + > + if (in_group) { > + struct task_struct *t = task; > + while_each_thread(task, t) { How protect this loop from task exiting? AFAIK, while_each_thread require rcu_read_lock or task_list_lock. > + if (vm_is_stack_for_task(t, vma)) > + return 1; > + } > + } ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-23 23:17 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps KOSAKI Motohiro @ 2012-02-24 0:49 ` KOSAKI Motohiro 2012-02-24 5:29 ` Siddhesh Poyarekar 1 sibling, 0 replies; 42+ messages in thread From: KOSAKI Motohiro @ 2012-02-24 0:49 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, vapier 2012/2/23 KOSAKI Motohiro <kosaki.motohiro@gmail.com>: > Hi > > This version makes sense to me. and I verified this change don't break > procps tools. Sigh. No, I missed one thing. If application use makecontext()/swapcontext() pair, ESP is not reliable way to detect pthread stack. At that time the stack is still marked as anonymous memory. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-23 23:17 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps KOSAKI Motohiro 2012-02-24 0:49 ` KOSAKI Motohiro @ 2012-02-24 5:29 ` Siddhesh Poyarekar 2012-02-24 16:14 ` KOSAKI Motohiro 1 sibling, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-24 5:29 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, vapier, Andrew Morton On Fri, Feb 24, 2012 at 4:47 AM, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > How protect this loop from task exiting? AFAIK, while_each_thread > require rcu_read_lock or task_list_lock. I missed this, thanks. I'll send a patch for this on top of my earlier patch since Andrew has already included the earlier patch. > Sigh. No, I missed one thing. If application use > makecontext()/swapcontext() pair, > ESP is not reliable way to detect pthread stack. At that time the > stack is still marked > as anonymous memory. This is not wrong, because it essentially gives the correct picture of the state of that task -- the task is using another vma as a stack during that point and not the one it was allotted by pthreads during thread creation. I don't think we can successfully stick to the idea of trying to mark stack space allocated by pthreads but not used by any task *currently* as stack as long as the allocation happens outside the kernel space. The only way to mark this is either by marking the stack as VM_GROWSDOWN (which will make the stack grow and break some pthreads functions) or create a new flag, which a simple display such as this does not deserve. So it's best that this sticks to what the kernel *knows* is being used as stack. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-24 5:29 ` Siddhesh Poyarekar @ 2012-02-24 16:14 ` KOSAKI Motohiro 2012-02-24 18:58 ` Siddhesh Poyarekar 0 siblings, 1 reply; 42+ messages in thread From: KOSAKI Motohiro @ 2012-02-24 16:14 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, vapier, Andrew Morton >> Sigh. No, I missed one thing. If application use >> makecontext()/swapcontext() pair, >> ESP is not reliable way to detect pthread stack. At that time the >> stack is still marked >> as anonymous memory. > > This is not wrong, because it essentially gives the correct picture of > the state of that task -- the task is using another vma as a stack > during that point and not the one it was allotted by pthreads during > thread creation. > > I don't think we can successfully stick to the idea of trying to mark > stack space allocated by pthreads but not used by any task *currently* > as stack as long as the allocation happens outside the kernel space. > The only way to mark this is either by marking the stack as > VM_GROWSDOWN (which will make the stack grow and break some pthreads > functions) or create a new flag, which a simple display such as this > does not deserve. So it's best that this sticks to what the kernel > *knows* is being used as stack. Oh, maybe generically you are right. but you missed one thing. Before your patch, stack or not stack are address space property. thus, using /proc/pid/maps makes sense. but after your patch, it's no longer memory property. applications can use heap or mapped file as a stack. then, at least, current your code is wrong. the code assume each memory property are exclusive. Moreover, if pthread stack is unimportant, I wonder why we need this patch at all. Which application does need it? and When? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps 2012-02-24 16:14 ` KOSAKI Motohiro @ 2012-02-24 18:58 ` Siddhesh Poyarekar 0 siblings, 0 replies; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-24 18:58 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm, linux-kernel, Alexander Viro, linux-fsdevel, Jamie Lokier, vapier, Andrew Morton On Fri, Feb 24, 2012 at 9:44 PM, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > Oh, maybe generically you are right. but you missed one thing. Before > your patch, stack or not stack are address space property. thus, using > /proc/pid/maps makes sense. but after your patch, it's no longer memory > property. applications can use heap or mapped file as a stack. then, at > least, current your code is wrong. the code assume each memory property > are exclusive. Right, but I cannot think of any other alternative that does not involve touching some sensitive code. The setcontext family of functions where any heap, stack or even data area portion could be used as stack, break the very concept of an entire vma being used as a stack. In such a scenario the kernel can only show what it knows, which is that a specific vma is being used as a stack. I agree that it is not correct to show the entire vma as stack, but there doesn't seem to be a better way right now in that implementation. FWIW, if the stack space is allocated in heap, it will show up as heap and not stack since the former gets preference. > Moreover, if pthread stack is unimportant, I wonder why we need this patch > at all. Which application does need it? and When? Right, my original intent was to mark stack vmas allocated by pthreads, which included those vmas that are in the pthreads cache. However, this means that the kernel does not have any real control over what it marks as stack. Also, the concept is very specific to the glibc pthreads implementation and we're essentially just making the kernel spit out some data blindly for glibc. The other solution I can think of is to have stack_start as a task level property so that each task knows their stack vma start (obtained from its sys_clone call and not from mmap). This however means increasing the size of task_struct by sizeof(unsigned long). Is that overhead acceptable? -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: + procfs-mark-thread-stack-correctly-in-proc-pid-maps.patch added to -mm tree
@ 2012-02-28 17:04 Oleg Nesterov
2012-02-28 17:18 ` Siddhesh Poyarekar
0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2012-02-28 17:04 UTC (permalink / raw)
To: Siddhesh Poyarekar, KOSAKI Motohiro, Alexander Viro,
Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall,
Andrew Morton
Cc: linux-kernel
> +int vm_is_stack(struct task_struct *task,
> + struct vm_area_struct *vma, int in_group)
> +{
> + if (vm_is_stack_for_task(task, vma))
> + return 1;
> +
> + if (in_group) {
> + struct task_struct *t = task;
> + while_each_thread(task, t) {
> + if (vm_is_stack_for_task(t, vma))
> + return 1;
> + }
> + }
> +
> + return 0;
> +}
This is obviously wrong, while_each_thread() is not safe without
tasklist or siglock or rcu.
Oleg.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: + procfs-mark-thread-stack-correctly-in-proc-pid-maps.patch added to -mm tree 2012-02-28 17:04 + procfs-mark-thread-stack-correctly-in-proc-pid-maps.patch added to -mm tree Oleg Nesterov @ 2012-02-28 17:18 ` Siddhesh Poyarekar 2012-02-28 17:40 ` Oleg Nesterov 0 siblings, 1 reply; 42+ messages in thread From: Siddhesh Poyarekar @ 2012-02-28 17:18 UTC (permalink / raw) To: Oleg Nesterov Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, Andrew Morton, linux-kernel On Tue, Feb 28, 2012 at 10:34 PM, Oleg Nesterov <oleg@redhat.com> wrote: >> +int vm_is_stack(struct task_struct *task, >> + struct vm_area_struct *vma, int in_group) >> +{ >> + if (vm_is_stack_for_task(task, vma)) >> + return 1; >> + >> + if (in_group) { >> + struct task_struct *t = task; >> + while_each_thread(task, t) { >> + if (vm_is_stack_for_task(t, vma)) >> + return 1; >> + } >> + } >> + >> + return 0; >> +} > > This is obviously wrong, while_each_thread() is not safe without > tasklist or siglock or rcu. I have fixed this in my git stash. I'll submit once I get to work on Mike Frysinger's idea of marking stacks with their tids and see if it breaks anything. -- Siddhesh Poyarekar http://siddhesh.in ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: + procfs-mark-thread-stack-correctly-in-proc-pid-maps.patch added to -mm tree 2012-02-28 17:18 ` Siddhesh Poyarekar @ 2012-02-28 17:40 ` Oleg Nesterov 0 siblings, 0 replies; 42+ messages in thread From: Oleg Nesterov @ 2012-02-28 17:40 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: KOSAKI Motohiro, Alexander Viro, Jamie Lokier, Mike Frysinger, Alexey Dobriyan, Matt Mackall, Andrew Morton, linux-kernel On 02/28, Siddhesh Poyarekar wrote: > > On Tue, Feb 28, 2012 at 10:34 PM, Oleg Nesterov <oleg@redhat.com> wrote: > >> +int vm_is_stack(struct task_struct *task, > >> + struct vm_area_struct *vma, int in_group) > >> +{ > >> + if (vm_is_stack_for_task(task, vma)) > >> + return 1; > >> + > >> + if (in_group) { > >> + struct task_struct *t = task; > >> + while_each_thread(task, t) { > >> + if (vm_is_stack_for_task(t, vma)) > >> + return 1; > >> + } > >> + } > >> + > >> + return 0; > >> +} > > > > This is obviously wrong, while_each_thread() is not safe without > > tasklist or siglock or rcu. > > I have fixed this in my git stash. I'll submit OK, please CC me ;) Oleg. ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2012-03-04 20:04 UTC | newest] Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-01-14 12:35 [PATCH] Mark thread stack correctly in proc/<pid>/maps Siddhesh Poyarekar 2012-01-16 11:28 ` Jamie Lokier 2012-01-16 13:08 ` Siddhesh Poyarekar 2012-01-16 16:31 ` Jamie Lokier 2012-01-16 17:01 ` Siddhesh Poyarekar 2012-01-17 4:54 ` Siddhesh Poyarekar 2012-02-02 6:24 ` [RESEND][PATCH] " Siddhesh Poyarekar 2012-02-02 21:40 ` KOSAKI Motohiro 2012-02-03 7:09 ` Siddhesh Poyarekar 2012-02-03 8:01 ` KOSAKI Motohiro 2012-02-03 9:49 ` Siddhesh Poyarekar 2012-02-03 10:29 ` Mike Frysinger 2012-02-03 18:34 ` Siddhesh Poyarekar 2012-02-08 4:00 ` Siddhesh Poyarekar 2012-02-08 17:57 ` KOSAKI Motohiro 2012-02-11 10:19 ` Siddhesh Poyarekar 2012-02-11 15:03 ` [PATCH] " Siddhesh Poyarekar 2012-02-21 4:24 ` [RESEND][PATCH] " Siddhesh Poyarekar 2012-02-22 23:00 ` Andrew Morton 2012-02-23 4:03 ` [PATCH] " Siddhesh Poyarekar 2012-02-23 20:22 ` Andrew Morton 2012-02-24 13:05 ` Siddhesh Poyarekar 2012-02-26 16:17 ` [PATCH] x86_64: Record stack pointer before task execution begins Siddhesh Poyarekar 2012-02-27 6:17 ` [tip:x86/process] " tip-bot for Siddhesh Poyarekar 2012-02-23 23:47 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps Mike Frysinger 2012-02-24 5:47 ` Siddhesh Poyarekar 2012-02-24 16:12 ` Mike Frysinger 2012-02-24 18:23 ` Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Siddhesh Poyarekar 2012-03-01 5:20 ` [PATCH 2/2] procfs: Mark stack vma with pid of the owning task Siddhesh Poyarekar 2012-03-01 23:17 ` Andrew Morton 2012-03-01 16:51 ` [PATCH 1/2] Take rcu read lock when iterating through thread group Oleg Nesterov 2012-03-01 23:21 ` Andrew Morton 2012-03-04 20:04 ` Siddhesh Poyarekar 2012-02-23 23:17 ` [PATCH] Mark thread stack correctly in proc/<pid>/maps KOSAKI Motohiro 2012-02-24 0:49 ` KOSAKI Motohiro 2012-02-24 5:29 ` Siddhesh Poyarekar 2012-02-24 16:14 ` KOSAKI Motohiro 2012-02-24 18:58 ` Siddhesh Poyarekar 2012-02-28 17:04 + procfs-mark-thread-stack-correctly-in-proc-pid-maps.patch added to -mm tree Oleg Nesterov 2012-02-28 17:18 ` Siddhesh Poyarekar 2012-02-28 17:40 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).