From: Matt Helsley <matthltc@us.ibm.com>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Matt Helsley <matthltc@us.ibm.com>,
xemul@parallels.com, containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, dave@linux.vnet.ibm.com,
mingo@elte.hu, torvalds@linux-foundation.org,
linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 1/9] exec_path 1/9: introduce ->exec_path and switch /proc/*/exe
Date: Thu, 4 Jun 2009 00:55:32 -0700 [thread overview]
Message-ID: <20090604075532.GU9285@us.ibm.com> (raw)
In-Reply-To: <20090603230422.GB853@x200.localdomain>
On Thu, Jun 04, 2009 at 03:04:22AM +0400, Alexey Dobriyan wrote:
> On Sun, May 31, 2009 at 03:19:53PM -0700, Andrew Morton wrote:
> > On Mon, 1 Jun 2009 01:54:27 +0400 Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > > And BTW, there is something unnatural when executable path is attached
> > > to mm_struct(!) not task_struct,
> >
> > mm_struct is the central object for a heavyweight process. All threads
> > within that process share the same executable path (don't they?) so
> > attaching the executable path to the mm seems OK to me.
>
> OK, let's try this:
>
>
> [PATCH 1/9] exec_path 1/9: introduce ->exec_path and switch /proc/*/exe
>
> ->exec_path marks executable which is associated with running task.
> Binfmt loader decides which executable is such and can, in theory,
> assign anything. Unlike current status quo when first VM_EXECUTABLE mapping is
> sort of marks running executable.
>
> If executable unmaps its all VM_EXECUTABLE mappings, /proc/*/exe ceases
> to exists, ick! And userpsace can't even use MAP_EXECUTABLE.
Suprising but intentional and unavoidable. More below..
>
> Tasks which aren't created by running clone(2) and execve(2)
> (read: kernel threads) get empty ->exec_path and
>
> ->exec_path is copied on clone(2) and put at do_exit() time.
Doesn't this pin the vfs mount of the executable for the lifetime of
the task?
That was one of Al Viro's objections to early revisions of the exe_file
patches. It's the reason the exe_file patches kept track of the number of
VM_EXECUTABLE mappings with num_exe_file_vmas.
I've cc'd Al so he can confirm/deny my recollection of this. Basically
some programs need to be able to umount the filesystem that back their
executables. Being able to unmap these regions was effectively a
userspace API for unpinning these mounts. I needed to preserve that API,
hence the VMA ugliness of exe_file that you object to with the exe_file
patches.
I think patches 2-7 look great and could be adapted to use exe_file instead
of ->exec_path.
Cheers,
-Matt Helsley
>
> ->exec_path is going to replace struct mm_struct::exe_file et al
> and allows to remove VM_EXECUTABLE flag while keeping readlink("/proc/*/exe")
> without loop over all VMAs.
>
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
> ---
> fs/binfmt_aout.c | 1 +
> fs/binfmt_elf.c | 1 +
> fs/binfmt_elf_fdpic.c | 1 +
> fs/binfmt_flat.c | 1 +
> fs/binfmt_som.c | 1 +
> fs/proc/base.c | 38 ++++++++++++++------------------------
> include/linux/sched.h | 25 +++++++++++++++++++++++++
> kernel/exit.c | 1 +
> kernel/fork.c | 2 ++
> 9 files changed, 47 insertions(+), 24 deletions(-)
>
> diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
> index b639dcf..a19b185 100644
> --- a/fs/binfmt_aout.c
> +++ b/fs/binfmt_aout.c
> @@ -379,6 +379,7 @@ beyond_if:
> regs->gp = ex.a_gpvalue;
> #endif
> start_thread(regs, ex.a_entry, current->mm->start_stack);
> + set_task_exec_path(current, &bprm->file->f_path);
> return 0;
> }
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 40381df..b815bfc 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -999,6 +999,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
> #endif
>
> start_thread(regs, elf_entry, bprm->p);
> + set_task_exec_path(current, &bprm->file->f_path);
> retval = 0;
> out:
> kfree(loc);
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index fdb66fa..f545504 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -1185,6 +1185,7 @@ static int elf_fdpic_map_file_by_direct_mmap(struct elf_fdpic_params *params,
> seg++;
> }
>
> + set_task_exec_path(current, &file->f_path);
> return 0;
> }
>
> diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
> index 697f6b5..a16f977 100644
> --- a/fs/binfmt_flat.c
> +++ b/fs/binfmt_flat.c
> @@ -798,6 +798,7 @@ static int load_flat_file(struct linux_binprm * bprm,
> libinfo->lib_list[id].start_brk) + /* start brk */
> stack_len);
>
> + set_task_exec_path(current, &bprm->file->f_path);
> return 0;
> err:
> return ret;
> diff --git a/fs/binfmt_som.c b/fs/binfmt_som.c
> index eff74b9..6c56262 100644
> --- a/fs/binfmt_som.c
> +++ b/fs/binfmt_som.c
> @@ -174,6 +174,7 @@ static int map_som_binary(struct file *file,
> up_write(¤t->mm->mmap_sem);
> if (retval > 0 || retval < -1024)
> retval = 0;
> + set_task_exec_path(current, &bprm->file->f_path);
> out:
> set_fs(old_fs);
> return retval;
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 3326bbf..dc4ee6a 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -201,6 +201,20 @@ static int proc_root_link(struct inode *inode, struct path *path)
> return result;
> }
>
> +static int proc_exe_link(struct inode *inode, struct path *path)
> +{
> + struct task_struct *tsk;
> +
> + tsk = get_proc_task(inode);
> + if (!tsk)
> + return -ENOENT;
> + get_task_exec_path(tsk, path);
> + put_task_struct(tsk);
> + if (!path->mnt || !path->dentry)
> + return -ENOENT;
> + return 0;
> +}
> +
> /*
> * Return zero if current may access user memory in @task, -error if not.
> */
> @@ -1302,30 +1316,6 @@ void dup_mm_exe_file(struct mm_struct *oldmm, struct mm_struct *newmm)
> newmm->exe_file = get_mm_exe_file(oldmm);
> }
>
> -static int proc_exe_link(struct inode *inode, struct path *exe_path)
> -{
> - struct task_struct *task;
> - struct mm_struct *mm;
> - struct file *exe_file;
> -
> - task = get_proc_task(inode);
> - if (!task)
> - return -ENOENT;
> - mm = get_task_mm(task);
> - put_task_struct(task);
> - if (!mm)
> - return -ENOENT;
> - exe_file = get_mm_exe_file(mm);
> - mmput(mm);
> - if (exe_file) {
> - *exe_path = exe_file->f_path;
> - path_get(&exe_file->f_path);
> - fput(exe_file);
> - return 0;
> - } else
> - return -ENOENT;
> -}
> -
> static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
> {
> struct inode *inode = dentry->d_inode;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b4c38bc..6b2dd01 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1265,6 +1265,12 @@ struct task_struct {
> #endif
> /* CPU-specific state of this task */
> struct thread_struct thread;
> + /*
> + * Executable, binfmt loader wants to associate with task
> + * (read: execve(2) argument).
> + * Empty, if concept isn't applicable, e. g. kernel thread.
> + */
> + struct path exec_path;
> /* filesystem information */
> struct fs_struct *fs;
> /* open file information */
> @@ -2403,6 +2409,25 @@ static inline void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
>
> #define TASK_STATE_TO_CHAR_STR "RSDTtZX"
>
> +static inline void get_task_exec_path(struct task_struct *tsk, struct path *path)
> +{
> + task_lock(tsk);
> + path_get(&tsk->exec_path);
> + *path = tsk->exec_path;
> + task_unlock(tsk);
> +}
> +
> +static inline void set_task_exec_path(struct task_struct *tsk, struct path *path)
> +{
> + struct path old_path;
> +
> + path_get(path);
> + task_lock(tsk);
> + old_path = tsk->exec_path;
> + tsk->exec_path = *path;
> + task_unlock(tsk);
> + path_put(&old_path);
> +}
> #endif /* __KERNEL__ */
>
> #endif
> diff --git a/kernel/exit.c b/kernel/exit.c
> index abf9cf3..8e70b54 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -962,6 +962,7 @@ NORET_TYPE void do_exit(long code)
>
> exit_sem(tsk);
> exit_files(tsk);
> + set_task_exec_path(tsk, &(struct path){ .mnt = NULL, .dentry = NULL });
> exit_fs(tsk);
> check_stack_usage();
> exit_thread();
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b9e2edd..c0ee931 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1191,6 +1191,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> cgroup_fork_callbacks(p);
> cgroup_callbacks_done = 1;
>
> + get_task_exec_path(current, &p->exec_path);
> +
> /* Need tasklist lock for parent etc handling! */
> write_lock_irq(&tasklist_lock);
>
next prev parent reply other threads:[~2009-06-04 7:55 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20090526113618.GJ28083@us.ibm.com>
[not found] ` <20090526162415.fb9cefef.akpm@linux-foundation.org>
[not found] ` <20090531215427.GA29534@x200.localdomain>
[not found] ` <20090531151953.8f8b14b5.akpm@linux-foundation.org>
[not found] ` <20090531151953.8f8b14b5.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2009-06-03 23:04 ` [PATCH 1/9] exec_path 1/9: introduce ->exec_path and switch /proc/*/exe Alexey Dobriyan
2009-06-03 23:05 ` [PATCH 2/9] exec_path 2/9: switch audit to ->exec_path Alexey Dobriyan
[not found] ` <20090603230422.GB853-2ev+ksY9ol182hYKe6nXyg@public.gmane.org>
2009-06-03 23:05 ` [PATCH 3/9] exec_path 3/9: switch TOMOYO " Alexey Dobriyan
2009-06-03 23:06 ` [PATCH 4/9] exec_path 4/9: switch oprofile " Alexey Dobriyan
2009-06-03 23:06 ` [PATCH 6/9] exec_path 6/9: add struct spu::tsk Alexey Dobriyan
2009-06-03 23:06 ` [PATCH 5/9] exec_path 5/9: make struct spu_context::owner task_struct Alexey Dobriyan
2009-06-03 23:07 ` [PATCH 7/9] exec_path 7/9: switch cell SPU thing to ->exec_path Alexey Dobriyan
2009-06-03 23:07 ` [PATCH 8/9] exec_path 8/9: remove ->exe_file et al Alexey Dobriyan
2009-06-03 23:08 ` [PATCH 9/9] exec_path 9/9: remove VM_EXECUTABLE Alexey Dobriyan
2009-06-04 7:24 ` Matt Helsley
2009-06-03 23:36 ` [PATCH 1/9] exec_path 1/9: introduce ->exec_path and switch /proc/*/exe Linus Torvalds
2009-06-04 7:55 ` Matt Helsley [this message]
2009-06-04 8:10 ` Matt Helsley
2009-06-04 15:07 ` Linus Torvalds
2009-06-04 21:30 ` Matt Helsley
2009-06-04 22:42 ` Alexey Dobriyan
2009-06-05 3:49 ` Matt Helsley
2009-06-05 10:45 ` Christoph Hellwig
2009-06-05 15:10 ` Linus Torvalds
2009-06-05 15:41 ` Alexey Dobriyan
2009-06-05 15:49 ` Linus Torvalds
2009-06-05 16:09 ` Alexey Dobriyan
[not found] ` <20090605160943.GA5262-2ev+ksY9ol182hYKe6nXyg@public.gmane.org>
2009-06-05 16:48 ` Linus Torvalds
2009-06-05 17:46 ` Alexey Dobriyan
2009-06-06 7:22 ` Al Viro
2009-06-15 22:10 ` Alexey Dobriyan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090604075532.GU9285@us.ibm.com \
--to=matthltc@us.ibm.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=containers@lists.linux-foundation.org \
--cc=dave@linux.vnet.ibm.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).