* [PATCH v2 0/3] exec: fix passing of file locks across execve in multithreaded processes
@ 2018-08-30 17:24 Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 1/3] exec: separate thread_count for files_struct Jeff Layton
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jeff Layton @ 2018-08-30 17:24 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Eric W. Biederman, Daniel P. Berrangé, linux-fsdevel, linux-kernel

v2: fix displaced_files cleanup in __do_execve_file

I've done a bit more testing (now with the error handling fixed) and it
seems to work ok. I have not looked at performance regressions here, as
I'm not sure how best to test for that.

My main question at this point is whether this is the correct way to
fix it. Cover letter from the RFC set follows:

-------------------------8<----------------------
A few months ago, Dan reported that when you call execve in a process
that is multithreaded, any traditional POSIX locks it holds are silently
dropped.

The problem is that we unshare the files_struct from the process very
early during exec, whenever it looks like it's shared between tasks.
Eventually, when the other, non-exec'ing tasks are killed, the last
reference to the old files_struct is put and it is torn down. That
teardown looks like a close() was issued on each open fd, which causes
the locks to be dropped.
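
To illustrate, here is a rough userspace reproducer sketch (the lock
file path, helper names and minimal error handling are illustrative,
not the original reproducer): take a POSIX write lock, start a second
thread, re-exec, then fork a child and see whether it can steal the
lock. On an affected kernel it prints "lock dropped".

/* repro.c: gcc -pthread -o repro repro.c && ./repro */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define LOCKFILE "/tmp/execve-lock-test"

static void *idle_thread(void *arg)
{
	pause();		/* just keep the process multithreaded */
	return NULL;
}

int main(int argc, char **argv)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

	if (argc > 1 && !strcmp(argv[1], "check")) {
		/* post-exec: can another process grab the lock we took? */
		int status;
		pid_t pid = fork();

		if (pid == 0) {
			int fd = open(LOCKFILE, O_RDWR);

			exit(fcntl(fd, F_SETLK, &fl) == 0 ? 1 : 0);
		}
		waitpid(pid, &status, 0);
		puts(WEXITSTATUS(status) ? "lock dropped" : "lock preserved");
		return 0;
	}

	int fd = open(LOCKFILE, O_RDWR | O_CREAT, 0644);
	pthread_t tid;

	if (fd < 0 || fcntl(fd, F_SETLK, &fl))	/* traditional POSIX lock */
		return 1;
	pthread_create(&tid, NULL, idle_thread, NULL);

	/* re-exec while a second thread exists; fd stays open across exec */
	execl("/proc/self/exe", argv[0], "check", (char *)NULL);
	return 1;
}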

This patchset is a second stab at fixing this issue, this time following
the method suggested by Eric Biederman. The idea here is to move the
unshare_files() call after de_thread(), which helps ensure that we only
unshare the files_struct when it's truly shared between different
processes, and not just when the exec'ing process is multithreaded.

This seems to fix the originally reported problem (now known as xfstest
generic/484), and basic testing doesn't seem to show any issues.

During the original discussion though, Al had mentioned that this could
be problematic due to the fdtable being modifiable by other threads
(or even processes) during the binfmt probe. That may make this idea
non-viable.

I'm also not terribly thrilled with the way this sprinkles the
files_struct->file_lock all over the place. It may be possible to do
some of this with atomic ops if the basic approach turns out to be
reasonable.

Comments and suggestions welcome.

Jeff Layton (3):
  exec: separate thread_count for files_struct
  exec: delay clone(CLONE_FILES) if task associated with current
    files_struct is exec'ing
  exec: do unshare_files after de_thread

 fs/exec.c               | 25 ++++++++++++++++++-------
 fs/file.c               | 18 ++++++++++++++++++
 include/linux/binfmts.h |  1 +
 include/linux/fdtable.h |  2 ++
 kernel/fork.c           | 26 ++++++++++++++++++++++----
 5 files changed, 61 insertions(+), 11 deletions(-)

-- 
2.17.1


* [PATCH v2 1/3] exec: separate thread_count for files_struct
  2018-08-30 17:24 [PATCH v2 0/3] exec: fix passing of file locks across execve in multithreaded processes Jeff Layton
@ 2018-08-30 17:24 ` Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 2/3] exec: delay clone(CLONE_FILES) if task associated with current files_struct is exec'ing Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 3/3] exec: do unshare_files after de_thread Jeff Layton
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2018-08-30 17:24 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Eric W. Biederman, Daniel P. Berrangé, linux-fsdevel, linux-kernel

Currently, we have a single refcount variable inside the files_struct.
When we go to unshare the files_struct, we check this counter and, if
it's elevated, we allocate a new files_struct instead of just
repurposing the old one, under the assumption that an elevated count
means it's shared between tasks.

This is not necessarily the case, however. Each task associated with the
files_struct does get a long-held reference, but the refcount can be
elevated for other reasons as well, by callers of get_files_struct.

This means that we can end up allocating a new files_struct if we just
happen to race with a call to get_files_struct. Fix this by adding a new
counter to track the number of associated threads, and use that to
determine whether to allocate a new files_struct when unsharing.

We protect this new counter with the file_lock instead of using atomics,
as a later patch will add an "in_exec" flag that must be handled in
conjunction with the thread counter.
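
To make the distinction concrete, here is a simplified sketch of the new
test (the real change is in the kernel/fork.c hunk below; the helper
name here is made up):

static bool files_struct_is_shared(struct files_struct *fd)
{
	bool shared;

	/*
	 * The old test, atomic_read(&fd->count) > 1, can be fooled by a
	 * transient reference taken via get_files_struct(). thread_count
	 * only changes when a task's ->files pointer is set or cleared,
	 * so it reflects actual sharing between tasks.
	 */
	spin_lock(&fd->file_lock);
	shared = fd->thread_count > 1;
	spin_unlock(&fd->file_lock);

	return shared;
}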

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/file.c               | 17 +++++++++++++++++
 include/linux/fdtable.h |  1 +
 kernel/fork.c           | 17 ++++++++++++++---
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..42e0c8448b20 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -284,6 +284,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	atomic_set(&newf->count, 1);
 
 	spin_lock_init(&newf->file_lock);
+	newf->thread_count = 1;
 	newf->resize_in_progress = false;
 	init_waitqueue_head(&newf->resize_wait);
 	newf->next_fd = 0;
@@ -415,6 +416,8 @@ void put_files_struct(struct files_struct *files)
 	if (atomic_dec_and_test(&files->count)) {
 		struct fdtable *fdt = close_files(files);
 
+		WARN_ON_ONCE(files->thread_count);
+
 		/* free the arrays if they are not embedded */
 		if (fdt != &files->fdtab)
 			__free_fdtable(fdt);
@@ -428,9 +431,19 @@ void reset_files_struct(struct files_struct *files)
 	struct files_struct *old;
 
 	old = tsk->files;
+
+	spin_lock(&files->file_lock);
+	++files->thread_count;
+	spin_unlock(&files->file_lock);
+
 	task_lock(tsk);
 	tsk->files = files;
 	task_unlock(tsk);
+
+	spin_lock(&old->file_lock);
+	--old->thread_count;
+	spin_unlock(&old->file_lock);
+
 	put_files_struct(old);
 }
 
@@ -442,6 +455,9 @@ void exit_files(struct task_struct *tsk)
 		task_lock(tsk);
 		tsk->files = NULL;
 		task_unlock(tsk);
+		spin_lock(&files->file_lock);
+		--files->thread_count;
+		spin_unlock(&files->file_lock);
 		put_files_struct(files);
 	}
 }
@@ -457,6 +473,7 @@ struct files_struct init_files = {
 		.full_fds_bits	= init_files.full_fds_bits_init,
 	},
 	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
+	.thread_count	= 1,
 };
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 41615f38bcff..213ec1430ba4 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -59,6 +59,7 @@ struct files_struct {
    * written part on a separate cache line in SMP
    */
 	spinlock_t file_lock ____cacheline_aligned_in_smp;
+	int thread_count;
 	unsigned int next_fd;
 	unsigned long close_on_exec_init[1];
 	unsigned long open_fds_init[1];
diff --git a/kernel/fork.c b/kernel/fork.c
index d896e9ca38b0..82799544b646 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1374,6 +1374,9 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
 
 	if (clone_flags & CLONE_FILES) {
 		atomic_inc(&oldf->count);
+		spin_lock(&oldf->file_lock);
+		oldf->thread_count++;
+		spin_unlock(&oldf->file_lock);
 		goto out;
 	}
 
@@ -2417,10 +2420,15 @@ static int unshare_fs(unsigned long unshare_flags, struct fs_struct **new_fsp)
 static int unshare_fd(unsigned long unshare_flags, struct files_struct **new_fdp)
 {
 	struct files_struct *fd = current->files;
-	int error = 0;
+	int error, count;
+
+	if (!(unshare_flags & CLONE_FILES) || !fd)
+		return 0;
 
-	if ((unshare_flags & CLONE_FILES) &&
-	    (fd && atomic_read(&fd->count) > 1)) {
+	spin_lock(&fd->file_lock);
+	count = fd->thread_count;
+	spin_unlock(&fd->file_lock);
+	if (count > 1) {
 		*new_fdp = dup_fd(fd, &error);
 		if (!*new_fdp)
 			return error;
@@ -2579,6 +2587,9 @@ int unshare_files(struct files_struct **displaced)
 	task_lock(task);
 	task->files = copy;
 	task_unlock(task);
+	spin_lock(&(*displaced)->file_lock);
+	--(*displaced)->thread_count;
+	spin_unlock(&(*displaced)->file_lock);
 	return 0;
 }
 
-- 
2.17.1


* [PATCH v2 2/3] exec: delay clone(CLONE_FILES) if task associated with current files_struct is exec'ing
  2018-08-30 17:24 [PATCH v2 0/3] exec: fix passing of file locks across execve in multithreaded processes Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 1/3] exec: separate thread_count for files_struct Jeff Layton
@ 2018-08-30 17:24 ` Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 3/3] exec: do unshare_files after de_thread Jeff Layton
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2018-08-30 17:24 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Eric W. Biederman, Daniel P. Berrangé, linux-fsdevel, linux-kernel

In a later patch, we're going to move the unshare_files call in
__do_execve_file to later in the process, but this opens up a potential
race with clone(CLONE_FILES). We could end up bumping the refcount on
the files_struct after we've checked to see if it was shared. What we
really want to do in that case is have the clone return -EAGAIN, much
like the handling of the similar situation in copy_fs.

Add a new in_exec boolean to files_struct that is protected by the
file_lock. Set it if the files_struct turns out not to be shared, and
clear it at the end of __do_execve_file. If it's set when we go to
copy_process, then just return -EAGAIN.
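
The caller-visible effect is that a clone(CLONE_FILES) which races with
an exec by another task sharing the table can now fail with EAGAIN. A
rough userspace sketch of how such a caller would cope (illustrative
only; in this standalone example nothing is actually exec'ing, so the
retry loop just shows the intended handling):

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE	(1024 * 1024)

static int child_fn(void *arg)
{
	return 0;	/* shares the parent's fd table while it runs */
}

int main(void)
{
	char *stack = malloc(STACK_SIZE);
	pid_t pid;

	do {
		pid = clone(child_fn, stack + STACK_SIZE,
			    CLONE_FILES | SIGCHLD, NULL);
	} while (pid < 0 && errno == EAGAIN);	/* table owner mid-exec */

	if (pid < 0) {
		perror("clone");
		return 1;
	}
	waitpid(pid, NULL, 0);
	free(stack);
	return 0;
}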

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/exec.c               | 14 ++++++++++++--
 fs/file.c               |  1 +
 include/linux/fdtable.h |  1 +
 kernel/fork.c           | 21 ++++++++++++++-------
 4 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 1ebf6e5a521d..ca25f805ebad 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1831,8 +1831,13 @@ static int __do_execve_file(int fd, struct filename *filename,
 	kfree(pathbuf);
 	if (filename)
 		putname(filename);
-	if (displaced)
+	if (displaced) {
 		put_files_struct(displaced);
+	} else {
+		spin_lock(&current->files->file_lock);
+		current->files->in_exec = false;
+		spin_unlock(&current->files->file_lock);
+	}
 	return retval;
 
 out:
@@ -1850,8 +1855,13 @@ static int __do_execve_file(int fd, struct filename *filename,
 	kfree(pathbuf);
 
 out_files:
-	if (displaced)
+	if (displaced) {
 		reset_files_struct(displaced);
+	} else {
+		spin_lock(&current->files->file_lock);
+		current->files->in_exec = false;
+		spin_unlock(&current->files->file_lock);
+	}
 out_ret:
 	if (filename)
 		putname(filename);
diff --git a/fs/file.c b/fs/file.c
index 42e0c8448b20..dc5de1ec5395 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -286,6 +286,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	spin_lock_init(&newf->file_lock);
 	newf->thread_count = 1;
 	newf->resize_in_progress = false;
+	newf->in_exec = false;
 	init_waitqueue_head(&newf->resize_wait);
 	newf->next_fd = 0;
 	new_fdt = &newf->fdtab;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 213ec1430ba4..310454db9049 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -51,6 +51,7 @@ struct files_struct {
    */
 	atomic_t count;
 	bool resize_in_progress;
+	bool in_exec;
 	wait_queue_head_t resize_wait;
 
 	struct fdtable __rcu *fdt;
diff --git a/kernel/fork.c b/kernel/fork.c
index 82799544b646..d296dc501c0c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1373,10 +1373,15 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
 		goto out;
 
 	if (clone_flags & CLONE_FILES) {
-		atomic_inc(&oldf->count);
 		spin_lock(&oldf->file_lock);
-		oldf->thread_count++;
-		spin_unlock(&oldf->file_lock);
+		if (oldf->in_exec) {
+			spin_unlock(&oldf->file_lock);
+			error = -EAGAIN;
+		} else {
+			oldf->thread_count++;
+			spin_unlock(&oldf->file_lock);
+			atomic_inc(&oldf->count);
+		}
 		goto out;
 	}
 
@@ -2420,18 +2425,20 @@ static int unshare_fs(unsigned long unshare_flags, struct fs_struct **new_fsp)
 static int unshare_fd(unsigned long unshare_flags, struct files_struct **new_fdp)
 {
 	struct files_struct *fd = current->files;
-	int error, count;
+	int error;
 
 	if (!(unshare_flags & CLONE_FILES) || !fd)
 		return 0;
 
 	spin_lock(&fd->file_lock);
-	count = fd->thread_count;
-	spin_unlock(&fd->file_lock);
-	if (count > 1) {
+	if (fd->thread_count > 1) {
+		spin_unlock(&fd->file_lock);
 		*new_fdp = dup_fd(fd, &error);
 		if (!*new_fdp)
 			return error;
+	} else {
+		fd->in_exec = true;
+		spin_unlock(&fd->file_lock);
 	}
 
 	return 0;
-- 
2.17.1


* [PATCH v2 3/3] exec: do unshare_files after de_thread
  2018-08-30 17:24 [PATCH v2 0/3] exec: fix passing of file locks across execve in multithreaded processes Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 1/3] exec: separate thread_count for files_struct Jeff Layton
  2018-08-30 17:24 ` [PATCH v2 2/3] exec: delay clone(CLONE_FILES) if task associated with current files_struct is exec'ing Jeff Layton
@ 2018-08-30 17:24 ` Jeff Layton
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2018-08-30 17:24 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Eric W. Biederman, Daniel P. Berrangé, linux-fsdevel, linux-kernel

POSIX mandates that open fds and their associated file locks should be
preserved across an execve. This works unless the process is
multithreaded at the time execve is called.

In that case, we'll end up unsharing the files_struct but the locks will
still have their fl_owner set to the address of the old one. Eventually,
when the other threads die and the last reference to the old
files_struct is put, any POSIX locks get torn down since it looks like
a close occurred on them.

The result is that all of your open files will be intact, but none of
the locks you held before execve survive. The simple answer to this is
"use OFD locks", but this is a nasty surprise and it violates the spec.

Fix this by doing unshare_files later during exec, after we've already
killed off the other threads in the process. This helps ensure that we
only unshare the files_struct during exec when it is truly shared with
other processes.

Note that because the unshare_files call is now done just after
de_thread, we need a mechanism to pass the displaced files_struct back
up to __do_execve_file. This is done via a new displaced_files field
inside the linux_binprm.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/exec.c               | 11 ++++++-----
 include/linux/binfmts.h |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index ca25f805ebad..e32bd1a33660 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1262,6 +1262,10 @@ int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	retval = unshare_files(&bprm->displaced_files);
+	if (retval)
+		goto out;
+
 	/*
 	 * Must be called _before_ exec_mmap() as bprm->mm is
 	 * not visibile until then. This also enables the update
@@ -1713,7 +1717,7 @@ static int __do_execve_file(int fd, struct filename *filename,
 {
 	char *pathbuf = NULL;
 	struct linux_binprm *bprm;
-	struct files_struct *displaced;
+	struct files_struct *displaced = NULL;
 	int retval;
 
 	if (IS_ERR(filename))
@@ -1735,10 +1739,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
-	retval = unshare_files(&displaced);
-	if (retval)
-		goto out_ret;
-
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
 	if (!bprm)
@@ -1817,6 +1817,7 @@ static int __do_execve_file(int fd, struct filename *filename,
 	would_dump(bprm, bprm->file);
 
 	retval = exec_binprm(bprm);
+	displaced = bprm->displaced_files;
 	if (retval < 0)
 		goto out;
 
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c05f24fac4f6..d7ec384bb1b0 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -49,6 +49,7 @@ struct linux_binprm {
 	unsigned int taso:1;
 #endif
 	unsigned int recursion_depth; /* only for search_binary_handler() */
+	struct files_struct * displaced_files;
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
-- 
2.17.1


