linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.6 005/606] epoll: call final ep_events_available() check under the lock
       [not found] <20200608231211.3363633-1-sashal@kernel.org>
@ 2020-06-08 23:02 ` Sasha Levin
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 015/606] gcc-10 warnings: fix low-hanging fruit Sasha Levin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-06-08 23:02 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Roman Penyaev, Jason Baron, Randy Dunlap, Andrew Morton,
	Khazhismel Kumykov, Alexander Viro, Linus Torvalds, Sasha Levin,
	linux-fsdevel

From: Roman Penyaev <rpenyaev@suse.de>

[ Upstream commit 65759097d804d2a9ad2b687db436319704ba7019 ]

There is a possible race when ep_scan_ready_list() leaves ->rdllist and
->obflist empty for a short period of time although some events are
pending.  It is quite likely that ep_events_available() observes empty
lists and goes to sleep.

Since commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of
nested epoll") we are conservative in wakeups (there is only one place
for wakeup and this is ep_poll_callback()), thus ep_events_available()
must always observe correct state of two lists.

The easiest and correct way is to do the final check under the lock.
This does not impact the performance, since lock is taken anyway for
adding a wait entry to the wait queue.

The discussion of the problem can be found here:

   https://lore.kernel.org/linux-fsdevel/a2f22c3c-c25a-4bda-8339-a7bdaf17849e@akamai.com/

In this patch barrierless __set_current_state() is used.  This is safe
since waitqueue_active() is called under the same lock on wakeup side.

Short-circuit for fatal signals (i.e.  fatal_signal_pending() check) is
moved to the line just before actual events harvesting routine.  This is
fully compliant to what is said in the comment of the patch where the
actual fatal_signal_pending() check was added: c257a340ede0 ("fs, epoll:
short circuit fetching events if thread has been killed").

Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Reported-by: Jason Baron <jbaron@akamai.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200505145609.1865152-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/eventpoll.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b0a097274cfe..f5a481089893 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1857,34 +1857,33 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * event delivery.
 		 */
 		init_wait(&wait);
-		write_lock_irq(&ep->lock);
-		__add_wait_queue_exclusive(&ep->wq, &wait);
-		write_unlock_irq(&ep->lock);
 
+		write_lock_irq(&ep->lock);
 		/*
-		 * We don't want to sleep if the ep_poll_callback() sends us
-		 * a wakeup in between. That's why we set the task state
-		 * to TASK_INTERRUPTIBLE before doing the checks.
+		 * Barrierless variant, waitqueue_active() is called under
+		 * the same lock on wakeup ep_poll_callback() side, so it
+		 * is safe to avoid an explicit barrier.
 		 */
-		set_current_state(TASK_INTERRUPTIBLE);
+		__set_current_state(TASK_INTERRUPTIBLE);
+
 		/*
-		 * Always short-circuit for fatal signals to allow
-		 * threads to make a timely exit without the chance of
-		 * finding more events available and fetching
-		 * repeatedly.
+		 * Do the final check under the lock. ep_scan_ready_list()
+		 * plays with two lists (->rdllist and ->ovflist) and there
+		 * is always a race when both lists are empty for short
+		 * period of time although events are pending, so lock is
+		 * important.
 		 */
-		if (fatal_signal_pending(current)) {
-			res = -EINTR;
-			break;
+		eavail = ep_events_available(ep);
+		if (!eavail) {
+			if (signal_pending(current))
+				res = -EINTR;
+			else
+				__add_wait_queue_exclusive(&ep->wq, &wait);
 		}
+		write_unlock_irq(&ep->lock);
 
-		eavail = ep_events_available(ep);
-		if (eavail)
-			break;
-		if (signal_pending(current)) {
-			res = -EINTR;
+		if (eavail || res)
 			break;
-		}
 
 		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
 			timed_out = 1;
@@ -1905,6 +1904,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	}
 
 send_events:
+	if (fatal_signal_pending(current)) {
+		/*
+		 * Always short-circuit for fatal signals to allow
+		 * threads to make a timely exit without the chance of
+		 * finding more events available and fetching
+		 * repeatedly.
+		 */
+		res = -EINTR;
+	}
 	/*
 	 * Try to transfer events to user space. In case we get 0 events and
 	 * there's still timeout left over, we go trying again in search of
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.6 015/606] gcc-10 warnings: fix low-hanging fruit
       [not found] <20200608231211.3363633-1-sashal@kernel.org>
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 005/606] epoll: call final ep_events_available() check under the lock Sasha Levin
@ 2020-06-08 23:02 ` Sasha Levin
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 044/606] exec: Move would_dump into flush_old_exec Sasha Levin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-06-08 23:02 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Linus Torvalds, Greg Kroah-Hartman, linux-fsdevel

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 9d82973e032e246ff5663c9805fbb5407ae932e3 upstream.

Due to a bug-report that was compiler-dependent, I updated one of my
machines to gcc-10.  That shows a lot of new warnings.  Happily they
seem to be mostly the valid kind, but it's going to cause a round of
churn for getting rid of them..

This is the really low-hanging fruit of removing a couple of zero-sized
arrays in some core code.  We have had a round of these patches before,
and we'll have many more coming, and there is nothing special about
these except that they were particularly trivial, and triggered more
warnings than most.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/fs.h  | 2 +-
 include/linux/tty.h | 2 +-
 scripts/kallsyms.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index abedbffe2c9e..872ee2131589 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -978,7 +978,7 @@ struct file_handle {
 	__u32 handle_bytes;
 	int handle_type;
 	/* file identifier */
-	unsigned char f_handle[0];
+	unsigned char f_handle[];
 };
 
 static inline struct file *get_file(struct file *f)
diff --git a/include/linux/tty.h b/include/linux/tty.h
index bd5fe0e907e8..a99e9b8e4e31 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -66,7 +66,7 @@ struct tty_buffer {
 	int read;
 	int flags;
 	/* Data points here */
-	unsigned long data[0];
+	unsigned long data[];
 };
 
 /* Values for .flags field of tty_buffer */
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 3e8dea6e0a95..6dc3078649fa 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -34,7 +34,7 @@ struct sym_entry {
 	unsigned int len;
 	unsigned int start_pos;
 	unsigned int percpu_absolute;
-	unsigned char sym[0];
+	unsigned char sym[];
 };
 
 struct addr_range {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.6 044/606] exec: Move would_dump into flush_old_exec
       [not found] <20200608231211.3363633-1-sashal@kernel.org>
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 005/606] epoll: call final ep_events_available() check under the lock Sasha Levin
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 015/606] gcc-10 warnings: fix low-hanging fruit Sasha Levin
@ 2020-06-08 23:02 ` Sasha Levin
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 047/606] fanotify: fix merging marks masks with FAN_ONDIR Sasha Levin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-06-08 23:02 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Eric W. Biederman, Greg Kroah-Hartman, linux-fsdevel

From: "Eric W. Biederman" <ebiederm@xmission.com>

commit f87d1c9559164294040e58f5e3b74a162bf7c6e8 upstream.

I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6f268 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index a58625f27652..77603ceed51f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1277,6 +1277,8 @@ int flush_old_exec(struct linux_binprm * bprm)
 	 */
 	set_mm_exe_file(bprm->mm, bprm->file);
 
+	would_dump(bprm, bprm->file);
+
 	/*
 	 * Release all of the old mmap stuff
 	 */
@@ -1820,8 +1822,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
-	would_dump(bprm, bprm->file);
-
 	retval = exec_binprm(bprm);
 	if (retval < 0)
 		goto out;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.6 047/606] fanotify: fix merging marks masks with FAN_ONDIR
       [not found] <20200608231211.3363633-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 044/606] exec: Move would_dump into flush_old_exec Sasha Levin
@ 2020-06-08 23:02 ` Sasha Levin
  2020-06-08 23:03 ` [PATCH AUTOSEL 5.6 089/606] fix multiplication overflow in copy_fdtable() Sasha Levin
  2020-06-08 23:03 ` [PATCH AUTOSEL 5.6 090/606] pipe: Fix pipe_full() test in opipe_prep() Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-06-08 23:02 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Amir Goldstein, Jan Kara, Rachel Sibley, Greg Kroah-Hartman,
	linux-fsdevel

From: Amir Goldstein <amir73il@gmail.com>

commit 55bf882c7f13dda8bbe624040c6d5b4fbb812d16 upstream.

Change the logic of FAN_ONDIR in two ways that are similar to the logic
of FAN_EVENT_ON_CHILD, that was fixed in commit 54a307ba8d3c ("fanotify:
fix logic of events on child"):

1. The flag is meaningless in ignore mask
2. The flag refers only to events in the mask of the mark where it is set

This is what the fanotify_mark.2 man page says about FAN_ONDIR:
"Without this flag, only events for files are created."  It doesn't
say anything about setting this flag in ignore mask to stop getting
events on directories nor can I think of any setup where this capability
would be useful.

Currently, when marks masks are merged, the FAN_ONDIR flag set in one
mark affects the events that are set in another mark's mask and this
behavior causes unexpected results.  For example, a user adds a mark on a
directory with mask FAN_ATTRIB | FAN_ONDIR and a mount mark with mask
FAN_OPEN (without FAN_ONDIR).  An opendir() of that directory (which is
inside that mount) generates a FAN_OPEN event even though neither of the
marks requested to get open events on directories.

Link: https://lore.kernel.org/r/20200319151022.31456-10-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Rachel Sibley <rasibley@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/notify/fanotify/fanotify.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index f5d30573f4a9..deb13f0a0f7d 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -171,6 +171,13 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 		if (!fsnotify_iter_should_report_type(iter_info, type))
 			continue;
 		mark = iter_info->marks[type];
+		/*
+		 * If the event is on dir and this mark doesn't care about
+		 * events on dir, don't send it!
+		 */
+		if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR))
+			continue;
+
 		/*
 		 * If the event is for a child and this mark doesn't care about
 		 * events on a child, don't send it!
@@ -203,10 +210,6 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 		user_mask &= ~FAN_ONDIR;
 	}
 
-	if (event_mask & FS_ISDIR &&
-	    !(marks_mask & FS_ISDIR & ~marks_ignored_mask))
-		return 0;
-
 	return test_mask & user_mask;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.6 089/606] fix multiplication overflow in copy_fdtable()
       [not found] <20200608231211.3363633-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 047/606] fanotify: fix merging marks masks with FAN_ONDIR Sasha Levin
@ 2020-06-08 23:03 ` Sasha Levin
  2020-06-08 23:03 ` [PATCH AUTOSEL 5.6 090/606] pipe: Fix pipe_full() test in opipe_prep() Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-06-08 23:03 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Al Viro, stable, Thiago Macieira, Sasha Levin, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

[ Upstream commit 4e89b7210403fa4a8acafe7c602b6212b7af6c3b ]

cpy and set really should be size_t; we won't get an overflow on that,
since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *),
so nr that would've managed to overflow size_t on that multiplication
won't get anywhere near copy_fdtable() - we'll fail with EMFILE
before that.

Cc: stable@kernel.org # v2.6.25+
Fixes: 9cfe015aa424 (get rid of NR_OPEN and introduce a sysctl_nr_open)
Reported-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index c8a4e4c86e55..abb8b7081d7a 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -70,7 +70,7 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
  */
 static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
 {
-	unsigned int cpy, set;
+	size_t cpy, set;
 
 	BUG_ON(nfdt->max_fds < ofdt->max_fds);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.6 090/606] pipe: Fix pipe_full() test in opipe_prep().
       [not found] <20200608231211.3363633-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2020-06-08 23:03 ` [PATCH AUTOSEL 5.6 089/606] fix multiplication overflow in copy_fdtable() Sasha Levin
@ 2020-06-08 23:03 ` Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-06-08 23:03 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tetsuo Handa, syzbot+b48daca8639150bc5e73, Linus Torvalds,
	Sasha Levin, linux-fsdevel

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

[ Upstream commit 566d136289dc57816ac290de87a9a0f7d9bd3cbb ]

syzbot is reporting that splice()ing from non-empty read side to
already-full write side causes unkillable task, for opipe_prep() is by
error not inverting pipe_full() test.

  CPU: 0 PID: 9460 Comm: syz-executor.5 Not tainted 5.6.0-rc3-next-20200228-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:rol32 include/linux/bitops.h:105 [inline]
  RIP: 0010:iterate_chain_key kernel/locking/lockdep.c:369 [inline]
  RIP: 0010:__lock_acquire+0x6a3/0x5270 kernel/locking/lockdep.c:4178
  Call Trace:
     lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4720
     __mutex_lock_common kernel/locking/mutex.c:956 [inline]
     __mutex_lock+0x156/0x13c0 kernel/locking/mutex.c:1103
     pipe_lock_nested fs/pipe.c:66 [inline]
     pipe_double_lock+0x1a0/0x1e0 fs/pipe.c:104
     splice_pipe_to_pipe fs/splice.c:1562 [inline]
     do_splice+0x35f/0x1520 fs/splice.c:1141
     __do_sys_splice fs/splice.c:1447 [inline]
     __se_sys_splice fs/splice.c:1427 [inline]
     __x64_sys_splice+0x2b5/0x320 fs/splice.c:1427
     do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:295
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: syzbot+b48daca8639150bc5e73@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=9386d051e11e09973d5a4cf79af5e8cedf79386d
Fixes: 8cefc107ca54c8b0 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/splice.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index d671936d0aad..39b11a9a6b98 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1503,7 +1503,7 @@ static int opipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
 	 * Check pipe occupancy without the inode lock first. This function
 	 * is speculative anyways, so missing one is ok.
 	 */
-	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+	if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage))
 		return 0;
 
 	ret = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-09  0:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200608231211.3363633-1-sashal@kernel.org>
2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 005/606] epoll: call final ep_events_available() check under the lock Sasha Levin
2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 015/606] gcc-10 warnings: fix low-hanging fruit Sasha Levin
2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 044/606] exec: Move would_dump into flush_old_exec Sasha Levin
2020-06-08 23:02 ` [PATCH AUTOSEL 5.6 047/606] fanotify: fix merging marks masks with FAN_ONDIR Sasha Levin
2020-06-08 23:03 ` [PATCH AUTOSEL 5.6 089/606] fix multiplication overflow in copy_fdtable() Sasha Levin
2020-06-08 23:03 ` [PATCH AUTOSEL 5.6 090/606] pipe: Fix pipe_full() test in opipe_prep() Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).