linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jann Horn <jannh@google.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.19 24/32] sched/fair: Dont free p->numa_faults with concurrent readers
Date: Fri,  2 Aug 2019 11:39:58 +0200	[thread overview]
Message-ID: <20190802092109.441126804@linuxfoundation.org> (raw)
In-Reply-To: <20190802092101.913646560@linuxfoundation.org>

From: Jann Horn <jannh@google.com>

commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/exec.c                            |    2 +-
 include/linux/sched/numa_balancing.h |    4 ++--
 kernel/fork.c                        |    2 +-
 kernel/sched/fair.c                  |   24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1826,7 +1826,7 @@ static int __do_execve_file(int fd, stru
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(s
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -679,7 +679,7 @@ void __put_task_struct(struct task_struc
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2345,13 +2345,23 @@ no_join:
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA staticstics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2364,8 +2374,14 @@ void task_numa_free(struct task_struct *
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*



  parent reply	other threads:[~2019-08-02  9:56 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02  9:39 [PATCH 4.19 00/32] 4.19.64-stable review Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 01/32] hv_sock: Add support for delayed close Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 02/32] vsock: correct removal of socket from the list Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 03/32] NFS: Fix dentry revalidation on NFSv4 lookup Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 04/32] NFS: Refactor nfs_lookup_revalidate() Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 05/32] NFSv4: Fix lookup revalidate of regular files Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 06/32] usb: dwc2: Disable all EPs on disconnect Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 07/32] usb: dwc2: Fix disable " Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 08/32] arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 09/32] binder: fix possible UAF when freeing buffer Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 10/32] ISDN: hfcsusb: checking idx of ep configuration Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 11/32] media: au0828: fix null dereference in error path Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 12/32] ath10k: Change the warning message string Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 13/32] media: cpia2_usb: first wake up, then free in disconnect Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 14/32] media: pvrusb2: use a different format for warnings Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 15/32] NFS: Cleanup if nfs_match_client is interrupted Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 16/32] media: radio-raremono: change devm_k*alloc to k*alloc Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 17/32] iommu/vt-d: Dont queue_iova() if there is no flush queue Greg Kroah-Hartman
2019-08-03 21:34   ` Pavel Machek
2019-08-06 22:47     ` Dmitry Safonov
2019-08-06 23:16       ` Dmitry Safonov
2019-08-02  9:39 ` [PATCH 4.19 18/32] iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 19/32] Bluetooth: hci_uart: check for missing tty operations Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 20/32] vhost: introduce vhost_exceeds_weight() Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 21/32] vhost_net: fix possible infinite loop Greg Kroah-Hartman
2019-08-03 21:49   ` Pavel Machek
2019-08-05  4:17     ` Jason Wang
2019-08-02  9:39 ` [PATCH 4.19 22/32] vhost: vsock: add weight support Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 23/32] vhost: scsi: " Greg Kroah-Hartman
2019-08-02  9:39 ` Greg Kroah-Hartman [this message]
2019-08-02  9:39 ` [PATCH 4.19 25/32] sched/fair: Use RCU accessors consistently for ->numa_group Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 26/32] /proc/<pid>/cmdline: remove all the special cases Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 27/32] /proc/<pid>/cmdline: add back the setproctitle() special case Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 28/32] drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 29/32] Fix allyesconfig output Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 30/32] ceph: hold i_ceph_lock when removing caps for freeing inode Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 31/32] block, scsi: Change the preempt-only flag into a counter Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 32/32] scsi: core: Avoid that a kernel warning appears during system resume Greg Kroah-Hartman
2019-08-02 23:22 ` [PATCH 4.19 00/32] 4.19.64-stable review shuah
2019-08-03  5:46 ` Naresh Kamboju
2019-08-03  9:58 ` Pavel Machek
2019-08-03 10:34   ` Greg Kroah-Hartman
2019-08-03 15:59 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190802092109.441126804@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).