stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jann Horn <jannh@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 4.19 38/42] sched/fair: Don't free p->numa_faults with concurrent readers
Date: Fri,  2 Aug 2019 09:22:58 -0400	[thread overview]
Message-ID: <20190802132302.13537-38-sashal@kernel.org> (raw)
In-Reply-To: <20190802132302.13537-1-sashal@kernel.org>

From: Jann Horn <jannh@google.com>

[ Upstream commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 ]

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/exec.c                            |  2 +-
 include/linux/sched/numa_balancing.h |  4 ++--
 kernel/fork.c                        |  2 +-
 kernel/sched/fair.c                  | 24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 433b1257694ab..561ea64829ece 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1826,7 +1826,7 @@ static int __do_execve_file(int fd, struct filename *filename,
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index e7dd04a84ba89..3988762efe15c 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(struct task_struct *p)
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
diff --git a/kernel/fork.c b/kernel/fork.c
index 69874db3fba83..e76ce81c9c757 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -679,7 +679,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a433608ba74a..34b998678b97d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2345,13 +2345,23 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA staticstics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2364,8 +2374,14 @@ void task_numa_free(struct task_struct *p)
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*
-- 
2.20.1


  parent reply	other threads:[~2019-08-02 13:31 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 13:22 [PATCH AUTOSEL 4.19 01/42] netfilter: nfnetlink: avoid deadlock due to synchronous request_module Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 02/42] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 03/42] netfilter: Fix rpfilter dropping vrf packets by mistake Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled Sasha Levin
2019-08-08  9:02   ` Thomas Jarosch
2019-08-14 10:19     ` Reindl Harald
2019-08-14 11:17       ` Jakub Jankowski
2019-08-14 17:01         ` Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 05/42] netfilter: nft_hash: fix symhash with modulus one Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 06/42] scripts/sphinx-pre-install: fix script for RHEL/CentOS Sasha Levin
2019-08-03 10:31   ` Alexander Kapshuk
2019-08-03 10:37     ` Mauro Carvalho Chehab
2019-08-03 12:09       ` Alexander Kapshuk
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 07/42] drm/amd/display: Wait for backlight programming completion in set backlight level Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 08/42] drm/amd/display: use encoder's engine id to find matched free audio device Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 09/42] drm/amd/display: Fix dc_create failure handling and 666 color depths Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 10/42] drm/amd/display: Only enable audio if speaker allocation exists Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 11/42] drm/amd/display: Increase size of audios array Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 12/42] iscsi_ibft: make ISCSI_IBFT dependson ACPI instead of ISCSI_IBFT_FIND Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 13/42] nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 14/42] mac80211: don't warn about CW params when not using them Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 15/42] allocate_flower_entry: should check for null deref Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 16/42] hwmon: (nct6775) Fix register address and added missed tolerance for nct6106 Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 17/42] x86/mm: Check for pfn instead of page in vmalloc_sync_one() Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 18/42] x86/mm: Sync also unmappings in vmalloc_sync_all() Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 19/42] drm/msm: stop abusing dma_map/unmap for cache Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 20/42] drm: silence variable 'conn' set but not used Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 21/42] cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init() Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 22/42] s390/qdio: add sanity checks to the fast-requeue path Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 23/42] ALSA: compress: Fix regression on compressed capture streams Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 24/42] ALSA: compress: Prevent bypasses of set_params Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 25/42] ALSA: compress: Don't allow paritial drain operations on capture streams Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 26/42] ALSA: compress: Be more restrictive about when a drain is allowed Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 27/42] perf tools: Fix proper buffer size for feature processing Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 28/42] perf probe: Avoid calling freeing routine multiple times for same pointer Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 29/42] drbd: dynamically allocate shash descriptor Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 30/42] ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id() Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 31/42] nvme: fix multipath crash when ANA is deactivated Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 32/42] ARM: davinci: fix sleep.S build error on ARMv4 Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 33/42] ARM: dts: bcm: bcm47094: add missing #cells for mdio-bus-mux Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 34/42] scsi: megaraid_sas: fix panic on loading firmware crashdump Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 35/42] scsi: ibmvfc: fix WARN_ON during event pool release Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 36/42] scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG Sasha Levin
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 37/42] test_firmware: fix a memory leak bug Sasha Levin
2019-08-02 13:22 ` Sasha Levin [this message]
2019-08-02 13:22 ` [PATCH AUTOSEL 4.19 39/42] sched/fair: Use RCU accessors consistently for ->numa_group Sasha Levin
2019-08-02 13:23 ` [PATCH AUTOSEL 4.19 40/42] tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop Sasha Levin
2019-08-02 13:23 ` [PATCH AUTOSEL 4.19 41/42] perf/core: Fix creating kernel counters for PMUs that override event->cpu Sasha Levin
2019-08-02 13:23 ` [PATCH AUTOSEL 4.19 42/42] s390/dma: provide proper ARCH_ZONE_DMA_BITS value Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190802132302.13537-38-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=jannh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).