From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jann Horn <jannh@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Petr Mladek <pmladek@suse.com>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>,
Sasha Levin <sashal@kernel.org>,
linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 4.19 38/42] sched/fair: Don't free p->numa_faults with concurrent readers
Date: Fri, 2 Aug 2019 09:22:58 -0400
Message-ID: <20190802132302.13537-38-sashal@kernel.org>
In-Reply-To: <20190802132302.13537-1-sashal@kernel.org>

From: Jann Horn <jannh@google.com>

[ Upstream commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 ]

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.

I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

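
For readers who want the idea outside the kernel tree, the following is a
minimal userspace sketch of the "wipe instead of free" approach described
above. It is not kernel code: struct demo_task, NR_STATS, demo_task_init()
and demo_task_stats_release() are hypothetical names invented for this
illustration, and the sketch is single-threaded, so it only shows the
control flow, not the race itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the size of the per-task fault counter array. */
#define NR_STATS 8

/* Hypothetical stand-in for the relevant fields of a task. */
struct demo_task {
	unsigned long *faults;		/* other contexts may read this while the task lives */
	unsigned long total_faults;
};

/* Allocate zeroed counters; returns 0 on success, -1 on allocation failure. */
static int demo_task_init(struct demo_task *t)
{
	assert(t != NULL);
	t->faults = calloc(NR_STATS, sizeof(*t->faults));
	t->total_faults = 0;
	return t->faults ? 0 : -1;
}

/*
 * If final is true, no reader can exist any more, so the allocation is
 * freed. Otherwise the counters are reset in place and the pointer stays
 * valid, so a concurrent reader would see zeros rather than freed memory.
 */
static void demo_task_stats_release(struct demo_task *t, bool final)
{
	unsigned long *faults;
	size_t i;

	assert(t != NULL);
	faults = t->faults;
	if (!faults)
		return;

	if (final) {
		t->faults = NULL;
		free(faults);
	} else {
		t->total_faults = 0;
		for (i = 0; i < NR_STATS; i++)
			faults[i] = 0;
	}
}
```

In this sketch, calling demo_task_stats_release(&t, false) corresponds to the
execve() path (the pointer is unchanged, the contents are zeroed), and
demo_task_stats_release(&t, true) corresponds to the final put of the task.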
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/exec.c                            |  2 +-
 include/linux/sched/numa_balancing.h |  4 ++--
 kernel/fork.c                        |  2 +-
 kernel/sched/fair.c                  | 24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 433b1257694ab..561ea64829ece 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1826,7 +1826,7 @@ static int __do_execve_file(int fd, struct filename *filename,
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index e7dd04a84ba89..3988762efe15c 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
extern void task_numa_fault(int last_node, int node, int pages, int flags);
extern pid_t task_numa_group_id(struct task_struct *p);
extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
int src_nid, int dst_cpu);
#else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(struct task_struct *p)
static inline void set_numabalancing_state(bool enabled)
{
}
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
{
}
static inline bool should_numa_migrate_memory(struct task_struct *p,
diff --git a/kernel/fork.c b/kernel/fork.c
index 69874db3fba83..e76ce81c9c757 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -679,7 +679,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a433608ba74a..34b998678b97d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2345,13 +2345,23 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA statistics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2364,8 +2374,14 @@ void task_numa_free(struct task_struct *p)
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*
--
2.20.1