linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Paul Burton <paul.burton@mips.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <alexander.levin@microsoft.com>
Subject: [PATCH 4.14 58/61] sched/core: Require cpu_active() in select_task_rq(), for user tasks
Date: Fri,  6 Jul 2018 07:47:22 +0200	[thread overview]
Message-ID: <20180706054714.558088084@linuxfoundation.org> (raw)
In-Reply-To: <20180706054712.332416244@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Burton <paul.burton@mips.com>

[ Upstream commit 7af443ee1697607541c6346c87385adab2214743 ]

select_task_rq() is used in a few paths to select the CPU upon which a
thread should be run - for example it is used by try_to_wake_up() & by
fork or exec balancing. As-is it allows use of any online CPU that is
present in the task's cpus_allowed mask.

This presents a problem because there is a period whilst CPUs are
brought online where a CPU is marked online, but is not yet fully
initialized - ie. the period where CPUHP_AP_ONLINE_IDLE <= state <
CPUHP_ONLINE. Usually we don't run any user tasks during this window,
but there are corner cases where this can happen. An example observed
is:

  - Some user task A, running on CPU X, forks to create task B.

  - sched_fork() calls __set_task_cpu() with cpu=X, setting task B's
    task_struct::cpu field to X.

  - CPU X is offlined.

  - Task A, currently somewhere between the __set_task_cpu() in
    copy_process() and the call to wake_up_new_task(), is migrated to
    CPU Y by migrate_tasks() when CPU X is offlined.

  - CPU X is onlined, but still in the CPUHP_AP_ONLINE_IDLE state. The
    scheduler is now active on CPU X, but there are no user tasks on
    the runqueue.

  - Task A runs on CPU Y & reaches wake_up_new_task(). This calls
    select_task_rq() with cpu=X, taken from task B's task_struct,
    and select_task_rq() allows CPU X to be returned.

  - Task A enqueues task B on CPU X's runqueue, via activate_task() &
    enqueue_task().

  - CPU X now has a user task on its runqueue before it has reached the
    CPUHP_ONLINE state.

In most cases, the user tasks that schedule on the newly onlined CPU
have no idea that anything went wrong, but one case observed to be
problematic is if the task goes on to invoke the sched_setaffinity
syscall. The newly onlined CPU reaches the CPUHP_AP_ONLINE_IDLE state
before the CPU that brought it online calls stop_machine_unpark(). This
means that for a portion of the window of time between
CPUHP_AP_ONLINE_IDLE & CPUHP_ONLINE the newly onlined CPU's struct
cpu_stopper has its enabled field set to false. If a user thread is
executed on the CPU during this window and it invokes sched_setaffinity
with a CPU mask that does not include the CPU it's running on, then when
__set_cpus_allowed_ptr() calls stop_one_cpu() intending to invoke
migration_cpu_stop() and perform the actual migration away from the CPU
it will simply return -ENOENT rather than calling migration_cpu_stop().
We then return from the sched_setaffinity syscall back to the user task
that is now running on a CPU which it just asked not to run on, and
which is not present in its cpus_allowed mask.

This patch resolves the problem by having select_task_rq() enforce that
user tasks run on CPUs that are active - the same requirement that
select_fallback_rq() already enforces. This should ensure that newly
onlined CPUs reach the CPUHP_AP_ACTIVE state before being able to
schedule user tasks, and also implies that bringup_wait_for_ap() will
have called stop_machine_unpark() which resolves the sched_setaffinity
issue above.

I haven't yet investigated them, but it may be of interest to review
whether any of the actions performed by hotplug states between
CPUHP_AP_ONLINE_IDLE & CPUHP_AP_ACTIVE could have similar unintended
effects on user tasks that might schedule before they are reached, which
might widen the scope of the problem from just affecting the behaviour
of sched_setaffinity.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180526154648.11635-2-paul.burton@mips.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/sched/core.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1573,8 +1573,7 @@ int select_task_rq(struct task_struct *p
 	 * [ this allows ->select_task() to simply return task_cpu(p) and
 	 *   not worry about this generic constraint ]
 	 */
-	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_online(cpu)))
+	if (unlikely(!is_cpu_allowed(p, cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;



  parent reply	other threads:[~2018-07-06  5:52 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-06  5:46 [PATCH 4.14 00/61] 4.14.54-stable review Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 01/61] usb: cdc_acm: Add quirk for Uniden UBC125 scanner Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 02/61] USB: serial: cp210x: add CESINEL device ids Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 03/61] USB: serial: cp210x: add Silicon Labs IDs for Windows Update Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 04/61] usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 05/61] acpi: Add helper for deactivating memory region Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 06/61] usb: typec: ucsi: acpi: Workaround for cache mode issue Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 07/61] usb: typec: ucsi: Fix for incorrect status data issue Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 08/61] xhci: Fix kernel oops in trace_xhci_free_virt_device Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 09/61] n_tty: Fix stall at n_tty_receive_char_special() Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 10/61] n_tty: Access echo_* variables carefully Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 11/61] staging: android: ion: Return an ERR_PTR in ion_map_kernel Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 12/61] serial: 8250_pci: Remove stalled entries in blacklist Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 13/61] serdev: fix memleak on module unload Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 14/61] vt: prevent leaking uninitialized data to userspace via /dev/vcs* Greg Kroah-Hartman
2018-07-06  5:52   ` syzbot
2018-07-06  5:46 ` [PATCH 4.14 18/61] drm/qxl: Call qxl_bo_unref outside atomic context Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 19/61] drm/atmel-hlcdc: check stride values in the first plane Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 22/61] drm/i915: Enable provoking vertex fix on Gen9 systems Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 23/61] netfilter: nf_tables: nft_compat: fix refcount leak on xt module Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 24/61] netfilter: nft_compat: prepare for indirect info storage Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 25/61] netfilter: nft_compat: fix handling of large matchinfo size Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 26/61] netfilter: nf_tables: dont assume chain stats are set when jumplabel is set Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 27/61] netfilter: nf_tables: bogus EBUSY in chain deletions Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 28/61] netfilter: nft_meta: fix wrong value dereference in nft_meta_set_eval Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 29/61] netfilter: nf_tables: disable preemption in nft_update_chain_stats() Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 30/61] netfilter: nf_tables: increase nft_counters_enabled in nft_chain_stats_replace() Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 31/61] netfilter: nf_tables: fix memory leak on error exit return Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 32/61] netfilter: nf_tables: add missing netlink attrs to policies Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 33/61] netfilter: nf_tables: fix NULL-ptr in nf_tables_dump_obj() Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 34/61] md: always hold reconfig_mutex when calling mddev_suspend() Greg Kroah-Hartman
2018-07-06  5:46 ` [PATCH 4.14 35/61] md: dont call bitmap_create() while array is quiesced Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 36/61] md: move suspend_hi/lo handling into core md code Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 37/61] md: use mddev_suspend/resume instead of ->quiesce() Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 38/61] md: allow metadata update while suspending Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 39/61] md: remove special meaning of ->quiesce(.., 2) Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 40/61] netfilter: dont set F_IFACE on ipv6 fib lookups Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 41/61] netfilter: ip6t_rpfilter: provide input interface for route lookup Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 42/61] netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain() Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 43/61] ARM: dts: imx6q: Use correct SDMA script for SPI5 core Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 44/61] mtd: rawnand: fix return value check for bad block status Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 46/61] afs: Fix directory permissions check Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 47/61] netfilter: ebtables: handle string from userspace with care Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 48/61] s390/dasd: use blk_mq_rq_from_pdu for per request data Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 49/61] netfilter: nft_limit: fix packet ratelimiting Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 50/61] ipvs: fix buffer overflow with sync daemon and service Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 51/61] iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 52/61] atm: zatm: fix memcmp casting Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 54/61] perf test: "Session topology" dumps core on s390 Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 55/61] perf bpf: Fix NULL return handling in bpf__prepare_load() Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 56/61] fs: clear writeback errors in inode_init_always Greg Kroah-Hartman
2018-07-06  5:47 ` [PATCH 4.14 57/61] sched/core: Fix rules for running on online && !active CPUs Greg Kroah-Hartman
2018-07-06  5:47 ` Greg Kroah-Hartman [this message]
2018-07-06  5:47 ` [PATCH 4.14 60/61] net/sonic: Use dma_mapping_error() Greg Kroah-Hartman
2018-07-06 17:54 ` [PATCH 4.14 00/61] 4.14.54-stable review Dan Rue
2018-07-07 21:39 ` Guenter Roeck
2018-07-08 13:29   ` Greg Kroah-Hartman
2018-07-09 13:28     ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180706054714.558088084@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=alexander.levin@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paul.burton@mips.com \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).