linux-kernel.vger.kernel.org archive mirror
From: Sodagudi Prasad <psodagud@codeaurora.org>
To: peterz@infradead.org, mingo@kernel.org,
	gregkh@linuxfoundation.org, bigeasy@linutronix.de,
	tglx@linutronix.de
Cc: isaacm@codeaurora.org, psodagud@codeaurora.org,
	linux-kernel@vger.kernel.org, mingo@kernel.org
Subject: cpu stopper threads and setaffinity leads to deadlock
Date: Wed, 01 Aug 2018 18:34:40 -0700
Message-ID: <24eebe1d874cb8e3b9a18087554544fa@codeaurora.org>

Hi Peter and Tglx,

We are observing another deadlock caused by commit
0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance()
deadlock"), even after applying the following fix
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1740526.html
on the Linux 4.14.56 kernel.

Here is the scenario that leads to this deadlock.
We used the stress-ng-64 --affinity test case to reproduce this issue
in a controlled environment, while simultaneously running CPU hotplug
and task migrations.

stress-ng-affin (call stack shown below) is changing its own affinity
from cpu3 to cpu7. It is preempted in cpu_stop_queue_work() as soon as
the stopper lock for migration/3 is released. At the same time, on
cpu7, tasks are being cross-migrated between cpu3 and cpu7.

=======================================================
Process: stress-ng-affin, cpu: 3 pid: 1748 start: 0xffffffd8817e4480
=====================================================
     Task name: stress-ng-affin pid: 1748 cpu: 3 start: ffffffd8817e4480
     state: 0x0 exit_state: 0x0 stack base: 0xffffff801c8e8000 Prio: 120
     Stack:
     [<ffffff87754864f4>] __switch_to+0xb8
     [<ffffff87763ebf8c>] __schedule+0x690
     [<ffffff87763ec388>] preempt_schedule_common+0x100
     [<ffffff87763eb8f4>] preempt_schedule+0x24
     [<ffffff87763f0e58>] _raw_spin_unlock_irqrestore+0x64
     [<ffffff8775574f8c>] cpu_stop_queue_work+0x9c
     [<ffffff8775574dfc>] stop_one_cpu+0x58
     [<ffffff87754e4884>] __set_cpus_allowed_ptr+0x234
     [<ffffff87754e8888>] sched_setaffinity+0x150
     [<ffffff87754e8ad8>] SyS_sched_setaffinity+0xcc
     [<ffffff87754837c0>] el0_svc_naked+0x34
     [<0>] UNKNOWN+0x0

Due to the cross-migration of tasks between cpu7 and cpu3, migration/7
has started executing and is waiting for migration/3, so that the two
can proceed through the multi-CPU stop state machine together.
Unfortunately, stress-ng-affin is now affine to cpu7, and since
migration/7 is running and monopolizes cpu7, stress-ng-affin will never
run on cpu7, and migration/3 is never woken up.

Essentially:
Because of the way the wake_q interface works, a task can be on at most
one wake queue at a time. migration/3 is currently on stress-ng-affin's
wake_q, which means no other thread can add migration/3 to its own wake
queue. Thus, even if other paths try to stop cpu3 (e.g. cross-migration,
CPU hotplug), none of them will wake up migration/3.

The following change fixed this deadlock in our testing.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1e..f932e1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -87,9 +87,9 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
                 __cpu_stop_queue_work(stopper, work, &wakeq);
         else if (work->done)
                 cpu_stop_signal_done(work->done);
-       raw_spin_unlock_irqrestore(&stopper->lock, flags);

         wake_up_q(&wakeq);
+       raw_spin_unlock_irqrestore(&stopper->lock, flags);


Thanks,
Prasad

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

Thread overview: 7+ messages
2018-08-02  1:34 Sodagudi Prasad [this message]
2018-08-02  8:12 ` cpu stopper threads and setaffinity leads to deadlock Peter Zijlstra
2018-08-02  8:27   ` Mike Galbraith
2018-08-02  8:45 ` Peter Zijlstra
2018-08-02  9:49 ` Peter Zijlstra
2018-08-03 11:41   ` Thomas Gleixner
2018-08-03 18:57     ` Sodagudi Prasad
