From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: kernel test robot <lkp@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, LKP <lkp@lists.01.org>,
	Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>
Subject: [PATCH] workqueue: Don't double assign worker->sleeping
Date: Fri, 27 Mar 2020 18:53:50 +0100	[thread overview]
Message-ID: <20200327175350.rw5gex6cwum3ohnu@linutronix.de> (raw)
In-Reply-To: <20200327074308.GY11705@shao2-debian>

The kernel test robot triggered a warning with the following race:
   task-ctx                              interrupt-ctx
 worker
  -> process_one_work()
    -> work_item()
      -> schedule();
         -> sched_submit_work()
           -> wq_worker_sleeping()
             -> ->sleeping = 1
               atomic_dec_and_test(nr_running)
         __schedule();                *interrupt*
                                       async_page_fault()
                                       -> local_irq_enable();
                                       -> schedule();
                                          -> sched_submit_work()
                                            -> wq_worker_sleeping()
                                               -> if (WARN_ON(->sleeping)) return
                                          -> __schedule()
                                            ->  sched_update_worker()
                                              -> wq_worker_running()
                                                 -> atomic_inc(nr_running);
                                                 -> ->sleeping = 0;

      ->  sched_update_worker()
        -> wq_worker_running()
          if (!->sleeping) return

In this context the warning is pointless: everything is fine.
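The first interleaving can be made concrete with a minimal user-space model of the pre-patch hooks. This is a sketch, not kernel code. `struct model_worker`, `old_sleeping()`, `old_running()` and `replay_nested_schedule()` are hypothetical names. The pool lock and the WORKER_NOT_RUNNING check are left out, and plain ints stand in for the atomics because the replay is single-threaded. The model walks through the diagram above step by step:

```c
#include <assert.h>

/* Hypothetical user-space model of the pre-patch hooks; not kernel code.
 * WORKER_NOT_RUNNING is assumed clear, so the flag checks are omitted. */
struct model_worker {
	int sleeping;    /* worker->sleeping */
	int nr_running;  /* pool->nr_running */
	int warnings;    /* times the WARN_ON_ONCE() condition was true */
};

/* Pre-patch wq_worker_sleeping(): test `sleeping', then set it. */
static void old_sleeping(struct model_worker *w)
{
	if (w->sleeping) {
		w->warnings++;          /* WARN_ON_ONCE(worker->sleeping) */
		return;
	}
	w->sleeping = 1;
	w->nr_running--;                /* atomic_dec_and_test() */
}

/* Pre-patch wq_worker_running(): test, increment, then clear. */
static void old_running(struct model_worker *w)
{
	if (!w->sleeping)
		return;
	w->nr_running++;                /* atomic_inc() */
	w->sleeping = 0;
}

/* Replays the task-ctx / interrupt-ctx interleaving from the diagram. */
static struct model_worker replay_nested_schedule(void)
{
	struct model_worker w = { .sleeping = 0, .nr_running = 1, .warnings = 0 };

	old_sleeping(&w);   /* task-ctx: sched_submit_work() */
	old_sleeping(&w);   /* interrupt-ctx: schedule(), warning fires */
	old_running(&w);    /* interrupt-ctx: sched_update_worker() */
	old_running(&w);    /* task-ctx: sched_update_worker(), early return */
	return w;
}
```

replay_nested_schedule() ends with nr_running == 1, sleeping == 0 and warnings == 1. The counter is restored even though the warning fired.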

However, if the interrupt occurs in wq_worker_sleeping() between reading and
setting `sleeping', i.e.

|        if (WARN_ON_ONCE(worker->sleeping))
|                return;
 *interrupt*
|        worker->sleeping = 1;

then pool->nr_running will be decremented twice in wq_worker_sleeping()
but it will be incremented only once in wq_worker_running().

Replace the assignment of `sleeping' with a cmpxchg_local() to ensure
that there is no double assignment of the variable. cmpxchg_local() is
sufficient because the variable is only accessed from the local CPU.
Remove the WARN statement because this condition can legitimately occur.
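The patched hooks can be checked with a small user-space model. In this hedged sketch, C11 atomic_compare_exchange_strong() stands in for cmpxchg_local(). The C11 operation is stronger than needed, because cmpxchg_local() is only atomic with respect to the local CPU. `model_cmpxchg()`, `new_sleeping()`, `new_running()`, `irq_schedule()` and `replay()` are hypothetical names, and the pool lock and the WORKER_NOT_RUNNING check are omitted. replay() can inject an interrupt that calls schedule() at two points:

- inside the sleeping hook, after the compare-and-exchange;
- during __schedule(), as in the diagram.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space model of the patched hooks; not kernel code.
 * WORKER_NOT_RUNNING is assumed clear, so the flag checks are omitted. */
struct model_worker {
	atomic_int sleeping;   /* worker->sleeping */
	int nr_running;        /* pool->nr_running */
};

/* Hypothetical hook: an interrupt handler that may call schedule(). */
typedef void (*irq_hook_fn)(struct model_worker *w);

/* Returns the value found in *p, like cmpxchg_local(). */
static int model_cmpxchg(atomic_int *p, int old, int newval)
{
	int expected = old;

	/* On failure, `expected' is overwritten with the current value. */
	atomic_compare_exchange_strong(p, &expected, newval);
	return expected;
}

/* Patched wq_worker_running(): increment only on a 1 -> 0 transition. */
static void new_running(struct model_worker *w)
{
	if (model_cmpxchg(&w->sleeping, 1, 0) == 0)
		return;
	w->nr_running++;                          /* atomic_inc() */
}

/* Patched wq_worker_sleeping(): decrement only on a 0 -> 1 transition.
 * `irq', if non-NULL, fires after the cmpxchg, before the decrement. */
static void new_sleeping(struct model_worker *w, irq_hook_fn irq)
{
	if (model_cmpxchg(&w->sleeping, 0, 1) == 1)
		return;
	if (irq)
		irq(w);
	w->nr_running--;                          /* atomic_dec_and_test() */
}

/* An interrupt that calls schedule(): both hooks run back to back. */
static void irq_schedule(struct model_worker *w)
{
	new_sleeping(w, NULL);
	new_running(w);
}

/* One schedule() in task context, optionally interrupted inside
 * wq_worker_sleeping() (in_sleeping) or during __schedule() (in_schedule).
 * Starts from nr_running == 1 and returns the final value. */
static int replay(irq_hook_fn in_sleeping, irq_hook_fn in_schedule)
{
	struct model_worker w;

	atomic_init(&w.sleeping, 0);
	w.nr_running = 1;

	new_sleeping(&w, in_sleeping);   /* sched_submit_work() */
	if (in_schedule)
		in_schedule(&w);         /* interrupt during __schedule() */
	new_running(&w);                 /* sched_update_worker() */

	assert(atomic_load(&w.sleeping) == 0);
	return w.nr_running;
}
```

replay(NULL, NULL), replay(NULL, irq_schedule) and replay(irq_schedule, NULL) all return 1, with `sleeping' cleared at the end. Because the test and the update of `sleeping' happen in a single step, an interrupt has no window between them.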

An alternative would be to move `->sleeping' to `->flags' as a new bit,
but this would require acquiring pool->lock in wq_worker_running().

Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/workqueue.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4e01c448b4b48..dc477a2a3ce30 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -846,11 +846,10 @@ void wq_worker_running(struct task_struct *task)
 {
 	struct worker *worker = kthread_data(task);
 
-	if (!worker->sleeping)
+	if (cmpxchg_local(&worker->sleeping, 1, 0) == 0)
 		return;
 	if (!(worker->flags & WORKER_NOT_RUNNING))
 		atomic_inc(&worker->pool->nr_running);
-	worker->sleeping = 0;
 }
 
 /**
@@ -875,10 +874,9 @@ void wq_worker_sleeping(struct task_struct *task)
 
 	pool = worker->pool;
 
-	if (WARN_ON_ONCE(worker->sleeping))
+	if (cmpxchg_local(&worker->sleeping, 0, 1) == 1)
 		return;
 
-	worker->sleeping = 1;
 	spin_lock_irq(&pool->lock);
 
 	/*
-- 
2.26.0

