All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] sched: Cleanup task_struct::state
@ 2021-06-02 13:12 ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

Hi!

The task_struct::state variable is a bit odd in a number of ways:

 - it's declared 'volatile' (against current practises);
 - it's 'unsigned long' which is a weird size;
 - it's type is inconsistent when used for function arguments.

These patches clean that up by making it consistently 'unsigned int', and
replace (almost) all accesses with READ_ONCE()/WRITE_ONCE(). In order to not
miss any, the variable is renamed, ensuring a missed conversion results in a
compile error.

The first few patches fix a number of pre-existing errors and introduce a few
helpers to make the final conversion less painful.



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 0/6] sched: Cleanup task_struct::state
@ 2021-06-02 13:12 ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

Hi!

The task_struct::state variable is a bit odd in a number of ways:

 - it's declared 'volatile' (against current practises);
 - it's 'unsigned long' which is a weird size;
 - it's type is inconsistent when used for function arguments.

These patches clean that up by making it consistently 'unsigned int', and
replace (almost) all accesses with READ_ONCE()/WRITE_ONCE(). In order to not
miss any, the variable is renamed, ensuring a missed conversion results in a
compile error.

The first few patches fix a number of pre-existing errors and introduce a few
helpers to make the final conversion less painful.


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/6] sched: Unbreak wakeups
  2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

Remove broken task->state references and let wake_up_process() DTRT.

The anti-pattern in these patches breaks the ordering of ->state vs
COND as described in the comment near set_current_state() and can lead
to missed wakeups:

	(OoO load, observes RUNNING)<-.
	for (;;) {                    |
	  t->state = UNINTERRUPTIBLE; |
	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
                             |       |
	                     |       |	COND = 1;
			     |	     `- if (t->state != RUNNING)
                             |		  wake_up_process(t); // not done
	  if (COND) ---------'
	    break;
	  schedule(); // forever waiting
	}
	t->state = TASK_RUNNING;

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/net/ethernet/qualcomm/qca_spi.c |    6 ++----
 drivers/usb/gadget/udc/max3420_udc.c    |   15 +++++----------
 drivers/usb/host/max3421-hcd.c          |    3 +--
 kernel/softirq.c                        |    2 +-
 4 files changed, 9 insertions(+), 17 deletions(-)

--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -653,8 +653,7 @@ qcaspi_intr_handler(int irq, void *data)
 	struct qcaspi *qca = data;
 
 	qca->intr_req++;
-	if (qca->spi_thread &&
-	    qca->spi_thread->state != TASK_RUNNING)
+	if (qca->spi_thread)
 		wake_up_process(qca->spi_thread);
 
 	return IRQ_HANDLED;
@@ -777,8 +776,7 @@ qcaspi_netdev_xmit(struct sk_buff *skb,
 
 	netif_trans_update(dev);
 
-	if (qca->spi_thread &&
-	    qca->spi_thread->state != TASK_RUNNING)
+	if (qca->spi_thread)
 		wake_up_process(qca->spi_thread);
 
 	return NETDEV_TX_OK;
--- a/drivers/usb/gadget/udc/max3420_udc.c
+++ b/drivers/usb/gadget/udc/max3420_udc.c
@@ -509,8 +509,7 @@ static irqreturn_t max3420_vbus_handler(
 			     ? USB_STATE_POWERED : USB_STATE_NOTATTACHED);
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return IRQ_HANDLED;
@@ -529,8 +528,7 @@ static irqreturn_t max3420_irq_handler(i
 	}
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return IRQ_HANDLED;
@@ -1093,8 +1091,7 @@ static int max3420_wakeup(struct usb_gad
 
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 	return ret;
 }
@@ -1117,8 +1114,7 @@ static int max3420_udc_start(struct usb_
 	udc->todo |= UDC_START;
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return 0;
@@ -1137,8 +1133,7 @@ static int max3420_udc_stop(struct usb_g
 	udc->todo |= UDC_START;
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return 0;
--- a/drivers/usb/host/max3421-hcd.c
+++ b/drivers/usb/host/max3421-hcd.c
@@ -1169,8 +1169,7 @@ max3421_irq_handler(int irq, void *dev_i
 	struct spi_device *spi = to_spi_device(hcd->self.controller);
 	struct max3421_hcd *max3421_hcd = hcd_to_max3421(hcd);
 
-	if (max3421_hcd->spi_thread &&
-	    max3421_hcd->spi_thread->state != TASK_RUNNING)
+	if (max3421_hcd->spi_thread)
 		wake_up_process(max3421_hcd->spi_thread);
 	if (!test_and_set_bit(ENABLE_IRQ, &max3421_hcd->todo))
 		disable_irq_nosync(spi->irq);
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -76,7 +76,7 @@ static void wakeup_softirqd(void)
 	/* Interrupts are disabled: no need to stop preemption */
 	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
 
-	if (tsk && tsk->state != TASK_RUNNING)
+	if (tsk)
 		wake_up_process(tsk);
 }
 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 1/6] sched: Unbreak wakeups
@ 2021-06-02 13:12   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

Remove broken task->state references and let wake_up_process() DTRT.

The anti-pattern in these patches breaks the ordering of ->state vs
COND as described in the comment near set_current_state() and can lead
to missed wakeups:

	(OoO load, observes RUNNING)<-.
	for (;;) {                    |
	  t->state = UNINTERRUPTIBLE; |
	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
                             |       |
	                     |       |	COND = 1;
			     |	     `- if (t->state != RUNNING)
                             |		  wake_up_process(t); // not done
	  if (COND) ---------'
	    break;
	  schedule(); // forever waiting
	}
	t->state = TASK_RUNNING;

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/net/ethernet/qualcomm/qca_spi.c |    6 ++----
 drivers/usb/gadget/udc/max3420_udc.c    |   15 +++++----------
 drivers/usb/host/max3421-hcd.c          |    3 +--
 kernel/softirq.c                        |    2 +-
 4 files changed, 9 insertions(+), 17 deletions(-)

--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -653,8 +653,7 @@ qcaspi_intr_handler(int irq, void *data)
 	struct qcaspi *qca = data;
 
 	qca->intr_req++;
-	if (qca->spi_thread &&
-	    qca->spi_thread->state != TASK_RUNNING)
+	if (qca->spi_thread)
 		wake_up_process(qca->spi_thread);
 
 	return IRQ_HANDLED;
@@ -777,8 +776,7 @@ qcaspi_netdev_xmit(struct sk_buff *skb,
 
 	netif_trans_update(dev);
 
-	if (qca->spi_thread &&
-	    qca->spi_thread->state != TASK_RUNNING)
+	if (qca->spi_thread)
 		wake_up_process(qca->spi_thread);
 
 	return NETDEV_TX_OK;
--- a/drivers/usb/gadget/udc/max3420_udc.c
+++ b/drivers/usb/gadget/udc/max3420_udc.c
@@ -509,8 +509,7 @@ static irqreturn_t max3420_vbus_handler(
 			     ? USB_STATE_POWERED : USB_STATE_NOTATTACHED);
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return IRQ_HANDLED;
@@ -529,8 +528,7 @@ static irqreturn_t max3420_irq_handler(i
 	}
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return IRQ_HANDLED;
@@ -1093,8 +1091,7 @@ static int max3420_wakeup(struct usb_gad
 
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 	return ret;
 }
@@ -1117,8 +1114,7 @@ static int max3420_udc_start(struct usb_
 	udc->todo |= UDC_START;
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return 0;
@@ -1137,8 +1133,7 @@ static int max3420_udc_stop(struct usb_g
 	udc->todo |= UDC_START;
 	spin_unlock_irqrestore(&udc->lock, flags);
 
-	if (udc->thread_task &&
-	    udc->thread_task->state != TASK_RUNNING)
+	if (udc->thread_task)
 		wake_up_process(udc->thread_task);
 
 	return 0;
--- a/drivers/usb/host/max3421-hcd.c
+++ b/drivers/usb/host/max3421-hcd.c
@@ -1169,8 +1169,7 @@ max3421_irq_handler(int irq, void *dev_i
 	struct spi_device *spi = to_spi_device(hcd->self.controller);
 	struct max3421_hcd *max3421_hcd = hcd_to_max3421(hcd);
 
-	if (max3421_hcd->spi_thread &&
-	    max3421_hcd->spi_thread->state != TASK_RUNNING)
+	if (max3421_hcd->spi_thread)
 		wake_up_process(max3421_hcd->spi_thread);
 	if (!test_and_set_bit(ENABLE_IRQ, &max3421_hcd->todo))
 		disable_irq_nosync(spi->irq);
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -76,7 +76,7 @@ static void wakeup_softirqd(void)
 	/* Interrupts are disabled: no need to stop preemption */
 	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
 
-	if (tsk && tsk->state != TASK_RUNNING)
+	if (tsk)
 		wake_up_process(tsk);
 }
 


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/6] sched: Introduce task_is_running()
  2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/process.c |    4 ++--
 block/blk-mq.c            |    2 +-
 include/linux/sched.h     |    2 ++
 kernel/locking/lockdep.c  |    2 +-
 kernel/rcu/tree_plugin.h  |    2 +-
 kernel/sched/core.c       |    6 +++---
 kernel/sched/stats.h      |    2 +-
 kernel/signal.c           |    2 +-
 kernel/softirq.c          |    3 +--
 mm/compaction.c           |    2 +-
 10 files changed, 14 insertions(+), 13 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
 	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
 	int count = 0;
 
-	if (p == current || p->state == TASK_RUNNING)
+	if (p == current || task_is_running(p))
 		return 0;
 
 	if (!try_get_task_stack(p))
@@ -975,7 +975,7 @@ unsigned long get_wchan(struct task_stru
 			goto out;
 		}
 		fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
-	} while (count++ < 16 && p->state != TASK_RUNNING);
+	} while (count++ < 16 && !task_is_running(p));
 
 out:
 	put_task_stack(p);
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3926,7 +3926,7 @@ int blk_poll(struct request_queue *q, bl
 		if (signal_pending_state(state, current))
 			__set_current_state(TASK_RUNNING);
 
-		if (current->state == TASK_RUNNING)
+		if (task_is_running(current))
 			return 1;
 		if (ret < 0 || !spin)
 			break;
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,6 +113,8 @@ struct task_group;
 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
 					 TASK_PARKED)
 
+#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
+
 #define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
 
 #define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -760,7 +760,7 @@ static void lockdep_print_held_locks(str
 	 * It's not reliable to print a task's held locks if it's not sleeping
 	 * and it's not the current task.
 	 */
-	if (p->state == TASK_RUNNING && p != current)
+	if (p != current && task_is_running(p))
 		return;
 	for (i = 0; i < depth; i++) {
 		printk(" #%d: ", i);
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2768,7 +2768,7 @@ EXPORT_SYMBOL_GPL(rcu_bind_current_to_no
 #ifdef CONFIG_SMP
 static char *show_rcu_should_be_on_cpu(struct task_struct *tsp)
 {
-	return tsp && tsp->state == TASK_RUNNING && !tsp->on_cpu ? "!" : "";
+	return tsp && task_is_running(tsp) && !tsp->on_cpu ? "!" : "";
 }
 #else // #ifdef CONFIG_SMP
 static char *show_rcu_should_be_on_cpu(struct task_struct *tsp)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5989,7 +5989,7 @@ static inline void sched_submit_work(str
 {
 	unsigned int task_flags;
 
-	if (!tsk->state)
+	if (task_is_running(tsk))
 		return;
 
 	task_flags = tsk->flags;
@@ -7964,7 +7964,7 @@ int __sched yield_to(struct task_struct
 	if (curr->sched_class != p->sched_class)
 		goto out_unlock;
 
-	if (task_running(p_rq, p) || p->state)
+	if (task_running(p_rq, p) || !task_is_running(p))
 		goto out_unlock;
 
 	yielded = curr->sched_class->yield_to_task(rq, p);
@@ -8167,7 +8167,7 @@ void sched_show_task(struct task_struct
 
 	pr_info("task:%-15.15s state:%c", p->comm, task_state_to_char(p));
 
-	if (p->state == TASK_RUNNING)
+	if (task_is_running(p))
 		pr_cont("  running task    ");
 #ifdef CONFIG_DEBUG_STACK_USAGE
 	free = stack_not_used(p);
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -217,7 +217,7 @@ static inline void sched_info_depart(str
 
 	rq_sched_info_depart(rq, delta);
 
-	if (t->state == TASK_RUNNING)
+	if (task_is_running(t))
 		sched_info_enqueue(rq, t);
 }
 
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4719,7 +4719,7 @@ void kdb_send_sig(struct task_struct *t,
 	}
 	new_t = kdb_prev_t != t;
 	kdb_prev_t = t;
-	if (t->state != TASK_RUNNING && new_t) {
+	if (!task_is_running(t) && new_t) {
 		spin_unlock(&t->sighand->siglock);
 		kdb_printf("Process is not RUNNING, sending a signal from "
 			   "kdb risks deadlock\n"
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -92,8 +92,7 @@ static bool ksoftirqd_running(unsigned l
 
 	if (pending & SOFTIRQ_NOW_MASK)
 		return false;
-	return tsk && (tsk->state == TASK_RUNNING) &&
-		!__kthread_should_park(tsk);
+	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
 }
 
 #ifdef CONFIG_TRACE_IRQFLAGS
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1955,7 +1955,7 @@ static inline bool is_via_compact_memory
 
 static bool kswapd_is_running(pg_data_t *pgdat)
 {
-	return pgdat->kswapd && (pgdat->kswapd->state == TASK_RUNNING);
+	return pgdat->kswapd && task_is_running(pgdat->kswapd);
 }
 
 /*



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 2/6] sched: Introduce task_is_running()
@ 2021-06-02 13:12   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/process.c |    4 ++--
 block/blk-mq.c            |    2 +-
 include/linux/sched.h     |    2 ++
 kernel/locking/lockdep.c  |    2 +-
 kernel/rcu/tree_plugin.h  |    2 +-
 kernel/sched/core.c       |    6 +++---
 kernel/sched/stats.h      |    2 +-
 kernel/signal.c           |    2 +-
 kernel/softirq.c          |    3 +--
 mm/compaction.c           |    2 +-
 10 files changed, 14 insertions(+), 13 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
 	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
 	int count = 0;
 
-	if (p == current || p->state == TASK_RUNNING)
+	if (p == current || task_is_running(p))
 		return 0;
 
 	if (!try_get_task_stack(p))
@@ -975,7 +975,7 @@ unsigned long get_wchan(struct task_stru
 			goto out;
 		}
 		fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
-	} while (count++ < 16 && p->state != TASK_RUNNING);
+	} while (count++ < 16 && !task_is_running(p));
 
 out:
 	put_task_stack(p);
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3926,7 +3926,7 @@ int blk_poll(struct request_queue *q, bl
 		if (signal_pending_state(state, current))
 			__set_current_state(TASK_RUNNING);
 
-		if (current->state == TASK_RUNNING)
+		if (task_is_running(current))
 			return 1;
 		if (ret < 0 || !spin)
 			break;
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,6 +113,8 @@ struct task_group;
 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
 					 TASK_PARKED)
 
+#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
+
 #define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
 
 #define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -760,7 +760,7 @@ static void lockdep_print_held_locks(str
 	 * It's not reliable to print a task's held locks if it's not sleeping
 	 * and it's not the current task.
 	 */
-	if (p->state == TASK_RUNNING && p != current)
+	if (p != current && task_is_running(p))
 		return;
 	for (i = 0; i < depth; i++) {
 		printk(" #%d: ", i);
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2768,7 +2768,7 @@ EXPORT_SYMBOL_GPL(rcu_bind_current_to_no
 #ifdef CONFIG_SMP
 static char *show_rcu_should_be_on_cpu(struct task_struct *tsp)
 {
-	return tsp && tsp->state == TASK_RUNNING && !tsp->on_cpu ? "!" : "";
+	return tsp && task_is_running(tsp) && !tsp->on_cpu ? "!" : "";
 }
 #else // #ifdef CONFIG_SMP
 static char *show_rcu_should_be_on_cpu(struct task_struct *tsp)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5989,7 +5989,7 @@ static inline void sched_submit_work(str
 {
 	unsigned int task_flags;
 
-	if (!tsk->state)
+	if (task_is_running(tsk))
 		return;
 
 	task_flags = tsk->flags;
@@ -7964,7 +7964,7 @@ int __sched yield_to(struct task_struct
 	if (curr->sched_class != p->sched_class)
 		goto out_unlock;
 
-	if (task_running(p_rq, p) || p->state)
+	if (task_running(p_rq, p) || !task_is_running(p))
 		goto out_unlock;
 
 	yielded = curr->sched_class->yield_to_task(rq, p);
@@ -8167,7 +8167,7 @@ void sched_show_task(struct task_struct
 
 	pr_info("task:%-15.15s state:%c", p->comm, task_state_to_char(p));
 
-	if (p->state == TASK_RUNNING)
+	if (task_is_running(p))
 		pr_cont("  running task    ");
 #ifdef CONFIG_DEBUG_STACK_USAGE
 	free = stack_not_used(p);
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -217,7 +217,7 @@ static inline void sched_info_depart(str
 
 	rq_sched_info_depart(rq, delta);
 
-	if (t->state == TASK_RUNNING)
+	if (task_is_running(t))
 		sched_info_enqueue(rq, t);
 }
 
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4719,7 +4719,7 @@ void kdb_send_sig(struct task_struct *t,
 	}
 	new_t = kdb_prev_t != t;
 	kdb_prev_t = t;
-	if (t->state != TASK_RUNNING && new_t) {
+	if (!task_is_running(t) && new_t) {
 		spin_unlock(&t->sighand->siglock);
 		kdb_printf("Process is not RUNNING, sending a signal from "
 			   "kdb risks deadlock\n"
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -92,8 +92,7 @@ static bool ksoftirqd_running(unsigned l
 
 	if (pending & SOFTIRQ_NOW_MASK)
 		return false;
-	return tsk && (tsk->state == TASK_RUNNING) &&
-		!__kthread_should_park(tsk);
+	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
 }
 
 #ifdef CONFIG_TRACE_IRQFLAGS
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1955,7 +1955,7 @@ static inline bool is_via_compact_memory
 
 static bool kswapd_is_running(pg_data_t *pgdat)
 {
-	return pgdat->kswapd && (pgdat->kswapd->state == TASK_RUNNING);
+	return pgdat->kswapd && task_is_running(pgdat->kswapd);
 }
 
 /*


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 3/6] sched,perf,kvm: Fix preemption condition
  2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

When ran from the sched-out path (preempt_notifier or perf_event),
p->state is irrelevant to determine preemption. You can get preempted
with !task_is_running() just fine.

The right indicator for preemption is if the task is still on the
runqueue in the sched-out path.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    7 +++----
 virt/kvm/kvm_main.c  |    2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
 		},
 	};
 
-	if (!sched_in && task->state == TASK_RUNNING)
+	if (!sched_in && current->on_rq) {
 		switch_event.event_id.header.misc |=
 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
+	}
 
-	perf_iterate_sb(perf_event_switch_output,
-		       &switch_event,
-		       NULL);
+	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
 }
 
 /*
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
 {
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
-	if (current->state == TASK_RUNNING) {
+	if (current->on_rq) {
 		WRITE_ONCE(vcpu->preempted, true);
 		WRITE_ONCE(vcpu->ready, true);
 	}



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 3/6] sched,perf,kvm: Fix preemption condition
@ 2021-06-02 13:12   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

When ran from the sched-out path (preempt_notifier or perf_event),
p->state is irrelevant to determine preemption. You can get preempted
with !task_is_running() just fine.

The right indicator for preemption is if the task is still on the
runqueue in the sched-out path.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    7 +++----
 virt/kvm/kvm_main.c  |    2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
 		},
 	};
 
-	if (!sched_in && task->state == TASK_RUNNING)
+	if (!sched_in && current->on_rq) {
 		switch_event.event_id.header.misc |=
 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
+	}
 
-	perf_iterate_sb(perf_event_switch_output,
-		       &switch_event,
-		       NULL);
+	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
 }
 
 /*
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
 {
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
-	if (current->state == TASK_RUNNING) {
+	if (current->on_rq) {
 		WRITE_ONCE(vcpu->preempted, true);
 		WRITE_ONCE(vcpu->ready, true);
 	}


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 4/6] sched: Add get_current_state()
  2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

Remove yet another few p->state accesses.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 block/blk-mq.c        |    2 +-
 include/linux/sched.h |    2 ++
 kernel/freezer.c      |    2 +-
 kernel/sched/core.c   |    6 +++---
 4 files changed, 7 insertions(+), 5 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3891,7 +3891,7 @@ int blk_poll(struct request_queue *q, bl
 
 	hctx->poll_considered++;
 
-	state = current->state;
+	state = get_current_state();
 	do {
 		int ret;
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -212,6 +212,8 @@ struct task_group;
 
 #endif
 
+#define get_current_state()	READ_ONCE(current->state)
+
 /* Task command name length: */
 #define TASK_COMM_LEN			16
 
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -58,7 +58,7 @@ bool __refrigerator(bool check_kthr_stop
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	bool was_frozen = false;
-	long save = current->state;
+	unsigned int save = get_current_state();
 
 	pr_debug("%s entered refrigerator\n", current->comm);
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8273,15 +8273,15 @@ static inline int preempt_count_equals(i
 
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
+	unsigned int state = get_current_state();
 	/*
 	 * Blocking primitives will set (and therefore destroy) current->state,
 	 * since we will exit with TASK_RUNNING make sure we enter with it,
 	 * otherwise we will destroy state.
 	 */
-	WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change,
+	WARN_ONCE(state != TASK_RUNNING && current->task_state_change,
 			"do not call blocking ops when !TASK_RUNNING; "
-			"state=%lx set at [<%p>] %pS\n",
-			current->state,
+			"state=%x set at [<%p>] %pS\n", state,
 			(void *)current->task_state_change,
 			(void *)current->task_state_change);
 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 4/6] sched: Add get_current_state()
@ 2021-06-02 13:12   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

Remove yet another few p->state accesses.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 block/blk-mq.c        |    2 +-
 include/linux/sched.h |    2 ++
 kernel/freezer.c      |    2 +-
 kernel/sched/core.c   |    6 +++---
 4 files changed, 7 insertions(+), 5 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3891,7 +3891,7 @@ int blk_poll(struct request_queue *q, bl
 
 	hctx->poll_considered++;
 
-	state = current->state;
+	state = get_current_state();
 	do {
 		int ret;
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -212,6 +212,8 @@ struct task_group;
 
 #endif
 
+#define get_current_state()	READ_ONCE(current->state)
+
 /* Task command name length: */
 #define TASK_COMM_LEN			16
 
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -58,7 +58,7 @@ bool __refrigerator(bool check_kthr_stop
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	bool was_frozen = false;
-	long save = current->state;
+	unsigned int save = get_current_state();
 
 	pr_debug("%s entered refrigerator\n", current->comm);
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8273,15 +8273,15 @@ static inline int preempt_count_equals(i
 
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
+	unsigned int state = get_current_state();
 	/*
 	 * Blocking primitives will set (and therefore destroy) current->state,
 	 * since we will exit with TASK_RUNNING make sure we enter with it,
 	 * otherwise we will destroy state.
 	 */
-	WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change,
+	WARN_ONCE(state != TASK_RUNNING && current->task_state_change,
 			"do not call blocking ops when !TASK_RUNNING; "
-			"state=%lx set at [<%p>] %pS\n",
-			current->state,
+			"state=%x set at [<%p>] %pS\n", state,
 			(void *)current->task_state_change,
 			(void *)current->task_state_change);
 


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 5/6] sched,timer: Use __set_current_state()
  2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/time/timer.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1879,7 +1879,7 @@ signed long __sched schedule_timeout(sig
 			printk(KERN_ERR "schedule_timeout: wrong timeout "
 				"value %lx\n", timeout);
 			dump_stack();
-			current->state = TASK_RUNNING;
+			__set_current_state(TASK_RUNNING);
 			goto out;
 		}
 	}



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 5/6] sched,timer: Use __set_current_state()
@ 2021-06-02 13:12   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/time/timer.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1879,7 +1879,7 @@ signed long __sched schedule_timeout(sig
 			printk(KERN_ERR "schedule_timeout: wrong timeout "
 				"value %lx\n", timeout);
 			dump_stack();
-			current->state = TASK_RUNNING;
+			__set_current_state(TASK_RUNNING);
 			goto out;
 		}
 	}


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 6/6] sched: Change task_struct::state
  2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 block/blk-mq.c                 |    2 -
 drivers/md/dm.c                |    6 ++--
 fs/binfmt_elf.c                |    8 +++---
 fs/userfaultfd.c               |    4 +--
 include/linux/sched.h          |   31 +++++++++++------------
 include/linux/sched/debug.h    |    2 -
 include/linux/sched/signal.h   |    2 -
 init/init_task.c               |    2 -
 kernel/cgroup/cgroup-v1.c      |    2 -
 kernel/debug/kdb/kdb_support.c |   18 +++++++------
 kernel/fork.c                  |    4 +--
 kernel/hung_task.c             |    2 -
 kernel/kthread.c               |    4 +--
 kernel/locking/mutex.c         |    6 ++--
 kernel/locking/rtmutex.c       |    4 +--
 kernel/locking/rwsem.c         |    2 -
 kernel/ptrace.c                |   12 ++++-----
 kernel/rcu/rcutorture.c        |    4 +--
 kernel/rcu/tree_stall.h        |   12 ++++-----
 kernel/sched/core.c            |   53 +++++++++++++++++++++--------------------
 kernel/sched/deadline.c        |   10 +++----
 kernel/sched/fair.c            |   11 +++++---
 lib/syscall.c                  |    4 +--
 net/core/dev.c                 |    2 -
 24 files changed, 108 insertions(+), 99 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3886,7 +3886,7 @@ static bool blk_mq_poll_hybrid(struct re
 int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
-	long state;
+	unsigned int state;
 
 	if (!blk_qc_t_valid(cookie) ||
 	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2328,7 +2328,7 @@ static bool md_in_flight_bios(struct map
 	return sum != 0;
 }
 
-static int dm_wait_for_bios_completion(struct mapped_device *md, long task_state)
+static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int task_state)
 {
 	int r = 0;
 	DEFINE_WAIT(wait);
@@ -2351,7 +2351,7 @@ static int dm_wait_for_bios_completion(s
 	return r;
 }
 
-static int dm_wait_for_completion(struct mapped_device *md, long task_state)
+static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state)
 {
 	int r = 0;
 
@@ -2478,7 +2478,7 @@ static void unlock_fs(struct mapped_devi
  * are being added to md->deferred list.
  */
 static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
-			unsigned suspend_flags, long task_state,
+			unsigned suspend_flags, unsigned int task_state,
 			int dmf_suspended_flag)
 {
 	bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1537,7 +1537,8 @@ static int fill_psinfo(struct elf_prpsin
 {
 	const struct cred *cred;
 	unsigned int i, len;
-	
+	unsigned int state;
+
 	/* first copy the parameters from user space */
 	memset(psinfo, 0, sizeof(struct elf_prpsinfo));
 
@@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
 	psinfo->pr_pgrp = task_pgrp_vnr(p);
 	psinfo->pr_sid = task_session_vnr(p);
 
-	i = p->state ? ffz(~p->state) + 1 : 0;
+	state = READ_ONCE(p->__state);
+	i = state ? ffz(~state) + 1 : 0;
 	psinfo->pr_state = i;
 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';
@@ -1571,7 +1573,7 @@ static int fill_psinfo(struct elf_prpsin
 	SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
 	rcu_read_unlock();
 	strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
-	
+
 	return 0;
 }
 
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -337,7 +337,7 @@ static inline bool userfaultfd_must_wait
 	return ret;
 }
 
-static inline long userfaultfd_get_blocking_state(unsigned int flags)
+static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
 {
 	if (flags & FAULT_FLAG_INTERRUPTIBLE)
 		return TASK_INTERRUPTIBLE;
@@ -370,7 +370,7 @@ vm_fault_t handle_userfault(struct vm_fa
 	struct userfaultfd_wait_queue uwq;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	bool must_wait;
-	long blocking_state;
+	unsigned int blocking_state;
 
 	/*
 	 * We don't do userfault handling for the final child pid update.
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,13 +113,13 @@ struct task_group;
 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
 					 TASK_PARKED)
 
-#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
+#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
 
-#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
 
-#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
@@ -134,14 +134,14 @@ struct task_group;
 	do {							\
 		WARN_ON_ONCE(is_special_task_state(state_value));\
 		current->task_state_change = _THIS_IP_;		\
-		current->state = (state_value);			\
+		WRITE_ONCE(current->__state, (state_value));	\
 	} while (0)
 
 #define set_current_state(state_value)				\
 	do {							\
 		WARN_ON_ONCE(is_special_task_state(state_value));\
 		current->task_state_change = _THIS_IP_;		\
-		smp_store_mb(current->state, (state_value));	\
+		smp_store_mb(current->__state, (state_value));	\
 	} while (0)
 
 #define set_special_state(state_value)					\
@@ -150,7 +150,7 @@ struct task_group;
 		WARN_ON_ONCE(!is_special_task_state(state_value));	\
 		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		current->task_state_change = _THIS_IP_;			\
-		current->state = (state_value);				\
+		WRITE_ONCE(current->__state, (state_value));		\
 		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0)
 #else
@@ -192,10 +192,10 @@ struct task_group;
  * Also see the comments of try_to_wake_up().
  */
 #define __set_current_state(state_value)				\
-	current->state = (state_value)
+	WRITE_ONCE(current->__state, (state_value))
 
 #define set_current_state(state_value)					\
-	smp_store_mb(current->state, (state_value))
+	smp_store_mb(current->__state, (state_value))
 
 /*
  * set_special_state() should be used for those states when the blocking task
@@ -207,13 +207,13 @@ struct task_group;
 	do {								\
 		unsigned long flags; /* may shadow */			\
 		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
-		current->state = (state_value);				\
+		WRITE_ONCE(current->__state, (state_value));		\
 		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0)
 
 #endif
 
-#define get_current_state()	READ_ONCE(current->state)
+#define get_current_state()	READ_ONCE(current->__state)
 
 /* Task command name length: */
 #define TASK_COMM_LEN			16
@@ -658,8 +658,7 @@ struct task_struct {
 	 */
 	struct thread_info		thread_info;
 #endif
-	/* -1 unrunnable, 0 runnable, >0 stopped: */
-	volatile long			state;
+	unsigned int			__state;
 
 	/*
 	 * This begins the randomizable portion of task_struct. Only
@@ -1524,7 +1523,7 @@ static inline pid_t task_pgrp_nr(struct
 
 static inline unsigned int task_state_index(struct task_struct *tsk)
 {
-	unsigned int tsk_state = READ_ONCE(tsk->state);
+	unsigned int tsk_state = READ_ONCE(tsk->__state);
 	unsigned int state = (tsk_state | tsk->exit_state) & TASK_REPORT;
 
 	BUILD_BUG_ON_NOT_POWER_OF_2(TASK_REPORT_MAX);
@@ -1832,10 +1831,10 @@ static __always_inline void scheduler_ip
 	 */
 	preempt_fold_need_resched();
 }
-extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
+extern unsigned long wait_task_inactive(struct task_struct *, unsigned int match_state);
 #else
 static inline void scheduler_ipi(void) { }
-static inline unsigned long wait_task_inactive(struct task_struct *p, long match_state)
+static inline unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
 	return 1;
 }
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -14,7 +14,7 @@ extern void dump_cpu_task(int cpu);
 /*
  * Only dump TASK_* tasks. (0 for all tasks)
  */
-extern void show_state_filter(unsigned long state_filter);
+extern void show_state_filter(unsigned int state_filter);
 
 static inline void show_state(void)
 {
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -382,7 +382,7 @@ static inline int fatal_signal_pending(s
 	return task_sigpending(p) && __fatal_signal_pending(p);
 }
 
-static inline int signal_pending_state(long state, struct task_struct *p)
+static inline int signal_pending_state(unsigned int state, struct task_struct *p)
 {
 	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
 		return 0;
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -71,7 +71,7 @@ struct task_struct init_task
 	.thread_info	= INIT_THREAD_INFO(init_task),
 	.stack_refcount	= REFCOUNT_INIT(1),
 #endif
-	.state		= 0,
+	.__state	= 0,
 	.stack		= init_stack,
 	.usage		= REFCOUNT_INIT(2),
 	.flags		= PF_KTHREAD,
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -713,7 +713,7 @@ int cgroupstats_build(struct cgroupstats
 
 	css_task_iter_start(&cgrp->self, 0, &it);
 	while ((tsk = css_task_iter_next(&it))) {
-		switch (tsk->state) {
+		switch (READ_ONCE(tsk->__state)) {
 		case TASK_RUNNING:
 			stats->nr_running++;
 			break;
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
  */
 char kdb_task_state_char (const struct task_struct *p)
 {
-	int cpu;
-	char state;
+	unsigned int p_state;
 	unsigned long tmp;
+	char state;
+	int cpu;
 
 	if (!p ||
 	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
 		return 'E';
 
 	cpu = kdb_process_cpu(p);
-	state = (p->state == 0) ? 'R' :
-		(p->state < 0) ? 'U' :
-		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
-		(p->state & TASK_STOPPED) ? 'T' :
-		(p->state & TASK_TRACED) ? 'C' :
+	p_state = READ_ONCE(p->__state);
+	state = (p_state == 0) ? 'R' :
+		(p_state < 0) ? 'U' :
+		(p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
+		(p_state & TASK_STOPPED) ? 'T' :
+		(p_state & TASK_TRACED) ? 'C' :
 		(p->exit_state & EXIT_ZOMBIE) ? 'Z' :
 		(p->exit_state & EXIT_DEAD) ? 'E' :
-		(p->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
+		(p_state & TASK_INTERRUPTIBLE) ? 'S' : '?';
 	if (is_idle_task(p)) {
 		/* Idle task.  Is it really idle, apart from the kdb
 		 * interrupt? */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -425,7 +425,7 @@ static int memcg_charge_kernel_stack(str
 
 static void release_task_stack(struct task_struct *tsk)
 {
-	if (WARN_ON(tsk->state != TASK_DEAD))
+	if (WARN_ON(READ_ONCE(tsk->__state) != TASK_DEAD))
 		return;  /* Better to leak the stack than to free prematurely */
 
 	account_kernel_stack(tsk, -1);
@@ -2392,7 +2392,7 @@ static __latent_entropy struct task_stru
 	atomic_dec(&p->cred->user->processes);
 	exit_creds(p);
 bad_fork_free:
-	p->state = TASK_DEAD;
+	WRITE_ONCE(p->__state, TASK_DEAD);
 	put_task_stack(p);
 	delayed_free_task(p);
 fork_out:
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -196,7 +196,7 @@ static void check_hung_uninterruptible_t
 			last_break = jiffies;
 		}
 		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
-		if (t->state == TASK_UNINTERRUPTIBLE)
+		if (READ_ONCE(t->__state) == TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, timeout);
 	}
  unlock:
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -457,7 +457,7 @@ struct task_struct *kthread_create_on_no
 }
 EXPORT_SYMBOL(kthread_create_on_node);
 
-static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, long state)
+static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, unsigned int state)
 {
 	unsigned long flags;
 
@@ -473,7 +473,7 @@ static void __kthread_bind_mask(struct t
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 }
 
-static void __kthread_bind(struct task_struct *p, unsigned int cpu, long state)
+static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int state)
 {
 	__kthread_bind_mask(p, cpumask_of(cpu), state);
 }
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -923,7 +923,7 @@ __ww_mutex_add_waiter(struct mutex_waite
  * Lock a mutex (possibly interruptible), slowpath:
  */
 static __always_inline int __sched
-__mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
+__mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		    struct lockdep_map *nest_lock, unsigned long ip,
 		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
@@ -1098,14 +1098,14 @@ __mutex_lock_common(struct mutex *lock,
 }
 
 static int __sched
-__mutex_lock(struct mutex *lock, long state, unsigned int subclass,
+__mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 	     struct lockdep_map *nest_lock, unsigned long ip)
 {
 	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);
 }
 
 static int __sched
-__ww_mutex_lock(struct mutex *lock, long state, unsigned int subclass,
+__ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 		struct lockdep_map *nest_lock, unsigned long ip,
 		struct ww_acquire_ctx *ww_ctx)
 {
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1135,7 +1135,7 @@ void __sched rt_mutex_init_waiter(struct
  *
  * Must be called with lock->wait_lock held and interrupts disabled
  */
-static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, int state,
+static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, unsigned int state,
 				       struct hrtimer_sleeper *timeout,
 				       struct rt_mutex_waiter *waiter)
 {
@@ -1190,7 +1190,7 @@ static void __sched rt_mutex_handle_dead
 /*
  * Slow path lock function:
  */
-static int __sched rt_mutex_slowlock(struct rt_mutex *lock, int state,
+static int __sched rt_mutex_slowlock(struct rt_mutex *lock, unsigned int state,
 				     struct hrtimer_sleeper *timeout,
 				     enum rtmutex_chainwalk chwalk)
 {
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -889,7 +889,7 @@ rwsem_spin_on_owner(struct rw_semaphore
  * Wait for the read lock to be granted
  */
 static struct rw_semaphore __sched *
-rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, int state)
+rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int state)
 {
 	long adjustment = -RWSEM_READER_BIAS;
 	long rcnt = (count >> RWSEM_READER_SHIFT);
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		task->state = __TASK_TRACED;
+		WRITE_ONCE(task->__state, __TASK_TRACED);
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,7 +207,7 @@ static bool ptrace_freeze_traced(struct
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (task->state != __TASK_TRACED)
+	if (READ_ONCE(task->__state) != __TASK_TRACED)
 		return;
 
 	WARN_ON(!task->ptrace || task->parent != current);
@@ -217,11 +217,11 @@ static void ptrace_unfreeze_traced(struc
 	 * Recheck state under the lock to close this race.
 	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (task->state == __TASK_TRACED) {
+	if (READ_ONCE(task->__state) == __TASK_TRACED) {
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
 		else
-			task->state = TASK_TRACED;
+			WRITE_ONCE(task->__state, TASK_TRACED);
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 }
@@ -256,7 +256,7 @@ static int ptrace_check_attach(struct ta
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(child->state == __TASK_TRACED);
+		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
@@ -273,7 +273,7 @@ static int ptrace_check_attach(struct ta
 			 * ptrace_stop() changes ->state back to TASK_RUNNING,
 			 * so we should not worry about leaking __TASK_TRACED.
 			 */
-			WARN_ON(child->state == __TASK_TRACED);
+			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 			ret = -ESRCH;
 		}
 	}
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1831,10 +1831,10 @@ rcu_torture_stats_print(void)
 		srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
 					&flags, &gp_seq);
 		wtp = READ_ONCE(writer_task);
-		pr_alert("??? Writer stall state %s(%d) g%lu f%#x ->state %#lx cpu %d\n",
+		pr_alert("??? Writer stall state %s(%d) g%lu f%#x ->state %#x cpu %d\n",
 			 rcu_torture_writer_state_getname(),
 			 rcu_torture_writer_state, gp_seq, flags,
-			 wtp == NULL ? ~0UL : wtp->state,
+			 wtp == NULL ? ~0U : wtp->__state,
 			 wtp == NULL ? -1 : (int)task_cpu(wtp));
 		if (!splatted && wtp) {
 			sched_show_task(wtp);
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -460,12 +460,12 @@ static void rcu_check_gp_kthread_starvat
 
 	if (rcu_is_gp_kthread_starving(&j)) {
 		cpu = gpk ? task_cpu(gpk) : -1;
-		pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
+		pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n",
 		       rcu_state.name, j,
 		       (long)rcu_seq_current(&rcu_state.gp_seq),
 		       data_race(rcu_state.gp_flags),
 		       gp_state_getname(rcu_state.gp_state), rcu_state.gp_state,
-		       gpk ? gpk->state : ~0, cpu);
+		       gpk ? gpk->__state : ~0, cpu);
 		if (gpk) {
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
@@ -503,12 +503,12 @@ static void rcu_check_gp_kthread_expired
 	    time_after(jiffies, jiffies_fqs + RCU_STALL_MIGHT_MIN) &&
 	    gpk && !READ_ONCE(gpk->on_rq)) {
 		cpu = task_cpu(gpk);
-		pr_err("%s kthread timer wakeup didn't happen for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx\n",
+		pr_err("%s kthread timer wakeup didn't happen for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x\n",
 		       rcu_state.name, (jiffies - jiffies_fqs),
 		       (long)rcu_seq_current(&rcu_state.gp_seq),
 		       data_race(rcu_state.gp_flags),
 		       gp_state_getname(RCU_GP_WAIT_FQS), RCU_GP_WAIT_FQS,
-		       gpk->state);
+		       gpk->__state);
 		pr_err("\tPossible timer handling issue on cpu=%d timer-softirq=%u\n",
 		       cpu, kstat_softirqs_cpu(TIMER_SOFTIRQ, cpu));
 	}
@@ -735,9 +735,9 @@ void show_rcu_gp_kthreads(void)
 	ja = j - data_race(rcu_state.gp_activity);
 	jr = j - data_race(rcu_state.gp_req_activity);
 	jw = j - data_race(rcu_state.gp_wake_time);
-	pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+	pr_info("%s: wait state: %s(%d) ->state: %#x delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
 		rcu_state.name, gp_state_getname(rcu_state.gp_state),
-		rcu_state.gp_state, t ? t->state : 0x1ffffL,
+		rcu_state.gp_state, t ? t->__state : 0x1ffff,
 		ja, jr, jw, (long)data_race(rcu_state.gp_wake_seq),
 		(long)data_race(rcu_state.gp_seq),
 		(long)data_race(rcu_get_root()->gp_seq_needed),
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2643,7 +2643,7 @@ static int affine_move_task(struct rq *r
 		return -EINVAL;
 	}
 
-	if (task_running(rq, p) || p->state == TASK_WAKING) {
+	if (task_running(rq, p) || READ_ONCE(p->__state) == TASK_WAKING) {
 		/*
 		 * MIGRATE_ENABLE gets here because 'p == current', but for
 		 * anything else we cannot do is_migration_disabled(), punt
@@ -2786,19 +2786,20 @@ EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
+	unsigned int state = READ_ONCE(p->__state);
+
 	/*
 	 * We should never call set_task_cpu() on a blocked task,
 	 * ttwu() will sort out the placement.
 	 */
-	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
-			!p->on_rq);
+	WARN_ON_ONCE(state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq);
 
 	/*
 	 * Migrating fair class task must have p->on_rq = TASK_ON_RQ_MIGRATING,
 	 * because schedstat_wait_{start,end} rebase migrating task's wait_start
 	 * time relying on p->on_rq.
 	 */
-	WARN_ON_ONCE(p->state == TASK_RUNNING &&
+	WARN_ON_ONCE(state == TASK_RUNNING &&
 		     p->sched_class == &fair_sched_class &&
 		     (p->on_rq && !task_on_rq_migrating(p)));
 
@@ -2970,7 +2971,7 @@ int migrate_swap(struct task_struct *cur
  * smp_call_function() if an IPI is sent by the same process we are
  * waiting to become inactive.
  */
-unsigned long wait_task_inactive(struct task_struct *p, long match_state)
+unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
 	int running, queued;
 	struct rq_flags rf;
@@ -2998,7 +2999,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(p->state != match_state))
+			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3013,7 +3014,7 @@ unsigned long wait_task_inactive(struct
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || p->state == match_state)
+		if (!match_state || READ_ONCE(p->__state) == match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
 
@@ -3322,7 +3323,7 @@ static void ttwu_do_wakeup(struct rq *rq
 			   struct rq_flags *rf)
 {
 	check_preempt_curr(rq, p, wake_flags);
-	p->state = TASK_RUNNING;
+	WRITE_ONCE(p->__state, TASK_RUNNING);
 	trace_sched_wakeup(p);
 
 #ifdef CONFIG_SMP
@@ -3711,12 +3712,12 @@ try_to_wake_up(struct task_struct *p, un
 		 *  - we're serialized against set_special_state() by virtue of
 		 *    it disabling IRQs (this allows not taking ->pi_lock).
 		 */
-		if (!(p->state & state))
+		if (!(READ_ONCE(p->__state) & state))
 			goto out;
 
 		success = 1;
 		trace_sched_waking(p);
-		p->state = TASK_RUNNING;
+		WRITE_ONCE(p->__state, TASK_RUNNING);
 		trace_sched_wakeup(p);
 		goto out;
 	}
@@ -3729,7 +3730,7 @@ try_to_wake_up(struct task_struct *p, un
 	 */
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	smp_mb__after_spinlock();
-	if (!(p->state & state))
+	if (!(READ_ONCE(p->__state) & state))
 		goto unlock;
 
 	trace_sched_waking(p);
@@ -3795,7 +3796,7 @@ try_to_wake_up(struct task_struct *p, un
 	 * TASK_WAKING such that we can unlock p->pi_lock before doing the
 	 * enqueue, such as ttwu_queue_wakelist().
 	 */
-	p->state = TASK_WAKING;
+	WRITE_ONCE(p->__state, TASK_WAKING);
 
 	/*
 	 * If the owning (remote) CPU is still in the middle of schedule() with
@@ -3888,7 +3889,7 @@ bool try_invoke_on_locked_down_task(stru
 			ret = func(p, arg);
 		rq_unlock(rq, &rf);
 	} else {
-		switch (p->state) {
+		switch (READ_ONCE(p->__state)) {
 		case TASK_RUNNING:
 		case TASK_WAKING:
 			break;
@@ -4101,7 +4102,7 @@ int sched_fork(unsigned long clone_flags
 	 * nobody will actually run it, and a signal or other external
 	 * event cannot wake it up and insert it on the runqueue either.
 	 */
-	p->state = TASK_NEW;
+	p->__state = TASK_NEW;
 
 	/*
 	 * Make sure we do not leak PI boosting priority to the child.
@@ -4207,7 +4208,7 @@ void wake_up_new_task(struct task_struct
 	struct rq *rq;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
-	p->state = TASK_RUNNING;
+	WRITE_ONCE(p->__state, TASK_RUNNING);
 #ifdef CONFIG_SMP
 	/*
 	 * Fork balancing, do it here and not earlier because:
@@ -4569,7 +4570,7 @@ static struct rq *finish_task_switch(str
 	 * running on another CPU and we could rave with its RUNNING -> DEAD
 	 * transition, resulting in a double drop.
 	 */
-	prev_state = prev->state;
+	prev_state = READ_ONCE(prev->__state);
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
 	finish_task(prev);
@@ -5263,7 +5264,7 @@ static inline void schedule_debug(struct
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-	if (!preempt && prev->state && prev->non_block_count) {
+	if (!preempt && READ_ONCE(prev->__state) && prev->non_block_count) {
 		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
 			prev->comm, prev->pid, prev->non_block_count);
 		dump_stack();
@@ -5889,10 +5890,10 @@ static void __sched notrace __schedule(b
 	 *  - we form a control dependency vs deactivate_task() below.
 	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
 	 */
-	prev_state = prev->state;
+	prev_state = READ_ONCE(prev->__state);
 	if (!preempt && prev_state) {
 		if (signal_pending_state(prev_state, prev)) {
-			prev->state = TASK_RUNNING;
+			WRITE_ONCE(prev->__state, TASK_RUNNING);
 		} else {
 			prev->sched_contributes_to_load =
 				(prev_state & TASK_UNINTERRUPTIBLE) &&
@@ -6064,7 +6065,7 @@ void __sched schedule_idle(void)
 	 * current task can be in any other state. Note, idle is always in the
 	 * TASK_RUNNING state.
 	 */
-	WARN_ON_ONCE(current->state);
+	WARN_ON_ONCE(current->__state);
 	do {
 		__schedule(false);
 	} while (need_resched());
@@ -8191,26 +8192,28 @@ EXPORT_SYMBOL_GPL(sched_show_task);
 static inline bool
 state_filter_match(unsigned long state_filter, struct task_struct *p)
 {
+	unsigned int state = READ_ONCE(p->__state);
+
 	/* no filter, everything matches */
 	if (!state_filter)
 		return true;
 
 	/* filter, but doesn't match */
-	if (!(p->state & state_filter))
+	if (!(state & state_filter))
 		return false;
 
 	/*
 	 * When looking for TASK_UNINTERRUPTIBLE skip TASK_IDLE (allows
 	 * TASK_KILLABLE).
 	 */
-	if (state_filter == TASK_UNINTERRUPTIBLE && p->state == TASK_IDLE)
+	if (state_filter == TASK_UNINTERRUPTIBLE && state == TASK_IDLE)
 		return false;
 
 	return true;
 }
 
 
-void show_state_filter(unsigned long state_filter)
+void show_state_filter(unsigned int state_filter)
 {
 	struct task_struct *g, *p;
 
@@ -8267,7 +8270,7 @@ void __init init_idle(struct task_struct
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
 	raw_spin_rq_lock(rq);
 
-	idle->state = TASK_RUNNING;
+	idle->__state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 	/*
 	 * PF_KTHREAD should already be set at this point; regardless, make it
@@ -9582,7 +9585,7 @@ static int cpu_cgroup_can_attach(struct
 		 * has happened. This would lead to problems with PELT, due to
 		 * move wanting to detach+attach while we're not attached yet.
 		 */
-		if (task->state == TASK_NEW)
+		if (READ_ONCE(task->__state) == TASK_NEW)
 			ret = -EINVAL;
 		raw_spin_unlock_irq(&task->pi_lock);
 
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -348,10 +348,10 @@ static void task_non_contending(struct t
 	if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
 		if (dl_task(p))
 			sub_running_bw(dl_se, dl_rq);
-		if (!dl_task(p) || p->state == TASK_DEAD) {
+		if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
 			struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 
-			if (p->state == TASK_DEAD)
+			if (READ_ONCE(p->__state) == TASK_DEAD)
 				sub_rq_bw(&p->dl, &rq->dl);
 			raw_spin_lock(&dl_b->lock);
 			__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
@@ -1355,10 +1355,10 @@ static enum hrtimer_restart inactive_tas
 	sched_clock_tick();
 	update_rq_clock(rq);
 
-	if (!dl_task(p) || p->state == TASK_DEAD) {
+	if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
 		struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 
-		if (p->state == TASK_DEAD && dl_se->dl_non_contending) {
+		if (READ_ONCE(p->__state) == TASK_DEAD && dl_se->dl_non_contending) {
 			sub_running_bw(&p->dl, dl_rq_of_se(&p->dl));
 			sub_rq_bw(&p->dl, dl_rq_of_se(&p->dl));
 			dl_se->dl_non_contending = 0;
@@ -1722,7 +1722,7 @@ static void migrate_task_rq_dl(struct ta
 {
 	struct rq *rq;
 
-	if (p->state != TASK_WAKING)
+	if (READ_ONCE(p->__state) != TASK_WAKING)
 		return;
 
 	rq = task_rq(p);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -993,11 +993,14 @@ update_stats_dequeue(struct cfs_rq *cfs_
 
 	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
 		struct task_struct *tsk = task_of(se);
+		unsigned int state;
 
-		if (tsk->state & TASK_INTERRUPTIBLE)
+		/* XXX racy against TTWU */
+		state = READ_ONCE(tsk->__state);
+		if (state & TASK_INTERRUPTIBLE)
 			__schedstat_set(se->statistics.sleep_start,
 				      rq_clock(rq_of(cfs_rq)));
-		if (tsk->state & TASK_UNINTERRUPTIBLE)
+		if (state & TASK_UNINTERRUPTIBLE)
 			__schedstat_set(se->statistics.block_start,
 				      rq_clock(rq_of(cfs_rq)));
 	}
@@ -6830,7 +6833,7 @@ static void migrate_task_rq_fair(struct
 	 * min_vruntime -- the latter is done by enqueue_entity() when placing
 	 * the task on the new runqueue.
 	 */
-	if (p->state == TASK_WAKING) {
+	if (READ_ONCE(p->__state) == TASK_WAKING) {
 		struct sched_entity *se = &p->se;
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 		u64 min_vruntime;
@@ -11012,7 +11015,7 @@ static inline bool vruntime_normalized(s
 	 *   waiting for actually being woken up by sched_ttwu_pending().
 	 */
 	if (!se->sum_exec_runtime ||
-	    (p->state == TASK_WAKING && p->sched_remote_wakeup))
+	    (READ_ONCE(p->__state) == TASK_WAKING && p->sched_remote_wakeup))
 		return true;
 
 	return false;
--- a/lib/syscall.c
+++ b/lib/syscall.c
@@ -68,13 +68,13 @@ static int collect_syscall(struct task_s
  */
 int task_current_syscall(struct task_struct *target, struct syscall_info *info)
 {
-	long state;
 	unsigned long ncsw;
+	unsigned int state;
 
 	if (target == current)
 		return collect_syscall(target, info);
 
-	state = target->state;
+	state = READ_ONCE(target->__state);
 	if (unlikely(!state))
 		return -EAGAIN;
 
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4363,7 +4363,7 @@ static inline void ____napi_schedule(str
 			 * makes sure to proceed with napi polling
 			 * if the thread is explicitly woken from here.
 			 */
-			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
+			if (READ_ONCE(thread->__state) != TASK_INTERRUPTIBLE)
 				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
 			wake_up_process(thread);
 			return;



^ permalink raw reply	[flat|nested] 64+ messages in thread

* [dm-devel] [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-02 13:12   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 13:12 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Mark Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Jakub Kicinski, John Stultz,
	Paul E. McKenney, linux-pm, Boqun Feng, Johannes Weiner,
	Josh Triplett, Arnaldo Carvalho de Melo, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Jens Axboe, Felipe Balbi, Stephen Boyd,
	Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Jason Wessel,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 block/blk-mq.c                 |    2 -
 drivers/md/dm.c                |    6 ++--
 fs/binfmt_elf.c                |    8 +++---
 fs/userfaultfd.c               |    4 +--
 include/linux/sched.h          |   31 +++++++++++------------
 include/linux/sched/debug.h    |    2 -
 include/linux/sched/signal.h   |    2 -
 init/init_task.c               |    2 -
 kernel/cgroup/cgroup-v1.c      |    2 -
 kernel/debug/kdb/kdb_support.c |   18 +++++++------
 kernel/fork.c                  |    4 +--
 kernel/hung_task.c             |    2 -
 kernel/kthread.c               |    4 +--
 kernel/locking/mutex.c         |    6 ++--
 kernel/locking/rtmutex.c       |    4 +--
 kernel/locking/rwsem.c         |    2 -
 kernel/ptrace.c                |   12 ++++-----
 kernel/rcu/rcutorture.c        |    4 +--
 kernel/rcu/tree_stall.h        |   12 ++++-----
 kernel/sched/core.c            |   53 +++++++++++++++++++++--------------------
 kernel/sched/deadline.c        |   10 +++----
 kernel/sched/fair.c            |   11 +++++---
 lib/syscall.c                  |    4 +--
 net/core/dev.c                 |    2 -
 24 files changed, 108 insertions(+), 99 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3886,7 +3886,7 @@ static bool blk_mq_poll_hybrid(struct re
 int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
-	long state;
+	unsigned int state;
 
 	if (!blk_qc_t_valid(cookie) ||
 	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2328,7 +2328,7 @@ static bool md_in_flight_bios(struct map
 	return sum != 0;
 }
 
-static int dm_wait_for_bios_completion(struct mapped_device *md, long task_state)
+static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int task_state)
 {
 	int r = 0;
 	DEFINE_WAIT(wait);
@@ -2351,7 +2351,7 @@ static int dm_wait_for_bios_completion(s
 	return r;
 }
 
-static int dm_wait_for_completion(struct mapped_device *md, long task_state)
+static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state)
 {
 	int r = 0;
 
@@ -2478,7 +2478,7 @@ static void unlock_fs(struct mapped_devi
  * are being added to md->deferred list.
  */
 static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
-			unsigned suspend_flags, long task_state,
+			unsigned suspend_flags, unsigned int task_state,
 			int dmf_suspended_flag)
 {
 	bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1537,7 +1537,8 @@ static int fill_psinfo(struct elf_prpsin
 {
 	const struct cred *cred;
 	unsigned int i, len;
-	
+	unsigned int state;
+
 	/* first copy the parameters from user space */
 	memset(psinfo, 0, sizeof(struct elf_prpsinfo));
 
@@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
 	psinfo->pr_pgrp = task_pgrp_vnr(p);
 	psinfo->pr_sid = task_session_vnr(p);
 
-	i = p->state ? ffz(~p->state) + 1 : 0;
+	state = READ_ONCE(p->__state);
+	i = state ? ffz(~state) + 1 : 0;
 	psinfo->pr_state = i;
 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';
@@ -1571,7 +1573,7 @@ static int fill_psinfo(struct elf_prpsin
 	SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
 	rcu_read_unlock();
 	strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
-	
+
 	return 0;
 }
 
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -337,7 +337,7 @@ static inline bool userfaultfd_must_wait
 	return ret;
 }
 
-static inline long userfaultfd_get_blocking_state(unsigned int flags)
+static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
 {
 	if (flags & FAULT_FLAG_INTERRUPTIBLE)
 		return TASK_INTERRUPTIBLE;
@@ -370,7 +370,7 @@ vm_fault_t handle_userfault(struct vm_fa
 	struct userfaultfd_wait_queue uwq;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	bool must_wait;
-	long blocking_state;
+	unsigned int blocking_state;
 
 	/*
 	 * We don't do userfault handling for the final child pid update.
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,13 +113,13 @@ struct task_group;
 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
 					 TASK_PARKED)
 
-#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
+#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
 
-#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
 
-#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
@@ -134,14 +134,14 @@ struct task_group;
 	do {							\
 		WARN_ON_ONCE(is_special_task_state(state_value));\
 		current->task_state_change = _THIS_IP_;		\
-		current->state = (state_value);			\
+		WRITE_ONCE(current->__state, (state_value));	\
 	} while (0)
 
 #define set_current_state(state_value)				\
 	do {							\
 		WARN_ON_ONCE(is_special_task_state(state_value));\
 		current->task_state_change = _THIS_IP_;		\
-		smp_store_mb(current->state, (state_value));	\
+		smp_store_mb(current->__state, (state_value));	\
 	} while (0)
 
 #define set_special_state(state_value)					\
@@ -150,7 +150,7 @@ struct task_group;
 		WARN_ON_ONCE(!is_special_task_state(state_value));	\
 		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		current->task_state_change = _THIS_IP_;			\
-		current->state = (state_value);				\
+		WRITE_ONCE(current->__state, (state_value));		\
 		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0)
 #else
@@ -192,10 +192,10 @@ struct task_group;
  * Also see the comments of try_to_wake_up().
  */
 #define __set_current_state(state_value)				\
-	current->state = (state_value)
+	WRITE_ONCE(current->__state, (state_value))
 
 #define set_current_state(state_value)					\
-	smp_store_mb(current->state, (state_value))
+	smp_store_mb(current->__state, (state_value))
 
 /*
  * set_special_state() should be used for those states when the blocking task
@@ -207,13 +207,13 @@ struct task_group;
 	do {								\
 		unsigned long flags; /* may shadow */			\
 		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
-		current->state = (state_value);				\
+		WRITE_ONCE(current->__state, (state_value));		\
 		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0)
 
 #endif
 
-#define get_current_state()	READ_ONCE(current->state)
+#define get_current_state()	READ_ONCE(current->__state)
 
 /* Task command name length: */
 #define TASK_COMM_LEN			16
@@ -658,8 +658,7 @@ struct task_struct {
 	 */
 	struct thread_info		thread_info;
 #endif
-	/* -1 unrunnable, 0 runnable, >0 stopped: */
-	volatile long			state;
+	unsigned int			__state;
 
 	/*
 	 * This begins the randomizable portion of task_struct. Only
@@ -1524,7 +1523,7 @@ static inline pid_t task_pgrp_nr(struct
 
 static inline unsigned int task_state_index(struct task_struct *tsk)
 {
-	unsigned int tsk_state = READ_ONCE(tsk->state);
+	unsigned int tsk_state = READ_ONCE(tsk->__state);
 	unsigned int state = (tsk_state | tsk->exit_state) & TASK_REPORT;
 
 	BUILD_BUG_ON_NOT_POWER_OF_2(TASK_REPORT_MAX);
@@ -1832,10 +1831,10 @@ static __always_inline void scheduler_ip
 	 */
 	preempt_fold_need_resched();
 }
-extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
+extern unsigned long wait_task_inactive(struct task_struct *, unsigned int match_state);
 #else
 static inline void scheduler_ipi(void) { }
-static inline unsigned long wait_task_inactive(struct task_struct *p, long match_state)
+static inline unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
 	return 1;
 }
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -14,7 +14,7 @@ extern void dump_cpu_task(int cpu);
 /*
  * Only dump TASK_* tasks. (0 for all tasks)
  */
-extern void show_state_filter(unsigned long state_filter);
+extern void show_state_filter(unsigned int state_filter);
 
 static inline void show_state(void)
 {
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -382,7 +382,7 @@ static inline int fatal_signal_pending(s
 	return task_sigpending(p) && __fatal_signal_pending(p);
 }
 
-static inline int signal_pending_state(long state, struct task_struct *p)
+static inline int signal_pending_state(unsigned int state, struct task_struct *p)
 {
 	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
 		return 0;
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -71,7 +71,7 @@ struct task_struct init_task
 	.thread_info	= INIT_THREAD_INFO(init_task),
 	.stack_refcount	= REFCOUNT_INIT(1),
 #endif
-	.state		= 0,
+	.__state	= 0,
 	.stack		= init_stack,
 	.usage		= REFCOUNT_INIT(2),
 	.flags		= PF_KTHREAD,
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -713,7 +713,7 @@ int cgroupstats_build(struct cgroupstats
 
 	css_task_iter_start(&cgrp->self, 0, &it);
 	while ((tsk = css_task_iter_next(&it))) {
-		switch (tsk->state) {
+		switch (READ_ONCE(tsk->__state)) {
 		case TASK_RUNNING:
 			stats->nr_running++;
 			break;
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
  */
 char kdb_task_state_char (const struct task_struct *p)
 {
-	int cpu;
-	char state;
+	unsigned int p_state;
 	unsigned long tmp;
+	char state;
+	int cpu;
 
 	if (!p ||
 	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
 		return 'E';
 
 	cpu = kdb_process_cpu(p);
-	state = (p->state == 0) ? 'R' :
-		(p->state < 0) ? 'U' :
-		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
-		(p->state & TASK_STOPPED) ? 'T' :
-		(p->state & TASK_TRACED) ? 'C' :
+	p_state = READ_ONCE(p->__state);
+	state = (p_state == 0) ? 'R' :
+		(p_state < 0) ? 'U' :
+		(p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
+		(p_state & TASK_STOPPED) ? 'T' :
+		(p_state & TASK_TRACED) ? 'C' :
 		(p->exit_state & EXIT_ZOMBIE) ? 'Z' :
 		(p->exit_state & EXIT_DEAD) ? 'E' :
-		(p->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
+		(p_state & TASK_INTERRUPTIBLE) ? 'S' : '?';
 	if (is_idle_task(p)) {
 		/* Idle task.  Is it really idle, apart from the kdb
 		 * interrupt? */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -425,7 +425,7 @@ static int memcg_charge_kernel_stack(str
 
 static void release_task_stack(struct task_struct *tsk)
 {
-	if (WARN_ON(tsk->state != TASK_DEAD))
+	if (WARN_ON(READ_ONCE(tsk->__state) != TASK_DEAD))
 		return;  /* Better to leak the stack than to free prematurely */
 
 	account_kernel_stack(tsk, -1);
@@ -2392,7 +2392,7 @@ static __latent_entropy struct task_stru
 	atomic_dec(&p->cred->user->processes);
 	exit_creds(p);
 bad_fork_free:
-	p->state = TASK_DEAD;
+	WRITE_ONCE(p->__state, TASK_DEAD);
 	put_task_stack(p);
 	delayed_free_task(p);
 fork_out:
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -196,7 +196,7 @@ static void check_hung_uninterruptible_t
 			last_break = jiffies;
 		}
 		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
-		if (t->state == TASK_UNINTERRUPTIBLE)
+		if (READ_ONCE(t->__state) == TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, timeout);
 	}
  unlock:
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -457,7 +457,7 @@ struct task_struct *kthread_create_on_no
 }
 EXPORT_SYMBOL(kthread_create_on_node);
 
-static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, long state)
+static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, unsigned int state)
 {
 	unsigned long flags;
 
@@ -473,7 +473,7 @@ static void __kthread_bind_mask(struct t
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 }
 
-static void __kthread_bind(struct task_struct *p, unsigned int cpu, long state)
+static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int state)
 {
 	__kthread_bind_mask(p, cpumask_of(cpu), state);
 }
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -923,7 +923,7 @@ __ww_mutex_add_waiter(struct mutex_waite
  * Lock a mutex (possibly interruptible), slowpath:
  */
 static __always_inline int __sched
-__mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
+__mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		    struct lockdep_map *nest_lock, unsigned long ip,
 		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
@@ -1098,14 +1098,14 @@ __mutex_lock_common(struct mutex *lock,
 }
 
 static int __sched
-__mutex_lock(struct mutex *lock, long state, unsigned int subclass,
+__mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 	     struct lockdep_map *nest_lock, unsigned long ip)
 {
 	return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);
 }
 
 static int __sched
-__ww_mutex_lock(struct mutex *lock, long state, unsigned int subclass,
+__ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
 		struct lockdep_map *nest_lock, unsigned long ip,
 		struct ww_acquire_ctx *ww_ctx)
 {
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1135,7 +1135,7 @@ void __sched rt_mutex_init_waiter(struct
  *
  * Must be called with lock->wait_lock held and interrupts disabled
  */
-static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, int state,
+static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, unsigned int state,
 				       struct hrtimer_sleeper *timeout,
 				       struct rt_mutex_waiter *waiter)
 {
@@ -1190,7 +1190,7 @@ static void __sched rt_mutex_handle_dead
 /*
  * Slow path lock function:
  */
-static int __sched rt_mutex_slowlock(struct rt_mutex *lock, int state,
+static int __sched rt_mutex_slowlock(struct rt_mutex *lock, unsigned int state,
 				     struct hrtimer_sleeper *timeout,
 				     enum rtmutex_chainwalk chwalk)
 {
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -889,7 +889,7 @@ rwsem_spin_on_owner(struct rw_semaphore
  * Wait for the read lock to be granted
  */
 static struct rw_semaphore __sched *
-rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, int state)
+rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int state)
 {
 	long adjustment = -RWSEM_READER_BIAS;
 	long rcnt = (count >> RWSEM_READER_SHIFT);
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		task->state = __TASK_TRACED;
+		WRITE_ONCE(task->__state, __TASK_TRACED);
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,7 +207,7 @@ static bool ptrace_freeze_traced(struct
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (task->state != __TASK_TRACED)
+	if (READ_ONCE(task->__state) != __TASK_TRACED)
 		return;
 
 	WARN_ON(!task->ptrace || task->parent != current);
@@ -217,11 +217,11 @@ static void ptrace_unfreeze_traced(struc
 	 * Recheck state under the lock to close this race.
 	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (task->state == __TASK_TRACED) {
+	if (READ_ONCE(task->__state) == __TASK_TRACED) {
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
 		else
-			task->state = TASK_TRACED;
+			WRITE_ONCE(task->__state, TASK_TRACED);
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 }
@@ -256,7 +256,7 @@ static int ptrace_check_attach(struct ta
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(child->state == __TASK_TRACED);
+		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
@@ -273,7 +273,7 @@ static int ptrace_check_attach(struct ta
 			 * ptrace_stop() changes ->state back to TASK_RUNNING,
 			 * so we should not worry about leaking __TASK_TRACED.
 			 */
-			WARN_ON(child->state == __TASK_TRACED);
+			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 			ret = -ESRCH;
 		}
 	}
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1831,10 +1831,10 @@ rcu_torture_stats_print(void)
 		srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
 					&flags, &gp_seq);
 		wtp = READ_ONCE(writer_task);
-		pr_alert("??? Writer stall state %s(%d) g%lu f%#x ->state %#lx cpu %d\n",
+		pr_alert("??? Writer stall state %s(%d) g%lu f%#x ->state %#x cpu %d\n",
 			 rcu_torture_writer_state_getname(),
 			 rcu_torture_writer_state, gp_seq, flags,
-			 wtp == NULL ? ~0UL : wtp->state,
+			 wtp == NULL ? ~0U : wtp->__state,
 			 wtp == NULL ? -1 : (int)task_cpu(wtp));
 		if (!splatted && wtp) {
 			sched_show_task(wtp);
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -460,12 +460,12 @@ static void rcu_check_gp_kthread_starvat
 
 	if (rcu_is_gp_kthread_starving(&j)) {
 		cpu = gpk ? task_cpu(gpk) : -1;
-		pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
+		pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n",
 		       rcu_state.name, j,
 		       (long)rcu_seq_current(&rcu_state.gp_seq),
 		       data_race(rcu_state.gp_flags),
 		       gp_state_getname(rcu_state.gp_state), rcu_state.gp_state,
-		       gpk ? gpk->state : ~0, cpu);
+		       gpk ? gpk->__state : ~0, cpu);
 		if (gpk) {
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
@@ -503,12 +503,12 @@ static void rcu_check_gp_kthread_expired
 	    time_after(jiffies, jiffies_fqs + RCU_STALL_MIGHT_MIN) &&
 	    gpk && !READ_ONCE(gpk->on_rq)) {
 		cpu = task_cpu(gpk);
-		pr_err("%s kthread timer wakeup didn't happen for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx\n",
+		pr_err("%s kthread timer wakeup didn't happen for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x\n",
 		       rcu_state.name, (jiffies - jiffies_fqs),
 		       (long)rcu_seq_current(&rcu_state.gp_seq),
 		       data_race(rcu_state.gp_flags),
 		       gp_state_getname(RCU_GP_WAIT_FQS), RCU_GP_WAIT_FQS,
-		       gpk->state);
+		       gpk->__state);
 		pr_err("\tPossible timer handling issue on cpu=%d timer-softirq=%u\n",
 		       cpu, kstat_softirqs_cpu(TIMER_SOFTIRQ, cpu));
 	}
@@ -735,9 +735,9 @@ void show_rcu_gp_kthreads(void)
 	ja = j - data_race(rcu_state.gp_activity);
 	jr = j - data_race(rcu_state.gp_req_activity);
 	jw = j - data_race(rcu_state.gp_wake_time);
-	pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+	pr_info("%s: wait state: %s(%d) ->state: %#x delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
 		rcu_state.name, gp_state_getname(rcu_state.gp_state),
-		rcu_state.gp_state, t ? t->state : 0x1ffffL,
+		rcu_state.gp_state, t ? t->__state : 0x1ffff,
 		ja, jr, jw, (long)data_race(rcu_state.gp_wake_seq),
 		(long)data_race(rcu_state.gp_seq),
 		(long)data_race(rcu_get_root()->gp_seq_needed),
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2643,7 +2643,7 @@ static int affine_move_task(struct rq *r
 		return -EINVAL;
 	}
 
-	if (task_running(rq, p) || p->state == TASK_WAKING) {
+	if (task_running(rq, p) || READ_ONCE(p->__state) == TASK_WAKING) {
 		/*
 		 * MIGRATE_ENABLE gets here because 'p == current', but for
 		 * anything else we cannot do is_migration_disabled(), punt
@@ -2786,19 +2786,20 @@ EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
+	unsigned int state = READ_ONCE(p->__state);
+
 	/*
 	 * We should never call set_task_cpu() on a blocked task,
 	 * ttwu() will sort out the placement.
 	 */
-	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
-			!p->on_rq);
+	WARN_ON_ONCE(state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq);
 
 	/*
 	 * Migrating fair class task must have p->on_rq = TASK_ON_RQ_MIGRATING,
 	 * because schedstat_wait_{start,end} rebase migrating task's wait_start
 	 * time relying on p->on_rq.
 	 */
-	WARN_ON_ONCE(p->state == TASK_RUNNING &&
+	WARN_ON_ONCE(state == TASK_RUNNING &&
 		     p->sched_class == &fair_sched_class &&
 		     (p->on_rq && !task_on_rq_migrating(p)));
 
@@ -2970,7 +2971,7 @@ int migrate_swap(struct task_struct *cur
  * smp_call_function() if an IPI is sent by the same process we are
  * waiting to become inactive.
  */
-unsigned long wait_task_inactive(struct task_struct *p, long match_state)
+unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
 {
 	int running, queued;
 	struct rq_flags rf;
@@ -2998,7 +2999,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(p->state != match_state))
+			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3013,7 +3014,7 @@ unsigned long wait_task_inactive(struct
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || p->state == match_state)
+		if (!match_state || READ_ONCE(p->__state) == match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
 
@@ -3322,7 +3323,7 @@ static void ttwu_do_wakeup(struct rq *rq
 			   struct rq_flags *rf)
 {
 	check_preempt_curr(rq, p, wake_flags);
-	p->state = TASK_RUNNING;
+	WRITE_ONCE(p->__state, TASK_RUNNING);
 	trace_sched_wakeup(p);
 
 #ifdef CONFIG_SMP
@@ -3711,12 +3712,12 @@ try_to_wake_up(struct task_struct *p, un
 		 *  - we're serialized against set_special_state() by virtue of
 		 *    it disabling IRQs (this allows not taking ->pi_lock).
 		 */
-		if (!(p->state & state))
+		if (!(READ_ONCE(p->__state) & state))
 			goto out;
 
 		success = 1;
 		trace_sched_waking(p);
-		p->state = TASK_RUNNING;
+		WRITE_ONCE(p->__state, TASK_RUNNING);
 		trace_sched_wakeup(p);
 		goto out;
 	}
@@ -3729,7 +3730,7 @@ try_to_wake_up(struct task_struct *p, un
 	 */
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	smp_mb__after_spinlock();
-	if (!(p->state & state))
+	if (!(READ_ONCE(p->__state) & state))
 		goto unlock;
 
 	trace_sched_waking(p);
@@ -3795,7 +3796,7 @@ try_to_wake_up(struct task_struct *p, un
 	 * TASK_WAKING such that we can unlock p->pi_lock before doing the
 	 * enqueue, such as ttwu_queue_wakelist().
 	 */
-	p->state = TASK_WAKING;
+	WRITE_ONCE(p->__state, TASK_WAKING);
 
 	/*
 	 * If the owning (remote) CPU is still in the middle of schedule() with
@@ -3888,7 +3889,7 @@ bool try_invoke_on_locked_down_task(stru
 			ret = func(p, arg);
 		rq_unlock(rq, &rf);
 	} else {
-		switch (p->state) {
+		switch (READ_ONCE(p->__state)) {
 		case TASK_RUNNING:
 		case TASK_WAKING:
 			break;
@@ -4101,7 +4102,7 @@ int sched_fork(unsigned long clone_flags
 	 * nobody will actually run it, and a signal or other external
 	 * event cannot wake it up and insert it on the runqueue either.
 	 */
-	p->state = TASK_NEW;
+	p->__state = TASK_NEW;
 
 	/*
 	 * Make sure we do not leak PI boosting priority to the child.
@@ -4207,7 +4208,7 @@ void wake_up_new_task(struct task_struct
 	struct rq *rq;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
-	p->state = TASK_RUNNING;
+	WRITE_ONCE(p->__state, TASK_RUNNING);
 #ifdef CONFIG_SMP
 	/*
 	 * Fork balancing, do it here and not earlier because:
@@ -4569,7 +4570,7 @@ static struct rq *finish_task_switch(str
 	 * running on another CPU and we could rave with its RUNNING -> DEAD
 	 * transition, resulting in a double drop.
 	 */
-	prev_state = prev->state;
+	prev_state = READ_ONCE(prev->__state);
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
 	finish_task(prev);
@@ -5263,7 +5264,7 @@ static inline void schedule_debug(struct
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-	if (!preempt && prev->state && prev->non_block_count) {
+	if (!preempt && READ_ONCE(prev->__state) && prev->non_block_count) {
 		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
 			prev->comm, prev->pid, prev->non_block_count);
 		dump_stack();
@@ -5889,10 +5890,10 @@ static void __sched notrace __schedule(b
 	 *  - we form a control dependency vs deactivate_task() below.
 	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
 	 */
-	prev_state = prev->state;
+	prev_state = READ_ONCE(prev->__state);
 	if (!preempt && prev_state) {
 		if (signal_pending_state(prev_state, prev)) {
-			prev->state = TASK_RUNNING;
+			WRITE_ONCE(prev->__state, TASK_RUNNING);
 		} else {
 			prev->sched_contributes_to_load =
 				(prev_state & TASK_UNINTERRUPTIBLE) &&
@@ -6064,7 +6065,7 @@ void __sched schedule_idle(void)
 	 * current task can be in any other state. Note, idle is always in the
 	 * TASK_RUNNING state.
 	 */
-	WARN_ON_ONCE(current->state);
+	WARN_ON_ONCE(current->__state);
 	do {
 		__schedule(false);
 	} while (need_resched());
@@ -8191,26 +8192,28 @@ EXPORT_SYMBOL_GPL(sched_show_task);
 static inline bool
 state_filter_match(unsigned long state_filter, struct task_struct *p)
 {
+	unsigned int state = READ_ONCE(p->__state);
+
 	/* no filter, everything matches */
 	if (!state_filter)
 		return true;
 
 	/* filter, but doesn't match */
-	if (!(p->state & state_filter))
+	if (!(state & state_filter))
 		return false;
 
 	/*
 	 * When looking for TASK_UNINTERRUPTIBLE skip TASK_IDLE (allows
 	 * TASK_KILLABLE).
 	 */
-	if (state_filter == TASK_UNINTERRUPTIBLE && p->state == TASK_IDLE)
+	if (state_filter == TASK_UNINTERRUPTIBLE && state == TASK_IDLE)
 		return false;
 
 	return true;
 }
 
 
-void show_state_filter(unsigned long state_filter)
+void show_state_filter(unsigned int state_filter)
 {
 	struct task_struct *g, *p;
 
@@ -8267,7 +8270,7 @@ void __init init_idle(struct task_struct
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
 	raw_spin_rq_lock(rq);
 
-	idle->state = TASK_RUNNING;
+	idle->__state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 	/*
 	 * PF_KTHREAD should already be set at this point; regardless, make it
@@ -9582,7 +9585,7 @@ static int cpu_cgroup_can_attach(struct
 		 * has happened. This would lead to problems with PELT, due to
 		 * move wanting to detach+attach while we're not attached yet.
 		 */
-		if (task->state == TASK_NEW)
+		if (READ_ONCE(task->__state) == TASK_NEW)
 			ret = -EINVAL;
 		raw_spin_unlock_irq(&task->pi_lock);
 
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -348,10 +348,10 @@ static void task_non_contending(struct t
 	if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
 		if (dl_task(p))
 			sub_running_bw(dl_se, dl_rq);
-		if (!dl_task(p) || p->state == TASK_DEAD) {
+		if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
 			struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 
-			if (p->state == TASK_DEAD)
+			if (READ_ONCE(p->__state) == TASK_DEAD)
 				sub_rq_bw(&p->dl, &rq->dl);
 			raw_spin_lock(&dl_b->lock);
 			__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
@@ -1355,10 +1355,10 @@ static enum hrtimer_restart inactive_tas
 	sched_clock_tick();
 	update_rq_clock(rq);
 
-	if (!dl_task(p) || p->state == TASK_DEAD) {
+	if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
 		struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 
-		if (p->state == TASK_DEAD && dl_se->dl_non_contending) {
+		if (READ_ONCE(p->__state) == TASK_DEAD && dl_se->dl_non_contending) {
 			sub_running_bw(&p->dl, dl_rq_of_se(&p->dl));
 			sub_rq_bw(&p->dl, dl_rq_of_se(&p->dl));
 			dl_se->dl_non_contending = 0;
@@ -1722,7 +1722,7 @@ static void migrate_task_rq_dl(struct ta
 {
 	struct rq *rq;
 
-	if (p->state != TASK_WAKING)
+	if (READ_ONCE(p->__state) != TASK_WAKING)
 		return;
 
 	rq = task_rq(p);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -993,11 +993,14 @@ update_stats_dequeue(struct cfs_rq *cfs_
 
 	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
 		struct task_struct *tsk = task_of(se);
+		unsigned int state;
 
-		if (tsk->state & TASK_INTERRUPTIBLE)
+		/* XXX racy against TTWU */
+		state = READ_ONCE(tsk->__state);
+		if (state & TASK_INTERRUPTIBLE)
 			__schedstat_set(se->statistics.sleep_start,
 				      rq_clock(rq_of(cfs_rq)));
-		if (tsk->state & TASK_UNINTERRUPTIBLE)
+		if (state & TASK_UNINTERRUPTIBLE)
 			__schedstat_set(se->statistics.block_start,
 				      rq_clock(rq_of(cfs_rq)));
 	}
@@ -6830,7 +6833,7 @@ static void migrate_task_rq_fair(struct
 	 * min_vruntime -- the latter is done by enqueue_entity() when placing
 	 * the task on the new runqueue.
 	 */
-	if (p->state == TASK_WAKING) {
+	if (READ_ONCE(p->__state) == TASK_WAKING) {
 		struct sched_entity *se = &p->se;
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 		u64 min_vruntime;
@@ -11012,7 +11015,7 @@ static inline bool vruntime_normalized(s
 	 *   waiting for actually being woken up by sched_ttwu_pending().
 	 */
 	if (!se->sum_exec_runtime ||
-	    (p->state == TASK_WAKING && p->sched_remote_wakeup))
+	    (READ_ONCE(p->__state) == TASK_WAKING && p->sched_remote_wakeup))
 		return true;
 
 	return false;
--- a/lib/syscall.c
+++ b/lib/syscall.c
@@ -68,13 +68,13 @@ static int collect_syscall(struct task_s
  */
 int task_current_syscall(struct task_struct *target, struct syscall_info *info)
 {
-	long state;
 	unsigned long ncsw;
+	unsigned int state;
 
 	if (target == current)
 		return collect_syscall(target, info);
 
-	state = target->state;
+	state = READ_ONCE(target->__state);
 	if (unlikely(!state))
 		return -EAGAIN;
 
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4363,7 +4363,7 @@ static inline void ____napi_schedule(str
 			 * makes sure to proceed with napi polling
 			 * if the thread is explicitly woken from here.
 			 */
-			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
+			if (READ_ONCE(thread->__state) != TASK_INTERRUPTIBLE)
 				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
 			wake_up_process(thread);
 			return;


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/6] sched: Unbreak wakeups
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 13:58     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 64+ messages in thread
From: Greg Kroah-Hartman @ 2021-06-02 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:26PM +0200, Peter Zijlstra wrote:
> Remove broken task->state references and let wake_up_process() DTRT.
> 
> The anti-pattern in these patches breaks the ordering of ->state vs
> COND as described in the comment near set_current_state() and can lead
> to missed wakeups:
> 
> 	(OoO load, observes RUNNING)<-.
> 	for (;;) {                    |
> 	  t->state = UNINTERRUPTIBLE; |
> 	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
>                              |       |
> 	                     |       |	COND = 1;
> 			     |	     `- if (t->state != RUNNING)
>                              |		  wake_up_process(t); // not done
> 	  if (COND) ---------'
> 	    break;
> 	  schedule(); // forever waiting
> 	}
> 	t->state = TASK_RUNNING;
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  drivers/net/ethernet/qualcomm/qca_spi.c |    6 ++----
>  drivers/usb/gadget/udc/max3420_udc.c    |   15 +++++----------
>  drivers/usb/host/max3421-hcd.c          |    3 +--
>  kernel/softirq.c                        |    2 +-
>  4 files changed, 9 insertions(+), 17 deletions(-)

For USB stuff:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 1/6] sched: Unbreak wakeups
@ 2021-06-02 13:58     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 64+ messages in thread
From: Greg Kroah-Hartman @ 2021-06-02 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Jiri Olsa, Alasdair Kergon,
	Daniel Thompson, Davidlohr Bueso, Vincent Guittot, kvm,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, netdev, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users,
	Johannes Weiner, Tejun Heo, Mathieu Desnoyers, Andrew Morton,
	rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:26PM +0200, Peter Zijlstra wrote:
> Remove broken task->state references and let wake_up_process() DTRT.
> 
> The anti-pattern in these patches breaks the ordering of ->state vs
> COND as described in the comment near set_current_state() and can lead
> to missed wakeups:
> 
> 	(OoO load, observes RUNNING)<-.
> 	for (;;) {                    |
> 	  t->state = UNINTERRUPTIBLE; |
> 	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
>                              |       |
> 	                     |       |	COND = 1;
> 			     |	     `- if (t->state != RUNNING)
>                              |		  wake_up_process(t); // not done
> 	  if (COND) ---------'
> 	    break;
> 	  schedule(); // forever waiting
> 	}
> 	t->state = TASK_RUNNING;
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  drivers/net/ethernet/qualcomm/qca_spi.c |    6 ++----
>  drivers/usb/gadget/udc/max3420_udc.c    |   15 +++++----------
>  drivers/usb/host/max3421-hcd.c          |    3 +--
>  kernel/softirq.c                        |    2 +-
>  4 files changed, 9 insertions(+), 17 deletions(-)

For USB stuff:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
  (?)
@ 2021-06-02 13:59     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> When ran from the sched-out path (preempt_notifier or perf_event),
> p->state is irrelevant to determine preemption. You can get preempted
> with !task_is_running() just fine.
> 
> The right indicator for preemption is if the task is still on the
> runqueue in the sched-out path.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/events/core.c |    7 +++----
> virt/kvm/kvm_main.c  |    2 +-
> 2 files changed, 4 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> 		},
> 	};
> 
> -	if (!sched_in && task->state == TASK_RUNNING)
> +	if (!sched_in && current->on_rq) {

This changes from checking task->state to current->on_rq, but this change
from "task" to "current" is not described in the commit message, which is odd.

Are we really sure that task == current here ?

Thanks,

Mathieu

> 		switch_event.event_id.header.misc |=
> 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> +	}
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -		       &switch_event,
> -		       NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }
> 
> /*
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
> {
> 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> 
> -	if (current->state == TASK_RUNNING) {
> +	if (current->on_rq) {
> 		WRITE_ONCE(vcpu->preempted, true);
> 		WRITE_ONCE(vcpu->ready, true);
>  	}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
@ 2021-06-02 13:59     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> When ran from the sched-out path (preempt_notifier or perf_event),
> p->state is irrelevant to determine preemption. You can get preempted
> with !task_is_running() just fine.
> 
> The right indicator for preemption is if the task is still on the
> runqueue in the sched-out path.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/events/core.c |    7 +++----
> virt/kvm/kvm_main.c  |    2 +-
> 2 files changed, 4 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> 		},
> 	};
> 
> -	if (!sched_in && task->state == TASK_RUNNING)
> +	if (!sched_in && current->on_rq) {

This changes from checking task->state to current->on_rq, but this change
from "task" to "current" is not described in the commit message, which is odd.

Are we really sure that task == current here ?

Thanks,

Mathieu

> 		switch_event.event_id.header.misc |=
> 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> +	}
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -		       &switch_event,
> -		       NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }
> 
> /*
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
> {
> 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> 
> -	if (current->state == TASK_RUNNING) {
> +	if (current->on_rq) {
> 		WRITE_ONCE(vcpu->preempted, true);
> 		WRITE_ONCE(vcpu->ready, true);
>  	}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 3/6] sched, perf, kvm: Fix preemption condition
@ 2021-06-02 13:59     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> When ran from the sched-out path (preempt_notifier or perf_event),
> p->state is irrelevant to determine preemption. You can get preempted
> with !task_is_running() just fine.
> 
> The right indicator for preemption is if the task is still on the
> runqueue in the sched-out path.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/events/core.c |    7 +++----
> virt/kvm/kvm_main.c  |    2 +-
> 2 files changed, 4 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> 		},
> 	};
> 
> -	if (!sched_in && task->state == TASK_RUNNING)
> +	if (!sched_in && current->on_rq) {

This changes from checking task->state to current->on_rq, but this change
from "task" to "current" is not described in the commit message, which is odd.

Are we really sure that task == current here ?

Thanks,

Mathieu

> 		switch_event.event_id.header.misc |=
> 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> +	}
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -		       &switch_event,
> -		       NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }
> 
> /*
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
> {
> 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> 
> -	if (current->state == TASK_RUNNING) {
> +	if (current->on_rq) {
> 		WRITE_ONCE(vcpu->preempted, true);
> 		WRITE_ONCE(vcpu->ready, true);
>  	}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/6] sched: Add get_current_state()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
  (?)
@ 2021-06-02 14:01     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Remove yet another few p->state accesses.

[...]

> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -212,6 +212,8 @@ struct task_group;
> 
> #endif
> 
> +#define get_current_state()	READ_ONCE(current->state)

Why use a macro rather than a static inline here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/6] sched: Add get_current_state()
@ 2021-06-02 14:01     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Remove yet another few p->state accesses.

[...]

> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -212,6 +212,8 @@ struct task_group;
> 
> #endif
> 
> +#define get_current_state()	READ_ONCE(current->state)

Why use a macro rather than a static inline here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 4/6] sched: Add get_current_state()
@ 2021-06-02 14:01     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Remove yet another few p->state accesses.

[...]

> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -212,6 +212,8 @@ struct task_group;
> 
> #endif
> 
> +#define get_current_state()	READ_ONCE(current->state)

Why use a macro rather than a static inline here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 6/6] sched: Change task_struct::state
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
  (?)
@ 2021-06-02 14:06     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
[...]
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c

[...]
> @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
> 	psinfo->pr_pgrp = task_pgrp_vnr(p);
> 	psinfo->pr_sid = task_session_vnr(p);
> 
> -	i = p->state ? ffz(~p->state) + 1 : 0;
> +	state = READ_ONCE(p->__state);
> +	i = state ? ffz(~state) + 1 : 0;
> 	psinfo->pr_state = i;
> 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
> 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';

[...]

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -113,13 +113,13 @@ struct task_group;
> 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
> 					 TASK_PARKED)
> 
> -#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
> +#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
> 
> -#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
> +#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> 
> -#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
> +#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) !=
> 0)
> 
> -#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED |
> __TASK_TRACED)) != 0)
> +#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) &
> (__TASK_STOPPED | __TASK_TRACED)) != 0)
> 
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> 
> @@ -134,14 +134,14 @@ struct task_group;
> 	do {							\
> 		WARN_ON_ONCE(is_special_task_state(state_value));\
> 		current->task_state_change = _THIS_IP_;		\
> -		current->state = (state_value);			\
> +		WRITE_ONCE(current->__state, (state_value));	\
> 	} while (0)

Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
READ_ONCE() and WRITE_ONCE() all over the kernel ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-02 14:06     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
[...]
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c

[...]
> @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
> 	psinfo->pr_pgrp = task_pgrp_vnr(p);
> 	psinfo->pr_sid = task_session_vnr(p);
> 
> -	i = p->state ? ffz(~p->state) + 1 : 0;
> +	state = READ_ONCE(p->__state);
> +	i = state ? ffz(~state) + 1 : 0;
> 	psinfo->pr_state = i;
> 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
> 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';

[...]

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -113,13 +113,13 @@ struct task_group;
> 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
> 					 TASK_PARKED)
> 
> -#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
> +#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
> 
> -#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
> +#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> 
> -#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
> +#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) !=
> 0)
> 
> -#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED |
> __TASK_TRACED)) != 0)
> +#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) &
> (__TASK_STOPPED | __TASK_TRACED)) != 0)
> 
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> 
> @@ -134,14 +134,14 @@ struct task_group;
> 	do {							\
> 		WARN_ON_ONCE(is_special_task_state(state_value));\
> 		current->task_state_change = _THIS_IP_;		\
> -		current->state = (state_value);			\
> +		WRITE_ONCE(current->__state, (state_value));	\
> 	} while (0)

Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
READ_ONCE() and WRITE_ONCE() all over the kernel ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-02 14:06     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
[...]
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c

[...]
> @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
> 	psinfo->pr_pgrp = task_pgrp_vnr(p);
> 	psinfo->pr_sid = task_session_vnr(p);
> 
> -	i = p->state ? ffz(~p->state) + 1 : 0;
> +	state = READ_ONCE(p->__state);
> +	i = state ? ffz(~state) + 1 : 0;
> 	psinfo->pr_state = i;
> 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
> 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';

[...]

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -113,13 +113,13 @@ struct task_group;
> 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
> 					 TASK_PARKED)
> 
> -#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
> +#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
> 
> -#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
> +#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> 
> -#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
> +#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) !=
> 0)
> 
> -#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED |
> __TASK_TRACED)) != 0)
> +#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) &
> (__TASK_STOPPED | __TASK_TRACED)) != 0)
> 
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> 
> @@ -134,14 +134,14 @@ struct task_group;
> 	do {							\
> 		WARN_ON_ONCE(is_special_task_state(state_value));\
> 		current->task_state_change = _THIS_IP_;		\
> -		current->state = (state_value);			\
> +		WRITE_ONCE(current->__state, (state_value));	\
> 	} while (0)

Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
READ_ONCE() and WRITE_ONCE() all over the kernel ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
  2021-06-02 13:59     ` [PATCH 3/6] sched,perf,kvm: " Mathieu Desnoyers
@ 2021-06-02 14:10       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

On Wed, Jun 02, 2021 at 09:59:07AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > When ran from the sched-out path (preempt_notifier or perf_event),
> > p->state is irrelevant to determine preemption. You can get preempted
> > with !task_is_running() just fine.
> > 
> > The right indicator for preemption is if the task is still on the
> > runqueue in the sched-out path.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/events/core.c |    7 +++----
> > virt/kvm/kvm_main.c  |    2 +-
> > 2 files changed, 4 insertions(+), 5 deletions(-)
> > 
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> > 		},
> > 	};
> > 
> > -	if (!sched_in && task->state == TASK_RUNNING)
> > +	if (!sched_in && current->on_rq) {
> 
> This changes from checking task->state to current->on_rq, but this change
> from "task" to "current" is not described in the commit message, which is odd.
> 
> Are we really sure that task == current here ?

Yeah, @task == @prev == current at this point, but yes, not sure why I
changed that... lemme change that back to task.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 3/6] sched, perf, kvm: Fix preemption condition
@ 2021-06-02 14:10       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

On Wed, Jun 02, 2021 at 09:59:07AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > When ran from the sched-out path (preempt_notifier or perf_event),
> > p->state is irrelevant to determine preemption. You can get preempted
> > with !task_is_running() just fine.
> > 
> > The right indicator for preemption is if the task is still on the
> > runqueue in the sched-out path.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/events/core.c |    7 +++----
> > virt/kvm/kvm_main.c  |    2 +-
> > 2 files changed, 4 insertions(+), 5 deletions(-)
> > 
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> > 		},
> > 	};
> > 
> > -	if (!sched_in && task->state == TASK_RUNNING)
> > +	if (!sched_in && current->on_rq) {
> 
> This changes from checking task->state to current->on_rq, but this change
> from "task" to "current" is not described in the commit message, which is odd.
> 
> Are we really sure that task == current here ?

Yeah, @task == @prev == current at this point, but yes, not sure why I
changed that... lemme change that back to task.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/6] sched: Add get_current_state()
  2021-06-02 14:01     ` Mathieu Desnoyers
@ 2021-06-02 14:12       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

On Wed, Jun 02, 2021 at 10:01:29AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > Remove yet another few p->state accesses.
> 
> [...]
> 
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -212,6 +212,8 @@ struct task_group;
> > 
> > #endif
> > 
> > +#define get_current_state()	READ_ONCE(current->state)
> 
> Why use a macro rather than a static inline here ?

Mostly to be consistent, all that state stuff is macros. I suppose we
could try and make them inlines at the end or so -- if the header maze
allows.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 4/6] sched: Add get_current_state()
@ 2021-06-02 14:12       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

On Wed, Jun 02, 2021 at 10:01:29AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > Remove yet another few p->state accesses.
> 
> [...]
> 
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -212,6 +212,8 @@ struct task_group;
> > 
> > #endif
> > 
> > +#define get_current_state()	READ_ONCE(current->state)
> 
> Why use a macro rather than a static inline here ?

Mostly to be consistent, all that state stuff is macros. I suppose we
could try and make them inlines at the end or so -- if the header maze
allows.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
  (?)
@ 2021-06-02 14:15     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
[...]
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> 		},
> 	};
> 
> -	if (!sched_in && task->state == TASK_RUNNING)
> +	if (!sched_in && current->on_rq) {
> 		switch_event.event_id.header.misc |=
> 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> +	}
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -		       &switch_event,
> -		       NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }

There is a lot of code cleanup going on here which does not seem to belong
to a "Fix" patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
@ 2021-06-02 14:15     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
[...]
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> 		},
> 	};
> 
> -	if (!sched_in && task->state == TASK_RUNNING)
> +	if (!sched_in && current->on_rq) {
> 		switch_event.event_id.header.misc |=
> 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> +	}
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -		       &switch_event,
> -		       NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }

There is a lot of code cleanup going on here which does not seem to belong
to a "Fix" patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 3/6] sched, perf, kvm: Fix preemption condition
@ 2021-06-02 14:15     ` Mathieu Desnoyers
  0 siblings, 0 replies; 64+ messages in thread
From: Mathieu Desnoyers @ 2021-06-02 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
[...]
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> 		},
> 	};
> 
> -	if (!sched_in && task->state == TASK_RUNNING)
> +	if (!sched_in && current->on_rq) {
> 		switch_event.event_id.header.misc |=
> 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> +	}
> 
> -	perf_iterate_sb(perf_event_switch_output,
> -		       &switch_event,
> -		       NULL);
> +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> }

There is a lot of code cleanup going on here which does not seem to belong
to a "Fix" patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 6/6] sched: Change task_struct::state
  2021-06-02 14:06     ` Mathieu Desnoyers
@ 2021-06-02 14:20       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:20 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

On Wed, Jun 02, 2021 at 10:06:58AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> > @@ -134,14 +134,14 @@ struct task_group;
> > 	do {							\
> > 		WARN_ON_ONCE(is_special_task_state(state_value));\
> > 		current->task_state_change = _THIS_IP_;		\
> > -		current->state = (state_value);			\
> > +		WRITE_ONCE(current->__state, (state_value));	\
> > 	} while (0)
> 
> Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
> READ_ONCE() and WRITE_ONCE() all over the kernel ?

set_task_state() is fundamentally unsound, there's very few sites that
_set_ state on anything other than current, and those sites are super
tricky, eg. ptrace.

Having get_task_state() would seem to suggest it's actually a sane thing
to do, it's not really. Inspecting remote state is full of races, and
some of that really wants cleaning up, but that's for another day.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-02 14:20       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:20 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

On Wed, Jun 02, 2021 at 10:06:58AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> > @@ -134,14 +134,14 @@ struct task_group;
> > 	do {							\
> > 		WARN_ON_ONCE(is_special_task_state(state_value));\
> > 		current->task_state_change = _THIS_IP_;		\
> > -		current->state = (state_value);			\
> > +		WRITE_ONCE(current->__state, (state_value));	\
> > 	} while (0)
> 
> Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
> READ_ONCE() and WRITE_ONCE() all over the kernel ?

set_task_state() is fundamentally unsound, there's very few sites that
_set_ state on anything other than current, and those sites are super
tricky, eg. ptrace.

Having get_task_state() would seem to suggest it's actually a sane thing
to do, it's not really. Inspecting remote state is full of races, and
some of that really wants cleaning up, but that's for another day.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
  2021-06-02 14:15     ` [PATCH 3/6] sched,perf,kvm: " Mathieu Desnoyers
@ 2021-06-02 14:23       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, rostedt, Ben Segall, Mel Gorman, bristot,
	Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, acme, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Rafael J. Wysocki,
	Pavel Machek, Will Deacon, Waiman Long, Boqun Feng,
	Oleg Nesterov, Davidlohr Bueso, paulmck, Josh Triplett,
	Lai Jiangshan, Joel Fernandes, Google, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, KVM list

On Wed, Jun 02, 2021 at 10:15:16AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> [...]
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> > 		},
> > 	};
> > 
> > -	if (!sched_in && task->state == TASK_RUNNING)
> > +	if (!sched_in && current->on_rq) {
> > 		switch_event.event_id.header.misc |=
> > 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> > +	}
> > 
> > -	perf_iterate_sb(perf_event_switch_output,
> > -		       &switch_event,
> > -		       NULL);
> > +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> > }
> 
> There is a lot of code cleanup going on here which does not seem to belong
> to a "Fix" patch.

Maybe, but I so hate whitespace only patches :-/

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 3/6] sched, perf, kvm: Fix preemption condition
@ 2021-06-02 14:23       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 14:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, Google, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, KVM list, Will Deacon, cgroups, x86,
	Ingo Molnar, Mel Gorman, Jakub Kicinski, paulmck, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, rostedt, linux-block,
	linux-fsdevel, Borislav Petkov, Alexander Viro, Waiman Long,
	Namhyung Kim, Thomas Gleixner, acme, Dietmar Eggemann,
	Jens Axboe, Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, John Stultz,
	Andrew Morton, rcu, bristot, David S. Miller

On Wed, Jun 02, 2021 at 10:15:16AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> [...]
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> > 		},
> > 	};
> > 
> > -	if (!sched_in && task->state == TASK_RUNNING)
> > +	if (!sched_in && current->on_rq) {
> > 		switch_event.event_id.header.misc |=
> > 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
> > +	}
> > 
> > -	perf_iterate_sb(perf_event_switch_output,
> > -		       &switch_event,
> > -		       NULL);
> > +	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
> > }
> 
> There is a lot of code cleanup going on here which does not seem to belong
> to a "Fix" patch.

Maybe, but I so hate whitespace only patches :-/

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/6] sched,perf,kvm: Fix preemption condition
  2021-06-02 14:10       ` [dm-devel] [PATCH 3/6] sched, perf, kvm: " Peter Zijlstra
@ 2021-06-02 14:30         ` Mark Rutland
  -1 siblings, 0 replies; 64+ messages in thread
From: Mark Rutland @ 2021-06-02 14:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Thomas Gleixner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, rostedt, Ben Segall,
	Mel Gorman, bristot, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	acme, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, paulmck,
	Josh Triplett, Lai Jiangshan, Joel Fernandes, Google,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, KVM list

On Wed, Jun 02, 2021 at 04:10:29PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 02, 2021 at 09:59:07AM -0400, Mathieu Desnoyers wrote:
> > ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> > 
> > > When ran from the sched-out path (preempt_notifier or perf_event),
> > > p->state is irrelevant to determine preemption. You can get preempted
> > > with !task_is_running() just fine.
> > > 
> > > The right indicator for preemption is if the task is still on the
> > > runqueue in the sched-out path.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > kernel/events/core.c |    7 +++----
> > > virt/kvm/kvm_main.c  |    2 +-
> > > 2 files changed, 4 insertions(+), 5 deletions(-)
> > > 
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> > > 		},
> > > 	};
> > > 
> > > -	if (!sched_in && task->state == TASK_RUNNING)
> > > +	if (!sched_in && current->on_rq) {
> > 
> > This changes from checking task->state to current->on_rq, but this change
> > from "task" to "current" is not described in the commit message, which is odd.
> > 
> > Are we really sure that task == current here ?
> 
> Yeah, @task == @prev == current at this point, but yes, not sure why I
> changed that... lemme change that back to task.

FWIW, with that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

I have no strong feelings either way w.r.t. the whitespace cleanup. ;)

Thanks,
Mark

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 3/6] sched, perf, kvm: Fix preemption condition
@ 2021-06-02 14:30         ` Mark Rutland
  0 siblings, 0 replies; 64+ messages in thread
From: Mark Rutland @ 2021-06-02 14:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, Ben Segall,
	linux-mm, dm-devel, Paolo Bonzini, Zefan Li, H. Peter Anvin,
	Joel Fernandes, Google, netdev, Jiri Olsa, Alasdair Kergon,
	Daniel Thompson, Davidlohr Bueso, Vincent Guittot, KVM list,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, paulmck, linux-pm, Boqun Feng,
	Jason Wessel, Josh Triplett, rostedt, linux-block, linux-fsdevel,
	Mathieu Desnoyers, Alexander Viro, Waiman Long, Namhyung Kim,
	Thomas Gleixner, acme, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Borislav Petkov,
	Andrew Morton, rcu, bristot, David S. Miller

On Wed, Jun 02, 2021 at 04:10:29PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 02, 2021 at 09:59:07AM -0400, Mathieu Desnoyers wrote:
> > ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> > 
> > > When ran from the sched-out path (preempt_notifier or perf_event),
> > > p->state is irrelevant to determine preemption. You can get preempted
> > > with !task_is_running() just fine.
> > > 
> > > The right indicator for preemption is if the task is still on the
> > > runqueue in the sched-out path.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > kernel/events/core.c |    7 +++----
> > > virt/kvm/kvm_main.c  |    2 +-
> > > 2 files changed, 4 insertions(+), 5 deletions(-)
> > > 
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
> > > 		},
> > > 	};
> > > 
> > > -	if (!sched_in && task->state == TASK_RUNNING)
> > > +	if (!sched_in && current->on_rq) {
> > 
> > This changes from checking task->state to current->on_rq, but this change
> > from "task" to "current" is not described in the commit message, which is odd.
> > 
> > Are we really sure that task == current here ?
> 
> Yeah, @task == @prev == current at this point, but yes, not sure why I
> changed that... lemme change that back to task.

FWIW, with that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

I have no strong feelings either way w.r.t. the whitespace cleanup. ;)

Thanks,
Mark

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/6] sched: Unbreak wakeups
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 14:47     ` Will Deacon
  -1 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Waiman Long, Boqun Feng, Oleg Nesterov, Davidlohr Bueso,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:26PM +0200, Peter Zijlstra wrote:
> Remove broken task->state references and let wake_up_process() DTRT.
> 
> The anti-pattern in these patches breaks the ordering of ->state vs
> COND as described in the comment near set_current_state() and can lead
> to missed wakeups:
> 
> 	(OoO load, observes RUNNING)<-.
> 	for (;;) {                    |
> 	  t->state = UNINTERRUPTIBLE; |
> 	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
>                              |       |
> 	                     |       |	COND = 1;
> 			     |	     `- if (t->state != RUNNING)
>                              |		  wake_up_process(t); // not done
> 	  if (COND) ---------'
> 	    break;
> 	  schedule(); // forever waiting
> 	}
> 	t->state = TASK_RUNNING;
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  drivers/net/ethernet/qualcomm/qca_spi.c |    6 ++----
>  drivers/usb/gadget/udc/max3420_udc.c    |   15 +++++----------
>  drivers/usb/host/max3421-hcd.c          |    3 +--
>  kernel/softirq.c                        |    2 +-
>  4 files changed, 9 insertions(+), 17 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

I couldn't spot any others.

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 1/6] sched: Unbreak wakeups
@ 2021-06-02 14:47     ` Will Deacon
  0 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, kvm, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:26PM +0200, Peter Zijlstra wrote:
> Remove broken task->state references and let wake_up_process() DTRT.
> 
> The anti-pattern in these patches breaks the ordering of ->state vs
> COND as described in the comment near set_current_state() and can lead
> to missed wakeups:
> 
> 	(OoO load, observes RUNNING)<-.
> 	for (;;) {                    |
> 	  t->state = UNINTERRUPTIBLE; |
> 	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
>                              |       |
> 	                     |       |	COND = 1;
> 			     |	     `- if (t->state != RUNNING)
>                              |		  wake_up_process(t); // not done
> 	  if (COND) ---------'
> 	    break;
> 	  schedule(); // forever waiting
> 	}
> 	t->state = TASK_RUNNING;
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  drivers/net/ethernet/qualcomm/qca_spi.c |    6 ++----
>  drivers/usb/gadget/udc/max3420_udc.c    |   15 +++++----------
>  drivers/usb/host/max3421-hcd.c          |    3 +--
>  kernel/softirq.c                        |    2 +-
>  4 files changed, 9 insertions(+), 17 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

I couldn't spot any others.

Will

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/6] sched: Introduce task_is_running()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 14:59     ` Will Deacon
  -1 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Waiman Long, Boqun Feng, Oleg Nesterov, Davidlohr Bueso,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote:
> Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
> task_is_running(p).
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/process.c |    4 ++--
>  block/blk-mq.c            |    2 +-
>  include/linux/sched.h     |    2 ++
>  kernel/locking/lockdep.c  |    2 +-
>  kernel/rcu/tree_plugin.h  |    2 +-
>  kernel/sched/core.c       |    6 +++---
>  kernel/sched/stats.h      |    2 +-
>  kernel/signal.c           |    2 +-
>  kernel/softirq.c          |    3 +--
>  mm/compaction.c           |    2 +-
>  10 files changed, 14 insertions(+), 13 deletions(-)
> 
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
>  	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
>  	int count = 0;
>  
> -	if (p == current || p->state == TASK_RUNNING)
> +	if (p == current || task_is_running(p))

Looks like this one in get_wchan() has been cargo-culted across most of
arch/ so they'll need fixing up before you rename the struct member.

There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!)

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 2/6] sched: Introduce task_is_running()
@ 2021-06-02 14:59     ` Will Deacon
  0 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, kvm, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote:
> Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
> task_is_running(p).
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/process.c |    4 ++--
>  block/blk-mq.c            |    2 +-
>  include/linux/sched.h     |    2 ++
>  kernel/locking/lockdep.c  |    2 +-
>  kernel/rcu/tree_plugin.h  |    2 +-
>  kernel/sched/core.c       |    6 +++---
>  kernel/sched/stats.h      |    2 +-
>  kernel/signal.c           |    2 +-
>  kernel/softirq.c          |    3 +--
>  mm/compaction.c           |    2 +-
>  10 files changed, 14 insertions(+), 13 deletions(-)
> 
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
>  	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
>  	int count = 0;
>  
> -	if (p == current || p->state == TASK_RUNNING)
> +	if (p == current || task_is_running(p))

Looks like this one in get_wchan() has been cargo-culted across most of
arch/ so they'll need fixing up before you rename the struct member.

There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!)

Will

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/6] sched: Add get_current_state()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 15:02     ` Will Deacon
  -1 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 15:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Waiman Long, Boqun Feng, Oleg Nesterov, Davidlohr Bueso,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:29PM +0200, Peter Zijlstra wrote:
> Remove yet another few p->state accesses.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c        |    2 +-
>  include/linux/sched.h |    2 ++
>  kernel/freezer.c      |    2 +-
>  kernel/sched/core.c   |    6 +++---
>  4 files changed, 7 insertions(+), 5 deletions(-)

I think you can include kernel/kcsan/report.c here too.

With that:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 4/6] sched: Add get_current_state()
@ 2021-06-02 15:02     ` Will Deacon
  0 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 15:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, kvm, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:29PM +0200, Peter Zijlstra wrote:
> Remove yet another few p->state accesses.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c        |    2 +-
>  include/linux/sched.h |    2 ++
>  kernel/freezer.c      |    2 +-
>  kernel/sched/core.c   |    6 +++---
>  4 files changed, 7 insertions(+), 5 deletions(-)

I think you can include kernel/kcsan/report.c here too.

With that:

Acked-by: Will Deacon <will@kernel.org>

Will

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 5/6] sched,timer: Use __set_current_state()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 15:06     ` Will Deacon
  -1 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Waiman Long, Boqun Feng, Oleg Nesterov, Davidlohr Bueso,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:30PM +0200, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/time/timer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1879,7 +1879,7 @@ signed long __sched schedule_timeout(sig
>  			printk(KERN_ERR "schedule_timeout: wrong timeout "
>  				"value %lx\n", timeout);
>  			dump_stack();
> -			current->state = TASK_RUNNING;
> +			__set_current_state(TASK_RUNNING);

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 5/6] sched,timer: Use __set_current_state()
@ 2021-06-02 15:06     ` Will Deacon
  0 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, kvm, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:30PM +0200, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/time/timer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1879,7 +1879,7 @@ signed long __sched schedule_timeout(sig
>  			printk(KERN_ERR "schedule_timeout: wrong timeout "
>  				"value %lx\n", timeout);
>  			dump_stack();
> -			current->state = TASK_RUNNING;
> +			__set_current_state(TASK_RUNNING);

Acked-by: Will Deacon <will@kernel.org>

Will

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 6/6] sched: Change task_struct::state
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 15:10     ` Will Deacon
  -1 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Waiman Long, Boqun Feng, Oleg Nesterov, Davidlohr Bueso,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c                 |    2 -
>  drivers/md/dm.c                |    6 ++--
>  fs/binfmt_elf.c                |    8 +++---
>  fs/userfaultfd.c               |    4 +--
>  include/linux/sched.h          |   31 +++++++++++------------
>  include/linux/sched/debug.h    |    2 -
>  include/linux/sched/signal.h   |    2 -
>  init/init_task.c               |    2 -
>  kernel/cgroup/cgroup-v1.c      |    2 -
>  kernel/debug/kdb/kdb_support.c |   18 +++++++------
>  kernel/fork.c                  |    4 +--
>  kernel/hung_task.c             |    2 -
>  kernel/kthread.c               |    4 +--
>  kernel/locking/mutex.c         |    6 ++--
>  kernel/locking/rtmutex.c       |    4 +--
>  kernel/locking/rwsem.c         |    2 -
>  kernel/ptrace.c                |   12 ++++-----
>  kernel/rcu/rcutorture.c        |    4 +--
>  kernel/rcu/tree_stall.h        |   12 ++++-----
>  kernel/sched/core.c            |   53 +++++++++++++++++++++--------------------
>  kernel/sched/deadline.c        |   10 +++----
>  kernel/sched/fair.c            |   11 +++++---
>  lib/syscall.c                  |    4 +--
>  net/core/dev.c                 |    2 -
>  24 files changed, 108 insertions(+), 99 deletions(-)

I think this makes the code a _lot_ easier to understand, so:

Acked-by: Will Deacon <will@kernel.org>

on the assumption that you'll fix get_wchan() for !x86 as well.

Cheers,

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-02 15:10     ` Will Deacon
  0 siblings, 0 replies; 64+ messages in thread
From: Will Deacon @ 2021-06-02 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, kvm, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c                 |    2 -
>  drivers/md/dm.c                |    6 ++--
>  fs/binfmt_elf.c                |    8 +++---
>  fs/userfaultfd.c               |    4 +--
>  include/linux/sched.h          |   31 +++++++++++------------
>  include/linux/sched/debug.h    |    2 -
>  include/linux/sched/signal.h   |    2 -
>  init/init_task.c               |    2 -
>  kernel/cgroup/cgroup-v1.c      |    2 -
>  kernel/debug/kdb/kdb_support.c |   18 +++++++------
>  kernel/fork.c                  |    4 +--
>  kernel/hung_task.c             |    2 -
>  kernel/kthread.c               |    4 +--
>  kernel/locking/mutex.c         |    6 ++--
>  kernel/locking/rtmutex.c       |    4 +--
>  kernel/locking/rwsem.c         |    2 -
>  kernel/ptrace.c                |   12 ++++-----
>  kernel/rcu/rcutorture.c        |    4 +--
>  kernel/rcu/tree_stall.h        |   12 ++++-----
>  kernel/sched/core.c            |   53 +++++++++++++++++++++--------------------
>  kernel/sched/deadline.c        |   10 +++----
>  kernel/sched/fair.c            |   11 +++++---
>  lib/syscall.c                  |    4 +--
>  net/core/dev.c                 |    2 -
>  24 files changed, 108 insertions(+), 99 deletions(-)

I think this makes the code a _lot_ easier to understand, so:

Acked-by: Will Deacon <will@kernel.org>

on the assumption that you'll fix get_wchan() for !x86 as well.

Cheers,

Will

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/6] sched: Introduce task_is_running()
  2021-06-02 14:59     ` [dm-devel] " Will Deacon
@ 2021-06-02 16:46       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 16:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Waiman Long, Boqun Feng, Oleg Nesterov, Davidlohr Bueso,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:59:21PM +0100, Will Deacon wrote:
> On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote:
> > Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
> > task_is_running(p).
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/process.c |    4 ++--
> >  block/blk-mq.c            |    2 +-
> >  include/linux/sched.h     |    2 ++
> >  kernel/locking/lockdep.c  |    2 +-
> >  kernel/rcu/tree_plugin.h  |    2 +-
> >  kernel/sched/core.c       |    6 +++---
> >  kernel/sched/stats.h      |    2 +-
> >  kernel/signal.c           |    2 +-
> >  kernel/softirq.c          |    3 +--
> >  mm/compaction.c           |    2 +-
> >  10 files changed, 14 insertions(+), 13 deletions(-)
> > 
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
> >  	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
> >  	int count = 0;
> >  
> > -	if (p == current || p->state == TASK_RUNNING)
> > +	if (p == current || task_is_running(p))
> 
> Looks like this one in get_wchan() has been cargo-culted across most of
> arch/ so they'll need fixing up before you rename the struct member.

Yeah, this was x86_64 allmodconfig driven, I've already got a bunch of
robot mail telling me other archs need help, I'll fix it iup.

> There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!)

I'm tempted to let the bpf people sort their own gunk. This is not an
ABI. I so don't care breaking every script out there.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 2/6] sched: Introduce task_is_running()
@ 2021-06-02 16:46       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-02 16:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	Vincent Guittot, kvm, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:59:21PM +0100, Will Deacon wrote:
> On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote:
> > Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
> > task_is_running(p).
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/process.c |    4 ++--
> >  block/blk-mq.c            |    2 +-
> >  include/linux/sched.h     |    2 ++
> >  kernel/locking/lockdep.c  |    2 +-
> >  kernel/rcu/tree_plugin.h  |    2 +-
> >  kernel/sched/core.c       |    6 +++---
> >  kernel/sched/stats.h      |    2 +-
> >  kernel/signal.c           |    2 +-
> >  kernel/softirq.c          |    3 +--
> >  mm/compaction.c           |    2 +-
> >  10 files changed, 14 insertions(+), 13 deletions(-)
> > 
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
> >  	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
> >  	int count = 0;
> >  
> > -	if (p == current || p->state == TASK_RUNNING)
> > +	if (p == current || task_is_running(p))
> 
> Looks like this one in get_wchan() has been cargo-culted across most of
> arch/ so they'll need fixing up before you rename the struct member.

Yeah, this was x86_64 allmodconfig driven, I've already got a bunch of
robot mail telling me other archs need help, I'll fix it iup.

> There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!)

I'm tempted to let the bpf people sort their own gunk. This is not an
ABI. I so don't care breaking every script out there.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/6] sched: Unbreak wakeups
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 19:43     ` Davidlohr Bueso
  -1 siblings, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2021-06-02 19:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, 02 Jun 2021, Peter Zijlstra wrote:

>Remove broken task->state references and let wake_up_process() DTRT.
>
>The anti-pattern in these patches breaks the ordering of ->state vs
>COND as described in the comment near set_current_state() and can lead
>to missed wakeups:
>
>	(OoO load, observes RUNNING)<-.
>	for (;;) {                    |
>	  t->state = UNINTERRUPTIBLE; |
>	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
>                             |       |
>	                     |       |	COND = 1;
>			     |	     `- if (t->state != RUNNING)
>                             |		  wake_up_process(t); // not done
>	  if (COND) ---------'
>	    break;
>	  schedule(); // forever waiting
>	}
>	t->state = TASK_RUNNING;
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 1/6] sched: Unbreak wakeups
@ 2021-06-02 19:43     ` Davidlohr Bueso
  0 siblings, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2021-06-02 19:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Vincent Guittot, kvm,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, 02 Jun 2021, Peter Zijlstra wrote:

>Remove broken task->state references and let wake_up_process() DTRT.
>
>The anti-pattern in these patches breaks the ordering of ->state vs
>COND as described in the comment near set_current_state() and can lead
>to missed wakeups:
>
>	(OoO load, observes RUNNING)<-.
>	for (;;) {                    |
>	  t->state = UNINTERRUPTIBLE; |
>	  smp_mb();          ,-----> ,' (OoO load, observed !COND)
>                             |       |
>	                     |       |	COND = 1;
>			     |	     `- if (t->state != RUNNING)
>                             |		  wake_up_process(t); // not done
>	  if (COND) ---------'
>	    break;
>	  schedule(); // forever waiting
>	}
>	t->state = TASK_RUNNING;
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 5/6] sched,timer: Use __set_current_state()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 19:54     ` Davidlohr Bueso
  -1 siblings, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2021-06-02 19:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, 02 Jun 2021, Peter Zijlstra wrote:

-ENOCHANGELONG

But yeah, I thought we had gotten rid of all these.

>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 5/6] sched,timer: Use __set_current_state()
@ 2021-06-02 19:54     ` Davidlohr Bueso
  0 siblings, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2021-06-02 19:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Vincent Guittot, kvm,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, 02 Jun 2021, Peter Zijlstra wrote:

-ENOCHANGELONG

But yeah, I thought we had gotten rid of all these.

>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/6] sched: Introduce task_is_running()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-02 23:15     ` Davidlohr Bueso
  -1 siblings, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2021-06-02 23:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, 02 Jun 2021, Peter Zijlstra wrote:

>Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
>task_is_running(p).
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Davidlohr Bueso

But afaict ....

>---
> arch/x86/kernel/process.c |    4 ++--
> block/blk-mq.c            |    2 +-
> include/linux/sched.h     |    2 ++
> kernel/locking/lockdep.c  |    2 +-
> kernel/rcu/tree_plugin.h  |    2 +-
> kernel/sched/core.c       |    6 +++---
> kernel/sched/stats.h      |    2 +-
> kernel/signal.c           |    2 +-
> kernel/softirq.c          |    3 +--
> mm/compaction.c           |    2 +-
> 10 files changed, 14 insertions(+), 13 deletions(-)

there are also (on top of the already mentioned arch/):

kernel/kcsan/report.c:  const bool is_running = current->state == TASK_RUNNING;
kernel/locking/lockdep.c:       if (p->state == TASK_RUNNING && p != current)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 2/6] sched: Introduce task_is_running()
@ 2021-06-02 23:15     ` Davidlohr Bueso
  0 siblings, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2021-06-02 23:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Daniel Thompson, Vincent Guittot, kvm,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, 02 Jun 2021, Peter Zijlstra wrote:

>Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
>task_is_running(p).
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Davidlohr Bueso

But afaict ....

>---
> arch/x86/kernel/process.c |    4 ++--
> block/blk-mq.c            |    2 +-
> include/linux/sched.h     |    2 ++
> kernel/locking/lockdep.c  |    2 +-
> kernel/rcu/tree_plugin.h  |    2 +-
> kernel/sched/core.c       |    6 +++---
> kernel/sched/stats.h      |    2 +-
> kernel/signal.c           |    2 +-
> kernel/softirq.c          |    3 +--
> mm/compaction.c           |    2 +-
> 10 files changed, 14 insertions(+), 13 deletions(-)

there are also (on top of the already mentioned arch/):

kernel/kcsan/report.c:  const bool is_running = current->state == TASK_RUNNING;
kernel/locking/lockdep.c:       if (p->state == TASK_RUNNING && p != current)

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 5/6] sched,timer: Use __set_current_state()
  2021-06-02 19:54     ` [dm-devel] " Davidlohr Bueso
@ 2021-06-03  6:39       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-03  6:39 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 12:54:58PM -0700, Davidlohr Bueso wrote:
> On Wed, 02 Jun 2021, Peter Zijlstra wrote:
> 
> -ENOCHANGELONG

I completely failed to come up with something useful, still do. Subject
says it all.

> But yeah, I thought we had gotten rid of all these.

I too was surprised to find it :-)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 5/6] sched,timer: Use __set_current_state()
@ 2021-06-03  6:39       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-03  6:39 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, John Stultz, Stephen Boyd,
	Andrew Morton, Paolo Bonzini, linux-kernel, linux-block, netdev,
	linux-usb, linux-fsdevel, cgroups, kgdb-bugreport,
	linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 12:54:58PM -0700, Davidlohr Bueso wrote:
> On Wed, 02 Jun 2021, Peter Zijlstra wrote:
> 
> -ENOCHANGELONG

I completely failed to come up with something useful, still do. Subject
says it all.

> But yeah, I thought we had gotten rid of all these.

I too was surprised to find it :-)

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 6/6] sched: Change task_struct::state
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-07 10:45     ` Daniel Thompson
  -1 siblings, 0 replies; 64+ messages in thread
From: Daniel Thompson @ 2021-06-07 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, John Stultz,
	Stephen Boyd, Andrew Morton, Paolo Bonzini, linux-kernel,
	linux-block, netdev, linux-usb, linux-fsdevel, cgroups,
	kgdb-bugreport, linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  ...
>  kernel/debug/kdb/kdb_support.c |   18 +++++++------
>  ...
> --- a/kernel/debug/kdb/kdb_support.c
> +++ b/kernel/debug/kdb/kdb_support.c
> @@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
>   */
>  char kdb_task_state_char (const struct task_struct *p)
>  {
> -	int cpu;
> -	char state;
> +	unsigned int p_state;
>  	unsigned long tmp;
> +	char state;
> +	int cpu;
>  
>  	if (!p ||
>  	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
>  		return 'E';
>  
>  	cpu = kdb_process_cpu(p);
> -	state = (p->state == 0) ? 'R' :
> -		(p->state < 0) ? 'U' :
> -		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
> -		(p->state & TASK_STOPPED) ? 'T' :
> -		(p->state & TASK_TRACED) ? 'C' :
> +	p_state = READ_ONCE(p->__state);
> +	state = (p_state == 0) ? 'R' :
> +		(p_state < 0) ? 'U' :

Looks like the U here stands for Unreachable since this patch makes it
more obvious that this clause is (and previously was) exactly that!

Dropping the U state would be good since I guess this will show up as a
"new" warning in some tools. However it was a preexisting problem so with
or without this cleaned up:
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

> +		(p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
> +		(p_state & TASK_STOPPED) ? 'T' :
> +		(p_state & TASK_TRACED) ? 'C' :
>  		(p->exit_state & EXIT_ZOMBIE) ? 'Z' :
>  		(p->exit_state & EXIT_DEAD) ? 'E' :
> -		(p->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
> +		(p_state & TASK_INTERRUPTIBLE) ? 'S' : '?';
>  	if (is_idle_task(p)) {
>  		/* Idle task.  Is it really idle, apart from the kdb
>  		 * interrupt? */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-07 10:45     ` Daniel Thompson
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Thompson @ 2021-06-07 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Davidlohr Bueso, Vincent Guittot, kvm,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  ...
>  kernel/debug/kdb/kdb_support.c |   18 +++++++------
>  ...
> --- a/kernel/debug/kdb/kdb_support.c
> +++ b/kernel/debug/kdb/kdb_support.c
> @@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
>   */
>  char kdb_task_state_char (const struct task_struct *p)
>  {
> -	int cpu;
> -	char state;
> +	unsigned int p_state;
>  	unsigned long tmp;
> +	char state;
> +	int cpu;
>  
>  	if (!p ||
>  	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
>  		return 'E';
>  
>  	cpu = kdb_process_cpu(p);
> -	state = (p->state == 0) ? 'R' :
> -		(p->state < 0) ? 'U' :
> -		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
> -		(p->state & TASK_STOPPED) ? 'T' :
> -		(p->state & TASK_TRACED) ? 'C' :
> +	p_state = READ_ONCE(p->__state);
> +	state = (p_state == 0) ? 'R' :
> +		(p_state < 0) ? 'U' :

Looks like the U here stands for Unreachable since this patch makes it
more obvious that this clause is (and previously was) exactly that!

Dropping the U state would be good since I guess this will show up as a
"new" warning in some tools. However it was a preexisting problem so with
or without this cleaned up:
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

> +		(p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
> +		(p_state & TASK_STOPPED) ? 'T' :
> +		(p_state & TASK_TRACED) ? 'C' :
>  		(p->exit_state & EXIT_ZOMBIE) ? 'Z' :
>  		(p->exit_state & EXIT_DEAD) ? 'E' :
> -		(p->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
> +		(p_state & TASK_INTERRUPTIBLE) ? 'S' : '?';
>  	if (is_idle_task(p)) {
>  		/* Idle task.  Is it really idle, apart from the kdb
>  		 * interrupt? */

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 6/6] sched: Change task_struct::state
  2021-06-07 10:45     ` [dm-devel] " Daniel Thompson
@ 2021-06-07 11:10       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-07 11:10 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Thomas Gleixner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Borislav Petkov, x86, H. Peter Anvin,
	Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel,
	David S. Miller, Jakub Kicinski, Felipe Balbi,
	Greg Kroah-Hartman, Alexander Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, Jason Wessel, Douglas Anderson,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Rafael J. Wysocki, Pavel Machek,
	Will Deacon, Waiman Long, Boqun Feng, Oleg Nesterov,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, John Stultz,
	Stephen Boyd, Andrew Morton, Paolo Bonzini, linux-kernel,
	linux-block, netdev, linux-usb, linux-fsdevel, cgroups,
	kgdb-bugreport, linux-perf-users, linux-pm, rcu, linux-mm, kvm

On Mon, Jun 07, 2021 at 11:45:00AM +0100, Daniel Thompson wrote:
> On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> > Change the type and name of task_struct::state. Drop the volatile and
> > shrink it to an 'unsigned int'. Rename it in order to find all uses
> > such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  ...
> >  kernel/debug/kdb/kdb_support.c |   18 +++++++------
> >  ...
> > --- a/kernel/debug/kdb/kdb_support.c
> > +++ b/kernel/debug/kdb/kdb_support.c
> > @@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
> >   */
> >  char kdb_task_state_char (const struct task_struct *p)
> >  {
> > -	int cpu;
> > -	char state;
> > +	unsigned int p_state;
> >  	unsigned long tmp;
> > +	char state;
> > +	int cpu;
> >  
> >  	if (!p ||
> >  	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
> >  		return 'E';
> >  
> >  	cpu = kdb_process_cpu(p);
> > -	state = (p->state == 0) ? 'R' :
> > -		(p->state < 0) ? 'U' :
> > -		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
> > -		(p->state & TASK_STOPPED) ? 'T' :
> > -		(p->state & TASK_TRACED) ? 'C' :
> > +	p_state = READ_ONCE(p->__state);
> > +	state = (p_state == 0) ? 'R' :
> > +		(p_state < 0) ? 'U' :
> 
> Looks like the U here stands for Unreachable since this patch makes it
> more obvious that this clause is (and previously was) exactly that!
> 
> Dropping the U state would be good since I guess this will show up as a
> "new" warning in some tools. However it was a preexisting problem so with
> or without this cleaned up:
> Acked-by: Daniel Thompson <daniel.thompson@linaro.org>

Thanks!

Note that there's a second instance of this exact code in
arch/powerpc/xmon/xmon.c, with the same 'U' issue.

I'll repost this soon, as it seems I've fixed all robot failout (fingers
crossed).

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 6/6] sched: Change task_struct::state
@ 2021-06-07 11:10       ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2021-06-07 11:10 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Juri Lelli, Mark Rutland, Pavel Machek, Mike Snitzer,
	Alexander Shishkin, kgdb-bugreport, Lai Jiangshan, Oleg Nesterov,
	Ben Segall, linux-mm, dm-devel, Paolo Bonzini, Zefan Li,
	H. Peter Anvin, Joel Fernandes, netdev, Jiri Olsa,
	Alasdair Kergon, Davidlohr Bueso, Vincent Guittot, kvm,
	Will Deacon, cgroups, x86, Ingo Molnar, Mel Gorman,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Josh Triplett, Steven Rostedt,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Dietmar Eggemann, Jens Axboe,
	Felipe Balbi, Stephen Boyd, Greg Kroah-Hartman, linux-usb,
	Rafael J. Wysocki, Douglas Anderson, linux-kernel,
	linux-perf-users, Johannes Weiner, Tejun Heo, Mathieu Desnoyers,
	Andrew Morton, rcu, Daniel Bristot de Oliveira, David S. Miller

On Mon, Jun 07, 2021 at 11:45:00AM +0100, Daniel Thompson wrote:
> On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> > Change the type and name of task_struct::state. Drop the volatile and
> > shrink it to an 'unsigned int'. Rename it in order to find all uses
> > such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  ...
> >  kernel/debug/kdb/kdb_support.c |   18 +++++++------
> >  ...
> > --- a/kernel/debug/kdb/kdb_support.c
> > +++ b/kernel/debug/kdb/kdb_support.c
> > @@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
> >   */
> >  char kdb_task_state_char (const struct task_struct *p)
> >  {
> > -	int cpu;
> > -	char state;
> > +	unsigned int p_state;
> >  	unsigned long tmp;
> > +	char state;
> > +	int cpu;
> >  
> >  	if (!p ||
> >  	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
> >  		return 'E';
> >  
> >  	cpu = kdb_process_cpu(p);
> > -	state = (p->state == 0) ? 'R' :
> > -		(p->state < 0) ? 'U' :
> > -		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
> > -		(p->state & TASK_STOPPED) ? 'T' :
> > -		(p->state & TASK_TRACED) ? 'C' :
> > +	p_state = READ_ONCE(p->__state);
> > +	state = (p_state == 0) ? 'R' :
> > +		(p_state < 0) ? 'U' :
> 
> Looks like the U here stands for Unreachable since this patch makes it
> more obvious that this clause is (and previously was) exactly that!
> 
> Dropping the U state would be good since I guess this will show up as a
> "new" warning in some tools. However it was a preexisting problem so with
> or without this cleaned up:
> Acked-by: Daniel Thompson <daniel.thompson@linaro.org>

Thanks!

Note that there's a second instance of this exact code in
arch/powerpc/xmon/xmon.c, with the same 'U' issue.

I'll repost this soon, as it seems I've fixed all robot failout (fingers
crossed).

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 5/6] sched,timer: Use __set_current_state()
  2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
@ 2021-06-18 20:42     ` Thomas Gleixner
  -1 siblings, 0 replies; 64+ messages in thread
From: Thomas Gleixner @ 2021-06-18 20:42 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Borislav Petkov, x86, H. Peter Anvin, Jens Axboe,
	Alasdair Kergon, Mike Snitzer, dm-devel, David S. Miller,
	Jakub Kicinski, Felipe Balbi, Greg Kroah-Hartman, Alexander Viro,
	Tejun Heo, Zefan Li, Johannes Weiner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Rafael J. Wysocki, Pavel Machek, Will Deacon, Waiman Long,
	Boqun Feng, Oleg Nesterov, Davidlohr Bueso, Paul E. McKenney,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	John Stultz, Stephen Boyd, Andrew Morton, Paolo Bonzini,
	linux-kernel, linux-block, netdev, linux-usb, linux-fsdevel,
	cgroups, kgdb-bugreport, linux-perf-users, linux-pm, rcu,
	linux-mm, kvm

On Wed, Jun 02 2021 at 15:12, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [dm-devel] [PATCH 5/6] sched,timer: Use __set_current_state()
@ 2021-06-18 20:42     ` Thomas Gleixner
  0 siblings, 0 replies; 64+ messages in thread
From: Thomas Gleixner @ 2021-06-18 20:42 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira
  Cc: Rutland, Pavel Machek, Mike Snitzer, Alexander Shishkin,
	kgdb-bugreport, Lai Jiangshan, Oleg Nesterov, linux-mm, dm-devel,
	Paolo Bonzini, Zefan Li, H. Peter Anvin, Joel Fernandes, netdev,
	Jiri Olsa, Alasdair Kergon, Daniel Thompson, Davidlohr Bueso,
	kvm, Will Deacon, cgroups, x86, Josh, Mark, Johannes,
	Jakub Kicinski, John Stultz, Paul E. McKenney, linux-pm,
	Boqun Feng, Jason Wessel, Triplett, Arnaldo Carvalho de Melo,
	linux-block, linux-fsdevel, Borislav Petkov, Alexander Viro,
	Waiman Long, Namhyung Kim, Jens Axboe, Felipe Balbi,
	Stephen Boyd, Greg Kroah-Hartman, linux-usb, Rafael J. Wysocki,
	Douglas Anderson, linux-kernel, linux-perf-users, Weiner,
	Tejun Heo, Mathieu Desnoyers, Andrew Morton, rcu,
	David S. Miller

On Wed, Jun 02 2021 at 15:12, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2021-06-28  9:52 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 13:12 [PATCH 0/6] sched: Cleanup task_struct::state Peter Zijlstra
2021-06-02 13:12 ` [dm-devel] " Peter Zijlstra
2021-06-02 13:12 ` [PATCH 1/6] sched: Unbreak wakeups Peter Zijlstra
2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
2021-06-02 13:58   ` Greg Kroah-Hartman
2021-06-02 13:58     ` [dm-devel] " Greg Kroah-Hartman
2021-06-02 14:47   ` Will Deacon
2021-06-02 14:47     ` [dm-devel] " Will Deacon
2021-06-02 19:43   ` Davidlohr Bueso
2021-06-02 19:43     ` [dm-devel] " Davidlohr Bueso
2021-06-02 13:12 ` [PATCH 2/6] sched: Introduce task_is_running() Peter Zijlstra
2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
2021-06-02 14:59   ` Will Deacon
2021-06-02 14:59     ` [dm-devel] " Will Deacon
2021-06-02 16:46     ` Peter Zijlstra
2021-06-02 16:46       ` [dm-devel] " Peter Zijlstra
2021-06-02 23:15   ` Davidlohr Bueso
2021-06-02 23:15     ` [dm-devel] " Davidlohr Bueso
2021-06-02 13:12 ` [PATCH 3/6] sched,perf,kvm: Fix preemption condition Peter Zijlstra
2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
2021-06-02 13:59   ` Mathieu Desnoyers
2021-06-02 13:59     ` [dm-devel] [PATCH 3/6] sched, perf, kvm: " Mathieu Desnoyers
2021-06-02 13:59     ` [PATCH 3/6] sched,perf,kvm: " Mathieu Desnoyers
2021-06-02 14:10     ` Peter Zijlstra
2021-06-02 14:10       ` [dm-devel] [PATCH 3/6] sched, perf, kvm: " Peter Zijlstra
2021-06-02 14:30       ` [PATCH 3/6] sched,perf,kvm: " Mark Rutland
2021-06-02 14:30         ` [dm-devel] [PATCH 3/6] sched, perf, kvm: " Mark Rutland
2021-06-02 14:15   ` [PATCH 3/6] sched,perf,kvm: " Mathieu Desnoyers
2021-06-02 14:15     ` [dm-devel] [PATCH 3/6] sched, perf, kvm: " Mathieu Desnoyers
2021-06-02 14:15     ` [PATCH 3/6] sched,perf,kvm: " Mathieu Desnoyers
2021-06-02 14:23     ` Peter Zijlstra
2021-06-02 14:23       ` [dm-devel] [PATCH 3/6] sched, perf, kvm: " Peter Zijlstra
2021-06-02 13:12 ` [PATCH 4/6] sched: Add get_current_state() Peter Zijlstra
2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
2021-06-02 14:01   ` Mathieu Desnoyers
2021-06-02 14:01     ` [dm-devel] " Mathieu Desnoyers
2021-06-02 14:01     ` Mathieu Desnoyers
2021-06-02 14:12     ` Peter Zijlstra
2021-06-02 14:12       ` [dm-devel] " Peter Zijlstra
2021-06-02 15:02   ` Will Deacon
2021-06-02 15:02     ` [dm-devel] " Will Deacon
2021-06-02 13:12 ` [PATCH 5/6] sched,timer: Use __set_current_state() Peter Zijlstra
2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
2021-06-02 15:06   ` Will Deacon
2021-06-02 15:06     ` [dm-devel] " Will Deacon
2021-06-02 19:54   ` Davidlohr Bueso
2021-06-02 19:54     ` [dm-devel] " Davidlohr Bueso
2021-06-03  6:39     ` Peter Zijlstra
2021-06-03  6:39       ` [dm-devel] " Peter Zijlstra
2021-06-18 20:42   ` Thomas Gleixner
2021-06-18 20:42     ` [dm-devel] " Thomas Gleixner
2021-06-02 13:12 ` [PATCH 6/6] sched: Change task_struct::state Peter Zijlstra
2021-06-02 13:12   ` [dm-devel] " Peter Zijlstra
2021-06-02 14:06   ` Mathieu Desnoyers
2021-06-02 14:06     ` [dm-devel] " Mathieu Desnoyers
2021-06-02 14:06     ` Mathieu Desnoyers
2021-06-02 14:20     ` Peter Zijlstra
2021-06-02 14:20       ` [dm-devel] " Peter Zijlstra
2021-06-02 15:10   ` Will Deacon
2021-06-02 15:10     ` [dm-devel] " Will Deacon
2021-06-07 10:45   ` Daniel Thompson
2021-06-07 10:45     ` [dm-devel] " Daniel Thompson
2021-06-07 11:10     ` Peter Zijlstra
2021-06-07 11:10       ` [dm-devel] " Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.