linux-kernel.vger.kernel.org archive mirror
* sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization
@ 2020-09-18 17:27 Tejun Heo
  2020-09-24 11:50 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2020-09-18 17:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Rik van Riel

Hello,

Peter, I noticed /proc/stat::procs_blocked transiently reading U64_MAX once
in a blue moon, without any other persistent issues. After looking at the
code with Rik for a bit, the culprit seems to be c6e7bd7afaeb ("sched/core:
Optimize ttwu() spinning on p->on_cpu") - it changed where ttwu decrements
nr_iowait, and it looks like the decrement can now happen before the target
task gets to do the matching increment.
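
For reference, here's the interleaving we think is happening, as a rough
sketch (simplified pseudocode, not verbatim kernel source):

  CPU0 (task entering iowait)         CPU1 (waker)
  ---------------------------         ------------
  io_schedule()
    current->in_iowait = 1;
    schedule()
      __schedule()
        deactivate_task();
        /* p->on_rq == 0 */
                                      try_to_wake_up(p)
                                        /* sees !p->on_rq; since
                                         * c6e7bd7afaeb it no longer waits
                                         * for !p->on_cpu before this: */
                                        atomic_dec(&task_rq(p)->nr_iowait);
                                        /* rq->nr_iowait is now -1 */
        atomic_inc(&rq->nr_iowait);
        /* back to 0, but too late */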

Thanks.

-- 
tejun


* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization
  2020-09-18 17:27 sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization Tejun Heo
@ 2020-09-24 11:50 ` Peter Zijlstra
  2020-09-24 14:27   ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2020-09-24 11:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Rik van Riel

On Fri, Sep 18, 2020 at 01:27:59PM -0400, Tejun Heo wrote:
> Hello,
> 
> Peter, I noticed /proc/stat::procs_blocked transiently reading U64_MAX once
> in a blue moon, without any other persistent issues. After looking at the
> code with Rik for a bit, the culprit seems to be c6e7bd7afaeb ("sched/core:
> Optimize ttwu() spinning on p->on_cpu") - it changed where ttwu decrements
> nr_iowait, and it looks like the decrement can now happen before the target
> task gets to do the matching increment.

Hurmph.. I suppose you're right :/ And this is an actual problem?

I think the below should cure that, but blergh, not nice. If you could
confirm, I'll try and think of something nicer.


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ebb90572e10d..259a4ae8ab8e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2505,7 +2505,12 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 #ifdef CONFIG_SMP
 	if (wake_flags & WF_MIGRATED)
 		en_flags |= ENQUEUE_MIGRATED;
+	else
 #endif
+	if (p->in_iowait) {
+		delayacct_blkio_end(p);
+		atomic_dec(&task_rq(p)->nr_iowait);
+	}
 
 	activate_task(rq, p, en_flags);
 	ttwu_do_wakeup(rq, p, wake_flags, rf);
@@ -2892,11 +2897,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
 		goto unlock;
 
-	if (p->in_iowait) {
-		delayacct_blkio_end(p);
-		atomic_dec(&task_rq(p)->nr_iowait);
-	}
-
 #ifdef CONFIG_SMP
 	/*
 	 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
@@ -2967,6 +2967,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
 	if (task_cpu(p) != cpu) {
+		if (p->in_iowait) {
+			delayacct_blkio_end(p);
+			atomic_dec(&task_rq(p)->nr_iowait);
+		}
+
 		wake_flags |= WF_MIGRATED;
 		psi_ttwu_dequeue(p);
 		set_task_cpu(p, cpu);


* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization
  2020-09-24 11:50 ` Peter Zijlstra
@ 2020-09-24 14:27   ` Tejun Heo
  2020-09-24 14:50     ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2020-09-24 14:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Rik van Riel

Hello,

On Thu, Sep 24, 2020 at 01:50:42PM +0200, Peter Zijlstra wrote:
> Hurmph.. I suppose you're right :/ And this is an actual problem?

Yeah, the value got exposed to userspace as a full 64-bit number, which
overflowed the u32 conversion in the Rust procfs library and aborted a
program I was working on multiple times over several months.
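
To illustrate the overflow: nr_iowait() sums the per-rq counters into an
unsigned long and /proc/stat prints it unsigned, so a transient total of -1
comes out as U64_MAX. A minimal userspace-side sketch (made-up values, not
the actual program):

  #include <stdio.h>

  int main(void)
  {
          /* hypothetical per-CPU counters caught mid-race */
          int cpu0 = 0, cpu1 = -1;
          unsigned long sum = (long)(cpu0 + cpu1);

          /* what /proc/stat effectively shows for procs_blocked */
          printf("procs_blocked %lu\n", sum);  /* 18446744073709551615 */
          return 0;
  }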

On the more theoretical side, it might also surprise nr_iowait_cpu() users,
although whether that would be a real problem in practice, I'm not sure.

> I think the below should cure that, but blergh, not nice. If you could
> confirm, I'll try and think of something nicer.

Rik suggested that it'd be sufficient to return 0 on underflow, especially
given that 0 is actually the right number to describe the state. So maybe
that would be nicer code-wise?
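
Something like this, as a rough sketch of what Rik had in mind (against
nr_iowait() in kernel/sched/core.c; untested, just to show the shape):

  unsigned long nr_iowait(void)
  {
          long sum = 0;
          int i;

          for_each_possible_cpu(i)
                  sum += atomic_read(&cpu_rq(i)->nr_iowait);

          /* a transient underflow just means nobody is in iowait */
          return sum < 0 ? 0 : sum;
  }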

Thanks.

-- 
tejun


* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization
  2020-09-24 14:27   ` Tejun Heo
@ 2020-09-24 14:50     ` Peter Zijlstra
  2020-09-24 14:57       ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2020-09-24 14:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Rik van Riel

On Thu, Sep 24, 2020 at 10:27:51AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Thu, Sep 24, 2020 at 01:50:42PM +0200, Peter Zijlstra wrote:
> > Hurmph.. I suppose you're right :/ And this is an actual problem?
> 
> Yeah, the value got exposed to userspace as a full 64-bit number, which
> overflowed the u32 conversion in the Rust procfs library and aborted a
> program I was working on multiple times over several months.
> 
> On the more theoretical side, it might also surprise nr_iowait_cpu() users,
> although whether that would be a real problem in practice, I'm not sure.
> 
> > I think the below should cure that, but blergh, not nice. If you could
> > confirm, I'll try and think of something nicer.
> 
> Rik suggested that it'd be sufficient to return 0 on underflow, especially
> given that 0 is actually the right number to describe the state. So maybe
> that would be nicer code-wise?

I worry about the case where one CPU has a positive value and one or more
(other) CPUs have temporary negative values.
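
Concretely, with made-up numbers:

  int cpu0 = 1;   /* a real iowaiter */
  int cpu1 = -1;  /* transient underflow from the ttwu race */
  long sum = cpu0 + cpu1;            /* 0 */
  long clamped = sum < 0 ? 0 : sum;  /* still 0; the true value is 1 */

Clamping only the global sum can't recover the true count once individual
CPUs disagree in sign.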




* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization
  2020-09-24 14:50     ` Peter Zijlstra
@ 2020-09-24 14:57       ` Tejun Heo
  0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2020-09-24 14:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Rik van Riel

On Thu, Sep 24, 2020 at 04:50:41PM +0200, Peter Zijlstra wrote:
> > Rik suggested that it'd be sufficient to return 0 on underflow, especially
> > given that 0 is actually the right number to describe the state. So maybe
> > that would be nicer code-wise?
> 
> I worry about the case where one CPU has a positive value and one or more
> (other) CPUs have temporary negative values.

I think it'd make more sense to max them with 0 per-cpu, as that's the right
per-cpu state. nr_iowait_cpu() needs that anyway, so maybe the summing
function can just use nr_iowait_cpu()?
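
Something like the below, as a sketch (signatures from memory, untested):

  unsigned long nr_iowait_cpu(int cpu)
  {
          int val = atomic_read(&cpu_rq(cpu)->nr_iowait);

          /* 0 is the right description of a transiently negative rq */
          return val < 0 ? 0 : val;
  }

  unsigned long nr_iowait(void)
  {
          unsigned long sum = 0;
          int i;

          for_each_possible_cpu(i)
                  sum += nr_iowait_cpu(i);

          return sum;
  }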

Thanks.

-- 
tejun


