All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched: fix a potential divide error
@ 2019-04-20  8:34 Xie XiuQi
  2019-04-23 18:44 ` Peter Zijlstra
  2019-04-25 17:59 ` [PATCH] sched: fix a potential divide error Ingo Molnar
  0 siblings, 2 replies; 8+ messages in thread
From: Xie XiuQi @ 2019-04-20  8:34 UTC (permalink / raw)
  To: mingo, peterz; +Cc: linux-kernel, cj.chengjian

We meet a divide error on 3.10.0 kernel, the error message is bellow:

[499992.287996] divide error: 0000 [#1] SMP
[499992.297796] do nothing after die!
[499992.299108] Modules linked in: signo_catch macvlan binfmt_misc
ip_set_hash_netport ip_set_hash_ipport vport_vxlan ipt_REJECT
xt_statistic xt_physdev xt_nat xt_recent xt_mark xt_comment ...
[499992.312751] CPU: 8 PID: 23352 Comm: bash Tainted: G ----V-------   3.10.0+ #1
[499992.314308] Hardware name: OpenStack Foundation OpenStack Nova, BIOS
rel-1.9.1-0-gb3ef39f-20170329_185309-build9a64a246a231 04/01/2014
[499992.317411] task: ffff880033fc9700 ti: ffff8807fed60000 task.ti:ffff8807fed60000
[499992.318967] RIP: 0010:[<ffffffff810c15c2>]  [<ffffffff810c15c2>] task_numa_fault+0x1c2/0xbb0
[499992.320515] RSP: 0000:ffff8807fed63d38  EFLAGS: 00010246
[499992.322018] RAX: 0000002b7efd0000 RBX: ffff880033fc9700 RCX:0000000000000003
[499992.323563] RDX: 0000000000000000 RSI: 0000000000000400 RDI:ffffffff81a80f60
[499992.325052] RBP: ffff8807fed63db8 R08: ffffffff81a80f68 R09:0000000000000000
[499992.326531] R10: ffff88083ffda000 R11: 0000000000000000 R12:0000000000000424
[499992.327987] R13: 00000000002b7efd R14: 0000000000000000 R15:ffffea001ea42a00
[499992.329420] FS:  00007fa01a3b7740(0000) GS:ffff88103ec00000(0000)knlGS:0000000000000000
[499992.330866] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[499992.332302] CR2: 0000000000ff1fb0 CR3: 00000007ff1d1000 CR4:00000000003407e0
[499992.333763] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
[499992.335187] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
[499992.336595] Stack:
[499992.337974]  0000000000000000 00000001bc9598a8 ffffea001ea42a00 0000000100000001
[499992.339374]  0000000300000001 0000000000000001 ffffea001ea42a00 ffff8807fed63db8
[499992.340768]  0000000000000000 0000000000000000 00000000bc9598a8 0000000000000001
[499992.342148] Call Trace:
[499992.343494]  [<ffffffff8119ab62>] do_numa_page+0x162/0x1f0
[499992.344831]  [<ffffffff8119bde7>] handle_mm_fault+0x627/0xf50
[499992.346145]  [<ffffffff8164e486>] __do_page_fault+0x166/0x470
[499992.347442]  [<ffffffff8164e853>] trace_do_page_fault+0x43/0x110
[499992.348711]  [<ffffffff8164df29>] do_async_page_fault+0x29/0xe0
[499992.349948]  [<ffffffff8164a9f8>] async_page_fault+0x28/0x30
[499992.351149] Code: 00 3d 00 04 00 00 44 0f 4e d8 41 81 fb 00 04 00 00
0f 84 67 07 00 00 4c 89 e8 49 83 c6 01 31 d2 48 c1 e0 10 49 83 c4 01 45
31 c9 <49> f7 f6 48 c7 45 a8 00 00 00 00 48 c7 45 b0 00 00 00 00 49 89
[499992.353707] RIP  [<ffffffff810c15c2>] task_numa_fault+0x1c2/0xbb0
[499992.354927]  RSP <ffff8807fed63d38>
[499992.358114] ---[ end trace 4f2465cac18ff65e ]---
[499992.359304] Kernel panic - not syncing: Fatal exception

sched_clock_cpu may not be consistent bwtwen cpus. If a task migrate
to another cpu, the se.exec_start was set to that cpu's rq_clock_task
by update_stats_curr_start(). Which may not be monotonic.

update_stats_curr_start
  <- set_next_entity
     <- set_curr_task_fair
        <- sched_move_task

So, if  now - p->last_task_numa_placement is -1, then (*period + 1) is
0, and divide error was triggerred at the div operation:
  task_numa_placement:
    runtime = numa_get_avg_runtime(p, &period);
    f_weight = div64_u64(runtime << 16, period + 1);  // divide error here

In this patch, we make sure period is not less than 0 to avoid this
divide error.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Cc: stable@vger.kernel.org
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..f2abb258fc85 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
 	if (p->last_task_numa_placement) {
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
+
+		/* Avoid backward, and prevent potential divide error */
+		if ((s64)*period < 0)
+			*period = 0;
 	} else {
 		delta = p->se.avg.load_sum;
 		*period = LOAD_AVG_MAX;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: fix a potential divide error
  2019-04-20  8:34 [PATCH] sched: fix a potential divide error Xie XiuQi
@ 2019-04-23 18:44 ` Peter Zijlstra
  2019-04-25  3:52   ` Xie XiuQi
  2019-04-25 17:59 ` [PATCH] sched: fix a potential divide error Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-04-23 18:44 UTC (permalink / raw)
  To: Xie XiuQi; +Cc: mingo, linux-kernel, cj.chengjian

On Sat, Apr 20, 2019 at 04:34:16PM +0800, Xie XiuQi wrote:
> We meet a divide error on 3.10.0 kernel, the error message is bellow:

That is a _realllllllyyyy_ old kernel. I would urge you to upgrade.

> [499992.287996] divide error: 0000 [#1] SMP

> sched_clock_cpu may not be consistent bwtwen cpus. If a task migrate
> to another cpu, the se.exec_start was set to that cpu's rq_clock_task
> by update_stats_curr_start(). Which may not be monotonic.
> 
> update_stats_curr_start
>   <- set_next_entity
>      <- set_curr_task_fair
>         <- sched_move_task

That is not in fact a cross-cpu migration path. But I see the point.
Also many migration paths do in fact preserve monotonicity, even when
the clock is busted, but you're right, not all of them.

> So, if  now - p->last_task_numa_placement is -1, then (*period + 1) is
> 0, and divide error was triggerred at the div operation:
>   task_numa_placement:
>     runtime = numa_get_avg_runtime(p, &period);
>     f_weight = div64_u64(runtime << 16, period + 1);  // divide error here
> 
> In this patch, we make sure period is not less than 0 to avoid this
> divide error.
> 
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> Cc: stable@vger.kernel.org
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40bd1e27b1b7..f2abb258fc85 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
>  	if (p->last_task_numa_placement) {
>  		delta = runtime - p->last_sum_exec_runtime;
>  		*period = now - p->last_task_numa_placement;
> +
> +		/* Avoid backward, and prevent potential divide error */
> +		if ((s64)*period < 0)
> +			*period = 0;
>  	} else {
>  		delta = p->se.avg.load_sum;
>  		*period = LOAD_AVG_MAX;

Yeah, I suppose that is indeed correct.

I'll try and come up with a better Changelog tomorrow.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: fix a potential divide error
  2019-04-23 18:44 ` Peter Zijlstra
@ 2019-04-25  3:52   ` Xie XiuQi
  2019-04-25  8:00     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Xie XiuQi @ 2019-04-25  3:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, cj.chengjian

Hi Peter,

Thanks for your comments.

On 2019/4/24 2:44, Peter Zijlstra wrote:
> On Sat, Apr 20, 2019 at 04:34:16PM +0800, Xie XiuQi wrote:
>> We meet a divide error on 3.10.0 kernel, the error message is bellow:
> 
> That is a _realllllllyyyy_ old kernel. I would urge you to upgrade.

I will.

> 
>> [499992.287996] divide error: 0000 [#1] SMP
> 
>> sched_clock_cpu may not be consistent bwtwen cpus. If a task migrate
>> to another cpu, the se.exec_start was set to that cpu's rq_clock_task
>> by update_stats_curr_start(). Which may not be monotonic.
>>
>> update_stats_curr_start
>>   <- set_next_entity
>>      <- set_curr_task_fair
>>         <- sched_move_task
> 
> That is not in fact a cross-cpu migration path. But I see the point.
> Also many migration paths do in fact preserve monotonicity, even when
> the clock is busted, but you're right, not all of them.
> 
>> So, if  now - p->last_task_numa_placement is -1, then (*period + 1) is
>> 0, and divide error was triggerred at the div operation:
>>   task_numa_placement:
>>     runtime = numa_get_avg_runtime(p, &period);
>>     f_weight = div64_u64(runtime << 16, period + 1);  // divide error here
>>
>> In this patch, we make sure period is not less than 0 to avoid this
>> divide error.
>>
>> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
>> Cc: stable@vger.kernel.org
>> ---
>>  kernel/sched/fair.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 40bd1e27b1b7..f2abb258fc85 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
>>  	if (p->last_task_numa_placement) {
>>  		delta = runtime - p->last_sum_exec_runtime;
>>  		*period = now - p->last_task_numa_placement;
>> +
>> +		/* Avoid backward, and prevent potential divide error */
>> +		if ((s64)*period < 0)
>> +			*period = 0;
>>  	} else {
>>  		delta = p->se.avg.load_sum;
>>  		*period = LOAD_AVG_MAX;
> 
> Yeah, I suppose that is indeed correct.
> 
> I'll try and come up with a better Changelog tomorrow.

Thanks.

-- 
Thanks,
Xie XiuQi


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: fix a potential divide error
  2019-04-25  3:52   ` Xie XiuQi
@ 2019-04-25  8:00     ` Peter Zijlstra
  2019-04-25  8:16       ` Xie XiuQi
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-04-25  8:00 UTC (permalink / raw)
  To: Xie XiuQi; +Cc: mingo, linux-kernel, cj.chengjian

On Thu, Apr 25, 2019 at 11:52:28AM +0800, Xie XiuQi wrote:
> On 2019/4/24 2:44, Peter Zijlstra wrote:

> > I'll try and come up with a better Changelog tomorrow.

I actually did, but forgot to send out. I have the below.
Does that work for you?

---
Subject: sched/numa: Fix a possible divide-by-zero
From: Xie XiuQi <xiexiuqi@huawei.com>
Date: Sat, 20 Apr 2019 16:34:16 +0800

sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.

So then if in numa_get_avg_runtime() the expression:

  'now - p->last_task_numa_placement'

ends up as -1, then the divider '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.

Cc: mingo@redhat.com
Cc: cj.chengjian@huawei.com
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[peterz: simplified changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190420083416.170446-1-xiexiuqi@huawei.com
---
 kernel/sched/fair.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct t
 	if (p->last_task_numa_placement) {
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
+
+		/* Avoid backward, and prevent potential divide error */
+		if (unlikely((s64)*period < 0))
+			*period = 0;
 	} else {
 		delta = p->se.avg.load_sum;
 		*period = LOAD_AVG_MAX;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: fix a potential divide error
  2019-04-25  8:00     ` Peter Zijlstra
@ 2019-04-25  8:16       ` Xie XiuQi
  2019-04-25 18:01       ` [tip:sched/urgent] sched/numa: Fix a possible divide-by-zero tip-bot for Xie XiuQi
  2019-04-25 18:04       ` tip-bot for Xie XiuQi
  2 siblings, 0 replies; 8+ messages in thread
From: Xie XiuQi @ 2019-04-25  8:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, cj.chengjian

Hi Peter,

On 2019/4/25 16:00, Peter Zijlstra wrote:
> On Thu, Apr 25, 2019 at 11:52:28AM +0800, Xie XiuQi wrote:
>> On 2019/4/24 2:44, Peter Zijlstra wrote:
> 
>>> I'll try and come up with a better Changelog tomorrow.
> 
> I actually did, but forgot to send out. I have the below.
> Does that work for you?

It works for me, thank you very much.

> 
> ---
> Subject: sched/numa: Fix a possible divide-by-zero
> From: Xie XiuQi <xiexiuqi@huawei.com>
> Date: Sat, 20 Apr 2019 16:34:16 +0800
> 
> sched_clock_cpu() may not be consistent between CPUs. If a task
> migrates to another CPU, then se.exec_start is set to that CPU's
> rq_clock_task() by update_stats_curr_start(). Specifically, the new
> value might be before the old value due to clock skew.
> 
> So then if in numa_get_avg_runtime() the expression:
> 
>   'now - p->last_task_numa_placement'
> 
> ends up as -1, then the divider '*period + 1' in task_numa_placement()
> is 0 and things go bang. Similar to update_curr(), check if time goes
> backwards to avoid this.
> 
> Cc: mingo@redhat.com
> Cc: cj.chengjian@huawei.com
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> [peterz: simplified changelog]
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20190420083416.170446-1-xiexiuqi@huawei.com
> ---
>  kernel/sched/fair.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct t
>  	if (p->last_task_numa_placement) {
>  		delta = runtime - p->last_sum_exec_runtime;
>  		*period = now - p->last_task_numa_placement;
> +
> +		/* Avoid backward, and prevent potential divide error */
> +		if (unlikely((s64)*period < 0))
> +			*period = 0;
>  	} else {
>  		delta = p->se.avg.load_sum;
>  		*period = LOAD_AVG_MAX;
> 
> .
> 

-- 
Thanks,
Xie XiuQi


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: fix a potential divide error
  2019-04-20  8:34 [PATCH] sched: fix a potential divide error Xie XiuQi
  2019-04-23 18:44 ` Peter Zijlstra
@ 2019-04-25 17:59 ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2019-04-25 17:59 UTC (permalink / raw)
  To: Xie XiuQi; +Cc: mingo, peterz, linux-kernel, cj.chengjian


* Xie XiuQi <xiexiuqi@huawei.com> wrote:

> We meet a divide error on 3.10.0 kernel, the error message is bellow:
> 
> [499992.287996] divide error: 0000 [#1] SMP
> [499992.297796] do nothing after die!
> [499992.299108] Modules linked in: signo_catch macvlan binfmt_misc
> ip_set_hash_netport ip_set_hash_ipport vport_vxlan ipt_REJECT
> xt_statistic xt_physdev xt_nat xt_recent xt_mark xt_comment ...
> [499992.312751] CPU: 8 PID: 23352 Comm: bash Tainted: G ----V-------   3.10.0+ #1
> [499992.314308] Hardware name: OpenStack Foundation OpenStack Nova, BIOS
> rel-1.9.1-0-gb3ef39f-20170329_185309-build9a64a246a231 04/01/2014
> [499992.317411] task: ffff880033fc9700 ti: ffff8807fed60000 task.ti:ffff8807fed60000
> [499992.318967] RIP: 0010:[<ffffffff810c15c2>]  [<ffffffff810c15c2>] task_numa_fault+0x1c2/0xbb0
> [499992.320515] RSP: 0000:ffff8807fed63d38  EFLAGS: 00010246
> [499992.322018] RAX: 0000002b7efd0000 RBX: ffff880033fc9700 RCX:0000000000000003
> [499992.323563] RDX: 0000000000000000 RSI: 0000000000000400 RDI:ffffffff81a80f60
> [499992.325052] RBP: ffff8807fed63db8 R08: ffffffff81a80f68 R09:0000000000000000
> [499992.326531] R10: ffff88083ffda000 R11: 0000000000000000 R12:0000000000000424
> [499992.327987] R13: 00000000002b7efd R14: 0000000000000000 R15:ffffea001ea42a00
> [499992.329420] FS:  00007fa01a3b7740(0000) GS:ffff88103ec00000(0000)knlGS:0000000000000000
> [499992.330866] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [499992.332302] CR2: 0000000000ff1fb0 CR3: 00000007ff1d1000 CR4:00000000003407e0
> [499992.333763] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
> [499992.335187] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
> [499992.336595] Stack:
> [499992.337974]  0000000000000000 00000001bc9598a8 ffffea001ea42a00 0000000100000001
> [499992.339374]  0000000300000001 0000000000000001 ffffea001ea42a00 ffff8807fed63db8
> [499992.340768]  0000000000000000 0000000000000000 00000000bc9598a8 0000000000000001
> [499992.342148] Call Trace:
> [499992.343494]  [<ffffffff8119ab62>] do_numa_page+0x162/0x1f0
> [499992.344831]  [<ffffffff8119bde7>] handle_mm_fault+0x627/0xf50
> [499992.346145]  [<ffffffff8164e486>] __do_page_fault+0x166/0x470
> [499992.347442]  [<ffffffff8164e853>] trace_do_page_fault+0x43/0x110
> [499992.348711]  [<ffffffff8164df29>] do_async_page_fault+0x29/0xe0
> [499992.349948]  [<ffffffff8164a9f8>] async_page_fault+0x28/0x30
> [499992.351149] Code: 00 3d 00 04 00 00 44 0f 4e d8 41 81 fb 00 04 00 00
> 0f 84 67 07 00 00 4c 89 e8 49 83 c6 01 31 d2 48 c1 e0 10 49 83 c4 01 45
> 31 c9 <49> f7 f6 48 c7 45 a8 00 00 00 00 48 c7 45 b0 00 00 00 00 49 89
> [499992.353707] RIP  [<ffffffff810c15c2>] task_numa_fault+0x1c2/0xbb0
> [499992.354927]  RSP <ffff8807fed63d38>
> [499992.358114] ---[ end trace 4f2465cac18ff65e ]---
> [499992.359304] Kernel panic - not syncing: Fatal exception
> 
> sched_clock_cpu may not be consistent bwtwen cpus. If a task migrate
> to another cpu, the se.exec_start was set to that cpu's rq_clock_task
> by update_stats_curr_start(). Which may not be monotonic.
> 
> update_stats_curr_start
>   <- set_next_entity
>      <- set_curr_task_fair
>         <- sched_move_task
> 
> So, if  now - p->last_task_numa_placement is -1, then (*period + 1) is
> 0, and divide error was triggerred at the div operation:
>   task_numa_placement:
>     runtime = numa_get_avg_runtime(p, &period);
>     f_weight = div64_u64(runtime << 16, period + 1);  // divide error here
> 
> In this patch, we make sure period is not less than 0 to avoid this
> divide error.
> 
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> Cc: stable@vger.kernel.org
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40bd1e27b1b7..f2abb258fc85 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
>  	if (p->last_task_numa_placement) {
>  		delta = runtime - p->last_sum_exec_runtime;
>  		*period = now - p->last_task_numa_placement;
> +
> +		/* Avoid backward, and prevent potential divide error */
> +		if ((s64)*period < 0)
> +			*period = 0;
>  	} else {
>  		delta = p->se.avg.load_sum;
>  		*period = LOAD_AVG_MAX;

I've put this into sched/urgent, because this can be triggered in the 
wild it appears.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:sched/urgent] sched/numa: Fix a possible divide-by-zero
  2019-04-25  8:00     ` Peter Zijlstra
  2019-04-25  8:16       ` Xie XiuQi
@ 2019-04-25 18:01       ` tip-bot for Xie XiuQi
  2019-04-25 18:04       ` tip-bot for Xie XiuQi
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Xie XiuQi @ 2019-04-25 18:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, peterz, tglx, torvalds, xiexiuqi, hpa

Commit-ID:  993efef2483bde15fd33ec5fba5464597c2d8124
Gitweb:     https://git.kernel.org/tip/993efef2483bde15fd33ec5fba5464597c2d8124
Author:     Xie XiuQi <xiexiuqi@huawei.com>
AuthorDate: Sat, 20 Apr 2019 16:34:16 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 25 Apr 2019 19:58:35 +0200

sched/numa: Fix a possible divide-by-zero

sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.

So then if in numa_get_avg_runtime() the expression:

  'now - p->last_task_numa_placement'

ends up as -1, then the divider '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.

[ peterz: Wrote new changelog. ]
[ mingo: Tweaked the code comment. ]

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cj.chengjian@huawei.com
Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a4d9e14bf138..35f3ea375084 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
 	if (p->last_task_numa_placement) {
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
+
+		/* Avoid time going backwards, prevent potential divide error: */
+		if (unlikely((s64)*period < 0))
+			*period = 0;
 	} else {
 		delta = p->se.avg.load_sum;
 		*period = LOAD_AVG_MAX;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [tip:sched/urgent] sched/numa: Fix a possible divide-by-zero
  2019-04-25  8:00     ` Peter Zijlstra
  2019-04-25  8:16       ` Xie XiuQi
  2019-04-25 18:01       ` [tip:sched/urgent] sched/numa: Fix a possible divide-by-zero tip-bot for Xie XiuQi
@ 2019-04-25 18:04       ` tip-bot for Xie XiuQi
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Xie XiuQi @ 2019-04-25 18:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, stable, xiexiuqi, hpa, torvalds, peterz, linux-kernel, tglx

Commit-ID:  a860fa7b96e1a1c974556327aa1aee852d434c21
Gitweb:     https://git.kernel.org/tip/a860fa7b96e1a1c974556327aa1aee852d434c21
Author:     Xie XiuQi <xiexiuqi@huawei.com>
AuthorDate: Sat, 20 Apr 2019 16:34:16 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 25 Apr 2019 19:58:54 +0200

sched/numa: Fix a possible divide-by-zero

sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.

So then if in numa_get_avg_runtime() the expression:

  'now - p->last_task_numa_placement'

ends up as -1, then the divider '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.

[ peterz: Wrote new changelog. ]
[ mingo: Tweaked the code comment. ]

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cj.chengjian@huawei.com
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a4d9e14bf138..35f3ea375084 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
 	if (p->last_task_numa_placement) {
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
+
+		/* Avoid time going backwards, prevent potential divide error: */
+		if (unlikely((s64)*period < 0))
+			*period = 0;
 	} else {
 		delta = p->se.avg.load_sum;
 		*period = LOAD_AVG_MAX;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-25 18:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-20  8:34 [PATCH] sched: fix a potential divide error Xie XiuQi
2019-04-23 18:44 ` Peter Zijlstra
2019-04-25  3:52   ` Xie XiuQi
2019-04-25  8:00     ` Peter Zijlstra
2019-04-25  8:16       ` Xie XiuQi
2019-04-25 18:01       ` [tip:sched/urgent] sched/numa: Fix a possible divide-by-zero tip-bot for Xie XiuQi
2019-04-25 18:04       ` tip-bot for Xie XiuQi
2019-04-25 17:59 ` [PATCH] sched: fix a potential divide error Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.