linux-kernel.vger.kernel.org archive mirror
* sched: odd values for effective load calculations
@ 2014-12-02 22:53 Sasha Levin
  2014-12-13  8:30 ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Sasha Levin @ 2014-12-02 22:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: LKML, Dave Jones, Andrey Ryabinin

Hi all,

I was fuzzing with trinity inside a KVM tools guest, running the latest -next
kernel along with the undefined behaviour sanitizer patch, and hit the following:

[  787.894288] ================================================================================
[  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
[  787.898981] signed integer overflow:
[  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
[  787.900066] CPU: 18 PID: 12958 Comm: trinity-c103 Not tainted 3.18.0-rc6-next-20141201-sasha-00070-g028060a-dirty #1528
[  787.900066]  0000000000000000 0000000000000000 ffffffff93b0f890 ffff8806e3eff918
[  787.900066]  ffffffff91f1cf26 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff938
[  787.900066]  ffffffff91f1fb90 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff9f8
[  787.900066] Call Trace:
[  787.900066] dump_stack (lib/dump_stack.c:52)
[  787.900066] ubsan_epilogue (lib/ubsan.c:159)
[  787.900066] handle_overflow (lib/ubsan.c:191)
[  787.900066] ? __do_page_fault (arch/x86/mm/fault.c:1220)
[  787.900066] ? local_clock (kernel/sched/clock.c:392)
[  787.900066] __ubsan_handle_mul_overflow (lib/ubsan.c:218)
[  787.900066] select_task_rq_fair (kernel/sched/fair.c:4541 kernel/sched/fair.c:4755)
[  787.900066] try_to_wake_up (kernel/sched/core.c:1415 kernel/sched/core.c:1724)
[  787.900066] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:33)
[  787.900066] default_wake_function (kernel/sched/core.c:2979)
[  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
[  787.900066] autoremove_wake_function (kernel/sched/wait.c:295)
[  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
[  787.900066] __wake_up_common (kernel/sched/wait.c:73)
[  787.900066] __wake_up_sync_key (include/linux/spinlock.h:364 kernel/sched/wait.c:146)
[  787.900066] pipe_write (fs/pipe.c:466)
[  787.900066] ? kasan_poison_shadow (mm/kasan/kasan.c:48)
[  787.900066] ? new_sync_read (fs/read_write.c:480)
[  787.900066] do_iter_readv_writev (fs/read_write.c:681)
[  787.900066] compat_do_readv_writev (fs/read_write.c:1029)
[  787.900066] ? wait_for_partner (fs/pipe.c:340)
[  787.900066] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[  787.900066] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  787.900066] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
[  787.900066] compat_writev (fs/read_write.c:1145)
[  787.900066] compat_SyS_writev (fs/read_write.c:1163 fs/read_write.c:1151)
[  787.900066] ia32_do_call (arch/x86/ia32/ia32entry.S:446)
[  787.900066] ================================================================================

The values for effective load seem a bit off (and are overflowing!).

One change I made recently was to disable lock debugging; could it be related?


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-02 22:53 sched: odd values for effective load calculations Sasha Levin
@ 2014-12-13  8:30 ` Ingo Molnar
  2014-12-15 12:12   ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2014-12-13  8:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds


* Sasha Levin <levinsasha928@gmail.com> wrote:

> Hi all,
> 
> I was fuzzing with trinity inside a KVM tools guest, running the latest -next
> kernel along with the undefined behaviour sanitizer patch, and hit the following:
> 
> [  787.894288] ================================================================================
> [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
> [  787.898981] signed integer overflow:
> [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
> [  787.900066] CPU: 18 PID: 12958 Comm: trinity-c103 Not tainted 3.18.0-rc6-next-20141201-sasha-00070-g028060a-dirty #1528
> [  787.900066]  0000000000000000 0000000000000000 ffffffff93b0f890 ffff8806e3eff918
> [  787.900066]  ffffffff91f1cf26 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff938
> [  787.900066]  ffffffff91f1fb90 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff9f8
> [  787.900066] Call Trace:
> [  787.900066] dump_stack (lib/dump_stack.c:52)
> [  787.900066] ubsan_epilogue (lib/ubsan.c:159)
> [  787.900066] handle_overflow (lib/ubsan.c:191)
> [  787.900066] ? __do_page_fault (arch/x86/mm/fault.c:1220)
> [  787.900066] ? local_clock (kernel/sched/clock.c:392)
> [  787.900066] __ubsan_handle_mul_overflow (lib/ubsan.c:218)
> [  787.900066] select_task_rq_fair (kernel/sched/fair.c:4541 kernel/sched/fair.c:4755)
> [  787.900066] try_to_wake_up (kernel/sched/core.c:1415 kernel/sched/core.c:1724)
> [  787.900066] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:33)
> [  787.900066] default_wake_function (kernel/sched/core.c:2979)
> [  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
> [  787.900066] autoremove_wake_function (kernel/sched/wait.c:295)
> [  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
> [  787.900066] __wake_up_common (kernel/sched/wait.c:73)
> [  787.900066] __wake_up_sync_key (include/linux/spinlock.h:364 kernel/sched/wait.c:146)
> [  787.900066] pipe_write (fs/pipe.c:466)
> [  787.900066] ? kasan_poison_shadow (mm/kasan/kasan.c:48)
> [  787.900066] ? new_sync_read (fs/read_write.c:480)
> [  787.900066] do_iter_readv_writev (fs/read_write.c:681)
> [  787.900066] compat_do_readv_writev (fs/read_write.c:1029)
> [  787.900066] ? wait_for_partner (fs/pipe.c:340)
> [  787.900066] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
> [  787.900066] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [  787.900066] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
> [  787.900066] compat_writev (fs/read_write.c:1145)
> [  787.900066] compat_SyS_writev (fs/read_write.c:1163 fs/read_write.c:1151)
> [  787.900066] ia32_do_call (arch/x86/ia32/ia32entry.S:446)
> [  787.900066] ================================================================================
> 
> The values for effective load seem a bit off (and are overflowing!).

It definitely looks like a bug in SMP load balancing!

> One change I made recently was to disable lock debugging; could it be related?

Should be unrelated to lock debugging, other than that certain races 
will occur in different patterns with lock debugging on or off.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-13  8:30 ` Ingo Molnar
@ 2014-12-15 12:12   ` Peter Zijlstra
  2014-12-15 13:14     ` Peter Zijlstra
  2014-12-16  4:51     ` Sasha Levin
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2014-12-15 12:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sasha Levin, LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds


Sorry for the long delay, I was out for a few weeks due to having become
a dad for the second time.

On Sat, Dec 13, 2014 at 09:30:12AM +0100, Ingo Molnar wrote:
> * Sasha Levin <levinsasha928@gmail.com> wrote:
> 
> > Hi all,
> > 
> > I was fuzzing with trinity inside a KVM tools guest, running the latest -next
> > kernel along with the undefined behaviour sanitizer patch, and hit the following:
> > 
> > [  787.894288] ================================================================================
> > [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
> > [  787.898981] signed integer overflow:
> > [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'

So that's:

	this_eff_load *= this_load +
		effective_load(tg, this_cpu, weight, weight);

Going by the numbers the 101500 must be 'this_eff_load', 100 * ~1024
makes that. Which makes the rhs 'large'. Do you have
CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
are you using?
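
For reference, the surrounding arithmetic, as I remember the 3.18-era
wake_affine() -- a rough sketch only, details may differ in your exact tree:

	this_eff_load = 100;
	this_eff_load *= capacity_of(prev_cpu);	/* 100 * ~1015 == 101500 */

	prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
	prev_eff_load *= capacity_of(this_cpu);

	if (this_load > 0) {
		/* the reported overflow: ~1.0e5 * ~3.6e14 > S64_MAX (~9.2e18) */
		this_eff_load *= this_load +
			effective_load(tg, this_cpu, weight, weight);

		prev_eff_load *= load +
			effective_load(tg, prev_cpu, 0, weight);
	}

	balanced = this_eff_load <= prev_eff_load;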

In any case, bit sad this doesn't have a register dump included :/

Is this easy to reproduce or something that happened once?

> > The values for effective load seem a bit off (and are overflowing!).
> 
> It definitely looks like a bug in SMP load balancing!

Yeah, although theoretically (and somewhat practically) this can be
triggered in more places if you manage to run up the 'weight' with
enough tasks.

That said, it should at worst result in 'funny' balancing behaviour, not
anything else.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-15 12:12   ` Peter Zijlstra
@ 2014-12-15 13:14     ` Peter Zijlstra
  2014-12-16  5:29       ` Sasha Levin
  2014-12-16  4:51     ` Sasha Levin
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2014-12-15 13:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sasha Levin, LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds

On Mon, Dec 15, 2014 at 01:12:27PM +0100, Peter Zijlstra wrote:
> 
> Sorry for the long delay, I was out for a few weeks due to having become
> a dad for the second time.
> 
> On Sat, Dec 13, 2014 at 09:30:12AM +0100, Ingo Molnar wrote:
> > * Sasha Levin <levinsasha928@gmail.com> wrote:
> > 
> > > Hi all,
> > > 
> > > I was fuzzing with trinity inside a KVM tools guest, running the latest -next
> > > kernel along with the undefined behaviour sanitizer patch, and hit the following:
> > > 
> > > [  787.894288] ================================================================================
> > > [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
> > > [  787.898981] signed integer overflow:
> > > [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
> 
> So that's:
> 
> 	this_eff_load *= this_load +
> 		effective_load(tg, this_cpu, weight, weight);
> 
> Going by the numbers the 101500 must be 'this_eff_load', 100 * ~1024
> makes that. Which makes the rhs 'large'. Do you have
> CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
> are you using?
> 
> In any case, bit sad this doesn't have a register dump included :/

Hmm, I was hoping to be able to see if it was this_load or the
effective_load() result being silly large, but going by the ASM output
of my compiler this isn't going to be available in registers; it's all
stack spills.

Pinning my hopes on that reproducibility thing :/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-16  4:51     ` Sasha Levin
@ 2014-12-16  2:09       ` Yuyang Du
  2014-12-19  0:29         ` Yuyang Du
  2014-12-16 15:33       ` sched: odd values for effective load calculations Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2014-12-16  2:09 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Dave Jones, Andrey Ryabinin,
	Linus Torvalds

On Mon, Dec 15, 2014 at 11:51:46PM -0500, Sasha Levin wrote:
> On 12/15/2014 07:12 AM, Peter Zijlstra wrote:
> > 
> > Sorry for the long delay, I was out for a few weeks due to having become
> > a dad for the second time.
> 
> Congrats! May you be able to sleep at night sooner rather than later.
 
Congrats +1.

> 
> It's fairly reproducible; I've seen it happen quite a few times. What other
> information might be useful?
> 

Sasha, it might be helpful to see whether this_load comes from:

this_load1: this_load = target_load(this_cpu, idx);

or

this_load2: this_load += effective_load(tg, this_cpu, -weight, -weight);

It really does not seem to be this_load1, but the effective_load() calculation is a bit
too complicated to see where the problem is.
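
One (untested) way to tell them apart could be to print this_load right
after each of the two assignments whenever it exceeds some arbitrary
sanity bound, something like:

	this_load = target_load(this_cpu, idx);
	if (this_load > (1LL << 40))	/* arbitrary "cannot be sane" bound */
		printk(KERN_CRIT "bad target_load: %lld\n", this_load);
	...
	this_load += effective_load(tg, this_cpu, -weight, -weight);
	if (this_load > (1LL << 40))
		printk(KERN_CRIT "bad this_load after effective_load: %lld\n", this_load);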

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-15 12:12   ` Peter Zijlstra
  2014-12-15 13:14     ` Peter Zijlstra
@ 2014-12-16  4:51     ` Sasha Levin
  2014-12-16  2:09       ` Yuyang Du
  2014-12-16 15:33       ` sched: odd values for effective load calculations Peter Zijlstra
  1 sibling, 2 replies; 13+ messages in thread
From: Sasha Levin @ 2014-12-16  4:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds

On 12/15/2014 07:12 AM, Peter Zijlstra wrote:
> 
> Sorry for the long delay, I was out for a few weeks due to having become
> a dad for the second time.

Congrats! May you be able to sleep at night sooner rather than later.

> On Sat, Dec 13, 2014 at 09:30:12AM +0100, Ingo Molnar wrote:
>> * Sasha Levin <levinsasha928@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I was fuzzing with trinity inside a KVM tools guest, running the latest -next
>>> kernel along with the undefined behaviour sanitizer patch, and hit the following:
>>>
>>> [  787.894288] ================================================================================
>>> [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
>>> [  787.898981] signed integer overflow:
>>> [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
> 
> So that's:
> 
> 	this_eff_load *= this_load +
> 		effective_load(tg, this_cpu, weight, weight);
> 
> Going by the numbers the 101500 must be 'this_eff_load', 100 * ~1024
> makes that. Which makes the rhs 'large'. Do you have
> CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
> are you using?

CONFIG_FAIR_GROUP_SCHED is enabled. There's no cgroup set-up initially,
but I figure that trinity is able to do crazy things here.

> In any case, bit sad this doesn't have a register dump included :/
> 
> Is this easy to reproduce or something that happened once?

It's fairly reproducible; I've seen it happen quite a few times. What other
information might be useful?

>>> The values for effective load seem a bit off (and are overflowing!).
>>
>> It definitely looks like a bug in SMP load balancing!
> 
> Yeah, although theoretically (and somewhat practically) this can be
> triggered in more places if you manage to run up the 'weight' with
> enough tasks.
> 
> That said, it should at worst result in 'funny' balancing behaviour, not
> anything else.

I'm not sure if you've caught up on the RCU stall issue we've been trying
to track down (https://lkml.org/lkml/2014/11/14/656), but could this "funny"
balancing behaviour be "funny" enough to cause a stall?


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-15 13:14     ` Peter Zijlstra
@ 2014-12-16  5:29       ` Sasha Levin
  2014-12-16 15:37         ` Peter Zijlstra
  2014-12-18  2:10         ` Yuyang Du
  0 siblings, 2 replies; 13+ messages in thread
From: Sasha Levin @ 2014-12-16  5:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds

On 12/15/2014 08:14 AM, Peter Zijlstra wrote:
>>>> [  787.894288] ================================================================================
>>>> > > > [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
>>>> > > > [  787.898981] signed integer overflow:
>>>> > > > [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
>> > 
>> > So that's:
>> > 
>> > 	this_eff_load *= this_load +
>> > 		effective_load(tg, this_cpu, weight, weight);
>> > 
>> > Going by the numbers the 101500 must be 'this_eff_load', 100 * ~1024
>> > makes that. Which makes the rhs 'large'. Do you have
>> > CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
>> > are you using?
>> > 
>> > In any case, bit sad this doesn't have a register dump included :/
> Hmm, I was hoping to be able to see if it was this_load or the
> effective_load() result being silly large, but going by the ASM output
> of my compiler this isn't going to be available in registers; it's all
> stack spills.
> 
> Pinning my hopes on that reproducibility thing :/

Okay, yeah, it's very reproducible. I've added:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df2cdf7..e1fbe1a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4486,7 +4486,7 @@ static int wake_wide(struct task_struct *p)

 static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 {
-       s64 this_load, load;
+       s64 this_load, load, tmps;
        s64 this_eff_load, prev_eff_load;
        int idx, this_cpu, prev_cpu;
        struct task_group *tg;
@@ -4538,6 +4538,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
        prev_eff_load *= capacity_of(this_cpu);

        if (this_load > 0) {
+               if (__builtin_mul_overflow(this_eff_load, this_load +
+                        effective_load(tg, this_cpu, weight, weight), &tmps))
+                       printk(KERN_CRIT "%lld %lld %lld", this_eff_load, this_load, effective_load(tg, this_cpu, weight, weight));
                this_eff_load *= this_load +
                        effective_load(tg, this_cpu, weight, weight);

And got:

[  437.511964] 91600 1765238667340524 81
[  437.512781] ================================================================================
[  437.516069] UBSan: Undefined behaviour in kernel/sched/fair.c:4544:17
[  437.517888] signed integer overflow:
[  437.518721] 1765238667340605 * 91600 cannot be represented in type 'long long int'
[  437.520051] CPU: 16 PID: 23069 Comm: trinity-c510 Not tainted 3.18.0-next-20141211-sasha-00069-g11f17a7-dirty #1607
[  437.520153]  0000000000000000 0000000000000000 ffffffffb1f0fc70 ffff880321e8f908
[  437.520153]  ffffffffb0250d8f 1ffffffff78a0603 ffffffffb1f0fc88 ffff880321e8f928
[  437.520153]  ffffffffb0252f96 1ffffffff78a0603 ffffffffb1f0fc88 ffff880321e8f9e8
[  437.520153] Call Trace:
[  437.520153] dump_stack (lib/dump_stack.c:52)
[  437.520153] ubsan_epilogue (lib/ubsan.c:159)
[  437.520153] handle_overflow (lib/ubsan.c:191)
[  437.520153] ? printk (./arch/x86/include/asm/preempt.h:95 kernel/printk/printk.c:1863)
[  437.520153] __ubsan_handle_mul_overflow (lib/ubsan.c:218)
[  437.520153] select_task_rq_fair (kernel/sched/fair.c:4544 kernel/sched/fair.c:4758)
[  437.520153] ? get_parent_ip (kernel/sched/core.c:2564)
[  437.520153] ? get_parent_ip (kernel/sched/core.c:2564)
[  437.520153] try_to_wake_up (kernel/sched/core.c:1415 kernel/sched/core.c:1729)
[  437.520153] ? deactivate_slab (include/linux/spinlock.h:349 mm/slub.c:1940)
[  437.520153] default_wake_function (kernel/sched/core.c:2988)
[  437.520153] ? get_parent_ip (kernel/sched/core.c:2564)
[  437.520153] autoremove_wake_function (kernel/sched/wait.c:295)
[  437.520153] __wake_up_common (kernel/sched/wait.c:73)
[  437.520153] ? copy_page_to_iter (./arch/x86/include/asm/preempt.h:95 include/linux/uaccess.h:36 include/linux/highmem.h:75 mm/iov_iter.c:180 mm/iov_iter.c:432)
[  437.520153] __wake_up_sync_key (include/linux/spinlock.h:364 kernel/sched/wait.c:146)
[  437.520153] pipe_read (fs/pipe.c:317)
[  437.520153] ? do_sync_readv_writev (fs/read_write.c:396)
[  437.520153] do_iter_readv_writev (fs/read_write.c:681)
[  437.520153] do_readv_writev (fs/read_write.c:849)
[  437.520153] ? pipe_write (fs/pipe.c:231)
[  437.520153] ? acct_account_cputime (kernel/tsacct.c:168)
[  437.520153] ? preempt_count_sub (kernel/sched/core.c:2620)
[  437.520153] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[  437.520153] ? vtime_account_user (kernel/sched/cputime.c:701)
[  437.520153] vfs_readv (fs/read_write.c:881)
[  437.520153] SyS_readv (fs/read_write.c:907 fs/read_write.c:898)
[  437.520153] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:33)
[  437.520153] system_call_fastpath (arch/x86/kernel/entry_64.S:423)
[  437.520153] ================================================================================

So it's actually 'this_load' going bananas (the three values printed are
this_eff_load = 91600, this_load = 1765238667340524 and effective_load() = 81).


Thanks,
Sasha

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-16  4:51     ` Sasha Levin
  2014-12-16  2:09       ` Yuyang Du
@ 2014-12-16 15:33       ` Peter Zijlstra
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2014-12-16 15:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds

On Mon, Dec 15, 2014 at 11:51:46PM -0500, Sasha Levin wrote:
> I'm not sure if you've caught up on the RCU stall issue we've been trying
> to track down (https://lkml.org/lkml/2014/11/14/656), but could this "funny"
> balancing behaviour be "funny" enough to cause a stall?

Typically not; the worst degenerate modes are either running everything
on one cpu or constantly migrating tasks. Both suck performance-wise,
but are 'valid' modes of operation.

RCU stalls require not actually getting to userspace or otherwise
delaying grace periods for egregious amounts of time. The only way I can
see that happening is a load-balance pass not finishing at all, and then
you'd consistently get stack traces from inside the balancer -- and I'm
not seeing that in the email referenced.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-16  5:29       ` Sasha Levin
@ 2014-12-16 15:37         ` Peter Zijlstra
  2014-12-18  2:10         ` Yuyang Du
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2014-12-16 15:37 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, LKML, Dave Jones, Andrey Ryabinin, Linus Torvalds

On Tue, Dec 16, 2014 at 12:29:28AM -0500, Sasha Levin wrote:
> On 12/15/2014 08:14 AM, Peter Zijlstra wrote:

> > Pinning my hopes on that reproducibility thing :/
> 
> Okay, yeah, it's very reproducible. I've added:
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index df2cdf7..e1fbe1a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4486,7 +4486,7 @@ static int wake_wide(struct task_struct *p)
> 
>  static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  {
> -       s64 this_load, load;
> +       s64 this_load, load, tmps;
>         s64 this_eff_load, prev_eff_load;
>         int idx, this_cpu, prev_cpu;
>         struct task_group *tg;
> @@ -4538,6 +4538,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>         prev_eff_load *= capacity_of(this_cpu);
> 
>         if (this_load > 0) {
> +               if (__builtin_mul_overflow(this_eff_load, this_load +
> +                        effective_load(tg, this_cpu, weight, weight), &tmps))
> +                       printk(KERN_CRIT "%lld %lld %lld", this_eff_load, this_load, effective_load(tg, this_cpu, weight, weight));
>                 this_eff_load *= this_load +
>                         effective_load(tg, this_cpu, weight, weight);

Minor nit: in general I would recommend evaluating effective_load() once,
not thrice; state might have changed between the calls and the results
might differ. Still..
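
For illustration, an untested single-evaluation variant of your check,
reusing your variable names:

	s64 eff = effective_load(tg, this_cpu, weight, weight);

	if (__builtin_mul_overflow(this_eff_load, this_load + eff, &tmps))
		printk(KERN_CRIT "%lld %lld %lld", this_eff_load, this_load, eff);

	this_eff_load *= this_load + eff;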

> And got:
> 
> [  437.511964] 91600 1765238667340524 81

> So it's actually 'this_load' going bananas.

That is indeed a fairly strong indication it's not effective_load(),
which is good, since that's one hairy piece of cra^Wcode.

Lemme go ponder about this_load.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-16  5:29       ` Sasha Levin
  2014-12-16 15:37         ` Peter Zijlstra
@ 2014-12-18  2:10         ` Yuyang Du
  1 sibling, 0 replies; 13+ messages in thread
From: Yuyang Du @ 2014-12-18  2:10 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Dave Jones, Andrey Ryabinin,
	Linus Torvalds

On Tue, Dec 16, 2014 at 12:29:28AM -0500, Sasha Levin wrote:
> On 12/15/2014 08:14 AM, Peter Zijlstra wrote:
> >>>> [  787.894288] ================================================================================
> >>>> > > > [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
> >>>> > > > [  787.898981] signed integer overflow:
> >>>> > > > [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
> >> > 
> >> > So that's:
> >> > 
> >> > 	this_eff_load *= this_load +
> >> > 		effective_load(tg, this_cpu, weight, weight);
> >> > 
> >> > Going by the numbers the 101500 must be 'this_eff_load', 100 * ~1024
> >> > makes that. Which makes the rhs 'large'. Do you have
> >> > CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
> >> > are you using?
> >> > 
> >> > In any case, bit sad this doesn't have a register dump included :/
> > Hmm, I was hoping to be able to see if it was this_load or the
> > effective_load() result being silly large, but going by the ASM output
> > of my compiler this isn't going to be available in registers; it's all
> > stack spills.
> > 
> > Pinning my hopes on that reproducibility thing :/
> 
> Okay, yeah, it's very reproducible. I've added:
> 

Hi Sasha,

I tried to reproduce this overflow, but had no luck (trinity has been
running in a KVM VM for an hour)...

The trinity version used is v1.4, simply launched as ./trinity.

Could you detail your setup and procedure?

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-16  2:09       ` Yuyang Du
@ 2014-12-19  0:29         ` Yuyang Du
  2014-12-19 11:20           ` Peter Zijlstra
  2015-01-09 12:33           ` [tip:sched/urgent] sched: Fix odd values in effective_load() calculations tip-bot for Yuyang Du
  0 siblings, 2 replies; 13+ messages in thread
From: Yuyang Du @ 2014-12-19  0:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Dave Jones, Andrey Ryabinin,
	Linus Torvalds

On Tue, Dec 16, 2014 at 10:09:48AM +0800, Yuyang Du wrote:
> 
> Sasha, it might be helpful to see whether this_load comes from:
> 
> this_load1: this_load = target_load(this_cpu, idx);
> 
> or
> 
> this_load2: this_load += effective_load(tg, this_cpu, -weight, -weight);
> 
> It really does not seem to be this_load1, but the effective_load() calculation is a bit
> too complicated to see where the problem is.

Hi all,

I finally managed to reproduce this, but not with trinity, just by rebooting repeatedly.

Indeed, the problem is from:

this_load2: this_load += effective_load(tg, this_cpu, -weight, -weight);

After digging into effective_load(), the root cause is:

wl = (w * tg->shares) / W;

if w is negative, it will be converted to unsigned long; the product then may or may not
wrap around, and can end up as an insanely large number.

I tried this in userspace; interestingly, if we have:

wl = w * tg->shares;
wl /= W;

the result is ok, but not when the two statements are combined into a single expression as in the original.
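
For example, a minimal userspace test (made-up numbers, gcc on x86-64)
shows the difference:

	#include <stdio.h>

	int main(void)
	{
		long w = -1024;			/* made-up negative weight delta */
		unsigned long shares = 2048;	/* stands in for tg->shares */
		long W = 4096;			/* made-up total weight */

		/* single expression: w and W are converted to unsigned long,
		 * so everything is evaluated unsigned and the result is huge */
		long combined = (w * shares) / W;

		/* two statements: the unsigned product is converted back to
		 * long first (wraps to -2097152 here), then the division is
		 * an ordinary signed one */
		long split = w * shares;
		split /= W;

		printf("combined = %ld\n", combined);	/* ~4.5e15, insane */
		printf("split    = %ld\n", split);	/* -512, as expected */
		return 0;
	}

With the single expression the whole thing is evaluated as unsigned long;
with the two statements the intermediate result is converted back to long
before the division, so the signed division gives the expected -512.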

Anyway, the following patch can fix this.

---
Subject: [PATCH] sched: Fix long and unsigned long multiplication error in
 effective_load

In effective_load, we have (long w * unsigned long tg->shares) / long W,
when w is negative, it is cast to unsigned long and hence the product is
insanely large. Fix this by casting tg->shares to long.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df2cdf7..6b99659 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4424,7 +4424,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 		 * wl = S * s'_i; see (2)
 		 */
 		if (W > 0 && w < W)
-			wl = (w * tg->shares) / W;
+			wl = (w * (long)tg->shares) / W;
 		else
 			wl = tg->shares;
 
-- 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: sched: odd values for effective load calculations
  2014-12-19  0:29         ` Yuyang Du
@ 2014-12-19 11:20           ` Peter Zijlstra
  2015-01-09 12:33           ` [tip:sched/urgent] sched: Fix odd values in effective_load() calculations tip-bot for Yuyang Du
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2014-12-19 11:20 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Sasha Levin, Ingo Molnar, LKML, Dave Jones, Andrey Ryabinin,
	Linus Torvalds

On Fri, Dec 19, 2014 at 08:29:56AM +0800, Yuyang Du wrote:
> Subject: [PATCH] sched: Fix long and unsigned long multiplication error in effective_load
> 
> In effective_load, we have (long w * unsigned long tg->shares) / long W,
> when w is negative, it is cast to unsigned long and hence the product is
> insanely large. Fix this by casting tg->shares to long.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index df2cdf7..6b99659 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4424,7 +4424,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>  		 * wl = S * s'_i; see (2)
>  		 */
>  		if (W > 0 && w < W)
> -			wl = (w * tg->shares) / W;
> +			wl = (w * (long)tg->shares) / W;
>  		else
>  			wl = tg->shares;


Oh, nice! thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [tip:sched/urgent] sched: Fix odd values in effective_load() calculations
  2014-12-19  0:29         ` Yuyang Du
  2014-12-19 11:20           ` Peter Zijlstra
@ 2015-01-09 12:33           ` tip-bot for Yuyang Du
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for Yuyang Du @ 2015-01-09 12:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: a.ryabinin, davej, mingo, peterz, hpa, sasha.levin, yuyang.du,
	torvalds, linux-kernel, tglx

Commit-ID:  32a8df4e0b33fccc9715213b382160415b5c4008
Gitweb:     http://git.kernel.org/tip/32a8df4e0b33fccc9715213b382160415b5c4008
Author:     Yuyang Du <yuyang.du@intel.com>
AuthorDate: Fri, 19 Dec 2014 08:29:56 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 9 Jan 2015 11:18:54 +0100

sched: Fix odd values in effective_load() calculations

In effective_load, we have (long w * unsigned long tg->shares) / long W,
when w is negative, it is cast to unsigned long and hence the product is
insanely large. Fix this by casting tg->shares to long.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df2cdf7..6b99659 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4424,7 +4424,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 		 * wl = S * s'_i; see (2)
 		 */
 		if (W > 0 && w < W)
-			wl = (w * tg->shares) / W;
+			wl = (w * (long)tg->shares) / W;
 		else
 			wl = tg->shares;
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-01-09 12:34 UTC | newest]

Thread overview: 13+ messages
2014-12-02 22:53 sched: odd values for effective load calculations Sasha Levin
2014-12-13  8:30 ` Ingo Molnar
2014-12-15 12:12   ` Peter Zijlstra
2014-12-15 13:14     ` Peter Zijlstra
2014-12-16  5:29       ` Sasha Levin
2014-12-16 15:37         ` Peter Zijlstra
2014-12-18  2:10         ` Yuyang Du
2014-12-16  4:51     ` Sasha Levin
2014-12-16  2:09       ` Yuyang Du
2014-12-19  0:29         ` Yuyang Du
2014-12-19 11:20           ` Peter Zijlstra
2015-01-09 12:33           ` [tip:sched/urgent] sched: Fix odd values in effective_load() calculations tip-bot for Yuyang Du
2014-12-16 15:33       ` sched: odd values for effective load calculations Peter Zijlstra
