* UBSAN: shift-out-of-bounds in load_balance
@ 2021-01-24  9:30 syzbot
  2021-02-22 17:12 ` syzbot
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: syzbot @ 2021-01-24  9:30 UTC (permalink / raw)
  To: akpm, bp, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    bc085f8f Add linux-next specific files for 20210121
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10b71a2cd00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=1224bbf217b0bec8
dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com

================================================================================
UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7741:14
shift exponent 86 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 13261 Comm: kworker/u4:13 Not tainted 5.11.0-rc4-next-20210121-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: krdsd rds_tcp_accept_worker
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 detach_tasks kernel/sched/fair.c:7741 [inline]
 load_balance.cold+0x1d/0x2e kernel/sched/fair.c:9670
 rebalance_domains+0x5cc/0xdb0 kernel/sched/fair.c:10058
 __do_softirq+0x2bc/0xa29 kernel/softirq.c:343
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
 do_softirq kernel/softirq.c:246 [inline]
 do_softirq+0xb5/0xe0 kernel/softirq.c:233
 __local_bh_enable_ip+0xf4/0x110 kernel/softirq.c:196
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:745 [inline]
 ip_finish_output2+0x88b/0x21b0 net/ipv4/ip_output.c:231
 __ip_finish_output net/ipv4/ip_output.c:308 [inline]
 __ip_finish_output+0x396/0x640 net/ipv4/ip_output.c:290
 ip_finish_output+0x35/0x200 net/ipv4/ip_output.c:318
 NF_HOOK_COND include/linux/netfilter.h:290 [inline]
 ip_output+0x196/0x310 net/ipv4/ip_output.c:432
 dst_output include/net/dst.h:441 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 __ip_queue_xmit+0x8e9/0x1a00 net/ipv4/ip_output.c:532
 __tcp_transmit_skb+0x188c/0x38f0 net/ipv4/tcp_output.c:1405
 tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
 tcp_write_xmit+0xde7/0x6140 net/ipv4/tcp_output.c:2689
 __tcp_push_pending_frames+0xaa/0x390 net/ipv4/tcp_output.c:2869
 tcp_send_fin+0x117/0xbb0 net/ipv4/tcp_output.c:3426
 tcp_shutdown net/ipv4/tcp.c:2636 [inline]
 tcp_shutdown+0x127/0x170 net/ipv4/tcp.c:2621
 inet_shutdown+0x1a8/0x430 net/ipv4/af_inet.c:890
 rds_tcp_accept_one+0x5e0/0xc10 net/rds/tcp_listen.c:214
 rds_tcp_accept_worker+0x50/0x80 net/rds/tcp.c:515
 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
================================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: UBSAN: shift-out-of-bounds in load_balance
  2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
@ 2021-02-22 17:12 ` syzbot
  2021-02-23 12:03   ` Valentin Schneider
  2021-03-02  9:01 ` [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance() tip-bot2 for Valentin Schneider
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: syzbot @ 2021-02-22 17:12 UTC (permalink / raw)
  To: akpm, bp, hpa, linux-kernel, luto, mingo, peterz, syzkaller-bugs,
	tglx, valentin.schneider, x86

syzbot has found a reproducer for the following issue on:

HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com

================================================================================
UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
shift exponent 149 is too large for 64-bit type 'long unsigned int'
CPU: 0 PID: 12 Comm: ksoftirqd/0 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xfa/0x151 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 detach_tasks kernel/sched/fair.c:7712 [inline]
 load_balance.cold+0x1d/0x2e kernel/sched/fair.c:9641
 rebalance_domains+0x5cc/0xdb0 kernel/sched/fair.c:10029
 __do_softirq+0x29b/0x9f6 kernel/softirq.c:343
 run_ksoftirqd kernel/softirq.c:650 [inline]
 run_ksoftirqd+0x2d/0x60 kernel/softirq.c:642
 smpboot_thread_fn+0x655/0x9e0 kernel/smpboot.c:165
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
================================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 12 Comm: ksoftirqd/0 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xfa/0x151 lib/dump_stack.c:120
 panic+0x306/0x73d kernel/panic.c:231
 ubsan_epilogue+0x54/0x5a lib/ubsan.c:162
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 detach_tasks kernel/sched/fair.c:7712 [inline]
 load_balance.cold+0x1d/0x2e kernel/sched/fair.c:9641
 rebalance_domains+0x5cc/0xdb0 kernel/sched/fair.c:10029
 __do_softirq+0x29b/0x9f6 kernel/softirq.c:343
 run_ksoftirqd kernel/softirq.c:650 [inline]
 run_ksoftirqd+0x2d/0x60 kernel/softirq.c:642
 smpboot_thread_fn+0x655/0x9e0 kernel/smpboot.c:165
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

======================================================



* Re: UBSAN: shift-out-of-bounds in load_balance
  2021-02-22 17:12 ` syzbot
@ 2021-02-23 12:03   ` Valentin Schneider
  2021-02-23 13:45     ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2021-02-23 12:03 UTC (permalink / raw)
  To: syzbot, akpm, bp, hpa, linux-kernel, luto, mingo, peterz,
	syzkaller-bugs, tglx, x86, Vincent Guittot


+Vincent

On 22/02/21 09:12, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
> dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
>
> ================================================================================
> UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
> shift exponent 149 is too large for 64-bit type 'long unsigned int'

That 149 is surprising.

sd->cache_nice_tries is \in {1, 2}, and sd->nr_balance_failed should be in
the same ballpark.

A successful load_balance() resets it to 0; a failed one increments
it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
balance, which will either set it to sd->cache_nice_tries+1 or reset it to
0. There is this one condition that could let it creep up uncontrollably:

  /*
   * Don't kick the active_load_balance_cpu_stop,
   * if the curr task on busiest CPU can't be
   * moved to this_cpu:
   */
  if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
          raw_spin_unlock_irqrestore(&busiest->lock,
                                      flags);
          goto out_one_pinned;
  }

So despite the resulting sd->balance_interval increase, repeatedly hitting
this might yield the above. Would we then want something like this?

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a8bd7b13634..b65c24b5ae91 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7422,6 +7422,11 @@ struct lb_env {
 	struct list_head	tasks;
 };
 
+static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
+{
+	return sd->cache_nice_tries + 3;
+}
+
 /*
  * Is this task likely cache-hot:
  */
@@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
 	 * threads on a system with spare capacity
 	 */
 	if ((env->migration_type == migrate_task) &&
-	    (sd->nr_balance_failed > sd->cache_nice_tries+2))
+	    (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
 		return 1;
 
 	return 0;
@@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		 * frequent, pollute the failure counter causing
 		 * excessive cache_hot migrations and active balances.
 		 */
-		if (idle != CPU_NEWLY_IDLE)
-			sd->nr_balance_failed++;
+		if (idle != CPU_NEWLY_IDLE) {
+			sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
+						    sd_balance_failed_cap(sd));
+		}
 
 		if (need_active_balance(&env)) {
 			unsigned long flags;



* Re: UBSAN: shift-out-of-bounds in load_balance
  2021-02-23 12:03   ` Valentin Schneider
@ 2021-02-23 13:45     ` Vincent Guittot
  2021-02-23 15:51       ` Valentin Schneider
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2021-02-23 13:45 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: syzbot, Andrew Morton, Borislav Petkov, H. Peter Anvin,
	linux-kernel, luto, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, x86

On Tue, 23 Feb 2021 at 13:03, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> +Vincent
>
> On 22/02/21 09:12, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
> >
> > ================================================================================
> > UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
> > shift exponent 149 is too large for 64-bit type 'long unsigned int'
>
> That 149 is surprising.

Yes, surprising. But is it really a problem in itself? Shifting left
would be a problem because of the overflow, but here we shift right to
divide, and the result is correct.

Besides this, it seems that a significant number of the previous attempts
to balance load were done with another migration_type; otherwise it would
have raised a problem earlier (at 65) if the previous LBs had also been
migrate_load. It would be good to understand why the 148 previous
ones failed

>
> sd->cache_nice_tries is \in {1, 2}, and sd->nr_balance_failed should be in
> the same ballpark.
>
> A successful load_balance() resets it to 0; a failed one increments
> it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
> balance, which will either set it to sd->cache_nice_tries+1 or reset it to
> 0. There is this one condition that could let it creep up uncontrollably:
>
>   /*
>    * Don't kick the active_load_balance_cpu_stop,
>    * if the curr task on busiest CPU can't be
>    * moved to this_cpu:
>    */
>   if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
>           raw_spin_unlock_irqrestore(&busiest->lock,
>                                       flags);
>           goto out_one_pinned;
>   }
>
> So despite the resulting sd->balance_interval increase, repeatedly hitting
> this might yield the above. Would we then want something like this?
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a8bd7b13634..b65c24b5ae91 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7422,6 +7422,11 @@ struct lb_env {
>         struct list_head        tasks;
>  };
>
> +static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
> +{
> +       return sd->cache_nice_tries + 3;
> +}
> +
>  /*
>   * Is this task likely cache-hot:
>   */
> @@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
>          * threads on a system with spare capacity
>          */
>         if ((env->migration_type == migrate_task) &&
> -           (sd->nr_balance_failed > sd->cache_nice_tries+2))
> +           (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
>                 return 1;
>
>         return 0;
> @@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>                  * frequent, pollute the failure counter causing
>                  * excessive cache_hot migrations and active balances.
>                  */
> -               if (idle != CPU_NEWLY_IDLE)
> -                       sd->nr_balance_failed++;
> +               if (idle != CPU_NEWLY_IDLE) {
> +                       sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
> +                                                   sd_balance_failed_cap(sd));

nr_balance_failed is an interesting metric that we sometimes want to
monitor, and we would like to be able to divide by more than
2^(sd->cache_nice_tries + 3).

If we really want to prevent the out-of-bounds shift, the below is more
appropriate IMO:

index 636741fa27c9..4d0b3fa30849 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7707,7 +7707,7 @@ static int detach_tasks(struct lb_env *env)
                         * migrate.
                         */

-                       if ((load >> env->sd->nr_balance_failed) > env->imbalance)
+                       if ((load >> min_t(int, env->sd->nr_balance_failed, BITS_PER_LONG)) > env->imbalance)
                                goto next;

                        env->imbalance -= load;


> +               }
>
>                 if (need_active_balance(&env)) {
>                         unsigned long flags;
>


* Re: UBSAN: shift-out-of-bounds in load_balance
  2021-02-23 13:45     ` Vincent Guittot
@ 2021-02-23 15:51       ` Valentin Schneider
  0 siblings, 0 replies; 8+ messages in thread
From: Valentin Schneider @ 2021-02-23 15:51 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: syzbot, Andrew Morton, Borislav Petkov, H. Peter Anvin,
	linux-kernel, luto, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, x86

On 23/02/21 14:45, Vincent Guittot wrote:
> On Tue, 23 Feb 2021 at 13:03, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
>>
>>
>> +Vincent
>>
>> On 22/02/21 09:12, syzbot wrote:
>> > syzbot has found a reproducer for the following issue on:
>> >
>> > HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
>> > git tree:       upstream
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
>> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
>> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
>> >
>> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> > Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
>> >
>> > ================================================================================
>> > UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
>> > shift exponent 149 is too large for 64-bit type 'long unsigned int'
>>
>> That 149 is surprising.
>
> Yes, surprising. But is it really a problem in itself? Shifting left
> would be a problem because of the overflow, but here we shift right to
> divide, and the result is correct.
>

I would tend to think so, but the UB seems to apply regardless of the
shifting direction:

"""
If the value of the right operand is negative or is greater than or equal
to the width of the promoted left operand, the behavior is undefined.
"""

> Besides this, it seems that a significant number of the previous attempts
> to balance load were done with another migration_type; otherwise it would
> have raised a problem earlier (at 65) if the previous LBs had also been
> migrate_load. It would be good to understand why the 148 previous
> ones failed
>
>>
>> sd->cache_nice_tries is \in {1, 2}, and sd->nr_balance_failed should be in
>> the same ballpark.
>>
>> A successful load_balance() resets it to 0; a failed one increments
>> it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
>> balance, which will either set it to sd->cache_nice_tries+1 or reset it to
>> 0. There is this one condition that could let it creep up uncontrollably:
>>
>>   /*
>>    * Don't kick the active_load_balance_cpu_stop,
>>    * if the curr task on busiest CPU can't be
>>    * moved to this_cpu:
>>    */
>>   if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
>>           raw_spin_unlock_irqrestore(&busiest->lock,
>>                                       flags);
>>           goto out_one_pinned;
>>   }
>>
>> So despite the resulting sd->balance_interval increase, repeatedly hitting
>> this might yield the above. Would we then want something like this?
>>
>> ---
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8a8bd7b13634..b65c24b5ae91 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7422,6 +7422,11 @@ struct lb_env {
>>         struct list_head        tasks;
>>  };
>>
>> +static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
>> +{
>> +       return sd->cache_nice_tries + 3;
>> +}
>> +
>>  /*
>>   * Is this task likely cache-hot:
>>   */
>> @@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
>>          * threads on a system with spare capacity
>>          */
>>         if ((env->migration_type == migrate_task) &&
>> -           (sd->nr_balance_failed > sd->cache_nice_tries+2))
>> +           (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
>>                 return 1;
>>
>>         return 0;
>> @@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>                  * frequent, pollute the failure counter causing
>>                  * excessive cache_hot migrations and active balances.
>>                  */
>> -               if (idle != CPU_NEWLY_IDLE)
>> -                       sd->nr_balance_failed++;
>> +               if (idle != CPU_NEWLY_IDLE) {
>> +                       sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
>> +                                                   sd_balance_failed_cap(sd));
>
> nr_balance_failed is an interesting metric that we sometimes want to
> monitor, and we would like to be able to divide by more than
> 2^(sd->cache_nice_tries + 3).
>
> If we really want to prevent the out-of-bounds shift, the below is more
> appropriate IMO:
>
> index 636741fa27c9..4d0b3fa30849 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7707,7 +7707,7 @@ static int detach_tasks(struct lb_env *env)
>                          * migrate.
>                          */
>
> -                       if ((load >> env->sd->nr_balance_failed) > env->imbalance)
> +                       if ((load >> min_t(int, env->sd->nr_balance_failed, BITS_PER_LONG)) > env->imbalance)
>                                 goto next;
>

From the UB definition above, it sounds like we need to cap at

  BITS_PER_TYPE(unsigned long) - 1

i.e. something like

  #define shr_bound(val, shift) \
          (val >> min_t(int, shift, BITS_PER_TYPE(val) - 1))

>                         env->imbalance -= load;
>
>
>> +               }
>>
>>                 if (need_active_balance(&env)) {
>>                         unsigned long flags;
>>


* [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance()
  2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
  2021-02-22 17:12 ` syzbot
@ 2021-03-02  9:01 ` tip-bot2 for Valentin Schneider
  2021-03-03  9:49 ` tip-bot2 for Valentin Schneider
  2021-03-06 11:42 ` tip-bot2 for Valentin Schneider
  3 siblings, 0 replies; 8+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2021-03-02  9:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+d7581744d5fd27c9fbe1, Valentin Schneider,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     95d3920902d198b78d489db7e11732661ef79357
Gitweb:        https://git.kernel.org/tip/95d3920902d198b78d489db7e11732661ef79357
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Thu, 25 Feb 2021 17:56:56 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 18:17:26 +01:00

sched/fair: Fix shift-out-of-bounds in load_balance()

Syzbot reported a handful of occurrences where an sd->nr_balance_failed can
grow to much higher values than one would expect.

A successful load_balance() resets it to 0; a failed one increments
it. Once it gets to sd->cache_nice_tries + 3, this *should* trigger an
active balance, which will either set it to sd->cache_nice_tries+1 or reset
it to 0. However, in case the to-be-active-balanced task is not allowed to
run on env->dst_cpu, then the increment is done without any further
modification.

This could then be repeated ad nauseam, and would explain the absurdly high
values reported by syzbot (86, 149). VincentG noted there is value in
letting sd->cache_nice_tries grow, so the shift itself should be
fixed. That means preventing:

  """
  If the value of the right operand is negative or is greater than or equal
  to the width of the promoted left operand, the behavior is undefined.
  """

Thus we need to cap the shift exponent to
  BITS_PER_TYPE(typeof(lefthand)) - 1.

I had a look around for other similar cases via coccinelle:

  @expr@
  position pos;
  expression E1;
  expression E2;
  @@
  (
  E1 >> E2@pos
  |
  E1 << E2@pos
  )

  @cst depends on expr@
  position pos;
  expression expr.E1;
  constant cst;
  @@
  (
  E1 >> cst@pos
  |
  E1 << cst@pos
  )

  @script:python depends on !cst@
  pos << expr.pos;
  exp << expr.E2;
  @@
  # Dirty hack to ignore constexpr
  if exp.upper() != exp:
     coccilib.report.print_report(pos[0], "Possible UB shift here")

The only other match in kernel/sched is rq_clock_thermal() which employs
sched_thermal_decay_shift, and that exponent is already capped to 10, so
that one is fine.

Fixes: 5a7f55590467 ("sched/fair: Relax constraint on task's load during load balance")
Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lore.kernel.org/r/000000000000ffac1205b9a2112f@google.com
---
 kernel/sched/fair.c  | 3 +--
 kernel/sched/sched.h | 7 +++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b2fac0..1af51a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7722,8 +7722,7 @@ static int detach_tasks(struct lb_env *env)
 			 * scheduler fails to find a good waiting task to
 			 * migrate.
 			 */
-
-			if ((load >> env->sd->nr_balance_failed) > env->imbalance)
+			if (shr_bound(load, env->sd->nr_balance_failed) > env->imbalance)
 				goto next;
 
 			env->imbalance -= load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0ddc9a6..bb8bb06 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -205,6 +205,13 @@ static inline void update_avg(u64 *avg, u64 sample)
 }
 
 /*
+ * Shifting a value by an exponent greater *or equal* to the size of said value
+ * is UB; cap at size-1.
+ */
+#define shr_bound(val, shift)							\
+	(val >> min_t(typeof(shift), shift, BITS_PER_TYPE(typeof(val)) - 1))
+
+/*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *
  * This is actually gross. :(


* [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance()
  2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
  2021-02-22 17:12 ` syzbot
  2021-03-02  9:01 ` [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance() tip-bot2 for Valentin Schneider
@ 2021-03-03  9:49 ` tip-bot2 for Valentin Schneider
  2021-03-06 11:42 ` tip-bot2 for Valentin Schneider
  3 siblings, 0 replies; 8+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2021-03-03  9:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+d7581744d5fd27c9fbe1, Valentin Schneider,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9ab8f620eea3a797667add72eae5e235d2ca2fc8
Gitweb:        https://git.kernel.org/tip/9ab8f620eea3a797667add72eae5e235d2ca2fc8
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Thu, 25 Feb 2021 17:56:56 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 03 Mar 2021 10:33:00 +01:00

sched/fair: Fix shift-out-of-bounds in load_balance()

Syzbot reported a handful of occurrences where an sd->nr_balance_failed can
grow to much higher values than one would expect.

A successful load_balance() resets it to 0; a failed one increments
it. Once it gets to sd->cache_nice_tries + 3, this *should* trigger an
active balance, which will either set it to sd->cache_nice_tries+1 or reset
it to 0. However, in case the to-be-active-balanced task is not allowed to
run on env->dst_cpu, then the increment is done without any further
modification.

This could then be repeated ad nauseam, and would explain the absurdly high
values reported by syzbot (86, 149). VincentG noted there is value in
letting sd->cache_nice_tries grow, so the shift itself should be
fixed. That means preventing:

  """
  If the value of the right operand is negative or is greater than or equal
  to the width of the promoted left operand, the behavior is undefined.
  """

Thus we need to cap the shift exponent to
  BITS_PER_TYPE(typeof(lefthand)) - 1.

I had a look around for other similar cases via coccinelle:

  @expr@
  position pos;
  expression E1;
  expression E2;
  @@
  (
  E1 >> E2@pos
  |
  E1 << E2@pos
  )

  @cst depends on expr@
  position pos;
  expression expr.E1;
  constant cst;
  @@
  (
  E1 >> cst@pos
  |
  E1 << cst@pos
  )

  @script:python depends on !cst@
  pos << expr.pos;
  exp << expr.E2;
  @@
  # Dirty hack to ignore constexpr
  if exp.upper() != exp:
     coccilib.report.print_report(pos[0], "Possible UB shift here")

The only other match in kernel/sched is rq_clock_thermal() which employs
sched_thermal_decay_shift, and that exponent is already capped to 10, so
that one is fine.

Fixes: 5a7f55590467 ("sched/fair: Relax constraint on task's load during load balance")
Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lore.kernel.org/r/000000000000ffac1205b9a2112f@google.com
---
 kernel/sched/fair.c  | 3 +--
 kernel/sched/sched.h | 7 +++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b2fac0..1af51a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7722,8 +7722,7 @@ static int detach_tasks(struct lb_env *env)
 			 * scheduler fails to find a good waiting task to
 			 * migrate.
 			 */
-
-			if ((load >> env->sd->nr_balance_failed) > env->imbalance)
+			if (shr_bound(load, env->sd->nr_balance_failed) > env->imbalance)
 				goto next;
 
 			env->imbalance -= load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0ddc9a6..bb8bb06 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -205,6 +205,13 @@ static inline void update_avg(u64 *avg, u64 sample)
 }
 
 /*
+ * Shifting a value by an exponent greater *or equal* to the size of said value
+ * is UB; cap at size-1.
+ */
+#define shr_bound(val, shift)							\
+	(val >> min_t(typeof(shift), shift, BITS_PER_TYPE(typeof(val)) - 1))
+
+/*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *
  * This is actually gross. :(


* [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance()
  2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
                   ` (2 preceding siblings ...)
  2021-03-03  9:49 ` tip-bot2 for Valentin Schneider
@ 2021-03-06 11:42 ` tip-bot2 for Valentin Schneider
  3 siblings, 0 replies; 8+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+d7581744d5fd27c9fbe1, Valentin Schneider,
	Peter Zijlstra (Intel),
	Ingo Molnar, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     39a2a6eb5c9b66ea7c8055026303b3aa681b49a5
Gitweb:        https://git.kernel.org/tip/39a2a6eb5c9b66ea7c8055026303b3aa681b49a5
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Thu, 25 Feb 2021 17:56:56 
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:22 +01:00

sched/fair: Fix shift-out-of-bounds in load_balance()

Syzbot reported a handful of occurrences where an sd->nr_balance_failed can
grow to much higher values than one would expect.

A successful load_balance() resets it to 0; a failed one increments
it. Once it gets to sd->cache_nice_tries + 3, this *should* trigger an
active balance, which will either set it to sd->cache_nice_tries+1 or reset
it to 0. However, in case the to-be-active-balanced task is not allowed to
run on env->dst_cpu, then the increment is done without any further
modification.

This could then be repeated ad nauseam, and would explain the absurdly high
values reported by syzbot (86, 149). VincentG noted there is value in
letting sd->cache_nice_tries grow, so the shift itself should be
fixed. That means preventing:

  """
  If the value of the right operand is negative or is greater than or equal
  to the width of the promoted left operand, the behavior is undefined.
  """

Thus we need to cap the shift exponent to
  BITS_PER_TYPE(typeof(lefthand)) - 1.

I had a look around for other similar cases via coccinelle:

  @expr@
  position pos;
  expression E1;
  expression E2;
  @@
  (
  E1 >> E2@pos
  |
  E1 << E2@pos
  )

  @cst depends on expr@
  position pos;
  expression expr.E1;
  constant cst;
  @@
  (
  E1 >> cst@pos
  |
  E1 << cst@pos
  )

  @script:python depends on !cst@
  pos << expr.pos;
  exp << expr.E2;
  @@
  # Dirty hack to ignore constexpr
  if exp.upper() != exp:
     coccilib.report.print_report(pos[0], "Possible UB shift here")

The only other match in kernel/sched is rq_clock_thermal() which employs
sched_thermal_decay_shift, and that exponent is already capped to 10, so
that one is fine.

Fixes: 5a7f55590467 ("sched/fair: Relax constraint on task's load during load balance")
Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lore.kernel.org/r/000000000000ffac1205b9a2112f@google.com
---
 kernel/sched/fair.c  | 3 +--
 kernel/sched/sched.h | 7 +++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b2fac0..1af51a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7722,8 +7722,7 @@ static int detach_tasks(struct lb_env *env)
 			 * scheduler fails to find a good waiting task to
 			 * migrate.
 			 */
-
-			if ((load >> env->sd->nr_balance_failed) > env->imbalance)
+			if (shr_bound(load, env->sd->nr_balance_failed) > env->imbalance)
 				goto next;
 
 			env->imbalance -= load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0ddc9a6..bb8bb06 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -205,6 +205,13 @@ static inline void update_avg(u64 *avg, u64 sample)
 }
 
 /*
+ * Shifting a value by an exponent greater *or equal* to the size of said value
+ * is UB; cap at size-1.
+ */
+#define shr_bound(val, shift)							\
+	(val >> min_t(typeof(shift), shift, BITS_PER_TYPE(typeof(val)) - 1))
+
+/*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *
  * This is actually gross. :(

