From: Valentin Schneider <valentin.schneider@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: syzbot <syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	luto@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	syzkaller-bugs@googlegroups.com,
	Thomas Gleixner <tglx@linutronix.de>, x86 <x86@kernel.org>
Subject: Re: UBSAN: shift-out-of-bounds in load_balance
Date: Tue, 23 Feb 2021 15:51:29 +0000	[thread overview]
Message-ID: <jhj4ki27pfi.mognet@arm.com>
In-Reply-To: <CAKfTPtAkzDWfqAP=Fb+4B+PBUNN_7oTdZ3Cs+wLdfrJNa_ymTQ@mail.gmail.com>

On 23/02/21 14:45, Vincent Guittot wrote:
> On Tue, 23 Feb 2021 at 13:03, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
>>
>>
>> +Vincent
>>
>> On 22/02/21 09:12, syzbot wrote:
>> > syzbot has found a reproducer for the following issue on:
>> >
>> > HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
>> > git tree:       upstream
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
>> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
>> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
>> >
>> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> > Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
>> >
>> > ================================================================================
>> > UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
>> > shift exponent 149 is too large for 64-bit type 'long unsigned int'
>>
>> That 149 is surprising.
>
> Yes, surprising. But is it really a problem in itself? Shifting left
> would be a problem because of the overflow, but here we shift right to
> divide, and the result is correct.
>

I would tend to think so, but the UB seems to apply regardless of the
shifting direction:

"""
If the value of the right operand is negative or is greater than or equal
to the width of the promoted left operand, the behavior is undefined.
"""

> Besides this, it seems that a significant number of the previous attempts
> to balance load were made with another migration_type; otherwise this
> would have raised a problem earlier (at 65) if the previous LBs had also
> been migrate_load. It would be good to understand why the 148 previous
> ones failed.
>
>>
>> sd->cache_nice_tries is \in {1, 2}, and sd->nr_balance_failed should be in
>> the same ballpark.
>>
>> A successful load_balance() resets it to 0; a failed one increments
>> it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
>> balance, which will either set it to sd->cache_nice_tries+1 or reset it to
>> 0. There is this one condition that could let it creep up uncontrollably:
>>
>>   /*
>>    * Don't kick the active_load_balance_cpu_stop,
>>    * if the curr task on busiest CPU can't be
>>    * moved to this_cpu:
>>    */
>>   if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
>>           raw_spin_unlock_irqrestore(&busiest->lock,
>>                                       flags);
>>           goto out_one_pinned;
>>   }
>>
>> So despite the resulting sd->balance_interval increase, repeatedly hitting
>> this might yield the above. Would we then want something like this?
>>
>> ---
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8a8bd7b13634..b65c24b5ae91 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7422,6 +7422,11 @@ struct lb_env {
>>         struct list_head        tasks;
>>  };
>>
>> +static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
>> +{
>> +       return sd->cache_nice_tries + 3;
>> +}
>> +
>>  /*
>>   * Is this task likely cache-hot:
>>   */
>> @@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
>>          * threads on a system with spare capacity
>>          */
>>         if ((env->migration_type == migrate_task) &&
>> -           (sd->nr_balance_failed > sd->cache_nice_tries+2))
>> +           (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
>>                 return 1;
>>
>>         return 0;
>> @@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>                  * frequent, pollute the failure counter causing
>>                  * excessive cache_hot migrations and active balances.
>>                  */
>> -               if (idle != CPU_NEWLY_IDLE)
>> -                       sd->nr_balance_failed++;
>> +               if (idle != CPU_NEWLY_IDLE) {
>> +                       sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
>> +                                                   sd_balance_failed_cap(sd));
>
> nr_balance_failed is an interesting metric that we sometimes want to
> monitor, and we would like to be able to divide by more than
> 2^(sd->cache_nice_tries + 3).
>
> If we really want to prevent the out-of-bounds shift, the below is more
> appropriate IMO:
>
> index 636741fa27c9..4d0b3fa30849 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7707,7 +7707,7 @@ static int detach_tasks(struct lb_env *env)
>                          * migrate.
>                          */
>
> -                       if ((load >> env->sd->nr_balance_failed) > env->imbalance)
> +                       if ((load >> min_t(int, env->sd->nr_balance_failed, BITS_PER_LONG)) > env->imbalance)
>                                 goto next;
>

From the UB definition above, it sounds like we need to cap the shift at

  BITS_PER_TYPE(unsigned long) - 1

i.e. something like

  #define shr_bound(val, shift) \
          (val >> min_t(int, shift, BITS_PER_TYPE(val) - 1))
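
As a quick sanity check, here is a minimal userspace sketch of that idea;
MIN and BITS_PER_TYPE below are local stand-ins for the kernel's min_t()
and BITS_PER_TYPE() helpers:

  #include <stdio.h>

  #define BITS_PER_TYPE(v)      (sizeof(v) * 8)
  #define MIN(a, b)             ((a) < (b) ? (a) : (b))
  /* Clamp the exponent to width-1 so the shift is always well defined. */
  #define shr_bound(val, shift) \
          ((val) >> MIN((unsigned long)(shift), BITS_PER_TYPE(val) - 1))

  int main(void)
  {
          unsigned long load = 123456;

          /* Exponent 149 is clamped to 63: well defined, and the result
           * is still 0, i.e. "this load is negligible". */
          printf("%lu\n", shr_bound(load, 149));
          return 0;
  }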

>                         env->imbalance -= load;
>
>
>> +               }
>>
>>                 if (need_active_balance(&env)) {
>>                         unsigned long flags;
>>

Thread overview: 8+ messages
2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
2021-02-22 17:12 ` syzbot
2021-02-23 12:03   ` Valentin Schneider
2021-02-23 13:45     ` Vincent Guittot
2021-02-23 15:51       ` Valentin Schneider [this message]
2021-03-02  9:01 ` [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance() tip-bot2 for Valentin Schneider
2021-03-03  9:49 ` tip-bot2 for Valentin Schneider
2021-03-06 11:42 ` tip-bot2 for Valentin Schneider
