From: Valentin Schneider <valentin.schneider@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: syzbot <syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	luto@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	syzkaller-bugs@googlegroups.com,
	Thomas Gleixner <tglx@linutronix.de>, x86 <x86@kernel.org>
Subject: Re: UBSAN: shift-out-of-bounds in load_balance
Date: Tue, 23 Feb 2021 15:51:29 +0000	[thread overview]
Message-ID: <jhj4ki27pfi.mognet@arm.com>
In-Reply-To: <CAKfTPtAkzDWfqAP=Fb+4B+PBUNN_7oTdZ3Cs+wLdfrJNa_ymTQ@mail.gmail.com>

On 23/02/21 14:45, Vincent Guittot wrote:
> On Tue, 23 Feb 2021 at 13:03, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
>>
>>
>> +Vincent
>>
>> On 22/02/21 09:12, syzbot wrote:
>> > syzbot has found a reproducer for the following issue on:
>> >
>> > HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
>> > git tree:       upstream
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
>> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
>> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
>> >
>> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> > Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
>> >
>> > ================================================================================
>> > UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
>> > shift exponent 149 is too large for 64-bit type 'long unsigned int'
>>
>> That 149 is surprising.
>
> Yes, surprising. But is it really a problem in itself? Shifting left
> would be a problem because of the overflow, but here we shift right to
> divide, and the result is correct.
>

I would tend to think so, but the UB seems to apply regardless of the
shifting direction:

"""
If the value of the right operand is negative or is greater than or equal
to the width of the promoted left operand, the behavior is undefined.
"""

> Besides this, it seems that a significant number of the previous attempts
> to balance load were made with another migration_type; otherwise this
> would have raised a problem earlier (at 65) if the previous LBs had also
> been migrate_load. It would be good to understand why the 148 previous
> ones failed.
>
>>
>> sd->cache_nice_tries is \in {1, 2}, and sd->nr_balance_failed should be in
>> the same ballpark.
>>
>> A successful load_balance() resets it to 0; a failed one increments
>> it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
>> balance, which will either set it to sd->cache_nice_tries+1 or reset it to
>> 0. There is this one condition that could let it creep up uncontrollably:
>>
>>   /*
>>    * Don't kick the active_load_balance_cpu_stop,
>>    * if the curr task on busiest CPU can't be
>>    * moved to this_cpu:
>>    */
>>   if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
>>           raw_spin_unlock_irqrestore(&busiest->lock,
>>                                       flags);
>>           goto out_one_pinned;
>>   }
>>
>> So despite the resulting sd->balance_interval increase, repeatedly hitting
>> this might yield the above. Would we then want something like this?
>>
>> ---
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8a8bd7b13634..b65c24b5ae91 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7422,6 +7422,11 @@ struct lb_env {
>>         struct list_head        tasks;
>>  };
>>
>> +static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
>> +{
>> +       return sd->cache_nice_tries + 3;
>> +}
>> +
>>  /*
>>   * Is this task likely cache-hot:
>>   */
>> @@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
>>          * threads on a system with spare capacity
>>          */
>>         if ((env->migration_type == migrate_task) &&
>> -           (sd->nr_balance_failed > sd->cache_nice_tries+2))
>> +           (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
>>                 return 1;
>>
>>         return 0;
>> @@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>                  * frequent, pollute the failure counter causing
>>                  * excessive cache_hot migrations and active balances.
>>                  */
>> -               if (idle != CPU_NEWLY_IDLE)
>> -                       sd->nr_balance_failed++;
>> +               if (idle != CPU_NEWLY_IDLE) {
>> +                       sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
>> +                                                   sd_balance_failed_cap(sd));
>
> nr_balance_failed is an interesting metric that we sometimes want to
> monitor, and we would like to be able to divide by more than
> 2^(sd->cache_nice_tries + 3).
>
> If we really want to prevent the out-of-bounds shift, the below is more
> appropriate IMO:
>
> index 636741fa27c9..4d0b3fa30849 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7707,7 +7707,7 @@ static int detach_tasks(struct lb_env *env)
>                          * migrate.
>                          */
>
> -                       if ((load >> env->sd->nr_balance_failed) > env->imbalance)
> +                       if ((load >> min_t(int, env->sd->nr_balance_failed, BITS_PER_LONG)) > env->imbalance)
>                                 goto next;
>

From the UB definition above, it sounds like we need to cap the shift at

  BITS_PER_TYPE(unsigned long) - 1

i.e. something like

  #define shr_bound(val, shift) \
          (val >> min_t(int, shift, BITS_PER_TYPE(val) - 1))
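
As a quick sanity check, here is a minimal userspace sketch of that idea;
MIN and BITS_PER_TYPE below are local stand-ins for the kernel's min_t()
and BITS_PER_TYPE() helpers:

  #include <stdio.h>

  #define BITS_PER_TYPE(v)      (sizeof(v) * 8)
  #define MIN(a, b)             ((a) < (b) ? (a) : (b))
  /* Clamp the exponent to width-1 so the shift is always well defined. */
  #define shr_bound(val, shift) \
          ((val) >> MIN((unsigned long)(shift), BITS_PER_TYPE(val) - 1))

  int main(void)
  {
          unsigned long load = 123456;

          /* Exponent 149 is clamped to 63: well defined, and the result
           * is still 0, i.e. "this load is negligible". */
          printf("%lu\n", shr_bound(load, 149));
          return 0;
  }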

>                         env->imbalance -= load;
>
>
>> +               }
>>
>>                 if (need_active_balance(&env)) {
>>                         unsigned long flags;
>>

Thread overview: 8+ messages
2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
2021-02-22 17:12 ` syzbot
2021-02-23 12:03   ` Valentin Schneider
2021-02-23 13:45     ` Vincent Guittot
2021-02-23 15:51       ` Valentin Schneider [this message]
2021-03-02  9:01 ` [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance() tip-bot2 for Valentin Schneider
2021-03-03  9:49 ` tip-bot2 for Valentin Schneider
2021-03-06 11:42 ` tip-bot2 for Valentin Schneider
