All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: syzbot <syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	luto@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	syzkaller-bugs@googlegroups.com,
	Thomas Gleixner <tglx@linutronix.de>, x86 <x86@kernel.org>
Subject: Re: UBSAN: shift-out-of-bounds in load_balance
Date: Tue, 23 Feb 2021 14:45:33 +0100	[thread overview]
Message-ID: <CAKfTPtAkzDWfqAP=Fb+4B+PBUNN_7oTdZ3Cs+wLdfrJNa_ymTQ@mail.gmail.com> (raw)
In-Reply-To: <jhjlfbfhty2.mognet@arm.com>

On Tue, 23 Feb 2021 at 13:03, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> +Vincent
>
> On 22/02/21 09:12, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit:    31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com
> >
> > ================================================================================
> > UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
> > shift exponent 149 is too large for 64-bit type 'long unsigned int'
>
> That 149 is surprising.

Yes, surprising. But is it really a problem in itself  ? shifting left
 would be a problem because of the overflow but here we shift right to
divide and the result is correct

Beside this, it seems that a significant number of previous attempts
to balance load has been done with another migration_type otherwise it
would  have raised a problem earlier (at 65) if previous LB were also
migration_load. It would be good to understand why the 148 previous
ones failed

>
> sd->cache_nice_tries is \in {1, 2}, and sd->nr_balanced_failed should be in
> the same ballpark.
>
> A successful load_balance() resets it to 0; a failed one increments
> it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
> balance, which will either set it to sd->cache_nice_tries+1 or reset it to
> 0. There is this one condition that could let it creep up uncontrollably:
>
>   /*
>    * Don't kick the active_load_balance_cpu_stop,
>    * if the curr task on busiest CPU can't be
>    * moved to this_cpu:
>    */
>   if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
>           raw_spin_unlock_irqrestore(&busiest->lock,
>                                       flags);
>           goto out_one_pinned;
>   }
>
> So despite the resulting sd->balance_interval increase, repeatedly hitting
> this might yield the above. Would we then want something like this?
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a8bd7b13634..b65c24b5ae91 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7422,6 +7422,11 @@ struct lb_env {
>         struct list_head        tasks;
>  };
>
> +static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
> +{
> +       return sd->cache_nice_tries + 3;
> +}
> +
>  /*
>   * Is this task likely cache-hot:
>   */
> @@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
>          * threads on a system with spare capacity
>          */
>         if ((env->migration_type == migrate_task) &&
> -           (sd->nr_balance_failed > sd->cache_nice_tries+2))
> +           (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
>                 return 1;
>
>         return 0;
> @@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>                  * frequent, pollute the failure counter causing
>                  * excessive cache_hot migrations and active balances.
>                  */
> -               if (idle != CPU_NEWLY_IDLE)
> -                       sd->nr_balance_failed++;
> +               if (idle != CPU_NEWLY_IDLE) {
> +                       sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
> +                                                   sd_balance_failed_cap(sd));

nr_balance_failed is an interesting metric that we want to monitor
sometimes and we would like to be able to divide higher than
2^(sd->cache_nice_tries + 3).

If we really want to prevent out of bound shift, The below is more
appropriate IMO:

index 636741fa27c9..4d0b3fa30849 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7707,7 +7707,7 @@ static int detach_tasks(struct lb_env *env)
                         * migrate.
                         */

-                       if ((load >> env->sd->nr_balance_failed) >
env->imbalance)
+                       if ((load >> min_t(int,
env->sd->nr_balance_failed, BITS_PER_LONG)) > env->imbalance)
                                goto next;

                        env->imbalance -= load;


> +               }
>
>                 if (need_active_balance(&env)) {
>                         unsigned long flags;
>

  reply	other threads:[~2021-02-23 13:46 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-24  9:30 UBSAN: shift-out-of-bounds in load_balance syzbot
2021-02-22 17:12 ` syzbot
2021-02-23 12:03   ` Valentin Schneider
2021-02-23 13:45     ` Vincent Guittot [this message]
2021-02-23 15:51       ` Valentin Schneider
2021-03-02  9:01 ` [tip: sched/core] sched/fair: Fix shift-out-of-bounds in load_balance() tip-bot2 for Valentin Schneider
2021-03-03  9:49 ` tip-bot2 for Valentin Schneider
2021-03-06 11:42 ` tip-bot2 for Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtAkzDWfqAP=Fb+4B+PBUNN_7oTdZ3Cs+wLdfrJNa_ymTQ@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=syzbot+d7581744d5fd27c9fbe1@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    --cc=valentin.schneider@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.