All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/5] cpumask: fix comment of cpumask_xxx
Date: Mon, 6 Mar 2023 13:27:51 -0800	[thread overview]
Message-ID: <CAHk-=whsQuc-S4vuYpvi013-81UNvNc8CoTMC5AKOMT-uq_7Og@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wjR6SGJhhHT6NzHcZHBJ3p5Y_JPvpQPjkeNQE+emivS6Q@mail.gmail.com>

On Mon, Mar 6, 2023 at 12:43 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And when I say "all", I might be missing some others that don't match
> that exact pattern, of course.

Indeed.

I'm looking at wg_cpumask_next_online(), and this:

        while (unlikely(!cpumask_test_cpu(cpu, cpu_online_mask)))
                cpu = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits;

seems very dodgy indeed. I'm not convinced it might not cause an endless loop.

The code does seem to expect that the modulus ends up being zero, and
thus starting from the beginning. But that's not necessarily at all
true.

For example, maybe you have CONFIG_NR_CPUS set to some odd value like
6, because who knows, you decided that you are on some mobile platform
that has a BIG.little setup with 4 little cores and 2 big ones.

But your're then running that on a machine that only has 4 cores.

And to make matters worse, you have decided to offline all but one of
those cores, for whatever reason. So only bit #0 is actually set in
'cpu_online_mask'.

End result:

 - nr_cpumask_bits is 4, because that's how many CPU cores you have.

 - cpumask_next(cpu, cpu_online_mask) will return 6 if 'cpu' >=0,
because it's looking for the next online CPU, doesn't find it, and it
uses that optimized version that uses the compile-time constant of
maximum CPU's rather than bothering with some run-time constant.

 - cpumask_test_cpu(2, cpu_online_mask) returns false, because CPU#2
is not online. It *could* be, but it's not.

now that loop is:

        while (unlikely(cpu2 is offline))
                cpu = 6 % 4;

and it will just loop forever on trying to get the "next" cpu, but
that will always remain that offline CPU #2.

I *think* what you want is just that same "find next wrapping CPU",
but I find it very odd that you use a potentially horrendously slow
modulus operation for it, and a while loop.

IOW, the more obvious code would have been just

        cpu = cpumask_next(cpu-1, cpu_online_mask);
        if (cpu >= nr_cpu_ids)
                cpu = cpumask_first(cpu_online_mask);

which seems much more straigthforward, and avoids both a pointless
while loop and a potentially expensive modulus.

And yes, I think that "cpumask_next -> cpumask_first" wrapping search
pattern should probably have a better helper, but even without the
helper the above code seems simpler and more logical than the odd code
that it has now.

Now, I suspect you have to hit a few fairly odd situations to actually
get the above endless loop, but I don't think they have to necessarily
be quite as made-up as my example above.

And I suspect you wrote it that odd way because you expect that the
first "cpumask_test_cpu()" always succeeds, so the whole while-loop
was then a complete afterthought, and any oddity there was just
"nobody cares". Except for the fact that it does seem to be possibly
an endless loop.

[ Adding back lkml to the participants, just because I think this was
an interesting case of cpumask operation mis-use. Even if it probably
doesn't cause issues in practice, it's the kind of code that *can*
cause problems ]

               Linus

  parent reply	other threads:[~2023-03-06 21:28 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 16:06 [PATCH 0/5] fix call cpumask_next() if no further cpus set Vernon Yang
2023-03-06 16:06 ` [PATCH 1/5] random: fix try_to_generate_entropy() " Vernon Yang
2023-03-06 16:26   ` Yury Norov
2023-03-06 16:06 ` [PATCH 2/5] wireguard: fix wg_cpumask_choose_online() " Vernon Yang
2023-03-06 16:06 ` [PATCH 3/5] scsi: lpfc: fix lpfc_cpu_affinity_check() " Vernon Yang
2023-03-06 18:48   ` Linus Torvalds
2023-03-06 20:09     ` Vernon Yang
2023-03-06 16:06 ` [PATCH 4/5] scsi: lpfc: fix lpfc_nvmet_setup_io_context() " Vernon Yang
2023-03-06 16:06 ` [PATCH 5/5] cpumask: fix comment of cpumask_xxx Vernon Yang
2023-03-06 16:39   ` Yury Norov
2023-03-06 16:44     ` Jason A. Donenfeld
2023-03-06 16:54       ` Yury Norov
2023-03-06 17:04         ` Jason A. Donenfeld
2023-03-06 17:45     ` Vernon Yang
2023-03-06 17:29   ` Linus Torvalds
2023-03-06 17:47     ` Linus Torvalds
2023-03-06 18:02       ` Linus Torvalds
     [not found]       ` <CAHmME9qN1EcfzE2ONA-B+F=8xaqZhqkEY=_npYHgtBpUFCj4Lw@mail.gmail.com>
     [not found]         ` <CAHk-=wjR6SGJhhHT6NzHcZHBJ3p5Y_JPvpQPjkeNQE+emivS6Q@mail.gmail.com>
2023-03-06 21:27           ` Linus Torvalds [this message]
2023-03-07 17:52             ` Jason A. Donenfeld
2023-03-06 18:13     ` Vernon Yang
2023-03-06 18:34       ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whsQuc-S4vuYpvi013-81UNvNc8CoTMC5AKOMT-uq_7Og@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.