From: Thomas Gleixner <tglx@linutronix.de>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Eli Cohen <elic@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>
Subject: Re: system hang on start-up (mlx5?)
Date: Wed, 31 May 2023 00:17:59 +0200
Message-ID: <87sfbdh3ag.ffs@tglx>
In-Reply-To: <C34181E7-A515-4BD1-8C38-CB8BCF2D987D@oracle.com>

On Tue, May 30 2023 at 21:48, Chuck Lever III wrote:
>> On May 30, 2023, at 3:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> cpumask_copy(d, s)
>>   bitmap_copy(d, s, nbits = 32)
>>     len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
>> 
>> So it copies as many longs as required to cover nbits, i.e. it copies
>> any clobbered bits beyond nbits too. While that looks odd at the first
>> glance, that's just an optimization which is harmless.
>> 
>> for_each_cpu() finds the next set bit in a mask and breaks the loop once
>> bitnr >= small_cpumask_bits, which is nr_cpu_ids and should be 32 too.
>> 
>> I just booted a kernel with NR_CPUS=32:
>
> My system has only 12 CPUs. So every bit in your mask represents
> a present CPU, but on my system, only 0x00000fff are ever present.
>
> Therefore, on my system, any bit higher than bit 11 in a CPU mask
> will reference a CPU that is not present.

Correct....

Sorry, I missed the part that your machine has only 12 CPUs....

Now I can reproduce the wreckage even with that trivial test I did:

[    0.210089] setup_percpu: NR_CPUS:32 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:1
...
[    0.606591] smp: MASKBITS: 5555555555555555
[    0.607026] smp: CPUs: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

I'm way too tired to make sense of that right now. Will have a look at
it tomorrow with brain awake unless you beat me to it.
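
A minimal user-space sketch of the two behaviours discussed above, assuming
a 64-bit unsigned long. model_bitmap_copy() and model_collect_cpus() are
hypothetical stand-ins written for this illustration; they are not the
kernel's bitmap_copy() or for_each_cpu(), and only mimic the whole-word
copy and the bounded bit scan described in the quoted text.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Hypothetical model of a whole-word bitmap copy: it copies as many
 * longs as are needed to cover nbits, so bits beyond nbits in the
 * last word are copied along with the valid ones.
 */
static void model_bitmap_copy(unsigned long *dst, const unsigned long *src,
			      unsigned int nbits)
{
	size_t len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);

	memcpy(dst, src, len);
}

/*
 * Hypothetical model of a bounded scan: store every set bit number
 * below 'limit' in 'out' and return how many were found. The caller
 * must provide room for 'limit' entries.
 */
static unsigned int model_collect_cpus(const unsigned long *mask,
				       unsigned int limit, unsigned int *out)
{
	unsigned int n = 0;

	for (unsigned int bit = 0; bit < limit; bit++) {
		if (mask[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
			out[n++] = bit;
	}
	return n;
}
```

With a source word of 0x5555555555555555, copying with nbits = 12 still
transfers the whole word. Scanning the copy with a limit of 12 yields
0 2 4 6 8 10, while a limit of 32 yields 0 2 ... 30 as in the log above.
That is consistent with the loop being bounded by 32 rather than 12, but
the sketch does not show where such a bound would come from.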

That's one mystery but the other one is this:

[   71.273798][ T1185] irq_matrix_reserve_managed: MASKBITS:   ffffb1a74686bcd8

That's clearly a kernel address within the direct map. How does that end
up as content of a cpumask?
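
A rough sketch of the sanity check implied here, assuming x86-64 with
48-bit virtual addresses, where kernel addresses have bits 47..63 all
set. Both helpers are hypothetical and written only for this
illustration; neither exists in the kernel under these names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true if the first mask word has any bit set at
 * or above nr_cpu_ids, i.e. it names a CPU that cannot exist.
 */
static bool mask_has_stray_bits(uint64_t word, unsigned int nr_cpu_ids)
{
	if (nr_cpu_ids >= 64)
		return false;
	return (word >> nr_cpu_ids) != 0;
}

/*
 * Hypothetical heuristic, valid only under the 48-bit addressing
 * assumption stated above: bits 47..63 all set means the value lies
 * in the kernel half of the address space.
 */
static bool looks_like_kernel_address(uint64_t word)
{
	return (word >> 47) == 0x1ffff;
}
```

For nr_cpu_ids = 12, the value 0xffffb1a74686bcd8 fails the first check
and passes the second, whereas a valid mask such as 0xfff does the
opposite. The heuristic only shows the value is in the kernel half; it
cannot tell the direct map apart from other kernel regions.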

Thanks,

        tglx
