From: Chuck Lever III <chuck.lever@oracle.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Eli Cohen <elic@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: system hang on start-up (mlx5?)
Date: Wed, 31 May 2023 18:52:39 +0000
Message-ID: <7AECA9A6-1F7D-4F82-A4C1-83616D4D6566@oracle.com>
In-Reply-To: <87leh4fmsg.ffs@tglx>



> On May 31, 2023, at 1:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Wed, May 31 2023 at 15:06, Chuck Lever III wrote:
>>> On May 31, 2023, at 10:43 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>> mlx5_irq_alloc(af_desc)
>>> pci_msix_alloc_irq_at(af_desc)
>>>   msi_domain_alloc_irq_at(af_desc)
>>>     __msi_domain_alloc_irqs(af_desc)
>>> 1)      msidesc->affinity = kmemdup(af_desc);
>>>       __irq_domain_alloc_irqs()
>>>         __irq_domain_alloc_irqs(aff=msidesc->affinity)
>>>           irq_domain_alloc_irqs_locked(aff)
>>>             irq_domain_alloc_irqs_locked(aff)
>>>               irq_domain_alloc_descs(aff)
>>>                 alloc_desc(mask=&aff->mask)
>>>                   desc_smp_init(mask)
>>> 2)                    cpumask_copy(desc->irq_common_data.affinity, mask);
>>>               irq_domain_alloc_irqs_hierarchy()
>>>                 msi_domain_alloc()
>>>                   intel_irq_remapping_alloc()
>>>                     x86_vector_alloc_irqs()
>> 
>> It is x86_vector_alloc_irqs() where the struct irq_data is
>> fabricated that ends up in irq_matrix_reserve_managed().
> 
> Kinda fabricated :)
> 
>     irqd = irq_domain_get_irq_data(domain, virq + i);
> 
> That's finding the irqdata which is associated with the vector domain.
> That has been allocated earlier. The affinity mask is retrieved via:
> 
>    const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
> 
> which does:
> 
>      return irqd->common->affinity;
> 
> irqd->common points to desc->irq_common_data. The affinity there was
> copied in #2 above.
> 
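
For anyone following along, here is a simplified sketch of that chain
(paraphrasing <linux/irq.h>; most fields omitted):

        /* Simplified view; see <linux/irq.h> for the real definitions. */
        struct irq_common_data {
                cpumask_var_t           affinity; /* filled by the cpumask_copy() in #2 */
        };

        struct irq_data {
                struct irq_common_data  *common;  /* points at desc->irq_common_data */
        };

        /* irq_data_get_affinity_mask() then reduces to: */
        static inline const struct cpumask *
        irq_data_get_affinity_mask(struct irq_data *d)
        {
                return d->common->affinity;
        }
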
>>> This also ends up in the wrong place. That mlx code does:
>>> 
>>>  af_desc.is_managed = false;
>>> 
>>> but the allocation ends up allocating a managed vector.
>> 
>> That line was changed in 6.4-rc4 to address another bug,
>> and it avoids the crash by not calling into the misbehaving
>> code. It doesn't address the mlx5_core initialization issue
>> though, because as I said before, execution continues and
>> crashes in a similar scenario later on.
> 
> Ok.
> 
>> On my system, I've reverted that fix:
>> 
>> -       af_desc.is_managed = false;
>> +       af_desc.is_managed = 1;
>> 
>> so that we can continue debugging this crash.
> 
> Ah.
> 
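
(For context: struct irq_affinity_desc is just the cpumask plus the
managed bit -- roughly this, from <linux/interrupt.h>:)

        struct irq_affinity_desc {
                struct cpumask  mask;
                unsigned int    is_managed : 1;
        };
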
>>> Can you please instrument this along the call chain so we can see where
>>> or at least when this gets corrupted? Please print the relevant pointer
>>> addresses too so we can see whether that's consistent or not.
>> 
>> I will continue working on this today.
>> 
>>> But that's just the symptom, not the root cause. That code is perfectly
>>> fine when all callers use the proper cpumask functions.
>> 
>> Agreed: we're crashing here because of the extra bits
>> in the affinity mask, but those bits should not be set
>> in the first place.
> 
> Correct.
> 
>> I wasn't sure if for_each_cpu() was supposed to iterate
>> into non-present CPUs -- and I guess the answer
>> is yes, it will iterate the full length of the mask.
>> The caller is responsible for ensuring the mask is valid.
> 
> Yes, that's the assumption behind this constant optimization for the
> small-number-of-CPUs case. All other cases use nr_cpu_ids as the limit
> and won't go into non-possible CPUs. I didn't spot it last night either.
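
For the archives, my (simplified) reading of the v6.4-era
include/linux/cpumask.h, showing why stale bits past nr_cpu_ids still
get walked on small configurations:

        /* Paraphrased from include/linux/cpumask.h; simplified. */
        #if NR_CPUS <= BITS_PER_LONG
        /* Small configs iterate a compile-time constant number of bits,
         * so any stale bits beyond nr_cpu_ids are still visited. */
        #define small_cpumask_bits      ((unsigned int)NR_CPUS)
        #else
        /* Larger configs bound the walk by the runtime CPU count. */
        #define small_cpumask_bits      nr_cpu_ids
        #endif

        #define for_each_cpu(cpu, mask) \
                for_each_set_bit(cpu, cpumask_bits(mask), small_cpumask_bits)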

This addresses the problem for me with both is_managed = 1
and is_managed = false:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index db5687d9fec9..bcf5df316c8f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -570,11 +570,11 @@ int mlx5_irqs_request_vectors(struct mlx5_core_dev *dev, u16 *cpus, int nirqs,
        af_desc.is_managed = false;
        for (i = 0; i < nirqs; i++) {
+               cpumask_clear(&af_desc.mask);
                cpumask_set_cpu(cpus[i], &af_desc.mask);
                irq = mlx5_irq_request(dev, i + 1, &af_desc, rmap);
                if (IS_ERR(irq))
                        break;
-               cpumask_clear(&af_desc.mask);
                irqs[i] = irq;
        }
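
For the record, my understanding of why this helps: af_desc lives on
the stack, so its mask starts out uninitialized; the old code cleared
it only after a successful request, which left the very first
cpumask_set_cpu() ORing one valid bit into whatever garbage was
already there. A hypothetical distillation (not the mlx5 code itself):

        /* Hypothetical distillation of the bug, not the mlx5 code itself. */
        static void before_fix(u16 cpu)
        {
                struct irq_affinity_desc af_desc;     /* on stack: mask is garbage */

                af_desc.is_managed = false;           /* initializes only one field */
                cpumask_set_cpu(cpu, &af_desc.mask);  /* one valid bit amid garbage */
                /* The mask is later cpumask_copy()ed into desc->irq_common_data,
                 * and for_each_cpu() can walk bits for CPUs that don't exist. */
        }

        static void after_fix(u16 cpu)
        {
                struct irq_affinity_desc af_desc;

                af_desc.is_managed = false;
                cpumask_clear(&af_desc.mask);         /* zero the garbage first */
                cpumask_set_cpu(cpu, &af_desc.mask);  /* exactly one bit set */
        }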

If you agree this looks reasonable, I can package it with a
proper patch description and send it to Eli and Saeed.

--
Chuck Lever




Thread overview: 36+ messages
2023-05-03  1:03 system hang on start-up (mlx5?) Chuck Lever III
2023-05-03  6:34 ` Eli Cohen
2023-05-03 14:02   ` Chuck Lever III
2023-05-04  7:29     ` Leon Romanovsky
2023-05-04 19:02       ` Chuck Lever III
2023-05-04 23:38         ` Jason Gunthorpe
2023-05-07  5:23           ` Eli Cohen
2023-05-07  5:31         ` Eli Cohen
2023-05-27 20:16           ` Chuck Lever III
2023-05-29 21:20             ` Thomas Gleixner
2023-05-30 13:09               ` Chuck Lever III
2023-05-30 13:28                 ` Chuck Lever III
2023-05-30 13:48                   ` Eli Cohen
2023-05-30 13:51                     ` Chuck Lever III
2023-05-30 13:54                       ` Eli Cohen
2023-05-30 15:08                         ` Shay Drory
2023-05-31 14:15                           ` Chuck Lever III
2023-05-30 19:46                 ` Thomas Gleixner
2023-05-30 21:48                   ` Chuck Lever III
2023-05-30 22:17                     ` Thomas Gleixner
2023-05-31 14:43                     ` Thomas Gleixner
2023-05-31 15:06                       ` Chuck Lever III
2023-05-31 17:11                         ` Thomas Gleixner
2023-05-31 18:52                           ` Chuck Lever III [this message]
2023-05-31 19:19                             ` Thomas Gleixner
2023-05-16 19:23         ` Chuck Lever III
2023-05-23 14:20           ` Linux regression tracking (Thorsten Leemhuis)
2023-05-24 14:59             ` Chuck Lever III
2023-05-08 12:29 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-06-02 11:05   ` Linux regression tracking #update (Thorsten Leemhuis)
2023-06-02 13:38     ` Chuck Lever III
2023-06-02 13:55       ` Linux regression tracking (Thorsten Leemhuis)
2023-06-02 14:03         ` Chuck Lever III
2023-06-02 14:29         ` Jason Gunthorpe
2023-06-02 15:58           ` Thorsten Leemhuis
2023-06-02 16:54           ` Jakub Kicinski
