From: Josef Johansson <josef@oderland.se>
To: Jason Andryuk <jandryuk@gmail.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Bjorn Helgaas <helgaas@kernel.org>,
Juergen Gross <jgross@suse.com>,
linux-pci@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Date: Tue, 26 Oct 2021 23:17:10 +0200 [thread overview]
Message-ID: <ad39ee7a-3c50-58c1-6e8e-e384e4d054c6@oderland.se> (raw)
In-Reply-To: <b76373a7-1e1d-3aae-66ba-09221c752c11@oderland.se>
On 10/25/21 18:46, Josef Johansson wrote:
> On 10/25/21 14:27, Jason Andryuk wrote:
>> On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>>> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
>>> functions") introduce functions pci_msi_update_mask() and
>>> pci_msix_write_vector_ctrl() that is missing checks for
>>> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
>>> new mask/unmask functions"). The checks are in place at the high level
>>> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
>>> directly to the helpers.
>>>
>>> Push the pci_msi_ignore_mask check down to the functions that make
>>> the actual writes. This keeps the logic local to the writes that need
>>> to be bypassed.
>>>
>>> With Xen PV, the hypervisor is responsible for masking and unmasking the
>>> interrupts, which pci_msi_ignore_mask is used to indicate.
>>>
>>> This change avoids lockups in amdgpu drivers under Xen during boot.
>>>
>>> Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
>>> Reported-by: Josef Johansson <josef@oderland.se>
>>> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
>>> ---
>> I should have written that this is untested. If this is the desired
>> approach, Josef should test that it solves his boot hangs.
>>
>> Regards,
>> Jason
> I've tested this today, both the above patch, but also my own below
> where I'm patching inside __pci_write_msi_msg,
> which is the outcome of the patch above.
I tested a lot of kernels today. To create a good baseline I compiled
without any of our patches here
and with my config flags set.
CONFIG_AMD_PMC=y
# CONFIG_HSA_AMD is not set
# CONFIG_CRYPTO_DEV_CCP is not set
The kernel stopped as before, and hung.
Test number 2 was to boot with amdgpu.msi=0.
This still resulted in a bad boot since all the xhcd drivers complained.
We can be sure that it's not amdgpu per se.
Test number 3 was with Jason's patch. It worked, but suspend/resume is
not working well.
Generally it's not behaving like other kernels do, which makes it
actually change the behavior.
Now with test 4 I tried that thought, maybe this is still a good change?
I'm deprived of a good baseline in all this, so it's very hard to
navigate between all the variables.
Test number 4 was with Jason's patch plus the amdgpu-patch below.
It worked, even suspend/resume, 2 times, but then it all crashed and
burn with quite interesting stacktraces. Are amdgpu doing it wrong here
or is it just me nitpicking?
index cc2e0c9cfe0a..f125597eb991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -279,17 +279,8 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
static void amdgpu_restore_msix(struct amdgpu_device *adev)
{
- u16 ctrl;
-
- pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
- if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
- return;
-
- /* VF FLR */
- ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
- pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
- ctrl |= PCI_MSIX_FLAGS_ENABLE;
- pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+ // checks if msix is enabled also
+ pci_restore_msi_state(adev->pdev);
}
/**
During the tests I fiddled with dpm settings, and it does have an effect
one the graphical output during suspend/resume. So maybe there's
hardware problems at play here as well.
I also looked through the code before and after Thomas' changes, and I
can't see that
this patch should make any functional difference compared to before the
MSI series of patches.
It's even such that is_virtual should be checked withing vector_ctrl. I
find Jason's patch
quite nice since it really places the checks on few places making it
easier not to slip.
Compared to my attempt that even failed because I forgot one more place
to put the checks.
With that said I would really like some more tests on this with
different chipsets, on Xen.
Any takers?
What I'm seeing is that there's no readl() in pci_msix_unmask(), it was
one in the code path before.
I'm very much unsure if there should be one there though.
We can really do a better job at the documentation for
pci_msi_ignore_mask, at least in msi.c,
maybe that should be a different patch adding some comments such that
driver folks really see
the benefits of using the built in restore functions e.g.
This became so much bigger project than I thought, thanks all for
chiming in on it.
next prev parent reply other threads:[~2021-10-26 21:17 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-18 6:22 [PATCH] PCI/MSI: Re-add checks for skip masking MSI-X on Xen PV Josef Johansson
2021-10-19 19:57 ` Bjorn Helgaas
2021-10-19 20:15 ` Josef Johansson
2021-10-19 20:29 ` Bjorn Helgaas
2021-10-19 21:48 ` [PATCH v2] " Josef Johansson
2021-10-20 12:51 ` Marc Zyngier
2021-10-20 14:03 ` Jason Andryuk
2021-10-21 8:25 ` Josef Johansson
2021-10-24 18:55 ` Josef Johansson
2021-10-25 1:25 ` [PATCH] PCI/MSI: Fix masking MSI/MSI-X " Jason Andryuk
2021-10-25 7:44 ` David Woodhouse
2021-10-25 11:43 ` Roger Pau Monné
2021-10-25 11:53 ` David Woodhouse
2021-10-25 12:58 ` Roger Pau Monné
2021-10-25 13:02 ` David Woodhouse
2021-10-25 14:12 ` Roger Pau Monné
2021-10-25 12:31 ` Jason Andryuk
2021-10-25 12:27 ` Jason Andryuk
2021-10-25 16:46 ` Josef Johansson
2021-10-26 21:17 ` Josef Johansson [this message]
2021-10-27 8:45 ` Thomas Gleixner
2021-10-27 9:50 ` [PATCH] PCI/MSI: Move non-mask check back into low level accessors Thomas Gleixner
2021-10-27 9:54 ` Josef Johansson
2021-10-27 12:01 ` Josef Johansson
2021-10-27 15:29 ` Josef Johansson
2021-11-03 23:26 ` Thomas Gleixner
2021-11-03 23:27 ` [PATCH v2] " Thomas Gleixner
2021-11-09 14:53 ` Thomas Gleixner
2021-11-10 13:31 ` Josef Johansson
2021-11-10 16:05 ` Josef Johansson
2021-11-03 23:45 ` [PATCH] " Thomas Gleixner
2021-11-04 9:00 ` Josef Johansson
2021-11-04 17:12 ` Peter Zijlstra
2021-11-04 17:31 ` Vincent Guittot
2021-11-10 20:30 ` Josef Johansson
2021-11-10 23:13 ` Josef Johansson
2021-10-27 9:57 ` David Woodhouse
2021-10-25 1:25 ` [PATCH v2] PCI/MSI: Re-add checks for skip masking MSI-X on Xen PV Jason Andryuk
2021-10-25 19:21 ` Josef Johansson
2021-10-27 6:24 ` David Woodhouse
2021-10-27 8:13 ` Josef Johansson
2021-10-27 8:26 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ad39ee7a-3c50-58c1-6e8e-e384e4d054c6@oderland.se \
--to=josef@oderland.se \
--cc=boris.ostrovsky@oracle.com \
--cc=helgaas@kernel.org \
--cc=jandryuk@gmail.com \
--cc=jgross@suse.com \
--cc=linux-pci@vger.kernel.org \
--cc=maz@kernel.org \
--cc=tglx@linutronix.de \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).