From: Josef Johansson <josef@oderland.se>
To: Jason Andryuk <jandryuk@gmail.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Juergen Gross <jgross@suse.com>,
	linux-pci@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Date: Tue, 26 Oct 2021 23:17:10 +0200
Message-ID: <ad39ee7a-3c50-58c1-6e8e-e384e4d054c6@oderland.se>
In-Reply-To: <b76373a7-1e1d-3aae-66ba-09221c752c11@oderland.se>

On 10/25/21 18:46, Josef Johansson wrote:
> On 10/25/21 14:27, Jason Andryuk wrote:
>> On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>>> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
>>> functions") introduce functions pci_msi_update_mask() and
>>> pci_msix_write_vector_ctrl() that is missing checks for
>>> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
>>> new mask/unmask functions").  The checks are in place at the high level
>>> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
>>> directly to the helpers.
>>>
>>> Push the pci_msi_ignore_mask check down to the functions that make
>>> the actual writes.  This keeps the logic local to the writes that need
>>> to be bypassed.
>>>
>>> With Xen PV, the hypervisor is responsible for masking and unmasking the
>>> interrupts, which pci_msi_ignore_mask is used to indicate.
>>>
>>> This change avoids lockups in amdgpu drivers under Xen during boot.
>>>
>>> Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
>>> Reported-by: Josef Johansson <josef@oderland.se>
>>> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
>>> ---
>> I should have written that this is untested.  If this is the desired
>> approach, Josef should test that it solves his boot hangs.
>>
>> Regards,
>> Jason
> I've tested this today, both the above patch, but also my own below
> where I'm patching inside __pci_write_msi_msg,
> which is the outcome of the patch above.
I tested a lot of kernels today. To create a good baseline I compiled
one without any of our patches and with my config flags set:

CONFIG_AMD_PMC=y
# CONFIG_HSA_AMD is not set
# CONFIG_CRYPTO_DEV_CCP is not set

The kernel hung at the same point as before.

Test number 2 was to boot with amdgpu.msi=0.
This still resulted in a bad boot since all the xhcd drivers complained,
so we can be sure that the problem is not amdgpu per se.

Test number 3 was with Jason's patch. It booted, but suspend/resume is
not working well.
Generally it does not behave like other kernels do, so the patch does
actually change behavior.
With test 4 I followed up on that thought: maybe this is still a good
change? I lack a good baseline in all this, so it's very hard to
navigate between all the variables.

Test number 4 was with Jason's patch plus the amdgpu patch below.
It worked, including suspend/resume, twice, but then it all crashed and
burned with quite interesting stack traces. Is amdgpu doing it wrong
here, or am I just nitpicking?

index cc2e0c9cfe0a..f125597eb991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -279,17 +279,8 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
 
 static void amdgpu_restore_msix(struct amdgpu_device *adev)
 {
-	u16 ctrl;
-
-	pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
-	if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
-		return;
-
-	/* VF FLR */
-	ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
-	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
-	ctrl |= PCI_MSIX_FLAGS_ENABLE;
-	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+	/* pci_restore_msi_state() also checks whether MSI-X is enabled */
+	pci_restore_msi_state(adev->pdev);
 }
 
 /**

During the tests I fiddled with the dpm settings, and they do have an
effect on the graphical output during suspend/resume. So maybe there
are hardware problems at play here as well.

I also looked through the code before and after Thomas' changes, and I
can't see that this patch should make any functional difference compared
to the code before the MSI series of patches. It is even the case that
is_virtual should be checked within vector_ctrl. I find Jason's patch
quite nice since it places the checks in just a few places, which makes
it harder to slip up. Compare that to my attempt, which even failed
because I forgot one more place to put the checks.
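
To illustrate why putting the check in the low-level writer is harder to
get wrong, here is a minimal user-space sketch. It is not the kernel
code: struct fake_msi_desc, the fake_* helpers and the write counter are
made up for illustration, and only the names pci_msi_ignore_mask and
is_virtual are borrowed from msi.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's global flag; Xen PV sets the real one. */
static bool pci_msi_ignore_mask;

/* Hypothetical, simplified descriptor; not the kernel's struct msi_desc. */
struct fake_msi_desc {
	bool is_virtual;	/* vector without a hardware table entry */
	uint32_t vector_ctrl;	/* stand-in for the vector control register */
	unsigned int hw_writes;	/* counts simulated register writes */
};

#define FAKE_CTRL_MASKBIT 0x1u

/* Low-level writer: the only place that touches the "hardware". */
static void fake_write_vector_ctrl(struct fake_msi_desc *desc, uint32_t ctrl)
{
	/* The check lives here, so no caller can bypass it. */
	if (pci_msi_ignore_mask || desc->is_virtual)
		return;

	desc->vector_ctrl = ctrl;
	desc->hw_writes++;
}

static void fake_msix_mask(struct fake_msi_desc *desc)
{
	fake_write_vector_ctrl(desc, desc->vector_ctrl | FAKE_CTRL_MASKBIT);
}

static void fake_msix_unmask(struct fake_msi_desc *desc)
{
	fake_write_vector_ctrl(desc, desc->vector_ctrl & ~FAKE_CTRL_MASKBIT);
}

/* A caller that goes straight to the helpers, like a restore path. */
static void fake_restore(struct fake_msi_desc *desc)
{
	fake_msix_mask(desc);
	fake_msix_unmask(desc);
}

void msi_ignore_mask_demo(void)
{
	struct fake_msi_desc desc = { .is_virtual = false };

	pci_msi_ignore_mask = true;	/* Xen PV case: nothing is written */
	fake_restore(&desc);
	assert(desc.hw_writes == 0);

	pci_msi_ignore_mask = false;	/* bare metal case: mask + unmask */
	fake_restore(&desc);
	assert(desc.hw_writes == 2);
}
```

With the check only in a high-level wrapper, fake_restore() would have
written to the register in the Xen PV case as well, which is the kind of
slip my own attempt ran into.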

With that said I would really like some more tests on this with
different chipsets, on Xen.
Any takers?

What I'm seeing is that there's no readl() in pci_msix_unmask(), while
there was one in the code path before.
I'm very much unsure whether there should be one there, though.
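
For anyone wondering what a readl() after the write would buy, here is
a toy model of a posted write, under the assumption that the write can
sit in a bridge until a read from the same device forces it out. The
fake_dev struct and all fake_* functions are hypothetical and only
model the ordering; this says nothing about whether pci_msix_unmask()
actually needs the read-back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device behind a bridge that posts writes. */
struct fake_dev {
	uint32_t reg;		/* value the device actually sees */
	uint32_t posted;	/* write still sitting in the bridge */
	bool pending;		/* true while a write has not reached the device */
};

/* A write only reaches the bridge, not yet the device. */
static void fake_writel(struct fake_dev *dev, uint32_t val)
{
	dev->posted = val;
	dev->pending = true;
}

/* A read has to complete at the device, so it pushes out pending writes. */
static uint32_t fake_readl(struct fake_dev *dev)
{
	if (dev->pending) {
		dev->reg = dev->posted;
		dev->pending = false;
	}
	return dev->reg;
}

/* Unmask without read-back: returns before the device is known to see it. */
static void fake_unmask_noflush(struct fake_dev *dev)
{
	fake_writel(dev, 0);
}

/* Unmask with read-back: the write has landed when this returns. */
static void fake_unmask_flush(struct fake_dev *dev)
{
	fake_writel(dev, 0);
	(void)fake_readl(dev);
}

void posted_write_demo(void)
{
	struct fake_dev dev = { .reg = 1 };	/* 1 = masked */

	fake_unmask_noflush(&dev);
	assert(dev.reg == 1);	/* device may still see the old value */

	fake_unmask_flush(&dev);
	assert(dev.reg == 0);	/* read-back forced the write out */
}
```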

We could really do a better job of documenting pci_msi_ignore_mask, at
least in msi.c. Maybe that should be a separate patch that adds some
comments, so that driver folks see the benefit of using, e.g., the
built-in restore functions.

This became a much bigger project than I thought. Thanks, all, for
chiming in on it.


Thread overview: 42+ messages
2021-10-18  6:22 [PATCH] PCI/MSI: Re-add checks for skip masking MSI-X on Xen PV Josef Johansson
2021-10-19 19:57 ` Bjorn Helgaas
2021-10-19 20:15   ` Josef Johansson
2021-10-19 20:29     ` Bjorn Helgaas
2021-10-19 21:48       ` [PATCH v2] " Josef Johansson
2021-10-20 12:51         ` Marc Zyngier
2021-10-20 14:03           ` Jason Andryuk
2021-10-21  8:25             ` Josef Johansson
2021-10-24 18:55               ` Josef Johansson
2021-10-25  1:25                 ` [PATCH] PCI/MSI: Fix masking MSI/MSI-X " Jason Andryuk
2021-10-25  7:44                   ` David Woodhouse
2021-10-25 11:43                     ` Roger Pau Monné
2021-10-25 11:53                       ` David Woodhouse
2021-10-25 12:58                         ` Roger Pau Monné
2021-10-25 13:02                           ` David Woodhouse
2021-10-25 14:12                             ` Roger Pau Monné
2021-10-25 12:31                     ` Jason Andryuk
2021-10-25 12:27                   ` Jason Andryuk
2021-10-25 16:46                     ` Josef Johansson
2021-10-26 21:17                       ` Josef Johansson [this message]
2021-10-27  8:45                   ` Thomas Gleixner
2021-10-27  9:50                     ` [PATCH] PCI/MSI: Move non-mask check back into low level accessors Thomas Gleixner
2021-10-27  9:54                       ` Josef Johansson
2021-10-27 12:01                         ` Josef Johansson
2021-10-27 15:29                           ` Josef Johansson
2021-11-03 23:26                             ` Thomas Gleixner
2021-11-03 23:27                               ` [PATCH v2] " Thomas Gleixner
2021-11-09 14:53                                 ` Thomas Gleixner
2021-11-10 13:31                                   ` Josef Johansson
2021-11-10 16:05                                     ` Josef Johansson
2021-11-03 23:45                             ` [PATCH] " Thomas Gleixner
2021-11-04  9:00                               ` Josef Johansson
2021-11-04 17:12                               ` Peter Zijlstra
2021-11-04 17:31                               ` Vincent Guittot
2021-11-10 20:30                               ` Josef Johansson
2021-11-10 23:13                                 ` Josef Johansson
2021-10-27  9:57                       ` David Woodhouse
2021-10-25  1:25                 ` [PATCH v2] PCI/MSI: Re-add checks for skip masking MSI-X on Xen PV Jason Andryuk
2021-10-25 19:21                   ` Josef Johansson
2021-10-27  6:24                     ` David Woodhouse
2021-10-27  8:13                       ` Josef Johansson
2021-10-27  8:26                         ` David Woodhouse
