From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>,
	kvm@vger.kernel.org, alex.williamson@redhat.com,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH 4/4] vfio/pci: Restore MSIx message prior to enabling
Date: Sat, 31 May 2014 21:42:52 +1000
Message-ID: <20140531114252.GA8509@shangw>
In-Reply-To: <20140530221232.GG4607@google.com>

On Fri, May 30, 2014 at 04:12:32PM -0600, Bjorn Helgaas wrote:
>On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
>> The MSIx vector table lives in device memory, which may be cleared as
>> part of a backdoor device reset. This is the case on the IBM IPR HBA
>> when the BIST is run on the device. When assigned to a QEMU guest,
>> the guest driver does a pci_save_state(), issues a BIST, then does a
>> pci_restore_state(). The BIST clears the MSIx vector table, but due
>> to the way interrupts are configured the pci_restore_state() does not
>> restore the vector table as expected. Eventually this results in an
>> EEH error on Power platforms when the device attempts to signal an
>> interrupt with the zero'd table entry.
>> 
>> Fix the problem by restoring the host cached MSI message prior to
>> enabling each vector.
>> 
>> Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>> ---
>>  drivers/vfio/pci/vfio_pci_intrs.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>> 
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 9dd49c9..553212f 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -16,6 +16,7 @@
>>  #include <linux/device.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/eventfd.h>
>> +#include <linux/msi.h>
>>  #include <linux/pci.h>
>>  #include <linux/file.h>
>>  #include <linux/poll.h>
>> @@ -548,6 +549,20 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>>  		return PTR_ERR(trigger);
>>  	}
>>  
>> +	/*
>> +	 * The MSIx vector table resides in device memory which may be cleared
>> +	 * via backdoor resets. We don't allow direct access to the vector
>> +	 * table so even if a userspace driver attempts to save/restore around
>> +	 * such a reset it would be unsuccessful. To avoid this, restore the
>> +	 * cached value of the message prior to enabling.
>> +	 */
>> +	if (msix) {
>> +		struct msi_msg msg;
>> +
>> +		get_cached_msi_msg(irq, &msg);
>> +		write_msi_msg(irq, &msg);
>> +	}
>
>I think this is pretty ugly.  Drivers should not be writing to the
>MSI-X vector table, so I don't really want to export these internal
>implementation functions if we can avoid it.
>

I agree that it's ugly. I need to discuss the potential solutions with
Alex: fix the issue from either the guest or QEMU.

- If the "reset" is a special backdoor for some devices, the device driver
  on the guest side should do something like: disable the MSIx entries that
  have been enabled (updating the MSIx entries maintained by QEMU),
  pci_save_state(), reset(), pci_restore_state(), enable the MSIx entries
  (updating the MSIx entries maintained by QEMU). The disadvantage is that
  the guest driver has to accommodate QEMU, which sounds bad.

- In QEMU, we could have a per-device quirk that traps writes to the
  registers used for reset and then clears the MSIx entries maintained
  by QEMU. It's similar to what is applied on FLR reset. We would need
  a separate quirk for every kind of device.

- The last one is what we had in this patch. However, it's really a "hack".
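
To make the QEMU option a bit more concrete, here is a minimal userspace
sketch of a per-device reset quirk. It is not QEMU code: the struct names,
the vendor/device IDs and the register offset are all made up, and it only
shows the idea of dropping the cached "enabled" state when a trapped write
hits a known reset register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVEC 4 /* hypothetical number of MSIx vectors */

/* One quirk per device type: which register write means "reset". */
struct reset_quirk {
	uint16_t vendor;
	uint16_t device;
	uint32_t reset_reg;
};

/* Made-up IDs and offset, for illustration only. */
static const struct reset_quirk quirks[] = {
	{ 0x1234, 0x5678, 0x40 },
};

/* Emulator-side view of an assigned device (hypothetical). */
struct vdev_model {
	uint16_t vendor;
	uint16_t device;
	bool msix_enabled[NVEC];
};

static const struct reset_quirk *find_quirk(const struct vdev_model *d)
{
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++)
		if (quirks[i].vendor == d->vendor &&
		    quirks[i].device == d->device)
			return &quirks[i];
	return NULL;
}

/*
 * Called for each trapped register write. Returns true if the write
 * was recognised as a reset and the cached MSIx state was dropped.
 */
static bool trap_reg_write(struct vdev_model *d, uint32_t reg)
{
	const struct reset_quirk *q;
	size_t i;

	assert(d != NULL);
	q = find_quirk(d);
	if (!q || q->reset_reg != reg)
		return false;

	for (i = 0; i < NVEC; i++)
		d->msix_enabled[i] = false;
	return true;
}
```

With the cached state cleared, a later enable from the guest would no
longer look like a duplicate. The cost is visible in the quirks[] table:
every device with its own reset register needs its own entry.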

>I chatted with Alex about this last week on IRC, trying to understand
>what's going on here, but I'm afraid I didn't get very far.
>
>I think I understand what happens when there's no virtualization
>involved.  The driver enables MSI-X and writes the vector table via
>this path:
>
>    pci_enable_msix
>      msix_capability_init
>	arch_setup_msi_irqs
>	  native_setup_msi_irqs		# .setup_msi_irqs (on x86)
>	    setup_msi_irq
>	      write_msi_msg
>		__write_msi_msg		# write vector table
>
>When a device is reset, its MSI-X vector table is cleared.  The type
>of reset (FLR, "backdoor", etc.) doesn't really matter.
>
>After a device reset, the driver would use this path to restore the
>vector table:
>
>    pci_restore_state
>      pci_restore_msi_state
>        __pci_restore_msix_state
>          arch_restore_msi_irqs
>            default_restore_msi_irqs	# .restore_msi_irqs (on x86)
>              default_restore_msi_irq
>                write_msi_msg
>                  __write_msi_msg	# write vector table
>
>This rewrites the MSI-X vector table (it doesn't use any data that was
>saved by pci_save_state(), so it's not really a "restore" in that
>sense; it writes the vector table from scratch based on the data
>structures maintained by the MSI core).
>
>If the same driver is running in a qemu guest, it still calls
>pci_enable_msix() and pci_restore_state(), but apparently the restore
>path doesn't work.  Alex mentioned that qemu virtualizes the vector
>table, so I assume it traps the writel() to the vector table when
>enabling MSI-X?  And I assume qemu would also trap the writel() in the
>restore path, but it sounded like it ignores the write because we're
>writing the same data qemu believes to be there?
>
>I'd like to understand more details about how those writel()s
>performed by the guest kernel are handled.  Alex mentioned that the
>vector table is inaccessible to the guest, and I see code in
>vfio_pci_bar_rw() that looks like it excludes the table area, so I
>assume that is involved somehow, but I don't know how to connect the
>dots.  Obviously the enable path must be handled differently from the
>restore path somehow, because if the enable used vfio_pci_bar_rw(),
>that write would just be dropped, too, and it's not.
>

The problem is basically that the MSIx entries maintained in QEMU no longer
match those in hardware (host kernel), which is caused by the backdoor "reset":

- The guest driver enables the MSIx entries. They are marked as "enabled"
  in hardware, QEMU, and the guest.
- The guest driver calls pci_save_state() and then issues the backdoor reset.
  We lose everything in the MSIx table in hardware. QEMU still maintains
  "enabled" MSIx entries.
- The guest driver calls pci_restore_state() and tries to enable the MSIx
  entries. The writes to the MSIx entries are trapped in QEMU. QEMU won't
  update the MSIx entries in hardware because they are already marked as
  "enabled" in QEMU.
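
The three steps above can be sketched as a small userspace model. This is
only a toy under stated assumptions, not the real QEMU or VFIO code: the
struct and function names are hypothetical, and the "skip the hardware
update if already enabled" check is a simplification of the behaviour
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NVEC 4 /* hypothetical number of MSIx vectors */

struct msix_entry_model {
	uint64_t addr;
	uint32_t data;
};

struct model {
	struct msix_entry_model hw[NVEC];     /* table in device memory */
	struct msix_entry_model shadow[NVEC]; /* emulator's virtual copy */
	bool enabled[NVEC];                   /* emulator's "enabled" flag */
};

/* Guest write to a vector table entry, trapped by the emulator. */
static void guest_write_entry(struct model *m, unsigned int idx,
			      uint64_t addr, uint32_t data)
{
	assert(idx < NVEC);

	m->shadow[idx].addr = addr;
	m->shadow[idx].data = data;

	/* Already "enabled" in the emulator: hardware is not touched. */
	if (m->enabled[idx])
		return;

	m->hw[idx] = m->shadow[idx];
	m->enabled[idx] = true;
}

/* Backdoor reset: device memory is cleared, the emulator is not told. */
static void backdoor_reset(struct model *m)
{
	memset(m->hw, 0, sizeof(m->hw));
}

static bool hw_entry_is_zero(const struct model *m, unsigned int idx)
{
	assert(idx < NVEC);
	return m->hw[idx].addr == 0 && m->hw[idx].data == 0;
}
```

Calling guest_write_entry(), then backdoor_reset(), then guest_write_entry()
again with the same values leaves the hardware entry zeroed in this model,
which corresponds to the table entry the device later uses to signal the
interrupt.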

Thanks,
Gavin

