All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Rosato <mjrosato@linux.ibm.com>
To: Pierre Morel <pmorel@linux.ibm.com>, cohuck@redhat.com, thuth@redhat.com
Cc: schnelle@linux.ibm.com, david@redhat.com, mst@redhat.com,
	richard.henderson@linaro.org, qemu-s390x@nongnu.org,
	qemu-devel@nongnu.org, pasic@linux.ibm.com,
	borntraeger@de.ibm.com, alex.williamson@redhat.com,
	pbonzini@redhat.com
Subject: Re: [PATCH 0/8] s390x/pci: Fixing s390 vfio-pci ISM support
Date: Wed, 20 Jan 2021 10:59:53 -0500	[thread overview]
Message-ID: <f3e074d2-4f47-d229-9002-010e91df95d1@linux.ibm.com> (raw)
In-Reply-To: <a1d1df76-07df-9879-ae77-ff677efdd291@linux.ibm.com>

On 1/20/21 9:45 AM, Pierre Morel wrote:
> 
> 
> On 1/20/21 3:03 PM, Matthew Rosato wrote:
>> On 1/20/21 4:12 AM, Pierre Morel wrote:
>>>
>>>
>>> On 1/19/21 9:44 PM, Matthew Rosato wrote:
>>>> Today, ISM devices are completely disallowed for vfio-pci 
>>>> passthrough as
>>>> QEMU rejects the device due to an (inappropriate) MSI-X check.  
>>>> Removing
>>>> this fence, however, reveals additional deficiencies in the s390x PCI
>>>> interception layer that prevent ISM devices from working correctly.
>>>> Namely, ISM block write operations have particular requirements in 
>>>> regards
>>>> to the alignment, size and order of writes performed that cannot be
>>>> guaranteed when breaking up write operations through the typical
>>>> vfio_pci_bar_rw paths. Furthermore, ISM requires that legacy/non-MIO
>>>> s390 PCI instructions are used, which is also not guaranteed when 
>>>> the I/O
>>>> is passed through the typical userspace channels.
>>>>
>>>> This patchset provides a set of fixes related to enabling ISM device
>>>> passthrough and includes patches to enable use of a new vfio region 
>>>> that
>>>> will allow s390x PCI pass-through devices to perform s390 PCI 
>>>> instructions
>>>> in such a way that the same instruction issued on the guest is 
>>>> re-issued
>>>> on the host.
>>>>
>>>> Associated kernel patchset:
>>>> https://lkml.org/lkml/2021/1/19/874
>>>>
>>>> Changes from RFC -> v1:
>>>> - Refresh the header sync (built using Eric's 'update-linux-headers:
>>>> Include const.h' + manually removed pvrdma_ring.h again)
>>>> - Remove s390x/pci: fix pcistb length (already merged)
>>>> - Remove s390x/pci: Fix memory_region_access_valid call (already 
>>>> merged)
>>>> - Fix bug: s390_pci_vfio_pcistb should use the pre-allocated PCISTB
>>>> buffer pcistb_buf rather than allocating/freeing its own.
>>>> - New patch: track the PFT (PCI Function Type) separately from guest 
>>>> CLP
>>>> response data -- we tell the guest '0' for now due to limitations in
>>>> measurement block support, but we can still use the real value 
>>>> provided via
>>>> the vfio CLP capabilities to make decisions.
>>>> - Use the PFT (pci function type) to determine when to use the region
>>>> for PCISTB/PCILG (only for ISM), rather than using the relaxed 
>>>> alignment
>>>> bit.
>>>> - As a result, the pcistb_default is now updated to also handle the
>>>> possibility of relaxed alignment via 2 new functions, 
>>>> pcistb_validate_write
>>>> and pcistb_write, which serve as wrappers to the memory_region calls.
>>>> - New patch, which partially restores the MSI-X fence for passthrough
>>>> devices...  Could potentially be squashed with 's390x/pci: MSI-X isn't
>>>> strictly required for passthrough' but left separately for now as I 
>>>> felt it
>>>> needed a clear commit description of why we should still fence this 
>>>> case.
>>>>
>>> Hi,
>>>
>>> The choice of using the new VFIO region is made on the ISM PCI 
>>> function type (PFT), which makes the patch ISM specific, why don't we 
>>> use here the MIO bit common to any zPCI function and present in 
>>> kernel to make the choice?
>>>
>>
>> As discussed during the RFC (and see my reply also to the kernel set), 
>> the use of this region only works for devices that do not rely on 
>> MSI-X interrupts.  If we did as you suggest, other device types like 
>> mlx would not receive MSI-X interrupts in the guest (And I did indeed 
>> try variations where I used the special VFIO region for all 
>> PCISTG/PCILG/PCISTB for various device types)
>>
>> So the idea for now was to solve the specific problem at hand (getting 
>> ISM devices working).
>>
>>
> 
> Sorry, if I missed or forgot some discussions, but I understood that we 
> are using this region to handle PCISTB instructions when the device do 
> not support MIO.
> Don't we?

Sure thing - It's probably good to refresh the issue/rationale anyway as 
we've had the holidays in between.

You are correct, a primary reason we need to resort to a separate VFIO 
region for PCISTB (and PCILG) instructions for ISM devices is that they 
do not support the MIO instruction set, yet the host kernel will 
translate everything coming through the PCI I/O layer to MIO 
instructions whenever that facility is available to the host (and not 
purposely disabled).  This issue is unique to vfio-pci/passthrough - in 
the host, the ISM driver directly invokes functions in s390 pci code to 
ensure that MIO instructions are not used.

But this is not the only reason.  There are additional reasons for using 
this VFIO region:
1) ISM devices also don't support PCISTG instructions to certain address 
spaces and PCISTB must be used regardless of operation length.  However 
the standard s390 PCI I/O path always uses PCISTG for anything <=8B. 
Trying to determine whether a given I/O is intended for an ISM device at 
that point in kernel code so as to use PCISTB instead of PCISTG is the 
same problem as attempting to decide whether to use MIO vs non-MIO 
instructions at that point.
2) It allows for much larger PCISTB operations (4K) than allowed via the 
memory regions (loop of 8B operations).
3) The above also has the added benefit of eliminating certain write 
pattern requirements that are unique to ISM that would be introduced if 
we split up the I/O into 8B chunks (if we can't write the whole PCISTB 
in one go, ISM requires data written in a certain order for some address 
spaces, or with certain bits on/off on the PCISTB instruction to signify 
the state of the larger operation)

> 
> I do not understand the relation between MSI-X and MIO.
> Can you please explain?
> 

There is not a relation between MSI-X and MIO really.  Rather, this is a 
case of the solution that is being offered here ONLY works for devices 
that use MSI -- and ISM is a device that only supports MSI.  If you try 
to use this new VFIO region to pass I/O for an MSI-X enabled device, the 
notifiers set up via vfio_msix_setup won't be triggered because we are 
writing to the new VFIO region, not the virtual bar regions that may 
have had notifiers setup as part of vfio_msix_setup.  This results in 
missing interrupts on MSI-X-enabled vfio-pci devices.

These notifiers aren't a factor when the device is using MSI.



  reply	other threads:[~2021-01-20 16:03 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 20:44 [PATCH 0/8] s390x/pci: Fixing s390 vfio-pci ISM support Matthew Rosato
2021-01-19 20:44 ` [PATCH 1/8] linux-headers: update against 5.11-rc4 Matthew Rosato
2021-01-19 20:44 ` [PATCH 2/8] s390x/pci: Keep track of the PCI Function type Matthew Rosato
2021-01-19 20:44 ` [PATCH 3/8] s390x/pci: MSI-X isn't strictly required for passthrough Matthew Rosato
2021-01-19 20:44 ` [PATCH 4/8] s390x/pci: Introduce the ZpciOps structure Matthew Rosato
2021-01-19 20:44 ` [PATCH 5/8] s390x/pci: Handle devices that support relaxed alignment Matthew Rosato
2021-01-19 20:44 ` [PATCH 6/8] s390x/pci: PCISTB via the vfio zPCI I/O region Matthew Rosato
2021-01-19 20:44 ` [PATCH 7/8] s390x/pci: PCILG " Matthew Rosato
2021-01-19 20:44 ` [PATCH 8/8] s390x/pci: Prevent ISM device passthrough on older host kernels Matthew Rosato
2021-01-20  9:12 ` [PATCH 0/8] s390x/pci: Fixing s390 vfio-pci ISM support Pierre Morel
2021-01-20 14:03   ` Matthew Rosato
2021-01-20 14:45     ` Pierre Morel
2021-01-20 15:59       ` Matthew Rosato [this message]
2021-01-20 19:18         ` Pierre Morel
2021-01-20 20:29           ` Matthew Rosato
2021-01-21  8:27             ` Pierre Morel
2021-01-21  9:58               ` Niklas Schnelle
2021-01-21 12:30                 ` Pierre Morel
2021-01-21 13:37                   ` Niklas Schnelle
2021-01-21 14:46                     ` Pierre Morel
2021-01-21 14:54                       ` Niklas Schnelle
2021-01-21 17:50                         ` Cornelia Huck
2021-01-21 18:06                           ` Matthew Rosato
2021-01-22 16:46                             ` Cornelia Huck
2021-01-25 14:55                               ` Matthew Rosato
2021-01-21 14:42               ` Matthew Rosato
2021-01-21 15:45                 ` Pierre Morel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f3e074d2-4f47-d229-9002-010e91df95d1@linux.ibm.com \
    --to=mjrosato@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=mst@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=pmorel@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-s390x@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=schnelle@linux.ibm.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.