iommu.lists.linux-foundation.org archive mirror
From: Robin Murphy <robin.murphy@arm.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
	Ashok Raj <ashok_raj@linux.intel.com>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	iommu@lists.linux.dev, Ashok Raj <ashok.raj@intel.com>
Subject: Re: Question about reserved_regions w/ Intel IOMMU
Date: Thu, 8 Jun 2023 19:02:06 +0100	[thread overview]
Message-ID: <7f1797b1-cd50-3c8d-59ff-8ce82ef1adb4@arm.com> (raw)
In-Reply-To: <CAKgT0UfTzExYZGNCEXCJaS7huWDxwoC3Z_2JCzJHAgr9Qyxmsg@mail.gmail.com>

On 2023-06-08 18:10, Alexander Duyck wrote:
> On Thu, Jun 8, 2023 at 8:40 AM Ashok Raj <ashok_raj@linux.intel.com> wrote:
>>
>> On Thu, Jun 08, 2023 at 07:33:31AM -0700, Alexander Duyck wrote:
>>> On Wed, Jun 7, 2023 at 8:05 PM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>>>>
>>>> On 6/8/23 7:03 AM, Alexander Duyck wrote:
>>>>> On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>>
>>>>>> I am running into a DMA issue that appears to be a conflict between
>>>>>> ACS and IOMMU. As per the documentation I can find, the IOMMU is
>>>>>> supposed to create reserved regions for MSI and the memory window
>>>>>> behind the root port. However looking at reserved_regions I am not
>>>>>> seeing that. I only see the reservation for the MSI.
>>>>>>
>>>>>> So for example, with the NIC up and the IOMMU enabled w/o passthrough, I am seeing:
>>>>>> # cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
>>>>>> 0x00000000fee00000 0x00000000feefffff msi
>>>>>>
>>>>>> Shouldn't there also be a memory window for the region behind the root
>>>>>> port to prevent any possible peer-to-peer access?
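[A quick way to survey this beyond the single device above is to walk every group's reserved_regions under sysfs. A minimal sketch; the helper name and the optional root parameter are just for illustration, the default path is the standard sysfs location:]

```shell
#!/usr/bin/env bash
# Sketch: dump reserved_regions for every IOMMU group under a sysfs root,
# to compare against the single-device view above. Pass a different root
# only for testing; on a real system the default is the one that matters.
dump_reserved() {
  root=${1:-/sys/kernel/iommu_groups}
  for f in "$root"/*/reserved_regions; do
    [ -r "$f" ] || continue
    echo "== $f"
    cat "$f"
  done
}
```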
>>>>>
>>>>> Since the iommu portion of the email bounced I figured I would fix
>>>>> that and provide some additional info.
>>>>>
>>>>> I added some instrumentation to the kernel to dump the resources found
>>>>> in iova_reserve_pci_windows. From what I can tell it is finding the
>>>>> correct resources for the Memory and Prefetchable regions behind the
>>>>> root port. It seems to be calling reserve_iova which is successfully
>>>>> allocating an iova to reserve the region.
>>>>>
>>>>> However still no luck on why it isn't showing up in reserved_regions.
>>>>
>>>> Perhaps I can ask the opposite question: why should it show up in
>>>> reserved_regions? Why should the iommu subsystem block any possible peer-
>>>> to-peer DMA access? Isn't that a decision for the device driver?
>>>>
>>>> The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces,
>>>> which are not related to peer-to-peer accesses.
>>>
>>> The problem is if the IOVA overlaps with the physical addresses of
>>> other devices that can be reached via ACS redirect. As such, if ACS
>>> redirect is enabled, a host IOVA could be directed to another device on
>>> the switch instead. To prevent that, we need to reserve those addresses
>>> to avoid address-space collisions.
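[That collision check can be sketched against the reserved_regions format shown earlier. The overlap test below is purely illustrative of the concern, not what the iommu layer itself does; the function name is made up:]

```shell
#!/usr/bin/env bash
# Sketch: test whether a candidate IOVA falls inside any entry of an
# iommu_group's reserved_regions file (lines of "<start> <end> <type>").
covered() {  # usage: covered <iova>   (region lines on stdin)
  iova=$(( $1 ))
  while read -r start end type; do
    start=$(( start )); end=$(( end ))
    if [ "$iova" -ge "$start" ] && [ "$iova" -le "$end" ]; then
      echo "covered by $type region"
      return 0
    fi
  done
  echo "not reserved"
  return 1
}

# Example, using the device path from this thread:
# covered 0xfee01000 < /sys/bus/pci/devices/0000:83:00.0/iommu_group/reserved_regions
```

If the window behind the root port were exported there, a neighbor's BAR address would report as covered; in the failing setup above only the MSI range does.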
> 
> Our test case is just to perform DMA to/from the host on one device on
> a switch, and what we are seeing is that when we hit an IOVA that
> matches the physical address of the neighboring device's BAR0, we get
> an AER followed by a hot reset.
> 
>> Any untranslated address from a device must be forwarded to the IOMMU when
>> ACS is enabled, correct? I guess if you want true p2p, then you would need
>> to map so that the HPA turns into the peer address... but it's always a
>> round trip to the IOMMU.
> 
> This assumes all parts are doing the Request Redirect "correctly". In
> our case there is a PCIe switch we are trying to debug, and we have a
> few working theories. One concern I have is that the switch may be
> throwing an ACS violation when we use an address that matches a
> neighboring device, instead of redirecting the request to the upstream
> port. If we pull the switch and just run on the root complex, the issue
> seems to be resolved, so I started poking into the code, which led me to
> the documentation pointing out what is supposed to be reserved based on
> the root complex and MSI regions.
> 
> As a part of going down that rabbit hole I realized that
> reserved_regions seems to only list the MSI reservation. However, after
> digging a bit deeper, it seems like there is code to reserve the memory
> behind the root complex in the IOVA space, but it doesn't look like that
> reservation is visible anywhere, and that is the piece I am currently
> trying to sort out. What I am working on is figuring out whether the
> system that is failing is actually reserving that memory region in the
> IOVA space, or whether that is somehow not happening in our test setup.

How old's the kernel? Before 5.11, intel-iommu wasn't hooked up to 
iommu-dma, so it didn't do quite the same thing - it only reserved 
whatever specific PCI memory resources existed at boot, rather than the 
whole window as iommu-dma does. Either way, ftrace on reserve_iova() (or 
just whacking a print in there) should suffice to see what's happened.
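[The ftrace route can be sketched with a kprobe event on reserve_iova(). The probe and group names below are made up, and the $arg2/$arg3 fetch syntax for pfn_lo/pfn_hi assumes a kernel new enough (v4.20+) to support argument fetching; needs root and tracefs mounted:]

```shell
#!/usr/bin/env bash
# Sketch: watch reserve_iova() calls at device-probe time via a kprobe
# event, to see which pfn ranges actually get reserved.
kprobe_def() {
  # Args 2 and 3 of reserve_iova(iovad, pfn_lo, pfn_hi)
  printf 'p:iommu/reserve_iova_probe reserve_iova pfn_lo=$arg2 pfn_hi=$arg3\n'
}

trace_reserve_iova() {
  cd /sys/kernel/tracing || return 1
  kprobe_def > kprobe_events
  echo 1 > events/iommu/reserve_iova_probe/enable
  cat trace_pipe        # reservations stream here as devices (re)probe
}
```

Rebinding the device's driver after enabling the event should show whether the root-port window is reserved, or only the MSI range.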

Robin.


Thread overview: 24+ messages
     [not found] <CAKgT0UezciLjHacOx372+v8MZkDf22D5Thn82n-07xxKy_0FTQ@mail.gmail.com>
2023-06-07 23:03 ` Question about reserved_regions w/ Intel IOMMU Alexander Duyck
2023-06-08  3:03   ` Baolu Lu
2023-06-08 14:33     ` Alexander Duyck
2023-06-08 15:38       ` Ashok Raj
2023-06-08 17:10         ` Alexander Duyck
2023-06-08 17:52           ` Ashok Raj
2023-06-08 18:15             ` Alexander Duyck
2023-06-08 18:02           ` Robin Murphy [this message]
2023-06-08 18:17             ` Alexander Duyck
2023-06-08 15:28     ` Robin Murphy
2023-06-13 15:54       ` Jason Gunthorpe
2023-06-16  8:39         ` Tian, Kevin
2023-06-16 12:20           ` Jason Gunthorpe
2023-06-16 15:27             ` Alexander Duyck
2023-06-16 16:34               ` Robin Murphy
2023-06-16 18:59                 ` Jason Gunthorpe
2023-06-19 10:20                   ` Robin Murphy
2023-06-19 14:02                     ` Jason Gunthorpe
2023-06-20 14:57                       ` Alexander Duyck
2023-06-20 16:55                         ` Jason Gunthorpe
2023-06-20 17:47                           ` Alexander Duyck
2023-06-21 11:30                             ` Robin Murphy
2023-06-16 18:48               ` Jason Gunthorpe
2023-06-21  8:16             ` Tian, Kevin
