linux-kernel.vger.kernel.org archive mirror
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Ashok Raj <ashok.raj@intel.com>
Cc: Ashok Raj <ashok_raj@linux.intel.com>,
	Baolu Lu <baolu.lu@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	iommu@lists.linux.dev
Subject: Re: Question about reserved_regions w/ Intel IOMMU
Date: Thu, 8 Jun 2023 11:15:52 -0700
Message-ID: <CAKgT0Ue6xhwtRvk+sBUeinx4_mgPDoeyxLQ_2hrAVOeFMsxC6g@mail.gmail.com>
In-Reply-To: <ZIIVR2+rGemC7wlF@a4bf019067fa.jf.intel.com>

On Thu, Jun 8, 2023 at 10:52 AM Ashok Raj <ashok.raj@intel.com> wrote:
>
> On Thu, Jun 08, 2023 at 10:10:54AM -0700, Alexander Duyck wrote:
> > On Thu, Jun 8, 2023 at 8:40 AM Ashok Raj <ashok_raj@linux.intel.com> wrote:
> > >
> > > On Thu, Jun 08, 2023 at 07:33:31AM -0700, Alexander Duyck wrote:
> > > > On Wed, Jun 7, 2023 at 8:05 PM Baolu Lu <baolu.lu@linux.intel.com> wrote:
> > > > >
> > > > > On 6/8/23 7:03 AM, Alexander Duyck wrote:
> > > > > > On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
> > > > > > <alexander.duyck@gmail.com> wrote:
> > > > > >>
> > > > > >> I am running into a DMA issue that appears to be a conflict between
> > > > > >> ACS and IOMMU. As per the documentation I can find, the IOMMU is
> > > > > >> supposed to create reserved regions for MSI and the memory window
> > > > > >> behind the root port. However looking at reserved_regions I am not
> > > > > >> seeing that. I only see the reservation for the MSI.
> > > > > >>
> > > > > >> So for example with an enabled NIC and iommu enabled w/o passthru I am seeing:
> > > > > >> # cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
> > > > > >> 0x00000000fee00000 0x00000000feefffff msi
> > > > > >>
> > > > > >> Shouldn't there also be a memory window for the region behind the root
> > > > > >> port to prevent any possible peer-to-peer access?
> > > > > >
> > > > > > Since the iommu portion of the email bounced I figured I would fix
> > > > > > that and provide some additional info.
> > > > > >
> > > > > > I added some instrumentation to the kernel to dump the resources found
> > > > > > in iova_reserve_pci_windows. From what I can tell it is finding the
> > > > > > correct resources for the Memory and Prefetchable regions behind the
> > > > > > root port. It seems to be calling reserve_iova which is successfully
> > > > > > allocating an iova to reserve the region.
> > > > > >
> > > > > > However still no luck on why it isn't showing up in reserved_regions.
> > > > >
> > > > > Perhaps I can ask the opposite question: why should it show up in
> > > > > reserved_regions? Why should the iommu subsystem block any possible peer-
> > > > > to-peer DMA access? Isn't that a decision for the device driver?
> > > > >
> > > > > The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
> > > > > which is not related to peer-to-peer accesses.
> > > >
> > > > The problem is if the IOVA overlaps with the physical addresses of
> > > > other devices that can be routed to via ACS redirect. As such if ACS
> > > > redirect is enabled a host IOVA could be directed to another device on
> > > > the switch instead. To prevent that we need to reserve those addresses
> > > > to avoid address space collisions.
> >
> > Our test case is just to perform DMA to/from the host on one device on
> > a switch and what we are seeing is that when we hit an IOVA that
> > matches up with the physical address of the neighboring device's BAR0
> > then we are seeing an AER followed by a hot reset.
>
> ACS is always confusing... Does your NIC have a DTLB?

No. It is using the IOMMU for all address translation. I am also
pushing back on the test itself. It is always possible that they
have implemented something incorrectly and are overrunning a buffer
into the reserved IOVA region, and the overlap is just a coincidence.
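One way to tell an overrun apart from a routing problem is to check whether the failing transfer's IOVA range actually crosses into a known reserved range. The sketch below is a minimal illustration, assuming the `start end type` line format (inclusive end) that reserved_regions prints above; `parse_resv_line` and `dma_overlaps` are hypothetical helpers, not kernel or library APIs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One line of iommu_group/reserved_regions: "start end type". */
struct resv_region {
	uint64_t start;
	uint64_t end;		/* inclusive, as sysfs prints it */
	char type[32];		/* e.g. "msi", "direct", "reserved" */
};

/* Hypothetical helper: parse one reserved_regions line, true on success. */
static bool parse_resv_line(const char *line, struct resv_region *r)
{
	/* %x follows strtoul(..., 16), so the leading "0x" is accepted. */
	return sscanf(line, "%" SCNx64 " %" SCNx64 " %31s",
		      &r->start, &r->end, r->type) == 3;
}

/* Hypothetical helper: does the DMA [iova, iova + len) touch region r? */
static bool dma_overlaps(uint64_t iova, uint64_t len,
			 const struct resv_region *r)
{
	if (len == 0)
		return false;
	uint64_t last = iova + len - 1;	/* inclusive end of the transfer */
	assert(last >= iova);		/* reject wraparound past 2^64 */
	return iova <= r->end && r->start <= last;
}
```

Since reserved_regions in this thread lists only the MSI range, a bridge window (for example one read from lspci) would have to be filled into a `struct resv_region` by hand to run the same check against it.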

> If request redirect is set, and the Egress is enabled, then all
> transactions should go upstream to the root-port->IOMMU before being
> served.
>
> In my 6.0 spec it's in section 6.12.3, ACS Peer-to-Peer Control Interactions?
>
> And maybe lspci would show how things are set up in the switch?

We were setting only Request Redirect, not Egress Control. I agree, based
on the config everything should just go upstream. However if we
eliminate the switch or put things in passthrough mode the problem
goes away.
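The behavior expected from a switch downstream port with only ACS P2P Request Redirect set can be modeled roughly as below. This is a hypothetical, simplified sketch (`route_request` and the window values are illustrative, not from any driver or from the failing system): a request whose address decodes to a sibling port's memory window is sent upstream to the root port and IOMMU instead of to the peer. Egress Control, completion redirect, and error reporting are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where a downstream port sends a memory request (simplified). */
enum route { ROUTE_UPSTREAM, ROUTE_PEER };

/* Memory window of a sibling downstream port, inclusive bounds. */
struct window {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical model, not a driver API: a request that decodes to a
 * peer window goes upstream when P2P Request Redirect is set,
 * otherwise it is routed to the peer. Egress Control is not modeled.
 */
static enum route route_request(uint64_t addr, const struct window *peers,
				size_t npeers, bool request_redirect,
				size_t *peer_idx)
{
	for (size_t i = 0; i < npeers; i++) {
		if (addr < peers[i].start || addr > peers[i].end)
			continue;
		if (request_redirect)
			return ROUTE_UPSTREAM;	/* redirect to root port/IOMMU */
		if (peer_idx)
			*peer_idx = i;
		return ROUTE_PEER;
	}
	/* Not claimed by any sibling window: ordinary upstream routing. */
	return ROUTE_UPSTREAM;
}
```

Under this model, a switch that raises an ACS violation for the redirected case instead of forwarding the request upstream would be consistent with the theory that the switch itself is misbehaving.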

> >
> > > Any untranslated address from a device must be forwarded to the IOMMU when
> > > ACS is enabled, correct? I guess if you want true p2p, then you would need
> > > to map so that the hpa turns into the peer address, but it's always a round
> > > trip to the IOMMU.
> >
> > This assumes all parts are doing the Request Redirect "correctly". In
> > our case there is a PCIe switch we are trying to debug and we have a
> > few working theories. One concern I have is that the switch may be
> > throwing an ACS violation for us using an address that matches a
> > neighboring device instead of redirecting it to the upstream port. If
> > we pull the switch and just run on the root complex the issue seems to
> > be resolved so I started poking into the code which led me to the
> > documentation pointing out what is supposed to be reserved based on
> > the root complex and MSI regions.
> >
> > As a part of going down that rabbit hole I realized that the
> > reserved_regions seems to only list the MSI reservation. However after
> > digging a bit deeper it seems like there is code to reserve the memory
> > behind the root complex in the IOVA but it doesn't look like that is
> > visible anywhere and is the piece I am currently trying to sort out.
> > What I am working on is trying to figure out if the system that is
> > failing is actually reserving that memory region in the IOVA, or if
> > that is somehow not happening in our test setup.
>
> I suspect with IOMMU, there is no need to pluck holes like we do for the
> MSI. In very early IOMMU code I vaguely recall we did that, but our
> knowledge of ACS was weak (not that it has improved :-)).

The hole has to do mostly with avoiding any possibility of misrouting
things, or at least that was my understanding after reading it.
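Conceptually, the hole is what the iova_reserve_pci_windows()/reserve_iova() path provides: once the bridge windows are reserved in the DMA domain, the IOVA allocator cannot hand out an address that also decodes as a peer's MMIO. The sketch below illustrates that effect, assuming a top-down search over a small array of inclusive ranges; the kernel's allocator in drivers/iommu/iova.c is rbtree-based and more involved, and `iova_alloc_top_down` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t last;		/* inclusive */
};

/*
 * Hypothetical helper: find the highest block of `size` bytes ending at
 * or below `limit` (exclusive) that touches no reserved range.
 * Returns 0 and stores the base in *out, or -1 if nothing fits.
 */
static int iova_alloc_top_down(uint64_t limit, uint64_t size,
			       const struct range *resv, size_t nresv,
			       uint64_t *out)
{
	uint64_t end = limit;	/* exclusive end of the current candidate */

	assert(size > 0);
	for (;;) {
		if (end < size)
			return -1;	/* ran off the bottom of the space */
		uint64_t start = end - size;
		bool moved = false;

		for (size_t i = 0; i < nresv; i++) {
			if (start <= resv[i].last && resv[i].start < end) {
				/* Candidate hits a hole: slide below it, retry. */
				end = resv[i].start;
				moved = true;
				break;
			}
		}
		if (!moved) {
			*out = start;
			return 0;
		}
	}
}
```

Each retry strictly lowers `end`, so the loop terminates; with the MSI range and a bridge window both in `resv`, no returned block can alias either one.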

> Knowing how the switch and root ports are set up with forwarding may help
> with some clues. The easy option may be to forcibly add the range to the
> reserved regions and see whether the ACS violation goes away.
>
> Baolu might have some better ideas.

I'm working with the team having the issue to try and verify that now.
In theory it should already be reserved so I am working with them to
check that.

Thanks,

- Alex


Thread overview: 25+ messages
2023-06-07 22:40 Question about reserved_regions w/ Intel IOMMU Alexander Duyck
2023-06-07 23:03 ` Alexander Duyck
2023-06-08  3:03   ` Baolu Lu
2023-06-08 14:33     ` Alexander Duyck
2023-06-08 15:38       ` Ashok Raj
2023-06-08 17:10         ` Alexander Duyck
2023-06-08 17:52           ` Ashok Raj
2023-06-08 18:15             ` Alexander Duyck [this message]
2023-06-08 18:02           ` Robin Murphy
2023-06-08 18:17             ` Alexander Duyck
2023-06-08 15:28     ` Robin Murphy
2023-06-13 15:54       ` Jason Gunthorpe
2023-06-16  8:39         ` Tian, Kevin
2023-06-16 12:20           ` Jason Gunthorpe
2023-06-16 15:27             ` Alexander Duyck
2023-06-16 16:34               ` Robin Murphy
2023-06-16 18:59                 ` Jason Gunthorpe
2023-06-19 10:20                   ` Robin Murphy
2023-06-19 14:02                     ` Jason Gunthorpe
2023-06-20 14:57                       ` Alexander Duyck
2023-06-20 16:55                         ` Jason Gunthorpe
2023-06-20 17:47                           ` Alexander Duyck
2023-06-21 11:30                             ` Robin Murphy
2023-06-16 18:48               ` Jason Gunthorpe
2023-06-21  8:16             ` Tian, Kevin
