From: Stefano Stabellini <sstabellini@kernel.org>
To: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Wei Chen" <Wei.Chen@arm.com>,
	"Steve Capper" <Steve.Capper@arm.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Jiandi An" <anjiandi@codeaurora.org>,
	"Julien Grall" <julien.grall@linaro.org>,
	alistair.francis@xilinx.com,
	"Punit Agrawal" <punit.agrawal@arm.com>,
	"Campbell Sean" <scampbel@codeaurora.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"manish.jaggi@caviumnetworks.com"
	<manish.jaggi@caviumnetworks.com>,
	"Shanker Donthineni" <shankerd@codeaurora.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [early RFC] ARM PCI Passthrough design document
Date: Thu, 9 Feb 2017 17:01:37 -0800 (PST)
Message-ID: <alpine.DEB.2.10.1702091658420.20549@sstabellini-ThinkPad-X260>
In-Reply-To: <20170202234452.GN9606@toto>

On Fri, 3 Feb 2017, Edgar E. Iglesias wrote:
> On Thu, Feb 02, 2017 at 03:12:52PM -0800, Stefano Stabellini wrote:
> > On Thu, 2 Feb 2017, Edgar E. Iglesias wrote:
> > > On Wed, Feb 01, 2017 at 07:04:43PM +0000, Julien Grall wrote:
> > > > Hi Edgar,
> > > > 
> > > > On 31/01/2017 19:06, Edgar E. Iglesias wrote:
> > > > >On Tue, Jan 31, 2017 at 05:09:53PM +0000, Julien Grall wrote:
> > > > >>On 31/01/17 16:53, Edgar E. Iglesias wrote:
> > > > >>>On Wed, Jan 25, 2017 at 06:53:20PM +0000, Julien Grall wrote:
> > > > >>>>On 24/01/17 20:07, Stefano Stabellini wrote:
> > > > >>>>>On Tue, 24 Jan 2017, Julien Grall wrote:
> > > > > >>>>For generic host bridges, the initialization is nonexistent. However, some host
> > > > > >>>>bridges (e.g. xgene, xilinx) may require some specific setup and also
> > > > > >>>>configuring clocks. Given that Xen only requires access to the configuration
> > > > > >>>>space, I was thinking of letting DOM0 initialize the host bridge. This would
> > > > > >>>>avoid importing a lot of code into Xen; however, this means that we need to
> > > > >>>>know when the host bridge has been initialized before accessing the
> > > > >>>>configuration space.
> > > > >>>
> > > > >>>
> > > > >>>Yes, that's correct.
> > > > > >>>There's a sequence on the ZynqMP that involves assigning Gigabit Transceivers
> > > > >>>to PCI (GTs are shared among PCIe, USB, SATA and the Display Port),
> > > > >>>enabling clocks and configuring a few registers to enable ECAM and MSI.
> > > > >>>
> > > > >>>I'm not sure if this could be done prior to starting Xen. Perhaps.
> > > > > >>>If so, bootloaders would have to know ahead of time what devices
> > > > >>>the GTs are supposed to be configured for.
> > > > >>
> > > > >>I've got further questions regarding the Gigabit Transceivers. You mention
> > > > >>they are shared, do you mean that multiple devices can use a GT at the same
> > > > >>time? Or the software is deciding at startup which device will use a given
> > > > >>GT? If so, how does the software make this decision?
> > > > >
> > > > >Software will decide at startup. AFAIK, the allocation is normally done
> > > > >once but I guess that in theory you could design boards that could switch
> > > > >at runtime. I'm not sure we need to worry about that use-case though.
> > > > >
> > > > >The details can be found here:
> > > > >https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > >
> > > > >I suggest looking at pages 672 and 733.
> > > > 
> > > > Thank you for the documentation. I am trying to understand if we could move
> > > > initialization in Xen as suggested by Stefano. I looked at the driver in
> > > > > Linux and the code looks simple, with not many dependencies. However, I was not
> > > > able to find where the Gigabit Transceivers are configured. Do you have any
> > > > link to the code for that?
> > > 
> > > Hi Julien,
> > > 
> > > I suspect that this setup has previously been done by the initial bootloader
> > > auto-generated from design configuration tools.
> > > 
> > > Now, this is moving into Linux.
> > > There's a specific driver that does that but AFAICS, it has not been upstreamed yet.
> > > You can see it here:
> > > https://github.com/Xilinx/linux-xlnx/blob/master/drivers/phy/phy-zynqmp.c
> > > 
> > > DTS nodes that need a PHY can then just refer to it, here's an example from SATA:
> > > &sata {
> > >         phy-names = "sata-phy";
> > >         phys = <&lane3 PHY_TYPE_SATA 1 3 150000000>;
> > > };
> > > 
> > > I'll see if I can find working examples for PCIe on the ZCU102. Then I'll share
> > > DTS, Kernel etc.
> > > 
> > > If you are looking for a platform to get started, an option could be if I get you a build of
> > > our QEMU that includes models for the PCIe controller, MSI and SMMU connections.
> > > These models are friendly wrt. PHY configs and initialization sequences, it will
> > > accept pretty much any sequence and still work. This would allow you to focus on
> > > architectural issues rather than exact details of init sequences (which we can
> > > deal with later).
> > > 
> > > 
> > > 
> > > > 
> > > > This would also mean that the MSI interrupt controller will be moved in Xen.
> > > > Which I think is a more sensible design (see more below).
> > > > 
> > > > >>
> > > > >>>>	- For all other host bridges => I don't know if there are host bridges
> > > > >>>>falling under this category. I also don't have any idea how to handle this.
> > > > >>>>
> > > > >>>>>
> > > > >>>>>Otherwise, if Dom0 is the only one to drive the physical host bridge,
> > > > >>>>>and Xen is the one to provide the emulated host bridge, how are DomU PCI
> > > > > >>>>>config reads and writes supposed to work in detail?
> > > > >>>>
> > > > > >>>>I think I have answered this question with my explanation above. Let me
> > > > >>>>know if it is not the case.
> > > > >>>>
> > > > >>>>>How is MSI configuration supposed to work?
> > > > >>>>
> > > > > >>>>For GICv3 ITS, the MSI will be configured with the eventID (it is unique
> > > > >>>>per-device) and the address of the doorbell. The linkage between the LPI and
> > > > >>>>"MSI" will be done through the ITS.
> > > > >>>>
> > > > > >>>>For GICv2m, the MSI will be configured with an SPI (or offset on some
> > > > > >>>>GICv2m) and the address of the doorbell. Note that for DOM0 SPIs are mapped
> > > > > >>>>1:1.
> > > > > >>>>
> > > > > >>>>So in both cases, I don't think it is necessary to trap MSI configuration for
> > > > > >>>>DOM0. This may not be true if we want to handle other MSI controllers.
> > > > >>>>
> > > > >>>>I have in mind the xilinx MSI controller (embedded in the host bridge? [4])
> > > > >>>>and xgene MSI controller ([5]). But I have no idea how they work and if we
> > > > >>>>need to support them. Maybe Edgar could share details on the Xilinx one?
> > > > >>>
> > > > >>>
> > > > >>>The Xilinx controller has 2 dedicated SPIs and pages for MSIs. AFAIK, there's no
> > > > >>>way to protect the MSI doorbells from mal-configured end-points raising malicious EventIDs.
> > > > >>>So perhaps trapped config accesses from domUs can help by adding this protection
> > > > >>>as drivers configure the device.
> > > > >>>
> > > > > >>>On Linux, once MSIs hit, the kernel takes the SPI interrupts, reads
> > > > >>>out the EventID from a FIFO in the controller and injects a new IRQ into
> > > > >>>the kernel.
> > > > >>
> > > > > >>It might be early to ask, but how do you expect MSI to work with DOMU on
> > > > > >>your hardware? Does your MSI controller support virtualization? Or are you
> > > > >>looking for a different way to inject MSI?
> > > > >
> > > > >MSI support in HW is quite limited to support domU and will require SW hacks :-(
> > > > >
> > > > >Anyway, something along the lines of this might work:
> > > > >
> > > > >* Trap domU CPU writes to MSI descriptors in config space.
> > > > >  Force real MSI descriptors to the address of the door bell area.
> > > > >  Force real MSI descriptors to use a specific device unique Event ID allocated by Xen.
> > > > >  Remember what EventID domU requested per device and descriptor.
> > > > >
> > > > >* Xen or Dom0 take the real SPI generated when device writes into the doorbell area.
> > > > >  At this point, we can read out the EventID from the MSI FIFO and map it to the one requested from domU.
> > > > >  Xen or Dom0 inject the expected EventID into domU
> > > > >
> > > > >Do you have any good ideas? :-)
> > > > 
> > > > From my understanding your MSI controller is embedded in the hostbridge,
> > > > right? If so, the MSIs would need to be handled where the host bridge will
> > > > be initialized (e.g either Xen or DOM0).
> > > 
> > > Yes, it is.
> > > 
> > > > 
> > > > From a design point of view, it would make more sense to have the MSI
> > > > controller driver in Xen as the hostbridge emulation for guest will also
> > > > live there.
> > > > 
> > > > So if we receive MSI in Xen, we need to figure out a way for DOM0 and guest
> > > > to receive MSI. The same way would be the best, and I guess non-PV if
> > > > possible. I know you are looking to boot unmodified OS in a VM. This would
> > > > mean we need to emulate the MSI controller and potentially xilinx PCI
> > > > controller. How much are you willing to modify the OS?
> > > 
> > > Today, we have not yet implemented PCIe drivers for our baremetal SDK. So
> > > things are very open and we could design with pretty much anything in mind.
> > > 
> > > Yes, we could perhaps include a very small model with most registers dummied.
> > > Implementing the MSI read FIFO would allow us to:
> > > 
> > > 1. Inject the MSI doorbell SPI into guests. The guest will then see the same
> > >    IRQ as on real HW.
> > > 
> > > 2. Guest reads host-controller registers (MSI FIFO) to get the signaled MSI.
> > > 
> > > 
> > > 
> > > > Regarding the MSI doorbell, I have seen it is configured by the software
> > > > using a physical address of a page allocated in the RAM. When the PCI
> > > > > device is writing into the doorbell, does the access go through the SMMU?
> > > 
> > > > That's a good question. On our QEMU model it does, but I'll have to dig a little to see if that is the case on real HW as well.
> > > 
> > > > > Regardless of the answer, I think we would need to map the MSI doorbell page in
> > > > > the guest. Meaning that even if we trap MSI configuration access, a guest
> > > > > could DMA into the page. So if I am not mistaken, MSI would be insecure in
> > > > this case :/.
> > > > 
> > > > Or maybe we could avoid mapping the doorbell in the guest and let Xen
> > > > receive an SMMU abort. When receiving the SMMU abort, Xen could sanitize the
> > > > > value and write into the real MSI doorbell. Not sure if it would work,
> > > > > though.
> > > 
> > > Yeah, this is a problem.
> > > I'm not sure if SMMU aborts would work because I don't think we know the value of the data written when we take the abort.
> > > Without the data, I'm not sure how we would distinguish between different MSI's from the same device.
> > > 
> > > Also, even if the MSI doorbell would be protected by the SMMU, all PCI devices are presented with the same AXI Master ID.
> > 
> > Does that mean that from the SMMU perspective you can only assign them
> > all or none?
> 
> Unfortunately yes.
> 
> 
> > > BTW, this master-ID SMMU limitation is a showstopper for domU guests isn't it?
> > > Or do you have ideas around that? Perhaps some PV way to request mappings for DMA?
> > 
> > No, we don't have anything like that. There are too many device specific
> > ways to request DMAs to do that. For devices that cannot be effectively
> > protected by IOMMU, (on x86) we support assignment but only in an
> > insecure fashion.
> 
> OK, I see.
> 
> A possible hack could be to allocate a chunk of DDR dedicated for PCI DMA.
> PCI DMA devs could be locked in to only be able to access this mem + MSI doorbell.
> Guests can still screw each other up but at least it becomes harder to read/write directly from each others OS memory.
> It may not be worth the effort though....

Actually, we do have the swiotlb in Dom0, which can be used to bounce
DMA requests over a buffer that has previously been set up to be DMA safe
using a hypercall. That is how the swiotlb is used on x86. On ARM it is
used to issue cache flushes via hypercall, but it could be adapted to do
both. It would degrade performance, due to the additional memcpy, but it
would work, I believe.
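
To make the bounce-buffer idea concrete, here is a minimal user-space
sketch in C. It is not the Linux swiotlb code or its API: the names
bounce_map() and bounce_unmap(), the pool and slot sizes, and the slot
allocator are all hypothetical, and a plain static array stands in for
the region that would be made DMA safe ahead of time via hypercall. The
point is only to show where the extra memcpy comes from: the device is
only ever handed addresses inside the pool, and data is copied in before
a transfer to the device and copied out after a transfer from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_POOL_SIZE 4096  /* hypothetical size of the DMA-safe pool */
#define BOUNCE_SLOT_SIZE 256   /* hypothetical fixed slot size */
#define BOUNCE_NSLOTS (BOUNCE_POOL_SIZE / BOUNCE_SLOT_SIZE)

/* Stand-in for memory made DMA safe up front (assumption: in a real
 * system this setup would go through a hypercall; here it is an array). */
static uint8_t bounce_pool[BOUNCE_POOL_SIZE];
static int slot_used[BOUNCE_NSLOTS];

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Reserve a slot in the pool and, for transfers to the device, copy the
 * caller's data into it. Returns the address the device would be given,
 * or NULL if the request is too large or the pool is exhausted. */
void *bounce_map(const void *buf, size_t len, enum dma_dir dir)
{
    if (len > BOUNCE_SLOT_SIZE)
        return NULL;

    for (size_t i = 0; i < BOUNCE_NSLOTS; i++) {
        if (!slot_used[i]) {
            void *slot = &bounce_pool[i * BOUNCE_SLOT_SIZE];

            slot_used[i] = 1;
            if (dir == DMA_TO_DEVICE)
                memcpy(slot, buf, len);  /* the additional memcpy */
            return slot;
        }
    }
    return NULL;
}

/* Release a slot previously returned by bounce_map(). For transfers from
 * the device, copy what the device wrote back to the caller's buffer. */
void bounce_unmap(void *slot, void *buf, size_t len, enum dma_dir dir)
{
    size_t i = (size_t)((uint8_t *)slot - bounce_pool) / BOUNCE_SLOT_SIZE;

    /* Sanity check: the slot must be a live slot inside the pool. */
    assert(i < BOUNCE_NSLOTS && slot_used[i]);

    if (dir == DMA_FROM_DEVICE)
        memcpy(buf, slot, len);  /* the additional memcpy */
    slot_used[i] = 0;
}
```

A driver-like caller would pass the pointer returned by bounce_map() to
the device instead of its own buffer, and call bounce_unmap() once the
transfer completes. The original buffer is never exposed to the device,
which is what makes the scheme usable when the IOMMU cannot isolate it.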
