From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> To: Auger Eric <eric.auger@redhat.com>, Jacob Pan <jacob.jun.pan@linux.intel.com>, "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, Joerg Roedel <joro@8bytes.org>, David Woodhouse <dwmw2@infradead.org>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Alex Williamson <alex.williamson@redhat.com> Cc: Jean Delvare <khali@linux-fr.org>, Rafael Wysocki <rafael.j.wysocki@intel.com>, Raj Ashok <ashok.raj@intel.com> Subject: Re: [PATCH v5 13/23] iommu: introduce device fault report API Date: Fri, 7 Sep 2018 12:23:01 +0100 [thread overview] Message-ID: <21349516-030a-93df-35bd-3597fa1cba79@arm.com> (raw) In-Reply-To: <953746f3-352b-cd17-9938-eb78af3b58a9@redhat.com> On 07/09/2018 08:11, Auger Eric wrote: >>> On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote: >>>> On 06/09/2018 10:25, Auger Eric wrote: >>>>>> + mutex_lock(&fparam->lock); >>>>>> + list_add_tail(&evt_pending->list, &fparam->faults); >>>>> same doubt as Yi Liu. You cannot rely on the userspace willingness to >>>>> void the queue and deallocate this memory. >>> >>> By the way I saw there is a kind of garbage collectors for faults which >>> wouldn't have received any responses. However I am not sure this removes >>> the concern of having the fault list on kernel side growing beyond >>> acceptable limits. >> >> How about per-device quotas? (https://lkml.org/lkml/2018/4/23/706 for >> reference) With PRI the IOMMU driver already sets per-device credits >> when initializing the device (pci_enable_pri), so if the device behaves >> properly it shouldn't send new page requests once the number of >> outstanding ones is maxed out. > > But this needs to work for non PRI use case too? Only recoverable faults, PRI and stall, are added to the fparam->faults list, because the kernel needs to make sure that each of these faults gets a reply, or else they are held in hardware indefinitely. Non-recoverable faults don't need tracking, the IOMMU API can forget about them after they're reported. Rate-limiting could be done by the consumer if it gets flooded by non-recoverable faults, for example by dropping some of them. >> Right, an event contains more information than a PRI page request. >> Stage-2 fields (CLASS, S2, IPA, TTRnW) cannot be represented by >> iommu_fault_event at the moment. > > Yes I am currently doing the mapping exercise between SMMUv3 events and > iommu_fault_event and I miss config errors for instance. We may have initially focused only on guest and userspace config errors (IOMMU_FAULT_REASON_PASID_FETCH, IOMMU_FAULT_REASON_PASID_INVALID, etc), since other config errors are most likely a bug in the host IOMMU driver, and could be reported with pr_err > For precise emulation it might be >> useful to at least add the S2 flag (as a new iommu_fault_reason?), so >> that when the guest maps stage-1 to an invalid GPA, QEMU could for >> example inject an external abort. > > Actually we may even need to filter events and return to the guest only > the S1 related. >> >>> queue >>> size, that may deserve to create different APIs and internal data >>> structs. Also this may help separating the concerns. >> >> It might duplicate them. If the consumer of the event report is a host >> device driver, the SMMU needs to report a "generic" iommu_fault_event, >> and if the consumer is VFIO it would report a specialized one > > I am unsure of my understanding of the UNRECOVERABLE error type. Is it > everything else than a PRI. For instance are all SMMUv3 event errors > supposed to be put under the IOMMU_FAULT_DMA_UNRECOV umbrella? I guess it's more clear-cut in VT-d, which defines recoverable and non-recoverable faults. In SMMUv3, PRI Page Requests are recoverable, but event errors can also be recoverable if they have the Stall flag set. Stall is a way for non-PCI endpoints to do SVA, and I have a patch in my series that sorts events into PAGE_REQ and DMA_UNRECOV before feeding them to this API: https://patchwork.kernel.org/patch/10395043/ > If I understand correctly there are different consumers for PRI and > unrecoverable data, so why not having 2 different APIs. My reasoning was that for virtualization they go through the same channel, VFIO, until the guest or the vIOMMU dispatches them depending on their type, so we might as well use the same API. In addition, host device drivers might also want to handle stall or PRI events themselves instead of relying on the SVA infrastructure. For example the MSM GPU with SMMUv2: https://patchwork.kernel.org/patch/9953803/ >>> My remark also >>> stems from the fact the SMMU uses 2 different queues, whose size can >>> also be different. >> >> Hm, for PRI requests the kernel-userspace queue size should actually be >> the number of PRI credits for that device. Hadn't thought about it >> before, where do we pass that info to userspace? > Cannot help here at the moment, sorry. > For fault events, the >> queue could be as big as the SMMU event queue, though using all that >> space might be wasteful. > The guest has its own programming of the SMMU_EVENTQ_BASE.LOG2SIZE. This > could be used to program the SW fifo > > Non-stalled events should be rare and reporting >> them isn't urgent. Stalled ones would need the number of stall credits I >> mentioned above, which realistically will be a lot less than the SMMU >> event queue size. Given that a device will use either PRI or stall but >> not both, I still think events and PRI could go through the same queue. > Did I get it right PRI is for PCIe and STALL for non PCIe? But all that > stuff also is related to Page Request use case, right? Yes, a stall event is a page request from a non-PCI device, but it comes in through the SMMU event queue Thanks, Jean
WARNING: multiple messages have this Message-ID (diff)
From: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> To: Auger Eric <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>, "iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org" <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>, LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>, David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>, Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>, Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: Jean Delvare <khali-PUYAD+kWke1g9hUCZPvPmw@public.gmane.org>, Rafael Wysocki <rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>, Raj Ashok <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> Subject: Re: [PATCH v5 13/23] iommu: introduce device fault report API Date: Fri, 7 Sep 2018 12:23:01 +0100 [thread overview] Message-ID: <21349516-030a-93df-35bd-3597fa1cba79@arm.com> (raw) In-Reply-To: <953746f3-352b-cd17-9938-eb78af3b58a9-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> On 07/09/2018 08:11, Auger Eric wrote: >>> On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote: >>>> On 06/09/2018 10:25, Auger Eric wrote: >>>>>> + mutex_lock(&fparam->lock); >>>>>> + list_add_tail(&evt_pending->list, &fparam->faults); >>>>> same doubt as Yi Liu. You cannot rely on the userspace willingness to >>>>> void the queue and deallocate this memory. >>> >>> By the way I saw there is a kind of garbage collectors for faults which >>> wouldn't have received any responses. However I am not sure this removes >>> the concern of having the fault list on kernel side growing beyond >>> acceptable limits. >> >> How about per-device quotas? (https://lkml.org/lkml/2018/4/23/706 for >> reference) With PRI the IOMMU driver already sets per-device credits >> when initializing the device (pci_enable_pri), so if the device behaves >> properly it shouldn't send new page requests once the number of >> outstanding ones is maxed out. > > But this needs to work for non PRI use case too? Only recoverable faults, PRI and stall, are added to the fparam->faults list, because the kernel needs to make sure that each of these faults gets a reply, or else they are held in hardware indefinitely. Non-recoverable faults don't need tracking, the IOMMU API can forget about them after they're reported. Rate-limiting could be done by the consumer if it gets flooded by non-recoverable faults, for example by dropping some of them. >> Right, an event contains more information than a PRI page request. >> Stage-2 fields (CLASS, S2, IPA, TTRnW) cannot be represented by >> iommu_fault_event at the moment. > > Yes I am currently doing the mapping exercise between SMMUv3 events and > iommu_fault_event and I miss config errors for instance. We may have initially focused only on guest and userspace config errors (IOMMU_FAULT_REASON_PASID_FETCH, IOMMU_FAULT_REASON_PASID_INVALID, etc), since other config errors are most likely a bug in the host IOMMU driver, and could be reported with pr_err > For precise emulation it might be >> useful to at least add the S2 flag (as a new iommu_fault_reason?), so >> that when the guest maps stage-1 to an invalid GPA, QEMU could for >> example inject an external abort. > > Actually we may even need to filter events and return to the guest only > the S1 related. >> >>> queue >>> size, that may deserve to create different APIs and internal data >>> structs. Also this may help separating the concerns. >> >> It might duplicate them. If the consumer of the event report is a host >> device driver, the SMMU needs to report a "generic" iommu_fault_event, >> and if the consumer is VFIO it would report a specialized one > > I am unsure of my understanding of the UNRECOVERABLE error type. Is it > everything else than a PRI. For instance are all SMMUv3 event errors > supposed to be put under the IOMMU_FAULT_DMA_UNRECOV umbrella? I guess it's more clear-cut in VT-d, which defines recoverable and non-recoverable faults. In SMMUv3, PRI Page Requests are recoverable, but event errors can also be recoverable if they have the Stall flag set. Stall is a way for non-PCI endpoints to do SVA, and I have a patch in my series that sorts events into PAGE_REQ and DMA_UNRECOV before feeding them to this API: https://patchwork.kernel.org/patch/10395043/ > If I understand correctly there are different consumers for PRI and > unrecoverable data, so why not having 2 different APIs. My reasoning was that for virtualization they go through the same channel, VFIO, until the guest or the vIOMMU dispatches them depending on their type, so we might as well use the same API. In addition, host device drivers might also want to handle stall or PRI events themselves instead of relying on the SVA infrastructure. For example the MSM GPU with SMMUv2: https://patchwork.kernel.org/patch/9953803/ >>> My remark also >>> stems from the fact the SMMU uses 2 different queues, whose size can >>> also be different. >> >> Hm, for PRI requests the kernel-userspace queue size should actually be >> the number of PRI credits for that device. Hadn't thought about it >> before, where do we pass that info to userspace? > Cannot help here at the moment, sorry. > For fault events, the >> queue could be as big as the SMMU event queue, though using all that >> space might be wasteful. > The guest has its own programming of the SMMU_EVENTQ_BASE.LOG2SIZE. This > could be used to program the SW fifo > > Non-stalled events should be rare and reporting >> them isn't urgent. Stalled ones would need the number of stall credits I >> mentioned above, which realistically will be a lot less than the SMMU >> event queue size. Given that a device will use either PRI or stall but >> not both, I still think events and PRI could go through the same queue. > Did I get it right PRI is for PCIe and STALL for non PCIe? But all that > stuff also is related to Page Request use case, right? Yes, a stall event is a page request from a non-PCI device, but it comes in through the SMMU event queue Thanks, Jean
next prev parent reply other threads:[~2018-09-07 11:23 UTC|newest] Thread overview: 128+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-05-11 20:53 [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual Address (SVA) Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 01/23] iommu: introduce bind_pasid_table API function Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-08-23 16:34 ` Auger Eric 2018-08-24 12:47 ` Liu, Yi L 2018-08-24 12:47 ` Liu, Yi L 2018-08-24 13:20 ` Auger Eric 2018-08-28 17:04 ` Jacob Pan 2018-08-24 15:00 ` Auger Eric 2018-08-28 5:14 ` Jacob Pan 2018-08-28 8:34 ` Auger Eric 2018-08-28 8:34 ` Auger Eric 2018-08-28 16:36 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 02/23] iommu/vt-d: move device_domain_info to header Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 03/23] iommu/vt-d: add a flag for pasid table bound status Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-05-13 7:33 ` Lu Baolu 2018-05-13 7:33 ` Lu Baolu 2018-05-14 18:51 ` Jacob Pan 2018-05-14 18:51 ` Jacob Pan 2018-05-13 8:01 ` Lu Baolu 2018-05-13 8:01 ` Lu Baolu 2018-05-14 18:52 ` Jacob Pan 2018-05-14 18:52 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 04/23] iommu/vt-d: add bind_pasid_table function Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-05-13 9:29 ` Lu Baolu 2018-05-13 9:29 ` Lu Baolu 2018-05-14 20:22 ` Jacob Pan 2018-05-14 20:22 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 05/23] iommu: introduce iommu invalidate API function Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 06/23] iommu/vt-d: add definitions for PFSID Jacob Pan 2018-05-11 20:53 ` Jacob Pan 2018-05-14 1:36 ` Lu Baolu 2018-05-14 1:36 ` Lu Baolu 2018-05-14 20:30 ` Jacob Pan 2018-05-14 20:30 ` Jacob Pan 2018-05-11 20:53 ` [PATCH v5 07/23] iommu/vt-d: fix dev iotlb pfsid use Jacob Pan 2018-05-14 1:52 ` Lu Baolu 2018-05-14 1:52 ` Lu Baolu 2018-05-14 20:38 ` Jacob Pan 2018-05-14 20:38 ` Jacob Pan 2018-05-11 20:54 ` [PATCH v5 08/23] iommu/vt-d: support flushing more translation cache types Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-05-14 2:18 ` Lu Baolu 2018-05-14 2:18 ` Lu Baolu 2018-05-14 20:46 ` Jacob Pan 2018-05-14 20:46 ` Jacob Pan 2018-05-17 8:44 ` kbuild test robot 2018-05-17 8:44 ` kbuild test robot 2018-05-11 20:54 ` [PATCH v5 09/23] iommu/vt-d: add svm/sva invalidate function Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-05-14 3:35 ` Lu Baolu 2018-05-14 3:35 ` Lu Baolu 2018-05-14 20:49 ` Jacob Pan 2018-05-11 20:54 ` [PATCH v5 10/23] iommu: introduce device fault data Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-09-21 10:07 ` Auger Eric 2018-09-21 17:05 ` Jacob Pan 2018-09-26 10:20 ` Auger Eric 2018-05-11 20:54 ` [PATCH v5 11/23] driver core: add per device iommu param Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-05-14 5:27 ` Lu Baolu 2018-05-14 5:27 ` Lu Baolu 2018-05-14 20:52 ` Jacob Pan 2018-05-14 20:52 ` Jacob Pan 2018-05-11 20:54 ` [PATCH v5 12/23] iommu: add a timeout parameter for prq response Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-05-11 20:54 ` [PATCH v5 13/23] iommu: introduce device fault report API Jacob Pan 2018-05-14 6:01 ` Lu Baolu 2018-05-14 6:01 ` Lu Baolu 2018-05-14 20:55 ` Jacob Pan 2018-05-14 20:55 ` Jacob Pan 2018-05-15 6:52 ` Lu Baolu 2018-05-15 6:52 ` Lu Baolu 2018-05-17 11:41 ` Liu, Yi L 2018-05-17 11:41 ` Liu, Yi L 2018-05-17 15:59 ` Jacob Pan 2018-05-17 15:59 ` Jacob Pan 2018-05-17 23:22 ` Liu, Yi L 2018-05-21 23:03 ` Jacob Pan 2018-09-06 9:25 ` Auger Eric 2018-09-06 12:42 ` Jean-Philippe Brucker 2018-09-06 13:14 ` Auger Eric 2018-09-06 17:06 ` Jean-Philippe Brucker 2018-09-06 17:06 ` Jean-Philippe Brucker 2018-09-07 7:11 ` Auger Eric 2018-09-07 11:23 ` Jean-Philippe Brucker [this message] 2018-09-07 11:23 ` Jean-Philippe Brucker 2018-09-14 13:24 ` Auger Eric 2018-09-17 16:57 ` Jacob Pan 2018-09-25 14:58 ` Jean-Philippe Brucker 2018-09-25 14:58 ` Jean-Philippe Brucker 2018-09-25 22:17 ` Jacob Pan 2018-09-26 10:14 ` Jean-Philippe Brucker 2018-05-11 20:54 ` [PATCH v5 14/23] iommu: introduce page response function Jacob Pan 2018-05-14 6:39 ` Lu Baolu 2018-05-14 6:39 ` Lu Baolu 2018-05-29 16:13 ` Jacob Pan 2018-05-29 16:13 ` Jacob Pan 2018-09-10 14:52 ` Auger Eric 2018-09-10 17:50 ` Jacob Pan 2018-09-10 19:06 ` Auger Eric 2018-09-10 19:06 ` Auger Eric 2018-05-11 20:54 ` [PATCH v5 15/23] iommu: handle page response timeout Jacob Pan 2018-05-14 7:43 ` Lu Baolu 2018-05-14 7:43 ` Lu Baolu 2018-05-29 16:20 ` Jacob Pan 2018-05-30 7:46 ` Lu Baolu 2018-05-11 20:54 ` [PATCH v5 16/23] iommu/config: add build dependency for dmar Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-05-11 20:54 ` [PATCH v5 17/23] iommu/vt-d: report non-recoverable faults to device Jacob Pan 2018-05-11 20:54 ` Jacob Pan 2018-05-14 8:17 ` Lu Baolu 2018-05-14 8:17 ` Lu Baolu 2018-05-29 17:33 ` Jacob Pan 2018-05-29 17:33 ` Jacob Pan 2018-05-11 20:54 ` [PATCH v5 18/23] iommu/intel-svm: report device page request Jacob Pan 2018-05-11 20:54 ` [PATCH v5 19/23] iommu/intel-svm: replace dev ops with fault report API Jacob Pan 2018-05-11 20:54 ` [PATCH v5 20/23] iommu/intel-svm: do not flush iotlb for viommu Jacob Pan 2018-05-11 20:54 ` [PATCH v5 21/23] iommu/vt-d: add intel iommu page response function Jacob Pan 2018-05-11 20:54 ` [PATCH v5 22/23] trace/iommu: add sva trace events Jacob Pan 2018-05-11 20:54 ` [PATCH v5 23/23] iommu: use sva invalidate and device fault trace event Jacob Pan 2018-05-29 15:54 ` [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual Address (SVA) Jacob Pan 2018-05-29 15:54 ` Jacob Pan
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=21349516-030a-93df-35bd-3597fa1cba79@arm.com \ --to=jean-philippe.brucker@arm.com \ --cc=alex.williamson@redhat.com \ --cc=ashok.raj@intel.com \ --cc=dwmw2@infradead.org \ --cc=eric.auger@redhat.com \ --cc=gregkh@linuxfoundation.org \ --cc=iommu@lists.linux-foundation.org \ --cc=jacob.jun.pan@linux.intel.com \ --cc=joro@8bytes.org \ --cc=khali@linux-fr.org \ --cc=linux-kernel@vger.kernel.org \ --cc=rafael.j.wysocki@intel.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.