On Wed, 14 Aug 2019, Julien Grall wrote: > Hi Oleksandr, > > On 13/08/2019 13:35, Oleksandr wrote: > > > > On 12.08.19 22:46, Julien Grall wrote: > > > Hi Oleksandr, > > > > Hi, Julien > > > > > > > > > > On 8/12/19 1:01 PM, Oleksandr wrote: > > > > On 12.08.19 14:11, Julien Grall wrote: > > > > > On 02/08/2019 17:39, Oleksandr Tyshchenko wrote: > > > > > > From: Oleksandr Tyshchenko > > > > > > > > > > > > This patch adds minimal required support to General IOMMU framework > > > > > > to be able to handle a case when IOMMU driver requesting deferred > > > > > > probing for a device. > > > > > > > > > > > > In order not to pull Linux's error code (-EPROBE_DEFER) to Xen > > > > > > we have chosen -EAGAIN to be used for indicating that device > > > > > > probing is deferred. > > > > > > > > > > > > This is needed for the upcoming IPMMU driver which may request > > > > > > deferred probing depending on what device will be probed the first > > > > > > (there is some dependency between these devices, Root device must be > > > > > > registered before Cache devices. If not the case, driver will deny > > > > > > further Cache device probes until Root device is registered). > > > > > > As we can't guarantee a fixed pre-defined order for the device nodes > > > > > > in DT, we need to be ready for the situation where devices being > > > > > > probed in "any" order. > > > > > > > > > > > > Signed-off-by: Oleksandr Tyshchenko > > > > > > --- > > > > > >   xen/common/device_tree.c            |  1 + > > > > > >   xen/drivers/passthrough/arm/iommu.c | 35 > > > > > > ++++++++++++++++++++++++++++++++++- > > > > > >   xen/include/asm-arm/device.h        |  6 +++++- > > > > > >   xen/include/xen/device_tree.h       |  1 + > > > > > >   4 files changed, 41 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c > > > > > > index e107c6f..6f37448 100644 > > > > > > --- a/xen/common/device_tree.c > > > > > > +++ b/xen/common/device_tree.c > > > > > > @@ -1774,6 +1774,7 @@ static unsigned long __init > > > > > > unflatten_dt_node(const void *fdt, > > > > > >           /* By default the device is not protected */ > > > > > >           np->is_protected = false; > > > > > >           INIT_LIST_HEAD(&np->domain_list); > > > > > > +        INIT_LIST_HEAD(&np->deferred_probe); > > > > > > > > > > I am not entirely happy to add a new list_head field per node just for > > > > > the benefits of boot code. Could we re-use domain_list (with a comment > > > > > in the code and appropriate ASSERT)? > > > > > > > > Agree that only boot code uses deferred_probe field. I will consider > > > > re-using domain_list. Could you please clarify regarding ASSERT (where > > > > to put and what to check). > > > > > > What I meant is adding an ASSERT to check that np->domain_list is at empty > > > at least before trying to add in the list. This would help to debug any > > > potential issue if we end up to use domain_list earlier in the future. I > > > can't see why it would as iommu is called earlier, but who knows :). > > > > Got it. Thank you for clarification. > > > > > > > > > > + > > > > > >   static const struct iommu_ops *iommu_ops; > > > > > >     const struct iommu_ops *iommu_get_ops(void) > > > > > > @@ -42,7 +48,7 @@ void __init iommu_set_ops(const struct iommu_ops > > > > > > *ops) > > > > > >     int __init iommu_hardware_setup(void) > > > > > >   { > > > > > > -    struct dt_device_node *np; > > > > > > +    struct dt_device_node *np, *tmp; > > > > > >       int rc; > > > > > >       unsigned int num_iommus = 0; > > > > > >   @@ -51,6 +57,33 @@ int __init iommu_hardware_setup(void) > > > > > >           rc = device_init(np, DEVICE_IOMMU, NULL); > > > > > >           if ( !rc ) > > > > > >               num_iommus++; > > > > > > +        else if (rc == -EAGAIN) > > > > > > +            /* > > > > > > +             * Driver requested deferred probing, so add this > > > > > > device to > > > > > > +             * the deferred list for further processing. > > > > > > +             */ > > > > > > +            list_add(&np->deferred_probe, &deferred_probe_list); > > > > > > +    } > > > > > > + > > > > > > +    /* > > > > > > +     * Process devices in the deferred list if at least one > > > > > > successfully > > > > > > +     * probed device is present. > > > > > > +     */ > > > > > > > > > > I think this can turn into an infinite loop if all device in > > > > > deferred_probe_list still return -EDEFER_PROBE and num_iommus is a > > > > > non-zero. > > > > > > > > Agree. > > > > > > > > > > > > > > > > > > A better condition would be to check that at least one IOMMU is added > > > > > at each loop. If not, then we should bail with an error because it > > > > > likely means something is buggy. > > > > > > > > Sounds reasonable. Will do. > > > > > > > > > > > > Just to clarify: > > > > > > > >  >>> A better condition would be to check that at least one IOMMU is > > > > added at each loop. > > > > > > > > Maybe, not only added (rc == 0), but driver didn't request deferred > > > > probe (rc != -EAGAIN). > > > > > > I think adding an IOMMU is enough. If you return an error other than > > > -EAGAIN here after deferring probing, then you are likely going to fail at > > > the next loop. So better to stop early. > > > > It makes sense. > > > > > > > > > > > > > I realize this not what the current code is doing (I know I wrote it ;)). > > > But I am not sure it is sane to continue if only part of the IOMMUs are > > > initialized. Most likely you will see an error much later that may be not > > > trivial to find out. > > > > > > Imagine you want to passthrough you network card to a guest but the IOMMU > > > initialization failed... > > > > Oh, agree. > > > > As I understand, the new strict logic would be the following: > > > > If initialization for at least one IOMMU device failed (device_init returns > > an error other than -EAGAIN), we should stop and return an error to upper > > layer (even if num_iommus > 0). No matter whether it is during the first > > attempt or after deferring probe. We don't allow the "I/O virtualisation" to > > be enabled (iommu_enabled == true) with only part of the IOMMU devices being > > initialized. Is my understanding correct? > > Let me summarize the discussion we had on IRC :). Without your patch, Xen may > initialize only half the IOMMUs. If the device is behind an IOMMU that wasn't > initialized, then we have two possibility: > 1) The device was already mark as protected (if using the old binding in > the SMMU). Xen will not be able to assign the device to Dom0 and therefore Xen > will crash (not able to build dom0). For domU, it will depend whether the > configuration contain the options 'dtdev'. If the option is specified, then > guest will fail to build. On the contrary if the option isn't specified then > the guest will boot and this could either lead to transaction failure (if the > IOMMU was already reset) or bypassing the IOMMU. Note that the latter can > today happen if your IOMMU was disabled. But we can't do much against it. > 2) The device is not marked as protected. Xen will not be able to "assign" > the device to Dom0 and this could either lead to the device bypassing the > IOMMU or a transaction failure. For domU, the problem is similar to 1). > > In the case of the SMMU driver, we only support old bindings. So devices are > marked as protected during SMMU initialization. This is done before the SMMU > is reset. Before reset the SMMU will bypassed. > > So the risk is to have an half secure system and may be unnoticed until later. > I realize this is the current behavior, so not very ideal. > > It feels to me if the user requested to use IOMMU then if we should panic if > any of the available IOMMU are not initialized correctly. This will save a lot > of debug afterwards. > > @Stefano, any opinions? I agree that we should enable all IOMMUs or none. I don't think we want to deal with partially initialized IOMMUs systems. Failure to enable all IOMMUs should lead to returning an error from the relevant function (arch_iommu_populate_page_table?) which should translate into Xen failing to boot and crashing. Which I think it is what you are suggesting, right? (I wouldn't call panic() inside the IOMMU specific initializer, because there might be iommu= command line options for Xen that specify a different desired outcome. I would deal with this case the same way we would deal with an error during initialization of a single IOMMU.)