From: Ram Pai <linuxram@us.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org,
	benh@kernel.crashing.org, david@gibson.dropbear.id.au,
	paulus@ozlabs.org, hch@lst.de, andmike@us.ibm.com,
	sukadev@linux.vnet.ibm.com, mst@redhat.com, ram.n.pai@gmail.com,
	aik@ozlabs.ru, cai@lca.pw, tglx@linutronix.de,
	bauerman@linux.ibm.com, linux-kernel@vger.kernel.org,
	leonardo@linux.ibm.com
Subject: Re: [PATCH v5 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VM.
Date: Wed, 11 Dec 2019 22:45:02 -0800	[thread overview]
Message-ID: <20191212064502.GC5709@oc0525413822.ibm.com> (raw)
In-Reply-To: <157602860458.3810.8599908751067047456@sif>

On Tue, Dec 10, 2019 at 07:43:24PM -0600, Michael Roth wrote:
> Quoting Ram Pai (2019-12-06 19:12:39)
> > Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> >                 secure guests")
> > disabled the dma_iommu_ops path for secure VMs, which enabled the
> > dma_direct path and, with it, bounce-buffering through SWIOTLB.
> > However, that path fails when an IOMMU is enabled, since I/O pages
> > are not TCE mapped.
> > 
> > Re-enable the dma_iommu_ops path for pseries Secure VMs.  It handles
> > all cases, including TCE mapping of I/O pages in the presence of an
> > IOMMU.
> 
> Wasn't clear to me at first, but I guess the main gist of this series is
> that we want to continue to use SWIOTLB, but also need to create mappings
> of its bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
> and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
> to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
> checks in DMA API functions to do the same.
> 
> That makes sense, but one issue I see with that is that
> dma_iommu_map_bypass() only tests true if all the following are true:
> 
> 1) the device requests a 64-bit DMA mask via
>    dma_set_mask/dma_set_coherent_mask
> 2) DDW is enabled (i.e. we don't pass disable_ddw on command-line)
> 
> dma_is_direct() checks don't have this limitation, so I think for
> cases such as devices that use a smaller DMA mask, we'll
> end up falling back to the non-bypass functions in dma_iommu_ops, which
> will likely break for things like dma_alloc_coherent/dma_map_single
> since they won't use SWIOTLB pages and won't do the necessary calls to
> set_memory_unencrypted() to share those non-SWIOTLB buffers with
> the hypervisor.
> 
> Maybe that's ok, but I think we should be clearer about how to
> fail/handle these cases.

Yes, makes sense. A device that cannot handle a 64-bit DMA mask will not work.
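
(For reference -- paraphrasing from memory rather than quoting the exact
tree this patch is against -- the bypass decision Michael is describing
lives in arch/powerpc/kernel/dma-iommu.c and looks roughly like the
sketch below; treat it as an illustration, not the literal source:)

static inline bool dma_iommu_map_bypass(struct device *dev,
					unsigned long attrs)
{
	/*
	 * Bypass straight to dma_direct_* only when the device was
	 * marked 64-bit capable, which in turn requires DDW.
	 */
	return dev->archdata.iommu_bypass &&
		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
}

static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction direction,
				     unsigned long attrs)
{
	if (dma_iommu_map_bypass(dev, attrs))
		return dma_direct_map_page(dev, page, offset, size,
					   direction, attrs);

	/* Non-bypass path: TCE-mapped DMA that never goes through the
	 * SWIOTLB pool, hence the concern for small-mask devices. */
	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
			      size, dma_get_mask(dev), direction, attrs);
}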

> 
> Though I also agree with some concerns Alexey stated earlier: it seems
> wasteful to map the entire DDW window just so these bounce buffers can be
> mapped.  Especially if you consider the lack of a mapping to be an additional
> safeguard against things like buggy device implementations on the QEMU
> side. E.g. if we leaked pages to the hypervisor by accident, those pages
> wouldn't be immediately accessible to a device, and would still require
> additional work to get past the IOMMU.

Well, an accidental, unintended page leak to the hypervisor is a very
bad thing, regardless of any DMA mapping. The device may not be able to
access it, but the hypervisor still can.

> 
> What would it look like if we try to make all this work with disable_ddw
> passed to the kernel command-line (or forced for is_secure_guest())?
> 
>   1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_* ops,
>      but an additional case or hook that considers is_secure_guest() might do
>      it.
>      
>   2) We'd also need to set up an IOMMU mapping for the bounce buffers via
>      io_tlb_start/io_tlb_end. We could do it once, on-demand via
>      dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
>      maybe in some init function.

Hmm... I am not sure how to accomplish (2).  We need to use some DDW
window to set up the mappings, right?  If disable_ddw is set, there won't
be any DDW.  What am I missing?

> 
> That also has the benefit of not requiring devices to support 64-bit DMA.
> 
> Alternatively, we could continue to rely on the 64-bit DDW window, but
> modify the call to enable_ddw() to only map the io_tlb_start/end range in
> the case of is_secure_guest(). This is a little cleaner implementation-wise
> since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks.

I have been experimenting with this, trying to map only the memory
range from io_tlb_start to io_tlb_end through the 64-bit DDW window.
But for some reason it wants io_tlb_start to be aligned to some
boundary; it looks like a 2^28 boundary. Not sure what dictates that
boundary.
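
(For the curious, the experiment is essentially the fragment below,
spliced into enable_ddw() in place of its walk over all of system RAM.
The helper names are the ones enable_ddw() already uses; the
is_secure_guest() branch is my own untested sketch and is where the
alignment failure above shows up:)

	if (is_secure_guest()) {
		/* Map only the SWIOTLB pool through the new 64-bit window. */
		unsigned long tlb_start_pfn = PHYS_PFN(io_tlb_start);
		unsigned long tlb_nr_pfn = PHYS_PFN(io_tlb_end) - tlb_start_pfn;

		ret = tce_setrange_multi_pSeriesLP(tlb_start_pfn, tlb_nr_pfn,
						   win64->value);
	} else {
		/* Existing behaviour: TCE-map all of system RAM. */
		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
					    win64->value,
					    tce_setrange_multi_pSeriesLP_walk);
	}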
   

> , but
> devices that don't support 64-bit will fall back to not using dma_direct_* ops
> and fail miserably. We'd probably want to handle that more gracefully.

Yes, I will add a warning message to indicate the failure.
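
(Something along these lines -- a hypothetical helper, not code that
exists in the tree today -- called from wherever we decide the device
cannot take the bypass path:)

/*
 * Hypothetical sketch: refuse, loudly, devices that cannot set a
 * full 64-bit DMA mask inside a secure guest, instead of silently
 * taking the non-SWIOTLB path and failing later.
 */
static bool svm_dma_mask_ok(struct device *dev, u64 mask)
{
	if (!is_secure_guest() || mask >= DMA_BIT_MASK(64))
		return true;

	dev_warn(dev,
		 "Secure VM requires a 64-bit DMA mask; mask %llx is unsupported\n",
		 mask);
	return false;
}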

> 
> Or we handle both cases gracefully. To me it makes more sense to enable the
> non-DDW case first, then consider adding the DDW case later if there's some
> reason why 64-bit DMA is needed. But it would be good to hear if there are
> other opinions.

Educate me a bit here: what is a non-DDW case?  Is it possible for a
device to access memory, in the presence of an IOMMU, without a window mapping?

> 
> > 
> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 67b5009..4e27d66 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -36,7 +36,6 @@
> >  #include <asm/udbg.h>
> >  #include <asm/mmzone.h>
> >  #include <asm/plpar_wrappers.h>
> > -#include <asm/svm.h>
> >  #include <asm/ultravisor.h>
> > 
> >  #include "pseries.h"
> > @@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
> >         of_reconfig_notifier_register(&iommu_reconfig_nb);
> >         register_memory_notifier(&iommu_mem_nb);
> > 
> > -       /*
> > -        * Secure guest memory is inacessible to devices so regular DMA isn't
> > -        * possible.
> > -        *
> > -        * In that case keep devices' dma_map_ops as NULL so that the generic
> > -        * DMA code path will use SWIOTLB to bounce buffers for DMA.
> > -        */
> > -       if (!is_secure_guest())
> > -               set_pci_dma_ops(&dma_iommu_ops);
> > +       set_pci_dma_ops(&dma_iommu_ops);
> >  }
> > 
> >  static int __init disable_multitce(char *str)
> > -- 
> > 1.8.3.1
> > 

-- 
Ram Pai

