From: Ram Pai <linuxram@us.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org,
	benh@kernel.crashing.org, david@gibson.dropbear.id.au,
	paulus@ozlabs.org, hch@lst.de, andmike@us.ibm.com,
	sukadev@linux.vnet.ibm.com, mst@redhat.com, ram.n.pai@gmail.com,
	aik@ozlabs.ru, cai@lca.pw, tglx@linutronix.de,
	bauerman@linux.ibm.com, linux-kernel@vger.kernel.org,
	leonardo@linux.ibm.com
Subject: Re: [PATCH v5 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VM.
Date: Wed, 11 Dec 2019 22:45:02 -0800	[thread overview]
Message-ID: <20191212064502.GC5709@oc0525413822.ibm.com> (raw)
In-Reply-To: <157602860458.3810.8599908751067047456@sif>

On Tue, Dec 10, 2019 at 07:43:24PM -0600, Michael Roth wrote:
> Quoting Ram Pai (2019-12-06 19:12:39)
> > Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> >                 secure guests")
> > disabled the dma_iommu_ops path for secure VMs, which enabled the
> > dma_direct path and, with it, bounce-buffering through SWIOTLB.
> > However, that path fails when an IOMMU is enabled, since I/O pages
> > are not TCE mapped.
> > 
> > Re-enable the dma_iommu_ops path for pseries Secure VMs.  It handles
> > all cases, including TCE mapping of I/O pages in the presence of an
> > IOMMU.
> 
> Wasn't clear to me at first, but I guess the main gist of this series is
> that we want to continue to use SWIOTLB, but also need to create mappings
> of its bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
> and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
> to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
> checks in DMA API functions to do the same.
> 
> That makes sense, but one issue I see with that is that
> dma_iommu_map_bypass() only tests true if all the following are true:
> 
> 1) the device requests a 64-bit DMA mask via
>    dma_set_mask/dma_set_coherent_mask
> 2) DDW is enabled (i.e. we don't pass disable_ddw on command-line)
> 
> dma_is_direct() checks don't have this limitation, so I think for
> cases such as devices that use a smaller DMA mask, we'll
> end up falling back to the non-bypass functions in dma_iommu_ops, which
> will likely break for things like dma_alloc_coherent/dma_map_single
> since they won't use SWIOTLB pages and won't do the necessary calls to
> set_memory_unencrypted() to share those non-SWIOTLB buffers with
> the hypervisor.
> 
> Maybe that's ok, but I think we should be clearer about how to
> fail/handle these cases.

Yes, makes sense. A device that cannot handle a 64-bit DMA mask will not work.
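
(For reference -- paraphrasing from memory rather than quoting the exact
tree this patch is against -- the bypass decision Michael is describing
lives in arch/powerpc/kernel/dma-iommu.c and looks roughly like the
sketch below; treat it as an illustration, not the literal source:)

static inline bool dma_iommu_map_bypass(struct device *dev,
					unsigned long attrs)
{
	/*
	 * Bypass straight to dma_direct_* only when the device was
	 * marked 64-bit capable, which in turn requires DDW.
	 */
	return dev->archdata.iommu_bypass &&
		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
}

static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction direction,
				     unsigned long attrs)
{
	if (dma_iommu_map_bypass(dev, attrs))
		return dma_direct_map_page(dev, page, offset, size,
					   direction, attrs);

	/* Non-bypass path: TCE-mapped DMA that never goes through the
	 * SWIOTLB pool, hence the concern for small-mask devices. */
	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
			      size, dma_get_mask(dev), direction, attrs);
}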

> 
> Though I also agree with some concerns Alexey stated earlier: it seems
> wasteful to map the entire DDW window just so these bounce buffers can be
> mapped.  Especially if you consider the lack of a mapping to be an additional
> safeguard against things like buggy device implementations on the QEMU
> side. E.g. if we leaked pages to the hypervisor by accident, those pages
> wouldn't be immediately accessible to a device, and would still require
> additional work to get past the IOMMU.

Well, an accidental, unintended page leak to the hypervisor is a very
bad thing, regardless of any DMA mapping. The device may not be able to
access it, but the hypervisor still can.

> 
> What would it look like if we try to make all this work with disable_ddw
> passed to the kernel command-line (or forced for is_secure_guest())?
> 
>   1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_* ops,
>      but an additional case or hook that considers is_secure_guest() might do
>      it.
>      
>   2) We'd also need to set up an IOMMU mapping for the bounce buffers via
>      io_tlb_start/io_tlb_end. We could do it once, on-demand via
>      dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
>      maybe in some init function.

Hmm... I am not sure how to accomplish (2).  We need to use some DDW
window to set up the mappings, right?  If disable_ddw is set, there won't
be any DDW.  What am I missing?

> 
> That also has the benefit of not requiring devices to support 64-bit DMA.
> 
> Alternatively, we could continue to rely on the 64-bit DDW window, but
> modify the call to enable_ddw() to only map the io_tlb_start/end range in
> the case of is_secure_guest(). This is a little cleaner implementation-wise
> since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks.

I have been experimenting with this, trying to map only the memory
range from io_tlb_start to io_tlb_end through the 64-bit DDW window.
But for some reason it wants io_tlb_start to be aligned to some
boundary; it looks like a 2^28 boundary. Not sure what dictates that
boundary.
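
(For the curious, the experiment is essentially the fragment below,
spliced into enable_ddw() in place of its walk over all of system RAM.
The helper names are the ones enable_ddw() already uses; the
is_secure_guest() branch is my own untested sketch and is where the
alignment failure above shows up:)

	if (is_secure_guest()) {
		/* Map only the SWIOTLB pool through the new 64-bit window. */
		unsigned long tlb_start_pfn = PHYS_PFN(io_tlb_start);
		unsigned long tlb_nr_pfn = PHYS_PFN(io_tlb_end) - tlb_start_pfn;

		ret = tce_setrange_multi_pSeriesLP(tlb_start_pfn, tlb_nr_pfn,
						   win64->value);
	} else {
		/* Existing behaviour: TCE-map all of system RAM. */
		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
					    win64->value,
					    tce_setrange_multi_pSeriesLP_walk);
	}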
   

> , but
> devices that don't support 64-bit will fall back to not using dma_direct_* ops
> and fail miserably. We'd probably want to handle that more gracefully.

Yes, I will add a warning message to indicate the failure.
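
(Something along these lines -- a hypothetical helper, not code that
exists in the tree today -- called from wherever we decide the device
cannot take the bypass path:)

/*
 * Hypothetical sketch: refuse, loudly, devices that cannot set a
 * full 64-bit DMA mask inside a secure guest, instead of silently
 * taking the non-SWIOTLB path and failing later.
 */
static bool svm_dma_mask_ok(struct device *dev, u64 mask)
{
	if (!is_secure_guest() || mask >= DMA_BIT_MASK(64))
		return true;

	dev_warn(dev,
		 "Secure VM requires a 64-bit DMA mask; mask %llx is unsupported\n",
		 mask);
	return false;
}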

> 
> Or we handle both cases gracefully. To me it makes more sense to enable the
> non-DDW case first, then consider adding the DDW case later if there's some
> reason why 64-bit DMA is needed. But it would be good to hear if there are
> other opinions.

Educate me a bit here: what is a non-DDW case?  Is it possible for a
device to access memory, in the presence of an IOMMU, without a window mapping?

> 
> > 
> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 67b5009..4e27d66 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -36,7 +36,6 @@
> >  #include <asm/udbg.h>
> >  #include <asm/mmzone.h>
> >  #include <asm/plpar_wrappers.h>
> > -#include <asm/svm.h>
> >  #include <asm/ultravisor.h>
> > 
> >  #include "pseries.h"
> > @@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
> >         of_reconfig_notifier_register(&iommu_reconfig_nb);
> >         register_memory_notifier(&iommu_mem_nb);
> > 
> > -       /*
> > -        * Secure guest memory is inacessible to devices so regular DMA isn't
> > -        * possible.
> > -        *
> > -        * In that case keep devices' dma_map_ops as NULL so that the generic
> > -        * DMA code path will use SWIOTLB to bounce buffers for DMA.
> > -        */
> > -       if (!is_secure_guest())
> > -               set_pci_dma_ops(&dma_iommu_ops);
> > +       set_pci_dma_ops(&dma_iommu_ops);
> >  }
> > 
> >  static int __init disable_multitce(char *str)
> > -- 
> > 1.8.3.1
> > 

-- 
Ram Pai

