LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 0/2] Enable IOMMU support for pseries Secure VMs
@ 2019-12-02  6:45 Ram Pai
  2019-12-02  6:45 ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
  0 siblings, 1 reply; 22+ messages in thread
From: Ram Pai @ 2019-12-02  6:45 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: andmike, mst, aik, linuxram, mdroth, linux-kernel, ram.n.pai,
	cai, tglx, sukadev, hch, bauerman, david

This patch series enables IOMMU support for pseries Secure VMs.

Tested using QEMU command line option:

 "-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4,
	iommu_platform=on,disable-modern=off,disable-legacy=on"

 and

 "-device virtio-blk-pci,scsi=off,bus=pci.0,
	addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,
	iommu_platform=on,disable-modern=off,disable-legacy=on"

changelog:
	v4: Corrected the Subject, to include the keyword 'PATCH'.
		No other changes.

	v3: Better description of 2/2 patch.
		Suggested by David Gibson.

	v2: added comments describing the changes.
		Suggested by Alexey and Michael Ellermen.


Ram Pai (2):
  powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.

 arch/powerpc/platforms/pseries/iommu.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-02  6:45 [PATCH v4 0/2] Enable IOMMU support for pseries Secure VMs Ram Pai
@ 2019-12-02  6:45 ` Ram Pai
  2019-12-02  6:45   ` [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell Ram Pai
                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Ram Pai @ 2019-12-02  6:45 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: andmike, mst, aik, linuxram, mdroth, linux-kernel, ram.n.pai,
	cai, tglx, sukadev, hch, bauerman, david

H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
its parameters. One page is dedicated per cpu, for the lifetime of the
kernel for this purpose. On secure VMs, contents of this page, when
accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
needs to know the unencrypted entries, to update the TCE table
accordingly.  There is nothing secret or sensitive about these entries.
Hence share the page with the hypervisor.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 6ba081d..0720831 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -37,6 +37,7 @@
 #include <asm/mmzone.h>
 #include <asm/plpar_wrappers.h>
 #include <asm/svm.h>
+#include <asm/ultravisor.h>
 
 #include "pseries.h"
 
@@ -179,6 +180,23 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
 
 static DEFINE_PER_CPU(__be64 *, tce_page);
 
+/*
+ * Allocate a tce page.  If secure VM, share the page with the hypervisor.
+ *
+ * NOTE: the TCE page is shared with the hypervisor explicitly and remains
+ * shared for the lifetime of the kernel. It is implicitly unshared at kernel
+ * shutdown through a UV_UNSHARE_ALL_PAGES ucall.
+ */
+static __be64 *alloc_tce_page(void)
+{
+	__be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
+
+	if (tcep && is_secure_guest())
+		uv_share_page(PHYS_PFN(__pa(tcep)), 1);
+
+	return tcep;
+}
+
 static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 				     long npages, unsigned long uaddr,
 				     enum dma_data_direction direction,
@@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	 * from iommu_alloc{,_sg}()
 	 */
 	if (!tcep) {
-		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
-		/* If allocation fails, fall back to the loop implementation */
+		tcep = alloc_tce_page();
 		if (!tcep) {
 			local_irq_restore(flags);
 			return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
@@ -405,7 +422,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 	tcep = __this_cpu_read(tce_page);
 
 	if (!tcep) {
-		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
+		tcep = alloc_tce_page();
 		if (!tcep) {
 			local_irq_enable();
 			return -ENOMEM;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.
  2019-12-02  6:45 ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
@ 2019-12-02  6:45   ` Ram Pai
  2019-12-03  0:58     ` Alexey Kardashevskiy
  2019-12-03  0:56   ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Alexey Kardashevskiy
  2019-12-04 18:26   ` Leonardo Bras
  2 siblings, 1 reply; 22+ messages in thread
From: Ram Pai @ 2019-12-02  6:45 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: andmike, mst, aik, linuxram, mdroth, linux-kernel, ram.n.pai,
	cai, tglx, sukadev, hch, bauerman, david

Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
	       secure guests")
disabled dma_iommu_ops path, for secure VMs. The rationale for disabling
the dma_iommu_ops path, was to use the dma_direct path, since it had
inbuilt support for bounce-buffering through SWIOTLB.

However dma_iommu_ops is functionally much richer. Depending on the
capabilities of the platform, it can handle direct DMA; with or without
bounce buffering, and it can handle indirect DMA. Hence its better to
leverage the richer functionality supported by dma_iommu_ops.

Renable dma_iommu_ops path for pseries Secure VMs.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 0720831..6adf4d3 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -36,7 +36,6 @@
 #include <asm/udbg.h>
 #include <asm/mmzone.h>
 #include <asm/plpar_wrappers.h>
-#include <asm/svm.h>
 #include <asm/ultravisor.h>
 
 #include "pseries.h"
@@ -1337,15 +1336,7 @@ void iommu_init_early_pSeries(void)
 	of_reconfig_notifier_register(&iommu_reconfig_nb);
 	register_memory_notifier(&iommu_mem_nb);
 
-	/*
-	 * Secure guest memory is inacessible to devices so regular DMA isn't
-	 * possible.
-	 *
-	 * In that case keep devices' dma_map_ops as NULL so that the generic
-	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
-	 */
-	if (!is_secure_guest())
-		set_pci_dma_ops(&dma_iommu_ops);
+	set_pci_dma_ops(&dma_iommu_ops);
 }
 
 static int __init disable_multitce(char *str)
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-02  6:45 ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
  2019-12-02  6:45   ` [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell Ram Pai
@ 2019-12-03  0:56   ` Alexey Kardashevskiy
  2019-12-03  2:08     ` Ram Pai
  2019-12-04 18:26   ` Leonardo Bras
  2 siblings, 1 reply; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  0:56 UTC (permalink / raw)
  To: Ram Pai, linuxppc-dev, mpe
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, hch, bauerman, david



On 02/12/2019 17:45, Ram Pai wrote:
> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> its parameters. One page is dedicated per cpu, for the lifetime of the
> kernel for this purpose. On secure VMs, contents of this page, when
> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> needs to know the unencrypted entries, to update the TCE table
> accordingly.  There is nothing secret or sensitive about these entries.
> Hence share the page with the hypervisor.

This unsecures a page in the guest in a random place which creates an
additional attack surface which is hard to exploit indeed but
nevertheless it is there. A safer option would be not to use the
hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
in the guest).

Also what is this for anyway? If I understand things right, you cannot
map any random guest memory, you should only be mapping that 64MB-ish
bounce buffer array but 1) I do not see that happening (I may have
missed it) 2) it should be done once and it takes a little time for
whatever memory size we allow for bounce buffers anyway. Thanks,


> 
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 6ba081d..0720831 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -37,6 +37,7 @@
>  #include <asm/mmzone.h>
>  #include <asm/plpar_wrappers.h>
>  #include <asm/svm.h>
> +#include <asm/ultravisor.h>
>  
>  #include "pseries.h"
>  
> @@ -179,6 +180,23 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
>  
>  static DEFINE_PER_CPU(__be64 *, tce_page);
>  
> +/*
> + * Allocate a tce page.  If secure VM, share the page with the hypervisor.
> + *
> + * NOTE: the TCE page is shared with the hypervisor explicitly and remains
> + * shared for the lifetime of the kernel. It is implicitly unshared at kernel
> + * shutdown through a UV_UNSHARE_ALL_PAGES ucall.
> + */
> +static __be64 *alloc_tce_page(void)
> +{
> +	__be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> +
> +	if (tcep && is_secure_guest())
> +		uv_share_page(PHYS_PFN(__pa(tcep)), 1);
> +
> +	return tcep;
> +}
> +
>  static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>  				     long npages, unsigned long uaddr,
>  				     enum dma_data_direction direction,
> @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>  	 * from iommu_alloc{,_sg}()
>  	 */
>  	if (!tcep) {
> -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> -		/* If allocation fails, fall back to the loop implementation */
> +		tcep = alloc_tce_page();
>  		if (!tcep) {
>  			local_irq_restore(flags);
>  			return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> @@ -405,7 +422,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>  	tcep = __this_cpu_read(tce_page);
>  
>  	if (!tcep) {
> -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> +		tcep = alloc_tce_page();
>  		if (!tcep) {
>  			local_irq_enable();
>  			return -ENOMEM;
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.
  2019-12-02  6:45   ` [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell Ram Pai
@ 2019-12-03  0:58     ` Alexey Kardashevskiy
  2019-12-03  4:07       ` Ram Pai
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  0:58 UTC (permalink / raw)
  To: Ram Pai, linuxppc-dev, mpe
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, hch, bauerman, david



On 02/12/2019 17:45, Ram Pai wrote:
> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> 	       secure guests")
> disabled dma_iommu_ops path, for secure VMs. The rationale for disabling
> the dma_iommu_ops path, was to use the dma_direct path, since it had
> inbuilt support for bounce-buffering through SWIOTLB.
> 
> However dma_iommu_ops is functionally much richer. Depending on the
> capabilities of the platform, it can handle direct DMA; with or without
> bounce buffering, and it can handle indirect DMA. Hence its better to
> leverage the richer functionality supported by dma_iommu_ops.

What exactly do we leverage after applying this patch? afaict things are
going to work in exact same old way with this applied, no? Thanks,


> 
> Renable dma_iommu_ops path for pseries Secure VMs.
> 
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 0720831..6adf4d3 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -36,7 +36,6 @@
>  #include <asm/udbg.h>
>  #include <asm/mmzone.h>
>  #include <asm/plpar_wrappers.h>
> -#include <asm/svm.h>
>  #include <asm/ultravisor.h>
>  
>  #include "pseries.h"
> @@ -1337,15 +1336,7 @@ void iommu_init_early_pSeries(void)
>  	of_reconfig_notifier_register(&iommu_reconfig_nb);
>  	register_memory_notifier(&iommu_mem_nb);
>  
> -	/*
> -	 * Secure guest memory is inacessible to devices so regular DMA isn't
> -	 * possible.
> -	 *
> -	 * In that case keep devices' dma_map_ops as NULL so that the generic
> -	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
> -	 */
> -	if (!is_secure_guest())
> -		set_pci_dma_ops(&dma_iommu_ops);
> +	set_pci_dma_ops(&dma_iommu_ops);
>  }
>  
>  static int __init disable_multitce(char *str)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-03  0:56   ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Alexey Kardashevskiy
@ 2019-12-03  2:08     ` Ram Pai
  2019-12-03  2:15       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Ram Pai @ 2019-12-03  2:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david

On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 02/12/2019 17:45, Ram Pai wrote:
> > H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> > its parameters. One page is dedicated per cpu, for the lifetime of the
> > kernel for this purpose. On secure VMs, contents of this page, when
> > accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> > needs to know the unencrypted entries, to update the TCE table
> > accordingly.  There is nothing secret or sensitive about these entries.
> > Hence share the page with the hypervisor.
> 
> This unsecures a page in the guest in a random place which creates an
> additional attack surface which is hard to exploit indeed but
> nevertheless it is there.
> A safer option would be not to use the
> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> in the guest).


Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
automatically when IOMMU option is enabled.  This happens even
on a normal VM when IOMMU is enabled. 


> 
> Also what is this for anyway? 

This is for sending indirect-TCE entries to the hypervisor.
The hypervisor must be able to read those TCE entries, so that it can 
use those entires to populate the TCE table with the correct mappings.

> if I understand things right, you cannot
> map any random guest memory, you should only be mapping that 64MB-ish
> bounce buffer array but 1) I do not see that happening (I may have
> missed it) 2) it should be done once and it takes a little time for
> whatever memory size we allow for bounce buffers anyway. Thanks,

Any random guest memory can be shared by the guest. 

Maybe you are confusing this with the SWIOTLB bounce buffers used by PCI
devices, to transfer data to the hypervisor?

> 
> 
> > 
> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 6ba081d..0720831 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -37,6 +37,7 @@
> >  #include <asm/mmzone.h>
> >  #include <asm/plpar_wrappers.h>
> >  #include <asm/svm.h>
> > +#include <asm/ultravisor.h>
> >  
> >  #include "pseries.h"
> >  
> > @@ -179,6 +180,23 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >  
> >  static DEFINE_PER_CPU(__be64 *, tce_page);
> >  
> > +/*
> > + * Allocate a tce page.  If secure VM, share the page with the hypervisor.
> > + *
> > + * NOTE: the TCE page is shared with the hypervisor explicitly and remains
> > + * shared for the lifetime of the kernel. It is implicitly unshared at kernel
> > + * shutdown through a UV_UNSHARE_ALL_PAGES ucall.
> > + */
> > +static __be64 *alloc_tce_page(void)
> > +{
> > +	__be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> > +
> > +	if (tcep && is_secure_guest())
> > +		uv_share_page(PHYS_PFN(__pa(tcep)), 1);
> > +
> > +	return tcep;
> > +}
> > +
> >  static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >  				     long npages, unsigned long uaddr,
> >  				     enum dma_data_direction direction,
> > @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >  	 * from iommu_alloc{,_sg}()
> >  	 */
> >  	if (!tcep) {
> > -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> > -		/* If allocation fails, fall back to the loop implementation */
> > +		tcep = alloc_tce_page();
> >  		if (!tcep) {
> >  			local_irq_restore(flags);
> >  			return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> > @@ -405,7 +422,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
> >  	tcep = __this_cpu_read(tce_page);
> >  
> >  	if (!tcep) {
> > -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> > +		tcep = alloc_tce_page();
> >  		if (!tcep) {
> >  			local_irq_enable();
> >  			return -ENOMEM;
> > 
> 
> -- 
> Alexey

-- 
Ram Pai


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-03  2:08     ` Ram Pai
@ 2019-12-03  2:15       ` Alexey Kardashevskiy
  2019-12-03  4:05         ` Ram Pai
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  2:15 UTC (permalink / raw)
  To: Ram Pai
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david



On 03/12/2019 13:08, Ram Pai wrote:
> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 02/12/2019 17:45, Ram Pai wrote:
>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>> its parameters. One page is dedicated per cpu, for the lifetime of the
>>> kernel for this purpose. On secure VMs, contents of this page, when
>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
>>> needs to know the unencrypted entries, to update the TCE table
>>> accordingly.  There is nothing secret or sensitive about these entries.
>>> Hence share the page with the hypervisor.
>>
>> This unsecures a page in the guest in a random place which creates an
>> additional attack surface which is hard to exploit indeed but
>> nevertheless it is there.
>> A safer option would be not to use the
>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
>> in the guest).
> 
> 
> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> automatically when IOMMU option is enabled.

It is advertised by QEMU but the guest does not have to use it.

> This happens even
> on a normal VM when IOMMU is enabled.
> 
> 
>>
>> Also what is this for anyway? 
> 
> This is for sending indirect-TCE entries to the hypervisor.
> The hypervisor must be able to read those TCE entries, so that it can 
> use those entires to populate the TCE table with the correct mappings.
> 
>> if I understand things right, you cannot
>> map any random guest memory, you should only be mapping that 64MB-ish
>> bounce buffer array but 1) I do not see that happening (I may have
>> missed it) 2) it should be done once and it takes a little time for
>> whatever memory size we allow for bounce buffers anyway. Thanks,
> 
> Any random guest memory can be shared by the guest. 

Yes but we do not want this to be this random. I thought the whole idea
of swiotlb was to restrict the amount of shared memory to bare minimum,
what do I miss?

> Maybe you are confusing this with the SWIOTLB bounce buffers used by PCI
> devices, to transfer data to the hypervisor?

Is not this for pci+swiotlb? The cover letter suggests it is for
virtio-scsi-_pci_ with 	iommu_platform=on which makes it a normal pci
device just like emulated XHCI. Thanks,




> 
>>
>>
>>>
>>> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 23 ++++++++++++++++++++---
>>>  1 file changed, 20 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 6ba081d..0720831 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -37,6 +37,7 @@
>>>  #include <asm/mmzone.h>
>>>  #include <asm/plpar_wrappers.h>
>>>  #include <asm/svm.h>
>>> +#include <asm/ultravisor.h>
>>>  
>>>  #include "pseries.h"
>>>  
>>> @@ -179,6 +180,23 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  
>>>  static DEFINE_PER_CPU(__be64 *, tce_page);
>>>  
>>> +/*
>>> + * Allocate a tce page.  If secure VM, share the page with the hypervisor.
>>> + *
>>> + * NOTE: the TCE page is shared with the hypervisor explicitly and remains
>>> + * shared for the lifetime of the kernel. It is implicitly unshared at kernel
>>> + * shutdown through a UV_UNSHARE_ALL_PAGES ucall.
>>> + */
>>> +static __be64 *alloc_tce_page(void)
>>> +{
>>> +	__be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> +
>>> +	if (tcep && is_secure_guest())
>>> +		uv_share_page(PHYS_PFN(__pa(tcep)), 1);
>>> +
>>> +	return tcep;
>>> +}
>>> +
>>>  static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  				     long npages, unsigned long uaddr,
>>>  				     enum dma_data_direction direction,
>>> @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  	 * from iommu_alloc{,_sg}()
>>>  	 */
>>>  	if (!tcep) {
>>> -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> -		/* If allocation fails, fall back to the loop implementation */
>>> +		tcep = alloc_tce_page();
>>>  		if (!tcep) {
>>>  			local_irq_restore(flags);
>>>  			return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
>>> @@ -405,7 +422,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>>>  	tcep = __this_cpu_read(tce_page);
>>>  
>>>  	if (!tcep) {
>>> -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> +		tcep = alloc_tce_page();
>>>  		if (!tcep) {
>>>  			local_irq_enable();
>>>  			return -ENOMEM;
>>>
>>
>> -- 
>> Alexey
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-03  2:15       ` Alexey Kardashevskiy
@ 2019-12-03  4:05         ` Ram Pai
  2019-12-03  4:24           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Ram Pai @ 2019-12-03  4:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david

On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 03/12/2019 13:08, Ram Pai wrote:
> > On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 02/12/2019 17:45, Ram Pai wrote:
> >>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> >>> its parameters. One page is dedicated per cpu, for the lifetime of the
> >>> kernel for this purpose. On secure VMs, contents of this page, when
> >>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> >>> needs to know the unencrypted entries, to update the TCE table
> >>> accordingly.  There is nothing secret or sensitive about these entries.
> >>> Hence share the page with the hypervisor.
> >>
> >> This unsecures a page in the guest in a random place which creates an
> >> additional attack surface which is hard to exploit indeed but
> >> nevertheless it is there.
> >> A safer option would be not to use the
> >> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> >> in the guest).
> > 
> > 
> > Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> > automatically when IOMMU option is enabled.
> 
> It is advertised by QEMU but the guest does not have to use it.

Are you suggesting that even normal-guest, not use hcall-multi-tce?
or just secure-guest?  

> 
> > This happens even
> > on a normal VM when IOMMU is enabled.
> > 
> > 
> >>
> >> Also what is this for anyway? 
> > 
> > This is for sending indirect-TCE entries to the hypervisor.
> > The hypervisor must be able to read those TCE entries, so that it can 
> > use those entires to populate the TCE table with the correct mappings.
> > 
> >> if I understand things right, you cannot
> >> map any random guest memory, you should only be mapping that 64MB-ish
> >> bounce buffer array but 1) I do not see that happening (I may have
> >> missed it) 2) it should be done once and it takes a little time for
> >> whatever memory size we allow for bounce buffers anyway. Thanks,
> > 
> > Any random guest memory can be shared by the guest. 
> 
> Yes but we do not want this to be this random. 

It is not sharing some random page. It is sharing a page that is
ear-marked for communicating TCE entries. Yes the address of the page
can be random, depending on where the allocator decides to allocate it.
The purpose of the page is not random.

That page is used for one specific purpose; to communicate the TCE
entries to the hypervisor.  

> I thought the whole idea
> of swiotlb was to restrict the amount of shared memory to bare minimum,
> what do I miss?

I think, you are making a incorrect connection between this patch and
SWIOTLB.  This patch has nothing to do with SWIOTLB.

> 
> > Maybe you are confusing this with the SWIOTLB bounce buffers used by
> > PCI devices, to transfer data to the hypervisor?
> 
> Is not this for pci+swiotlb? 


No. This patch is NOT for PCI+SWIOTLB.  The SWIOTLB pages are a
different set of pages allocated and earmarked for bounce buffering.

This patch is purely to help the hypervisor setup the TCE table, in the
presence of a IOMMU.

>The cover letter suggests it is for
> virtio-scsi-_pci_ with 	iommu_platform=on which makes it a
> normal pci device just like emulated XHCI. Thanks,

Well, I guess, the cover letter is probably confusing. There are two
patches, which togather enable virtio on secure guests, in the presence
of IOMMU.

The second patch enables virtio in the presence of a IOMMU, to use
DMA_ops+SWIOTLB infrastructure, to correctly navigate the I/O to virtio
devices.

However that by itself wont work if the TCE entires are not correctly
setup in the TCE tables.  The first patch; i.e this patch, helps
accomplish that.

Hope this clears up the confusion.
RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell.
  2019-12-03  0:58     ` Alexey Kardashevskiy
@ 2019-12-03  4:07       ` Ram Pai
  0 siblings, 0 replies; 22+ messages in thread
From: Ram Pai @ 2019-12-03  4:07 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david

On Tue, Dec 03, 2019 at 11:58:18AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 02/12/2019 17:45, Ram Pai wrote:
> > Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> > 	       secure guests")
> > disabled dma_iommu_ops path, for secure VMs. The rationale for disabling
> > the dma_iommu_ops path, was to use the dma_direct path, since it had
> > inbuilt support for bounce-buffering through SWIOTLB.
> > 
> > However dma_iommu_ops is functionally much richer. Depending on the
> > capabilities of the platform, it can handle direct DMA; with or without
> > bounce buffering, and it can handle indirect DMA. Hence its better to
> > leverage the richer functionality supported by dma_iommu_ops.
> 
> What exactly do we leverage after applying this patch? afaict things are
> going to work in exact same old way with this applied, no? Thanks,

Yes. You got it right. Another way of saying this is, it reverts the changes.

RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-03  4:05         ` Ram Pai
@ 2019-12-03  4:24           ` Alexey Kardashevskiy
  2019-12-03 16:52             ` Ram Pai
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  4:24 UTC (permalink / raw)
  To: Ram Pai
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david



On 03/12/2019 15:05, Ram Pai wrote:
> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 03/12/2019 13:08, Ram Pai wrote:
>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 02/12/2019 17:45, Ram Pai wrote:
>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
>>>>> kernel for this purpose. On secure VMs, contents of this page, when
>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
>>>>> needs to know the unencrypted entries, to update the TCE table
>>>>> accordingly.  There is nothing secret or sensitive about these entries.
>>>>> Hence share the page with the hypervisor.
>>>>
>>>> This unsecures a page in the guest in a random place which creates an
>>>> additional attack surface which is hard to exploit indeed but
>>>> nevertheless it is there.
>>>> A safer option would be not to use the
>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
>>>> in the guest).
>>>
>>>
>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
>>> automatically when IOMMU option is enabled.
>>
>> It is advertised by QEMU but the guest does not have to use it.
> 
> Are you suggesting that even normal-guest, not use hcall-multi-tce?
> or just secure-guest?  


Just secure.


> 
>>
>>> This happens even
>>> on a normal VM when IOMMU is enabled.
>>>
>>>
>>>>
>>>> Also what is this for anyway? 
>>>
>>> This is for sending indirect-TCE entries to the hypervisor.
>>> The hypervisor must be able to read those TCE entries, so that it can 
>>> use those entires to populate the TCE table with the correct mappings.
>>>
>>>> if I understand things right, you cannot
>>>> map any random guest memory, you should only be mapping that 64MB-ish
>>>> bounce buffer array but 1) I do not see that happening (I may have
>>>> missed it) 2) it should be done once and it takes a little time for
>>>> whatever memory size we allow for bounce buffers anyway. Thanks,
>>>
>>> Any random guest memory can be shared by the guest. 
>>
>> Yes but we do not want this to be this random. 
> 
> It is not sharing some random page. It is sharing a page that is
> ear-marked for communicating TCE entries. Yes the address of the page
> can be random, depending on where the allocator decides to allocate it.
> The purpose of the page is not random.

I was talking about the location.


> That page is used for one specific purpose; to communicate the TCE
> entries to the hypervisor.  
> 
>> I thought the whole idea
>> of swiotlb was to restrict the amount of shared memory to bare minimum,
>> what do I miss?
> 
> I think, you are making a incorrect connection between this patch and
> SWIOTLB.  This patch has nothing to do with SWIOTLB.

I can see this and this is the confusing part.


>>
>>> Maybe you are confusing this with the SWIOTLB bounce buffers used by
>>> PCI devices, to transfer data to the hypervisor?
>>
>> Is not this for pci+swiotlb? 
> 
> 
> No. This patch is NOT for PCI+SWIOTLB.  The SWIOTLB pages are a
> different set of pages allocated and earmarked for bounce buffering.
> 
> This patch is purely to help the hypervisor setup the TCE table, in the
> presence of a IOMMU.

Then the hypervisor should be able to access the guest pages mapped for
DMA and these pages should be made unsecure for this to work. Where/when
does this happen?


>> The cover letter suggests it is for
>> virtio-scsi-_pci_ with 	iommu_platform=on which makes it a
>> normal pci device just like emulated XHCI. Thanks,
> 
> Well, I guess, the cover letter is probably confusing. There are two
> patches, which togather enable virtio on secure guests, in the presence
> of IOMMU.
> 
> The second patch enables virtio in the presence of a IOMMU, to use
> DMA_ops+SWIOTLB infrastructure, to correctly navigate the I/O to virtio
> devices.

The second patch does nothing in relation to the problem being solved.


> However that by itself wont work if the TCE entires are not correctly
> setup in the TCE tables.  The first patch; i.e this patch, helps
> accomplish that.
>> Hope this clears up the confusion.





-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-03  4:24           ` Alexey Kardashevskiy
@ 2019-12-03 16:52             ` Ram Pai
  2019-12-04  0:04               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Ram Pai @ 2019-12-03 16:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david

On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 03/12/2019 15:05, Ram Pai wrote:
> > On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 03/12/2019 13:08, Ram Pai wrote:
> >>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> >>>>
> >>>>
> >>>> On 02/12/2019 17:45, Ram Pai wrote:
> >>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> >>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
> >>>>> kernel for this purpose. On secure VMs, contents of this page, when
> >>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> >>>>> needs to know the unencrypted entries, to update the TCE table
> >>>>> accordingly.  There is nothing secret or sensitive about these entries.
> >>>>> Hence share the page with the hypervisor.
> >>>>
> >>>> This unsecures a page in the guest in a random place which creates an
> >>>> additional attack surface which is hard to exploit indeed but
> >>>> nevertheless it is there.
> >>>> A safer option would be not to use the
> >>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> >>>> in the guest).
> >>>
> >>>
> >>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> >>> automatically when IOMMU option is enabled.
> >>
> >> It is advertised by QEMU but the guest does not have to use it.
> > 
> > Are you suggesting that even normal-guest, not use hcall-multi-tce?
> > or just secure-guest?  
> 
> 
> Just secure.

hmm..  how are the TCE entries communicated to the hypervisor, if
hcall-multi-tce is disabled?

> 
> 
> > 
> >>
> >>> This happens even
> >>> on a normal VM when IOMMU is enabled.
> >>>
> >>>
> >>>>
> >>>> Also what is this for anyway? 
> >>>
> >>> This is for sending indirect-TCE entries to the hypervisor.
> >>> The hypervisor must be able to read those TCE entries, so that it can 
> >>> use those entires to populate the TCE table with the correct mappings.
> >>>
> >>>> if I understand things right, you cannot
> >>>> map any random guest memory, you should only be mapping that 64MB-ish
> >>>> bounce buffer array but 1) I do not see that happening (I may have
> >>>> missed it) 2) it should be done once and it takes a little time for
> >>>> whatever memory size we allow for bounce buffers anyway. Thanks,
> >>>
> >>> Any random guest memory can be shared by the guest. 
> >>
> >> Yes but we do not want this to be this random. 
> > 
> > It is not sharing some random page. It is sharing a page that is
> > ear-marked for communicating TCE entries. Yes the address of the page
> > can be random, depending on where the allocator decides to allocate it.
> > The purpose of the page is not random.
> 
> I was talking about the location.
> 
> 
> > That page is used for one specific purpose; to communicate the TCE
> > entries to the hypervisor.  
> > 
> >> I thought the whole idea
> >> of swiotlb was to restrict the amount of shared memory to bare minimum,
> >> what do I miss?
> > 
> > I think, you are making a incorrect connection between this patch and
> > SWIOTLB.  This patch has nothing to do with SWIOTLB.
> 
> I can see this and this is the confusing part.
> 
> 
> >>
> >>> Maybe you are confusing this with the SWIOTLB bounce buffers used by
> >>> PCI devices, to transfer data to the hypervisor?
> >>
> >> Is not this for pci+swiotlb? 
> > 
> > 
> > No. This patch is NOT for PCI+SWIOTLB.  The SWIOTLB pages are a
> > different set of pages allocated and earmarked for bounce buffering.
> > 
> > This patch is purely to help the hypervisor setup the TCE table, in the
> > presence of a IOMMU.
> 
> Then the hypervisor should be able to access the guest pages mapped for
> DMA and these pages should be made unsecure for this to work. Where/when
> does this happen?

This happens in the SWIOTLB code.  The code to do that is already
upstream.  

The sharing of the pages containing the SWIOTLB bounce buffers is done
in init_svm() which calls swiotlb_update_mem_attributes() which calls
set_memory_decrypted().  In the case of pseries, set_memory_decrypted() calls 
uv_share_page().

The code that bounces the contents of a I/O buffer through the 
SWIOTLB buffers, is in swiotlb_bounce().

> 
> 
> >> The cover letter suggests it is for
> >> virtio-scsi-_pci_ with 	iommu_platform=on which makes it a
> >> normal pci device just like emulated XHCI. Thanks,
> > 
> > Well, I guess, the cover letter is probably confusing. There are two
> > patches, which togather enable virtio on secure guests, in the presence
> > of IOMMU.
> > 
> > The second patch enables virtio in the presence of a IOMMU, to use
> > DMA_ops+SWIOTLB infrastructure, to correctly navigate the I/O to virtio
> > devices.
> 
> The second patch does nothing in relation to the problem being solved.

The second patch registers dma_iommu_ops with the PCI-system.  Doing so
enables I/O to take the dma_iommu_ops path, which internally 
leads it through the SWIOTLB path. Without that, the I/O fails to reach
its destination.

> 
> 
> > However that by itself wont work if the TCE entires are not correctly
> > setup in the TCE tables.  The first patch; i.e this patch, helps
> > accomplish that.
> >> Hope this clears up the confusion.
> 
> 
> 
> 
> 
> -- 
> Alexey

-- 
Ram Pai


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-03 16:52             ` Ram Pai
@ 2019-12-04  0:04               ` Alexey Kardashevskiy
  2019-12-04  0:49                 ` Ram Pai
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-04  0:04 UTC (permalink / raw)
  To: Ram Pai
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david



On 04/12/2019 03:52, Ram Pai wrote:
> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 03/12/2019 15:05, Ram Pai wrote:
>>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 03/12/2019 13:08, Ram Pai wrote:
>>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 02/12/2019 17:45, Ram Pai wrote:
>>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
>>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
>>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
>>>>>>> needs to know the unencrypted entries, to update the TCE table
>>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
>>>>>>> Hence share the page with the hypervisor.
>>>>>>
>>>>>> This unsecures a page in the guest in a random place which creates an
>>>>>> additional attack surface which is hard to exploit indeed but
>>>>>> nevertheless it is there.
>>>>>> A safer option would be not to use the
>>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
>>>>>> in the guest).
>>>>>
>>>>>
>>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
>>>>> automatically when IOMMU option is enabled.
>>>>
>>>> It is advertised by QEMU but the guest does not have to use it.
>>>
>>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
>>> or just secure-guest?  
>>
>>
>> Just secure.
> 
> hmm..  how are the TCE entries communicated to the hypervisor, if
> hcall-multi-tce is disabled?

Via H_PUT_TCE which updates 1 entry at once (sets or clears).
hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
optimization.


> 
>>
>>
>>>
>>>>
>>>>> This happens even
>>>>> on a normal VM when IOMMU is enabled.
>>>>>
>>>>>
>>>>>>
>>>>>> Also what is this for anyway? 
>>>>>
>>>>> This is for sending indirect-TCE entries to the hypervisor.
>>>>> The hypervisor must be able to read those TCE entries, so that it can 
>>>>> use those entires to populate the TCE table with the correct mappings.
>>>>>
>>>>>> if I understand things right, you cannot
>>>>>> map any random guest memory, you should only be mapping that 64MB-ish
>>>>>> bounce buffer array but 1) I do not see that happening (I may have
>>>>>> missed it) 2) it should be done once and it takes a little time for
>>>>>> whatever memory size we allow for bounce buffers anyway. Thanks,
>>>>>
>>>>> Any random guest memory can be shared by the guest. 
>>>>
>>>> Yes but we do not want this to be this random. 
>>>
>>> It is not sharing some random page. It is sharing a page that is
>>> ear-marked for communicating TCE entries. Yes the address of the page
>>> can be random, depending on where the allocator decides to allocate it.
>>> The purpose of the page is not random.
>>
>> I was talking about the location.
>>
>>
>>> That page is used for one specific purpose; to communicate the TCE
>>> entries to the hypervisor.  
>>>
>>>> I thought the whole idea
>>>> of swiotlb was to restrict the amount of shared memory to bare minimum,
>>>> what do I miss?
>>>
>>> I think, you are making a incorrect connection between this patch and
>>> SWIOTLB.  This patch has nothing to do with SWIOTLB.
>>
>> I can see this and this is the confusing part.
>>
>>
>>>>
>>>>> Maybe you are confusing this with the SWIOTLB bounce buffers used by
>>>>> PCI devices, to transfer data to the hypervisor?
>>>>
>>>> Is not this for pci+swiotlb? 
>>>
>>>
>>> No. This patch is NOT for PCI+SWIOTLB.  The SWIOTLB pages are a
>>> different set of pages allocated and earmarked for bounce buffering.
>>>
>>> This patch is purely to help the hypervisor setup the TCE table, in the
>>> presence of a IOMMU.
>>
>> Then the hypervisor should be able to access the guest pages mapped for
>> DMA and these pages should be made unsecure for this to work. Where/when
>> does this happen?
> 
> This happens in the SWIOTLB code.  The code to do that is already
> upstream.  
>
> The sharing of the pages containing the SWIOTLB bounce buffers is done
> in init_svm() which calls swiotlb_update_mem_attributes() which calls
> set_memory_decrypted().  In the case of pseries, set_memory_decrypted() calls 
> uv_share_page().


This does not seem enough as when you enforce iommu_platform=on, QEMU
starts accessing virtio buffers via IOMMU so bounce buffers have to be
mapped explicitly, via H_PUT_TCE&co, where does this happen?


> 
> The code that bounces the contents of a I/O buffer through the 
> SWIOTLB buffers, is in swiotlb_bounce().
> 
>>
>>
>>>> The cover letter suggests it is for
>>>> virtio-scsi-_pci_ with 	iommu_platform=on which makes it a
>>>> normal pci device just like emulated XHCI. Thanks,
>>>
>>> Well, I guess, the cover letter is probably confusing. There are two
>>> patches, which togather enable virtio on secure guests, in the presence
>>> of IOMMU.
>>>
>>> The second patch enables virtio in the presence of a IOMMU, to use
>>> DMA_ops+SWIOTLB infrastructure, to correctly navigate the I/O to virtio
>>> devices.
>>
>> The second patch does nothing in relation to the problem being solved.
> 
> The second patch registers dma_iommu_ops with the PCI-system.  Doing so
> enables I/O to take the dma_iommu_ops path, which internally 
> leads it through the SWIOTLB path. Without that, the I/O fails to reach
> its destination.


This is not what the commit log says. What DMA ops was used before 2/2?
I thought it was NULL which should have turned into direct which then
would switch to swiotlb but since recent DMA reworks it is even harder
to tell what happens with DMA setup. Thanks,


>>> However that by itself wont work if the TCE entires are not correctly
>>> setup in the TCE tables.  The first patch; i.e this patch, helps
>>> accomplish that.
>>>> Hope this clears up the confusion.



-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04  0:04               ` Alexey Kardashevskiy
@ 2019-12-04  0:49                 ` Ram Pai
  2019-12-04  1:08                   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Ram Pai @ 2019-12-04  0:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david

On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 04/12/2019 03:52, Ram Pai wrote:
> > On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 03/12/2019 15:05, Ram Pai wrote:
> >>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> >>>>
> >>>>
> >>>> On 03/12/2019 13:08, Ram Pai wrote:
> >>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 02/12/2019 17:45, Ram Pai wrote:
> >>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> >>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
> >>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
> >>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> >>>>>>> needs to know the unencrypted entries, to update the TCE table
> >>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
> >>>>>>> Hence share the page with the hypervisor.
> >>>>>>
> >>>>>> This unsecures a page in the guest in a random place which creates an
> >>>>>> additional attack surface which is hard to exploit indeed but
> >>>>>> nevertheless it is there.
> >>>>>> A safer option would be not to use the
> >>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> >>>>>> in the guest).
> >>>>>
> >>>>>
> >>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> >>>>> automatically when IOMMU option is enabled.
> >>>>
> >>>> It is advertised by QEMU but the guest does not have to use it.
> >>>
> >>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
> >>> or just secure-guest?  
> >>
> >>
> >> Just secure.
> > 
> > hmm..  how are the TCE entries communicated to the hypervisor, if
> > hcall-multi-tce is disabled?
> 
> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> optimization.

Do you still think, secure-VM should use H_PUT_TCE and not
H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
Is there any advantage of special casing it for secure-VMs.
In fact, we could make use of as much optimization as possible.


> 
> >>>> Is not this for pci+swiotlb? 
..snip..
> >>> This patch is purely to help the hypervisor setup the TCE table, in the
> >>> presence of a IOMMU.
> >>
> >> Then the hypervisor should be able to access the guest pages mapped for
> >> DMA and these pages should be made unsecure for this to work. Where/when
> >> does this happen?
> > 
> > This happens in the SWIOTLB code.  The code to do that is already
> > upstream.  
> >
> > The sharing of the pages containing the SWIOTLB bounce buffers is done
> > in init_svm() which calls swiotlb_update_mem_attributes() which calls
> > set_memory_decrypted().  In the case of pseries, set_memory_decrypted() calls 
> > uv_share_page().
> 
> 
> This does not seem enough as when you enforce iommu_platform=on, QEMU
> starts accessing virtio buffers via IOMMU so bounce buffers have to be
> mapped explicitly, via H_PUT_TCE&co, where does this happen?
> 

I think, it happens at boot time. Every page of the guest memory is TCE
mapped, if iommu is enabled. SWIOTLB pages get implicitly TCE-mapped
as part of that operation.

RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04  0:49                 ` Ram Pai
@ 2019-12-04  1:08                   ` Alexey Kardashevskiy
  2019-12-04  3:36                     ` David Gibson
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-04  1:08 UTC (permalink / raw)
  To: Ram Pai
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david



On 04/12/2019 11:49, Ram Pai wrote:
> On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 04/12/2019 03:52, Ram Pai wrote:
>>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 03/12/2019 15:05, Ram Pai wrote:
>>>>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 03/12/2019 13:08, Ram Pai wrote:
>>>>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 02/12/2019 17:45, Ram Pai wrote:
>>>>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>>>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
>>>>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
>>>>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
>>>>>>>>> needs to know the unencrypted entries, to update the TCE table
>>>>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
>>>>>>>>> Hence share the page with the hypervisor.
>>>>>>>>
>>>>>>>> This unsecures a page in the guest in a random place which creates an
>>>>>>>> additional attack surface which is hard to exploit indeed but
>>>>>>>> nevertheless it is there.
>>>>>>>> A safer option would be not to use the
>>>>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
>>>>>>>> in the guest).
>>>>>>>
>>>>>>>
>>>>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
>>>>>>> automatically when IOMMU option is enabled.
>>>>>>
>>>>>> It is advertised by QEMU but the guest does not have to use it.
>>>>>
>>>>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
>>>>> or just secure-guest?  
>>>>
>>>>
>>>> Just secure.
>>>
>>> hmm..  how are the TCE entries communicated to the hypervisor, if
>>> hcall-multi-tce is disabled?
>>
>> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
>> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
>> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
>> optimization.
> 
> Do you still think, secure-VM should use H_PUT_TCE and not
> H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
> Is there any advantage of special casing it for secure-VMs.


Reducing the amount of insecure memory at random location.


> In fact, we could make use of as much optimization as possible.
> 
> 
>>
>>>>>> Is not this for pci+swiotlb? 
> ..snip..
>>>>> This patch is purely to help the hypervisor setup the TCE table, in the
>>>>> presence of a IOMMU.
>>>>
>>>> Then the hypervisor should be able to access the guest pages mapped for
>>>> DMA and these pages should be made unsecure for this to work. Where/when
>>>> does this happen?
>>>
>>> This happens in the SWIOTLB code.  The code to do that is already
>>> upstream.  
>>>
>>> The sharing of the pages containing the SWIOTLB bounce buffers is done
>>> in init_svm() which calls swiotlb_update_mem_attributes() which calls
>>> set_memory_decrypted().  In the case of pseries, set_memory_decrypted() calls 
>>> uv_share_page().
>>
>>
>> This does not seem enough as when you enforce iommu_platform=on, QEMU
>> starts accessing virtio buffers via IOMMU so bounce buffers have to be
>> mapped explicitly, via H_PUT_TCE&co, where does this happen?
>>
> 
> I think, it happens at boot time. Every page of the guest memory is TCE
> mapped, if iommu is enabled. SWIOTLB pages get implicitly TCE-mapped
> as part of that operation.


Ah I see. This works via the huge dma window. Ok, makes sense now.

It just seems like a waste that we could map swiotlb 1:1 via the always
existing small DMA window but instead we rely on a huge window to map
these small buffers. This way we are wasting the entire 32bit window and
most of the huge window. We may fix it in the future (not right now) but
for now I would still avoid unsecuring additional memory. Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04  1:08                   ` Alexey Kardashevskiy
@ 2019-12-04  3:36                     ` David Gibson
  2019-12-04 20:42                       ` Ram Pai
  0 siblings, 1 reply; 22+ messages in thread
From: David Gibson @ 2019-12-04  3:36 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, Ram Pai, mdroth, linux-kernel, ram.n.pai, cai,
	tglx, sukadev, linuxppc-dev, hch, bauerman

[-- Attachment #1: Type: text/plain, Size: 4796 bytes --]

On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 04/12/2019 11:49, Ram Pai wrote:
> > On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 04/12/2019 03:52, Ram Pai wrote:
> >>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> >>>>
> >>>>
> >>>> On 03/12/2019 15:05, Ram Pai wrote:
> >>>>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 03/12/2019 13:08, Ram Pai wrote:
> >>>>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 02/12/2019 17:45, Ram Pai wrote:
> >>>>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> >>>>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
> >>>>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
> >>>>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> >>>>>>>>> needs to know the unencrypted entries, to update the TCE table
> >>>>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
> >>>>>>>>> Hence share the page with the hypervisor.
> >>>>>>>>
> >>>>>>>> This unsecures a page in the guest in a random place which creates an
> >>>>>>>> additional attack surface which is hard to exploit indeed but
> >>>>>>>> nevertheless it is there.
> >>>>>>>> A safer option would be not to use the
> >>>>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> >>>>>>>> in the guest).
> >>>>>>>
> >>>>>>>
> >>>>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> >>>>>>> automatically when IOMMU option is enabled.
> >>>>>>
> >>>>>> It is advertised by QEMU but the guest does not have to use it.
> >>>>>
> >>>>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
> >>>>> or just secure-guest?  
> >>>>
> >>>>
> >>>> Just secure.
> >>>
> >>> hmm..  how are the TCE entries communicated to the hypervisor, if
> >>> hcall-multi-tce is disabled?
> >>
> >> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> >> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> >> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> >> optimization.
> > 
> > Do you still think, secure-VM should use H_PUT_TCE and not
> > H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
> > Is there any advantage of special casing it for secure-VMs.
> 
> 
> Reducing the amount of insecure memory at random location.

The other approach we could use for that - which would still allow
H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
same pool that we use for the bounce buffers.  I assume there must
already be some sort of allocator for that?

> > In fact, we could make use of as much optimization as possible.
> > 
> > 
> >>
> >>>>>> Is not this for pci+swiotlb? 
> > ..snip..
> >>>>> This patch is purely to help the hypervisor setup the TCE table, in the
> >>>>> presence of a IOMMU.
> >>>>
> >>>> Then the hypervisor should be able to access the guest pages mapped for
> >>>> DMA and these pages should be made unsecure for this to work. Where/when
> >>>> does this happen?
> >>>
> >>> This happens in the SWIOTLB code.  The code to do that is already
> >>> upstream.  
> >>>
> >>> The sharing of the pages containing the SWIOTLB bounce buffers is done
> >>> in init_svm() which calls swiotlb_update_mem_attributes() which calls
> >>> set_memory_decrypted().  In the case of pseries, set_memory_decrypted() calls 
> >>> uv_share_page().
> >>
> >>
> >> This does not seem enough as when you enforce iommu_platform=on, QEMU
> >> starts accessing virtio buffers via IOMMU so bounce buffers have to be
> >> mapped explicitly, via H_PUT_TCE&co, where does this happen?
> >>
> > 
> > I think, it happens at boot time. Every page of the guest memory is TCE
> > mapped, if iommu is enabled. SWIOTLB pages get implicitly TCE-mapped
> > as part of that operation.
> 
> 
> Ah I see. This works via the huge dma window. Ok, makes sense now.
> 
> It just seems like a waste that we could map swiotlb 1:1 via the always
> existing small DMA window but instead we rely on a huge window to map
> these small buffers. This way we are wasting the entire 32bit window and
> most of the huge window. We may fix it in the future (not right now) but
> for now I would still avoid unsecuring additional memory. Thanks,
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-02  6:45 ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
  2019-12-02  6:45   ` [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell Ram Pai
  2019-12-03  0:56   ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Alexey Kardashevskiy
@ 2019-12-04 18:26   ` Leonardo Bras
  2019-12-04 20:27     ` Ram Pai
  2 siblings, 1 reply; 22+ messages in thread
From: Leonardo Bras @ 2019-12-04 18:26 UTC (permalink / raw)
  To: Ram Pai, linuxppc-dev, mpe
  Cc: andmike, mst, aik, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, hch, bauerman, david

[-- Attachment #1: Type: text/plain, Size: 757 bytes --]

On Sun, 2019-12-01 at 22:45 -0800, Ram Pai wrote:
> @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>          * from iommu_alloc{,_sg}()
>          */
>         if (!tcep) {
> -               tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> -               /* If allocation fails, fall back to the loop implementation */
> +               tcep = alloc_tce_page();
>                 if (!tcep) {
>                         local_irq_restore(flags);
>                         return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,

The comment about failing allocation was removed, but I see no chage of
behaviour here. 

Can you please explain what/where it changes? 

Best regards,

Leonardo

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04 18:26   ` Leonardo Bras
@ 2019-12-04 20:27     ` Ram Pai
  0 siblings, 0 replies; 22+ messages in thread
From: Ram Pai @ 2019-12-04 20:27 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: andmike, mst, aik, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, david

On Wed, Dec 04, 2019 at 03:26:50PM -0300, Leonardo Bras wrote:
> On Sun, 2019-12-01 at 22:45 -0800, Ram Pai wrote:
> > @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >          * from iommu_alloc{,_sg}()
> >          */
> >         if (!tcep) {
> > -               tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> > -               /* If allocation fails, fall back to the loop implementation */
> > +               tcep = alloc_tce_page();
> >                 if (!tcep) {
> >                         local_irq_restore(flags);
> >                         return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> 
> The comment about failing allocation was removed, but I see no chage of
> behaviour here. 
> 
> Can you please explain what/where it changes? 

You observed it right. The comment should stay put.  Will have it fixed
in my next version.

Thanks,
RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04  3:36                     ` David Gibson
@ 2019-12-04 20:42                       ` Ram Pai
  2019-12-04 22:26                         ` Alexey Kardashevskiy
  2019-12-05  8:28                         ` Christoph Hellwig
  0 siblings, 2 replies; 22+ messages in thread
From: Ram Pai @ 2019-12-04 20:42 UTC (permalink / raw)
  To: David Gibson
  Cc: andmike, mst, Alexey Kardashevskiy, mdroth, linux-kernel,
	ram.n.pai, cai, tglx, sukadev, linuxppc-dev, hch, bauerman

On Wed, Dec 04, 2019 at 02:36:18PM +1100, David Gibson wrote:
> On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 04/12/2019 11:49, Ram Pai wrote:
> > > On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
> > >>
> > >>
> > >> On 04/12/2019 03:52, Ram Pai wrote:
> > >>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> > >>>>
> > >>>>
> > >>>> On 03/12/2019 15:05, Ram Pai wrote:
> > >>>>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>> On 03/12/2019 13:08, Ram Pai wrote:
> > >>>>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 02/12/2019 17:45, Ram Pai wrote:
> > >>>>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> > >>>>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
> > >>>>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
> > >>>>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> > >>>>>>>>> needs to know the unencrypted entries, to update the TCE table
> > >>>>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
> > >>>>>>>>> Hence share the page with the hypervisor.
> > >>>>>>>>
> > >>>>>>>> This unsecures a page in the guest in a random place which creates an
> > >>>>>>>> additional attack surface which is hard to exploit indeed but
> > >>>>>>>> nevertheless it is there.
> > >>>>>>>> A safer option would be not to use the
> > >>>>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> > >>>>>>>> in the guest).
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> > >>>>>>> automatically when IOMMU option is enabled.
> > >>>>>>
> > >>>>>> It is advertised by QEMU but the guest does not have to use it.
> > >>>>>
> > >>>>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
> > >>>>> or just secure-guest?  
> > >>>>
> > >>>>
> > >>>> Just secure.
> > >>>
> > >>> hmm..  how are the TCE entries communicated to the hypervisor, if
> > >>> hcall-multi-tce is disabled?
> > >>
> > >> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> > >> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> > >> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> > >> optimization.
> > > 
> > > Do you still think, secure-VM should use H_PUT_TCE and not
> > > H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
> > > Is there any advantage of special casing it for secure-VMs.
> > 
> > 
> > Reducing the amount of insecure memory at random location.
> 
> The other approach we could use for that - which would still allow
> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
> same pool that we use for the bounce buffers.  I assume there must
> already be some sort of allocator for that?

The allocator for swiotlb is buried deep in the swiotlb code. It is 
not exposed to the outside-swiotlb world. Will have to do major surgery
to expose it.

I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
and unshare the page.  This will address Alexey's concern of having
shared pages at random location, and will also give me my performance
optimization.  Alexey: ok?

RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04 20:42                       ` Ram Pai
@ 2019-12-04 22:26                         ` Alexey Kardashevskiy
  2019-12-05  2:15                           ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.y Ram Pai
  2019-12-06 23:10                           ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
  2019-12-05  8:28                         ` Christoph Hellwig
  1 sibling, 2 replies; 22+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-04 22:26 UTC (permalink / raw)
  To: Ram Pai, David Gibson
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman



On 05/12/2019 07:42, Ram Pai wrote:
> On Wed, Dec 04, 2019 at 02:36:18PM +1100, David Gibson wrote:
>> On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 04/12/2019 11:49, Ram Pai wrote:
>>>> On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 04/12/2019 03:52, Ram Pai wrote:
>>>>>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/12/2019 15:05, Ram Pai wrote:
>>>>>>>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/12/2019 13:08, Ram Pai wrote:
>>>>>>>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02/12/2019 17:45, Ram Pai wrote:
>>>>>>>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>>>>>>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
>>>>>>>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
>>>>>>>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
>>>>>>>>>>>> needs to know the unencrypted entries, to update the TCE table
>>>>>>>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
>>>>>>>>>>>> Hence share the page with the hypervisor.
>>>>>>>>>>>
>>>>>>>>>>> This unsecures a page in the guest in a random place which creates an
>>>>>>>>>>> additional attack surface which is hard to exploit indeed but
>>>>>>>>>>> nevertheless it is there.
>>>>>>>>>>> A safer option would be not to use the
>>>>>>>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
>>>>>>>>>>> in the guest).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
>>>>>>>>>> automatically when IOMMU option is enabled.
>>>>>>>>>
>>>>>>>>> It is advertised by QEMU but the guest does not have to use it.
>>>>>>>>
>>>>>>>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
>>>>>>>> or just secure-guest?  
>>>>>>>
>>>>>>>
>>>>>>> Just secure.
>>>>>>
>>>>>> hmm..  how are the TCE entries communicated to the hypervisor, if
>>>>>> hcall-multi-tce is disabled?
>>>>>
>>>>> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
>>>>> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
>>>>> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
>>>>> optimization.
>>>>
>>>> Do you still think, secure-VM should use H_PUT_TCE and not
>>>> H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
>>>> Is there any advantage of special casing it for secure-VMs.
>>>
>>>
>>> Reducing the amount of insecure memory at random location.
>>
>> The other approach we could use for that - which would still allow
>> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
>> same pool that we use for the bounce buffers.  I assume there must
>> already be some sort of allocator for that?
> 
> The allocator for swiotlb is buried deep in the swiotlb code. It is 
> not exposed to the outside-swiotlb world. Will have to do major surgery
> to expose it.
> 
> I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
> and unshare the page.  This will address Alexey's concern of having
> shared pages at random location, and will also give me my performance
> optimization.  Alexey: ok?


I really do not see the point. I really think we should to 1:1 mapping
of swtiotlb buffers using the default 32bit window using H_PUT_TCE and
this should be more than enough, I do not think the amount of code will
be dramatically different compared to unsecuring and securing a page or
using one of swtiotlb pages for this purpose. Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.y
  2019-12-04 22:26                         ` Alexey Kardashevskiy
@ 2019-12-05  2:15                           ` Ram Pai
  2019-12-06 23:10                           ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
  1 sibling, 0 replies; 22+ messages in thread
From: Ram Pai @ 2019-12-05  2:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, David Gibson

On Thu, Dec 05, 2019 at 09:26:14AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 05/12/2019 07:42, Ram Pai wrote:
> > On Wed, Dec 04, 2019 at 02:36:18PM +1100, David Gibson wrote:
> >> On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
> >>>
> >>>
> >>> On 04/12/2019 11:49, Ram Pai wrote:
> >>>> On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
> >>>>>
> >>>>>
> >>>>> On 04/12/2019 03:52, Ram Pai wrote:
> >>>>>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 03/12/2019 15:05, Ram Pai wrote:
> >>>>>>>> On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 03/12/2019 13:08, Ram Pai wrote:
> >>>>>>>>>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 02/12/2019 17:45, Ram Pai wrote:
> >>>>>>>>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
> >>>>>>>>>>>> its parameters. One page is dedicated per cpu, for the lifetime of the
> >>>>>>>>>>>> kernel for this purpose. On secure VMs, contents of this page, when
> >>>>>>>>>>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
> >>>>>>>>>>>> needs to know the unencrypted entries, to update the TCE table
> >>>>>>>>>>>> accordingly.  There is nothing secret or sensitive about these entries.
> >>>>>>>>>>>> Hence share the page with the hypervisor.
> >>>>>>>>>>>
> >>>>>>>>>>> This unsecures a page in the guest in a random place which creates an
> >>>>>>>>>>> additional attack surface which is hard to exploit indeed but
> >>>>>>>>>>> nevertheless it is there.
> >>>>>>>>>>> A safer option would be not to use the
> >>>>>>>>>>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
> >>>>>>>>>>> in the guest).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> >>>>>>>>>> automatically when IOMMU option is enabled.
> >>>>>>>>>
> >>>>>>>>> It is advertised by QEMU but the guest does not have to use it.
> >>>>>>>>
> >>>>>>>> Are you suggesting that even normal-guest, not use hcall-multi-tce?
> >>>>>>>> or just secure-guest?  
> >>>>>>>
> >>>>>>>
> >>>>>>> Just secure.
> >>>>>>
> >>>>>> hmm..  how are the TCE entries communicated to the hypervisor, if
> >>>>>> hcall-multi-tce is disabled?
> >>>>>
> >>>>> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> >>>>> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> >>>>> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> >>>>> optimization.
> >>>>
> >>>> Do you still think, secure-VM should use H_PUT_TCE and not
> >>>> H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
> >>>> Is there any advantage of special casing it for secure-VMs.
> >>>
> >>>
> >>> Reducing the amount of insecure memory at random location.
> >>
> >> The other approach we could use for that - which would still allow
> >> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
> >> same pool that we use for the bounce buffers.  I assume there must
> >> already be some sort of allocator for that?
> > 
> > The allocator for swiotlb is buried deep in the swiotlb code. It is 
> > not exposed to the outside-swiotlb world. Will have to do major surgery
> > to expose it.
> > 
> > I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
> > and unshare the page.  This will address Alexey's concern of having
> > shared pages at random location, and will also give me my performance
> > optimization.  Alexey: ok?
> 
> 
> I really do not see the point. I really think we should to 1:1 mapping
> of swtiotlb buffers using the default 32bit window using H_PUT_TCE and
> this should be more than enough, I do not think the amount of code will
> be dramatically different compared to unsecuring and securing a page or
> using one of swtiotlb pages for this purpose. Thanks,

Ok. I will address your major concern -- "do not create new shared pages
at random location"  in my next version of the patch.  Using the 32bit
DMA window just to map the SWIOTLB buffers, will be some effort. Hope 
we can stage it that way.

RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.
  2019-12-04 20:42                       ` Ram Pai
  2019-12-04 22:26                         ` Alexey Kardashevskiy
@ 2019-12-05  8:28                         ` Christoph Hellwig
  1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2019-12-05  8:28 UTC (permalink / raw)
  To: Ram Pai
  Cc: andmike, mst, Alexey Kardashevskiy, mdroth, linux-kernel,
	ram.n.pai, cai, tglx, sukadev, linuxppc-dev, hch, bauerman,
	David Gibson

On Wed, Dec 04, 2019 at 12:42:32PM -0800, Ram Pai wrote:
> > The other approach we could use for that - which would still allow
> > H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
> > same pool that we use for the bounce buffers.  I assume there must
> > already be some sort of allocator for that?
> 
> The allocator for swiotlb is buried deep in the swiotlb code. It is 
> not exposed to the outside-swiotlb world. Will have to do major surgery
> to expose it.

I don't think it would require all that much changes, but I'd really
hate the layering of calling into it directly.  Do we have a struct
device associated with the iommu that doesn't get iommu translations
themselves?  If we do a dma_alloc_coherent on that you'll get the
memory pool for free.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor
  2019-12-04 22:26                         ` Alexey Kardashevskiy
  2019-12-05  2:15                           ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.y Ram Pai
@ 2019-12-06 23:10                           ` Ram Pai
  1 sibling, 0 replies; 22+ messages in thread
From: Ram Pai @ 2019-12-06 23:10 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: andmike, mst, mdroth, linux-kernel, ram.n.pai, cai, tglx,
	sukadev, linuxppc-dev, hch, bauerman, David Gibson

On Thu, Dec 05, 2019 at 09:26:14AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 05/12/2019 07:42, Ram Pai wrote:
.snip...
> >>>> Do you still think, secure-VM should use H_PUT_TCE and not
> >>>> H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
> >>>> Is there any advantage of special casing it for secure-VMs.
> >>>
> >>>
> >>> Reducing the amount of insecure memory at random location.
> >>
> >> The other approach we could use for that - which would still allow
> >> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
> >> same pool that we use for the bounce buffers.  I assume there must
> >> already be some sort of allocator for that?
> > 
> > The allocator for swiotlb is buried deep in the swiotlb code. It is 
> > not exposed to the outside-swiotlb world. Will have to do major surgery
> > to expose it.
> > 
> > I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
> > and unshare the page.  This will address Alexey's concern of having
> > shared pages at random location, and will also give me my performance
> > optimization.  Alexey: ok?
> 
> 
> I really do not see the point. I really think we should to 1:1 mapping
> of swtiotlb buffers using the default 32bit window using H_PUT_TCE and
> this should be more than enough, I do not think the amount of code will
> be dramatically different compared to unsecuring and securing a page or
> using one of swtiotlb pages for this purpose. Thanks,

Ok. there are three different issues to be addressed here.

(a) How to map the TCE entries in the TCE table?  Should we use H_PUT_TCE, 
	or H_PUT_INDIRECT_TCE?

(b) How much of the guest memory must be mapped in the TCE table? Should
   it be the entire guest memory? or the memory used by the SWIOTLB?

(c) What mapping window must to be used? Is it the 64bit ddw? or the 
	default 32 bit ddw?

Regardless of how we resolve issue (b) and (c),  we still need to
resolve (a).  The main concern you have about solution (a) is
that, random pages are permanently shared, something that can be
exploited and can cause security issues.  I tend to agree, this
is possible, though I am not sure how. But yes we need to address
this concern, since security is paramount to Secure Virtual Machines.

The way to resolve (a) is  --
(i) grab a page from the SWIOTLB pool and use H_PUT_INDIRECT_TCE
	OR
(ii) simply use H_PUT_TCE.
	OR
(iii) share the page prior to H_PUT_INDIRECT_TCE, and
	unshare the page once done.

Solution (i) has layering violation; as Christoph alluded to in his
previous reply. The swiotlb buffers are meant for I/O and DMA related
actitivy.  We will be abusing these swiotlb pages to communicate TCE
entries to the hypervisor.  Secondly IOMMU code has no idea where its
pages are sourced from and should not know either. I am uncomfortable
going this route. There is some upstream discussion about having a seperate
pool of shared pages on secure VM, https://lkml.org/lkml/2019/11/14/381.
That solution; when ready, may be exploitable here.

I have coded solution (ii)  and it works. But boot path slows down
significantly. Huge amount H_PUT_TCE hcalls. Very hurtful.

I strongly think, solution (iii) is the right way to go. I have coded
it, it works and bootpath is much faster.  However I am not sure if you
have a concern with this solution.

In any case, I am sending my next version of the patch based on solution
(iii).

Once this is addressed, I will address (b) and (c).

RP


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, back to index

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-02  6:45 [PATCH v4 0/2] Enable IOMMU support for pseries Secure VMs Ram Pai
2019-12-02  6:45 ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
2019-12-02  6:45   ` [PATCH v4 2/2] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VMs aswell Ram Pai
2019-12-03  0:58     ` Alexey Kardashevskiy
2019-12-03  4:07       ` Ram Pai
2019-12-03  0:56   ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Alexey Kardashevskiy
2019-12-03  2:08     ` Ram Pai
2019-12-03  2:15       ` Alexey Kardashevskiy
2019-12-03  4:05         ` Ram Pai
2019-12-03  4:24           ` Alexey Kardashevskiy
2019-12-03 16:52             ` Ram Pai
2019-12-04  0:04               ` Alexey Kardashevskiy
2019-12-04  0:49                 ` Ram Pai
2019-12-04  1:08                   ` Alexey Kardashevskiy
2019-12-04  3:36                     ` David Gibson
2019-12-04 20:42                       ` Ram Pai
2019-12-04 22:26                         ` Alexey Kardashevskiy
2019-12-05  2:15                           ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.y Ram Pai
2019-12-06 23:10                           ` [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor Ram Pai
2019-12-05  8:28                         ` Christoph Hellwig
2019-12-04 18:26   ` Leonardo Bras
2019-12-04 20:27     ` Ram Pai

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git