linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
@ 2020-03-26 16:29 Alexander Graf
  2020-03-26 17:05 ` Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Alexander Graf @ 2020-03-26 16:29 UTC (permalink / raw)
  To: iommu
  Cc: Konrad Rzeszutek Wilk, x86, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, linux-doc, linux-kernel, Mark Rutland, dwmw, benh,
	Jan Kiszka, alcioa, aggh, aagch, dhr

The swiotlb is a very convenient fallback mechanism for bounce buffering of
DMAable data. It is usually used for the compatibility case where devices
can only DMA to a "low region".

However, in some scenarios this "low region" may be bound even more
heavily. For example, there are embedded system where only an SRAM region
is shared between device and CPU. There are also heterogeneous computing
scenarios where only a subset of RAM is cache coherent between the
components of the system. There are partitioning hypervisors, where
a "control VM" that implements device emulation has limited view into a
partition's memory for DMA capabilities due to safety concerns.

This patch adds a command line driven mechanism to move all DMA memory into
a predefined shared memory region which may or may not be part of the
physical address layout of the Operating System.

Ideally, the typical path to set this configuration would be through Device
Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
instead configure the system purely through kernel command line options.

I'm sure other people will find the functionality useful going forward
though and extend it to be triggered by DT/ACPI in the future.

Signed-off-by: Alexander Graf <graf@amazon.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +-
 Documentation/x86/x86_64/boot-options.rst       |  4 ++-
 kernel/dma/swiotlb.c                            | 46 +++++++++++++++++++++++--
 3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c07815d230bc..d085d55c3cbe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4785,11 +4785,12 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> | force | noforce | addr=<phys addr> }
 			<int> -- Number of I/O TLB slabs
 			force -- force using of bounce buffers even if they
 			         wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
+			addr=<phys addr> -- Try to allocate SWIOTLB at defined address
 
 	switches=	[HW,M68k]
 
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index 2b98efb5ba7f..ca46c57b68c9 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -297,11 +297,13 @@ iommu options only relevant to the AMD GART hardware IOMMU:
 iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
 implementation:
 
-    swiotlb=<pages>[,force]
+    swiotlb=<pages>[,force][,addr=<phys addr>]
       <pages>
         Prereserve that many 128K pages for the software IO bounce buffering.
       force
         Force all IO through the software TLB.
+      addr=<phys addr>
+        Try to allocate SWIOTLB at defined address
 
 Settings for the IBM Calgary hardware IOMMU currently found in IBM
 pSeries and xSeries machines
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fabd20..83da0caa2f93 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -46,6 +46,7 @@
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/iommu-helper.h>
+#include <linux/io.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/swiotlb.h>
@@ -102,6 +103,12 @@ unsigned int max_segment;
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 static phys_addr_t *io_tlb_orig_addr;
 
+/*
+ * The TLB phys addr may be defined on the command line. Store it here if it is.
+ */
+static phys_addr_t io_tlb_addr = INVALID_PHYS_ADDR;
+
+
 /*
  * Protect the above data structures in the map and unmap calls
  */
@@ -119,11 +126,23 @@ setup_io_tlb_npages(char *str)
 	}
 	if (*str == ',')
 		++str;
-	if (!strcmp(str, "force")) {
+	if (!strncmp(str, "force", 5)) {
 		swiotlb_force = SWIOTLB_FORCE;
-	} else if (!strcmp(str, "noforce")) {
+		str += 5;
+	} else if (!strncmp(str, "noforce", 7)) {
 		swiotlb_force = SWIOTLB_NO_FORCE;
 		io_tlb_nslabs = 1;
+		str += 7;
+	}
+
+	if (*str == ',')
+		++str;
+	if (!strncmp(str, "addr=", 5)) {
+		char *addrstr = str + 5;
+
+		io_tlb_addr = kstrtoul(addrstr, 0, &str);
+		if (addrstr == str)
+			io_tlb_addr = INVALID_PHYS_ADDR;
 	}
 
 	return 0;
@@ -239,6 +258,25 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	return 0;
 }
 
+static int __init swiotlb_init_io(int verbose, unsigned long bytes)
+{
+	unsigned __iomem char *vstart;
+
+	if (io_tlb_addr == INVALID_PHYS_ADDR)
+		return -EINVAL;
+
+	vstart = memremap(io_tlb_addr, bytes, MEMREMAP_WB);
+	if (!vstart)
+		return -EINVAL;
+
+	if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) {
+		memunmap(vstart);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
@@ -257,6 +295,10 @@ swiotlb_init(int verbose)
 
 	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 
+	/* Map IO TLB from device memory */
+	if (!swiotlb_init_io(verbose, bytes))
+		return;
+
 	/* Get IO TLB memory from the low pages */
 	vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
 	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
-- 
2.16.4




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 16:29 [PATCH] swiotlb: Allow swiotlb to live at pre-defined address Alexander Graf
@ 2020-03-26 17:05 ` Christoph Hellwig
  2020-03-26 17:11   ` Alexander Graf
  2020-03-27  6:05 ` kbuild test robot
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2020-03-26 17:05 UTC (permalink / raw)
  To: Alexander Graf
  Cc: iommu, Konrad Rzeszutek Wilk, x86, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linux-doc, linux-kernel,
	Mark Rutland, dwmw, benh, Jan Kiszka, alcioa, aggh, aagch, dhr

On Thu, Mar 26, 2020 at 05:29:22PM +0100, Alexander Graf wrote:
> The swiotlb is a very convenient fallback mechanism for bounce buffering of
> DMAable data. It is usually used for the compatibility case where devices
> can only DMA to a "low region".
> 
> However, in some scenarios this "low region" may be bound even more
> heavily. For example, there are embedded system where only an SRAM region
> is shared between device and CPU. There are also heterogeneous computing
> scenarios where only a subset of RAM is cache coherent between the
> components of the system. There are partitioning hypervisors, where
> a "control VM" that implements device emulation has limited view into a
> partition's memory for DMA capabilities due to safety concerns.
> 
> This patch adds a command line driven mechanism to move all DMA memory into
> a predefined shared memory region which may or may not be part of the
> physical address layout of the Operating System.
> 
> Ideally, the typical path to set this configuration would be through Device
> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> instead configure the system purely through kernel command line options.
> 
> I'm sure other people will find the functionality useful going forward
> though and extend it to be triggered by DT/ACPI in the future.

I'm totally against hacking in a kernel parameter for this.  We'll need
a proper documented DT or ACPI way.  We also need to feed this information
into the actual DMA bounce buffering decisions and not just the swiotlb
placement.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 17:05 ` Christoph Hellwig
@ 2020-03-26 17:11   ` Alexander Graf
  2020-03-26 17:16     ` David Woodhouse
  2020-03-30 13:24     ` Mark Rutland
  0 siblings, 2 replies; 15+ messages in thread
From: Alexander Graf @ 2020-03-26 17:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Konrad Rzeszutek Wilk, x86, Marek Szyprowski,
	Robin Murphy, linux-doc, linux-kernel, Mark Rutland, dwmw, benh,
	Jan Kiszka, alcioa, aggh, aagch, dhr



On 26.03.20 18:05, Christoph Hellwig wrote:
> 
> On Thu, Mar 26, 2020 at 05:29:22PM +0100, Alexander Graf wrote:
>> The swiotlb is a very convenient fallback mechanism for bounce buffering of
>> DMAable data. It is usually used for the compatibility case where devices
>> can only DMA to a "low region".
>>
>> However, in some scenarios this "low region" may be bound even more
>> heavily. For example, there are embedded system where only an SRAM region
>> is shared between device and CPU. There are also heterogeneous computing
>> scenarios where only a subset of RAM is cache coherent between the
>> components of the system. There are partitioning hypervisors, where
>> a "control VM" that implements device emulation has limited view into a
>> partition's memory for DMA capabilities due to safety concerns.
>>
>> This patch adds a command line driven mechanism to move all DMA memory into
>> a predefined shared memory region which may or may not be part of the
>> physical address layout of the Operating System.
>>
>> Ideally, the typical path to set this configuration would be through Device
>> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
>> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
>> instead configure the system purely through kernel command line options.
>>
>> I'm sure other people will find the functionality useful going forward
>> though and extend it to be triggered by DT/ACPI in the future.
> 
> I'm totally against hacking in a kernel parameter for this.  We'll need
> a proper documented DT or ACPI way.  

I'm with you on that sentiment, but in the environment I'm currently 
looking at, we have neither DT nor ACPI: The kernel gets purely 
configured via kernel command line. For other unenumerable artifacts on 
the system, such as virtio-mmio platform devices, that works well enough 
and also basically "hacks a kernel parameter" to specify the system layout.

> We also need to feed this information
> into the actual DMA bounce buffering decisions and not just the swiotlb
> placement.

Care to elaborate a bit here? I was under the impression that 
"swiotlb=force" basically allows you to steer the DMA bounce buffering 
decisions already.


Thanks!

Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 17:11   ` Alexander Graf
@ 2020-03-26 17:16     ` David Woodhouse
  2020-03-30 13:24     ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2020-03-26 17:16 UTC (permalink / raw)
  To: Alexander Graf, Christoph Hellwig
  Cc: iommu, Konrad Rzeszutek Wilk, x86, Marek Szyprowski,
	Robin Murphy, linux-doc, linux-kernel, Mark Rutland, dwmw, benh,
	Jan Kiszka, alcioa, aggh, aagch, dhr

[-- Attachment #1: Type: text/plain, Size: 493 bytes --]

On Thu, 2020-03-26 at 18:11 +0100, Alexander Graf wrote:
> I'm with you on that sentiment, but in the environment I'm currently 
> looking at, we have neither DT nor ACPI: The kernel gets purely 
> configured via kernel command line. For other unenumerable artifacts on 
> the system, such as virtio-mmio platform devices, that works well enough 
> and also basically "hacks a kernel parameter" to specify the system layout.

Well... maybe it should also feed in a DT for those too?


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 16:29 [PATCH] swiotlb: Allow swiotlb to live at pre-defined address Alexander Graf
  2020-03-26 17:05 ` Christoph Hellwig
@ 2020-03-27  6:05 ` kbuild test robot
  2020-03-27  9:58 ` Jan Kiszka
  2020-03-28 11:57 ` Dave Young
  3 siblings, 0 replies; 15+ messages in thread
From: kbuild test robot @ 2020-03-27  6:05 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kbuild-all, iommu@lists.linux-foundation.org,
	Konrad Rzeszutek Wilk, x86, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, linux-doc, linux-kernel, Mark Rutland, dwmw, benh,
	Jan Kiszka, alcioa, aggh, aagch, dhr

Hi Alexander,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on swiotlb/linux-next tip/x86/core v5.6-rc7 next-20200326]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Alexander-Graf/swiotlb-Allow-swiotlb-to-live-at-pre-defined-address/20200327-062125
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9420e8ade4353a6710908ffafa23ecaf1caa0123
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-187-gbff9b106-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   kernel/dma/swiotlb.c:97:14: sparse: sparse: symbol 'max_segment' was not declared. Should it be static?
>> kernel/dma/swiotlb.c:143:53: sparse: sparse: incorrect type in argument 3 (different base types) @@    expected unsigned long *res @@    got  long *res @@
>> kernel/dma/swiotlb.c:143:53: sparse:    expected unsigned long *res
>> kernel/dma/swiotlb.c:143:53: sparse:    got char **
>> kernel/dma/swiotlb.c:268:16: sparse: sparse: incorrect type in assignment (different address spaces) @@    expected unsigned char [noderef] <asn:2> *vstart @@    got n:2> *vstart @@
>> kernel/dma/swiotlb.c:268:16: sparse:    expected unsigned char [noderef] <asn:2> *vstart
>> kernel/dma/swiotlb.c:268:16: sparse:    got void *
>> kernel/dma/swiotlb.c:272:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char *tlb @@    got unsigned char [noderef] <aschar *tlb @@
>> kernel/dma/swiotlb.c:272:35: sparse:    expected char *tlb
>> kernel/dma/swiotlb.c:272:35: sparse:    got unsigned char [noderef] <asn:2> *vstart
>> kernel/dma/swiotlb.c:273:26: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected void *addr @@    got unsigned char [noderef] <asvoid *addr @@
>> kernel/dma/swiotlb.c:273:26: sparse:    expected void *addr
   kernel/dma/swiotlb.c:273:26: sparse:    got unsigned char [noderef] <asn:2> *vstart

vim +143 kernel/dma/swiotlb.c

   118	
   119	static int __init
   120	setup_io_tlb_npages(char *str)
   121	{
   122		if (isdigit(*str)) {
   123			io_tlb_nslabs = simple_strtoul(str, &str, 0);
   124			/* avoid tail segment of size < IO_TLB_SEGSIZE */
   125			io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
   126		}
   127		if (*str == ',')
   128			++str;
   129		if (!strncmp(str, "force", 5)) {
   130			swiotlb_force = SWIOTLB_FORCE;
   131			str += 5;
   132		} else if (!strncmp(str, "noforce", 7)) {
   133			swiotlb_force = SWIOTLB_NO_FORCE;
   134			io_tlb_nslabs = 1;
   135			str += 7;
   136		}
   137	
   138		if (*str == ',')
   139			++str;
   140		if (!strncmp(str, "addr=", 5)) {
   141			char *addrstr = str + 5;
   142	
 > 143			io_tlb_addr = kstrtoul(addrstr, 0, &str);
   144			if (addrstr == str)
   145				io_tlb_addr = INVALID_PHYS_ADDR;
   146		}
   147	
   148		return 0;
   149	}
   150	early_param("swiotlb", setup_io_tlb_npages);
   151	
   152	static bool no_iotlb_memory;
   153	
   154	unsigned long swiotlb_nr_tbl(void)
   155	{
   156		return unlikely(no_iotlb_memory) ? 0 : io_tlb_nslabs;
   157	}
   158	EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
   159	
   160	unsigned int swiotlb_max_segment(void)
   161	{
   162		return unlikely(no_iotlb_memory) ? 0 : max_segment;
   163	}
   164	EXPORT_SYMBOL_GPL(swiotlb_max_segment);
   165	
   166	void swiotlb_set_max_segment(unsigned int val)
   167	{
   168		if (swiotlb_force == SWIOTLB_FORCE)
   169			max_segment = 1;
   170		else
   171			max_segment = rounddown(val, PAGE_SIZE);
   172	}
   173	
   174	/* default to 64MB */
   175	#define IO_TLB_DEFAULT_SIZE (64UL<<20)
   176	unsigned long swiotlb_size_or_default(void)
   177	{
   178		unsigned long size;
   179	
   180		size = io_tlb_nslabs << IO_TLB_SHIFT;
   181	
   182		return size ? size : (IO_TLB_DEFAULT_SIZE);
   183	}
   184	
   185	void swiotlb_print_info(void)
   186	{
   187		unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
   188	
   189		if (no_iotlb_memory) {
   190			pr_warn("No low mem\n");
   191			return;
   192		}
   193	
   194		pr_info("mapped [mem %#010llx-%#010llx] (%luMB)\n",
   195		       (unsigned long long)io_tlb_start,
   196		       (unsigned long long)io_tlb_end,
   197		       bytes >> 20);
   198	}
   199	
   200	/*
   201	 * Early SWIOTLB allocation may be too early to allow an architecture to
   202	 * perform the desired operations.  This function allows the architecture to
   203	 * call SWIOTLB when the operations are possible.  It needs to be called
   204	 * before the SWIOTLB memory is used.
   205	 */
   206	void __init swiotlb_update_mem_attributes(void)
   207	{
   208		void *vaddr;
   209		unsigned long bytes;
   210	
   211		if (no_iotlb_memory || late_alloc)
   212			return;
   213	
   214		vaddr = phys_to_virt(io_tlb_start);
   215		bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
   216		set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
   217		memset(vaddr, 0, bytes);
   218	}
   219	
   220	int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
   221	{
   222		unsigned long i, bytes;
   223		size_t alloc_size;
   224	
   225		bytes = nslabs << IO_TLB_SHIFT;
   226	
   227		io_tlb_nslabs = nslabs;
   228		io_tlb_start = __pa(tlb);
   229		io_tlb_end = io_tlb_start + bytes;
   230	
   231		/*
   232		 * Allocate and initialize the free list array.  This array is used
   233		 * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
   234		 * between io_tlb_start and io_tlb_end.
   235		 */
   236		alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(int));
   237		io_tlb_list = memblock_alloc(alloc_size, PAGE_SIZE);
   238		if (!io_tlb_list)
   239			panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
   240			      __func__, alloc_size, PAGE_SIZE);
   241	
   242		alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t));
   243		io_tlb_orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
   244		if (!io_tlb_orig_addr)
   245			panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
   246			      __func__, alloc_size, PAGE_SIZE);
   247	
   248		for (i = 0; i < io_tlb_nslabs; i++) {
   249			io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
   250			io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
   251		}
   252		io_tlb_index = 0;
   253	
   254		if (verbose)
   255			swiotlb_print_info();
   256	
   257		swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
   258		return 0;
   259	}
   260	
   261	static int __init swiotlb_init_io(int verbose, unsigned long bytes)
   262	{
   263		unsigned __iomem char *vstart;
   264	
   265		if (io_tlb_addr == INVALID_PHYS_ADDR)
   266			return -EINVAL;
   267	
 > 268		vstart = memremap(io_tlb_addr, bytes, MEMREMAP_WB);
   269		if (!vstart)
   270			return -EINVAL;
   271	
 > 272		if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) {
 > 273			memunmap(vstart);
   274			return -EINVAL;
   275		}
   276	
   277		return 0;
   278	}
   279	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 16:29 [PATCH] swiotlb: Allow swiotlb to live at pre-defined address Alexander Graf
  2020-03-26 17:05 ` Christoph Hellwig
  2020-03-27  6:05 ` kbuild test robot
@ 2020-03-27  9:58 ` Jan Kiszka
  2020-03-28 11:57 ` Dave Young
  3 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2020-03-27  9:58 UTC (permalink / raw)
  To: Alexander Graf, iommu
  Cc: Konrad Rzeszutek Wilk, x86, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, linux-doc, linux-kernel, Mark Rutland, dwmw, benh,
	alcioa, aggh, aagch, dhr

On 26.03.20 17:29, Alexander Graf wrote:
> The swiotlb is a very convenient fallback mechanism for bounce buffering of
> DMAable data. It is usually used for the compatibility case where devices
> can only DMA to a "low region".
>
> However, in some scenarios this "low region" may be bound even more
> heavily. For example, there are embedded system where only an SRAM region
> is shared between device and CPU. There are also heterogeneous computing
> scenarios where only a subset of RAM is cache coherent between the
> components of the system. There are partitioning hypervisors, where
> a "control VM" that implements device emulation has limited view into a
> partition's memory for DMA capabilities due to safety concerns.
>
> This patch adds a command line driven mechanism to move all DMA memory into
> a predefined shared memory region which may or may not be part of the
> physical address layout of the Operating System.
>
> Ideally, the typical path to set this configuration would be through Device
> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> instead configure the system purely through kernel command line options.
>
> I'm sure other people will find the functionality useful going forward
> though and extend it to be triggered by DT/ACPI in the future.
>
> Signed-off-by: Alexander Graf <graf@amazon.com>
> ---
>   Documentation/admin-guide/kernel-parameters.txt |  3 +-
>   Documentation/x86/x86_64/boot-options.rst       |  4 ++-
>   kernel/dma/swiotlb.c                            | 46 +++++++++++++++++++++++--
>   3 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..d085d55c3cbe 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4785,11 +4785,12 @@
>   			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
>
>   	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
> -			Format: { <int> | force | noforce }
> +			Format: { <int> | force | noforce | addr=<phys addr> }
>   			<int> -- Number of I/O TLB slabs
>   			force -- force using of bounce buffers even if they
>   			         wouldn't be automatically used by the kernel
>   			noforce -- Never use bounce buffers (for debugging)
> +			addr=<phys addr> -- Try to allocate SWIOTLB at defined address
>
>   	switches=	[HW,M68k]
>
> diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
> index 2b98efb5ba7f..ca46c57b68c9 100644
> --- a/Documentation/x86/x86_64/boot-options.rst
> +++ b/Documentation/x86/x86_64/boot-options.rst
> @@ -297,11 +297,13 @@ iommu options only relevant to the AMD GART hardware IOMMU:
>   iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
>   implementation:
>
> -    swiotlb=<pages>[,force]
> +    swiotlb=<pages>[,force][,addr=<phys addr>]
>         <pages>
>           Prereserve that many 128K pages for the software IO bounce buffering.
>         force
>           Force all IO through the software TLB.
> +      addr=<phys addr>
> +        Try to allocate SWIOTLB at defined address
>
>   Settings for the IBM Calgary hardware IOMMU currently found in IBM
>   pSeries and xSeries machines
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c19379fabd20..83da0caa2f93 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -46,6 +46,7 @@
>   #include <linux/init.h>
>   #include <linux/memblock.h>
>   #include <linux/iommu-helper.h>
> +#include <linux/io.h>
>
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/swiotlb.h>
> @@ -102,6 +103,12 @@ unsigned int max_segment;
>   #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
>   static phys_addr_t *io_tlb_orig_addr;
>
> +/*
> + * The TLB phys addr may be defined on the command line. Store it here if it is.
> + */
> +static phys_addr_t io_tlb_addr = INVALID_PHYS_ADDR;
> +
> +
>   /*
>    * Protect the above data structures in the map and unmap calls
>    */
> @@ -119,11 +126,23 @@ setup_io_tlb_npages(char *str)
>   	}
>   	if (*str == ',')
>   		++str;
> -	if (!strcmp(str, "force")) {
> +	if (!strncmp(str, "force", 5)) {
>   		swiotlb_force = SWIOTLB_FORCE;
> -	} else if (!strcmp(str, "noforce")) {
> +		str += 5;
> +	} else if (!strncmp(str, "noforce", 7)) {
>   		swiotlb_force = SWIOTLB_NO_FORCE;
>   		io_tlb_nslabs = 1;
> +		str += 7;
> +	}
> +
> +	if (*str == ',')
> +		++str;
> +	if (!strncmp(str, "addr=", 5)) {
> +		char *addrstr = str + 5;
> +
> +		io_tlb_addr = kstrtoul(addrstr, 0, &str);
> +		if (addrstr == str)
> +			io_tlb_addr = INVALID_PHYS_ADDR;
>   	}
>
>   	return 0;
> @@ -239,6 +258,25 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>   	return 0;
>   }
>
> +static int __init swiotlb_init_io(int verbose, unsigned long bytes)
> +{
> +	unsigned __iomem char *vstart;
> +
> +	if (io_tlb_addr == INVALID_PHYS_ADDR)
> +		return -EINVAL;
> +
> +	vstart = memremap(io_tlb_addr, bytes, MEMREMAP_WB);
> +	if (!vstart)
> +		return -EINVAL;
> +
> +	if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) {
> +		memunmap(vstart);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>   /*
>    * Statically reserve bounce buffer space and initialize bounce buffer data
>    * structures for the software IO TLB used to implement the DMA API.
> @@ -257,6 +295,10 @@ swiotlb_init(int verbose)
>
>   	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
>
> +	/* Map IO TLB from device memory */
> +	if (!swiotlb_init_io(verbose, bytes))
> +		return;
> +
>   	/* Get IO TLB memory from the low pages */
>   	vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
>   	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
>

To make this useful also for shared-memory based devices, it should not
only have a command-line independent interface. Multi-instance support
would be needed so that you can attach swiotlb with individual address
ranges to devices that need it and leave it alone for other that do not
(e.g. passed-through guest devices).

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 16:29 [PATCH] swiotlb: Allow swiotlb to live at pre-defined address Alexander Graf
                   ` (2 preceding siblings ...)
  2020-03-27  9:58 ` Jan Kiszka
@ 2020-03-28 11:57 ` Dave Young
  2020-03-30  6:06   ` Kairui Song
  3 siblings, 1 reply; 15+ messages in thread
From: Dave Young @ 2020-03-28 11:57 UTC (permalink / raw)
  To: Alexander Graf
  Cc: iommu, Konrad Rzeszutek Wilk, x86, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linux-doc, linux-kernel,
	Mark Rutland, dwmw, benh, Jan Kiszka, alcioa, aggh, aagch, dhr,
	Laszlo Ersek, bhe, lijiang, kasong, brijesh.singh,
	thomas.lendacky, kexec

On 03/26/20 at 05:29pm, Alexander Graf wrote:
> The swiotlb is a very convenient fallback mechanism for bounce buffering of
> DMAable data. It is usually used for the compatibility case where devices
> can only DMA to a "low region".
> 
> However, in some scenarios this "low region" may be bound even more
> heavily. For example, there are embedded system where only an SRAM region
> is shared between device and CPU. There are also heterogeneous computing
> scenarios where only a subset of RAM is cache coherent between the
> components of the system. There are partitioning hypervisors, where
> a "control VM" that implements device emulation has limited view into a
> partition's memory for DMA capabilities due to safety concerns.
> 
> This patch adds a command line driven mechanism to move all DMA memory into
> a predefined shared memory region which may or may not be part of the
> physical address layout of the Operating System.
> 
> Ideally, the typical path to set this configuration would be through Device
> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> instead configure the system purely through kernel command line options.
> 
> I'm sure other people will find the functionality useful going forward
> though and extend it to be triggered by DT/ACPI in the future.

Hmm, we have a use case for kdump, this maybe useful.  For example
swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
kernel we have to increase the crashkernel reserved size for the extra
swiotlb requirement.  I wonder if we can just reuse the old kernel's
swiotlb region and pass the addr to kdump kernel.

> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +-
>  Documentation/x86/x86_64/boot-options.rst       |  4 ++-
>  kernel/dma/swiotlb.c                            | 46 +++++++++++++++++++++++--
>  3 files changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..d085d55c3cbe 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4785,11 +4785,12 @@
>  			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
>  
>  	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
> -			Format: { <int> | force | noforce }
> +			Format: { <int> | force | noforce | addr=<phys addr> }
>  			<int> -- Number of I/O TLB slabs
>  			force -- force using of bounce buffers even if they
>  			         wouldn't be automatically used by the kernel
>  			noforce -- Never use bounce buffers (for debugging)
> +			addr=<phys addr> -- Try to allocate SWIOTLB at defined address
>  
>  	switches=	[HW,M68k]
>  
> diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
> index 2b98efb5ba7f..ca46c57b68c9 100644
> --- a/Documentation/x86/x86_64/boot-options.rst
> +++ b/Documentation/x86/x86_64/boot-options.rst
> @@ -297,11 +297,13 @@ iommu options only relevant to the AMD GART hardware IOMMU:
>  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
>  implementation:
>  
> -    swiotlb=<pages>[,force]
> +    swiotlb=<pages>[,force][,addr=<phys addr>]
>        <pages>
>          Prereserve that many 128K pages for the software IO bounce buffering.
>        force
>          Force all IO through the software TLB.
> +      addr=<phys addr>
> +        Try to allocate SWIOTLB at defined address
>  
>  Settings for the IBM Calgary hardware IOMMU currently found in IBM
>  pSeries and xSeries machines
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c19379fabd20..83da0caa2f93 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -46,6 +46,7 @@
>  #include <linux/init.h>
>  #include <linux/memblock.h>
>  #include <linux/iommu-helper.h>
> +#include <linux/io.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/swiotlb.h>
> @@ -102,6 +103,12 @@ unsigned int max_segment;
>  #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
>  static phys_addr_t *io_tlb_orig_addr;
>  
> +/*
> + * The TLB phys addr may be defined on the command line. Store it here if it is.
> + */
> +static phys_addr_t io_tlb_addr = INVALID_PHYS_ADDR;
> +
> +
>  /*
>   * Protect the above data structures in the map and unmap calls
>   */
> @@ -119,11 +126,23 @@ setup_io_tlb_npages(char *str)
>  	}
>  	if (*str == ',')
>  		++str;
> -	if (!strcmp(str, "force")) {
> +	if (!strncmp(str, "force", 5)) {
>  		swiotlb_force = SWIOTLB_FORCE;
> -	} else if (!strcmp(str, "noforce")) {
> +		str += 5;
> +	} else if (!strncmp(str, "noforce", 7)) {
>  		swiotlb_force = SWIOTLB_NO_FORCE;
>  		io_tlb_nslabs = 1;
> +		str += 7;
> +	}
> +
> +	if (*str == ',')
> +		++str;
> +	if (!strncmp(str, "addr=", 5)) {
> +		char *addrstr = str + 5;
> +
> +		io_tlb_addr = kstrtoul(addrstr, 0, &str);
> +		if (addrstr == str)
> +			io_tlb_addr = INVALID_PHYS_ADDR;
>  	}
>  
>  	return 0;
> @@ -239,6 +258,25 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  	return 0;
>  }
>  
> +static int __init swiotlb_init_io(int verbose, unsigned long bytes)
> +{
> +	unsigned __iomem char *vstart;
> +
> +	if (io_tlb_addr == INVALID_PHYS_ADDR)
> +		return -EINVAL;
> +
> +	vstart = memremap(io_tlb_addr, bytes, MEMREMAP_WB);
> +	if (!vstart)
> +		return -EINVAL;
> +
> +	if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) {
> +		memunmap(vstart);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Statically reserve bounce buffer space and initialize bounce buffer data
>   * structures for the software IO TLB used to implement the DMA API.
> @@ -257,6 +295,10 @@ swiotlb_init(int verbose)
>  
>  	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
>  
> +	/* Map IO TLB from device memory */
> +	if (!swiotlb_init_io(verbose, bytes))
> +		return;
> +
>  	/* Get IO TLB memory from the low pages */
>  	vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
>  	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
> -- 
> 2.16.4
> 
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 
> 

Thanks
Dave


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-28 11:57 ` Dave Young
@ 2020-03-30  6:06   ` Kairui Song
  2020-03-30 13:40     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 15+ messages in thread
From: Kairui Song @ 2020-03-30  6:06 UTC (permalink / raw)
  To: Dave Young
  Cc: Alexander Graf, iommu, Konrad Rzeszutek Wilk,
	the arch/x86 maintainers, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, linux-doc, Linux Kernel Mailing List, Mark Rutland,
	dwmw, benh, Jan Kiszka, alcioa, aggh, aagch, dhr, Laszlo Ersek,
	Baoquan He, Lianbo Jiang, brijesh.singh, Lendacky, Thomas, kexec

On Sat, Mar 28, 2020 at 7:57 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 03/26/20 at 05:29pm, Alexander Graf wrote:
> > The swiotlb is a very convenient fallback mechanism for bounce buffering of
> > DMAable data. It is usually used for the compatibility case where devices
> > can only DMA to a "low region".
> >
> > However, in some scenarios this "low region" may be bound even more
> > heavily. For example, there are embedded system where only an SRAM region
> > is shared between device and CPU. There are also heterogeneous computing
> > scenarios where only a subset of RAM is cache coherent between the
> > components of the system. There are partitioning hypervisors, where
> > a "control VM" that implements device emulation has limited view into a
> > partition's memory for DMA capabilities due to safety concerns.
> >
> > This patch adds a command line driven mechanism to move all DMA memory into
> > a predefined shared memory region which may or may not be part of the
> > physical address layout of the Operating System.
> >
> > Ideally, the typical path to set this configuration would be through Device
> > Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> > in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> > instead configure the system purely through kernel command line options.
> >
> > I'm sure other people will find the functionality useful going forward
> > though and extend it to be triggered by DT/ACPI in the future.
>
> Hmm, we have a use case for kdump, this maybe useful.  For example
> swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
> kernel we have to increase the crashkernel reserved size for the extra
> swiotlb requirement.  I wonder if we can just reuse the old kernel's
> swiotlb region and pass the addr to kdump kernel.
>

Yes, definitely helpful for kdump kernel. This can help reduce the
crashkernel value.

Previously I was thinking about something similar, play around the
e820 entry passed to kdump and let it place swiotlb in wanted region.
Simply remap it like in this patch looks much cleaner.

If this patch is acceptable, one more patch is needed to expose the
swiotlb in iomem, so kexec-tools can pass the right kernel cmdline to
second kernel.

> >
> > Signed-off-by: Alexander Graf <graf@amazon.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |  3 +-
> >  Documentation/x86/x86_64/boot-options.rst       |  4 ++-
> >  kernel/dma/swiotlb.c                            | 46 +++++++++++++++++++++++--
> >  3 files changed, 49 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index c07815d230bc..d085d55c3cbe 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4785,11 +4785,12 @@
> >                       it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
> >
> >       swiotlb=        [ARM,IA-64,PPC,MIPS,X86]
> > -                     Format: { <int> | force | noforce }
> > +                     Format: { <int> | force | noforce | addr=<phys addr> }
> >                       <int> -- Number of I/O TLB slabs
> >                       force -- force using of bounce buffers even if they
> >                                wouldn't be automatically used by the kernel
> >                       noforce -- Never use bounce buffers (for debugging)
> > +                     addr=<phys addr> -- Try to allocate SWIOTLB at defined address
> >
> >       switches=       [HW,M68k]
> >
> > diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
> > index 2b98efb5ba7f..ca46c57b68c9 100644
> > --- a/Documentation/x86/x86_64/boot-options.rst
> > +++ b/Documentation/x86/x86_64/boot-options.rst
> > @@ -297,11 +297,13 @@ iommu options only relevant to the AMD GART hardware IOMMU:
> >  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
> >  implementation:
> >
> > -    swiotlb=<pages>[,force]
> > +    swiotlb=<pages>[,force][,addr=<phys addr>]
> >        <pages>
> >          Prereserve that many 128K pages for the software IO bounce buffering.
> >        force
> >          Force all IO through the software TLB.
> > +      addr=<phys addr>
> > +        Try to allocate SWIOTLB at defined address
> >
> >  Settings for the IBM Calgary hardware IOMMU currently found in IBM
> >  pSeries and xSeries machines
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index c19379fabd20..83da0caa2f93 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/init.h>
> >  #include <linux/memblock.h>
> >  #include <linux/iommu-helper.h>
> > +#include <linux/io.h>
> >
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/swiotlb.h>
> > @@ -102,6 +103,12 @@ unsigned int max_segment;
> >  #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
> >  static phys_addr_t *io_tlb_orig_addr;
> >
> > +/*
> > + * The TLB phys addr may be defined on the command line. Store it here if it is.
> > + */
> > +static phys_addr_t io_tlb_addr = INVALID_PHYS_ADDR;
> > +
> > +
> >  /*
> >   * Protect the above data structures in the map and unmap calls
> >   */
> > @@ -119,11 +126,23 @@ setup_io_tlb_npages(char *str)
> >       }
> >       if (*str == ',')
> >               ++str;
> > -     if (!strcmp(str, "force")) {
> > +     if (!strncmp(str, "force", 5)) {
> >               swiotlb_force = SWIOTLB_FORCE;
> > -     } else if (!strcmp(str, "noforce")) {
> > +             str += 5;
> > +     } else if (!strncmp(str, "noforce", 7)) {
> >               swiotlb_force = SWIOTLB_NO_FORCE;
> >               io_tlb_nslabs = 1;
> > +             str += 7;
> > +     }
> > +
> > +     if (*str == ',')
> > +             ++str;
> > +     if (!strncmp(str, "addr=", 5)) {
> > +             char *addrstr = str + 5;
> > +
> > +             io_tlb_addr = kstrtoul(addrstr, 0, &str);
> > +             if (addrstr == str)
> > +                     io_tlb_addr = INVALID_PHYS_ADDR;
> >       }
> >
> >       return 0;
> > @@ -239,6 +258,25 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> >       return 0;
> >  }
> >
> > +static int __init swiotlb_init_io(int verbose, unsigned long bytes)
> > +{
> > +     unsigned __iomem char *vstart;
> > +
> > +     if (io_tlb_addr == INVALID_PHYS_ADDR)
> > +             return -EINVAL;
> > +
> > +     vstart = memremap(io_tlb_addr, bytes, MEMREMAP_WB);
> > +     if (!vstart)
> > +             return -EINVAL;
> > +
> > +     if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) {
> > +             memunmap(vstart);
> > +             return -EINVAL;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  /*
> >   * Statically reserve bounce buffer space and initialize bounce buffer data
> >   * structures for the software IO TLB used to implement the DMA API.
> > @@ -257,6 +295,10 @@ swiotlb_init(int verbose)
> >
> >       bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> >
> > +     /* Map IO TLB from device memory */
> > +     if (!swiotlb_init_io(verbose, bytes))
> > +             return;
> > +
> >       /* Get IO TLB memory from the low pages */
> >       vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
> >       if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
> > --
> > 2.16.4
> >
> >
> >
> >
> > Amazon Development Center Germany GmbH
> > Krausenstr. 38
> > 10117 Berlin
> > Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> > Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> > Sitz: Berlin
> > Ust-ID: DE 289 237 879
> >
> >
> >
>
> Thanks
> Dave
>


-- 
Best Regards,
Kairui Song


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-26 17:11   ` Alexander Graf
  2020-03-26 17:16     ` David Woodhouse
@ 2020-03-30 13:24     ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2020-03-30 13:24 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Christoph Hellwig, iommu, Konrad Rzeszutek Wilk, x86,
	Marek Szyprowski, Robin Murphy, linux-doc, linux-kernel, dwmw,
	benh, Jan Kiszka, alcioa, aggh, aagch, dhr

On Thu, Mar 26, 2020 at 06:11:31PM +0100, Alexander Graf wrote:
> On 26.03.20 18:05, Christoph Hellwig wrote:
> > 
> > On Thu, Mar 26, 2020 at 05:29:22PM +0100, Alexander Graf wrote:
> > > The swiotlb is a very convenient fallback mechanism for bounce buffering of
> > > DMAable data. It is usually used for the compatibility case where devices
> > > can only DMA to a "low region".
> > > 
> > > However, in some scenarios this "low region" may be bound even more
> > > heavily. For example, there are embedded system where only an SRAM region
> > > is shared between device and CPU. There are also heterogeneous computing
> > > scenarios where only a subset of RAM is cache coherent between the
> > > components of the system. There are partitioning hypervisors, where
> > > a "control VM" that implements device emulation has limited view into a
> > > partition's memory for DMA capabilities due to safety concerns.
> > > 
> > > This patch adds a command line driven mechanism to move all DMA memory into
> > > a predefined shared memory region which may or may not be part of the
> > > physical address layout of the Operating System.
> > > 
> > > Ideally, the typical path to set this configuration would be through Device
> > > Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> > > in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> > > instead configure the system purely through kernel command line options.
> > > 
> > > I'm sure other people will find the functionality useful going forward
> > > though and extend it to be triggered by DT/ACPI in the future.
> > 
> > I'm totally against hacking in a kernel parameter for this.  We'll need
> > a proper documented DT or ACPI way.
> 
> I'm with you on that sentiment, but in the environment I'm currently looking
> at, we have neither DT nor ACPI: The kernel gets purely configured via
> kernel command line. For other unenumerable artifacts on the system, such as
> virtio-mmio platform devices, that works well enough and also basically
> "hacks a kernel parameter" to specify the system layout.

On the arm64 front, you'd *have* to pass a DT to the kernel (as that's
where we get the command line from), and we *only* discover memory
from the DT or EFI memory map, so the arguments above aren't generally
applicable. You can enumerate virtio-mmio devices from DT, also.

Device-specific constraints on memory should really be described in a
per-device fashion in the FW tables so that the OS can decide how to
handle them. Just becuase one device can only access memory in a
specific 1MiB window doesn't mean all other should be forced to share
the same constraint. I think that's what Christoph was alluding to.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-30  6:06   ` Kairui Song
@ 2020-03-30 13:40     ` Konrad Rzeszutek Wilk
  2020-03-30 20:42       ` Alexander Graf
  2020-03-31  1:46       ` Dave Young
  0 siblings, 2 replies; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2020-03-30 13:40 UTC (permalink / raw)
  To: Kairui Song, anthony.yznaga, Jan Setje-Eilers
  Cc: Dave Young, Alexander Graf, iommu, the arch/x86 maintainers,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linux-doc,
	Linux Kernel Mailing List, Mark Rutland, dwmw, benh, Jan Kiszka,
	alcioa, aggh, aagch, dhr, Laszlo Ersek, Baoquan He, Lianbo Jiang,
	brijesh.singh, Lendacky, Thomas, kexec

On Mon, Mar 30, 2020 at 02:06:01PM +0800, Kairui Song wrote:
> On Sat, Mar 28, 2020 at 7:57 PM Dave Young <dyoung@redhat.com> wrote:
> >
> > On 03/26/20 at 05:29pm, Alexander Graf wrote:
> > > The swiotlb is a very convenient fallback mechanism for bounce buffering of
> > > DMAable data. It is usually used for the compatibility case where devices
> > > can only DMA to a "low region".
> > >
> > > However, in some scenarios this "low region" may be bound even more
> > > heavily. For example, there are embedded system where only an SRAM region
> > > is shared between device and CPU. There are also heterogeneous computing
> > > scenarios where only a subset of RAM is cache coherent between the
> > > components of the system. There are partitioning hypervisors, where
> > > a "control VM" that implements device emulation has limited view into a
> > > partition's memory for DMA capabilities due to safety concerns.
> > >
> > > This patch adds a command line driven mechanism to move all DMA memory into
> > > a predefined shared memory region which may or may not be part of the
> > > physical address layout of the Operating System.
> > >
> > > Ideally, the typical path to set this configuration would be through Device
> > > Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> > > in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> > > instead configure the system purely through kernel command line options.
> > >
> > > I'm sure other people will find the functionality useful going forward
> > > though and extend it to be triggered by DT/ACPI in the future.
> >
> > Hmm, we have a use case for kdump, this maybe useful.  For example
> > swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
> > kernel we have to increase the crashkernel reserved size for the extra
> > swiotlb requirement.  I wonder if we can just reuse the old kernel's
> > swiotlb region and pass the addr to kdump kernel.
> >
> 
> Yes, definitely helpful for kdump kernel. This can help reduce the
> crashkernel value.
> 
> Previously I was thinking about something similar, play around the
> e820 entry passed to kdump and let it place swiotlb in wanted region.
> Simply remap it like in this patch looks much cleaner.
> 
> If this patch is acceptable, one more patch is needed to expose the
> swiotlb in iomem, so kexec-tools can pass the right kernel cmdline to
> second kernel.

We seem to be passsing a lot of data to kexec.. Perhaps something
of a unified way since we seem to have a lot of things to pass - disabling
IOMMU, ACPI RSDT address, and then this.

CC-ing Anthony who is working on something - would you by any chance
have a doc on this?

Thanks!
> 
> > >
> > > Signed-off-by: Alexander Graf <graf@amazon.com>
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt |  3 +-
> > >  Documentation/x86/x86_64/boot-options.rst       |  4 ++-
> > >  kernel/dma/swiotlb.c                            | 46 +++++++++++++++++++++++--
> > >  3 files changed, 49 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index c07815d230bc..d085d55c3cbe 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -4785,11 +4785,12 @@
> > >                       it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
> > >
> > >       swiotlb=        [ARM,IA-64,PPC,MIPS,X86]
> > > -                     Format: { <int> | force | noforce }
> > > +                     Format: { <int> | force | noforce | addr=<phys addr> }
> > >                       <int> -- Number of I/O TLB slabs
> > >                       force -- force using of bounce buffers even if they
> > >                                wouldn't be automatically used by the kernel
> > >                       noforce -- Never use bounce buffers (for debugging)
> > > +                     addr=<phys addr> -- Try to allocate SWIOTLB at defined address
> > >
> > >       switches=       [HW,M68k]
> > >
> > > diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
> > > index 2b98efb5ba7f..ca46c57b68c9 100644
> > > --- a/Documentation/x86/x86_64/boot-options.rst
> > > +++ b/Documentation/x86/x86_64/boot-options.rst
> > > @@ -297,11 +297,13 @@ iommu options only relevant to the AMD GART hardware IOMMU:
> > >  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
> > >  implementation:
> > >
> > > -    swiotlb=<pages>[,force]
> > > +    swiotlb=<pages>[,force][,addr=<phys addr>]
> > >        <pages>
> > >          Prereserve that many 128K pages for the software IO bounce buffering.
> > >        force
> > >          Force all IO through the software TLB.
> > > +      addr=<phys addr>
> > > +        Try to allocate SWIOTLB at defined address
> > >
> > >  Settings for the IBM Calgary hardware IOMMU currently found in IBM
> > >  pSeries and xSeries machines
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index c19379fabd20..83da0caa2f93 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -46,6 +46,7 @@
> > >  #include <linux/init.h>
> > >  #include <linux/memblock.h>
> > >  #include <linux/iommu-helper.h>
> > > +#include <linux/io.h>
> > >
> > >  #define CREATE_TRACE_POINTS
> > >  #include <trace/events/swiotlb.h>
> > > @@ -102,6 +103,12 @@ unsigned int max_segment;
> > >  #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
> > >  static phys_addr_t *io_tlb_orig_addr;
> > >
> > > +/*
> > > + * The TLB phys addr may be defined on the command line. Store it here if it is.
> > > + */
> > > +static phys_addr_t io_tlb_addr = INVALID_PHYS_ADDR;
> > > +
> > > +
> > >  /*
> > >   * Protect the above data structures in the map and unmap calls
> > >   */
> > > @@ -119,11 +126,23 @@ setup_io_tlb_npages(char *str)
> > >       }
> > >       if (*str == ',')
> > >               ++str;
> > > -     if (!strcmp(str, "force")) {
> > > +     if (!strncmp(str, "force", 5)) {
> > >               swiotlb_force = SWIOTLB_FORCE;
> > > -     } else if (!strcmp(str, "noforce")) {
> > > +             str += 5;
> > > +     } else if (!strncmp(str, "noforce", 7)) {
> > >               swiotlb_force = SWIOTLB_NO_FORCE;
> > >               io_tlb_nslabs = 1;
> > > +             str += 7;
> > > +     }
> > > +
> > > +     if (*str == ',')
> > > +             ++str;
> > > +     if (!strncmp(str, "addr=", 5)) {
> > > +             char *addrstr = str + 5;
> > > +
> > > +             io_tlb_addr = kstrtoul(addrstr, 0, &str);
> > > +             if (addrstr == str)
> > > +                     io_tlb_addr = INVALID_PHYS_ADDR;
> > >       }
> > >
> > >       return 0;
> > > @@ -239,6 +258,25 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> > >       return 0;
> > >  }
> > >
> > > +static int __init swiotlb_init_io(int verbose, unsigned long bytes)
> > > +{
> > > +     unsigned __iomem char *vstart;
> > > +
> > > +     if (io_tlb_addr == INVALID_PHYS_ADDR)
> > > +             return -EINVAL;
> > > +
> > > +     vstart = memremap(io_tlb_addr, bytes, MEMREMAP_WB);
> > > +     if (!vstart)
> > > +             return -EINVAL;
> > > +
> > > +     if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose)) {
> > > +             memunmap(vstart);
> > > +             return -EINVAL;
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +
> > >  /*
> > >   * Statically reserve bounce buffer space and initialize bounce buffer data
> > >   * structures for the software IO TLB used to implement the DMA API.
> > > @@ -257,6 +295,10 @@ swiotlb_init(int verbose)
> > >
> > >       bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > >
> > > +     /* Map IO TLB from device memory */
> > > +     if (!swiotlb_init_io(verbose, bytes))
> > > +             return;
> > > +
> > >       /* Get IO TLB memory from the low pages */
> > >       vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
> > >       if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
> > > --
> > > 2.16.4
> > >
> > >
> > >
> > >
> > > Amazon Development Center Germany GmbH
> > > Krausenstr. 38
> > > 10117 Berlin
> > > Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> > > Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> > > Sitz: Berlin
> > > Ust-ID: DE 289 237 879
> > >
> > >
> > >
> >
> > Thanks
> > Dave
> >
> 
> 
> -- 
> Best Regards,
> Kairui Song
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-30 13:40     ` Konrad Rzeszutek Wilk
@ 2020-03-30 20:42       ` Alexander Graf
  2020-03-30 23:37         ` Anthony Yznaga
                           ` (2 more replies)
  2020-03-31  1:46       ` Dave Young
  1 sibling, 3 replies; 15+ messages in thread
From: Alexander Graf @ 2020-03-30 20:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Kairui Song, anthony.yznaga, Jan Setje-Eilers
  Cc: Dave Young, iommu, the arch/x86 maintainers, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linux-doc,
	Linux Kernel Mailing List, Mark Rutland, dwmw, benh, Jan Kiszka,
	alcioa, aggh, aagch, dhr, Laszlo Ersek, Baoquan He, Lianbo Jiang,
	brijesh.singh, Lendacky, Thomas, kexec, Schoenherr, Jan H.



On 30.03.20 15:40, Konrad Rzeszutek Wilk wrote:
> 
> 
> 
> On Mon, Mar 30, 2020 at 02:06:01PM +0800, Kairui Song wrote:
>> On Sat, Mar 28, 2020 at 7:57 PM Dave Young <dyoung@redhat.com> wrote:
>>>
>>> On 03/26/20 at 05:29pm, Alexander Graf wrote:
>>>> The swiotlb is a very convenient fallback mechanism for bounce buffering of
>>>> DMAable data. It is usually used for the compatibility case where devices
>>>> can only DMA to a "low region".
>>>>
>>>> However, in some scenarios this "low region" may be bound even more
>>>> heavily. For example, there are embedded system where only an SRAM region
>>>> is shared between device and CPU. There are also heterogeneous computing
>>>> scenarios where only a subset of RAM is cache coherent between the
>>>> components of the system. There are partitioning hypervisors, where
>>>> a "control VM" that implements device emulation has limited view into a
>>>> partition's memory for DMA capabilities due to safety concerns.
>>>>
>>>> This patch adds a command line driven mechanism to move all DMA memory into
>>>> a predefined shared memory region which may or may not be part of the
>>>> physical address layout of the Operating System.
>>>>
>>>> Ideally, the typical path to set this configuration would be through Device
>>>> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
>>>> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
>>>> instead configure the system purely through kernel command line options.
>>>>
>>>> I'm sure other people will find the functionality useful going forward
>>>> though and extend it to be triggered by DT/ACPI in the future.
>>>
>>> Hmm, we have a use case for kdump, this maybe useful.  For example
>>> swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
>>> kernel we have to increase the crashkernel reserved size for the extra
>>> swiotlb requirement.  I wonder if we can just reuse the old kernel's
>>> swiotlb region and pass the addr to kdump kernel.
>>>
>>
>> Yes, definitely helpful for kdump kernel. This can help reduce the
>> crashkernel value.
>>
>> Previously I was thinking about something similar, play around the
>> e820 entry passed to kdump and let it place swiotlb in wanted region.
>> Simply remap it like in this patch looks much cleaner.
>>
>> If this patch is acceptable, one more patch is needed to expose the
>> swiotlb in iomem, so kexec-tools can pass the right kernel cmdline to
>> second kernel.
> 
> We seem to be passsing a lot of data to kexec.. Perhaps something
> of a unified way since we seem to have a lot of things to pass - disabling
> IOMMU, ACPI RSDT address, and then this.
> 
> CC-ing Anthony who is working on something - would you by any chance
> have a doc on this?


I see in general 2 use cases here:


1) Allow for a generic mechanism to have the fully system, individual 
buses, devices or functions of a device go through a particular, 
self-contained bounce buffer.

This sounds like the holy grail to a lot of problems. It would solve 
typical embedded scenarios where you only have a shared SRAM. It solves 
the safety case (to some extent) where you need to ensure that one 
device interaction doesn't conflict with another device interaction. It 
also solves the problem I've tried to solve with the patch here.

It's unfortunately a lot harder than the patch I sent, so it will take 
me some time to come up with a working patch set.. I suppose starting 
with a DT binding only is sensible. Worst case, x86 does also support DT ...

(And yes, I'm always happy to review patches if someone else beats me to it)


2) Reuse the SWIOTLB from the previous boot on kexec/kdump

I see little direct relation to SEV here. The only reason SEV makes it 
more relevant, is that you need to have an SWIOTLB region available with 
SEV while without you could live with a disabled IOMMU.

However, I can definitely understand how you would want to have a way to 
tell the new kexec'ed kernel where the old SWIOTLB was, so it can reuse 
its memory for its own SWIOTLB. That way, you don't have to reserve 
another 64MB of RAM for kdump.

What I'm curious on is whether we need to be as elaborate. Can't we just 
pass the old SWIOTLB as free memory to the new kexec'ed kernel and 
everything else will fall into place? All that would take is a bit of 
shuffling on the e820 table pass-through to the kexec'ed kernel, no?


Thanks,

Alex




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-30 20:42       ` Alexander Graf
@ 2020-03-30 23:37         ` Anthony Yznaga
  2020-03-31  1:59         ` Dave Young
  2020-03-31  2:16         ` Baoquan He
  2 siblings, 0 replies; 15+ messages in thread
From: Anthony Yznaga @ 2020-03-30 23:37 UTC (permalink / raw)
  To: Alexander Graf, Konrad Rzeszutek Wilk, Kairui Song, Jan Setje-Eilers
  Cc: Dave Young, iommu, the arch/x86 maintainers, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linux-doc,
	Linux Kernel Mailing List, Mark Rutland, dwmw, benh, Jan Kiszka,
	alcioa, aggh, aagch, dhr, Laszlo Ersek, Baoquan He, Lianbo Jiang,
	brijesh.singh, Lendacky, Thomas, kexec, Schoenherr, Jan H.



On 3/30/20 1:42 PM, Alexander Graf wrote:
>
>
> On 30.03.20 15:40, Konrad Rzeszutek Wilk wrote:
>>
>>
>>
>> On Mon, Mar 30, 2020 at 02:06:01PM +0800, Kairui Song wrote:
>>> On Sat, Mar 28, 2020 at 7:57 PM Dave Young <dyoung@redhat.com> wrote:
>>>>
>>>> On 03/26/20 at 05:29pm, Alexander Graf wrote:
>>>>> The swiotlb is a very convenient fallback mechanism for bounce buffering of
>>>>> DMAable data. It is usually used for the compatibility case where devices
>>>>> can only DMA to a "low region".
>>>>>
>>>>> However, in some scenarios this "low region" may be bound even more
>>>>> heavily. For example, there are embedded system where only an SRAM region
>>>>> is shared between device and CPU. There are also heterogeneous computing
>>>>> scenarios where only a subset of RAM is cache coherent between the
>>>>> components of the system. There are partitioning hypervisors, where
>>>>> a "control VM" that implements device emulation has limited view into a
>>>>> partition's memory for DMA capabilities due to safety concerns.
>>>>>
>>>>> This patch adds a command line driven mechanism to move all DMA memory into
>>>>> a predefined shared memory region which may or may not be part of the
>>>>> physical address layout of the Operating System.
>>>>>
>>>>> Ideally, the typical path to set this configuration would be through Device
>>>>> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
>>>>> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
>>>>> instead configure the system purely through kernel command line options.
>>>>>
>>>>> I'm sure other people will find the functionality useful going forward
>>>>> though and extend it to be triggered by DT/ACPI in the future.
>>>>
>>>> Hmm, we have a use case for kdump, this maybe useful.  For example
>>>> swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
>>>> kernel we have to increase the crashkernel reserved size for the extra
>>>> swiotlb requirement.  I wonder if we can just reuse the old kernel's
>>>> swiotlb region and pass the addr to kdump kernel.
>>>>
>>>
>>> Yes, definitely helpful for kdump kernel. This can help reduce the
>>> crashkernel value.
>>>
>>> Previously I was thinking about something similar, play around the
>>> e820 entry passed to kdump and let it place swiotlb in wanted region.
>>> Simply remap it like in this patch looks much cleaner.
>>>
>>> If this patch is acceptable, one more patch is needed to expose the
>>> swiotlb in iomem, so kexec-tools can pass the right kernel cmdline to
>>> second kernel.
>>
>> We seem to be passsing a lot of data to kexec.. Perhaps something
>> of a unified way since we seem to have a lot of things to pass - disabling
>> IOMMU, ACPI RSDT address, and then this.
>>
>> CC-ing Anthony who is working on something - would you by any chance
>> have a doc on this?
>
>
> I see in general 2 use cases here:
>
>
> 1) Allow for a generic mechanism to have the fully system, individual buses, devices or functions of a device go through a particular, self-contained bounce buffer.
>
> This sounds like the holy grail to a lot of problems. It would solve typical embedded scenarios where you only have a shared SRAM. It solves the safety case (to some extent) where you need to ensure that one device interaction doesn't conflict with another device interaction. It also solves the problem I've tried to solve with the patch here.
>
> It's unfortunately a lot harder than the patch I sent, so it will take me some time to come up with a working patch set.. I suppose starting with a DT binding only is sensible. Worst case, x86 does also support DT ...
>
> (And yes, I'm always happy to review patches if someone else beats me to it)

Not precisely what is described here, but I am working on a somewhat generic mechanism for preserving pages across kexec based on this old RFC patch set: https://lkml.org/lkml/2013/7/1/211.  I expect to post patches soon.

Anthony

>
>
> 2) Reuse the SWIOTLB from the previous boot on kexec/kdump
>
> I see little direct relation to SEV here. The only reason SEV makes it more relevant, is that you need to have an SWIOTLB region available with SEV while without you could live with a disabled IOMMU.
>
> However, I can definitely understand how you would want to have a way to tell the new kexec'ed kernel where the old SWIOTLB was, so it can reuse its memory for its own SWIOTLB. That way, you don't have to reserve another 64MB of RAM for kdump.
>
> What I'm curious on is whether we need to be as elaborate. Can't we just pass the old SWIOTLB as free memory to the new kexec'ed kernel and everything else will fall into place? All that would take is a bit of shuffling on the e820 table pass-through to the kexec'ed kernel, no?
>
>
> Thanks,
>
> Alex
>
>
>
>
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
>
>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-30 13:40     ` Konrad Rzeszutek Wilk
  2020-03-30 20:42       ` Alexander Graf
@ 2020-03-31  1:46       ` Dave Young
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Young @ 2020-03-31  1:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kairui Song, anthony.yznaga, Jan Setje-Eilers, Alexander Graf,
	iommu, the arch/x86 maintainers, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linux-doc,
	Linux Kernel Mailing List, Mark Rutland, dwmw, benh, Jan Kiszka,
	alcioa, aggh, aagch, dhr, Laszlo Ersek, Baoquan He, Lianbo Jiang,
	brijesh.singh, Lendacky, Thomas, kexec

On 03/30/20 at 09:40am, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 30, 2020 at 02:06:01PM +0800, Kairui Song wrote:
> > On Sat, Mar 28, 2020 at 7:57 PM Dave Young <dyoung@redhat.com> wrote:
> > >
> > > On 03/26/20 at 05:29pm, Alexander Graf wrote:
> > > > The swiotlb is a very convenient fallback mechanism for bounce buffering of
> > > > DMAable data. It is usually used for the compatibility case where devices
> > > > can only DMA to a "low region".
> > > >
> > > > However, in some scenarios this "low region" may be bound even more
> > > > heavily. For example, there are embedded system where only an SRAM region
> > > > is shared between device and CPU. There are also heterogeneous computing
> > > > scenarios where only a subset of RAM is cache coherent between the
> > > > components of the system. There are partitioning hypervisors, where
> > > > a "control VM" that implements device emulation has limited view into a
> > > > partition's memory for DMA capabilities due to safety concerns.
> > > >
> > > > This patch adds a command line driven mechanism to move all DMA memory into
> > > > a predefined shared memory region which may or may not be part of the
> > > > physical address layout of the Operating System.
> > > >
> > > > Ideally, the typical path to set this configuration would be through Device
> > > > Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> > > > in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> > > > instead configure the system purely through kernel command line options.
> > > >
> > > > I'm sure other people will find the functionality useful going forward
> > > > though and extend it to be triggered by DT/ACPI in the future.
> > >
> > > Hmm, we have a use case for kdump, this maybe useful.  For example
> > > swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
> > > kernel we have to increase the crashkernel reserved size for the extra
> > > swiotlb requirement.  I wonder if we can just reuse the old kernel's
> > > swiotlb region and pass the addr to kdump kernel.
> > >
> > 
> > Yes, definitely helpful for kdump kernel. This can help reduce the
> > crashkernel value.
> > 
> > Previously I was thinking about something similar, play around the
> > e820 entry passed to kdump and let it place swiotlb in wanted region.
> > Simply remap it like in this patch looks much cleaner.
> > 
> > If this patch is acceptable, one more patch is needed to expose the
> > swiotlb in iomem, so kexec-tools can pass the right kernel cmdline to
> > second kernel.
> 
> We seem to be passsing a lot of data to kexec.. Perhaps something
> of a unified way since we seem to have a lot of things to pass - disabling
> IOMMU, ACPI RSDT address, and then this.

acpi_rsdp kernel cmdline is not useful anymore.  The initial purpose is
for kexec-rebooting in efi system.  But now we have supported efi boot
across kexec reboot thus that is useless now.

Thanks
Dave


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-30 20:42       ` Alexander Graf
  2020-03-30 23:37         ` Anthony Yznaga
@ 2020-03-31  1:59         ` Dave Young
  2020-03-31  2:16         ` Baoquan He
  2 siblings, 0 replies; 15+ messages in thread
From: Dave Young @ 2020-03-31  1:59 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Konrad Rzeszutek Wilk, Kairui Song, anthony.yznaga,
	Jan Setje-Eilers, iommu, the arch/x86 maintainers,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linux-doc,
	Linux Kernel Mailing List, Mark Rutland, dwmw, benh, Jan Kiszka,
	alcioa, aggh, aagch, dhr, Laszlo Ersek, Baoquan He, Lianbo Jiang,
	brijesh.singh, Lendacky, Thomas, kexec, Schoenherr, Jan H.

Hi,

[snip]
> 2) Reuse the SWIOTLB from the previous boot on kexec/kdump

We should only care about kdump

> 
> I see little direct relation to SEV here. The only reason SEV makes it more
> relevant, is that you need to have an SWIOTLB region available with SEV
> while without you could live with a disabled IOMMU.


Here is some comment in arch/x86/kernel/pci-swiotlb.c, it is enforced
for some reason.
        /*
         * If SME is active then swiotlb will be set to 1 so that bounce
         * buffers are allocated and used for devices that do not support
         * the addressing range required for the encryption mask.
         */
        if (sme_active())
                swiotlb = 1;

> 
> However, I can definitely understand how you would want to have a way to
> tell the new kexec'ed kernel where the old SWIOTLB was, so it can reuse its
> memory for its own SWIOTLB. That way, you don't have to reserve another 64MB
> of RAM for kdump.
> 
> What I'm curious on is whether we need to be as elaborate. Can't we just
> pass the old SWIOTLB as free memory to the new kexec'ed kernel and
> everything else will fall into place? All that would take is a bit of
> shuffling on the e820 table pass-through to the kexec'ed kernel, no?

Maybe either of the two is fine.  But we may need ensure these swiotlb
area to be reused explictly in some way.  Say about the crashkernel=X,high case,
major part is in above 4G region, and a small piece in low memory. Then
when kernel booting, kernel/driver initialization could use out of the
low memory, and the remain part for swiotlb could be not big enough and
finally swiotlb allocation fails. 

Thanks
Dave


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
  2020-03-30 20:42       ` Alexander Graf
  2020-03-30 23:37         ` Anthony Yznaga
  2020-03-31  1:59         ` Dave Young
@ 2020-03-31  2:16         ` Baoquan He
  2 siblings, 0 replies; 15+ messages in thread
From: Baoquan He @ 2020-03-31  2:16 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Konrad Rzeszutek Wilk, Kairui Song, anthony.yznaga,
	Jan Setje-Eilers, Dave Young, iommu, the arch/x86 maintainers,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linux-doc,
	Linux Kernel Mailing List, Mark Rutland, dwmw, benh, Jan Kiszka,
	alcioa, aggh, aagch, dhr, Laszlo Ersek, Lianbo Jiang,
	brijesh.singh, Lendacky, Thomas, kexec, Schoenherr, Jan H.

On 03/30/20 at 10:42pm, Alexander Graf wrote:
> 
> 
> On 30.03.20 15:40, Konrad Rzeszutek Wilk wrote:
> > 
> > 
> > 
> > On Mon, Mar 30, 2020 at 02:06:01PM +0800, Kairui Song wrote:
> > > On Sat, Mar 28, 2020 at 7:57 PM Dave Young <dyoung@redhat.com> wrote:
> > > > 
> > > > On 03/26/20 at 05:29pm, Alexander Graf wrote:
> > > > > The swiotlb is a very convenient fallback mechanism for bounce buffering of
> > > > > DMAable data. It is usually used for the compatibility case where devices
> > > > > can only DMA to a "low region".
> > > > > 
> > > > > However, in some scenarios this "low region" may be bound even more
> > > > > heavily. For example, there are embedded system where only an SRAM region
> > > > > is shared between device and CPU. There are also heterogeneous computing
> > > > > scenarios where only a subset of RAM is cache coherent between the
> > > > > components of the system. There are partitioning hypervisors, where
> > > > > a "control VM" that implements device emulation has limited view into a
> > > > > partition's memory for DMA capabilities due to safety concerns.
> > > > > 
> > > > > This patch adds a command line driven mechanism to move all DMA memory into
> > > > > a predefined shared memory region which may or may not be part of the
> > > > > physical address layout of the Operating System.
> > > > > 
> > > > > Ideally, the typical path to set this configuration would be through Device
> > > > > Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> > > > > in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> > > > > instead configure the system purely through kernel command line options.
> > > > > 
> > > > > I'm sure other people will find the functionality useful going forward
> > > > > though and extend it to be triggered by DT/ACPI in the future.
> > > > 
> > > > Hmm, we have a use case for kdump, this maybe useful.  For example
> > > > swiotlb is enabled by default if AMD SME/SEV is active, and in kdump
> > > > kernel we have to increase the crashkernel reserved size for the extra
> > > > swiotlb requirement.  I wonder if we can just reuse the old kernel's
> > > > swiotlb region and pass the addr to kdump kernel.
> > > > 
> > > 
> > > Yes, definitely helpful for kdump kernel. This can help reduce the
> > > crashkernel value.
> > > 
> > > Previously I was thinking about something similar, play around the
> > > e820 entry passed to kdump and let it place swiotlb in wanted region.
> > > Simply remap it like in this patch looks much cleaner.
> > > 
> > > If this patch is acceptable, one more patch is needed to expose the
> > > swiotlb in iomem, so kexec-tools can pass the right kernel cmdline to
> > > second kernel.
> > 
> > We seem to be passsing a lot of data to kexec.. Perhaps something
> > of a unified way since we seem to have a lot of things to pass - disabling
> > IOMMU, ACPI RSDT address, and then this.
> > 
> > CC-ing Anthony who is working on something - would you by any chance
> > have a doc on this?
> 
> 
> I see in general 2 use cases here:
> 
> 
> 1) Allow for a generic mechanism to have the fully system, individual buses,
> devices or functions of a device go through a particular, self-contained
> bounce buffer.
> 
> This sounds like the holy grail to a lot of problems. It would solve typical
> embedded scenarios where you only have a shared SRAM. It solves the safety
> case (to some extent) where you need to ensure that one device interaction
> doesn't conflict with another device interaction. It also solves the problem
> I've tried to solve with the patch here.
> 
> It's unfortunately a lot harder than the patch I sent, so it will take me
> some time to come up with a working patch set.. I suppose starting with a DT
> binding only is sensible. Worst case, x86 does also support DT ...
> 
> (And yes, I'm always happy to review patches if someone else beats me to it)
> 
> 
> 2) Reuse the SWIOTLB from the previous boot on kexec/kdump
> 
> I see little direct relation to SEV here. The only reason SEV makes it more
> relevant, is that you need to have an SWIOTLB region available with SEV
> while without you could live with a disabled IOMMU.
> 
> However, I can definitely understand how you would want to have a way to
> tell the new kexec'ed kernel where the old SWIOTLB was, so it can reuse its
> memory for its own SWIOTLB. That way, you don't have to reserve another 64MB
> of RAM for kdump.
> 
> What I'm curious on is whether we need to be as elaborate. Can't we just
> pass the old SWIOTLB as free memory to the new kexec'ed kernel and
> everything else will fall into place? All that would take is a bit of
> shuffling on the e820 table pass-through to the kexec'ed kernel, no?

Swiotlb memory have to be continuous. We can't guarantee that region
won't be touched by kernel allocation before swiotlb init. Then we may
not have chance to get a continuous region of memory block again for
swiotlb. This is our main concern when reusing swiotlb for kdump.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-03-31  2:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26 16:29 [PATCH] swiotlb: Allow swiotlb to live at pre-defined address Alexander Graf
2020-03-26 17:05 ` Christoph Hellwig
2020-03-26 17:11   ` Alexander Graf
2020-03-26 17:16     ` David Woodhouse
2020-03-30 13:24     ` Mark Rutland
2020-03-27  6:05 ` kbuild test robot
2020-03-27  9:58 ` Jan Kiszka
2020-03-28 11:57 ` Dave Young
2020-03-30  6:06   ` Kairui Song
2020-03-30 13:40     ` Konrad Rzeszutek Wilk
2020-03-30 20:42       ` Alexander Graf
2020-03-30 23:37         ` Anthony Yznaga
2020-03-31  1:59         ` Dave Young
2020-03-31  2:16         ` Baoquan He
2020-03-31  1:46       ` Dave Young

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).