* [PATCH 0/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan @ 2022-06-27 15:31 UTC
  To: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, hch, m.szyprowski, robin.murphy, paulmck, akpm,
	keescook, songmuchun, rdunlap, damien.lemoal, michael.h.kelley,
	kys
  Cc: Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm, vkuznets,
	wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

Patch 1 introduces the swiotlb area concept and splits up the single
swiotlb lock.
Patch 2 sets the number of swiotlb areas according to the number of
lapic entries in the MADT.


Tianyu Lan (2):
  swiotlb: Split up single swiotlb lock
  x86/ACPI: Set swiotlb area according to the number of lapic entry in
    MADT

 .../admin-guide/kernel-parameters.txt         |   4 +-
 arch/x86/kernel/acpi/boot.c                   |   3 +
 include/linux/swiotlb.h                       |  27 +++
 kernel/dma/swiotlb.c                          | 202 ++++++++++++++----
 4 files changed, 197 insertions(+), 39 deletions(-)

-- 
2.25.1



* [PATCH 1/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan @ 2022-06-27 15:31 UTC
  To: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, hch, m.szyprowski, robin.murphy, paulmck, akpm,
	keescook, songmuchun, rdunlap, damien.lemoal, michael.h.kelley,
	kys
  Cc: Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm, vkuznets,
	wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The number of areas can be set via swiotlb_adjust_nareas() or the
swiotlb kernel parameter.
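For example, to keep the default 64MB pool (32768 slabs) but split it
into 8 areas, one could boot with something like (illustrative values):

	swiotlb=32768,8

where the first integer is the number of I/O TLB slabs and the second
is the number of areas, which must be a power of 2.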

This idea comes from a patch by Andi Kleen
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 .../admin-guide/kernel-parameters.txt         |   4 +-
 include/linux/swiotlb.h                       |  27 +++
 kernel/dma/swiotlb.c                          | 202 ++++++++++++++----
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
 			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be a power of 2.
 			force -- force using of bounce buffers even if they
 			         wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:	The number of used IO TLB blocks.
+ * @index:	The slot index to start searching in this area for the next round.
+ * @lock:	The lock to protect the above data structures in the map and
+ *		unmap calls.
+ */
+struct io_tlb_area {
+	unsigned long used;
+	unsigned int index;
+	spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The number of areas in the pool.
+ * @area_nslabs: The number of slots in each area.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
 	bool late_alloc;
 	bool force_bounce;
 	bool for_alloc;
+	unsigned int nareas;
+	unsigned int area_nslabs;
+	struct io_tlb_area *areas;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..17154abdfb34 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +80,10 @@ setup_io_tlb_npages(char *str)
 		default_nslabs =
 			ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
 	}
+	if (*str == ',')
+		++str;
+	if (isdigit(*str))
+		swiotlb_adjust_nareas(simple_strtoul(str, &str, 0));
 	if (*str == ',')
 		++str;
 	if (!strcmp(str, "force"))
@@ -103,6 +108,28 @@ unsigned long swiotlb_size_or_default(void)
 	return default_nslabs << IO_TLB_SHIFT;
 }
 
+void __init swiotlb_adjust_nareas(unsigned int nareas)
+{
+	if (!is_power_of_2(nareas)) {
+		pr_err("swiotlb: Invalid areas parameter %d.\n", nareas);
+		return;
+	}
+
+	default_nareas = nareas;
+
+	pr_info("area num %d.\n", nareas);
+	/*
+	 * Round up number of slabs to the next power of 2.
+	 * The last area is going to be smaller than the rest if
+	 * default_nslabs is not power of two.
+	 */
+	if (nareas > 1) {
+		default_nslabs = roundup_pow_of_two(default_nslabs);
+		pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+			(default_nslabs << IO_TLB_SHIFT) >> 20);
+	}
+}
+
 void __init swiotlb_adjust_size(unsigned long size)
 {
 	/*
@@ -112,8 +139,19 @@ void __init swiotlb_adjust_size(unsigned long size)
 	 */
 	if (default_nslabs != IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT)
 		return;
+
+	/*
+	 * Round up number of slabs to the next power of 2.
+	 * The last area is going to be smaller than the rest if
+	 * default_nslabs is not power of two.
+	 */
 	size = ALIGN(size, IO_TLB_SIZE);
 	default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
+	if (default_nareas > 1) {
+		default_nslabs = roundup_pow_of_two(default_nslabs);
+		size = default_nslabs << IO_TLB_SHIFT;
+	}
+
 	pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
 }
 
@@ -192,7 +230,8 @@ void __init swiotlb_update_mem_attributes(void)
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
-		unsigned long nslabs, unsigned int flags, bool late_alloc)
+				    unsigned long nslabs, unsigned int flags,
+				    bool late_alloc, unsigned int nareas)
 {
 	void *vaddr = phys_to_virt(start);
 	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
@@ -202,10 +241,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 	mem->end = mem->start + bytes;
 	mem->index = 0;
 	mem->late_alloc = late_alloc;
+	mem->nareas = nareas;
+	mem->area_nslabs = nslabs / mem->nareas;
 
 	mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
 	spin_lock_init(&mem->lock);
+	for (i = 0; i < mem->nareas; i++) {
+		spin_lock_init(&mem->areas[i].lock);
+		mem->areas[i].index = 0;
+	}
+
 	for (i = 0; i < mem->nslabs; i++) {
 		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
@@ -274,7 +320,13 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
+	mem->areas = memblock_alloc(sizeof(struct io_tlb_area) *
+			    default_nareas, SMP_CACHE_BYTES);
+	if (!mem->areas)
+		panic("%s: Failed to allocate mem->areas.\n", __func__);
+
+	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags,
+				false, default_nareas);
 
 	if (flags & SWIOTLB_VERBOSE)
 		swiotlb_print_info();
@@ -296,7 +348,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	struct io_tlb_mem *mem = &io_tlb_default_mem;
 	unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
 	unsigned char *vstart = NULL;
-	unsigned int order;
+	unsigned int order, area_order;
 	bool retried = false;
 	int rc = 0;
 
@@ -337,19 +389,31 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 			(PAGE_SIZE << order) >> 20);
 	}
 
+	area_order = get_order(array_size(sizeof(*mem->areas),
+		default_nareas));
+	mem->areas = (struct io_tlb_area *)
+		__get_free_pages(GFP_KERNEL | __GFP_ZERO, area_order);
+	if (!mem->areas)
+		goto error_area;
+
 	mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
 		get_order(array_size(sizeof(*mem->slots), nslabs)));
-	if (!mem->slots) {
-		free_pages((unsigned long)vstart, order);
-		return -ENOMEM;
-	}
+	if (!mem->slots)
+		goto error_slots;
 
 	set_memory_decrypted((unsigned long)vstart,
 			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
-	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
+	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true,
+				default_nareas);
 
 	swiotlb_print_info();
 	return 0;
+
+error_slots:
+	free_pages((unsigned long)mem->areas, area_order);
+error_area:
+	free_pages((unsigned long)vstart, order);
+	return -ENOMEM;
 }
 
 void __init swiotlb_exit(void)
@@ -357,6 +421,7 @@ void __init swiotlb_exit(void)
 	struct io_tlb_mem *mem = &io_tlb_default_mem;
 	unsigned long tbl_vaddr;
 	size_t tbl_size, slots_size;
+	unsigned int area_order;
 
 	if (swiotlb_force_bounce)
 		return;
@@ -371,9 +436,14 @@ void __init swiotlb_exit(void)
 
 	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
 	if (mem->late_alloc) {
+		area_order = get_order(array_size(sizeof(*mem->areas),
+			mem->nareas));
+		free_pages((unsigned long)mem->areas, area_order);
 		free_pages(tbl_vaddr, get_order(tbl_size));
 		free_pages((unsigned long)mem->slots, get_order(slots_size));
 	} else {
+		memblock_free_late(__pa(mem->areas),
+				   mem->nareas * sizeof(struct io_tlb_area));
 		memblock_free_late(mem->start, tbl_size);
 		memblock_free_late(__pa(mem->slots), slots_size);
 	}
@@ -476,9 +546,9 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask)
 	return nr_slots(boundary_mask + 1);
 }
 
-static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
+static unsigned int wrap_area_index(struct io_tlb_mem *mem, unsigned int index)
 {
-	if (index >= mem->nslabs)
+	if (index >= mem->area_nslabs)
 		return 0;
 	return index;
 }
@@ -487,10 +557,13 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
-			      size_t alloc_size, unsigned int alloc_align_mask)
+static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
+				 struct io_tlb_area *area,
+				 int area_index,
+				 struct device *dev, phys_addr_t orig_addr,
+				 size_t alloc_size,
+				 unsigned int alloc_align_mask)
 {
-	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
 	dma_addr_t tbl_dma_addr =
 		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -501,8 +574,11 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 	unsigned int index, wrap, count = 0, i;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned long flags;
+	unsigned int slot_base;
+	unsigned int slot_index;
 
 	BUG_ON(!nslots);
+	BUG_ON(area_index >= mem->nareas);
 
 	/*
 	 * For mappings with an alignment requirement don't bother looping to
@@ -514,16 +590,20 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
 	stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
 
-	spin_lock_irqsave(&mem->lock, flags);
-	if (unlikely(nslots > mem->nslabs - mem->used))
+	spin_lock_irqsave(&area->lock, flags);
+	if (unlikely(nslots > mem->area_nslabs - area->used))
 		goto not_found;
 
-	index = wrap = wrap_index(mem, ALIGN(mem->index, stride));
+	slot_base = area_index * mem->area_nslabs;
+	index = wrap = wrap_area_index(mem, ALIGN(area->index, stride));
+
 	do {
+		slot_index = slot_base + index;
+
 		if (orig_addr &&
-		    (slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
-			    (orig_addr & iotlb_align_mask)) {
-			index = wrap_index(mem, index + 1);
+		    (slot_addr(tbl_dma_addr, slot_index) &
+		     iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
+			index = wrap_area_index(mem, index + 1);
 			continue;
 		}
 
@@ -532,26 +612,26 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 		 * contiguous buffers, we allocate the buffers from that slot
 		 * and mark the entries as '0' indicating unavailable.
 		 */
-		if (!iommu_is_span_boundary(index, nslots,
+		if (!iommu_is_span_boundary(slot_index, nslots,
 					    nr_slots(tbl_dma_addr),
 					    max_slots)) {
-			if (mem->slots[index].list >= nslots)
+			if (mem->slots[slot_index].list >= nslots)
 				goto found;
 		}
-		index = wrap_index(mem, index + stride);
+		index = wrap_area_index(mem, index + stride);
 	} while (index != wrap);
 
 not_found:
-	spin_unlock_irqrestore(&mem->lock, flags);
+	spin_unlock_irqrestore(&area->lock, flags);
 	return -1;
 
 found:
-	for (i = index; i < index + nslots; i++) {
+	for (i = slot_index; i < slot_index + nslots; i++) {
 		mem->slots[i].list = 0;
 		mem->slots[i].alloc_size =
-			alloc_size - (offset + ((i - index) << IO_TLB_SHIFT));
+			alloc_size - (offset + ((i - slot_index) << IO_TLB_SHIFT));
 	}
-	for (i = index - 1;
+	for (i = slot_index - 1;
 	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 	     mem->slots[i].list; i--)
 		mem->slots[i].list = ++count;
@@ -559,14 +639,43 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 	/*
 	 * Update the indices to avoid searching in the next round.
 	 */
-	if (index + nslots < mem->nslabs)
-		mem->index = index + nslots;
+	if (index + nslots < mem->area_nslabs)
+		area->index = index + nslots;
 	else
-		mem->index = 0;
-	mem->used += nslots;
+		area->index = 0;
+	area->used += nslots;
+	spin_unlock_irqrestore(&area->lock, flags);
+	return slot_index;
+}
 
-	spin_unlock_irqrestore(&mem->lock, flags);
-	return index;
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+			      size_t alloc_size, unsigned int alloc_align_mask)
+{
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+	int start = raw_smp_processor_id() & ((1U << __fls(mem->nareas)) - 1);
+	int i = start, index;
+
+	do {
+		index = swiotlb_do_find_slots(mem, mem->areas + i, i,
+					      dev, orig_addr, alloc_size,
+					      alloc_align_mask);
+		if (index >= 0)
+			return index;
+		if (++i >= mem->nareas)
+			i = 0;
+	} while (i != start);
+
+	return -1;
+}
+
+static unsigned long mem_used(struct io_tlb_mem *mem)
+{
+	int i;
+	unsigned long used = 0;
+
+	for (i = 0; i < mem->nareas; i++)
+		used += mem->areas[i].used;
+	return used;
 }
 
 phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
@@ -598,7 +707,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
 	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-				 alloc_size, mem->nslabs, mem->used);
+				 alloc_size, mem->nslabs, mem_used(mem));
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
@@ -628,6 +737,8 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 	unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
 	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
 	int nslots = nr_slots(mem->slots[index].alloc_size + offset);
+	int aindex = index / mem->area_nslabs;
+	struct io_tlb_area *area = &mem->areas[aindex];
 	int count, i;
 
 	/*
@@ -636,7 +747,9 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 	 * While returning the entries to the free list, we merge the entries
 	 * with slots below and above the pool being returned.
 	 */
-	spin_lock_irqsave(&mem->lock, flags);
+	BUG_ON(aindex >= mem->nareas);
+
+	spin_lock_irqsave(&area->lock, flags);
 	if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
 		count = mem->slots[index + nslots].list;
 	else
@@ -660,8 +773,8 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && mem->slots[i].list;
 	     i--)
 		mem->slots[i].list = ++count;
-	mem->used -= nslots;
-	spin_unlock_irqrestore(&mem->lock, flags);
+	area->used -= nslots;
+	spin_unlock_irqrestore(&area->lock, flags);
 }
 
 /*
@@ -759,12 +872,20 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
+static int io_tlb_used_get(void *data, u64 *val)
+{
+	*val = mem_used(data);
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_io_tlb_used, io_tlb_used_get, NULL, "%llu\n");
+
 static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
 					 const char *dirname)
 {
 	mem->debugfs = debugfs_create_dir(dirname, io_tlb_default_mem.debugfs);
 	if (!mem->nslabs)
 		return;
 
 	debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
-	debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+	debugfs_create_file("io_tlb_used", 0400, mem->debugfs, mem,
+			    &fops_io_tlb_used);
 }
 
 static int __init __maybe_unused swiotlb_create_default_debugfs(void)
@@ -815,6 +930,9 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 	struct io_tlb_mem *mem = rmem->priv;
 	unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
 
+	/* Set the per-device io tlb area number to one */
+	unsigned int nareas = 1;
+
 	/*
 	 * Since multiple devices can share the same pool, the private data,
 	 * io_tlb_mem struct, will be initialized by the first device attached
@@ -831,10 +949,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 			return -ENOMEM;
 		}
 
+		mem->areas = kcalloc(nareas, sizeof(*mem->areas),
+				     GFP_KERNEL);
+		if (!mem->areas) {
+			kfree(mem->slots);
+			kfree(mem);
+			return -ENOMEM;
+		}
+
 		set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
 				     rmem->size >> PAGE_SHIFT);
 		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
-				false);
+					false, nareas);
 		mem->for_alloc = true;
 
 		rmem->priv = mem;
-- 
2.25.1



* [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Tianyu Lan @ 2022-06-27 15:31 UTC
  To: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, hch, m.szyprowski, robin.murphy, paulmck, akpm,
	keescook, songmuchun, rdunlap, damien.lemoal, michael.h.kelley,
	kys
  Cc: Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm, vkuznets,
	wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

When the swiotlb bounce buffer is initialized, smp_init() has not yet
been called, so the CPU count cannot be obtained from num_online_cpus().
Use the number of lapic entries in the MADT to set the number of swiotlb
areas, keeping it equal to the CPU count on x86.

Based-on-idea-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/kernel/acpi/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 907cc98b1938..7e13499f2c10 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -22,6 +22,7 @@
 #include <linux/efi-bgrt.h>
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>
+#include <linux/swiotlb.h>
 
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
@@ -1131,6 +1132,8 @@ static int __init acpi_parse_madt_lapic_entries(void)
 		return count;
 	}
 
+	swiotlb_adjust_nareas(max(count, x2count));
+
 	x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI,
 					acpi_parse_x2apic_nmi, 0);
 	count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI,
-- 
2.25.1



* Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Christoph Hellwig @ 2022-06-29 14:04 UTC
  To: Tianyu Lan
  Cc: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, hch, m.szyprowski, robin.murphy, paulmck, akpm,
	keescook, songmuchun, rdunlap, damien.lemoal, michael.h.kelley,
	kys, Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm,
	vkuznets, wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

On Mon, Jun 27, 2022 at 11:31:50AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> When the swiotlb bounce buffer is initialized, smp_init() has not yet
> been called, so the CPU count cannot be obtained from num_online_cpus().
> Use the number of lapic entries in the MADT to set the number of swiotlb
> areas, keeping it equal to the CPU count on x86.

Can we reorder that initialization?  Because I really hate having
to have an arch hook in every architecture.


* Re: [PATCH 1/2] swiotlb: Split up single swiotlb lock
From: Christoph Hellwig @ 2022-06-29 14:09 UTC
  To: Tianyu Lan
  Cc: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, hch, m.szyprowski, robin.murphy, paulmck, akpm,
	keescook, songmuchun, rdunlap, damien.lemoal, michael.h.kelley,
	kys, Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm,
	vkuznets, wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

On Mon, Jun 27, 2022 at 11:31:49AM -0400, Tianyu Lan wrote:
> +/**
> + * struct io_tlb_area - IO TLB memory area descriptor
> + *
> + * This is a single area with a single lock.
> + *
> + * @used:	The number of used IO TLB blocks.
> + * @index:	The slot index to start searching in this area for the next round.
> + * @lock:	The lock to protect the above data structures in the map and
> + *		unmap calls.
> + */
> +struct io_tlb_area {
> +	unsigned long used;
> +	unsigned int index;
> +	spinlock_t lock;
> +};

As already mentioned last time, please move this into swiotlb.c;
swiotlb.h only uses a pointer to this structure.
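That is, roughly (a sketch; only an opaque declaration stays visible to
users of the header):

	/* include/linux/swiotlb.h */
	struct io_tlb_area;

	/* kernel/dma/swiotlb.c */
	struct io_tlb_area {
		unsigned long used;
		unsigned int index;
		spinlock_t lock;
	};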

>  static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> -		unsigned long nslabs, unsigned int flags, bool late_alloc)
> +				    unsigned long nslabs, unsigned int flags,
> +				    bool late_alloc, unsigned int nareas)

Nit: the two-tab indentation for prototype continuations is a lot easier
to maintain, so don't gratuitously switch away from it.

> +			alloc_size - (offset + ((i - slot_index) << IO_TLB_SHIFT));

Overly long line here.



* Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Tianyu Lan @ 2022-06-30 17:02 UTC
  To: Christoph Hellwig
  Cc: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, m.szyprowski, robin.murphy, paulmck, akpm, keescook,
	songmuchun, rdunlap, damien.lemoal, michael.h.kelley, kys,
	Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm, vkuznets,
	wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

On 6/29/2022 10:04 PM, Christoph Hellwig wrote:
> On Mon, Jun 27, 2022 at 11:31:50AM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> When the swiotlb bounce buffer is initialized, smp_init() has not yet
>> been called, so the CPU count cannot be obtained from num_online_cpus().
>> Use the number of lapic entries in the MADT to set the number of swiotlb
>> areas, keeping it equal to the CPU count on x86.
> 
> Can we reorder that initialization?  Because I really hate having
> to have an arch hook in every architecture.

How about using the "flags" parameter of swiotlb_init() to pass the
area number, or adding a new parameter for it?

I just reposted patch 1 since there were only some coding style issues,
and the area number can also be set via the swiotlb kernel parameter. We
still need to figure out a good solution for passing the area number
from architecture code.
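For instance, something like this rough sketch (the flag encoding is
made up purely for illustration):

	/* hypothetical: pack the area count into high bits of the flags word */
	#define SWIOTLB_NAREAS_SHIFT	16
	#define SWIOTLB_NAREAS(n)	((n) << SWIOTLB_NAREAS_SHIFT)

	swiotlb_init(true, SWIOTLB_VERBOSE | SWIOTLB_NAREAS(nr_cpu_ids));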


* Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Christoph Hellwig @ 2022-07-06  8:00 UTC
  To: Tianyu Lan
  Cc: Christoph Hellwig, corbet, rafael, len.brown, pavel, tglx, mingo,
	bp, dave.hansen, x86, hpa, m.szyprowski, robin.murphy, paulmck,
	akpm, keescook, songmuchun, rdunlap, damien.lemoal,
	michael.h.kelley, kys, Tianyu Lan, iommu, linux-doc,
	linux-kernel, linux-pm, vkuznets, wei.liu, parri.andrea,
	thomas.lendacky, linux-hyperv, kirill.shutemov, andi.kleen,
	Andi Kleen

On Fri, Jul 01, 2022 at 01:02:21AM +0800, Tianyu Lan wrote:
> > Can we reorder that initialization?  Because I really hate having
> > to have an arch hook in every architecture.
> 
> How about using the "flags" parameter of swiotlb_init() to pass the
> area number, or adding a new parameter for it?
> 
> I just reposted patch 1 since there were only some coding style issues,
> and the area number can also be set via the swiotlb kernel parameter. We
> still need to figure out a good solution for passing the area number
> from architecture code.

What is the problem with calling swiotlb_init after nr_possible_cpus()
works?


* Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Tianyu Lan @ 2022-07-06  8:57 UTC
  To: Christoph Hellwig
  Cc: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, m.szyprowski, robin.murphy, paulmck, akpm, keescook,
	songmuchun, rdunlap, damien.lemoal, michael.h.kelley, kys,
	Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm, vkuznets,
	wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

On 7/6/2022 4:00 PM, Christoph Hellwig wrote:
> On Fri, Jul 01, 2022 at 01:02:21AM +0800, Tianyu Lan wrote:
>>> Can we reorder that initialization?  Because I really hate having
>>> to have an arch hook in every architecture.
>>
>> How about using the "flags" parameter of swiotlb_init() to pass the
>> area number, or adding a new parameter for it?
>>
>> I just reposted patch 1 since there were only some coding style issues,
>> and the area number can also be set via the swiotlb kernel parameter. We
>> still need to figure out a good solution for passing the area number
>> from architecture code.
> 
> What is the problem with calling swiotlb_init after nr_possible_cpus()
> works?

swiotlb_init() is called from mem_init() on the various architectures,
and memblock free pages are released to the buddy allocator right after
swiotlb_init() returns, via memblock_free_all().

mem_init() is called before smp_init(). If swiotlb_init() were called
after smp_init(), we could no longer allocate a large chunk of low
memory via memblock_alloc() in swiotlb. swiotlb_init() would have to be
reworked to allocate memory from the buddy allocator, just as
swiotlb_init_late() does, and that would limit the bounce buffer size.
Otherwise we would need to reorder the init sequence for all
architectures, which may expose other unknown issues.
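Roughly, the ordering in question looks like this (simplified sketch of
the x86 boot sequence; exact call paths vary by architecture):

	start_kernel()
	    setup_arch()             /* MADT parsed, cpu_possible_map set up */
	    mem_init()
	        swiotlb_init()       /* memblock allocations still possible */
	        memblock_free_all()  /* free pages go to the buddy allocator */
	    ...
	    smp_init()               /* num_online_cpus() valid only after this */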

The flags parameter of swiotlb_init() seems like a good place to pass
the area number in the current code. If no area number is set there or
via the kernel parameter, the number of areas defaults to one, keeping
the original behavior of a single global spinlock protecting the io tlb
data structures.


* Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Christoph Hellwig @ 2022-07-06  9:02 UTC
  To: Tianyu Lan
  Cc: Christoph Hellwig, corbet, rafael, len.brown, pavel, tglx, mingo,
	bp, dave.hansen, x86, hpa, m.szyprowski, robin.murphy, paulmck,
	akpm, keescook, songmuchun, rdunlap, damien.lemoal,
	michael.h.kelley, kys, Tianyu Lan, iommu, linux-doc,
	linux-kernel, linux-pm, vkuznets, wei.liu, parri.andrea,
	thomas.lendacky, linux-hyperv, kirill.shutemov, andi.kleen,
	Andi Kleen

On Wed, Jul 06, 2022 at 04:57:33PM +0800, Tianyu Lan wrote:
> swiotlb_init() is called from mem_init() on the various architectures,
> and memblock free pages are released to the buddy allocator right after
> swiotlb_init() returns, via memblock_free_all().

Yes.

> mem_init() is called before smp_init().

But why would that matter?  cpu_possible_map is set up from
setup_arch(), which is called before that.


* Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Tianyu Lan @ 2022-07-06 10:21 UTC
  To: Christoph Hellwig
  Cc: corbet, rafael, len.brown, pavel, tglx, mingo, bp, dave.hansen,
	x86, hpa, m.szyprowski, robin.murphy, paulmck, akpm, keescook,
	songmuchun, rdunlap, damien.lemoal, michael.h.kelley, kys,
	Tianyu Lan, iommu, linux-doc, linux-kernel, linux-pm, vkuznets,
	wei.liu, parri.andrea, thomas.lendacky, linux-hyperv,
	kirill.shutemov, andi.kleen, Andi Kleen

On 7/6/2022 5:02 PM, Christoph Hellwig wrote:
> On Wed, Jul 06, 2022 at 04:57:33PM +0800, Tianyu Lan wrote:
>> swiotlb_init() is called from mem_init() on the various architectures,
>> and memblock free pages are released to the buddy allocator right after
>> swiotlb_init() returns, via memblock_free_all().
> 
> Yes.
> 
>> mem_init() is called before smp_init().
> 
> But why would that matter?  cpu_possible_map is set up from
> setup_arch(), which is called before that.

Sorry, I was still focused on the online CPU count, which is only
available after smp_init(). The possible CPU count includes CPUs that
are still offline. I will give it a try. Thanks for the suggestion.
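For example, something along these lines (untested sketch; where
exactly to call it is still an open question):

	/* use the possible-CPU count, known once setup_arch() has run */
	swiotlb_adjust_nareas(num_possible_cpus());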
