[PATCH v3] dma: Fix max PFN arithmetic overflow on 32 bit systems

* [PATCH v3] dma: Fix max PFN arithmetic overflow on 32 bit systems
@ 2020-05-26 17:57 Alexander Dahl
  2020-05-27  9:06 ` Greg KH
  2020-05-28 18:24 ` [tip: x86/urgent] x86/dma: " tip-bot2 for Alexander Dahl
  0 siblings, 2 replies; 3+ messages in thread
From: Alexander Dahl @ 2020-05-26 17:57 UTC
  To: x86
  Cc: iommu, linux-kernel, Alan Jenkins, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Robin Murphy, Florian Wolters,
	Alexander Dahl, stable

The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.  The
patch does not change the later overall result of 0x100000 for
MAX_DMA32_PFN.  The new calculation yields the same result, but does not
require 64 bit arithmetic.

On 32 bit systems the old calculation suffers from an arithmetic
overflow in that intermediate term in parentheses: 4UL aka unsigned long
int is 4 bytes wide there, so 0x100000000 does not fit and the result in
parentheses is truncated to zero.  The following right shift does not
alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems.
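
The overflow can be reproduced in userspace.  The following is only a
minimal sketch, not kernel code: it assumes PAGE_SHIFT is 12 (4 KiB
pages) and uses uint32_t and uint64_t to stand in for 'unsigned long' on
32 and 64 bit systems.  All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as usual on x86 */

/* Old formula with a 32 bit intermediate, modelling x86_32. */
static uint32_t old_pfn_32bit(void)
{
	uint32_t bytes = 4;

	bytes *= 1024;	/* 0x00001000 */
	bytes *= 1024;	/* 0x00400000 */
	bytes *= 1024;	/* 0x100000000 does not fit, wraps to 0 */
	return bytes >> PAGE_SHIFT;
}

/* Old formula with a 64 bit intermediate, modelling x86_64. */
static uint64_t old_pfn_64bit(void)
{
	uint64_t bytes = 4;

	bytes *= 1024;
	bytes *= 1024;
	bytes *= 1024;	/* 0x100000000 fits in 64 bits */
	return bytes >> PAGE_SHIFT;
}

/* New formula: the largest intermediate value is 1 << 20. */
static uint32_t new_pfn_32bit(void)
{
	return (uint32_t)1 << (32 - PAGE_SHIFT);
}

/* Hypothetical self-check, not part of the patch. */
static void check_max_dma32_pfn(void)
{
	assert(old_pfn_32bit() == 0);
	assert(old_pfn_64bit() == 0x100000);
	assert(new_pfn_32bit() == 0x100000);
}
```

Calling check_max_dma32_pfn() from a main() of your own passes silently:
the new formula gives the 64 bit result even with 32 bit arithmetic.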

That wrong value is a problem in a comparison against MAX_DMA32_PFN in
the init code for swiotlb in 'pci_swiotlb_detect_4gb()' to decide if
swiotlb should be active.  That comparison yields the opposite result
when compiling for 32 bit systems.
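
How the wrong value flips that decision can be sketched as follows.
This is a simplified model and not the kernel function itself, which may
take further conditions into account.  The helper names are hypothetical,
PAGE_SHIFT is assumed to be 12, and the 256 MiB machine is an example.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical model of the check: swiotlb is wanted only if memory
 * reaches beyond the DMA32 limit. */
static bool wants_swiotlb(unsigned long max_pfn, unsigned long max_dma32_pfn)
{
	return max_pfn > max_dma32_pfn;
}

/* Highest page frame number of a machine with 256 MiB of memory. */
static unsigned long pfn_of_256mib(void)
{
	return (256UL * 1024 * 1024) >> PAGE_SHIFT; /* 0x10000 */
}

/* Hypothetical self-check, not part of the patch. */
static void check_swiotlb_decision(void)
{
	/* Overflowed limit of 0: any machine appears to exceed it. */
	assert(wants_swiotlb(pfn_of_256mib(), 0));
	/* Correct limit of 0x100000: 256 MiB stays well below it. */
	assert(!wants_swiotlb(pfn_of_256mib(), 0x100000UL));
}
```

With the limit at 0 every machine looks like it has memory above 4 GiB,
so bounce buffers get set up even on very small systems.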

This could not happen before 1b7e03ef7570 ("x86, NUMA: Enable emulation
on 32bit too"), which landed in v3.0 and first made MAX_DMA32_PFN
visible to x86_32.

In practice this wasn't a problem, unless you activated CONFIG_SWIOTLB
on x86 (32 bit).

However, for ARCH=x86 (32 bit) with CONFIG_INTEL_IOMMU set, there is a
dependency on CONFIG_SWIOTLB since c5a5dc4cbbf4 ("iommu/vt-d: Don't
switch off swiotlb if bounce page is used"), and CONFIG_SWIOTLB was not
necessarily active before.  That landed in v5.4, where we noticed it in
the fli4l Linux distribution.  We have CONFIG_INTEL_IOMMU active on both
32 and 64 bit kernel configs there (I could not find out why, so let's
just say historical reasons).

The effect is that at boot time 64 MiB (the default size) are now
allocated for bounce buffers, which is a noticeable amount of memory on
small systems like the pcengines ALIX 2D3 with 256 MiB memory, which are
still frequently used as home routers.

We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4
(LTS) in fli4l and got kernel messages like these, for example:

  Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
  …
  Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem)
  …
  PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)

The initial analysis and the suggested fix were done by user
'sourcejedi' at Unix & Linux Stack Exchange and explicitly marked as
GPLv2 for inclusion in the Linux kernel:

  https://unix.stackexchange.com/a/520525/50007

The new calculation, which does not suffer from that overflow, is now
the same as for arch/mips, as suggested by Robin Murphy.

The fix was tested by fli4l users on roughly two dozen different
systems, including both 32 and 64 bit archs, bare metal and virtualized
machines.

Fixes: 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too")
Fixes: https://web.nettworks.org/bugs/browse/FFL-2560
Fixes: https://unix.stackexchange.com/q/520065/50007
Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alexander Dahl <post@lespocky.de>
Cc: stable@vger.kernel.org
---

Notes:
    v3:
      - rewritten commit message to better explain that arithmetic overflow
        and added Fixes tag (Greg Kroah-Hartman)
      - rebased on v5.7-rc7
    
    v2:
      - use the same calculation as with arch/mips (Robin Murphy)

 arch/x86/include/asm/dma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/dma.h b/arch/x86/include/asm/dma.h
index 00f7cf45e699..8e95aa4b0d17 100644
--- a/arch/x86/include/asm/dma.h
+++ b/arch/x86/include/asm/dma.h
@@ -74,7 +74,7 @@
 #define MAX_DMA_PFN   ((16UL * 1024 * 1024) >> PAGE_SHIFT)
 
 /* 4GB broken PCI/AGP hardware bus master zone */
-#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
+#define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))
 
 #ifdef CONFIG_X86_32
 /* The maximum address that we can perform a DMA transfer to on this platform */
-- 
2.20.1

