linux-kernel.vger.kernel.org archive mirror
* Re: next/master bisection: baseline.login on r8a77960-ulcb
       [not found] <60346234.1c69fb81.cd55e.770d@mx.google.com>
@ 2021-02-23  9:56 ` Guillaume Tucker
  2021-02-24 21:39   ` Heiko Thiery
  0 siblings, 1 reply; 7+ messages in thread
From: Guillaume Tucker @ 2021-02-23  9:56 UTC
  To: Konrad Rzeszutek Wilk, Jianxiong Gao, Christoph Hellwig
  Cc: kernelci-results, linux-kernel, iommu, Marek Szyprowski, Robin Murphy

Hi Christoph,

Please see the bisection report below about a boot failure on
r8a77960-ulcb on next-20210222.  

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The log shows a kernel panic, more details can be found here:

  https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/

Please let us know if you need any help to debug the issue or try
a fix on this platform.

Best wishes,
Guillaume

On 23/02/2021 02:02, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on r8a77960-ulcb
> 
> Summary:
>   Start:      37dfbfbdca66 Add linux-next specific files for 20210222
>   Plain log:  https://storage.kernelci.org/next/master/next-20210222/arm64/defconfig/clang-10/lab-baylibre/baseline-r8a77960-ulcb.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210222/arm64/defconfig/clang-10/lab-baylibre/baseline-r8a77960-ulcb.html
>   Result:     567d877f9a7d swiotlb: refactor swiotlb_tbl_map_single
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     r8a77960-ulcb
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   clang-10
>   Config:     defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> -------------------------------------------------------------------------------
> commit 567d877f9a7d6bf4e4bf0ecd6de23fec8039b123
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Thu Feb 4 11:08:35 2021 +0100
> 
>     swiotlb: refactor swiotlb_tbl_map_single
>     
>     Split out a bunch of a self-contained helpers to make the function easier
>     to follow.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Acked-by: Jianxiong Gao <jxgao@google.com>
>     Tested-by: Jianxiong Gao <jxgao@google.com>
>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b38b1553c466..381c24ef1ac1 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,134 +468,133 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>  	}
>  }
>  
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> -		size_t mapping_size, size_t alloc_size,
> -		enum dma_data_direction dir, unsigned long attrs)
> -{
> -	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> -	unsigned long flags;
> -	phys_addr_t tlb_addr;
> -	unsigned int nslots, stride, index, wrap;
> -	int i;
> -	unsigned long mask;
> -	unsigned long offset_slots;
> -	unsigned long max_slots;
> -	unsigned long tmp_io_tlb_used;
> -
> -	if (no_iotlb_memory)
> -		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
> -
> -	if (mem_encrypt_active())
> -		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> +#define slot_addr(start, idx)	((start) + ((idx) << IO_TLB_SHIFT))
>  
> -	if (mapping_size > alloc_size) {
> -		dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
> -			      mapping_size, alloc_size);
> -		return (phys_addr_t)DMA_MAPPING_ERROR;
> -	}
> -
> -	mask = dma_get_seg_boundary(hwdev);
> +/*
> + * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
> + */
> +static inline unsigned long get_max_slots(unsigned long boundary_mask)
> +{
> +	if (boundary_mask == ~0UL)
> +		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> +	return nr_slots(boundary_mask + 1);
> +}
>  
> -	tbl_dma_addr &= mask;
> +static unsigned int wrap_index(unsigned int index)
> +{
> +	if (index >= io_tlb_nslabs)
> +		return 0;
> +	return index;
> +}
>  
> -	offset_slots = nr_slots(tbl_dma_addr);
> +/*
> + * Find a suitable number of IO TLB entries size that will fit this request and
> + * allocate a buffer from that IO TLB pool.
> + */
> +static int find_slots(struct device *dev, size_t alloc_size)
> +{
> +	unsigned long boundary_mask = dma_get_seg_boundary(dev);
> +	dma_addr_t tbl_dma_addr =
> +		phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> +	unsigned int max_slots = get_max_slots(boundary_mask);
> +	unsigned int nslots = nr_slots(alloc_size), stride = 1;
> +	unsigned int index, wrap, count = 0, i;
> +	unsigned long flags;
>  
> -	/*
> -	 * Carefully handle integer overflow which can occur when mask == ~0UL.
> -	 */
> -	max_slots = mask + 1
> -		    ? nr_slots(mask + 1)
> -		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> +	BUG_ON(!nslots);
>  
>  	/*
>  	 * For mappings greater than or equal to a page, we limit the stride
>  	 * (and hence alignment) to a page size.
>  	 */
> -	nslots = nr_slots(alloc_size);
>  	if (alloc_size >= PAGE_SIZE)
> -		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
> -	else
> -		stride = 1;
> +		stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
>  
> -	BUG_ON(!nslots);
> -
> -	/*
> -	 * Find suitable number of IO TLB entries size that will fit this
> -	 * request and allocate a buffer from that IO TLB pool.
> -	 */
>  	spin_lock_irqsave(&io_tlb_lock, flags);
> -
>  	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
>  		goto not_found;
>  
> -	index = ALIGN(io_tlb_index, stride);
> -	if (index >= io_tlb_nslabs)
> -		index = 0;
> -	wrap = index;
> -
> +	index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
>  	do {
> -		while (iommu_is_span_boundary(index, nslots, offset_slots,
> -					      max_slots)) {
> -			index += stride;
> -			if (index >= io_tlb_nslabs)
> -				index = 0;
> -			if (index == wrap)
> -				goto not_found;
> -		}
> -
>  		/*
>  		 * If we find a slot that indicates we have 'nslots' number of
>  		 * contiguous buffers, we allocate the buffers from that slot
>  		 * and mark the entries as '0' indicating unavailable.
>  		 */
> -		if (io_tlb_list[index] >= nslots) {
> -			int count = 0;
> -
> -			for (i = index; i < (int) (index + nslots); i++)
> -				io_tlb_list[i] = 0;
> -			for (i = index - 1;
> -			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
> -			     io_tlb_list[i]; i--)
> -				io_tlb_list[i] = ++count;
> -			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
> -
> -			/*
> -			 * Update the indices to avoid searching in the next
> -			 * round.
> -			 */
> -			io_tlb_index = ((index + nslots) < io_tlb_nslabs
> -					? (index + nslots) : 0);
> -
> -			goto found;
> +		if (!iommu_is_span_boundary(index, nslots,
> +					    nr_slots(tbl_dma_addr),
> +					    max_slots)) {
> +			if (io_tlb_list[index] >= nslots)
> +				goto found;
>  		}
> -		index += stride;
> -		if (index >= io_tlb_nslabs)
> -			index = 0;
> +		index = wrap_index(index + stride);
>  	} while (index != wrap);
>  
>  not_found:
> -	tmp_io_tlb_used = io_tlb_used;
> -
>  	spin_unlock_irqrestore(&io_tlb_lock, flags);
> -	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
> -		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
> -			 alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
> -	return (phys_addr_t)DMA_MAPPING_ERROR;
> +	return -1;
> +
>  found:
> +	for (i = index; i < index + nslots; i++)
> +		io_tlb_list[i] = 0;
> +	for (i = index - 1;
> +	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
> +	     io_tlb_list[i]; i--)
> +		io_tlb_list[i] = ++count;
> +
> +	/*
> +	 * Update the indices to avoid searching in the next round.
> +	 */
> +	if (index + nslots < io_tlb_nslabs)
> +		io_tlb_index = index + nslots;
> +	else
> +		io_tlb_index = 0;
>  	io_tlb_used += nslots;
> +
>  	spin_unlock_irqrestore(&io_tlb_lock, flags);
> +	return index;
> +}
> +
> +phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> +		size_t mapping_size, size_t alloc_size,
> +		enum dma_data_direction dir, unsigned long attrs)
> +{
> +	unsigned int index, i;
> +	phys_addr_t tlb_addr;
> +
> +	if (no_iotlb_memory)
> +		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
> +
> +	if (mem_encrypt_active())
> +		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> +
> +	if (mapping_size > alloc_size) {
> +		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
> +			      mapping_size, alloc_size);
> +		return (phys_addr_t)DMA_MAPPING_ERROR;
> +	}
> +
> +	index = find_slots(dev, alloc_size);
> +	if (index == -1) {
> +		if (!(attrs & DMA_ATTR_NO_WARN))
> +			dev_warn_ratelimited(dev,
> +	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
> +				 alloc_size, io_tlb_nslabs, io_tlb_used);
> +		return (phys_addr_t)DMA_MAPPING_ERROR;
> +	}
>  
>  	/*
>  	 * Save away the mapping from the original address to the DMA address.
>  	 * This is needed when we sync the memory.  Then we sync the buffer if
>  	 * needed.
>  	 */
> -	for (i = 0; i < nslots; i++)
> -		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> +	for (i = 0; i < nr_slots(alloc_size); i++)
> +		io_tlb_orig_addr[index + i] = slot_addr(orig_addr, i);
> +
> +	tlb_addr = slot_addr(io_tlb_start, index);
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>  	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>  		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
> -
>  	return tlb_addr;
>  }
> -------------------------------------------------------------------------------
> 
> 
> Git bisection log:
> 
> -------------------------------------------------------------------------------
> git bisect start
> # good: [d99676af540c2dc829999928fb81c58c80a1dce4] Merge tag 'drm-next-2021-02-19' of git://anongit.freedesktop.org/drm/drm
> git bisect good d99676af540c2dc829999928fb81c58c80a1dce4
> # bad: [37dfbfbdca66834bc0f64ec9b35e09ac6c8898da] Add linux-next specific files for 20210222
> git bisect bad 37dfbfbdca66834bc0f64ec9b35e09ac6c8898da
> # bad: [25c1843cc6b3d64ce774ce7f1dc649ca3109a4c5] Merge remote-tracking branch 'block/for-next'
> git bisect bad 25c1843cc6b3d64ce774ce7f1dc649ca3109a4c5
> # good: [705552a85bfda7f2b0a3922b318d74fcc8368fd6] Merge remote-tracking branch 'btrfs/for-next'
> git bisect good 705552a85bfda7f2b0a3922b318d74fcc8368fd6
> # good: [eed3cd1a28b4a41ca25b8f5fbd86449be8ac3216] Merge remote-tracking branch 'v4l-dvb-next/master'
> git bisect good eed3cd1a28b4a41ca25b8f5fbd86449be8ac3216
> # bad: [366e8fe73e13244686604662ddcc70aa14a3e0e6] Merge remote-tracking branch 'rdma/for-next'
> git bisect bad 366e8fe73e13244686604662ddcc70aa14a3e0e6
> # good: [5120bf0a5fc15dec210a0fe0f39e4a256bb6e349] RDMA/rxe: Correct skb on loopback path
> git bisect good 5120bf0a5fc15dec210a0fe0f39e4a256bb6e349
> # good: [ffc46af1757e05652e17c47e4aa2a01bf5aaf3ad] Merge remote-tracking branch 'thermal/thermal/linux-next'
> git bisect good ffc46af1757e05652e17c47e4aa2a01bf5aaf3ad
> # good: [229557230c760e25b6af79709aa85d30de4c8500] RDMA/hns: Remove unused member and variable of CMDQ
> git bisect good 229557230c760e25b6af79709aa85d30de4c8500
> # good: [2b5715fc17386a6223490d5b8f08d031999b0c0b] RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes
> git bisect good 2b5715fc17386a6223490d5b8f08d031999b0c0b
> # bad: [e952d9a1bc204109a21f7dbedddedc110a33baf1] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
> git bisect bad e952d9a1bc204109a21f7dbedddedc110a33baf1
> # good: [c7fbeca757fe74135d8b6a4c8ddaef76f5775d68] swiotlb: factor out an io_tlb_offset helper
> git bisect good c7fbeca757fe74135d8b6a4c8ddaef76f5775d68
> # good: [ca10d0f8e530600ec63c603dbace2c30927d70b7] swiotlb: clean up swiotlb_tbl_unmap_single
> git bisect good ca10d0f8e530600ec63c603dbace2c30927d70b7
> # bad: [567d877f9a7d6bf4e4bf0ecd6de23fec8039b123] swiotlb: refactor swiotlb_tbl_map_single
> git bisect bad 567d877f9a7d6bf4e4bf0ecd6de23fec8039b123
> # first bad commit: [567d877f9a7d6bf4e4bf0ecd6de23fec8039b123] swiotlb: refactor swiotlb_tbl_map_single
> -------------------------------------------------------------------------------
> 

* Re: next/master bisection: baseline.login on r8a77960-ulcb
  2021-02-23  9:56 ` next/master bisection: baseline.login on r8a77960-ulcb Guillaume Tucker
@ 2021-02-24 21:39   ` Heiko Thiery
  2021-02-25 11:09     ` Thierry Reding
  0 siblings, 1 reply; 7+ messages in thread
From: Heiko Thiery @ 2021-02-24 21:39 UTC
  To: Guillaume Tucker, Konrad Rzeszutek Wilk, Jianxiong Gao,
	Christoph Hellwig
  Cc: kernelci-results, linux-kernel, iommu, Marek Szyprowski, Robin Murphy

Hi Christoph and all,

On 23.02.21 10:56, Guillaume Tucker wrote:
> Hi Christoph,
> 
> Please see the bisection report below about a boot failure on
> r8a77960-ulcb on next-20210222.
> 
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
> 
> The log shows a kernel panic, more details can be found here:
> 
>    https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/
> 
> Please let us know if you need any help to debug the issue or try
> a fix on this platform.

I am also seeing this problem on an iMX8MQ board and can help test if 
you have a fix.

BR
-- 
Heiko

* Re: next/master bisection: baseline.login on r8a77960-ulcb
  2021-02-24 21:39   ` Heiko Thiery
@ 2021-02-25 11:09     ` Thierry Reding
  2021-02-25 11:14       ` Robin Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: Thierry Reding @ 2021-02-25 11:09 UTC
  To: Heiko Thiery
  Cc: Guillaume Tucker, Konrad Rzeszutek Wilk, Jianxiong Gao,
	Christoph Hellwig, kernelci-results, linux-kernel, iommu,
	Marek Szyprowski, Robin Murphy

On Wed, Feb 24, 2021 at 10:39:42PM +0100, Heiko Thiery wrote:
> Hi Christoph and all,
> 
> On 23.02.21 10:56, Guillaume Tucker wrote:
> > Hi Christoph,
> > 
> > Please see the bisection report below about a boot failure on
> > r8a77960-ulcb on next-20210222.
> > 
> > Reports aren't automatically sent to the public while we're
> > trialing new bisection features on kernelci.org but this one
> > looks valid.
> > 
> > The log shows a kernel panic, more details can be found here:
> > 
> >    https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/
> > 
> > Please let us know if you need any help to debug the issue or try
> > a fix on this platform.
> 
> I am also seeing this problem on an iMX8MQ board and can help test if you
> have a fix.

This is also causing boot failures on Jetson AGX Xavier. The origin is
slightly different from the above kernelci.org report, but the BUG_ON is
the same:

    [    2.650447] ------------[ cut here ]------------
    [    2.650588] kernel BUG at include/linux/iommu-helper.h:23!
    [    2.650729] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    [    2.654330] Modules linked in:
    [    2.657474] CPU: 2 PID: 67 Comm: kworker/2:1 Not tainted 5.11.0-next-20210225-00025-gfd15609b3a81-dirty #120
    [    2.667367] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
    [    2.674096] Workqueue: events deferred_probe_work_func
    [    2.679169] pstate: 40400089 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
    [    2.684949] pc : find_slots.isra.0+0x118/0x2f0
    [    2.689494] lr : find_slots.isra.0+0x88/0x2f0
    [    2.693696] sp : ffff800011faf950
    [    2.697281] x29: ffff800011faf950 x28: 0000000000000001
    [    2.702537] x27: 0000000000000001 x26: 0000000000000000
    [    2.708131] x25: 0000000000000001 x24: 0000000105f03148
    [    2.713556] x23: 0000000000000001 x22: ffff800011559000
    [    2.718835] x21: ffff800011559a80 x20: 00000000edc00000
    [    2.724493] x19: 0000000000000000 x18: 0000000000000020
    [    2.729770] x17: ffff0003ffd7d160 x16: 0000000000000068
    [    2.735173] x15: ffff000080b43150 x14: ffffffffffffffff
    [    2.740944] x13: ffff000082b5d791 x12: 0000000000000040
    [    2.746113] x11: ffff0000a0000248 x10: 0000000000000000
    [    2.751882] x9 : 0000000000000000 x8 : ffff0003fed30000
    [    2.757139] x7 : 0000000000000000 x6 : 0000000000000000
    [    2.762818] x5 : 0000000000000000 x4 : 0000000000000000
    [    2.767984] x3 : 00000001e8303148 x2 : 0000000000008000
    [    2.773580] x1 : ffffffffffffffff x0 : 00000000001db800
    [    2.778662] Call trace:
    [    2.781136]  find_slots.isra.0+0x118/0x2f0
    [    2.785137]  swiotlb_tbl_map_single+0x80/0x1b4
    [    2.789858]  swiotlb_map+0x58/0x200
    [    2.793355]  dma_direct_map_page+0x148/0x1c0
    [    2.797386]  dma_map_page_attrs+0x2c/0x54
    [    2.801411]  dw_pcie_host_init+0x40c/0x4c0
    [    2.805633]  tegra_pcie_config_rp+0x7c/0x1f4
    [    2.810155]  tegra_pcie_dw_probe+0x3d0/0x60c
    [    2.814185]  platform_probe+0x68/0xe0
    [    2.817688]  really_probe+0xe4/0x4c0
    [    2.821362]  driver_probe_device+0x58/0xc0
    [    2.825386]  __device_attach_driver+0xa8/0x104
    [    2.829953]  bus_for_each_drv+0x78/0xd0
    [    2.833434]  __device_attach+0xdc/0x17c
    [    2.837631]  device_initial_probe+0x14/0x20
    [    2.841680]  bus_probe_device+0x9c/0xa4
    [    2.845160]  deferred_probe_work_func+0x74/0xb0
    [    2.849734]  process_one_work+0x1cc/0x350
    [    2.853822]  worker_thread+0x20c/0x3ac
    [    2.858018]  kthread+0x128/0x134
    [    2.860997]  ret_from_fork+0x10/0x34
    [    2.864508] Code: ca180063 ea06007f 54fffee1 b50001e7 (d4210000)
    [    2.870547] ---[ end trace e5c50bdcf12b316e ]---
    [    2.875087] note: kworker/2:1[67] exited with preempt_count 2
    [    2.880836] ------------[ cut here ]------------

I've confirmed that reverting the following commits makes the system
boot again:

    47cfc5be1934 ("swiotlb: Validate bounce size in the sync/unmap path")
    c6f50c7719e7 ("swiotlb: respect min_align_mask")
    e952d9a1bc20 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
    567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")

Let me know if I can help test any fixes for this.

Thierry

* Re: next/master bisection: baseline.login on r8a77960-ulcb
  2021-02-25 11:09     ` Thierry Reding
@ 2021-02-25 11:14       ` Robin Murphy
  2021-02-25 11:50         ` Thierry Reding
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2021-02-25 11:14 UTC
  To: Thierry Reding, Heiko Thiery
  Cc: Guillaume Tucker, Konrad Rzeszutek Wilk, Jianxiong Gao,
	Christoph Hellwig, kernelci-results, linux-kernel, iommu,
	Marek Szyprowski

On 2021-02-25 11:09, Thierry Reding wrote:
> On Wed, Feb 24, 2021 at 10:39:42PM +0100, Heiko Thiery wrote:
>> Hi Christoph and all,
>>
>> On 23.02.21 10:56, Guillaume Tucker wrote:
>>> Hi Christoph,
>>>
>>> Please see the bisection report below about a boot failure on
>>> r8a77960-ulcb on next-20210222.
>>>
>>> Reports aren't automatically sent to the public while we're
>>> trialing new bisection features on kernelci.org but this one
>>> looks valid.
>>>
>>> The log shows a kernel panic, more details can be found here:
>>>
>>>     https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/
>>>
>>> Please let us know if you need any help to debug the issue or try
>>> a fix on this platform.
>>
>> I am also seeing this problem on an iMX8MQ board and can help test if you
>> have a fix.
> 
> This is also causing boot failures on Jetson AGX Xavier. The origin is
> slightly different from the above kernelci.org report, but the BUG_ON is
> the same:
> 
>      [ ... oops log snipped ... ]
> 
> I've confirmed that reverting the following commits makes the system
> boot again:
> 
>      47cfc5be1934 ("swiotlb: Validate bounce size in the sync/unmap path")
>      c6f50c7719e7 ("swiotlb: respect min_align_mask")
>      e952d9a1bc20 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
>      567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
> 
> Let me know if I can help test any fixes for this.

FWIW, this sounds like it's probably the same thing for which a fix 
should be pending:

https://lore.kernel.org/linux-iommu/20210223072514.GA18079@lst.de/T/#u

Robin.

* Re: next/master bisection: baseline.login on r8a77960-ulcb
  2021-02-25 11:14       ` Robin Murphy
@ 2021-02-25 11:50         ` Thierry Reding
  2021-02-25 13:00           ` Heiko Thiery
  0 siblings, 1 reply; 7+ messages in thread
From: Thierry Reding @ 2021-02-25 11:50 UTC
  To: Robin Murphy
  Cc: Heiko Thiery, Guillaume Tucker, Konrad Rzeszutek Wilk,
	Jianxiong Gao, Christoph Hellwig, kernelci-results, linux-kernel,
	iommu, Marek Szyprowski

On Thu, Feb 25, 2021 at 11:14:57AM +0000, Robin Murphy wrote:
> On 2021-02-25 11:09, Thierry Reding wrote:
> > On Wed, Feb 24, 2021 at 10:39:42PM +0100, Heiko Thiery wrote:
> > > Hi Christoph and all,
> > > 
> > > On 23.02.21 10:56, Guillaume Tucker wrote:
> > > > Hi Christoph,
> > > > 
> > > > Please see the bisection report below about a boot failure on
> > > > r8a77960-ulcb on next-20210222.
> > > > 
> > > > Reports aren't automatically sent to the public while we're
> > > > trialing new bisection features on kernelci.org but this one
> > > > looks valid.
> > > > 
> > > > The log shows a kernel panic, more details can be found here:
> > > > 
> > > >     https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/
> > > > 
> > > > Please let us know if you need any help to debug the issue or try
> > > > a fix on this platform.
> > > 
> > > I am also seeing this problem on an iMX8MQ board and can help test if you
> > > have a fix.
> > 
> > This is also causing boot failures on Jetson AGX Xavier. The origin is
> > slightly different from the above kernelci.org report, but the BUG_ON is
> > the same:
> > 
> >      [ ... oops log snipped ... ]
> > 
> > I've confirmed that reverting the following commits makes the system
> > boot again:
> > 
> >      47cfc5be1934 ("swiotlb: Validate bounce size in the sync/unmap path")
> >      c6f50c7719e7 ("swiotlb: respect min_align_mask")
> >      e952d9a1bc20 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
> >      567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
> > 
> > Let me know if I can help test any fixes for this.
> 
> FWIW, this sounds like it's probably the same thing for which a fix should
> be pending:
> 
> https://lore.kernel.org/linux-iommu/20210223072514.GA18079@lst.de/T/#u

Yep, changing max_slots from unsigned int to unsigned long fixes this as
well. Thanks for the pointer!

Thierry

* Re: next/master bisection: baseline.login on r8a77960-ulcb
  2021-02-25 11:50         ` Thierry Reding
@ 2021-02-25 13:00           ` Heiko Thiery
  2021-02-26  8:56             ` Yoshihiro Shimoda
  0 siblings, 1 reply; 7+ messages in thread
From: Heiko Thiery @ 2021-02-25 13:00 UTC
  To: Thierry Reding
  Cc: Robin Murphy, Guillaume Tucker, Konrad Rzeszutek Wilk,
	Jianxiong Gao, Christoph Hellwig, kernelci-results, linux-kernel,
	iommu, Marek Szyprowski

Hi all,


On Thu, 25 Feb 2021 at 12:50, Thierry Reding
<thierry.reding@gmail.com> wrote:
>
> On Thu, Feb 25, 2021 at 11:14:57AM +0000, Robin Murphy wrote:
> > On 2021-02-25 11:09, Thierry Reding wrote:
> > > On Wed, Feb 24, 2021 at 10:39:42PM +0100, Heiko Thiery wrote:
> > > > Hi Christoph and all,
> > > >
> > > > On 23.02.21 10:56, Guillaume Tucker wrote:
> > > > > Hi Christoph,
> > > > >
> > > > > Please see the bisection report below about a boot failure on
> > > > > r8a77960-ulcb on next-20210222.
> > > > >
> > > > > Reports aren't automatically sent to the public while we're
> > > > > trialing new bisection features on kernelci.org but this one
> > > > > looks valid.
> > > > >
> > > > > The log shows a kernel panic, more details can be found here:
> > > > >
> > > > >     https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/
> > > > >
> > > > > Please let us know if you need any help to debug the issue or try
> > > > > a fix on this platform.
> > > >
> > > > I am also seeing this problem on an iMX8MQ board and can help test if you
> > > > have a fix.
> > >
> > > This is also causing boot failures on Jetson AGX Xavier. The origin is
> > > slightly different from the above kernelci.org report, but the BUG_ON is
> > > the same:
> > >
> > >      [    2.650447] ------------[ cut here ]------------
> > >      [    2.650588] kernel BUG at include/linux/iommu-helper.h:23!
> > >      [    2.650729] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > >      [    2.654330] Modules linked in:
> > >      [    2.657474] CPU: 2 PID: 67 Comm: kworker/2:1 Not tainted 5.11.0-next-20210225-00025-gfd15609b3a81-dirty #120
> > >      [    2.667367] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
> > >      [    2.674096] Workqueue: events deferred_probe_work_func
> > >      [    2.679169] pstate: 40400089 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> > >      [    2.684949] pc : find_slots.isra.0+0x118/0x2f0
> > >      [    2.689494] lr : find_slots.isra.0+0x88/0x2f0
> > >      [    2.693696] sp : ffff800011faf950
> > >      [    2.697281] x29: ffff800011faf950 x28: 0000000000000001
> > >      [    2.702537] x27: 0000000000000001 x26: 0000000000000000
> > >      [    2.708131] x25: 0000000000000001 x24: 0000000105f03148
> > >      [    2.713556] x23: 0000000000000001 x22: ffff800011559000
> > >      [    2.718835] x21: ffff800011559a80 x20: 00000000edc00000
> > >      [    2.724493] x19: 0000000000000000 x18: 0000000000000020
> > >      [    2.729770] x17: ffff0003ffd7d160 x16: 0000000000000068
> > >      [    2.735173] x15: ffff000080b43150 x14: ffffffffffffffff
> > >      [    2.740944] x13: ffff000082b5d791 x12: 0000000000000040
> > >      [    2.746113] x11: ffff0000a0000248 x10: 0000000000000000
> > >      [    2.751882] x9 : 0000000000000000 x8 : ffff0003fed30000
> > >      [    2.757139] x7 : 0000000000000000 x6 : 0000000000000000
> > >      [    2.762818] x5 : 0000000000000000 x4 : 0000000000000000
> > >      [    2.767984] x3 : 00000001e8303148 x2 : 0000000000008000
> > >      [    2.773580] x1 : ffffffffffffffff x0 : 00000000001db800
> > >      [    2.778662] Call trace:
> > >      [    2.781136]  find_slots.isra.0+0x118/0x2f0
> > >      [    2.785137]  swiotlb_tbl_map_single+0x80/0x1b4
> > >      [    2.789858]  swiotlb_map+0x58/0x200
> > >      [    2.793355]  dma_direct_map_page+0x148/0x1c0
> > >      [    2.797386]  dma_map_page_attrs+0x2c/0x54
> > >      [    2.801411]  dw_pcie_host_init+0x40c/0x4c0
> > >      [    2.805633]  tegra_pcie_config_rp+0x7c/0x1f4
> > >      [    2.810155]  tegra_pcie_dw_probe+0x3d0/0x60c
> > >      [    2.814185]  platform_probe+0x68/0xe0
> > >      [    2.817688]  really_probe+0xe4/0x4c0
> > >      [    2.821362]  driver_probe_device+0x58/0xc0
> > >      [    2.825386]  __device_attach_driver+0xa8/0x104
> > >      [    2.829953]  bus_for_each_drv+0x78/0xd0
> > >      [    2.833434]  __device_attach+0xdc/0x17c
> > >      [    2.837631]  device_initial_probe+0x14/0x20
> > >      [    2.841680]  bus_probe_device+0x9c/0xa4
> > >      [    2.845160]  deferred_probe_work_func+0x74/0xb0
> > >      [    2.849734]  process_one_work+0x1cc/0x350
> > >      [    2.853822]  worker_thread+0x20c/0x3ac
> > >      [    2.858018]  kthread+0x128/0x134
> > >      [    2.860997]  ret_from_fork+0x10/0x34
> > >      [    2.864508] Code: ca180063 ea06007f 54fffee1 b50001e7 (d4210000)
> > >      [    2.870547] ---[ end trace e5c50bdcf12b316e ]---
> > >      [    2.875087] note: kworker/2:1[67] exited with preempt_count 2
> > >      [    2.880836] ------------[ cut here ]------------
> > >
> > > I've confirmed that reverting the following commits makes the system
> > > boot again:
> > >
> > >      47cfc5be1934 ("swiotlb: Validate bounce size in the sync/unmap path")
> > >      c6f50c7719e7 ("swiotlb: respect min_align_mask")
> > >      e952d9a1bc20 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
> > >      567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
> > >
> > > Let me know if I can help test any fixes for this.
> >
> > FWIW, this sounds like it's probably the same thing for which a fix should
> > be pending:
> >
> > https://lore.kernel.org/linux-iommu/20210223072514.GA18079@lst.de/T/#u
>
> Yep, changing max_slots from unsigned int to unsigned long fixes this as
> well. Thanks for the pointer!

I also can confirm that changing that to unsigned long fixes the issue.

Thank you
-- 
Heiko

* RE: next/master bisection: baseline.login on r8a77960-ulcb
  2021-02-25 13:00           ` Heiko Thiery
@ 2021-02-26  8:56             ` Yoshihiro Shimoda
  0 siblings, 0 replies; 7+ messages in thread
From: Yoshihiro Shimoda @ 2021-02-26  8:56 UTC
  To: Heiko Thiery, Thierry Reding
  Cc: kernelci-results, Konrad Rzeszutek Wilk, Guillaume Tucker,
	linux-kernel, iommu, Robin Murphy, Christoph Hellwig,
	Jianxiong Gao

Hi all,

> From: Heiko Thiery, Sent: Thursday, February 25, 2021 10:01 PM
> On Thu, 25 Feb 2021 at 12:50, Thierry Reding wrote:
> > On Thu, Feb 25, 2021 at 11:14:57AM +0000, Robin Murphy wrote:
> > > On 2021-02-25 11:09, Thierry Reding wrote:
> > > > On Wed, Feb 24, 2021 at 10:39:42PM +0100, Heiko Thiery wrote:
> > > > > Hi Christoph and all,
> > > > >
> > > > > On 23.02.21 10:56, Guillaume Tucker wrote:
> > > > > > Hi Christoph,
> > > > > >
> > > > > > Please see the bisection report below about a boot failure on
> > > > > > r8a77960-ulcb on next-20210222.
> > > > > >
> > > > > > Reports aren't automatically sent to the public while we're
> > > > > > trialing new bisection features on kernelci.org but this one
> > > > > > looks valid.
> > > > > >
> > > > > > The log shows a kernel panic, more details can be found here:
<snip>
> >
> > Yep, changing max_slots from unsigned int to unsigned long fixes this as
> > well. Thanks for the pointer!
> 
> I also can confirm that changing that to unsigned long fixes the issue.

Thank you for the information! I also confirmed that changing the type of
max_slots fixed the issue in my environment (r8a77951-salvator-xs.dts with defconfig).

Best regards,
Yoshihiro Shimoda

Thread overview: 7+ messages
     [not found] <60346234.1c69fb81.cd55e.770d@mx.google.com>
2021-02-23  9:56 ` next/master bisection: baseline.login on r8a77960-ulcb Guillaume Tucker
2021-02-24 21:39   ` Heiko Thiery
2021-02-25 11:09     ` Thierry Reding
2021-02-25 11:14       ` Robin Murphy
2021-02-25 11:50         ` Thierry Reding
2021-02-25 13:00           ` Heiko Thiery
2021-02-26  8:56             ` Yoshihiro Shimoda