linux-pci.vger.kernel.org archive mirror
* pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
@ 2019-11-15 10:59 Kishon Vijay Abraham I
  2019-11-15 13:06 ` Christoph Hellwig
  2020-01-30  7:58 ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2019-11-15 10:59 UTC (permalink / raw)
  To: Christoph Hellwig, linux-pci

Hi Christoph,

I think we are encountering a case where the connected PCIe card (e.g. a PCIe USB
card) supports 64-bit addressing and the ARM core supports 64-bit addressing,
but the PCIe controller in the SoC to which the card is connected supports
only 32-bit addressing.

Here the DMA APIs can hand the PCIe card an address above the 32-bit region.
However, the access will fail when the card tries to reach that address via
the PCIe controller.

The first commit where we actually start seeing the issue is
commit 21e07dba9fb1179148089d611fc9e6e70d1887c3 (j7_serdes_v2)
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Apr 3 19:09:59 2018 +0200

    scsi: reduce use of block bounce buffers

Can you give hints on how to solve this?

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-15 10:59 pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers" Kishon Vijay Abraham I
@ 2019-11-15 13:06 ` Christoph Hellwig
  2019-11-15 14:18   ` Kishon Vijay Abraham I
  2020-01-30  7:58 ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-11-15 13:06 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Fri, Nov 15, 2019 at 04:29:31PM +0530, Kishon Vijay Abraham I wrote:
> Hi Christoph,
> 
> I think we are encountering a case where the connected PCIe card (like PCIe USB
> card) supports 64-bit addressing and the ARM core supports 64-bit addressing
> but the PCIe controller in the SoC to which PCIe card is connected supports
> only 32-bits.
> 
> Here dma APIs can provide an address above the 32 bit region to the PCIe card.
> However this will fail when the card tries to access the provided address via
> the PCIe controller.

What kernel version did you see your problems with?

Linux 5.3 added swiotlb to arm LPAE configs for exactly that case.


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-15 13:06 ` Christoph Hellwig
@ 2019-11-15 14:18   ` Kishon Vijay Abraham I
  2019-11-16 16:35     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2019-11-15 14:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Hi Christoph,

On 15/11/19 6:36 PM, Christoph Hellwig wrote:
> On Fri, Nov 15, 2019 at 04:29:31PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Christoph,
>>
>> I think we are encountering a case where the connected PCIe card (like PCIe USB
>> card) supports 64-bit addressing and the ARM core supports 64-bit addressing
>> but the PCIe controller in the SoC to which PCIe card is connected supports
>> only 32-bits.
>>
>> Here dma APIs can provide an address above the 32 bit region to the PCIe card.
>> However this will fail when the card tries to access the provided address via
>> the PCIe controller.
> 
> What kernel version did you see your problems with?
> 
> Linux 5.3 added swiotlb to arm LPAE configs for exactly that case.

I'm using the latest kernel
commit 96b95eff4a591dbac582c2590d067e356a18aacb (HEAD, origin/master, origin/HEAD)
Merge: 4e84608c7836 80591e61a0f7
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Nov 14 08:48:10 2019 -0800

    Merge tag 'kbuild-fixes-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

I think the fix in 5.3 was useful for platform drivers (where the platform
driver sets a 32-bit DMA mask via dma_set_mask()) even when the system itself
supports LPAE.

Here the pci_driver sets a 64-bit DMA mask, since the PCI device as such is
capable of addressing 64 bits. The pci_driver doesn't know whether the PCI
controller to which the PCI device is connected is capable of addressing 64 bits.

We should find a way to set the DMA mask of the PCI device based on the DMA
mask of the PCI controller in the SoC. One option would be to change the
pci_drivers all over the kernel to set the DMA mask based on the DMA mask of
the PCI controller (the PCI device hierarchy would need a reference to the
device pointer of the PCI controller). Or is there a better way to handle this?

Thanks
Kishon


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-15 14:18   ` Kishon Vijay Abraham I
@ 2019-11-16 16:35     ` Christoph Hellwig
  2019-11-18 17:21       ` Robin Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-11-16 16:35 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci, Robin Murphy

On Fri, Nov 15, 2019 at 07:48:23PM +0530, Kishon Vijay Abraham I wrote:
> I think the fix on 5.3 was useful for platform drivers (where the platform
> driver will set dma_set_mask as 32bits) even when the system itself supports LPAE.

Well, we can also use the bus_dma_mask for PCI(e) root port quirks,
as we do for the VIA ones on x86.  But I think the OF parsing code
is missing something here, and Robin did plan to look into that.

> We should find a way to set the DMA mask of the PCI device based on the DMA
> mask of the PCI controller in the SoC. One option would be to change the
> pci_drivers all over the kernel to set DMA mask to be based on the DMA mask of
> the PCI controller (the PCI device hierarchy should get a reference to the
> device pointer of the PCI controller). Or is there a better way to handle this?

No.  The driver sets the device capabilities.  bus_dma_mask handles
the system limitations.
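
The split described above can be sketched as a toy model in Python. It is only an illustration under stated assumptions: the `Device` class and the helpers `driver_set_dma_mask`, `bus_set_dma_limit` and `effective_limit` are hypothetical names, not kernel APIs. The two fields stand in for the per-device DMA mask written by the endpoint driver and the bus limit written by bus or firmware code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Set by the endpoint's driver: what the device itself can address.
    dma_mask: int = 0
    # Set by bus/firmware code: what the path to memory can carry (0 = no limit).
    bus_dma_mask: int = 0


def driver_set_dma_mask(dev: Device, bits: int) -> None:
    """Hypothetical stand-in for the driver declaring the device capability."""
    dev.dma_mask = (1 << bits) - 1


def bus_set_dma_limit(dev: Device, bits: int) -> None:
    """Hypothetical stand-in for bus code recording a system limitation."""
    dev.bus_dma_mask = (1 << bits) - 1


def effective_limit(dev: Device) -> int:
    """Highest address usable for DMA: the tighter of the two constraints."""
    if dev.bus_dma_mask == 0:
        return dev.dma_mask
    return min(dev.dma_mask, dev.bus_dma_mask)


nvme = Device()
bus_set_dma_limit(nvme, 32)     # host bridge reaches only 32 bits
driver_set_dma_mask(nvme, 64)   # endpoint driver still advertises 64 bits
print(hex(effective_limit(nvme)))
```

In this model the driver never needs to know about the host bridge: it keeps declaring 64 bits, and the mapping code is the one place that combines both values.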


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-16 16:35     ` Christoph Hellwig
@ 2019-11-18 17:21       ` Robin Murphy
  2019-11-25  5:43         ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2019-11-18 17:21 UTC (permalink / raw)
  To: Christoph Hellwig, Kishon Vijay Abraham I; +Cc: linux-pci

On 16/11/2019 4:35 pm, Christoph Hellwig wrote:
> On Fri, Nov 15, 2019 at 07:48:23PM +0530, Kishon Vijay Abraham I wrote:
>> I think the fix on 5.3 was useful for platform drivers (where the platform
>> driver will set dma_set_mask as 32bits) even when the system itself supports LPAE.
> 
> Well, we can also use the bus_dma_mask for PCI(e) root port quirks,
> as we do that for the VIA ones on x86.  But I think the OF parsing code
> is missing something here, and Robin did plan to look into that.

Right, the correct way to describe this is with "dma-ranges" on the host 
bridge node, and there are patches queued in linux-next to (finally) 
handle that properly for the way we bodge dynamically-discovered 
endpoints through of_dma_configure().

Robin.

>> We should find a way to set the DMA mask of the PCI device based on the DMA
>> mask of the PCI controller in the SoC. One option would be to change the
>> pci_drivers all over the kernel to set DMA mask to be based on the DMA mask of
>> the PCI controller (the PCI device hierarchy should get a reference to the
>> device pointer of the PCI controller). Or is there a better way to handle this?
> 
> No.  The driver sets the device capabilities.  bus_dma_mask handles
> the system limitations.
> 


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-18 17:21       ` Robin Murphy
@ 2019-11-25  5:43         ` Kishon Vijay Abraham I
  2020-01-27 13:10           ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2019-11-25  5:43 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig; +Cc: linux-pci

Hi,

On 18/11/19 10:51 PM, Robin Murphy wrote:
> On 16/11/2019 4:35 pm, Christoph Hellwig wrote:
>> On Fri, Nov 15, 2019 at 07:48:23PM +0530, Kishon Vijay Abraham I wrote:
>>> I think the fix on 5.3 was useful for platform drivers (where the platform
>>> driver will set dma_set_mask as 32bits) even when the system itself supports
>>> LPAE.
>>
>> Well, we can also use the bus_dma_mask for PCI(e) root port quirks,
>> as we do that for the VIA ones on x86.  But I think the OF parsing code
>> is missing something here, and Robin did plan to look into that.
> 
> Right, the correct way to describe this is with "dma-ranges" on the host bridge
> node, and there are patches queued in linux-next to (finally) handle that
> properly for the way we bodge dynamically-discovered endpoints through
> of_dma_configure().

I tried linux-next after adding a dma-ranges property to the DRA7 RC DT node and
no longer see the issue.

Thanks
Kishon

> 
> Robin.
> 
>>> We should find a way to set the DMA mask of the PCI device based on the DMA
>>> mask of the PCI controller in the SoC. One option would be to change the
>>> pci_drivers all over the kernel to set DMA mask to be based on the DMA mask of
>>> the PCI controller (the PCI device hierarchy should get a reference to the
>>> device pointer of the PCI controller). Or is there a better way to handle this?
>>
>> No.  The driver sets the device capabilities.  bus_dma_mask handles
>> the system limitations.
>>


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-25  5:43         ` Kishon Vijay Abraham I
@ 2020-01-27 13:10           ` Kishon Vijay Abraham I
  2020-01-27 13:22             ` Robin Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-01-27 13:10 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig; +Cc: linux-pci

Hi Christoph, Robin,

On 25/11/19 11:13 am, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On 18/11/19 10:51 PM, Robin Murphy wrote:
>> On 16/11/2019 4:35 pm, Christoph Hellwig wrote:
>>> On Fri, Nov 15, 2019 at 07:48:23PM +0530, Kishon Vijay Abraham I wrote:
>>>> I think the fix on 5.3 was useful for platform drivers (where the platform
>>>> driver will set dma_set_mask as 32bits) even when the system itself supports
>>>> LPAE.
>>>
>>> Well, we can also use the bus_dma_mask for PCI(e) root port quirks,
>>> as we do that for the VIA ones on x86.  But I think the OF parsing code
>>> is missing something here, and Robin did plan to look into that.
>>
>> Right, the correct way to describe this is with "dma-ranges" on the host bridge
>> node, and there are patches queued in linux-next to (finally) handle that
>> properly for the way we bodge dynamically-discovered endpoints through
>> of_dma_configure().
> 
> Tried linux-next after adding dma-ranges property to the DRA7 RC dt node and
> don't see the issue anymore.

Using the latest mainline kernel
commit d5226fa6dbae0569ee43ecfc08bdcd6770fc4755 (tag: v5.5, origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Jan 26 16:23:03 2020 -0800

    Linux 5.5

I see the following warning dump when using an NVMe card with the LPAE config
enabled:

nvme 0000:01:00.0: overflow 0x000000027b3be000+270336 of DMA mask ffffffffffffffff bus limit ffffffff
------------[ cut here ]------------
WARNING: CPU: 0 PID: 26 at kernel/dma/direct.c:35 report_addr+0xf0/0xf4
Modules linked in:
CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 5.5.0-00002-g1383adf7b819 #2
Hardware name: Generic DRA74X (Flattened Device Tree)
Workqueue: writeback wb_workfn (flush-259:0)
(unwind_backtrace) from [<c020b494>] (show_stack+0x10/0x14)
(show_stack) from [<c0a2ae24>] (dump_stack+0x94/0xa8)
(dump_stack) from [<c022bbd8>] (__warn+0xbc/0xd8)
(__warn) from [<c022bc54>] (warn_slowpath_fmt+0x60/0xb8)
(warn_slowpath_fmt) from [<c0299928>] (report_addr+0xf0/0xf4)
(report_addr) from [<c0299ab8>] (dma_direct_map_page+0x18c/0x19c)
(dma_direct_map_page) from [<c0299b2c>] (dma_direct_map_sg+0x64/0xb4)
(dma_direct_map_sg) from [<c071b12c>] (nvme_queue_rq+0x778/0x9ec)
(nvme_queue_rq) from [<c050c8c8>] (__blk_mq_try_issue_directly+0x130/0x1bc)
(__blk_mq_try_issue_directly) from [<c050d1b8>] (blk_mq_request_issue_directly+0x48/0x78)
(blk_mq_request_issue_directly) from [<c050d22c>] (blk_mq_try_issue_list_directly+0x44/0xb8)
(blk_mq_try_issue_list_directly) from [<c0511620>] (blk_mq_sched_insert_requests+0xe0/0x154)
(blk_mq_sched_insert_requests) from [<c050d13c>] (blk_mq_flush_plug_list+0x150/0x184)
(blk_mq_flush_plug_list) from [<c0502ec4>] (blk_flush_plug_list+0xc8/0xe4)
(blk_flush_plug_list) from [<c050cc44>] (blk_mq_make_request+0x24c/0x3f0)
(blk_mq_make_request) from [<c0501acc>] (generic_make_request+0xb0/0x2d4)
(generic_make_request) from [<c0501d34>] (submit_bio+0x44/0x180)
(submit_bio) from [<c039ad10>] (mpage_writepages+0xac/0xe8)
(mpage_writepages) from [<c02f96dc>] (do_writepages+0x44/0xdc)
(do_writepages) from [<c0384830>] (__writeback_single_inode+0x2c/0x1bc)
(__writeback_single_inode) from [<c0384b98>] (writeback_sb_inodes+0x1d8/0x404)
(writeback_sb_inodes) from [<c0384e1c>] (__writeback_inodes_wb+0x58/0x9c)
(__writeback_inodes_wb) from [<c0384ff4>] (wb_writeback+0x194/0x1d8)
(wb_writeback) from [<c0386104>] (wb_workfn+0x244/0x33c)
(wb_workfn) from [<c0244ff8>] (process_one_work+0x204/0x458)
(process_one_work) from [<c0245290>] (worker_thread+0x44/0x598)
(worker_thread) from [<c024ab30>] (kthread+0x14c/0x150)
(kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)

Thanks
Kishon


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-01-27 13:10           ` Kishon Vijay Abraham I
@ 2020-01-27 13:22             ` Robin Murphy
  2020-01-29  6:24               ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2020-01-27 13:22 UTC (permalink / raw)
  To: Kishon Vijay Abraham I, Christoph Hellwig; +Cc: linux-pci

Hi Kishon,

On 27/01/2020 1:10 pm, Kishon Vijay Abraham I wrote:
> Hi Christoph, Robin,
> 
> On 25/11/19 11:13 am, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On 18/11/19 10:51 PM, Robin Murphy wrote:
>>> On 16/11/2019 4:35 pm, Christoph Hellwig wrote:
>>>> On Fri, Nov 15, 2019 at 07:48:23PM +0530, Kishon Vijay Abraham I wrote:
>>>>> I think the fix on 5.3 was useful for platform drivers (where the platform
>>>>> driver will set dma_set_mask as 32bits) even when the system itself supports
>>>>> LPAE.
>>>>
>>>> Well, we can also use the bus_dma_mask for PCI(e) root port quirks,
>>>> as we do that for the VIA ones on x86.  But I think the OF parsing code
>>>> is missing something here, and Robin did plan to look into that.
>>>
>>> Right, the correct way to describe this is with "dma-ranges" on the host bridge
>>> node, and there are patches queued in linux-next to (finally) handle that
>>> properly for the way we bodge dynamically-discovered endpoints through
>>> of_dma_configure().
>>
>> Tried linux-next after adding dma-ranges property to the DRA7 RC dt node and
>> don't see the issue anymore.
> 
> Using the latest mainline kernel
> commit d5226fa6dbae0569ee43ecfc08bdcd6770fc4755 (tag: v5.5,
> origin/master, origin/HEAD)
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sun Jan 26 16:23:03 2020 -0800
> 
>      Linux 5.5
> 
> I see the following warn dump when using a NVMe card with LPAE config
> enabled
> 
> nvme 0000:01:00.0: overflow 0x000000027b3be000+270336 of DMA mask

That's a 34-bit physical address...

> ffffffffffffffff bus limit ffffffff

...and that's your 32-bit PCI host bridge constraint. Thus the warning 
appears to be correct in that this is an attempt at an impossible direct 
DMA mapping. I'm assuming you do have RAM above the 32-bit boundary 
exposed by virtue of the LPAE config but don't have SWIOTLB enabled, is 
that the case?
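
The reading of the warning line given above can be reproduced with a small Python sketch. The regular expression and the `decode_overflow` helper are assumptions written for this one log line, not a general parser for kernel messages.

```python
import re

LINE = ("nvme 0000:01:00.0: overflow 0x000000027b3be000+270336 "
        "of DMA mask ffffffffffffffff bus limit ffffffff")

# Hypothetical pattern matching the fields of the overflow message above.
PATTERN = re.compile(
    r"overflow 0x(?P<addr>[0-9a-f]+)\+(?P<size>\d+) of DMA mask "
    r"(?P<mask>[0-9a-f]+) bus limit (?P<limit>[0-9a-f]+)"
)


def decode_overflow(line: str) -> dict:
    """Extract address, size, mask and bus limit and summarise their widths."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a DMA overflow line")
    addr = int(m["addr"], 16)
    size = int(m["size"])
    mask = int(m["mask"], 16)
    limit = int(m["limit"], 16)
    return {
        "addr_bits": addr.bit_length(),
        "mask_bits": mask.bit_length(),
        "limit_bits": limit.bit_length(),
        "size_kib": size / 1024,
        "fits_bus": addr + size - 1 <= limit,
    }


info = decode_overflow(LINE)
print(info)
```

The address needs 34 bits, the device mask is 64 bits wide and the bus limit is 32 bits wide, so the 264 KiB buffer cannot be mapped directly.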

Robin.

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 26 at kernel/dma/direct.c:35 report_addr+0xf0/0xf4
> Modules linked in:
> CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 5.5.0-00002-g1383adf7b819 #2
> Hardware name: Generic DRA74X (Flattened Device Tree)
> Workqueue: writeback wb_workfn (flush-259:0)
> (unwind_backtrace) from [<c020b494>] (show_stack+0x10/0x14)
> (show_stack) from [<c0a2ae24>] (dump_stack+0x94/0xa8)
> (dump_stack) from [<c022bbd8>] (__warn+0xbc/0xd8)
> (__warn) from [<c022bc54>] (warn_slowpath_fmt+0x60/0xb8)
> (warn_slowpath_fmt) from [<c0299928>] (report_addr+0xf0/0xf4)
> (report_addr) from [<c0299ab8>] (dma_direct_map_page+0x18c/0x19c)
> (dma_direct_map_page) from [<c0299b2c>] (dma_direct_map_sg+0x64/0xb4)
> (dma_direct_map_sg) from [<c071b12c>] (nvme_queue_rq+0x778/0x9ec)
> (nvme_queue_rq) from [<c050c8c8>] (__blk_mq_try_issue_directly+0x130/0x1bc)
> (__blk_mq_try_issue_directly) from [<c050d1b8>]
> (blk_mq_request_issue_directly+0x48/0x78)
> (blk_mq_request_issue_directly) from [<c050d22c>]
> (blk_mq_try_issue_list_directly+0x44/0xb8)
> (blk_mq_try_issue_list_directly) from [<c0511620>]
> (blk_mq_sched_insert_requests+0xe0/0x154)
> (blk_mq_sched_insert_requests) from [<c050d13c>]
> (blk_mq_flush_plug_list+0x150/0x184)
> (blk_mq_flush_plug_list) from [<c0502ec4>] (blk_flush_plug_list+0xc8/0xe4)
> (blk_flush_plug_list) from [<c050cc44>] (blk_mq_make_request+0x24c/0x3f0)
> (blk_mq_make_request) from [<c0501acc>] (generic_make_request+0xb0/0x2d4)
> (generic_make_request) from [<c0501d34>] (submit_bio+0x44/0x180)
> (submit_bio) from [<c039ad10>] (mpage_writepages+0xac/0xe8)
> (mpage_writepages) from [<c02f96dc>] (do_writepages+0x44/0xdc)
> (do_writepages) from [<c0384830>] (__writeback_single_inode+0x2c/0x1bc)
> (__writeback_single_inode) from [<c0384b98>]
> (writeback_sb_inodes+0x1d8/0x404)
> (writeback_sb_inodes) from [<c0384e1c>] (__writeback_inodes_wb+0x58/0x9c)
> (__writeback_inodes_wb) from [<c0384ff4>] (wb_writeback+0x194/0x1d8)
> (wb_writeback) from [<c0386104>] (wb_workfn+0x244/0x33c)
> (wb_workfn) from [<c0244ff8>] (process_one_work+0x204/0x458)
> (process_one_work) from [<c0245290>] (worker_thread+0x44/0x598)
> (worker_thread) from [<c024ab30>] (kthread+0x14c/0x150)
> (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
> 
> Thanks
> Kishon
> 


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-01-27 13:22             ` Robin Murphy
@ 2020-01-29  6:24               ` Kishon Vijay Abraham I
  0 siblings, 0 replies; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-01-29  6:24 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig; +Cc: linux-pci

Hi Robin,

On 27/01/20 6:52 pm, Robin Murphy wrote:
> Hi Kishon,
> 
> On 27/01/2020 1:10 pm, Kishon Vijay Abraham I wrote:
>> Hi Christoph, Robin,
>>
>> On 25/11/19 11:13 am, Kishon Vijay Abraham I wrote:
>>> Hi,
>>>
>>> On 18/11/19 10:51 PM, Robin Murphy wrote:
>>>> On 16/11/2019 4:35 pm, Christoph Hellwig wrote:
>>>>> On Fri, Nov 15, 2019 at 07:48:23PM +0530, Kishon Vijay Abraham I
>>>>> wrote:
>>>>>> I think the fix on 5.3 was useful for platform drivers (where the
>>>>>> platform
>>>>>> driver will set dma_set_mask as 32bits) even when the system
>>>>>> itself supports
>>>>>> LPAE.
>>>>>
>>>>> Well, we can also use the bus_dma_mask for PCI(e) root port quirks,
>>>>> as we do that for the VIA ones on x86.  But I think the OF parsing
>>>>> code
>>>>> is missing something here, and Robin did plan to look into that.
>>>>
>>>> Right, the correct way to describe this is with "dma-ranges" on the
>>>> host bridge
>>>> node, and there are patches queued in linux-next to (finally) handle
>>>> that
>>>> properly for the way we bodge dynamically-discovered endpoints through
>>>> of_dma_configure().
>>>
>>> Tried linux-next after adding dma-ranges property to the DRA7 RC dt
>>> node and
>>> don't see the issue anymore.
>>
>> Using the latest mainline kernel
>> commit d5226fa6dbae0569ee43ecfc08bdcd6770fc4755 (tag: v5.5,
>> origin/master, origin/HEAD)
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date:   Sun Jan 26 16:23:03 2020 -0800
>>
>>      Linux 5.5
>>
>> I see the following warn dump when using a NVMe card with LPAE config
>> enabled
>>
>> nvme 0000:01:00.0: overflow 0x000000027b3be000+270336 of DMA mask
> 
> That's a 34-bit physical address...
> 
>> ffffffffffffffff bus limit ffffffff
> 
> ...and that's your 32-bit PCI host bridge constraint. Thus the warning
> appears to be correct in that this is an attempt at an impossible direct
> DMA mapping. I'm assuming you do have RAM above the 32-bit boundary
> exposed by virtue of the LPAE config but don't have SWIOTLB enabled, is
> that the case?

I have RAM above the 32-bit boundary and I have SWIOTLB enabled as well.
I've pasted the complete .config here [1].

I'm seeing the issue only when I try with an NVMe card. The PCI USB card works
fine.
[1] -> https://pastebin.ubuntu.com/p/TxSGnXdBtw/

Thanks
Kishon

> 
> Robin.
> 
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 26 at kernel/dma/direct.c:35 report_addr+0xf0/0xf4
>> Modules linked in:
>> CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted
>> 5.5.0-00002-g1383adf7b819 #2
>> Hardware name: Generic DRA74X (Flattened Device Tree)
>> Workqueue: writeback wb_workfn (flush-259:0)
>> (unwind_backtrace) from [<c020b494>] (show_stack+0x10/0x14)
>> (show_stack) from [<c0a2ae24>] (dump_stack+0x94/0xa8)
>> (dump_stack) from [<c022bbd8>] (__warn+0xbc/0xd8)
>> (__warn) from [<c022bc54>] (warn_slowpath_fmt+0x60/0xb8)
>> (warn_slowpath_fmt) from [<c0299928>] (report_addr+0xf0/0xf4)
>> (report_addr) from [<c0299ab8>] (dma_direct_map_page+0x18c/0x19c)
>> (dma_direct_map_page) from [<c0299b2c>] (dma_direct_map_sg+0x64/0xb4)
>> (dma_direct_map_sg) from [<c071b12c>] (nvme_queue_rq+0x778/0x9ec)
>> (nvme_queue_rq) from [<c050c8c8>]
>> (__blk_mq_try_issue_directly+0x130/0x1bc)
>> (__blk_mq_try_issue_directly) from [<c050d1b8>]
>> (blk_mq_request_issue_directly+0x48/0x78)
>> (blk_mq_request_issue_directly) from [<c050d22c>]
>> (blk_mq_try_issue_list_directly+0x44/0xb8)
>> (blk_mq_try_issue_list_directly) from [<c0511620>]
>> (blk_mq_sched_insert_requests+0xe0/0x154)
>> (blk_mq_sched_insert_requests) from [<c050d13c>]
>> (blk_mq_flush_plug_list+0x150/0x184)
>> (blk_mq_flush_plug_list) from [<c0502ec4>]
>> (blk_flush_plug_list+0xc8/0xe4)
>> (blk_flush_plug_list) from [<c050cc44>] (blk_mq_make_request+0x24c/0x3f0)
>> (blk_mq_make_request) from [<c0501acc>] (generic_make_request+0xb0/0x2d4)
>> (generic_make_request) from [<c0501d34>] (submit_bio+0x44/0x180)
>> (submit_bio) from [<c039ad10>] (mpage_writepages+0xac/0xe8)
>> (mpage_writepages) from [<c02f96dc>] (do_writepages+0x44/0xdc)
>> (do_writepages) from [<c0384830>] (__writeback_single_inode+0x2c/0x1bc)
>> (__writeback_single_inode) from [<c0384b98>]
>> (writeback_sb_inodes+0x1d8/0x404)
>> (writeback_sb_inodes) from [<c0384e1c>] (__writeback_inodes_wb+0x58/0x9c)
>> (__writeback_inodes_wb) from [<c0384ff4>] (wb_writeback+0x194/0x1d8)
>> (wb_writeback) from [<c0386104>] (wb_workfn+0x244/0x33c)
>> (wb_workfn) from [<c0244ff8>] (process_one_work+0x204/0x458)
>> (process_one_work) from [<c0245290>] (worker_thread+0x44/0x598)
>> (worker_thread) from [<c024ab30>] (kthread+0x14c/0x150)
>> (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
>>
>> Thanks
>> Kishon
>>


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2019-11-15 10:59 pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers" Kishon Vijay Abraham I
  2019-11-15 13:06 ` Christoph Hellwig
@ 2020-01-30  7:58 ` Christoph Hellwig
  2020-01-30  8:09   ` Kishon Vijay Abraham I
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-01-30  7:58 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Fri, Nov 15, 2019 at 04:29:31PM +0530, Kishon Vijay Abraham I wrote:
> Hi Christoph,
> 
> I think we are encountering a case where the connected PCIe card (like PCIe USB
> card) supports 64-bit addressing and the ARM core supports 64-bit addressing
> but the PCIe controller in the SoC to which PCIe card is connected supports
> only 32-bits.
> 
> Here dma APIs can provide an address above the 32 bit region to the PCIe card.
> However this will fail when the card tries to access the provided address via
> the PCIe controller.

What kernel version are you testing?  The classic arm version of dma_capable
doesn't take the bus dma mask into account.  In Linux 5.5 I switched
ARM to use the generic version in

130c1ccbf55 ("dma-direct: unify the dma_capable definitions")

so with that this case is supposed to work, without that it doesn't
have much of a chance.
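
The difference can be illustrated with two simplified checks in Python. They are models only, assuming that the older check looks at nothing but the device's own mask and that the unified one also honours the bus limit. The function names are hypothetical and the real kernel functions carry more conditions than shown here.

```python
def min_not_zero(a: int, b: int) -> int:
    """Smaller of two limits, treating 0 as 'not set'."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)


def dma_capable_mask_only(addr: int, size: int, dev_mask: int) -> bool:
    """Model of a check that ignores the bus limit."""
    return addr + size - 1 <= dev_mask


def dma_capable_with_bus_limit(addr: int, size: int,
                               dev_mask: int, bus_limit: int) -> bool:
    """Model of a check that honours both the device mask and the bus limit."""
    return addr + size - 1 <= min_not_zero(dev_mask, bus_limit)


# Values taken from the warning line reported later in the thread.
ADDR, SIZE = 0x27B3BE000, 270336
DEV_MASK, BUS_LIMIT = (1 << 64) - 1, (1 << 32) - 1

print(dma_capable_mask_only(ADDR, SIZE, DEV_MASK))
print(dma_capable_with_bus_limit(ADDR, SIZE, DEV_MASK, BUS_LIMIT))
```

With the mask-only model the high address is accepted and handed to the device. With the bus limit included it is rejected, which is the point at which a bounce buffer would have to take over.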


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-01-30  7:58 ` Christoph Hellwig
@ 2020-01-30  8:09   ` Kishon Vijay Abraham I
  2020-01-30 16:42     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-01-30  8:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Hi Christoph,

On 30/01/20 1:28 pm, Christoph Hellwig wrote:
> On Fri, Nov 15, 2019 at 04:29:31PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Christoph,
>>
>> I think we are encountering a case where the connected PCIe card (like PCIe USB
>> card) supports 64-bit addressing and the ARM core supports 64-bit addressing
>> but the PCIe controller in the SoC to which PCIe card is connected supports
>> only 32-bits.
>>
>> Here dma APIs can provide an address above the 32 bit region to the PCIe card.
>> However this will fail when the card tries to access the provided address via
>> the PCIe controller.
> 
> What kernel version do you test?  The classic arm version of dma_capable
> doesn't take the bus dma mask into account.  In Linux 5.5 I switched
> ARM to use the generic version in
> 
> 130c1ccbf55 ("dma-direct: unify the dma_capable definitions")
> 
> so with that this case is supposed to work, without that it doesn't
> have much of a chance.

I ran into a new issue on the 5.5 kernel with an NVMe card, where I get the
warning dump below. This is different from the issue I initially reported
with USB and SATA cards (I was getting a data mismatch then). With the
5.5 kernel I no longer see those issues with the USB card. I only see the
warning dump below with the NVMe card.

nvme 0000:01:00.0: overflow 0x000000027b3be000+270336 of DMA mask ffffffffffffffff bus limit ffffffff
------------[ cut here ]------------
WARNING: CPU: 0 PID: 26 at kernel/dma/direct.c:35 report_addr+0xf0/0xf4
Modules linked in:
CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 5.5.0-00002-g1383adf7b819 #2
Hardware name: Generic DRA74X (Flattened Device Tree)
Workqueue: writeback wb_workfn (flush-259:0)
(unwind_backtrace) from [<c020b494>] (show_stack+0x10/0x14)
(show_stack) from [<c0a2ae24>] (dump_stack+0x94/0xa8)
(dump_stack) from [<c022bbd8>] (__warn+0xbc/0xd8)
(__warn) from [<c022bc54>] (warn_slowpath_fmt+0x60/0xb8)
(warn_slowpath_fmt) from [<c0299928>] (report_addr+0xf0/0xf4)
(report_addr) from [<c0299ab8>] (dma_direct_map_page+0x18c/0x19c)
(dma_direct_map_page) from [<c0299b2c>] (dma_direct_map_sg+0x64/0xb4)
(dma_direct_map_sg) from [<c071b12c>] (nvme_queue_rq+0x778/0x9ec)
(nvme_queue_rq) from [<c050c8c8>] (__blk_mq_try_issue_directly+0x130/0x1bc)
(__blk_mq_try_issue_directly) from [<c050d1b8>] (blk_mq_request_issue_directly+0x48/0x78)
(blk_mq_request_issue_directly) from [<c050d22c>] (blk_mq_try_issue_list_directly+0x44/0xb8)
(blk_mq_try_issue_list_directly) from [<c0511620>] (blk_mq_sched_insert_requests+0xe0/0x154)
(blk_mq_sched_insert_requests) from [<c050d13c>] (blk_mq_flush_plug_list+0x150/0x184)
(blk_mq_flush_plug_list) from [<c0502ec4>] (blk_flush_plug_list+0xc8/0xe4)
(blk_flush_plug_list) from [<c050cc44>] (blk_mq_make_request+0x24c/0x3f0)
(blk_mq_make_request) from [<c0501acc>] (generic_make_request+0xb0/0x2d4)
(generic_make_request) from [<c0501d34>] (submit_bio+0x44/0x180)
(submit_bio) from [<c039ad10>] (mpage_writepages+0xac/0xe8)
(mpage_writepages) from [<c02f96dc>] (do_writepages+0x44/0xdc)
(do_writepages) from [<c0384830>] (__writeback_single_inode+0x2c/0x1bc)
(__writeback_single_inode) from [<c0384b98>] (writeback_sb_inodes+0x1d8/0x404)
(writeback_sb_inodes) from [<c0384e1c>] (__writeback_inodes_wb+0x58/0x9c)
(__writeback_inodes_wb) from [<c0384ff4>] (wb_writeback+0x194/0x1d8)
(wb_writeback) from [<c0386104>] (wb_workfn+0x244/0x33c)
(wb_workfn) from [<c0244ff8>] (process_one_work+0x204/0x458)
(process_one_work) from [<c0245290>] (worker_thread+0x44/0x598)
(worker_thread) from [<c024ab30>] (kthread+0x14c/0x150)
(kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)

Thanks
Kishon


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-01-30  8:09   ` Kishon Vijay Abraham I
@ 2020-01-30 16:42     ` Christoph Hellwig
  2020-01-31 11:44       ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-01-30 16:42 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Thu, Jan 30, 2020 at 01:39:58PM +0530, Kishon Vijay Abraham I wrote:
> Hi Christoph,
> 
> On 30/01/20 1:28 pm, Christoph Hellwig wrote:
> > On Fri, Nov 15, 2019 at 04:29:31PM +0530, Kishon Vijay Abraham I wrote:
> >> Hi Christoph,
> >>
> >> I think we are encountering a case where the connected PCIe card (like PCIe USB
> >> card) supports 64-bit addressing and the ARM core supports 64-bit addressing
> >> but the PCIe controller in the SoC to which PCIe card is connected supports
> >> only 32-bits.
> >>
> >> Here dma APIs can provide an address above the 32 bit region to the PCIe card.
> >> However this will fail when the card tries to access the provided address via
> >> the PCIe controller.
> > 
> > What kernel version do you test?  The classic arm version of dma_capable
> > doesn't take the bus dma mask into account.  In Linux 5.5 I switched
> > ARM to use the generic version in
> > 
> > 130c1ccbf55 ("dma-direct: unify the dma_capable definitions")
> > 
> > so with that this case is supposed to work, without that it doesn't
> > have much of a chance.
> 
> I got into a new issue in 5.5 kernel with NVMe card wherein I get the
> below warn dump. This is different from the issue I initially posted
> seen with USB and SATA cards (I was getting a data mismatch then). With
> 5.5 kernel I don't see those issues anymore in USB card. I only see the
> below warn dump with NVMe card.

Can you throw in a little debug printk to see whether this comes from
dma_direct_possible or swiotlb_map?


* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-01-30 16:42     ` Christoph Hellwig
@ 2020-01-31 11:44       ` Kishon Vijay Abraham I
  2020-02-03 14:21         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-01-31 11:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Hi Christoph,

On 30/01/20 10:12 pm, Christoph Hellwig wrote:
> On Thu, Jan 30, 2020 at 01:39:58PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Christoph,
>>
>> On 30/01/20 1:28 pm, Christoph Hellwig wrote:
>>> On Fri, Nov 15, 2019 at 04:29:31PM +0530, Kishon Vijay Abraham I wrote:
>>>> Hi Christoph,
>>>>
>>>> I think we are encountering a case where the connected PCIe card (like PCIe USB
>>>> card) supports 64-bit addressing and the ARM core supports 64-bit addressing
>>>> but the PCIe controller in the SoC to which PCIe card is connected supports
>>>> only 32-bits.
>>>>
>>>> Here dma APIs can provide an address above the 32 bit region to the PCIe card.
>>>> However this will fail when the card tries to access the provided address via
>>>> the PCIe controller.
>>>
>>> What kernel version do you test?  The classic arm version of dma_capable
>>> doesn't take the bus dma mask into account.  In Linux 5.5 I switched
>>> ARM to use the generic version in
>>>
>>> 130c1ccbf55 ("dma-direct: unify the dma_capable definitions")
>>>
>>> so with that this case is supposed to work, without that it doesn't
>>> have much of a chance.
>>
>> I got into a new issue in 5.5 kernel with NVMe card wherein I get the
>> below warn dump. This is different from the issue I initially posted
>> seen with USB and SATA cards (I was getting a data mismatch then). With
>> 5.5 kernel I don't see those issues anymore in USB card. I only see the
>> below warn dump with NVMe card.
> 
> Can you throw in a little debug printk if this comes from
> dma_direct_possible or swiotlb_map?

I could see swiotlb_tbl_map_single() returning DMA_MAPPING_ERROR.

Kernel with debug print:
https://github.com/kishon/linux-wip.git nvm_dma_issue

Full log: https://pastebin.ubuntu.com/p/Xf2ngxc3kB/

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-01-31 11:44       ` Kishon Vijay Abraham I
@ 2020-02-03 14:21         ` Christoph Hellwig
  2020-02-05  5:15           ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-02-03 14:21 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

[-- Attachment #1: Type: text/plain, Size: 637 bytes --]

On Fri, Jan 31, 2020 at 05:14:01PM +0530, Kishon Vijay Abraham I wrote:
> > Can you throw in a little debug printk if this comes from
> > dma_direct_possible or swiotlb_map?
> 
> I could see swiotlb_tbl_map_single() returning DMA_MAPPING_ERROR.
> 
> Kernel with debug print:
> https://github.com/kishon/linux-wip.git nvm_dma_issue
> 
> Full log: https://pastebin.ubuntu.com/p/Xf2ngxc3kB/

Ok, this most likely means we allocate a swiotlb buffer that isn't
actually addressable.  To verify that can you post the output with the
first attached patch?  If it shows the overflow message added there,
please try if the second patch fixes it.
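
As a rough illustration of what "addressable" means here, the following
user-space sketch models the check, assuming the effective limit is the
smaller nonzero value of the device DMA mask and the bus DMA limit. The
helper names (min_not_zero_u64, addr_is_dma_capable) are hypothetical and
only loosely modelled on the kernel's dma_capable(); this is not the kernel
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smaller of two values, treating zero as "no limit set". */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Hypothetical model: a buffer is reachable only if its last byte lies at or
 * below both the device DMA mask and the bus DMA limit.
 */
static bool addr_is_dma_capable(uint64_t dma_addr, uint64_t size,
				uint64_t dev_mask, uint64_t bus_limit)
{
	assert(size > 0);
	uint64_t end = dma_addr + size - 1;

	return end <= min_not_zero_u64(dev_mask, bus_limit);
}
```

With a 64-bit device mask and a 32-bit bus limit, a bounce buffer at
0x80000000 passes this check, while one placed above 4G does not. The second
patch tries to avoid the latter case by allocating the swiotlb buffer bottom
up.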

[-- Attachment #2: 0001-dma-direct-improve-swiotlb-error-reporting.patch --]
[-- Type: text/x-patch, Size: 5864 bytes --]

From b72e7e81954c02e83f59f0caa56360d6faab0355 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 3 Feb 2020 14:44:38 +0100
Subject: dma-direct: improve swiotlb error reporting

Untangle the way how dma_direct_map_page calls into swiotlb to
be able to properly report errors where the swiotlb DMA address
overflows the mask separately from overflows in the !swiotlb case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/swiotlb.h | 11 +++--------
 kernel/dma/direct.c     | 17 ++++++++---------
 kernel/dma/swiotlb.c    | 42 +++++++++++++++++++++++------------------
 3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index cde3dc18e21a..046bb94bd4d6 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -64,6 +64,9 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 				    size_t size, enum dma_data_direction dir,
 				    enum dma_sync_target target);
 
+dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs);
+
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 extern phys_addr_t io_tlb_start, io_tlb_end;
@@ -73,8 +76,6 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 	return paddr >= io_tlb_start && paddr < io_tlb_end;
 }
 
-bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -85,12 +86,6 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
 	return false;
 }
-static inline bool swiotlb_map(struct device *dev, phys_addr_t *phys,
-		dma_addr_t *dma_addr, size_t size, enum dma_data_direction dir,
-		unsigned long attrs)
-{
-	return false;
-}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6af7ae83c4ad..e16baa9aa233 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -357,13 +357,6 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 EXPORT_SYMBOL(dma_direct_unmap_sg);
 #endif
 
-static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
-		size_t size)
-{
-	return swiotlb_force != SWIOTLB_FORCE &&
-		dma_capable(dev, dma_addr, size, true);
-}
-
 dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
@@ -371,8 +364,14 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-	if (unlikely(!dma_direct_possible(dev, dma_addr, size)) &&
-	    !swiotlb_map(dev, &phys, &dma_addr, size, dir, attrs)) {
+	if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+		return swiotlb_map(dev, phys, size, dir, attrs);
+
+	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+		if (IS_ENABLED(CONFIG_SWIOTLB) &&
+		    swiotlb_force != SWIOTLB_NO_FORCE)
+			return swiotlb_map(dev, phys, size, dir, attrs);
+
 		report_addr(dev, dma_addr, size);
 		return DMA_MAPPING_ERROR;
 	}
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9280d6f8271e..0341d01e4614 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -22,6 +22,7 @@
 
 #include <linux/cache.h>
 #include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
 #include <linux/mm.h>
 #include <linux/export.h>
 #include <linux/spinlock.h>
@@ -656,35 +657,40 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 }
 
 /*
- * Create a swiotlb mapping for the buffer at @phys, and in case of DMAing
+ * Create a swiotlb mapping for the buffer at @page, and in case of DMAing
  * to the device copy the data into it as well.
  */
-bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
+dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
-	trace_swiotlb_bounced(dev, *dma_addr, size, swiotlb_force);
+	phys_addr_t swiotlb_addr;
+	dma_addr_t dma_addr;
 
-	if (unlikely(swiotlb_force == SWIOTLB_NO_FORCE)) {
-		dev_warn_ratelimited(dev,
-			"Cannot do DMA to address %pa\n", phys);
-		return false;
-	}
+	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size,
+			      swiotlb_force);
 
 	/* Oh well, have to allocate and map a bounce buffer. */
-	*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
-			*phys, size, size, dir, attrs);
-	if (*phys == (phys_addr_t)DMA_MAPPING_ERROR)
-		return false;
+	swiotlb_addr = swiotlb_tbl_map_single(dev,
+			__phys_to_dma(dev, io_tlb_start),
+			paddr, size, size, dir, attrs);
+	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
+		return DMA_MAPPING_ERROR;
 
 	/* Ensure that the address returned is DMA'ble */
-	*dma_addr = __phys_to_dma(dev, *phys);
-	if (unlikely(!dma_capable(dev, *dma_addr, size, true))) {
-		swiotlb_tbl_unmap_single(dev, *phys, size, size, dir,
+	dma_addr = __phys_to_dma(dev, swiotlb_addr);
+	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, size, dir,
 			attrs | DMA_ATTR_SKIP_CPU_SYNC);
-		return false;
+		dev_err_once(dev,
+			"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+		WARN_ON_ONCE(1);
+		return DMA_MAPPING_ERROR;
 	}
 
-	return true;
+	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		arch_sync_dma_for_device(swiotlb_addr, size, dir);
+	return dma_addr;
 }
 
 size_t swiotlb_max_mapping_size(struct device *dev)
-- 
2.24.1


[-- Attachment #3: 0003-arm-dma-mapping-allocate-swiotlb-bottom-up.patch --]
[-- Type: text/x-patch, Size: 828 bytes --]

From d15217ee1e1f361ab064dfed82252b4124dd6b36 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 3 Feb 2020 14:57:57 +0100
Subject: arm/dma-mapping: allocate swiotlb bottom up

Allocate the swiotlb buffer as low as possible to increase the chance
of it to be actually addressable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 3ef204137e73..3951fcd560ff 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -471,7 +471,9 @@ static void __init free_highpages(void)
 void __init mem_init(void)
 {
 #ifdef CONFIG_ARM_LPAE
+	memblock_set_bottom_up(true);
 	swiotlb_init(1);
+	memblock_set_bottom_up(false);
 #endif
 
 	set_max_mapnr(pfn_to_page(max_pfn) - mem_map);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-03 14:21         ` Christoph Hellwig
@ 2020-02-05  5:15           ` Kishon Vijay Abraham I
  2020-02-05  7:47             ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-02-05  5:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Hi Christoph,

On 03/02/20 7:51 PM, Christoph Hellwig wrote:
> On Fri, Jan 31, 2020 at 05:14:01PM +0530, Kishon Vijay Abraham I wrote:
>>> Can you throw in a little debug printk if this comes from
>>> dma_direct_possible or swiotlb_map?
>>
>> I could see swiotlb_tbl_map_single() returning DMA_MAPPING_ERROR.
>>
>> Kernel with debug print:
>> https://github.com/kishon/linux-wip.git nvm_dma_issue
>>
>> Full log: https://pastebin.ubuntu.com/p/Xf2ngxc3kB/
> 
> Ok, this most likely means we allocate a swiotlb buffer that isn't
> actually addressable.  To verify that can you post the output with the
> first attached patch?  If it shows the overflow message added there,
> please try if the second patch fixes it.

I'm seeing some sort of busy loop after applying your 1st patch. I sent
a SysRq to see where it is stuck

[  182.641398] sysrq: Show Blocked State
[  182.645080]   task                PC stack   pid father
[  182.650359] sync            D    0  2101   1901 0x00000000
[  182.655889] [<c0a399b8>] (__schedule) from [<c0a39e54>] (schedule+0xa0/0x138)
[  182.663063] [<c0a39e54>] (schedule) from [<c0a3a484>] (io_schedule+0x14/0x34)
[  182.670237] [<c0a3a484>] (io_schedule) from [<c02eebec>] (wait_on_page_bit+0x14c/0x1a8)
[  182.678283] [<c02eebec>] (wait_on_page_bit) from [<c02edaa4>] (__filemap_fdatawait_range+0x94/0xec)
[  182.687374] [<c02edaa4>] (__filemap_fdatawait_range) from [<c02edb8c>] (filemap_fdatawait_keep_errors+0x24/0x50)
[  182.697601] [<c02edb8c>] (filemap_fdatawait_keep_errors) from [<c0385a84>] (sync_inodes_sb+0x1a8/0x23c)
[  182.707041] [<c0385a84>] (sync_inodes_sb) from [<c035bd84>] (iterate_supers+0x88/0xdc)
[  182.714998] [<c035bd84>] (iterate_supers) from [<c0389be4>] (ksys_sync+0x40/0xb8)
[  182.722519] [<c0389be4>] (ksys_sync) from [<c0389c64>] (sys_sync+0x8/0x10)
[  182.729429] [<c0389c64>] (sys_sync) from [<c0201000>] (ret_fast_syscall+0x0/0x4c)
[  182.736943] Exception stack(0xe5461fa8 to 0xe5461ff0)
[  182.742016] 1fa0:                   be8a8db4 be8a8db8 00000000 ffffffff 00000000 0009d6b8
[  182.750230] 1fc0: be8a8db4 be8a8db8 00000001 00000024 be8a8db4 00000000 b6f0b000 0009d6dc
[  182.758443] 1fe0: b6e2e3d0 be8a8c1c 00064821 b6e2e3dc

Full log here: https://pastebin.ubuntu.com/p/q6yDtP9vxR/

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  5:15           ` Kishon Vijay Abraham I
@ 2020-02-05  7:47             ` Christoph Hellwig
  2020-02-05  8:32               ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-02-05  7:47 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Wed, Feb 05, 2020 at 10:45:24AM +0530, Kishon Vijay Abraham I wrote:
> > Ok, this most likely means we allocate a swiotlb buffer that isn't
> > actually addressable.  To verify that can you post the output with the
> > first attached patch?  If it shows the overflow message added there,
> > please try if the second patch fixes it.
> 
> I'm seeing some sort of busy loop after applying your 1st patch. I sent
> a SysRq to see where it is stuck

And that shows up just with the patch?  Really strange as it doesn't
change any blocking points.  What also is strange is that I don't see
any of the warnings that should be there.  FYI, the slightly updated
version of the patch that went through my testing is here:

    git://git.infradead.org/users/hch/misc.git swiotlb-debug

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-debug

This also includes what was the second patch in the previous mail.  Can
you try that branch?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  7:47             ` Christoph Hellwig
@ 2020-02-05  8:32               ` Kishon Vijay Abraham I
  2020-02-05  8:48                 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-02-05  8:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Christoph,

On 05/02/20 1:17 PM, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 10:45:24AM +0530, Kishon Vijay Abraham I wrote:
>>> Ok, this most likely means we allocate a swiotlb buffer that isn't
>>> actually addressable.  To verify that can you post the output with the
>>> first attached patch?  If it shows the overflow message added there,
>>> please try if the second patch fixes it.
>>
>> I'm seeing some sort of busy loop after applying your 1st patch. I sent
>> a SysRq to see where it is stuck
> 
> And that shows up just with the patch?  Really strange as it doesn't
> change any blocking points.  What also is strange is that I don't see
> any of the warnings that should be there.  FYI, the slightly updated
> version of the patch that went through my testing is here:
> 
>     git://git.infradead.org/users/hch/misc.git swiotlb-debug
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-debug
> 
> This also includes what was the second patch in the previous mail.  Can
> you try that branch?

I see data mismatch with that branch.

Kernel log: https://pastebin.ubuntu.com/p/9g9cm7GzRh/
Kernel Config: https://pastebin.ubuntu.com/p/gYfpRDdVry/
Repo: https://github.com/kishon/linux-wip.git swiotlb-debug (Added an
additional patch on top of your branch to fix an interrupt issue).

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  8:32               ` Kishon Vijay Abraham I
@ 2020-02-05  8:48                 ` Christoph Hellwig
  2020-02-05  9:18                   ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-02-05  8:48 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Wed, Feb 05, 2020 at 02:02:51PM +0530, Kishon Vijay Abraham I wrote:
> > you try that branch?
> 
> I see data mismatch with that branch.

But previously it didn't work at all? If you disable LPAE and thus
limit the available RAM, does it work without any fixes?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  8:48                 ` Christoph Hellwig
@ 2020-02-05  9:18                   ` Kishon Vijay Abraham I
  2020-02-05  9:19                     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-02-05  9:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Christoph,

On 05/02/20 2:18 PM, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 02:02:51PM +0530, Kishon Vijay Abraham I wrote:
>>> you try that branch?
>>
>> I see data mismatch with that branch.
> 
> But previously it didn't work at all? If you disable LPAE and thus
> limit the available RAM, does it work without any fixes?

Previously there was a warn dump and it gets stuck.

With the branch you shared (with LPAE enabled), there was data mismatch.
With the branch you shared (with LPAE disabled), things work fine
(https://pastebin.ubuntu.com/p/kPNdsJd7ds/)

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  9:18                   ` Kishon Vijay Abraham I
@ 2020-02-05  9:19                     ` Christoph Hellwig
  2020-02-05  9:33                       ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-02-05  9:19 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Wed, Feb 05, 2020 at 02:48:17PM +0530, Kishon Vijay Abraham I wrote:
> Christoph,
> 
> On 05/02/20 2:18 PM, Christoph Hellwig wrote:
> > On Wed, Feb 05, 2020 at 02:02:51PM +0530, Kishon Vijay Abraham I wrote:
> >>> you try that branch?
> >>
> >> I see data mismatch with that branch.
> > 
> > But previously it didn't work at all? If you disable LPAE and thus
> > limit the available RAM, does it work without any fixes?
> 
> Previously there was a warn dump and it gets stuck.
> 
> With the branch you shared (with LPAE enabled), there was data mismatch.
> With the branch you shared (with LPAE disabled), things work fine
> (https://pastebin.ubuntu.com/p/kPNdsJd7ds/)

Does the miscompare still happen if you revert:

 "dma-direct: improve DMA mask overflow reporting"

and

 "dma-direct: improve swiotlb error reporting"

?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  9:19                     ` Christoph Hellwig
@ 2020-02-05  9:33                       ` Kishon Vijay Abraham I
  2020-02-05 16:05                         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-02-05  9:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Christoph,

On 05/02/20 2:49 PM, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 02:48:17PM +0530, Kishon Vijay Abraham I wrote:
>> Christoph,
>>
>> On 05/02/20 2:18 PM, Christoph Hellwig wrote:
>>> On Wed, Feb 05, 2020 at 02:02:51PM +0530, Kishon Vijay Abraham I wrote:
>>>>> you try that branch?
>>>>
>>>> I see data mismatch with that branch.
>>>
>>> But previously it didn't work at all? If you disable LPAE and thus
>>> limit the available RAM, does it work without any fixes?
>>
>> Previously there was a warn dump and it gets stuck.
>>
>> With the branch you shared (with LPAE enabled), there was data mismatch.
>> With the branch you shared (with LPAE disabled), things work fine
>> (https://pastebin.ubuntu.com/p/kPNdsJd7ds/)
> 
> Does the miscompare still happen if you revert:
> 
>  "dma-direct: improve DMA mask overflow reporting"
> 
> and
> 
>  "dma-direct: improve swiotlb error reporting"
> 
> ?

Yes, I see the mismatch after reverting the above patches.

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05  9:33                       ` Kishon Vijay Abraham I
@ 2020-02-05 16:05                         ` Christoph Hellwig
  2020-02-17 14:23                           ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-02-05 16:05 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Wed, Feb 05, 2020 at 03:03:13PM +0530, Kishon Vijay Abraham I wrote:
> Yes, I see the mismatch after reverting the above patches.

In which case the data mismatch is very likely due to a different root
cause.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-05 16:05                         ` Christoph Hellwig
@ 2020-02-17 14:23                           ` Christoph Hellwig
  2020-02-18 12:15                             ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-02-17 14:23 UTC (permalink / raw)
  To: Kishon Vijay Abraham I; +Cc: Christoph Hellwig, linux-pci

On Wed, Feb 05, 2020 at 05:05:42PM +0100, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 03:03:13PM +0530, Kishon Vijay Abraham I wrote:
> > Yes, I see the mismatch after reverting the above patches.
> 
> In which case the data mismatch is very likely due to a different root
> cause.

Did you manage to dig into this a little more?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-17 14:23                           ` Christoph Hellwig
@ 2020-02-18 12:15                             ` Kishon Vijay Abraham I
  2020-04-02 12:01                               ` Kishon Vijay Abraham I
  0 siblings, 1 reply; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-02-18 12:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Christoph,

On 17/02/20 7:53 pm, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 05:05:42PM +0100, Christoph Hellwig wrote:
>> On Wed, Feb 05, 2020 at 03:03:13PM +0530, Kishon Vijay Abraham I wrote:
>>> Yes, I see the mismatch after reverting the above patches.
>>
>> In which case the data mismatch is very likely due to a different root
>> cause.
> 
> Did you manage to dig into this a little more?

I'll probably get to this in the latter half of this week. Will update you then.

Thanks
Kishon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers"
  2020-02-18 12:15                             ` Kishon Vijay Abraham I
@ 2020-04-02 12:01                               ` Kishon Vijay Abraham I
  0 siblings, 0 replies; 25+ messages in thread
From: Kishon Vijay Abraham I @ 2020-04-02 12:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-pci

Hi Christoph,

On 2/18/2020 5:45 PM, Kishon Vijay Abraham I wrote:
> Christoph,
> 
> On 17/02/20 7:53 pm, Christoph Hellwig wrote:
>> On Wed, Feb 05, 2020 at 05:05:42PM +0100, Christoph Hellwig wrote:
>>> On Wed, Feb 05, 2020 at 03:03:13PM +0530, Kishon Vijay Abraham I wrote:
>>>> Yes, I see the mismatch after reverting the above patches.
>>>
>>> In which case the data mismatch is very likely due to a different root
>>> cause.
>>
>> Did you manage to dig into this a little more?
> 
> I'll probably get to this in the latter half of this week. Will update you then.
> 

Sorry for the delay in getting back to this. But I guess I have root caused the
issue now.

The issue was because NVMe is requesting a sector size (4096KB) which is more
than what is supported by SWIOTLB default (256KB). The NVMe driver actually has
a mechanism to select the correct sector size:

 dev->ctrl.max_hw_sectors = min_t(u32,
                 NVME_MAX_KB_SZ << 1, dma_max_mapping_size(dev->dev) >> 9);

However, dma_max_mapping_size() here misbehaves and gives 4G. Ideally it should
have given 256KB, the max supported by SWIOTLB.
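
To make the arithmetic concrete, here is a small user-space sketch of that
min_t() expression. MAX_KB_SZ is a stand-in for NVME_MAX_KB_SZ, with its value
assumed from the 4096KB figure above, and the function name is hypothetical.
The "4G" case is modelled as the 32-bit SIZE_MAX, which is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_KB_SZ 4096u /* assumed from the 4096KB figure quoted above */

/* Model of: min_t(u32, NVME_MAX_KB_SZ << 1, dma_max_mapping_size() >> 9) */
static uint32_t max_hw_sectors(uint64_t max_mapping_bytes)
{
	assert(max_mapping_bytes >= 512); /* at least one 512-byte sector */

	uint32_t from_driver = MAX_KB_SZ << 1;                  /* KB -> sectors */
	uint32_t from_dma = (uint32_t)(max_mapping_bytes >> 9); /* bytes -> sectors */

	return from_driver < from_dma ? from_driver : from_dma;
}
```

With a 256KB mapping limit this yields 512 sectors (256KB per request). With
a limit of 0xffffffff bytes it yields 8192 sectors, i.e. the full 4096KB,
which is more than a single SWIOTLB mapping can hold.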

Tracing through dma_max_mapping_size(), I found that
dma_direct_max_mapping_size() was giving an incorrect value:

size_t dma_direct_max_mapping_size(struct device *dev)
{
        /* If SWIOTLB is active, use its maximum mapping size */
        if (is_swiotlb_active() &&
            (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
                return swiotlb_max_mapping_size(dev);
        return SIZE_MAX;
}

In the above function swiotlb_max_mapping_size(dev) gives 256KB; however,
dma_addressing_limited(dev) always returns false. So 256KB is never returned to
the NVMe driver.
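
The effect of that condition can be sketched in user space as below. This
assumes dma_addressing_limited() boils down to comparing the device's
effective mask with the required mask; the function names and the use of
UINT32_MAX in place of the 32-bit SIZE_MAX are illustrative assumptions, not
the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWIOTLB_MAX_MAPPING (256u * 1024u) /* 256KB, the figure quoted above */

/* Assumed model: limited when the device reaches less than what is required. */
static bool addressing_limited(uint64_t effective_mask, uint64_t required_mask)
{
	assert(effective_mask != 0);
	return effective_mask < required_mask;
}

static uint64_t max_mapping_size(bool swiotlb_active, bool swiotlb_forced,
				 uint64_t effective_mask, uint64_t required_mask)
{
	if (swiotlb_active &&
	    (addressing_limited(effective_mask, required_mask) || swiotlb_forced))
		return SWIOTLB_MAX_MAPPING;
	return UINT32_MAX; /* stands in for SIZE_MAX on a 32-bit kernel */
}
```

If the required mask is wrongly computed as fitting in 32 bits, a device
behind a 32-bit bus limit is not considered limited and the large value is
returned. With a correct required mask above 32 bits, the 256KB limit is
returned instead.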

Tracing dma_addressing_limited(dev), I found a bug in
dma_direct_get_required_mask(). The expression (max_pfn - 1) << PAGE_SHIFT is
evaluated in unsigned long, which is 32 bits here, so the upper 32 bits of the
physical address are lost before it is passed to phys_to_dma_direct(), and
dma_addressing_limited(dev) thinks the entire address range is accessible by
the device.

A patch that casts the argument of phys_to_dma_direct() to phys_addr_t before
the shift, like below, fixes the issue.

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 32ec69cdba54..0081410334c8 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -51,7 +51,9 @@ static inline struct page *dma_direct_to_page(struct device *dev,
 u64 dma_direct_get_required_mask(struct device *dev)
 {
-       u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
+       u64 max_dma =
+               phys_to_dma_direct(dev,
+                                  (phys_addr_t)(max_pfn - 1) << PAGE_SHIFT);
        return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
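
The truncation can be reproduced in user space with fixed-width types. The
sketch below assumes a 32-bit unsigned long (modelled as uint32_t) and a
64-bit phys_addr_t (modelled as uint64_t); fls64_model() is a hypothetical
stand-in for the kernel's fls64(), and the max_pfn value mentioned below is
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* 1-based index of the highest set bit, 0 if no bit is set. */
static int fls64_model(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Same formula as the return statement in the diff above. */
static uint64_t required_mask(uint64_t max_dma)
{
	assert(max_dma != 0);
	return (1ULL << (fls64_model(max_dma) - 1)) * 2 - 1;
}

/* Shift performed in 32 bits: bits above bit 31 are lost. */
static uint64_t max_phys_truncated(uint32_t max_pfn)
{
	return (uint32_t)((max_pfn - 1) << PAGE_SHIFT);
}

/* Widen to 64 bits before shifting, as the cast in the patch does. */
static uint64_t max_phys_widened(uint32_t max_pfn)
{
	return (uint64_t)(max_pfn - 1) << PAGE_SHIFT;
}
```

For a hypothetical max_pfn of 0x880000, the truncated path gives 0x7ffff000
and a required mask of 0x7fffffff, which fits under a 32-bit limit. The
widened path gives 0x87ffff000 and a required mask of 0xfffffffff, which
does not, so the device would be reported as addressing limited.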

If this looks okay to you, I can post a patch for it.

Thanks
Kishon

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-04-02 12:01 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-15 10:59 pci-usb/pci-sata broken with LPAE config after "reduce use of block bounce buffers" Kishon Vijay Abraham I
2019-11-15 13:06 ` Christoph Hellwig
2019-11-15 14:18   ` Kishon Vijay Abraham I
2019-11-16 16:35     ` Christoph Hellwig
2019-11-18 17:21       ` Robin Murphy
2019-11-25  5:43         ` Kishon Vijay Abraham I
2020-01-27 13:10           ` Kishon Vijay Abraham I
2020-01-27 13:22             ` Robin Murphy
2020-01-29  6:24               ` Kishon Vijay Abraham I
2020-01-30  7:58 ` Christoph Hellwig
2020-01-30  8:09   ` Kishon Vijay Abraham I
2020-01-30 16:42     ` Christoph Hellwig
2020-01-31 11:44       ` Kishon Vijay Abraham I
2020-02-03 14:21         ` Christoph Hellwig
2020-02-05  5:15           ` Kishon Vijay Abraham I
2020-02-05  7:47             ` Christoph Hellwig
2020-02-05  8:32               ` Kishon Vijay Abraham I
2020-02-05  8:48                 ` Christoph Hellwig
2020-02-05  9:18                   ` Kishon Vijay Abraham I
2020-02-05  9:19                     ` Christoph Hellwig
2020-02-05  9:33                       ` Kishon Vijay Abraham I
2020-02-05 16:05                         ` Christoph Hellwig
2020-02-17 14:23                           ` Christoph Hellwig
2020-02-18 12:15                             ` Kishon Vijay Abraham I
2020-04-02 12:01                               ` Kishon Vijay Abraham I
