* [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
@ 2023-04-19 15:26 Gaurav Batra
2023-04-20 15:21 ` Michael Ellerman
0 siblings, 1 reply; 9+ messages in thread
From: Gaurav Batra @ 2023-04-19 15:26 UTC (permalink / raw)
To: mpe; +Cc: Brian King, linuxppc-dev, Greg Joyce, Gaurav Batra
When a DMA window is backed by 2MB TCEs, the DMA address for a mapped
page should be the offset of the page relative to the 2MB TCE. The code
was incorrectly setting the DMA address to the beginning of the TCE
range.
The Mellanox driver reports a timeout trying to ENABLE_HCA for an SR-IOV
ethernet port when the DMA window is backed by 2MB TCEs.
Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Greg Joyce <gjoyce@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
arch/powerpc/kernel/iommu.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ee95937bdaf1..ca57526ce47a 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
/* Convert entry to a dma_addr_t */
entry += tbl->it_offset;
dma_addr = entry << tbl->it_page_shift;
- dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
+ dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n",
npages, entry, dma_addr);
@@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
unsigned int order;
unsigned int nio_pages, io_order;
struct page *page;
+ int tcesize = (1 << tbl->it_page_shift);
size = PAGE_ALIGN(size);
order = get_order(size);
@@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
memset(ret, 0, size);
/* Set up tces to cover the allocated range */
- nio_pages = size >> tbl->it_page_shift;
+ nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
+
io_order = get_iommu_order(size, tbl);
mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
mask >> tbl->it_page_shift, io_order, 0);
@@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
free_pages((unsigned long)ret, order);
return NULL;
}
- *dma_handle = mapping;
+
+ *dma_handle = mapping | ((u64)ret & (tcesize - 1));
return ret;
}
--
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-04-19 15:26 [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs Gaurav Batra
@ 2023-04-20 15:21 ` Michael Ellerman
2023-04-20 19:45 ` Gaurav Batra
0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2023-04-20 15:21 UTC (permalink / raw)
To: Gaurav Batra; +Cc: Brian King, linuxppc-dev, Greg Joyce, Gaurav Batra
Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
> When DMA window is backed by 2MB TCEs, the DMA address for the mapped
> page should be the offset of the page relative to the 2MB TCE. The code
> was incorrectly setting the DMA address to the beginning of the TCE
> range.
>
> Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
> ethernet port, when DMA window is backed by 2MB TCEs.
I assume this is similar or related to the bug Srikar reported?
https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
In that thread Alexey suggested a patch, have you tried his patch? He
suggested rounding up the allocation size, rather than adjusting the
dma_handle.
> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
That's not the right syntax, it's described in the documentation how to
generate it.
It should be:
Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M IOMMU page size")
cheers
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ee95937bdaf1..ca57526ce47a 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
> /* Convert entry to a dma_addr_t */
> entry += tbl->it_offset;
> dma_addr = entry << tbl->it_page_shift;
> - dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
> + dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>
> DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n",
> npages, entry, dma_addr);
> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
> unsigned int order;
> unsigned int nio_pages, io_order;
> struct page *page;
> + int tcesize = (1 << tbl->it_page_shift);
>
> size = PAGE_ALIGN(size);
> order = get_order(size);
> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
> memset(ret, 0, size);
>
> /* Set up tces to cover the allocated range */
> - nio_pages = size >> tbl->it_page_shift;
> + nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
> +
> io_order = get_iommu_order(size, tbl);
> mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
> mask >> tbl->it_page_shift, io_order, 0);
> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
> free_pages((unsigned long)ret, order);
> return NULL;
> }
> - *dma_handle = mapping;
> +
> + *dma_handle = mapping | ((u64)ret & (tcesize - 1));
> return ret;
> }
>
> --
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-04-20 15:21 ` Michael Ellerman
@ 2023-04-20 19:45 ` Gaurav Batra
2023-05-03 3:25 ` Gaurav Batra
2023-05-04 5:10 ` Michael Ellerman
0 siblings, 2 replies; 9+ messages in thread
From: Gaurav Batra @ 2023-04-20 19:45 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Brian King, linuxppc-dev, Greg Joyce
Hello Michael,
I was looking into the Bug: 199106
(https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
In the bug, the Mellanox driver was timing out when enabling an SR-IOV device.
I tested Alexey's patch and it fixes the issue with the Mellanox driver.
The downside to Alexey's fix is that even a small memory request by the
driver will be aligned up to 2MB. In my test, the Mellanox driver is
issuing multiple requests of 64K size. All of these will get aligned up
to 2MB, which is quite a waste of resources.
In any case, both patches work. Let me know which approach you prefer.
In case we decide to go with my patch, I just realized that I need to
fix nio_pages in iommu_free_coherent() as well.
Thanks,
Gaurav
On 4/20/23 10:21 AM, Michael Ellerman wrote:
> Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
>> When DMA window is backed by 2MB TCEs, the DMA address for the mapped
>> page should be the offset of the page relative to the 2MB TCE. The code
>> was incorrectly setting the DMA address to the beginning of the TCE
>> range.
>>
>> Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
>> ethernet port, when DMA window is backed by 2MB TCEs.
> I assume this is similar or related to the bug Srikar reported?
>
> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
>
> In that thread Alexey suggested a patch, have you tried his patch? He
> suggested rounding up the allocation size, rather than adjusting the
> dma_handle.
>
>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
> That's not the right syntax, it's described in the documentation how to
> generate it.
>
> It should be:
>
> Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M IOMMU page size")
>
> cheers
>
>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>> index ee95937bdaf1..ca57526ce47a 100644
>> --- a/arch/powerpc/kernel/iommu.c
>> +++ b/arch/powerpc/kernel/iommu.c
>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>> /* Convert entry to a dma_addr_t */
>> entry += tbl->it_offset;
>> dma_addr = entry << tbl->it_page_shift;
>> - dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>> + dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>
>> DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n",
>> npages, entry, dma_addr);
>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>> unsigned int order;
>> unsigned int nio_pages, io_order;
>> struct page *page;
>> + int tcesize = (1 << tbl->it_page_shift);
>>
>> size = PAGE_ALIGN(size);
>> order = get_order(size);
>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>> memset(ret, 0, size);
>>
>> /* Set up tces to cover the allocated range */
>> - nio_pages = size >> tbl->it_page_shift;
>> + nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>> +
>> io_order = get_iommu_order(size, tbl);
>> mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>> mask >> tbl->it_page_shift, io_order, 0);
>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>> free_pages((unsigned long)ret, order);
>> return NULL;
>> }
>> - *dma_handle = mapping;
>> +
>> + *dma_handle = mapping | ((u64)ret & (tcesize - 1));
>> return ret;
>> }
>>
>> --
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-04-20 19:45 ` Gaurav Batra
@ 2023-05-03 3:25 ` Gaurav Batra
2023-05-22 0:08 ` Alexey Kardashevskiy
2023-05-04 5:10 ` Michael Ellerman
1 sibling, 1 reply; 9+ messages in thread
From: Gaurav Batra @ 2023-05-03 3:25 UTC (permalink / raw)
To: aik; +Cc: Brian King, linuxppc-dev, Greg Joyce
Hello Alexey,
I recently joined the IOMMU team. There was a bug reported by the test
team where the Mellanox driver was timing out during configuration. I
proposed a fix for it, which is below in the email.
You suggested a fix for the problem Srikar reported. Both of these
fixes resolve the Srikar and Mellanox driver issues. The problem is
with 2MB DDW.
Since you have extensive knowledge of IOMMU design and code, in your
opinion, which patch should we adopt?
Thanks a lot
Gaurav
On 4/20/23 2:45 PM, Gaurav Batra wrote:
> Hello Michael,
>
> I was looking into the Bug: 199106
> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>
> In the Bug, Mellanox driver was timing out when enabling SRIOV device.
>
> I tested, Alexey's patch and it fixes the issue with Mellanox driver.
> The down side
>
> to Alexey's fix is that even a small memory request by the driver will
> be aligned up
>
> to 2MB. In my test, the Mellanox driver is issuing multiple requests
> of 64K size.
>
> All these will get aligned up to 2MB, which is quite a waste of
> resources.
>
>
> In any case, both the patches work. Let me know which approach you
> prefer. In case
>
> we decide to go with my patch, I just realized that I need to fix
> nio_pages in
>
> iommu_free_coherent() as well.
>
>
> Thanks,
>
> Gaurav
>
> On 4/20/23 10:21 AM, Michael Ellerman wrote:
>> Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
>>> When DMA window is backed by 2MB TCEs, the DMA address for the mapped
>>> page should be the offset of the page relative to the 2MB TCE. The code
>>> was incorrectly setting the DMA address to the beginning of the TCE
>>> range.
>>>
>>> Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
>>> ethernet port, when DMA window is backed by 2MB TCEs.
>> I assume this is similar or related to the bug Srikar reported?
>>
>> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
>>
>> In that thread Alexey suggested a patch, have you tried his patch? He
>> suggested rounding up the allocation size, rather than adjusting the
>> dma_handle.
>>
>>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
>> That's not the right syntax, it's described in the documentation how to
>> generate it.
>>
>> It should be:
>>
>> Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M
>> IOMMU page size")
>>
>> cheers
>>
>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>> index ee95937bdaf1..ca57526ce47a 100644
>>> --- a/arch/powerpc/kernel/iommu.c
>>> +++ b/arch/powerpc/kernel/iommu.c
>>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct
>>> iommu_table *tbl,
>>> /* Convert entry to a dma_addr_t */
>>> entry += tbl->it_offset;
>>> dma_addr = entry << tbl->it_page_shift;
>>> - dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>>> + dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>> DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n",
>>> npages, entry, dma_addr);
>>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev,
>>> struct iommu_table *tbl,
>>> unsigned int order;
>>> unsigned int nio_pages, io_order;
>>> struct page *page;
>>> + int tcesize = (1 << tbl->it_page_shift);
>>> size = PAGE_ALIGN(size);
>>> order = get_order(size);
>>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev,
>>> struct iommu_table *tbl,
>>> memset(ret, 0, size);
>>> /* Set up tces to cover the allocated range */
>>> - nio_pages = size >> tbl->it_page_shift;
>>> + nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>> +
>>> io_order = get_iommu_order(size, tbl);
>>> mapping = iommu_alloc(dev, tbl, ret, nio_pages,
>>> DMA_BIDIRECTIONAL,
>>> mask >> tbl->it_page_shift, io_order, 0);
>>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev,
>>> struct iommu_table *tbl,
>>> free_pages((unsigned long)ret, order);
>>> return NULL;
>>> }
>>> - *dma_handle = mapping;
>>> +
>>> + *dma_handle = mapping | ((u64)ret & (tcesize - 1));
>>> return ret;
>>> }
>>> --
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-04-20 19:45 ` Gaurav Batra
2023-05-03 3:25 ` Gaurav Batra
@ 2023-05-04 5:10 ` Michael Ellerman
2023-05-04 18:03 ` Gaurav Batra
1 sibling, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2023-05-04 5:10 UTC (permalink / raw)
To: Gaurav Batra; +Cc: Brian King, linuxppc-dev, Greg Joyce
Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
> Hello Michael,
>
> I was looking into the Bug: 199106
> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>
> In the Bug, Mellanox driver was timing out when enabling SRIOV device.
>
> I tested, Alexey's patch and it fixes the issue with Mellanox driver.
> The down side
>
> to Alexey's fix is that even a small memory request by the driver will
> be aligned up
>
> to 2MB. In my test, the Mellanox driver is issuing multiple requests of
> 64K size.
>
> All these will get aligned up to 2MB, which is quite a waste of resources.
OK. I guess we should use your patch then.
It's not ideal as it means the device can potentially read/write to
memory it shouldn't, but 2MB is a lot to waste for a 64K alloc.
> In any case, both the patches work. Let me know which approach you
> prefer. In case
>
> we decide to go with my patch, I just realized that I need to fix
> nio_pages in
>
> iommu_free_coherent() as well.
Can you send a v3 with that fixed please.
cheers
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-05-04 5:10 ` Michael Ellerman
@ 2023-05-04 18:03 ` Gaurav Batra
2023-05-05 2:21 ` Michael Ellerman
0 siblings, 1 reply; 9+ messages in thread
From: Gaurav Batra @ 2023-05-04 18:03 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Brian King, linuxppc-dev, Greg Joyce
Hello Michael,
I agree with your concerns regarding a device being able to access
memory that doesn't belong to it. We have that exposure today with 2MB
TCEs.
With 2MB TCEs, the DMA window size will be big enough, for dedicated
adapters, that the whole of memory is going to be mapped "direct". This
essentially means that a "rogue" device/driver has the potential to
corrupt LPAR-wide memory.
I have sent you v3.
Thanks,
Gaurav
On 5/4/23 12:10 AM, Michael Ellerman wrote:
> Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
>> Hello Michael,
>>
>> I was looking into the Bug: 199106
>> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>>
>> In the Bug, Mellanox driver was timing out when enabling SRIOV device.
>>
>> I tested, Alexey's patch and it fixes the issue with Mellanox driver.
>> The down side
>>
>> to Alexey's fix is that even a small memory request by the driver will
>> be aligned up
>>
>> to 2MB. In my test, the Mellanox driver is issuing multiple requests of
>> 64K size.
>>
>> All these will get aligned up to 2MB, which is quite a waste of resources.
> OK. I guess we should use your patch then.
>
> It's not ideal as it means the device can potentially read/write to
> memory it shouldn't, but 2MB is a lot to waste for a 64K alloc.
>
>> In any case, both the patches work. Let me know which approach you
>> prefer. In case
>>
>> we decide to go with my patch, I just realized that I need to fix
>> nio_pages in
>>
>> iommu_free_coherent() as well.
> Can you send a v3 with that fixed please.
>
> cheers
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-05-04 18:03 ` Gaurav Batra
@ 2023-05-05 2:21 ` Michael Ellerman
0 siblings, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2023-05-05 2:21 UTC (permalink / raw)
To: Gaurav Batra; +Cc: Brian King, linuxppc-dev, Greg Joyce
Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
> Hello Michael,
>
> I agree with your concerns regarding a device been able to access memory
> that doesn't belong to it. That exposure we have today with 2MB TCEs.
> With 2MB TCEs, DMA window size will be big enough, for dedicated
> adapters, that whole memory is going to be mapped "direct". Which
> essentially means, that a "rogue" device/driver has the potential to
> corrupt LPAR wide memory.
Yes that's always been a trade-off between performance and robustness,
and performance is generally the winner.
There have been various command line flags in the past to configure
stricter behaviour, disable bypass etc. Some of those are now generic,
iommu.strict/passthrough, it would be good to get them wired up to work
on powerpc at some point.
> I have sent you v3.
Thanks.
cheers
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-05-03 3:25 ` Gaurav Batra
@ 2023-05-22 0:08 ` Alexey Kardashevskiy
2023-05-22 13:11 ` Gaurav Batra
0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2023-05-22 0:08 UTC (permalink / raw)
To: Gaurav Batra; +Cc: Brian King, linuxppc-dev, Greg Joyce
Hi Gaurav,
Sorry I missed this. Please share the link to your fix; I do not see it
in my mail. In general, the problem can probably be solved by using
huge pages (anything larger than 64K) only for 1:1 mappings.
On 03/05/2023 13:25, Gaurav Batra wrote:
> Hello Alexey,
>
> I recently joined IOMMU team. There was a bug reported by test team
> where Mellanox driver was timing out during configuration. I proposed a
> fix for the same, which is below in the email.
>
> You suggested a fix for Srikar's reported problem. Basically, both these
> fixes will resolve Srikar and Mellanox driver issues. The problem is
> with 2MB DDW.
>
> Since you have extensive knowledge of IOMMU design and code, in your
> opinion, which patch should we adopt?
>
> Thanks a lot
>
> Gaurav
>
> On 4/20/23 2:45 PM, Gaurav Batra wrote:
>> Hello Michael,
>>
>> I was looking into the Bug: 199106
>> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>>
>> In the Bug, Mellanox driver was timing out when enabling SRIOV device.
>>
>> I tested, Alexey's patch and it fixes the issue with Mellanox driver.
>> The down side
>>
>> to Alexey's fix is that even a small memory request by the driver will
>> be aligned up
>>
>> to 2MB. In my test, the Mellanox driver is issuing multiple requests
>> of 64K size.
>>
>> All these will get aligned up to 2MB, which is quite a waste of
>> resources.
>>
>>
>> In any case, both the patches work. Let me know which approach you
>> prefer. In case
>>
>> we decide to go with my patch, I just realized that I need to fix
>> nio_pages in
>>
>> iommu_free_coherent() as well.
>>
>>
>> Thanks,
>>
>> Gaurav
>>
>> On 4/20/23 10:21 AM, Michael Ellerman wrote:
>>> Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
>>>> When DMA window is backed by 2MB TCEs, the DMA address for the mapped
>>>> page should be the offset of the page relative to the 2MB TCE. The code
>>>> was incorrectly setting the DMA address to the beginning of the TCE
>>>> range.
>>>>
>>>> Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
>>>> ethernet port, when DMA window is backed by 2MB TCEs.
>>> I assume this is similar or related to the bug Srikar reported?
>>>
>>> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
>>>
>>> In that thread Alexey suggested a patch, have you tried his patch? He
>>> suggested rounding up the allocation size, rather than adjusting the
>>> dma_handle.
>>>
>>>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
>>> That's not the right syntax, it's described in the documentation how to
>>> generate it.
>>>
>>> It should be:
>>>
>>> Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M
>>> IOMMU page size")
>>>
>>> cheers
>>>
>>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>>> index ee95937bdaf1..ca57526ce47a 100644
>>>> --- a/arch/powerpc/kernel/iommu.c
>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct
>>>> iommu_table *tbl,
>>>> /* Convert entry to a dma_addr_t */
>>>> entry += tbl->it_offset;
>>>> dma_addr = entry << tbl->it_page_shift;
>>>> - dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>>>> + dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>>> DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n",
>>>> npages, entry, dma_addr);
>>>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev,
>>>> struct iommu_table *tbl,
>>>> unsigned int order;
>>>> unsigned int nio_pages, io_order;
>>>> struct page *page;
>>>> + int tcesize = (1 << tbl->it_page_shift);
>>>> size = PAGE_ALIGN(size);
>>>> order = get_order(size);
>>>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev,
>>>> struct iommu_table *tbl,
>>>> memset(ret, 0, size);
>>>> /* Set up tces to cover the allocated range */
>>>> - nio_pages = size >> tbl->it_page_shift;
>>>> + nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>>> +
>>>> io_order = get_iommu_order(size, tbl);
>>>> mapping = iommu_alloc(dev, tbl, ret, nio_pages,
>>>> DMA_BIDIRECTIONAL,
>>>> mask >> tbl->it_page_shift, io_order, 0);
>>>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev,
>>>> struct iommu_table *tbl,
>>>> free_pages((unsigned long)ret, order);
>>>> return NULL;
>>>> }
>>>> - *dma_handle = mapping;
>>>> +
>>>> + *dma_handle = mapping | ((u64)ret & (tcesize - 1));
>>>> return ret;
>>>> }
>>>> --
--
Alexey
* Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
2023-05-22 0:08 ` Alexey Kardashevskiy
@ 2023-05-22 13:11 ` Gaurav Batra
0 siblings, 0 replies; 9+ messages in thread
From: Gaurav Batra @ 2023-05-22 13:11 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: Brian King, linuxppc-dev, Greg Joyce
Hello Alexey,
No worries. I resolved the issue with Michael's help. The patch is
merged upstream and it fixes the issue.
Here is the link
https://lore.kernel.org/all/20230504175913.83844-1-gbatra@linux.vnet.ibm.com/
Thanks,
Gaurav
On 5/21/23 7:08 PM, Alexey Kardashevskiy wrote:
> Hi Gaurav,
>
> Sorry I missed this. Please share the link to the your fix, I do not
> see it in my mail. In general, the problem can probably be solved by
> using huge pages (anything more than 64K) only for 1:1 mapping.
>
>
> On 03/05/2023 13:25, Gaurav Batra wrote:
>> Hello Alexey,
>>
>> I recently joined IOMMU team. There was a bug reported by test team
>> where Mellanox driver was timing out during configuration. I proposed
>> a fix for the same, which is below in the email.
>>
>> You suggested a fix for Srikar's reported problem. Basically, both
>> these fixes will resolve Srikar and Mellanox driver issues. The
>> problem is with 2MB DDW.
>>
>> Since you have extensive knowledge of IOMMU design and code, in your
>> opinion, which patch should we adopt?
>>
>> Thanks a lot
>>
>> Gaurav
>>
>> On 4/20/23 2:45 PM, Gaurav Batra wrote:
>>> Hello Michael,
>>>
>>> I was looking into the Bug: 199106
>>> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>>>
>>> In the Bug, Mellanox driver was timing out when enabling SRIOV device.
>>>
>>> I tested, Alexey's patch and it fixes the issue with Mellanox
>>> driver. The down side
>>>
>>> to Alexey's fix is that even a small memory request by the driver
>>> will be aligned up
>>>
>>> to 2MB. In my test, the Mellanox driver is issuing multiple requests
>>> of 64K size.
>>>
>>> All these will get aligned up to 2MB, which is quite a waste of
>>> resources.
>>>
>>>
>>> In any case, both the patches work. Let me know which approach you
>>> prefer. In case
>>>
>>> we decide to go with my patch, I just realized that I need to fix
>>> nio_pages in
>>>
>>> iommu_free_coherent() as well.
>>>
>>>
>>> Thanks,
>>>
>>> Gaurav
>>>
>>> On 4/20/23 10:21 AM, Michael Ellerman wrote:
>>>> Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
>>>>> When DMA window is backed by 2MB TCEs, the DMA address for the mapped
>>>>> page should be the offset of the page relative to the 2MB TCE. The
>>>>> code
>>>>> was incorrectly setting the DMA address to the beginning of the TCE
>>>>> range.
>>>>>
>>>>> Mellanox driver is reporting timeout trying to ENABLE_HCA for an
>>>>> SR-IOV
>>>>> ethernet port, when DMA window is backed by 2MB TCEs.
>>>> I assume this is similar or related to the bug Srikar reported?
>>>>
>>>> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
>>>>
>>>>
>>>> In that thread Alexey suggested a patch, have you tried his patch? He
>>>> suggested rounding up the allocation size, rather than adjusting the
>>>> dma_handle.
>>>>
>>>>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
>>>> That's not the right syntax, it's described in the documentation
>>>> how to
>>>> generate it.
>>>>
>>>> It should be:
>>>>
>>>> Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M
>>>> IOMMU page size")
>>>>
>>>> cheers
>>>>
>>>>> diff --git a/arch/powerpc/kernel/iommu.c
>>>>> b/arch/powerpc/kernel/iommu.c
>>>>> index ee95937bdaf1..ca57526ce47a 100644
>>>>> --- a/arch/powerpc/kernel/iommu.c
>>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev,
>>>>> struct iommu_table *tbl,
>>>>> /* Convert entry to a dma_addr_t */
>>>>> entry += tbl->it_offset;
>>>>> dma_addr = entry << tbl->it_page_shift;
>>>>> - dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>>>>> + dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>>>> DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n",
>>>>> npages, entry, dma_addr);
>>>>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev,
>>>>> struct iommu_table *tbl,
>>>>> unsigned int order;
>>>>> unsigned int nio_pages, io_order;
>>>>> struct page *page;
>>>>> + int tcesize = (1 << tbl->it_page_shift);
>>>>> size = PAGE_ALIGN(size);
>>>>> order = get_order(size);
>>>>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev,
>>>>> struct iommu_table *tbl,
>>>>> memset(ret, 0, size);
>>>>> /* Set up tces to cover the allocated range */
>>>>> - nio_pages = size >> tbl->it_page_shift;
>>>>> + nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>>>> +
>>>>> io_order = get_iommu_order(size, tbl);
>>>>> mapping = iommu_alloc(dev, tbl, ret, nio_pages,
>>>>> DMA_BIDIRECTIONAL,
>>>>> mask >> tbl->it_page_shift, io_order, 0);
>>>>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev,
>>>>> struct iommu_table *tbl,
>>>>> free_pages((unsigned long)ret, order);
>>>>> return NULL;
>>>>> }
>>>>> - *dma_handle = mapping;
>>>>> +
>>>>> + *dma_handle = mapping | ((u64)ret & (tcesize - 1));
>>>>> return ret;
>>>>> }
>>>>> --
>