linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
@ 2022-11-14 11:03 Manivannan Sadhasivam
  2022-11-14 11:29 ` Manivannan Sadhasivam
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-11-14 11:03 UTC (permalink / raw)
  To: catalin.marinas, will
  Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal,
	linux-arm-kernel, linux-kernel, Manivannan Sadhasivam

This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.

As reported by Amit [1], dropping cache invalidation from
arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
(most probably on other Qcom platforms too). The reason is, Qcom
qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
for validation. The modem has a secure block (XPU) that will trigger a
whole system crash if the shared memory is accessed by the CPU while modem
is poking at it.

To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
with no kernel mapping, vmap's it, copies the firmware metadata and
unvmap's it. Finally the address is then shared with modem for metadata
validation [2].

Now because of the removal of cache invalidation from
arch_dma_prep_coherent(), there will be cache lines associated with this
memory even after sharing with modem. So when the CPU accesses it, the XPU
violation gets triggered.

So let's revert this commit to get remoteproc's working (thereby avoiding
full system crash) on Qcom platforms.

[1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c#n933

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---

Will, Catalin: Please share if you have any other suggestions to handle the
resource sharing in the remoteproc driver that could avoid this revert.

 arch/arm64/mm/dma-mapping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 3cb101e8cb29..7d7e9a046305 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -36,7 +36,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 {
 	unsigned long start = (unsigned long)page_address(page);
 
-	dcache_clean_poc(start, start + size);
+	dcache_clean_inval_poc(start, start + size);
 }
 
 #ifdef CONFIG_IOMMU_DMA
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam
@ 2022-11-14 11:29 ` Manivannan Sadhasivam
  2022-11-14 14:11 ` Will Deacon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-11-14 11:29 UTC (permalink / raw)
  To: catalin.marinas, will
  Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal,
	linux-arm-kernel, linux-kernel

On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> 
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform

s/SM8250/SDM845/g

Sorry for the confusion.

Thanks,
Mani

> (most probably on other Qcom platforms too). The reason is, Qcom
> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> for validation. The modem has a secure block (XPU) that will trigger a
> whole system crash if the shared memory is accessed by the CPU while modem
> is poking at it.
> 
> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> with no kernel mapping, vmap's it, copies the firmware metadata and
> unvmap's it. Finally the address is then shared with modem for metadata
> validation [2].
> 
> Now because of the removal of cache invalidation from
> arch_dma_prep_coherent(), there will be cache lines associated with this
> memory even after sharing with modem. So when the CPU accesses it, the XPU
> violation gets triggered.
> 
> So let's revert this commit to get remoteproc's working (thereby avoiding
> full system crash) on Qcom platforms.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c#n933
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
> 
> Will, Catalin: Please share if you have any other suggestions to handle the
> resource sharing in the remoteproc driver that could avoid this revert.
> 
>  arch/arm64/mm/dma-mapping.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 3cb101e8cb29..7d7e9a046305 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -36,7 +36,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
>  {
>  	unsigned long start = (unsigned long)page_address(page);
>  
> -	dcache_clean_poc(start, start + size);
> +	dcache_clean_inval_poc(start, start + size);
>  }
>  
>  #ifdef CONFIG_IOMMU_DMA
> -- 
> 2.25.1
> 

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam
  2022-11-14 11:29 ` Manivannan Sadhasivam
@ 2022-11-14 14:11 ` Will Deacon
  2022-11-14 15:14   ` Robin Murphy
  2022-11-28  5:44 ` Thorsten Leemhuis
  2022-12-08  4:59 ` Leonard Lausen
  3 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2022-11-14 14:11 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: catalin.marinas, robin.murphy, amit.pundir, andersson,
	quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel

On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> 
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> (most probably on other Qcom platforms too). The reason is, Qcom
> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> for validation. The modem has a secure block (XPU) that will trigger a
> whole system crash if the shared memory is accessed by the CPU while modem
> is poking at it.
> 
> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> with no kernel mapping, vmap's it, copies the firmware metadata and
> unvmap's it. Finally the address is then shared with modem for metadata
> validation [2].
> 
> Now because of the removal of cache invalidation from
> arch_dma_prep_coherent(), there will be cache lines associated with this
> memory even after sharing with modem. So when the CPU accesses it, the XPU
> violation gets triggered.

This last past is a non-sequitur: the buffer is no longer mapped on the CPU
side, so how would the CPU access it?

As I just replied to Amit, we need more information about what this
"access" is and how it is being detected.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 14:11 ` Will Deacon
@ 2022-11-14 15:14   ` Robin Murphy
  2022-11-14 17:38     ` Catalin Marinas
  0 siblings, 1 reply; 27+ messages in thread
From: Robin Murphy @ 2022-11-14 15:14 UTC (permalink / raw)
  To: Will Deacon, Manivannan Sadhasivam
  Cc: catalin.marinas, amit.pundir, andersson, quic_sibis,
	sumit.semwal, linux-arm-kernel, linux-kernel

On 2022-11-14 14:11, Will Deacon wrote:
> On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
>> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>>
>> As reported by Amit [1], dropping cache invalidation from
>> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
>> (most probably on other Qcom platforms too). The reason is, Qcom
>> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
>> for validation. The modem has a secure block (XPU) that will trigger a
>> whole system crash if the shared memory is accessed by the CPU while modem
>> is poking at it.
>>
>> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
>> with no kernel mapping, vmap's it, copies the firmware metadata and
>> unvmap's it. Finally the address is then shared with modem for metadata
>> validation [2].
>>
>> Now because of the removal of cache invalidation from
>> arch_dma_prep_coherent(), there will be cache lines associated with this
>> memory even after sharing with modem. So when the CPU accesses it, the XPU
>> violation gets triggered.
> 
> This last past is a non-sequitur: the buffer is no longer mapped on the CPU
> side, so how would the CPU access it?

Right, for the previous change to have made a difference the offending 
part of this buffer must be present in some cache somewhere *before* the 
DMA buffer allocation completes.

Clearly that driver is completely broken though. If the DMA allocation 
came from a no-map carveout vma_dma_alloc_from_dev_coherent() then the 
vmap() shenanigans wouldn't work, so if it backed by struct pages then 
the whole dance is still pointless because *a cacheable linear mapping 
exists*, and it's just relying on the reduced chance that anything's 
going to re-fetch the linear map address after those pages have been 
allocated, exactly as I called out previously[1].

Robin.

[1] 
https://lore.kernel.org/linux-arm-kernel/97fface8-e40e-072c-4335-c94094884e93@arm.com/

> As I just replied to Amit, we need more information about what this
> "access" is and how it is being detected.
> 
> Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 15:14   ` Robin Murphy
@ 2022-11-14 17:38     ` Catalin Marinas
  2022-11-18 10:54       ` Manivannan Sadhasivam
  0 siblings, 1 reply; 27+ messages in thread
From: Catalin Marinas @ 2022-11-14 17:38 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Manivannan Sadhasivam, amit.pundir, andersson,
	quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel

On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> On 2022-11-14 14:11, Will Deacon wrote:
> > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > 
> > > As reported by Amit [1], dropping cache invalidation from
> > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > for validation. The modem has a secure block (XPU) that will trigger a
> > > whole system crash if the shared memory is accessed by the CPU while modem
> > > is poking at it.
> > > 
> > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > unvmap's it. Finally the address is then shared with modem for metadata
> > > validation [2].
> > > 
> > > Now because of the removal of cache invalidation from
> > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > violation gets triggered.
> > 
> > This last past is a non-sequitur: the buffer is no longer mapped on the CPU
> > side, so how would the CPU access it?
> 
> Right, for the previous change to have made a difference the offending part
> of this buffer must be present in some cache somewhere *before* the DMA
> buffer allocation completes.
> 
> Clearly that driver is completely broken though. If the DMA allocation came
> from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
> shenanigans wouldn't work, so if it backed by struct pages then the whole
> dance is still pointless because *a cacheable linear mapping exists*, and
> it's just relying on the reduced chance that anything's going to re-fetch
> the linear map address after those pages have been allocated, exactly as I
> called out previously[1].

So I guess a DMA pool that's not mapped in the linear map, together with
memremap() instead of vmap(), would work around the issue. But the
driver needs fixing, not the arch code.

-- 
Catalin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 17:38     ` Catalin Marinas
@ 2022-11-18 10:54       ` Manivannan Sadhasivam
  2022-11-18 12:33         ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-11-18 10:54 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Robin Murphy, Will Deacon, amit.pundir, andersson, quic_sibis,
	sumit.semwal, linux-arm-kernel, linux-kernel

On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > On 2022-11-14 14:11, Will Deacon wrote:
> > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > > 
> > > > As reported by Amit [1], dropping cache invalidation from
> > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > is poking at it.
> > > > 
> > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > validation [2].
> > > > 
> > > > Now because of the removal of cache invalidation from
> > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > violation gets triggered.
> > > 
> > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU
> > > side, so how would the CPU access it?
> > 
> > Right, for the previous change to have made a difference the offending part
> > of this buffer must be present in some cache somewhere *before* the DMA
> > buffer allocation completes.
> > 
> > Clearly that driver is completely broken though. If the DMA allocation came
> > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
> > shenanigans wouldn't work, so if it backed by struct pages then the whole
> > dance is still pointless because *a cacheable linear mapping exists*, and
> > it's just relying on the reduced chance that anything's going to re-fetch
> > the linear map address after those pages have been allocated, exactly as I
> > called out previously[1].
> 
> So I guess a DMA pool that's not mapped in the linear map, together with
> memremap() instead of vmap(), would work around the issue. But the
> driver needs fixing, not the arch code.
> 

Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
not part of the kernel's linear map? I looked into it but couldn't find a way.

Thanks,
Mani

> -- 
> Catalin

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-18 10:54       ` Manivannan Sadhasivam
@ 2022-11-18 12:33         ` Will Deacon
  2022-11-21  6:42           ` Manivannan Sadhasivam
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2022-11-18 12:33 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Catalin Marinas, Robin Murphy, amit.pundir, andersson,
	quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel

On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > On 2022-11-14 14:11, Will Deacon wrote:
> > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > > > 
> > > > > As reported by Amit [1], dropping cache invalidation from
> > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > > is poking at it.
> > > > > 
> > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > > validation [2].
> > > > > 
> > > > > Now because of the removal of cache invalidation from
> > > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > > violation gets triggered.
> > > > 
> > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU
> > > > side, so how would the CPU access it?
> > > 
> > > Right, for the previous change to have made a difference the offending part
> > > of this buffer must be present in some cache somewhere *before* the DMA
> > > buffer allocation completes.
> > > 
> > > Clearly that driver is completely broken though. If the DMA allocation came
> > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
> > > shenanigans wouldn't work, so if it backed by struct pages then the whole
> > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > it's just relying on the reduced chance that anything's going to re-fetch
> > > the linear map address after those pages have been allocated, exactly as I
> > > called out previously[1].
> > 
> > So I guess a DMA pool that's not mapped in the linear map, together with
> > memremap() instead of vmap(), would work around the issue. But the
> > driver needs fixing, not the arch code.
> > 
> 
> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> not part of the kernel's linear map? I looked into it but couldn't find a way.

The no-map property should take care of this iirc

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-18 12:33         ` Will Deacon
@ 2022-11-21  6:42           ` Manivannan Sadhasivam
  2022-11-21 10:12             ` Sibi Sankar
  0 siblings, 1 reply; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-11-21  6:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Robin Murphy, amit.pundir, andersson,
	quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel

On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > > On 2022-11-14 14:11, Will Deacon wrote:
> > > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > > > > 
> > > > > > As reported by Amit [1], dropping cache invalidation from
> > > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > > > is poking at it.
> > > > > > 
> > > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > > > validation [2].
> > > > > > 
> > > > > > Now because of the removal of cache invalidation from
> > > > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > > > violation gets triggered.
> > > > > 
> > > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU
> > > > > side, so how would the CPU access it?
> > > > 
> > > > Right, for the previous change to have made a difference the offending part
> > > > of this buffer must be present in some cache somewhere *before* the DMA
> > > > buffer allocation completes.
> > > > 
> > > > Clearly that driver is completely broken though. If the DMA allocation came
> > > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
> > > > shenanigans wouldn't work, so if it backed by struct pages then the whole
> > > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > > it's just relying on the reduced chance that anything's going to re-fetch
> > > > the linear map address after those pages have been allocated, exactly as I
> > > > called out previously[1].
> > > 
> > > So I guess a DMA pool that's not mapped in the linear map, together with
> > > memremap() instead of vmap(), would work around the issue. But the
> > > driver needs fixing, not the arch code.
> > > 
> > 
> > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> > not part of the kernel's linear map? I looked into it but couldn't find a way.
> 
> The no-map property should take care of this iirc
> 

Yeah, we have been using it in other places of the same driver. But as per
Sibi, we used dynamic allocation for metadata validation since there was no
memory reserved statically for that.

But if we do not have a way to allocate a dynamic memory that is not part of
kernel's linear map, then we may have to resort to using an existing reserved
memory.

Thanks,
Mani

> Will

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-21  6:42           ` Manivannan Sadhasivam
@ 2022-11-21 10:12             ` Sibi Sankar
  2022-11-24 11:55               ` Catalin Marinas
  0 siblings, 1 reply; 27+ messages in thread
From: Sibi Sankar @ 2022-11-21 10:12 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Will Deacon
  Cc: Catalin Marinas, Robin Murphy, amit.pundir, andersson,
	sumit.semwal, linux-arm-kernel, linux-kernel



On 11/21/22 12:12, Manivannan Sadhasivam wrote:
> On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
>> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
>>> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
>>>> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
>>>>> On 2022-11-14 14:11, Will Deacon wrote:
>>>>>> On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
>>>>>>> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>>>>>>>
>>>>>>> As reported by Amit [1], dropping cache invalidation from
>>>>>>> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
>>>>>>> (most probably on other Qcom platforms too). The reason is, Qcom
>>>>>>> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
>>>>>>> for validation. The modem has a secure block (XPU) that will trigger a
>>>>>>> whole system crash if the shared memory is accessed by the CPU while modem
>>>>>>> is poking at it.
>>>>>>>
>>>>>>> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
>>>>>>> with no kernel mapping, vmap's it, copies the firmware metadata and
>>>>>>> unvmap's it. Finally the address is then shared with modem for metadata
>>>>>>> validation [2].
>>>>>>>
>>>>>>> Now because of the removal of cache invalidation from
>>>>>>> arch_dma_prep_coherent(), there will be cache lines associated with this
>>>>>>> memory even after sharing with modem. So when the CPU accesses it, the XPU
>>>>>>> violation gets triggered.
>>>>>>
>>>>>> This last past is a non-sequitur: the buffer is no longer mapped on the CPU
>>>>>> side, so how would the CPU access it?
>>>>>
>>>>> Right, for the previous change to have made a difference the offending part
>>>>> of this buffer must be present in some cache somewhere *before* the DMA
>>>>> buffer allocation completes.
>>>>>
>>>>> Clearly that driver is completely broken though. If the DMA allocation came
>>>>> from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
>>>>> shenanigans wouldn't work, so if it backed by struct pages then the whole
>>>>> dance is still pointless because *a cacheable linear mapping exists*, and
>>>>> it's just relying on the reduced chance that anything's going to re-fetch
>>>>> the linear map address after those pages have been allocated, exactly as I
>>>>> called out previously[1].
>>>>
>>>> So I guess a DMA pool that's not mapped in the linear map, together with
>>>> memremap() instead of vmap(), would work around the issue. But the
>>>> driver needs fixing, not the arch code.
>>>>
>>>
>>> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
>>> not part of the kernel's linear map? I looked into it but couldn't find a way.
>>
>> The no-map property should take care of this iirc
>>
> 
> Yeah, we have been using it in other places of the same driver. But as per
> Sibi, we used dynamic allocation for metadata validation since there was no
> memory reserved statically for that.

Will,

Unlike the other portions in the driver that required statically defined 
no-map carveouts, metadata just needed a contiguous memory for 
authentication. Re-using existing carveouts for this metadata region
may not work due to modem FW limitations and declaring a new carveout 
for metadata will break the device tree bindings. That's the reason for
using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with
VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there 
other suggestions for achieving the same without breaking bindings?

- Sibi

> 
> But if we do not have a way to allocate a dynamic memory that is not part of
> kernel's linear map, then we may have to resort to using an existing reserved
> memory.
> 
> Thanks,
> Mani
> 
>> Will
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-21 10:12             ` Sibi Sankar
@ 2022-11-24 11:55               ` Catalin Marinas
  2022-12-01  9:29                 ` Thorsten Leemhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Catalin Marinas @ 2022-11-24 11:55 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Manivannan Sadhasivam, Will Deacon, Robin Murphy, amit.pundir,
	andersson, sumit.semwal, linux-arm-kernel, linux-kernel

On Mon, Nov 21, 2022 at 03:42:27PM +0530, Sibi Sankar wrote:
> On 11/21/22 12:12, Manivannan Sadhasivam wrote:
> > On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
> > > On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> > > > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > > > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > > > > Clearly that driver is completely broken though. If the DMA allocation came
> > > > > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
> > > > > > shenanigans wouldn't work, so if it backed by struct pages then the whole
> > > > > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > > > > it's just relying on the reduced chance that anything's going to re-fetch
> > > > > > the linear map address after those pages have been allocated, exactly as I
> > > > > > called out previously[1].
> > > > > 
> > > > > So I guess a DMA pool that's not mapped in the linear map, together with
> > > > > memremap() instead of vmap(), would work around the issue. But the
> > > > > driver needs fixing, not the arch code.
> > > > 
> > > > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> > > > not part of the kernel's linear map? I looked into it but couldn't find a way.
> > > 
> > > The no-map property should take care of this iirc
> > 
> > Yeah, we have been using it in other places of the same driver. But as per
> > Sibi, we used dynamic allocation for metadata validation since there was no
> > memory reserved statically for that.
> 
> Unlike the other portions in the driver that required statically defined
> no-map carveouts, metadata just needed a contiguous memory for
> authentication. Re-using existing carveouts for this metadata region
> may not work due to modem FW limitations and declaring a new carveout for
> metadata will break the device tree bindings. That's the reason for
> using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with
> VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other
> suggestions for achieving the same without breaking bindings?

Your DMA_ATTR_NO_KERNEL_MAPPING workaround doesn't work, it only makes
the failure rate smaller. All this attribute does is avoiding creating a
non-cacheable mapping but you still have the kernel linear mapping in
place that may be speculatively accessed by the CPU. You were just lucky
so far not to have hit the issue. So I'd rather see this fixed properly
with a no-map carveout. Maybe you can reuse an existing carveout if the
driver already needs some and avoid changing the DT. More complicated
options include allocating memory and unmapping it from the linear map
with set_memory_valid(), though that's not exported to modules and it
also requires the linear map to be pages only, not block mappings.

Yet another option is to have the swiotlb buffer unmapped from the
kernel linear map and use the bounce buffer for this. That's more
involved (Robin has some patches, though for a different reason and they
may not make it upstream).

-- 
Catalin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam
  2022-11-14 11:29 ` Manivannan Sadhasivam
  2022-11-14 14:11 ` Will Deacon
@ 2022-11-28  5:44 ` Thorsten Leemhuis
  2022-11-28  8:15   ` Manivannan Sadhasivam
  2022-12-08  4:59 ` Leonard Lausen
  3 siblings, 1 reply; 27+ messages in thread
From: Thorsten Leemhuis @ 2022-11-28  5:44 UTC (permalink / raw)
  To: Manivannan Sadhasivam, catalin.marinas, will
  Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal,
	linux-arm-kernel, linux-kernel

On 14.11.22 12:03, Manivannan Sadhasivam wrote:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> 
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> (most probably on other Qcom platforms too). The reason is, Qcom
> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> for validation. The modem has a secure block (XPU) that will trigger a
> whole system crash if the shared memory is accessed by the CPU while modem
> is poking at it.
> [...]
> [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
>

I have Amit's report on the list of tracked regressions. I also noticed
the proposed change "arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency":
https://lore.kernel.org/all/20221124142501.29314-1-johan+linaro@kernel.org/

I have no expertise in this area, but it looked somewhat related to my
eyes, so please allow me to quickly ask: is that related or even
supposed to fix Amit's regression?

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-28  5:44 ` Thorsten Leemhuis
@ 2022-11-28  8:15   ` Manivannan Sadhasivam
  0 siblings, 0 replies; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-11-28  8:15 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: catalin.marinas, will, robin.murphy, amit.pundir, andersson,
	quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel

On Mon, Nov 28, 2022 at 06:44:13AM +0100, Thorsten Leemhuis wrote:
> On 14.11.22 12:03, Manivannan Sadhasivam wrote:
> > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > 
> > As reported by Amit [1], dropping cache invalidation from
> > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > (most probably on other Qcom platforms too). The reason is, Qcom
> > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > for validation. The modem has a secure block (XPU) that will trigger a
> > whole system crash if the shared memory is accessed by the CPU while modem
> > is poking at it.
> > [...]
> > [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/
> >
> 
> I have Amit's report on the list of tracked regressions. I also noticed
> the proposed change "arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency":
> https://lore.kernel.org/all/20221124142501.29314-1-johan+linaro@kernel.org/
> 
> I have no expertise in this area, but it looked somewhat related to my
> eyes, so please allow me to quickly ask: is that related or even
> supposed to fix Amit's regression?
> 

The proposed patch doesn't fix the regression reported by Amit. But the patch
itself fixes an issue that might be triggered more frequently by c44094eee32f.

Thanks,
Mani

> Ciao, Thorsten

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-24 11:55               ` Catalin Marinas
@ 2022-12-01  9:29                 ` Thorsten Leemhuis
  2022-12-01 17:45                   ` Catalin Marinas
  0 siblings, 1 reply; 27+ messages in thread
From: Thorsten Leemhuis @ 2022-12-01  9:29 UTC (permalink / raw)
  To: Catalin Marinas, Sibi Sankar
  Cc: Manivannan Sadhasivam, Will Deacon, Robin Murphy, amit.pundir,
	andersson, sumit.semwal, linux-arm-kernel, linux-kernel

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Has any progress been made to fix this regression? It afaics is not a
release critical issue, but well, it still would be nice to get this
fixed before 6.1 is released.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

On 24.11.22 12:55, Catalin Marinas wrote:
> On Mon, Nov 21, 2022 at 03:42:27PM +0530, Sibi Sankar wrote:
>> On 11/21/22 12:12, Manivannan Sadhasivam wrote:
>>> On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote:
>>>> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
>>>>> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
>>>>>> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
>>>>>>> Clearly that driver is completely broken though. If the DMA allocation came
>>>>>>> from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
>>>>>>> shenanigans wouldn't work, so if it backed by struct pages then the whole
>>>>>>> dance is still pointless because *a cacheable linear mapping exists*, and
>>>>>>> it's just relying on the reduced chance that anything's going to re-fetch
>>>>>>> the linear map address after those pages have been allocated, exactly as I
>>>>>>> called out previously[1].
>>>>>>
>>>>>> So I guess a DMA pool that's not mapped in the linear map, together with
>>>>>> memremap() instead of vmap(), would work around the issue. But the
>>>>>> driver needs fixing, not the arch code.
>>>>>
>>>>> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
>>>>> not part of the kernel's linear map? I looked into it but couldn't find a way.
>>>>
>>>> The no-map property should take care of this iirc
>>>
>>> Yeah, we have been using it in other places of the same driver. But as per
>>> Sibi, we used dynamic allocation for metadata validation since there was no
>>> memory reserved statically for that.
>>
>> Unlike the other portions in the driver that required statically defined
>> no-map carveouts, metadata just needed a contiguous memory for
>> authentication. Re-using existing carveouts for this metadata region
>> may not work due to modem FW limitations and declaring a new carveout for
>> metadata will break the device tree bindings. That's the reason for
>> using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with
>> VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other
>> suggestions for achieving the same without breaking bindings?
> 
> Your DMA_ATTR_NO_KERNEL_MAPPING workaround doesn't work, it only makes
> the failure rate smaller. All this attribute does is avoiding creating a
> non-cacheable mapping but you still have the kernel linear mapping in
> place that may be speculatively accessed by the CPU. You were just lucky
> so far not to have hit the issue. So I'd rather see this fixed properly
> with a no-map carveout. Maybe you can reuse an existing carveout if the
> driver already needs some and avoid changing the DT. More complicated
> options include allocating memory and unmapping it from the linear map
> with set_memory_valid(), though that's not exported to modules and it
> also requires the linear map to be pages only, not block mappings.
> 
> Yet another option is to have the swiotlb buffer unmapped from the
> kernel linear map and use the bounce buffer for this. That's more
> involved (Robin has some patches, though for a different reason and they
> may not make it upstream).
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-01  9:29                 ` Thorsten Leemhuis
@ 2022-12-01 17:45                   ` Catalin Marinas
  2022-12-02  8:26                     ` Amit Pundir
  0 siblings, 1 reply; 27+ messages in thread
From: Catalin Marinas @ 2022-12-01 17:45 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Sibi Sankar, Manivannan Sadhasivam, Will Deacon, Robin Murphy,
	amit.pundir, andersson, sumit.semwal, linux-arm-kernel,
	linux-kernel

On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> Has any progress been made to fix this regression? It afaics is not a
> release critical issue, but well, it still would be nice to get this
> fixed before 6.1 is released.

The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
that exposed the driver bug. It doesn't fix the actual bug, it only
makes it less likely to happen.

I like the original commit removing the cache invalidation as it shows
drivers not behaving properly but, as a workaround, we could add a
command line option to force back the old behaviour (defaulting to the
new one) until the driver is fixed.

-- 
Catalin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-01 17:45                   ` Catalin Marinas
@ 2022-12-02  8:26                     ` Amit Pundir
  2022-12-02  8:54                       ` Thorsten Leemhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Amit Pundir @ 2022-12-02  8:26 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Thorsten Leemhuis, Sibi Sankar, Manivannan Sadhasivam,
	Will Deacon, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel

On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > Has any progress been made to fix this regression? It afaics is not a
> > release critical issue, but well, it still would be nice to get this
> > fixed before 6.1 is released.
>
> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> that exposed the driver bug. It doesn't fix the actual bug, it only
> makes it less likely to happen.
>
> I like the original commit removing the cache invalidation as it shows
> drivers not behaving properly but, as a workaround, we could add a
> command line option to force back the old behaviour (defaulting to the
> new one) until the driver is fixed.

We use DB845c extensively for mainline and android-mainline[1] testing
with AOSP, and it is broken for weeks now. So be it a temporary
workaround or a proper driver fix in place, we'd really appreciate a
quick fix here.

I understand that the revert doesn't fix the actual driver bug, but we
were very very lucky so far that we had never hit this issue before.
So at this point I'll take the revert of the upstream commit as well,
while a proper fix is being worked upon.

Regards,
Amit Pundir
[1] https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline




>
> --
> Catalin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02  8:26                     ` Amit Pundir
@ 2022-12-02  8:54                       ` Thorsten Leemhuis
  2022-12-02 10:03                         ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Thorsten Leemhuis @ 2022-12-02  8:54 UTC (permalink / raw)
  To: Amit Pundir, Catalin Marinas
  Cc: Sibi Sankar, Manivannan Sadhasivam, Will Deacon, Robin Murphy,
	andersson, sumit.semwal, linux-arm-kernel, linux-kernel

On 02.12.22 09:26, Amit Pundir wrote:
> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>
>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
>>> Has any progress been made to fix this regression? It afaics is not a
>>> release critical issue, but well, it still would be nice to get this
>>> fixed before 6.1 is released.
>>
>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
>> that exposed the driver bug. It doesn't fix the actual bug, it only
>> makes it less likely to happen.
>>
>> I like the original commit removing the cache invalidation as it shows
>> drivers not behaving properly

Yeah, I understand that, but I guess it's my job to ask at this point:
"is continuing to live with the old behavior for one or two more cycles"
that much of a problem"?

>> but, as a workaround, we could add a
>> command line option to force back the old behaviour (defaulting to the
>> new one) until the driver is fixed.

Well, sometimes that approach is fine to fix a regression, but I'm not
sure this is one of those situations, as this...

> We use DB845c extensively for mainline and android-mainline[1] testing
> with AOSP, and it is broken for weeks now. So be it a temporary
> workaround or a proper driver fix in place, we'd really appreciate a
> quick fix here.

...doesn't sound like we are not talking about some odd corner case
here. But in the end that would be up to Linus to decide.

I'll point him to this thread once more in my weekly report anyway.
Maybe I'll even suggest to revert this change, not sure yet.

> [...]

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02  8:54                       ` Thorsten Leemhuis
@ 2022-12-02 10:03                         ` Will Deacon
  2022-12-02 10:34                           ` Thorsten Leemhuis
  2022-12-02 10:54                           ` Manivannan Sadhasivam
  0 siblings, 2 replies; 27+ messages in thread
From: Will Deacon @ 2022-12-02 10:03 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam,
	Robin Murphy, andersson, sumit.semwal, linux-arm-kernel,
	linux-kernel, hch, gregkh

On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> On 02.12.22 09:26, Amit Pundir wrote:
> > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>
> >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> >>> Has any progress been made to fix this regression? It afaics is not a
> >>> release critical issue, but well, it still would be nice to get this
> >>> fixed before 6.1 is released.
> >>
> >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> >> that exposed the driver bug. It doesn't fix the actual bug, it only
> >> makes it less likely to happen.
> >>
> >> I like the original commit removing the cache invalidation as it shows
> >> drivers not behaving properly
> 
> Yeah, I understand that, but I guess it's my job to ask at this point:
> "is continuing to live with the old behavior for one or two more cycles"
> that much of a problem"?

That wouldn't be a problem. The problem is that I haven't see any efforts
from the Qualcomm side to actually fix the drivers so what makes you think
the issue will be addressed in one or two more cycles? On the other hand, if
there were patches out there trying to fix the drivers and we just needed to
revert this change to buy them some time, then that would obviously be the
right thing to do.

> >> but, as a workaround, we could add a
> >> command line option to force back the old behaviour (defaulting to the
> >> new one) until the driver is fixed.
> 
> Well, sometimes that approach is fine to fix a regression, but I'm not
> sure this is one of those situations, as this...
> 
> > We use DB845c extensively for mainline and android-mainline[1] testing
> > with AOSP, and it is broken for weeks now. So be it a temporary
> > workaround or a proper driver fix in place, we'd really appreciate a
> > quick fix here.
> 
> ...doesn't sound like we are not talking about some odd corner case
> here. But in the end that would be up to Linus to decide.

The issue is that these drivers are abusing the DMA API to manage buffers
which are being transferred to trustzone. Even with the revert, this is
broken (the CPU can speculate from the kernel's cacheable linear mapping
of memory), it just appears to be less likely with the CPUs on this SoC.
So we end up in a situation where the kernel is flakey on these devices
but with even less incentive for the drivers to be fixed.

As well as broken drivers, the patch has also identified broken device-tree
files where DMA-coherent devices weher incorrectly being treated as
non-coherent:

https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@kernel.org/

so I do think it's something that's worth having as the default behaviour.

> I'll point him to this thread once more in my weekly report anyway.
> Maybe I'll even suggest to revert this change, not sure yet.

As I said above, I think the revert makes sense if the drivers are actually
being fixed, but I'm not seeing any movement at all on that front.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 10:03                         ` Will Deacon
@ 2022-12-02 10:34                           ` Thorsten Leemhuis
  2022-12-02 16:10                             ` Greg KH
  2022-12-02 10:54                           ` Manivannan Sadhasivam
  1 sibling, 1 reply; 27+ messages in thread
From: Thorsten Leemhuis @ 2022-12-02 10:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam,
	Robin Murphy, andersson, sumit.semwal, linux-arm-kernel,
	linux-kernel, hch, gregkh

On 02.12.22 11:03, Will Deacon wrote:
> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
>> On 02.12.22 09:26, Amit Pundir wrote:
>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>
>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
>>>>> Has any progress been made to fix this regression? It afaics is not a
>>>>> release critical issue, but well, it still would be nice to get this
>>>>> fixed before 6.1 is released.
>>>>
>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
>>>> makes it less likely to happen.
>>>>
>>>> I like the original commit removing the cache invalidation as it shows
>>>> drivers not behaving properly
>>
>> Yeah, I understand that, but I guess it's my job to ask at this point:
>> "is continuing to live with the old behavior for one or two more cycles"
>> that much of a problem"?
> 
> That wouldn't be a problem. The problem is that I haven't see any efforts
> from the Qualcomm side to actually fix the drivers [...]

Thx for sharing the details. I can fully understand your pain. But well,
in the end it looks to me like this commit it intentionally breaking
something that used to work -- which to my understanding of the "no
regression rule" is not okay, even if things only worked by chance and
not flawless.

But well, as with every rule there are misunderstandings, grey areas,
and situations where judgement calls have to be made. Then it's up to
Linus to decide how to handle things. Hence I'll just point him to this
thread and then he can decide. No biggie. And sorry if I'm being a PITA
here, I just thing doing that is my duty as regression tracker in
situations like this. Hope your won't mind.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 10:03                         ` Will Deacon
  2022-12-02 10:34                           ` Thorsten Leemhuis
@ 2022-12-02 10:54                           ` Manivannan Sadhasivam
  1 sibling, 0 replies; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-12-02 10:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Thorsten Leemhuis, Amit Pundir, Catalin Marinas, Sibi Sankar,
	Robin Murphy, andersson, sumit.semwal, linux-arm-kernel,
	linux-kernel, hch, gregkh

On Fri, Dec 02, 2022 at 10:03:58AM +0000, Will Deacon wrote:
> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > On 02.12.22 09:26, Amit Pundir wrote:
> > > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >>
> > >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > >>> Has any progress been made to fix this regression? It afaics is not a
> > >>> release critical issue, but well, it still would be nice to get this
> > >>> fixed before 6.1 is released.
> > >>
> > >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > >> that exposed the driver bug. It doesn't fix the actual bug, it only
> > >> makes it less likely to happen.
> > >>
> > >> I like the original commit removing the cache invalidation as it shows
> > >> drivers not behaving properly
> > 
> > Yeah, I understand that, but I guess it's my job to ask at this point:
> > "is continuing to live with the old behavior for one or two more cycles"
> > that much of a problem"?
> 
> That wouldn't be a problem. The problem is that I haven't see any efforts
> from the Qualcomm side to actually fix the drivers so what makes you think
> the issue will be addressed in one or two more cycles? On the other hand, if
> there were patches out there trying to fix the drivers and we just needed to
> revert this change to buy them some time, then that would obviously be the
> right thing to do.
> 

There are efforts going on to fix the driver from Qualcomm. It's just that the
patches are not available yet. The delay is mainly due to the internal
communication that should happen between the internal teams.

The fix would be use a separate no-map carveout for the usecase.

But it'd be good to revert this patch untill those patches get merged.

Thanks,
Mani

> > >> but, as a workaround, we could add a
> > >> command line option to force back the old behaviour (defaulting to the
> > >> new one) until the driver is fixed.
> > 
> > Well, sometimes that approach is fine to fix a regression, but I'm not
> > sure this is one of those situations, as this...
> > 
> > > We use DB845c extensively for mainline and android-mainline[1] testing
> > > with AOSP, and it is broken for weeks now. So be it a temporary
> > > workaround or a proper driver fix in place, we'd really appreciate a
> > > quick fix here.
> > 
> > ...doesn't sound like we are not talking about some odd corner case
> > here. But in the end that would be up to Linus to decide.
> 
> The issue is that these drivers are abusing the DMA API to manage buffers
> which are being transferred to trustzone. Even with the revert, this is
> broken (the CPU can speculate from the kernel's cacheable linear mapping
> of memory), it just appears to be less likely with the CPUs on this SoC.
> So we end up in a situation where the kernel is flakey on these devices
> but with even less incentive for the drivers to be fixed.
> 
> As well as broken drivers, the patch has also identified broken device-tree
> files where DMA-coherent devices weher incorrectly being treated as
> non-coherent:
> 
> https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@kernel.org/
> 
> so I do think it's something that's worth having as the default behaviour.
> 
> > I'll point him to this thread once more in my weekly report anyway.
> > Maybe I'll even suggest to revert this change, not sure yet.
> 
> As I said above, I think the revert makes sense if the drivers are actually
> being fixed, but I'm not seeing any movement at all on that front.
> 
> Will

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 10:34                           ` Thorsten Leemhuis
@ 2022-12-02 16:10                             ` Greg KH
  2022-12-02 16:27                               ` Thorsten Leemhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Greg KH @ 2022-12-02 16:10 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar,
	Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch

On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> On 02.12.22 11:03, Will Deacon wrote:
> > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> >> On 02.12.22 09:26, Amit Pundir wrote:
> >>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>>>
> >>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> >>>>> Has any progress been made to fix this regression? It afaics is not a
> >>>>> release critical issue, but well, it still would be nice to get this
> >>>>> fixed before 6.1 is released.
> >>>>
> >>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> >>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> >>>> makes it less likely to happen.
> >>>>
> >>>> I like the original commit removing the cache invalidation as it shows
> >>>> drivers not behaving properly
> >>
> >> Yeah, I understand that, but I guess it's my job to ask at this point:
> >> "is continuing to live with the old behavior for one or two more cycles"
> >> that much of a problem"?
> > 
> > That wouldn't be a problem. The problem is that I haven't see any efforts
> > from the Qualcomm side to actually fix the drivers [...]
> 
> Thx for sharing the details. I can fully understand your pain. But well,
> in the end it looks to me like this commit it intentionally breaking
> something that used to work -- which to my understanding of the "no
> regression rule" is not okay, even if things only worked by chance and
> not flawless.

"no regressions" for userspace code, this is broken, out-of-tree driver
code, right?  I do not think any in-kernel drivers have this issue today
from what I can tell, but if I am wrong here, please let me know.

We don't keep stable apis, or even functionality, for out-of-tree kernel
code as that would be impossible for us to do for obvious reasons.

thanks,

greg kh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 16:10                             ` Greg KH
@ 2022-12-02 16:27                               ` Thorsten Leemhuis
  2022-12-02 16:32                                 ` Greg KH
  0 siblings, 1 reply; 27+ messages in thread
From: Thorsten Leemhuis @ 2022-12-02 16:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar,
	Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch



On 02.12.22 17:10, Greg KH wrote:
> On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
>> On 02.12.22 11:03, Will Deacon wrote:
>>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
>>>> On 02.12.22 09:26, Amit Pundir wrote:
>>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>>>
>>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
>>>>>>> Has any progress been made to fix this regression? It afaics is not a
>>>>>>> release critical issue, but well, it still would be nice to get this
>>>>>>> fixed before 6.1 is released.
>>>>>>
>>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
>>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
>>>>>> makes it less likely to happen.
>>>>>>
>>>>>> I like the original commit removing the cache invalidation as it shows
>>>>>> drivers not behaving properly
>>>>
>>>> Yeah, I understand that, but I guess it's my job to ask at this point:
>>>> "is continuing to live with the old behavior for one or two more cycles"
>>>> that much of a problem"?
>>>
>>> That wouldn't be a problem. The problem is that I haven't see any efforts
>>> from the Qualcomm side to actually fix the drivers [...]
>>
>> Thx for sharing the details. I can fully understand your pain. But well,
>> in the end it looks to me like this commit it intentionally breaking
>> something that used to work -- which to my understanding of the "no
>> regression rule" is not okay, even if things only worked by chance and
>> not flawless.
> 
> "no regressions" for userspace code, this is broken, out-of-tree driver
> code, right?

If so: apologies. But that's not the impression I got, as Amit wrote "I
can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
drivers." here:
https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/

>  I do not think any in-kernel drivers have this issue today
> from what I can tell, but if I am wrong here, please let me know.
> 
> We don't keep stable apis, or even functionality, for out-of-tree kernel
> code as that would be impossible for us to do for obvious reasons.

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 16:27                               ` Thorsten Leemhuis
@ 2022-12-02 16:32                                 ` Greg KH
  2022-12-02 17:14                                   ` Manivannan Sadhasivam
  0 siblings, 1 reply; 27+ messages in thread
From: Greg KH @ 2022-12-02 16:32 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar,
	Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch

On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote:
> 
> 
> On 02.12.22 17:10, Greg KH wrote:
> > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> >> On 02.12.22 11:03, Will Deacon wrote:
> >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> >>>> On 02.12.22 09:26, Amit Pundir wrote:
> >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>>>>>
> >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> >>>>>>> Has any progress been made to fix this regression? It afaics is not a
> >>>>>>> release critical issue, but well, it still would be nice to get this
> >>>>>>> fixed before 6.1 is released.
> >>>>>>
> >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> >>>>>> makes it less likely to happen.
> >>>>>>
> >>>>>> I like the original commit removing the cache invalidation as it shows
> >>>>>> drivers not behaving properly
> >>>>
> >>>> Yeah, I understand that, but I guess it's my job to ask at this point:
> >>>> "is continuing to live with the old behavior for one or two more cycles"
> >>>> that much of a problem"?
> >>>
> >>> That wouldn't be a problem. The problem is that I haven't see any efforts
> >>> from the Qualcomm side to actually fix the drivers [...]
> >>
> >> Thx for sharing the details. I can fully understand your pain. But well,
> >> in the end it looks to me like this commit it intentionally breaking
> >> something that used to work -- which to my understanding of the "no
> >> regression rule" is not okay, even if things only worked by chance and
> >> not flawless.
> > 
> > "no regressions" for userspace code, this is broken, out-of-tree driver
> > code, right?
> 
> If so: apologies. But that's not the impression I got, as Amit wrote "I
> can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
> drivers." here:
> https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/

Ah, I missed that.

Ok, what in-tree drivers are having problems being buggy?  I can't seem
to figure that out from that report at all.  Does anyone know?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 16:32                                 ` Greg KH
@ 2022-12-02 17:14                                   ` Manivannan Sadhasivam
  2022-12-05 14:24                                     ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-12-02 17:14 UTC (permalink / raw)
  To: Greg KH
  Cc: Thorsten Leemhuis, Will Deacon, Amit Pundir, Catalin Marinas,
	Sibi Sankar, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch

On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote:
> On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote:
> > 
> > 
> > On 02.12.22 17:10, Greg KH wrote:
> > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> > >> On 02.12.22 11:03, Will Deacon wrote:
> > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > >>>> On 02.12.22 09:26, Amit Pundir wrote:
> > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >>>>>>
> > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > >>>>>>> Has any progress been made to fix this regression? It afaics is not a
> > >>>>>>> release critical issue, but well, it still would be nice to get this
> > >>>>>>> fixed before 6.1 is released.
> > >>>>>>
> > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> > >>>>>> makes it less likely to happen.
> > >>>>>>
> > >>>>>> I like the original commit removing the cache invalidation as it shows
> > >>>>>> drivers not behaving properly
> > >>>>
> > >>>> Yeah, I understand that, but I guess it's my job to ask at this point:
> > >>>> "is continuing to live with the old behavior for one or two more cycles"
> > >>>> that much of a problem"?
> > >>>
> > >>> That wouldn't be a problem. The problem is that I haven't see any efforts
> > >>> from the Qualcomm side to actually fix the drivers [...]
> > >>
> > >> Thx for sharing the details. I can fully understand your pain. But well,
> > >> in the end it looks to me like this commit it intentionally breaking
> > >> something that used to work -- which to my understanding of the "no
> > >> regression rule" is not okay, even if things only worked by chance and
> > >> not flawless.
> > > 
> > > "no regressions" for userspace code, this is broken, out-of-tree driver
> > > code, right?
> > 
> > If so: apologies. But that's not the impression I got, as Amit wrote "I
> > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
> > drivers." here:
> > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/
> 
> Ah, I missed that.
> 
> Ok, what in-tree drivers are having problems being buggy?  I can't seem
> to figure that out from that report at all.  Does anyone know?
> 

It is the Qualcomm Q6V5_MSS remoteproc driver:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c

Qualcomm is working on the fix but the patches are not ready yet. So if we can
get this patch reverted in the meantime, that would be helpful.

Thanks,
Mani

> thanks,
> 
> greg k-h

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-02 17:14                                   ` Manivannan Sadhasivam
@ 2022-12-05 14:24                                     ` Will Deacon
  2022-12-06  9:21                                       ` Manivannan Sadhasivam
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2022-12-05 14:24 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Greg KH, Thorsten Leemhuis, Amit Pundir, Catalin Marinas,
	Sibi Sankar, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch

On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote:
> > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote:
> > > On 02.12.22 17:10, Greg KH wrote:
> > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> > > >> On 02.12.22 11:03, Will Deacon wrote:
> > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > > >>>> On 02.12.22 09:26, Amit Pundir wrote:
> > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > >>>>>>
> > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a
> > > >>>>>>> release critical issue, but well, it still would be nice to get this
> > > >>>>>>> fixed before 6.1 is released.
> > > >>>>>>
> > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> > > >>>>>> makes it less likely to happen.
> > > >>>>>>
> > > >>>>>> I like the original commit removing the cache invalidation as it shows
> > > >>>>>> drivers not behaving properly
> > > >>>>
> > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point:
> > > >>>> "is continuing to live with the old behavior for one or two more cycles"
> > > >>>> that much of a problem"?
> > > >>>
> > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts
> > > >>> from the Qualcomm side to actually fix the drivers [...]
> > > >>
> > > >> Thx for sharing the details. I can fully understand your pain. But well,
> > > >> in the end it looks to me like this commit it intentionally breaking
> > > >> something that used to work -- which to my understanding of the "no
> > > >> regression rule" is not okay, even if things only worked by chance and
> > > >> not flawless.
> > > > 
> > > > "no regressions" for userspace code, this is broken, out-of-tree driver
> > > > code, right?
> > > 
> > > If so: apologies. But that's not the impression I got, as Amit wrote "I
> > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
> > > drivers." here:
> > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/
> > 
> > Ah, I missed that.
> > 
> > Ok, what in-tree drivers are having problems being buggy?  I can't seem
> > to figure that out from that report at all.  Does anyone know?
> > 
> 
> It is the Qualcomm Q6V5_MSS remoteproc driver:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c
> 
> Qualcomm is working on the fix but the patches are not ready yet. So if we can
> get this patch reverted in the meantime, that would be helpful.

It's good to hear that you're working to fix this, even if it's happening
behind closed doors. Do you have a rough idea how soon you'll be able to
post the remoteproc driver fixes? That would help us to figure out when
to bring back the change if we were to revert it.

Cheers,

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-05 14:24                                     ` Will Deacon
@ 2022-12-06  9:21                                       ` Manivannan Sadhasivam
  2022-12-06  9:58                                         ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Manivannan Sadhasivam @ 2022-12-06  9:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: Greg KH, Thorsten Leemhuis, Amit Pundir, Catalin Marinas,
	Sibi Sankar, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch

On Mon, Dec 05, 2022 at 02:24:03PM +0000, Will Deacon wrote:
> On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote:
> > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote:
> > > > On 02.12.22 17:10, Greg KH wrote:
> > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> > > > >> On 02.12.22 11:03, Will Deacon wrote:
> > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > > > >>>> On 02.12.22 09:26, Amit Pundir wrote:
> > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > >>>>>>
> > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a
> > > > >>>>>>> release critical issue, but well, it still would be nice to get this
> > > > >>>>>>> fixed before 6.1 is released.
> > > > >>>>>>
> > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> > > > >>>>>> makes it less likely to happen.
> > > > >>>>>>
> > > > >>>>>> I like the original commit removing the cache invalidation as it shows
> > > > >>>>>> drivers not behaving properly
> > > > >>>>
> > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point:
> > > > >>>> "is continuing to live with the old behavior for one or two more cycles"
> > > > >>>> that much of a problem"?
> > > > >>>
> > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts
> > > > >>> from the Qualcomm side to actually fix the drivers [...]
> > > > >>
> > > > >> Thx for sharing the details. I can fully understand your pain. But well,
> > > > >> in the end it looks to me like this commit it intentionally breaking
> > > > >> something that used to work -- which to my understanding of the "no
> > > > >> regression rule" is not okay, even if things only worked by chance and
> > > > >> not flawless.
> > > > > 
> > > > > "no regressions" for userspace code, this is broken, out-of-tree driver
> > > > > code, right?
> > > > 
> > > > If so: apologies. But that's not the impression I got, as Amit wrote "I
> > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
> > > > drivers." here:
> > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/
> > > 
> > > Ah, I missed that.
> > > 
> > > Ok, what in-tree drivers are having problems being buggy?  I can't seem
> > > to figure that out from that report at all.  Does anyone know?
> > > 
> > 
> > It is the Qualcomm Q6V5_MSS remoteproc driver:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c
> > 
> > Qualcomm is working on the fix but the patches are not ready yet. So if we can
> > get this patch reverted in the meantime, that would be helpful.
> 
> It's good to hear that you're working to fix this, even if it's happening
> behind closed doors. Do you have a rough idea how soon you'll be able to
> post the remoteproc driver fixes? That would help us to figure out when
> to bring back the change if we were to revert it.
> 

Sibi is the one working on the fix. I believe he should be able to post the
patches within this week.

Thanks,
Mani

> Cheers,
> 
> Will

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-12-06  9:21                                       ` Manivannan Sadhasivam
@ 2022-12-06  9:58                                         ` Will Deacon
  0 siblings, 0 replies; 27+ messages in thread
From: Will Deacon @ 2022-12-06  9:58 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Greg KH, Thorsten Leemhuis, Amit Pundir, Catalin Marinas,
	Sibi Sankar, Robin Murphy, andersson, sumit.semwal,
	linux-arm-kernel, linux-kernel, hch

On Tue, Dec 06, 2022 at 02:51:52PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Dec 05, 2022 at 02:24:03PM +0000, Will Deacon wrote:
> > On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote:
> > > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote:
> > > > > On 02.12.22 17:10, Greg KH wrote:
> > > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote:
> > > > > >> On 02.12.22 11:03, Will Deacon wrote:
> > > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > > > > >>>> On 02.12.22 09:26, Amit Pundir wrote:
> > > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a
> > > > > >>>>>>> release critical issue, but well, it still would be nice to get this
> > > > > >>>>>>> fixed before 6.1 is released.
> > > > > >>>>>>
> > > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only
> > > > > >>>>>> makes it less likely to happen.
> > > > > >>>>>>
> > > > > >>>>>> I like the original commit removing the cache invalidation as it shows
> > > > > >>>>>> drivers not behaving properly
> > > > > >>>>
> > > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point:
> > > > > >>>> "is continuing to live with the old behavior for one or two more cycles"
> > > > > >>>> that much of a problem"?
> > > > > >>>
> > > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts
> > > > > >>> from the Qualcomm side to actually fix the drivers [...]
> > > > > >>
> > > > > >> Thx for sharing the details. I can fully understand your pain. But well,
> > > > > >> in the end it looks to me like this commit it intentionally breaking
> > > > > >> something that used to work -- which to my understanding of the "no
> > > > > >> regression rule" is not okay, even if things only worked by chance and
> > > > > >> not flawless.
> > > > > > 
> > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver
> > > > > > code, right?
> > > > > 
> > > > > If so: apologies. But that's not the impression I got, as Amit wrote "I
> > > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree
> > > > > drivers." here:
> > > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/
> > > > 
> > > > Ah, I missed that.
> > > > 
> > > > Ok, what in-tree drivers are having problems being buggy?  I can't seem
> > > > to figure that out from that report at all.  Does anyone know?
> > > > 
> > > 
> > > It is the Qualcomm Q6V5_MSS remoteproc driver:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c
> > > 
> > > Qualcomm is working on the fix but the patches are not ready yet. So if we can
> > > get this patch reverted in the meantime, that would be helpful.
> > 
> > It's good to hear that you're working to fix this, even if it's happening
> > behind closed doors. Do you have a rough idea how soon you'll be able to
> > post the remoteproc driver fixes? That would help us to figure out when
> > to bring back the change if we were to revert it.
> > 
> 
> Sibi is the one working on the fix. I believe he should be able to post the
> patches within this week.

Oh nice, that's a lot sooner than I expected! I'll send a revert out now,
with a comment about where we're at.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
  2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam
                   ` (2 preceding siblings ...)
  2022-11-28  5:44 ` Thorsten Leemhuis
@ 2022-12-08  4:59 ` Leonard Lausen
  3 siblings, 0 replies; 27+ messages in thread
From: Leonard Lausen @ 2022-12-08  4:59 UTC (permalink / raw)
  To: Manivannan Sadhasivam, catalin.marinas, will
  Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal,
	linux-arm-kernel, linux-kernel, Manivannan Sadhasivam

Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
>
> As reported by Amit [1], dropping cache invalidation from
> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> (most probably on other Qcom platforms too).

On sc7180 with c44094ee applied, it does not trigger crash but makes
Wifi dysfunctional by preventing initialization of ath10k_snoc.

qcom-q6v5-mss 4080000.remoteproc: PBL returned unexpected status -284098560

With the revert of c44094ee, wifi works fine again.

Thank you
Leonard

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-12-08  5:06 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam
2022-11-14 11:29 ` Manivannan Sadhasivam
2022-11-14 14:11 ` Will Deacon
2022-11-14 15:14   ` Robin Murphy
2022-11-14 17:38     ` Catalin Marinas
2022-11-18 10:54       ` Manivannan Sadhasivam
2022-11-18 12:33         ` Will Deacon
2022-11-21  6:42           ` Manivannan Sadhasivam
2022-11-21 10:12             ` Sibi Sankar
2022-11-24 11:55               ` Catalin Marinas
2022-12-01  9:29                 ` Thorsten Leemhuis
2022-12-01 17:45                   ` Catalin Marinas
2022-12-02  8:26                     ` Amit Pundir
2022-12-02  8:54                       ` Thorsten Leemhuis
2022-12-02 10:03                         ` Will Deacon
2022-12-02 10:34                           ` Thorsten Leemhuis
2022-12-02 16:10                             ` Greg KH
2022-12-02 16:27                               ` Thorsten Leemhuis
2022-12-02 16:32                                 ` Greg KH
2022-12-02 17:14                                   ` Manivannan Sadhasivam
2022-12-05 14:24                                     ` Will Deacon
2022-12-06  9:21                                       ` Manivannan Sadhasivam
2022-12-06  9:58                                         ` Will Deacon
2022-12-02 10:54                           ` Manivannan Sadhasivam
2022-11-28  5:44 ` Thorsten Leemhuis
2022-11-28  8:15   ` Manivannan Sadhasivam
2022-12-08  4:59 ` Leonard Lausen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).