iommu.lists.linux-foundation.org archive mirror
* Consult on ARM SMMU debugfs
@ 2021-01-07  2:45 chenxiang (M)
  2021-01-11 20:01 ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: chenxiang (M) @ 2021-01-07  2:45 UTC (permalink / raw)
  To: will, robin.murphy; +Cc: iommu, linux-arm-kernel

Hi Will, Robin, and others,

When debugging SMMU/SVA issues on a Huawei ARM64 board, we find that the
ARM SMMU driver lacks sufficient debugfs support (for example, there is
no way to dump the STE/CD values, which we sometimes need to check).
Currently a top-level iommu directory is created in debugfs, but there
are no debugfs entries specific to the ARM SMMU driver. Do you know
whether ARM has any plan to add them in the near future?


Best regards,

Shawn


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: Consult on ARM SMMU debugfs
  2021-01-07  2:45 Consult on ARM SMMU debugfs chenxiang (M)
@ 2021-01-11 20:01 ` Robin Murphy
  2021-01-15 11:26   ` chenxiang (M)
  2021-01-15 15:14   ` Russell King - ARM Linux admin
  0 siblings, 2 replies; 6+ messages in thread
From: Robin Murphy @ 2021-01-11 20:01 UTC (permalink / raw)
  To: chenxiang (M), will; +Cc: iommu, linux-arm-kernel

On 2021-01-07 02:45, chenxiang (M) wrote:
> Hi Will,  Robin or other guys,
> 
> When debugging SMMU/SVA issue on huawei ARM64 board, we find that it 
> lacks of enough debugfs for ARM SMMU driver (such as
> 
> the value of STE/CD which we need to check sometimes). Currently it 
> creates top-level iommu directory in debugfs, but there is no debugfs
> 
> for ARM SMMU driver specially. Do you know whether ARM have the plan to 
> do that recently?

FWIW I don't think I've ever felt the need to inspect the Stream 
Table on a live system. So far the nature of the STE code has been 
simple enough that it's very hard for any given STE to be *wrong* - 
either it's set up as expected and thus works fine, or it's not 
initialised at all and you get C_BAD_STE, where 99% of the time you then 
just cross-reference the Stream ID against the firmware and find that 
the DT/IORT is wrong.

Similarly I don't think I've even *seen* an issue that could be 
attributed to a context descriptor, although I appreciate that as we 
start landing more PASID and SVA support the scope for that starts to 
widen considerably.

Feel free to propose a patch if you believe it would be genuinely useful 
and won't just bit-rot into a maintenance burden, but it's not something 
that's on our roadmap here.

Thanks,
Robin.

* Re: Consult on ARM SMMU debugfs
  2021-01-11 20:01 ` Robin Murphy
@ 2021-01-15 11:26   ` chenxiang (M)
  2021-01-15 15:14   ` Russell King - ARM Linux admin
  1 sibling, 0 replies; 6+ messages in thread
From: chenxiang (M) @ 2021-01-15 11:26 UTC (permalink / raw)
  To: Robin Murphy, will; +Cc: iommu, linux-arm-kernel



On 2021/1/12 4:01, Robin Murphy wrote:
> FWIW I don't think I've ever felt the need to inspect the 
> Stream Table on a live system. So far the nature of the STE code has 
> been simple enough that it's very hard for any given STE to be *wrong* 
> - either it's set up as expected and thus works fine, or it's not 
> initialised at all and you get C_BAD_STE, where 99% of the time you 
> then just cross-reference the Stream ID against the firmware and find 
> that the DT/IORT is wrong.
>
> Similarly I don't think I've even *seen* an issue that could be 
> attributed to a context descriptor, although I appreciate that as we 
> start landing more PASID and SVA support the scope for that starts to 
> widen considerably.

Thank you for your reply. I agree that it is very hard for the content
of an STE/CD to be wrong in the current code, but more SMMU-related
features (such as SVA/vSMMU) are being upstreamed. When debugging those
features, interfaces to dump that information would help us locate
issues. The dump contents are also important to the hardware engineers
when we debug together with them.

>
> Feel free to propose a patch if you believe it would be genuinely 
> useful and won't just bit-rot into a maintenance burden, but it's not 
> something that's on our roadmap here.

OK, we are considering incorporating the following requirements into
the plan:
- Dump the value of STE/CD of devices
- Dump page table entries of devices
- Dump the entries info of CMDQ/EVENTQ
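
As a rough illustration of what a human-readable STE dump could look
like, the sketch below decodes the first 64-bit word of an SMMUv3 Stream
Table Entry. The field positions (V in bit 0, Config in bits [3:1],
S1ContextPtr in bits [51:6]) follow the SMMUv3 architecture; the function
name and the output layout are hypothetical and are not taken from any
existing driver interface.

```python
# Config field encodings (bits [3:1] of STE word 0).
STE_CONFIG = {
    0b100: "bypass",
    0b101: "stage-1 translate",
    0b110: "stage-2 translate",
    0b111: "stage-1 + stage-2 translate",
}

# S1ContextPtr occupies bits [51:6] of STE word 0.
S1_CONTEXT_PTR_MASK = ((1 << 52) - 1) & ~0x3F


def decode_ste_word0(word):
    """Decode a few fields from the first 64-bit word of an STE.

    Hypothetical helper for illustration only.
    """
    config = (word >> 1) & 0x7
    return {
        "valid": bool(word & 0x1),
        # Any encoding with the top Config bit clear (0b0xx) aborts.
        "config": STE_CONFIG.get(config, "abort"),
        "s1_context_ptr": word & S1_CONTEXT_PTR_MASK,
    }


# Made-up example value: valid, stage-1 translation, CD table at 0x10a9e6000.
example = 0x10A9E6000 | (0b101 << 1) | 0x1
for field, value in decode_ste_word0(example).items():
    print(field, hex(value) if field == "s1_context_ptr" else value)
```

A real dump would need to cover the remaining STE words and the CD as
well; this only shows the kind of decoding that makes a raw dump readable.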

>
> Thanks,
> Robin.



* Re: Consult on ARM SMMU debugfs
  2021-01-11 20:01 ` Robin Murphy
  2021-01-15 11:26   ` chenxiang (M)
@ 2021-01-15 15:14   ` Russell King - ARM Linux admin
  2021-01-15 17:17     ` Robin Murphy
  1 sibling, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2021-01-15 15:14 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Mon, Jan 11, 2021 at 08:01:48PM +0000, Robin Murphy wrote:
> FWIW I don't think I've ever felt the need to inspect the Stream
> Table on a live system. So far the nature of the STE code has been simple
> enough that it's very hard for any given STE to be *wrong* - either it's set
> up as expected and thus works fine, or it's not initialised at all and you
> get C_BAD_STE, where 99% of the time you then just cross-reference the
> Stream ID against the firmware and find that the DT/IORT is wrong.
> 
> Similarly I don't think I've even *seen* an issue that could be
> attributed to a context descriptor, although I appreciate that as we start
> landing more PASID and SVA support the scope for that starts to widen
> considerably.
> 
> Feel free to propose a patch if you believe it would be genuinely useful and
> won't just bit-rot into a maintenance burden, but it's not something that's
> on our roadmap here.

I do think that the IOMMU stuff needs better debugging. I've hit the
WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable,
so I've resorted to putting the IOMMU into bypass mode permanently to
work around the issue.

The reason that it's undebuggable is if one puts printk() or trace
statements in the code, boots the platform, you get flooded with those
debugging messages, because every access to the rootfs generates and
tears down a mapping.

It would be nice to be able to inspect the IOMMU page tables and state
of the IOMMU, rather than having to resort to effectively disabling
the IOMMU.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: Consult on ARM SMMU debugfs
  2021-01-15 15:14   ` Russell King - ARM Linux admin
@ 2021-01-15 17:17     ` Robin Murphy
  2021-02-05 16:52       ` Sai Prakash Ranjan
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2021-01-15 17:17 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: iommu, will, linux-arm-kernel

On 2021-01-15 15:14, Russell King - ARM Linux admin wrote:
> I do think that the IOMMU stuff needs better debugging. I've hit the
> WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable,
> so I've resorted to putting the IOMMU into bypass mode permanently to
> work around the issue.
> 
> The reason that it's undebuggable is if one puts printk() or trace
> statements in the code, boots the platform, you get flooded with those
> debugging messages, because every access to the rootfs generates and
> tears down a mapping.
> 
> It would be nice to be able to inspect the IOMMU page tables and state
> of the IOMMU, rather than having to resort to effectively disabling
> the IOMMU.

Certainly once we get to stuff like unpinned VFIO, having the ability to 
inspect pagetables for arbitrary IOMMU API usage will indeed be useful. 
From the DMA mapping perspective, though, unless you're working on the 
io-pgtable code itself it's not really going to tell you much that 
dumping the mappings from dma-debug can't already.

FWIW whenever I encounter that particular warning in iommu-dma context, 
I don't care where the existing mapping is pointing, since it's merely a 
symptom of the damage already having been done. At that point I'd 
usually go off and audit all the DMA API calls in the offending driver, 
since it's typically caused by corruption in the IOVA allocator from 
passing the wrong size in a dma_unmap_*() call, and those can often be 
spotted by inspection. For active debugging, what you really want to 
know is the *history* of operations around that IOVA, since you're 
primarily interested in the request that last mapped it, then the 
corresponding unmap request for nominally the same buffer (which allowed 
the IOVA region to be freed for reuse) that for some reason didn't cover 
one or more pages that it should have. The IOMMU API tracepoints can be 
a handy tool there.
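
The history-based approach described above can be sketched as a small
offline script. This is only an illustration, assuming the tracepoint
output has been saved as text with events of the form
"map: IOMMU: iova=... paddr=... size=..." and
"unmap: IOMMU: iova=... size=... unmapped_size=...", and assuming a 4 KiB
page size. The names parse_events, iova_history and find_remaps are
hypothetical and do not belong to any kernel tool.

```python
import re

PAGE = 4096  # assumed IOMMU page size

# Matches the text of a map/unmap event; any ftrace prefix (task, CPU,
# timestamp) before the event name is ignored.
EVENT_RE = re.compile(
    r"\b(?P<op>map|unmap): IOMMU:\S*\s+iova=(?P<iova>0x[0-9a-fA-F]+)"
    r".*?\bsize=(?P<size>\d+)"
)


def parse_events(lines):
    """Yield (index, op, iova, size) for each map/unmap line."""
    for idx, line in enumerate(lines):
        m = EVENT_RE.search(line)
        if m:
            yield idx, m["op"], int(m["iova"], 16), int(m["size"])


def iova_history(lines, iova):
    """Return every event whose [iova, iova + size) range covers `iova`."""
    return [ev for ev in parse_events(lines) if ev[2] <= iova < ev[2] + ev[3]]


def find_remaps(lines):
    """Return (event, page) pairs where a map hits a page still mapped."""
    live = set()  # page-aligned IOVAs currently mapped
    hits = []
    for ev in parse_events(lines):
        _, op, start, size = ev
        for page in range(start & ~(PAGE - 1), start + size, PAGE):
            if op == "map":
                if page in live:
                    hits.append((ev, page))
                live.add(page)
            else:
                live.discard(page)
    return hits


# Made-up trace: a two-page map, a one-page unmap (wrong size), then a
# new map that lands on the page the unmap failed to cover.
trace = [
    "map: IOMMU: iova=0x0000000ffff34000 paddr=0x0000000113be0000 size=8192",
    "unmap: IOMMU: iova=0x0000000ffff34000 size=4096 unmapped_size=4096",
    "map: IOMMU: iova=0x0000000ffff35000 paddr=0x00000001164d8000 size=4096",
]

for ev, page in find_remaps(trace):
    print("map over live page", hex(page), "at event", ev[0])
    for old in iova_history(trace, page):
        print("  history:", old[0], old[1], hex(old[2]), old[3])
```

The history printed for the offending page shows the earlier map and the
absence of a covering unmap, which is the pair of requests to look for in
the driver.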

Robin.

* Re: Consult on ARM SMMU debugfs
  2021-01-15 17:17     ` Robin Murphy
@ 2021-02-05 16:52       ` Sai Prakash Ranjan
  0 siblings, 0 replies; 6+ messages in thread
From: Sai Prakash Ranjan @ 2021-02-05 16:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Russell King - ARM Linux admin, iommu, will, linux-arm-kernel

On 2021-01-15 22:47, Robin Murphy wrote:
> On 2021-01-15 15:14, Russell King - ARM Linux admin wrote:
>> I do think that the IOMMU stuff needs better debugging. I've hit the
>> WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable,
>> so I've resorted to putting the IOMMU into bypass mode permanently to
>> work around the issue.
>> 
>> The reason that it's undebuggable is if one puts printk() or trace
>> statements in the code, boots the platform, you get flooded with those
>> debugging messages, because every access to the rootfs generates and
>> tears down a mapping.
>> 
>> It would be nice to be able to inspect the IOMMU page tables and state
>> of the IOMMU, rather than having to resort to effectively disabling
>> the IOMMU.
> 
> Certainly once we get to stuff like unpinned VFIO, having the ability
> to inspect pagetables for arbitrary IOMMU API usage will indeed be
> useful. From the DMA mapping perspective, though, unless you're
> working on the io-pgtable code itself it's not really going to tell
> you much that dumping the mappings from dma-debug can't already.
> 
> FWIW whenever I encounter that particular warning in iommu-dma
> context, I don't care where the existing mapping is pointing, since
> it's merely a symptom of the damage already having been done. At that
> point I'd usually go off and audit all the DMA API calls in the
> offending driver, since it's typically caused by corruption in the
> IOVA allocator from passing the wrong size in a dma_unmap_*() call,
> and those can often be spotted by inspection. For active debugging,
> what you really want to know is the *history* of operations around
> that IOVA, since you're primarily interested in the request that last
> mapped it, then the corresponding unmap request for nominally the same
> buffer (which allowed the IOVA region to be freed for reuse) that for
> some reason didn't cover one or more pages that it should have. The
> IOMMU API tracepoints can be a handy tool there.
> 

Currently the IOMMU trace events are not straightforward to decode if
multiple devices are attached. For example, consider the following:

map: IOMMU: iova=0x0000000ffff35000 paddr=0x0000000113be2000 size=4096
unmap: IOMMU: iova=0x0000000ffff34000 size=4096 unmapped_size=4096
unmap: IOMMU: iova=0x0000000ffff35000 size=4096 unmapped_size=4096
map: IOMMU: iova=0x0000000ffff36000 paddr=0x00000001164d8000 size=4096
map: IOMMU: iova=0x0000000ffff37000 paddr=0x00000001164da000 size=4096
unmap: IOMMU: iova=0x0000000ffff36000 size=4096 unmapped_size=4096
unmap: IOMMU: iova=0x0000000ffff37000 size=4096 unmapped_size=4096

How about making it more useful by adding the device name as well? For example:

map: IOMMU:ae00000.mdss iova=0x000000000002b000 paddr=0x000000010a9e6000 size=8192
map: IOMMU:ae00000.mdss iova=0x000000000002d000 paddr=0x000000010a9ec000 size=21790
map: IOMMU:ae00000.mdss iova=0x0000000000241000 paddr=0x000000010c400000 size=59392
map: IOMMU:a600000.dwc3 iova=0x0000000ffff4a000 paddr=0x000000010a821000 size=4096
map: IOMMU:a600000.dwc3 iova=0x0000000ffff49000 paddr=0x000000010a820000 size=4096
unmap: IOMMU:a600000.dwc3 iova=0x0000000ffff4a000 size=4096 unmapped_size=4096
unmap: IOMMU:a600000.dwc3 iova=0x0000000ffff49000 size=4096 unmapped_size=4096
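
With the device name in each event, splitting a mixed trace per device
becomes a simple text-processing job. A minimal sketch, assuming the
proposed "IOMMU:<device>" format from this message (a downstream format,
not the mainline one) with each event on a single line; split_by_device
is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches map/unmap events that carry a device name after "IOMMU:".
EVENT_RE = re.compile(
    r"\b(?P<op>map|unmap): IOMMU:(?P<dev>\S+)\s+iova=(?P<iova>0x[0-9a-fA-F]+)"
    r".*?\bsize=(?P<size>\d+)"
)


def split_by_device(lines):
    """Group (op, iova, size) tuples by the device named in each event."""
    per_dev = defaultdict(list)
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            per_dev[m["dev"]].append(
                (m["op"], int(m["iova"], 16), int(m["size"]))
            )
    return dict(per_dev)


# A few events in the proposed format, each on one line.
trace = [
    "map: IOMMU:ae00000.mdss iova=0x000000000002b000 paddr=0x000000010a9e6000 size=8192",
    "map: IOMMU:a600000.dwc3 iova=0x0000000ffff4a000 paddr=0x000000010a821000 size=4096",
    "unmap: IOMMU:a600000.dwc3 iova=0x0000000ffff4a000 size=4096 unmapped_size=4096",
]

for dev, events in split_by_device(trace).items():
    print(dev)
    for op, iova, size in events:
        print("  ", op, hex(iova), size)
```

Events without a device name (the current format) are simply skipped by
this pattern, so the two formats are not mixed up.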

We have been carrying a local patch for this downstream for a long time;
I can post a patch if you think it is useful in general.

Thanks
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation