iommu.lists.linux-foundation.org archive mirror
* arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
@ 2020-02-14 20:13 Jerry Snitselaar
  2020-02-14 20:58 ` Robin Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Jerry Snitselaar @ 2020-02-14 20:13 UTC (permalink / raw)
  To: iommu, Will Deacon

Hi Will,

On a Gigabyte system with a Cavium CN8xx SoC, we see the following when
running a fio test against an NVMe drive:

[  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000036000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000018000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000360000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010000ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x80100003e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
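For readers unfamiliar with these fields, here is a minimal Python sketch that parses one of the fault lines above and names the bits that are set. The bit positions for FSR and the write flag in FSYNR0 are assumptions taken from the Linux arm-smmu driver's register definitions as best recalled; check them against the driver header or the SMMU specification before relying on them. The function name `decode_fault` is hypothetical.

```python
import re

# ASSUMPTION: bit positions as defined in the Linux arm-smmu driver header.
FSR_BITS = {
    31: "MULTI",   # more than one fault recorded
    30: "SS",      # transaction stalled
    8: "UUT",
    7: "ASF",
    6: "TLBLKF",
    5: "TLBMCF",
    4: "EF",       # external fault
    3: "PF",       # permission fault
    2: "AFF",      # access flag fault
    1: "TF",       # translation fault
}
FSYNR0_WNR = 1 << 4  # ASSUMPTION: set when the faulting access was a write

FAULT_RE = re.compile(
    r"Unhandled context fault: fsr=(0x[0-9a-f]+), iova=(0x[0-9a-f]+), "
    r"fsynr=(0x[0-9a-f]+), cbfrsynra=(0x[0-9a-f]+), cb=(\d+)"
)

def decode_fault(line):
    """Return the decoded fields of one fault line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    fsr, iova, fsynr, cbfrsynra = (int(x, 16) for x in m.groups()[:4])
    return {
        "fsr_flags": [name for bit, name in sorted(FSR_BITS.items())
                      if fsr & (1 << bit)],
        "iova": iova,
        "is_write": bool(fsynr & FSYNR0_WNR),
        "cbfrsynra": cbfrsynra,
        "cb": int(m.group(5)),
    }
```

Under those assumptions, `fsr=0x80000402` decodes to a translation fault with the multiple-fault bit set, and `fsynr=0x70091` marks the access as a write, which is consistent with a random-write workload.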

I also reproduced this with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't narrow it down any further than 5.4-rc1.
I don't know the SMMU or its code well; any thoughts on where to start digging into this?

The fio test being run is:

#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32


Regards,
Jerry

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

* Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
  2020-02-14 20:13 arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1 Jerry Snitselaar
@ 2020-02-14 20:58 ` Robin Murphy
  2020-02-16 22:11   ` Jerry Snitselaar
  0 siblings, 1 reply; 5+ messages in thread
From: Robin Murphy @ 2020-02-14 20:58 UTC (permalink / raw)
  To: Jerry Snitselaar, iommu, Will Deacon

Hi Jerry,

On 2020-02-14 8:13 pm, Jerry Snitselaar wrote:
> Hi Will,
> 
> On a Gigabyte system with a Cavium CN8xx SoC, we see the following when
> running a fio test against an NVMe drive:
> 
> [  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x801000036000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x801000018000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x801000360000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x8010000ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x80100003e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
> [  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: 
> fsr=0x80000402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7

Those "IOVAs" don't look much like IOVAs from the DMA allocator - if 
they were physical addresses, would they correspond to an expected 
region of the physical memory map?

I would suspect that this is most likely misbehaviour in the NVMe driver 
(issuing a write to a non-DMA-mapped address), and the SMMU is just 
doing its job in blocking and reporting it.
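One way to approach the question above is to summarize the faulting addresses before comparing them by hand against the platform's physical memory map (for example `/proc/iomem`). The Python sketch below only does the arithmetic on the ten `iova` values from the quoted log; the 4 KiB page size and the helper name `summarize` are assumptions for illustration.

```python
PAGE_SIZE = 4096  # ASSUMPTION: 4 KiB pages, matching the fio block size

# The ten iova values reported in the fault log.
FAULT_IOVAS = [
    0x8010003f6000, 0x801000036000, 0x8010002ee000, 0x8010003c7000,
    0x801000392000, 0x801000018000, 0x801000360000, 0x8010000ba000,
    0x80100003e000, 0x801000304000,
]

def summarize(addrs, page_size=PAGE_SIZE):
    """Report the range, alignment, and width of a list of addresses."""
    lo, hi = min(addrs), max(addrs)
    return {
        "lowest": lo,
        "highest": hi,
        # Size of the window covering every faulting page.
        "span_bytes": hi - lo + page_size,
        "all_page_aligned": all(a % page_size == 0 for a in addrs),
        # Number of bits needed to represent the widest address.
        "bit_length": max(a.bit_length() for a in addrs),
    }

if __name__ == "__main__":
    for key, value in summarize(FAULT_IOVAS).items():
        print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

All ten addresses are page-aligned, need 48 bits, and fall inside a window of a little under 4 MiB starting at `0x801000018000`. Whether that window corresponds to RAM or to a device region is something only the memory map of the machine can answer.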

> I also reproduced this with 5.5-rc7, and will check 5.6-rc1 later today. I
> couldn't narrow it down any further than 5.4-rc1.
> I don't know the SMMU or its code well; any thoughts on where to start
> digging into this?
>
> The fio test being run is:
> 
> #fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite 
> -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting 
> -name=mytest -numjobs=32

Just to clarify, do other tests work OK on the same device?

Thanks,
Robin.

* Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
  2020-02-14 20:58 ` Robin Murphy
@ 2020-02-16 22:11   ` Jerry Snitselaar
  2020-02-17 13:08     ` Robin Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Jerry Snitselaar @ 2020-02-16 22:11 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, Will Deacon

I was able to get back on the system today. I think I know what the problem is:

[    0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode
[    6.324282] iommu: Default domain type: Translated

So the new default domain code in 5.4 overrides the default passthrough setting made by the
IOMMU quirk code. I am testing a quick patch that tracks whether the default domain was set
in the quirk code, and leaves it alone if it was. So far it seems to be working.
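The conflict between the two boot messages above can be spotted mechanically. Below is a minimal Python sketch that scans dmesg output for a passthrough quirk message followed later by a non-passthrough default domain type. The regular expressions are modelled on the two log lines quoted in this message; the quirk message wording is specific to the kernel build in question, the literal `Passthrough` is assumed to be what the kernel prints in the passthrough case, and the function name is hypothetical.

```python
import re

# Modelled on: "iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode"
QUIRK_RE = re.compile(r"iommu: .* detected, force iommu passthrough mode")
# Modelled on: "iommu: Default domain type: Translated"
DEFAULT_RE = re.compile(r"iommu: Default domain type: (\w+)")

def find_passthrough_override(dmesg_lines):
    """Return the default domain type that replaced a quirk's passthrough
    request, or None if no such override is visible in the log."""
    quirk_seen = False
    for line in dmesg_lines:
        if QUIRK_RE.search(line):
            quirk_seen = True
            continue
        m = DEFAULT_RE.search(line)
        # ASSUMPTION: the kernel reports "Passthrough" when identity mapping is the default.
        if m and quirk_seen and m.group(1) != "Passthrough":
            return m.group(1)
    return None

if __name__ == "__main__":
    log = [
        "[    0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode",
        "[    6.324282] iommu: Default domain type: Translated",
    ]
    print(find_passthrough_override(log))
```

This only reports what the log shows; it does not say whether forcing passthrough is the right behaviour for the board.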

* Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
  2020-02-16 22:11   ` Jerry Snitselaar
@ 2020-02-17 13:08     ` Robin Murphy
  2020-02-17 14:58       ` Jerry Snitselaar
  0 siblings, 1 reply; 5+ messages in thread
From: Robin Murphy @ 2020-02-17 13:08 UTC (permalink / raw)
  To: Jerry Snitselaar; +Cc: iommu, Will Deacon

Ah, OK. Could you point me at that quirk code? I can't seem to track it 
down in mainline, and seeing this much leaves me dubious that it's even 
correct - matching a particular board implies that it's a firmware issue 
(as far as I'm aware the SMMUs in CN88xx SoCs are usable in general), 
but if the firmware description is wrong to the point that DMA ops 
translation doesn't work, then no other translation (e.g. VFIO) is 
likely to work either. In that case it's simply not safe to enable the 
SMMU at all, and fudging the default domain type merely hides one 
symptom of the problem.

Robin.

* Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
  2020-02-17 13:08     ` Robin Murphy
@ 2020-02-17 14:58       ` Jerry Snitselaar
  0 siblings, 0 replies; 5+ messages in thread
From: Jerry Snitselaar @ 2020-02-17 14:58 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, Will Deacon

Ugh. It is a RHEL-only patch, but for some reason it is applied to the
ARK kernel builds as well. Sorry for the noise.
