All of lore.kernel.org
 help / color / mirror / Atom feed
* smmuv1 breakage
@ 2021-06-15  2:21 Stefano Stabellini
  2021-06-15 16:28 ` Rahul Singh
  0 siblings, 1 reply; 7+ messages in thread
From: Stefano Stabellini @ 2021-06-15  2:21 UTC (permalink / raw)
  To: Rahul Singh
  Cc: sstabellini, edgar.iglesias, xen-devel, Bertrand.Marquis, julien, fnuv

[-- Attachment #1: Type: text/plain, Size: 1569 bytes --]

Hi Rahul,

Unfortunately, after bisecting, I discovered a few more breakages due to
your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
attached the DTB as reference. Please note that I made sure to
cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
the S2CR" during bisection. So the errors are present also on staging.

The first breakage is an error at boot time in smmu.c#find_smmu_master,
see log1. I think it is due to the lack of ability to parse the new smmu
bindings in the old smmu driver.

After removing all the "smmus" and "#stream-id-cells" properties in
device tree, I get past the previous error, everything seems to be OK at
early boot, but I actually get SMMU errors as soon as dom0 starting
using devices:

(XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
(XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000
[   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
[   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Do you think you'll be able to help fix them?


You should be able to reproduce the two issues using Xilinx QEMU (but to
be honest I haven't tested it on QEMU yet, I was testing on real
hardware):
- clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
  ./configure  --target-list=aarch64-softmmu
  make
- clone and build git://github.com/Xilinx/qemu-devicetrees.git
- use the attached script to run it
    - kernel can be upstream defconfig 5.10

Cheers,

Stefano

[-- Attachment #2: Type: application/octet-stream, Size: 39521 bytes --]

[-- Attachment #3: Type: text/plain, Size: 5713 bytes --]

(XEN) smmu: /smmu@fd800000: d0: p2maddr 0x000000087bf9a000
(XEN) Data Abort Trap. Syndrome=0x4
(XEN) Walking Hypervisor VA 0x14ed0000fbf9fd20 on CPU0 via TTBR 0x0000000000f3f000
(XEN) 0TH[0x0] = 0x0000000000f42f7f
(XEN) 1ST[0x3] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.16-unstable  arm64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     000000000024cfa4 smmu.c#find_smmu_master+0x8/0x3c
(XEN) LR:     000000000024d188
(XEN) SP:     00000000003071f0
(XEN) CPSR:   20000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 14ed0000fbf9fd28  X1: 00008000fbfcb040  X2: 00008000fbfcb2b0
(XEN)      X3: 00000000002b3870  X4: 0000000000000000  X5: 0000000000000001
(XEN)      X6: 0000000000000000  X7: 0000000000000000  X8: 00008000fbf81c30
(XEN)      X9: fffffffffffffffa X10: 0101010101010101 X11: 0000000000000020
(XEN)     X12: 0000000000000020 X13: ff00000000000000 X14: 0000000000000030
(XEN)     X15: 0000000000000000 X16: 00000000002b5000 X17: 00000000002b5000
(XEN)     X18: 00000000002b6000 X19: 00008000fbffcbf0 X20: 00000000002b3878
(XEN)     X21: 00008000fbfcb040 X22: 00008000fbf9ffd0 X23: 00008000fbfcb0d0
(XEN)     X24: 00008000fbf9d000 X25: 0000000000000001 X26: 0000000000000001
(XEN)     X27: 0000000000000000 X28: 0000000000000000  FP: 00000000003071f0
(XEN) 
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 000000087bf9a000
(XEN) 
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000000000003a
(XEN)  TTBR0_EL2: 0000000000f3f000
(XEN) 
(XEN)    ESR_EL2: 96000004
(XEN)  HPFAR_EL2: 0000000000f90100
(XEN)    FAR_EL2: 14ed0000fbf9fd20
(XEN) 
(XEN) Xen stack trace from sp=00000000003071f0:
(XEN)    0000000000307220 000000000024d7b0 00008000fbf9d000 00000000fffffff0
(XEN)    00008000fbfcb0b0 0000000800000001 00000000003072a0 000000000024edc8
(XEN)    00008000fbf9d000 00000000fffffff0 00008000fbfcb040 00008000fbfcb0a0
(XEN)    00008000fbfcb0d0 0000000000000000 0000000000000001 0000000000000001
(XEN)    0000000000000000 0000000000000000 00000000003072a0 000000000024ed98
(XEN)    00008000fbf9d000 0000000000307550 00000000003072d0 00000000002c9f7c
(XEN)    00008000fbfcb040 0000000000307550 00008000fbf9d000 0000000000000005
(XEN)    0000000000307390 00000000002ca40c 00008000fbfc8038 0000000000307550
(XEN)    00008000fbf9d000 0000000000000005 00008000fbfcb040 0000000000000000
(XEN)    00008000fbfe2130 0000000000000000 0000000000000000 0000000000000000
(XEN)    00000000002da8e8 00000000fbf6a090 00000000002da8d8 00000000002d9b80
(XEN)    0000000000307380 00000000fd070000 00008000fbfc8038 0000000000030000
(XEN)    00008000fbf9d000 00008000fbf9d000 0000000000000005 00000000002ca308
(XEN)    0000000000307450 00000000002ca40c 00008000fbfc0000 0000000000307550
(XEN)    00008000fbf9d000 0000000000000005 00008000fbfc8038 0000000000000000
(XEN)    00008000fbfe00c4 0000000000000015 0000000000000000 0000000000000000
(XEN)    00000000002da8e8 00000000fbf6a090 00000000002da8d8 00000000002d9b80
(XEN)    0000000000307440 0000000000203ec4 00008000fbfc0000 0000000000307550
(XEN)    00008000fbf9d000 00008000fbf9d000 0000000000000005 00000000002ca308
(XEN)    0000000000307510 00000000002cac70 000000000000a090 0000000000e00000
(XEN)    00000000002daae8 00008000fbf9d000 000000000000000f 0000000000000004
(XEN)    00000000002e8600 0000000000000000 0000000880000000 0000000000000002
(XEN)    00000000002da8e8 000000000022ce50 00000000002da8d8 00000000002d9b80
(XEN)    0000000000307500 00000000002bbcc8 000000000000a090 0000000000e00000
(XEN)    00000000002daae8 00008000fbf9d000 0000000000000005 00000000002cac58
(XEN)    0000000000307df0 00000000002cef64 00008000fbf9d000 00000000002b4780
(XEN)    0000000000348430 0000000000000004 00000000002a84e0 0000000000000000
(XEN)    0000000000000001 00008000fbf9d000 00008000fbf60000 0000000000000000
(XEN)    0000000000000001 0000000020000000 0000000040000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<000000000024cfa4>] smmu.c#find_smmu_master+0x8/0x3c (PC)
(XEN)    [<000000000024d188>] smmu.c#find_smmu_for_device+0x48/0x94 (LR)
(XEN)    [<000000000024d7b0>] smmu.c#arm_smmu_assign_dev+0x58/0xb48
(XEN)    [<000000000024edc8>] iommu_assign_dt_device+0x64/0xc0
(XEN)    [<00000000002c9f7c>] domain_build.c#handle_node+0x310/0x9ec
(XEN)    [<00000000002ca40c>] domain_build.c#handle_node+0x7a0/0x9ec
(XEN)    [<00000000002ca40c>] domain_build.c#handle_node+0x7a0/0x9ec
(XEN)    [<00000000002cac70>] construct_dom0+0x410/0x4bc
(XEN)    [<00000000002cef64>] start_xen+0xbe8/0xcd0
(XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...

[-- Attachment #4: Type: application/x-sh, Size: 967 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: smmuv1 breakage
  2021-06-15  2:21 smmuv1 breakage Stefano Stabellini
@ 2021-06-15 16:28 ` Rahul Singh
  2021-06-16  1:20   ` Stefano Stabellini
  0 siblings, 1 reply; 7+ messages in thread
From: Rahul Singh @ 2021-06-15 16:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, xen-devel, Bertrand Marquis, Julien Grall, fnuv

Hi Stefano

> On 15 Jun 2021, at 3:21 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Hi Rahul,
> 
> Unfortunately, after bisecting, I discovered a few more breakages due to
> your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
> attached the DTB as reference. Please note that I made sure to
> cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
> the S2CR" during bisection. So the errors are present also on staging.
> 
> The first breakage is an error at boot time in smmu.c#find_smmu_master,
> see log1. I think it is due to the lack of ability to parse the new smmu
> bindings in the old smmu driver.
> 
> After removing all the "smmus" and "#stream-id-cells" properties in
> device tree, I get past the previous error, everything seems to be OK at
> early boot, but I actually get SMMU errors as soon as dom0 starting
> using devices:
> 
> (XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
> (XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000

 This fault is "Unidentified stream fault” for StreamID “ 0x877” that means SMMU SMR is not configured for streamID “0x877"


> [   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
> [   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> 
> Do you think you'll be able to help fix them?
> 
> 
> You should be able to reproduce the two issues using Xilinx QEMU (but to
> be honest I haven't tested it on QEMU yet, I was testing on real
> hardware):
> - clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
>  ./configure  --target-list=aarch64-softmmu
>  make
> - clone and build git://github.com/Xilinx/qemu-devicetrees.git
> - use the attached script to run it
>    - kernel can be upstream defconfig 5.10
> 

I tried to reproduce the issue on Xilinx QEMU as per the steps shared above 
but I am not observing any issue on Xilinx QEMU.

I also tested and confirmed on QEMU that SMMU is configured correctly 
for specifically StreamID “ 0x877” and for other streamIDs.

I check the xen.dtb shared by you and found out the there is no "stream-id-cells”
property in the master device but the "mmu-masters" property is present in the
smmu node. For legacy smmu binding we need both "stream-id-cells” and "mmu-masters”.
If you need to add the new smmu binding please add the "iommu-cells”
property in the smmu node and the “iommus” property in the master device.

Can you please share the xen boot logs with me so that I can debug further why the error is observed? 

Regards,
Rahul


> Cheers,
> 
> Stefano<xen.dtb><log1.txt><qemu-run-zynqmp-xilinx-xen.sh>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: smmuv1 breakage
  2021-06-15 16:28 ` Rahul Singh
@ 2021-06-16  1:20   ` Stefano Stabellini
  2021-06-22 21:06     ` Stefano Stabellini
  0 siblings, 1 reply; 7+ messages in thread
From: Stefano Stabellini @ 2021-06-16  1:20 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Stefano Stabellini, edgar.iglesias, xen-devel, Bertrand Marquis,
	Julien Grall, fnuv

[-- Attachment #1: Type: text/plain, Size: 4040 bytes --]

On Tue, 15 Jun 2021, Rahul Singh wrote:
> Hi Stefano
> 
> > On 15 Jun 2021, at 3:21 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > 
> > Hi Rahul,
> > 
> > Unfortunately, after bisecting, I discovered a few more breakages due to
> > your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
> > attached the DTB as reference. Please note that I made sure to
> > cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
> > the S2CR" during bisection. So the errors are present also on staging.
> > 
> > The first breakage is an error at boot time in smmu.c#find_smmu_master,
> > see log1. I think it is due to the lack of ability to parse the new smmu
> > bindings in the old smmu driver.
> > 
> > After removing all the "smmus" and "#stream-id-cells" properties in
> > device tree, I get past the previous error, everything seems to be OK at
> > early boot, but I actually get SMMU errors as soon as dom0 starting
> > using devices:
> > 
> > (XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
> > (XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000
> 
>  This fault is "Unidentified stream fault” for StreamID “ 0x877” that means SMMU SMR is not configured for streamID “0x877"
> 
> 
> > [   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
> > [   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > 
> > Do you think you'll be able to help fix them?
> > 
> > 
> > You should be able to reproduce the two issues using Xilinx QEMU (but to
> > be honest I haven't tested it on QEMU yet, I was testing on real
> > hardware):
> > - clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
> >  ./configure  --target-list=aarch64-softmmu
> >  make
> > - clone and build git://github.com/Xilinx/qemu-devicetrees.git
> > - use the attached script to run it
> >    - kernel can be upstream defconfig 5.10
> > 
> 
> I tried to reproduce the issue on Xilinx QEMU as per the steps shared above 
> but I am not observing any issue on Xilinx QEMU.

I tried on QEMU and it doesn't repro. I cannot explain why it works on
QEMU and it fails on real hardware.


> I also tested and confirmed on QEMU that SMMU is configured correctly 
> for specifically StreamID “ 0x877” and for other streamIDs.
> 
> I check the xen.dtb shared by you and found out the there is no "stream-id-cells”
> property in the master device but the "mmu-masters" property is present in the
> smmu node. For legacy smmu binding we need both "stream-id-cells” and "mmu-masters”.
> If you need to add the new smmu binding please add the "iommu-cells”
> property in the smmu node and the “iommus” property in the master device.

In regards to the missing "stream-id-cells" property, I shared the wrong
dtb before, sorry. I was running a number of tests and I might have
picked the wrong file. The proper dtb comes with "stream-id-cells" for
the 0x877 device, see attached.



> Can you please share the xen boot logs with me so that I can debug further why the error is observed? 

See attached. I did some debugging and discovered that it crashes while
accessing master->of_node in find_smmu_master. If I revert your series,
the crash goes away. It is very strange because your patches don't touch
find_smmu_master or insert_smmu_master directly.

I did a git reset --hard on the commit "xen/arm: smmuv1: Add a stream
map entry iterator" and it worked, which points to "xen/arm: smmuv1:
Intelligent SMR allocation" being the problem, even if I have the revert
cherry-picked on top. Maybe the revert is not reverting enough?

After this test, I switched back to staging and did:
git revert 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
git revert 0435784cc75dcfef3b5f59c29deb1dbb84265ddb

And it worked! So the issue truly is that
9f6cd4983715cb31f0ea540e6bbb63f799a35d8a doesn't revert "enough".
See "full-revert" for the patch reverting the remaining code. That on
top of staging fixes boot for me.

[-- Attachment #2: Type: text/plain, Size: 9459 bytes --]

Starting kernel ...

 Xen 4.16-unstable
(XEN) Xen version 4.16-unstable (sstabellini@) (aarch64-linux-gnu-gcc (Linaro GCC 5.3-2016.05) 5.3.1 20160412) debug=y Tue Jun 15 17:56:03 PDT 2021
(XEN) Latest ChangeSet: Fri Apr 16 12:25:02 2021 +0100 git:3e6047ddfa
(XEN) build-id: 77b37e543360352920af4c457fc5c29004139357
(XEN) Processor: 00000000410fd034: "ARM Limited", variant: 0x0, part 0xd03,rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000000002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 0000000000000131:0000000000011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 0000000003010066
(XEN)   Auxiliary Features: 0000000000000000
(XEN)   Memory Model Features: 0000000010201105 0000000040000000
(XEN)                          0000000001260000 0000000002102211
(XEN)   ISA Features: 0000000002101110 0000000013112111 0000000021232042
(XEN)                 0000000001112131 0000000000011142 0000000000011121
(XEN) Using SMC Calling Convention v1.1
(XEN) Using PSCI v1.1
(XEN) SMP: Allowing 4 CPUs
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 99990 KHz
(XEN) GICv2 initialization:
(XEN)         gic_dist_addr=00000000f9010000
(XEN)         gic_cpu_addr=00000000f9020000
(XEN)         gic_hyp_addr=00000000f9040000
(XEN)         gic_vcpu_addr=00000000f9060000
(XEN)         gic_maintenance_irq=25
(XEN) GICv2: Adjusting CPU interface base to 0xf902f000
(XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b).
(XEN) XSM Framework v1.0.0 initialized
(XEN) Initialising XSM SILO mode
(XEN) Using scheduler: null Scheduler (null)
(XEN) Initializing null scheduler
(XEN) WARNING: This is experimental software in development.
(XEN) Use at your own risk.
(XEN) Allocated console ring of 32 KiB.
(XEN) CPU0: Guest atomics will try 12 times before pausing the domain
(XEN) Bringing up CPU1
(XEN) CPU1: Guest atomics will try 13 times before pausing the domain
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
(XEN) CPU2: Guest atomics will try 13 times before pausing the domain
(XEN) CPU 2 booted.
(XEN) Bringing up CPU3
(XEN) CPU3: Guest atomics will try 13 times before pausing the domain
(XEN) Brought up 4 CPUs
(XEN) CPU 3 booted.
(XEN) smmu: /smmu@fd800000: probing hardware configuration...
(XEN) smmu: /smmu@fd800000: SMMUv2 with:
(XEN) smmu: /smmu@fd800000:     stage 2 translation
(XEN) smmu: /smmu@fd800000:     stream matching with 48 register groups, mask 0x7fff<2>smmu: /smmu@fd800000:    16 context banks (0 stage-2 only)
(XEN) smmu: /smmu@fd800000:     Stage-2: 48-bit IPA -> 48-bit PA
(XEN) smmu: /smmu@fd800000: registered 26 master devices
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) alternatives: Patching with alt table 00000000002dc3b8 -> 00000000002dcbe0
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading d0 kernel from boot module @ 0000000001000000
(XEN) Loading ramdisk from boot module @ 0000000001800000
(XEN) Allocating 1:1 mappings totalling 1024MB for dom0:
(XEN) BANK[0] 0x00000020000000-0x00000060000000 (1024MB)
(XEN) Grant table range: 0x00000000e00000-0x00000000e40000
(XEN) smmu: /smmu@fd800000: d0: p2maddr 0x000000087bf9a000
(XEN) Data Abort Trap. Syndrome=0x4
(XEN) Walking Hypervisor VA 0x14ed0000fbf9fd20 on CPU0 via TTBR 0x0000000000f3f000
(XEN) 0TH[0x0] = 0x0000000000f42f7f
(XEN) 1ST[0x3] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.16-unstable  arm64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     000000000024cfa4 smmu.c#find_smmu_master+0x8/0x3c
(XEN) LR:     000000000024d188
(XEN) SP:     00000000003071f0
(XEN) CPSR:   20000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 14ed0000fbf9fd28  X1: 00008000fbfcaf40  X2: 00008000fbfcb190
(XEN)      X3: 00000000002b3870  X4: 0000000000000000  X5: 00651f11030c0719
(XEN)      X6: 0000000000000000  X7: 0000000000000000  X8: 00008000f907ec30
(XEN)      X9: fffffffffffffffe X10: 0101010101010101 X11: 0000000000000003
(XEN)     X12: 0000000000000020 X13: ff00000000000000 X14: 0000000000000030
(XEN)     X15: 0000000000000000 X16: 00000000002b5000 X17: 00000000002b5000
(XEN)     X18: 00000000002b6000 X19: 00008000fbffcbf0 X20: 00000000002b3878
(XEN)     X21: 00008000fbfcaf40 X22: 00008000fbf9ffd0 X23: 00008000fbfcafd0
(XEN)     X24: 00008000fbf9d000 X25: 0000000000000001 X26: 0000000000000001
(XEN)     X27: 0000000000000000 X28: 0000000000000000  FP: 00000000003071f0
(XEN) 
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 000000087bf9a000
(XEN) 
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000000000003a
(XEN)  TTBR0_EL2: 0000000000f3f000
(XEN) 
(XEN)    ESR_EL2: 96000004
(XEN)  HPFAR_EL2: 0000000000f90100
(XEN)    FAR_EL2: 14ed0000fbf9fd20
(XEN) 
(XEN) Xen stack trace from sp=00000000003071f0:
(XEN)    0000000000307220 000000000024d7b0 00008000fbf9d000 00000000fffffff0
(XEN)    00008000fbfcafb0 0000000000000000 00000000003072a0 000000000024edc8
(XEN)    00008000fbf9d000 00000000fffffff0 00008000fbfcaf40 00008000fbfcafa0
(XEN)    00008000fbfcafd0 0000000000000001 0000000000000001 0000000000000001
(XEN)    0000000000000000 0000000000000000 00000000003072a0 000000000024ed98
(XEN)    00008000fbf9d000 0000000000307550 00000000003072d0 00000000002c9f7c
(XEN)    00008000fbfcaf40 0000000000307550 00008000fbf9d000 0000000000000005
(XEN)    0000000000307390 00000000002ca40c 00008000fbfc8038 0000000000307550
(XEN)    00008000fbf9d000 0000000000000005 00008000fbfcaf40 0000000000000000
(XEN)    00008000fbfe2130 0000000000000000 0000000000000000 0000000000000000
(XEN)    00000000002da8e8 00000000f907a090 00000000002da8d8 00000000002d9b80
(XEN)    0000000000307380 00000000fd070000 00008000fbfc8038 0000000000030000
(XEN)    00008000fbf9d000 00008000fbf9d000 0000000000000005 00000000002ca308
(XEN)    0000000000307450 00000000002ca40c 00008000fbfc0000 0000000000307550
(XEN)    00008000fbf9d000 0000000000000005 00008000fbfc8038 0000000000000000
(XEN)    00008000fbfe00c4 0000000000000015 0000000000000000 0000000000000000
(XEN)    00000000002da8e8 00000000f907a090 00000000002da8d8 00000000002d9b80
(XEN)    0000000000307440 0000000000203ec4 00008000fbfc0000 0000000000307550
(XEN)    00008000fbf9d000 00008000fbf9d000 0000000000000005 00000000002ca308
(XEN)    0000000000307510 00000000002cac70 000000000000a090 0000000000e00000
(XEN)    00000000002daae8 00008000fbf9d000 000000000000000f 0000000000000004
(XEN)    00000000002e8600 0000000000000000 0000000880000000 0000000000000002
(XEN)    00000000002da8e8 000000000022ce50 00000000002da8d8 00000000002d9b80
(XEN)    0000000000307500 00000000002bbcc8 000000000000a090 0000000000e00000
(XEN)    00000000002daae8 00008000fbf9d000 0000000000000005 00000000002cac58
(XEN)    0000000000307df0 00000000002cef64 00008000fbf9d000 00000000002b4780
(XEN)    0000000000348430 0000000000000004 00000000002a84e0 0000000000000000
(XEN)    0000000000000001 00008000fbf9d000 00008000f9070000 0000000000000000
(XEN)    0000000000000001 0000000020000000 0000000040000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<000000000024cfa4>] smmu.c#find_smmu_master+0x8/0x3c (PC)
(XEN)    [<000000000024d188>] smmu.c#find_smmu_for_device+0x48/0x94 (LR)
(XEN)    [<000000000024d7b0>] smmu.c#arm_smmu_assign_dev+0x58/0xb48
(XEN)    [<000000000024edc8>] iommu_assign_dt_device+0x64/0xc0
(XEN)    [<00000000002c9f7c>] domain_build.c#handle_node+0x310/0x9ec
(XEN)    [<00000000002ca40c>] domain_build.c#handle_node+0x7a0/0x9ec
(XEN)    [<00000000002ca40c>] domain_build.c#handle_node+0x7a0/0x9ec
(XEN)    [<00000000002cac70>] construct_dom0+0x410/0x4bc
(XEN)    [<00000000002cef64>] start_xen+0xbe8/0xcd0
(XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...


[-- Attachment #3: Type: application/octet-stream, Size: 40337 bytes --]

[-- Attachment #4: Type: text/plain, Size: 9376 bytes --]

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 1a68c2ab3b..07b8785380 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -597,7 +597,6 @@ enum arm_smmu_arch_version {
 };
 
 struct arm_smmu_s2cr {
-	int				count;
 	enum arm_smmu_s2cr_type		type;
 	enum arm_smmu_s2cr_privcfg	privcfg;
 	u8				cbndx;
@@ -614,7 +613,6 @@ struct arm_smmu_smr {
 };
 
 struct arm_smmu_master_cfg {
-	struct arm_smmu_device		*smmu;
 	int				num_streamids;
 	u16				streamids[MAX_MASTER_STREAMIDS];
 	s16				smendx[MAX_MASTER_STREAMIDS];
@@ -657,7 +655,6 @@ struct arm_smmu_device {
 	u16				smr_mask_mask;
 	struct arm_smmu_smr		*smrs;
 	struct arm_smmu_s2cr		*s2crs;
-	spinlock_t			stream_map_lock;
 
 	unsigned long			s1_input_size;
 	unsigned long			s1_output_size;
@@ -1410,6 +1407,23 @@ static void arm_smmu_domain_destroy(struct iommu_domain *domain)
 	kfree(smmu_domain);
 }
 
+static int arm_smmu_alloc_smr(struct arm_smmu_device *smmu)
+{
+	int i;
+
+	for (i = 0; i < smmu->num_mapping_groups; i++)
+		if (!cmpxchg(&smmu->smrs[i].valid, false, true))
+			return i;
+
+	return INVALID_SMENDX;
+}
+
+static void arm_smmu_free_smr(struct arm_smmu_device *smmu, int idx)
+{
+	writel_relaxed(~SMR_VALID, ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_SMR(idx));
+	write_atomic(&smmu->smrs[idx].valid, false);
+}
+
 static void arm_smmu_write_smr(struct arm_smmu_device *smmu, int idx)
 {
 	struct arm_smmu_smr *smr = smmu->smrs + idx;
@@ -1438,132 +1452,98 @@ static void arm_smmu_write_sme(struct arm_smmu_device *smmu, int idx)
 		arm_smmu_write_smr(smmu, idx);
 }
 
-static int arm_smmu_find_sme(struct arm_smmu_device *smmu, u16 id, u16 mask)
+static int arm_smmu_master_alloc_smes(struct arm_smmu_device *smmu,
+				      struct arm_smmu_master_cfg *cfg)
 {
 	struct arm_smmu_smr *smrs = smmu->smrs;
-	int i, free_idx = -ENOSPC;
+	int i, idx;
 
-	/* Stream indexing is blissfully easy */
-	if (!smrs)
-		return id;
+	/* Allocate the SMRs on the SMMU */
+	for_each_cfg_sme(cfg, i, idx) {
+		if (idx != INVALID_SMENDX)
+			return -EEXIST;
 
-	/* Validating SMRs is... less so */
-	for (i = 0; i < smmu->num_mapping_groups; ++i) {
-		if (!smrs[i].valid) {
-			/*
-			 * Note the first free entry we come across, which
-			 * we'll claim in the end if nothing else matches.
-			 */
-			if (free_idx < 0)
-				free_idx = i;
+		/* ...except on stream indexing hardware, of course */
+		if (!smrs) {
+			cfg->smendx[i] = cfg->streamids[i];
 			continue;
 		}
-		/*
-		 * If the new entry is _entirely_ matched by an existing entry,
-		 * then reuse that, with the guarantee that there also cannot
-		 * be any subsequent conflicting entries. In normal use we'd
-		 * expect simply identical entries for this case, but there's
-		 * no harm in accommodating the generalisation.
-		 */
-		if ((mask & smrs[i].mask) == mask &&
-		    !((id ^ smrs[i].id) & ~smrs[i].mask))
-			return i;
-		/*
-		 * If the new entry has any other overlap with an existing one,
-		 * though, then there always exists at least one stream ID
-		 * which would cause a conflict, and we can't allow that risk.
-		 */
-		if (!((id ^ smrs[i].id) & ~(smrs[i].mask | mask)))
-			return -EINVAL;
-	}
 
-	return free_idx;
-}
-
-static bool arm_smmu_free_sme(struct arm_smmu_device *smmu, int idx)
-{
-	if (--smmu->s2crs[idx].count)
-		return false;
-
-	smmu->s2crs[idx] = s2cr_init_val;
-	if (smmu->smrs)
-		smmu->smrs[idx].valid = false;
-
-	return true;
-}
-
-static int arm_smmu_master_alloc_smes(struct device *dev)
-{
-	struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);
-	struct arm_smmu_device *smmu = cfg->smmu;
-	struct arm_smmu_smr *smrs = smmu->smrs;
-	int i, idx, ret;
-
-	spin_lock(&smmu->stream_map_lock);
-	/* Figure out a viable stream map entry allocation */
-	for_each_cfg_sme(cfg, i, idx) {
-		if (idx != INVALID_SMENDX) {
-			ret = -EEXIST;
-			goto out_err;
+		idx = arm_smmu_alloc_smr(smmu);
+		if (IS_ERR_VALUE(idx)) {
+			dev_err(smmu->dev, "failed to allocate free SMR\n");
+			goto err_free_smrs;
 		}
+		cfg->smendx[i] = idx;
 
-		ret = arm_smmu_find_sme(smmu, cfg->streamids[i], 0);
-		if (ret < 0)
-			goto out_err;
-
-		idx = ret;
-		if (smrs && smmu->s2crs[idx].count == 0) {
-			smrs[idx].id = cfg->streamids[i];
-			smrs[idx].mask = 0; /* We don't currently share SMRs */
-			smrs[idx].valid = true;
-		}
-		smmu->s2crs[idx].count++;
-		cfg->smendx[i] = (s16)idx;
+		smrs[idx].id = cfg->streamids[i];
+		smrs[idx].mask = 0; /* We don't currently share SMRs */
 	}
 
+	if (!smrs)
+		return 0;
+
 	/* It worked! Now, poke the actual hardware */
-	for_each_cfg_sme(cfg, i, idx) {
-		arm_smmu_write_sme(smmu, idx);
-	}
+	for_each_cfg_sme(cfg, i, idx)
+		arm_smmu_write_smr(smmu, idx);
 
-	spin_unlock(&smmu->stream_map_lock);
 	return 0;
 
-out_err:
+err_free_smrs:
 	while (i--) {
-		arm_smmu_free_sme(smmu, cfg->smendx[i]);
+		arm_smmu_free_smr(smmu, cfg->smendx[i]);
 		cfg->smendx[i] = INVALID_SMENDX;
 	}
-	spin_unlock(&smmu->stream_map_lock);
-	return ret;
+	return -ENOSPC;
 }
 
-static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg)
+static void arm_smmu_master_free_smes(struct arm_smmu_device *smmu,
+				      struct arm_smmu_master_cfg *cfg)
 {
-    struct arm_smmu_device *smmu = cfg->smmu;
 	int i, idx;
 
-	spin_lock(&smmu->stream_map_lock);
+	/*
+	 * We *must* clear the S2CR first, because freeing the SMR means
+	 * that it can be re-allocated immediately.
+	 */
 	for_each_cfg_sme(cfg, i, idx) {
-		if (arm_smmu_free_sme(smmu, idx))
-			arm_smmu_write_sme(smmu, idx);
+		/* An IOMMU group is torn down by the first device to be removed */
+		if (idx == INVALID_SMENDX)
+			return;
+
+		smmu->s2crs[idx] = s2cr_init_val;
+		arm_smmu_write_s2cr(smmu, idx);
+	}
+	/* Sync S2CR updates before touching anything else */
+	__iowmb();
+
+	/* Invalidate the SMRs before freeing back to the allocator */
+	for_each_cfg_sme(cfg, i, idx) {
+		if (smmu->smrs)
+			arm_smmu_free_smr(smmu, idx);
+
 		cfg->smendx[i] = INVALID_SMENDX;
 	}
-	spin_unlock(&smmu->stream_map_lock);
 }
 
 static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
 				      struct arm_smmu_master_cfg *cfg)
 {
+	int i, idx, ret = 0;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s2cr *s2cr = smmu->s2crs;
 	enum arm_smmu_s2cr_type type = S2CR_TYPE_TRANS;
 	u8 cbndx = smmu_domain->cfg.cbndx;
-	int i, idx;
+
+	if (cfg->smendx[0] == INVALID_SMENDX)
+		ret = arm_smmu_master_alloc_smes(smmu, cfg);
+	if (ret)
+		return ret;
 
 	for_each_cfg_sme(cfg, i, idx) {
+		/* Devices in an IOMMU group may already be configured */
 		if (type == s2cr[idx].type && cbndx == s2cr[idx].cbndx)
-			continue;
+			break;
 
 		s2cr[idx].type = type ;
 		s2cr[idx].privcfg = S2CR_PRIVCFG_UNPRIV;
@@ -1622,10 +1602,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
 {
+	struct arm_smmu_device *smmu = find_smmu_for_device(dev);
 	struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);
 
-	if (cfg)
-		arm_smmu_master_free_smes(cfg);
+	if (smmu && cfg)
+		arm_smmu_master_free_smes(smmu, cfg);
 
 }
 
@@ -1960,17 +1941,25 @@ static int arm_smmu_add_device(struct device *dev)
 	struct arm_smmu_master_cfg *cfg;
 	struct iommu_group *group;
 	void (*releasefn)(void *) = NULL;
+	int ret;
 
 	smmu = find_smmu_for_device(dev);
 	if (!smmu)
 		return -ENODEV;
 
+	group = iommu_group_alloc();
+	if (IS_ERR(group)) {
+		dev_err(dev, "Failed to allocate IOMMU group\n");
+		return PTR_ERR(group);
+	}
+
 	if (dev_is_pci(dev)) {
 		struct pci_dev *pdev = to_pci_dev(dev);
 
 		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
 		if (!cfg) {
-			return -ENOMEM;
+			ret = -ENOMEM;
+			goto out_put_group;
 		}
 
 		cfg->num_streamids = 1;
@@ -1981,30 +1970,24 @@ static int arm_smmu_add_device(struct device *dev)
 		pci_for_each_dma_alias(pdev, __arm_smmu_get_pci_sid,
 				       &cfg->streamids[0]);
 		releasefn = __arm_smmu_release_pci_iommudata;
-		cfg->smmu = smmu;
 	} else {
 		struct arm_smmu_master *master;
 
 		master = find_smmu_master(smmu, dev->of_node);
 		if (!master) {
-			return -ENODEV;
+			ret = -ENODEV;
+			goto out_put_group;
 		}
 
 		cfg = &master->cfg;
-		cfg->smmu = smmu;
-	}
-
-	group = iommu_group_alloc();
-	if (IS_ERR(group)) {
-		dev_err(dev, "Failed to allocate IOMMU group\n");
-		return PTR_ERR(group);
 	}
 
 	iommu_group_set_iommudata(group, cfg, releasefn);
-	iommu_group_add_device(group, dev);
-	iommu_group_put(group);
+	ret = iommu_group_add_device(group, dev);
 
-	return arm_smmu_master_alloc_smes(dev);
+out_put_group:
+	iommu_group_put(group);
+	return ret;
 }
 
 #if 0 /* Xen: We don't support remove device for now. Will be useful for PCI */
@@ -2237,7 +2220,6 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		smmu->s2crs[i] = s2cr_init_val;
 
 	smmu->num_mapping_groups = size;
-	spin_lock_init(&smmu->stream_map_lock);
 
 	/* ID1 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: smmuv1 breakage
  2021-06-16  1:20   ` Stefano Stabellini
@ 2021-06-22 21:06     ` Stefano Stabellini
  2021-06-23  8:09       ` Rahul Singh
  0 siblings, 1 reply; 7+ messages in thread
From: Stefano Stabellini @ 2021-06-22 21:06 UTC (permalink / raw)
  To: Rahul.Singh
  Cc: edgar.iglesias, xen-devel, Bertrand Marquis, Julien Grall, fnuv,
	sstabellini

[-- Attachment #1: Type: text/plain, Size: 4531 bytes --]

Hi Rahul,

Do you have an opinion on how we should move forward on this?

Do you think it is OK to go for a full revert of "xen/arm: smmuv1:
Intelligent SMR allocation" or do you think it is best to go with an
alternative fix? If so, do you have something in mind?



On Tue, 15 Jun 2021, Stefano Stabellini wrote:
> On Tue, 15 Jun 2021, Rahul Singh wrote:
> > Hi Stefano
> > 
> > > On 15 Jun 2021, at 3:21 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > > 
> > > Hi Rahul,
> > > 
> > > Unfortunately, after bisecting, I discovered a few more breakages due to
> > > your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
> > > attached the DTB as reference. Please note that I made sure to
> > > cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
> > > the S2CR" during bisection. So the errors are present also on staging.
> > > 
> > > The first breakage is an error at boot time in smmu.c#find_smmu_master,
> > > see log1. I think it is due to the lack of ability to parse the new smmu
> > > bindings in the old smmu driver.
> > > 
> > > After removing all the "smmus" and "#stream-id-cells" properties in
> > > device tree, I get past the previous error, everything seems to be OK at
> > > early boot, but I actually get SMMU errors as soon as dom0 starting
> > > using devices:
> > > 
> > > (XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
> > > (XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000
> > 
> >  This fault is "Unidentified stream fault” for StreamID “ 0x877” that means SMMU SMR is not configured for streamID “0x877"
> > 
> > 
> > > [   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
> > > [   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > > 
> > > Do you think you'll be able to help fix them?
> > > 
> > > 
> > > You should be able to reproduce the two issues using Xilinx QEMU (but to
> > > be honest I haven't tested it on QEMU yet, I was testing on real
> > > hardware):
> > > - clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
> > >  ./configure  --target-list=aarch64-softmmu
> > >  make
> > > - clone and build git://github.com/Xilinx/qemu-devicetrees.git
> > > - use the attached script to run it
> > >    - kernel can be upstream defconfig 5.10
> > > 
> > 
> > I tried to reproduce the issue on Xilinx QEMU as per the steps shared above 
> > but I am not observing any issue on Xilinx QEMU.
> 
> I tried on QEMU and it doesn't repro. I cannot explain why it works on
> QEMU and it fails on real hardware.
> 
> 
> > I also tested and confirmed on QEMU that SMMU is configured correctly 
> > for specifically StreamID “ 0x877” and for other streamIDs.
> > 
> > I check the xen.dtb shared by you and found out the there is no "stream-id-cells”
> > property in the master device but the "mmu-masters" property is present in the
> > smmu node. For legacy smmu binding we need both "stream-id-cells” and "mmu-masters”.
> > If you need to add the new smmu binding please add the "iommu-cells”
> > property in the smmu node and the “iommus” property in the master device.
> 
> In regards to the missing "stream-id-cells" property, I shared the wrong
> dtb before, sorry. I was running a number of tests and I might have
> picked the wrong file. The proper dtb comes with "stream-id-cells" for
> the 0x877 device, see attached.
> 
> 
> 
> > Can you please share the xen boot logs with me so that I can debug further why the error is observed? 
> 
> See attached. I did some debugging and discovered that it crashes while
> accessing master->of_node in find_smmu_master. If I revert your series,
> the crash goes away. It is very strange because your patches don't touch
> find_smmu_master or insert_smmu_master directly.
> 
> I did a git reset --hard on the commit "xen/arm: smmuv1: Add a stream
> map entry iterator" and it worked, which points to "xen/arm: smmuv1:
> Intelligent SMR allocation" being the problem, even if I have the revert
> cherry-picked on top. Maybe the revert is not reverting enough?
> 
> After this test, I switched back to staging and did:
> git revert 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
> git revert 0435784cc75dcfef3b5f59c29deb1dbb84265ddb
> 
> And it worked! So the issue truly is that
> 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a doesn't revert "enough".
> See "full-revert" for the patch reverting the remaining code. That on
> top of staging fixes boot for me.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: smmuv1 breakage
  2021-06-22 21:06     ` Stefano Stabellini
@ 2021-06-23  8:09       ` Rahul Singh
  2021-06-23 16:15         ` Rahul Singh
  0 siblings, 1 reply; 7+ messages in thread
From: Rahul Singh @ 2021-06-23  8:09 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, xen-devel, Bertrand Marquis, Julien Grall, fnuv

Hi Stefano,

> On 22 Jun 2021, at 10:06 pm, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Hi Rahul,
> 
> Do you have an opinion on how we should move forward on this?
> 
> Do you think it is OK to go for a full revert of "xen/arm: smmuv1:
> Intelligent SMR allocation" or do you think it is best to go with an
> alternative fix? If so, do you have something in mind?
> 

Sorry for the late reply I was working on another high-priority task. 
I will work on this will try to fix the issue. I will update you within 2-3 days. 

Regards,
Rahul

> 
> 
> On Tue, 15 Jun 2021, Stefano Stabellini wrote:
>> On Tue, 15 Jun 2021, Rahul Singh wrote:
>>> Hi Stefano
>>> 
>>>> On 15 Jun 2021, at 3:21 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>> 
>>>> Hi Rahul,
>>>> 
>>>> Unfortunately, after bisecting, I discovered a few more breakages due to
>>>> your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
>>>> attached the DTB as reference. Please note that I made sure to
>>>> cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
>>>> the S2CR" during bisection. So the errors are present also on staging.
>>>> 
>>>> The first breakage is an error at boot time in smmu.c#find_smmu_master,
>>>> see log1. I think it is due to the lack of ability to parse the new smmu
>>>> bindings in the old smmu driver.
>>>> 
>>>> After removing all the "smmus" and "#stream-id-cells" properties in
>>>> device tree, I get past the previous error, everything seems to be OK at
>>>> early boot, but I actually get SMMU errors as soon as dom0 starting
>>>> using devices:
>>>> 
>>>> (XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
>>>> (XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000
>>> 
>>> This fault is "Unidentified stream fault” for StreamID “ 0x877” that means SMMU SMR is not configured for streamID “0x877"
>>> 
>>> 
>>>> [   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
>>>> [   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>> 
>>>> Do you think you'll be able to help fix them?
>>>> 
>>>> 
>>>> You should be able to reproduce the two issues using Xilinx QEMU (but to
>>>> be honest I haven't tested it on QEMU yet, I was testing on real
>>>> hardware):
>>>> - clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
>>>> ./configure  --target-list=aarch64-softmmu
>>>> make
>>>> - clone and build git://github.com/Xilinx/qemu-devicetrees.git
>>>> - use the attached script to run it
>>>>   - kernel can be upstream defconfig 5.10
>>>> 
>>> 
>>> I tried to reproduce the issue on Xilinx QEMU as per the steps shared above 
>>> but I am not observing any issue on Xilinx QEMU.
>> 
>> I tried on QEMU and it doesn't repro. I cannot explain why it works on
>> QEMU and it fails on real hardware.
>> 
>> 
>>> I also tested and confirmed on QEMU that SMMU is configured correctly 
>>> for specifically StreamID “ 0x877” and for other streamIDs.
>>> 
>>> I check the xen.dtb shared by you and found out the there is no "stream-id-cells”
>>> property in the master device but the "mmu-masters" property is present in the
>>> smmu node. For legacy smmu binding we need both "stream-id-cells” and "mmu-masters”.
>>> If you need to add the new smmu binding please add the "iommu-cells”
>>> property in the smmu node and the “iommus” property in the master device.
>> 
>> In regards to the missing "stream-id-cells" property, I shared the wrong
>> dtb before, sorry. I was running a number of tests and I might have
>> picked the wrong file. The proper dtb comes with "stream-id-cells" for
>> the 0x877 device, see attached.
>> 
>> 
>> 
>>> Can you please share the xen boot logs with me so that I can debug further why the error is observed? 
>> 
>> See attached. I did some debugging and discovered that it crashes while
>> accessing master->of_node in find_smmu_master. If I revert your series,
>> the crash goes away. It is very strange because your patches don't touch
>> find_smmu_master or insert_smmu_master directly.
>> 
>> I did a git reset --hard on the commit "xen/arm: smmuv1: Add a stream
>> map entry iterator" and it worked, which points to "xen/arm: smmuv1:
>> Intelligent SMR allocation" being the problem, even if I have the revert
>> cherry-picked on top. Maybe the revert is not reverting enough?
>> 
>> After this test, I switched back to staging and did:
>> git revert 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
>> git revert 0435784cc75dcfef3b5f59c29deb1dbb84265ddb
>> 
>> And it worked! So the issue truly is that
>> 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a doesn't revert "enough".
>> See "full-revert" for the patch reverting the remaining code. That on
>> top of staging fixes boot for me.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: smmuv1 breakage
  2021-06-23  8:09       ` Rahul Singh
@ 2021-06-23 16:15         ` Rahul Singh
  2021-06-23 20:57           ` Stefano Stabellini
  0 siblings, 1 reply; 7+ messages in thread
From: Rahul Singh @ 2021-06-23 16:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, xen-devel, Bertrand Marquis, Julien Grall, fnuv


[-- Attachment #1.1: Type: text/plain, Size: 5542 bytes --]

Hi Stefano,

> On 23 Jun 2021, at 9:09 am, Rahul Singh <Rahul.Singh@arm.com> wrote:
>
> Hi Stefano,
>
>> On 22 Jun 2021, at 10:06 pm, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>
>> Hi Rahul,
>>
>> Do you have an opinion on how we should move forward on this?
>>
>> Do you think it is OK to go for a full revert of "xen/arm: smmuv1:
>> Intelligent SMR allocation" or do you think it is best to go with an
>> alternative fix? If so, do you have something in mind?
>>
>
> Sorry for the late reply I was working on another high-priority task.
> I will work on this will try to fix the issue. I will update you within 2-3 days.

I again checked my patches and found out that while allocating SMR I by mistake
allocated one SMR for each SMMU device but we have to allocate the number of
SMR based on supported stream matching register for each SMMU device.

This might be causing the issue. As I don’t have any Xilinx hardware and on
QEMU/Juno issue is not reproducible.Can you please test the attached patch and
let me know if it works.



Regards,
Rahul

>
> Regards,
> Rahul
>
>>
>>
>> On Tue, 15 Jun 2021, Stefano Stabellini wrote:
>>> On Tue, 15 Jun 2021, Rahul Singh wrote:
>>>> Hi Stefano
>>>>
>>>>> On 15 Jun 2021, at 3:21 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>>>
>>>>> Hi Rahul,
>>>>>
>>>>> Unfortunately, after bisecting, I discovered a few more breakages due to
>>>>> your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
>>>>> attached the DTB as reference. Please note that I made sure to
>>>>> cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
>>>>> the S2CR" during bisection. So the errors are present also on staging.
>>>>>
>>>>> The first breakage is an error at boot time in smmu.c#find_smmu_master,
>>>>> see log1. I think it is due to the lack of ability to parse the new smmu
>>>>> bindings in the old smmu driver.
>>>>>
>>>>> After removing all the "smmus" and "#stream-id-cells" properties in
>>>>> device tree, I get past the previous error, everything seems to be OK at
>>>>> early boot, but I actually get SMMU errors as soon as dom0 starting
>>>>> using devices:
>>>>>
>>>>> (XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
>>>>> (XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000
>>>>
>>>> This fault is "Unidentified stream fault” for StreamID “ 0x877” that means SMMU SMR is not configured for streamID “0x877"
>>>>
>>>>
>>>>> [   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
>>>>> [   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>
>>>>> Do you think you'll be able to help fix them?
>>>>>
>>>>>
>>>>> You should be able to reproduce the two issues using Xilinx QEMU (but to
>>>>> be honest I haven't tested it on QEMU yet, I was testing on real
>>>>> hardware):
>>>>> - clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
>>>>> ./configure  --target-list=aarch64-softmmu
>>>>> make
>>>>> - clone and build git://github.com/Xilinx/qemu-devicetrees.git
>>>>> - use the attached script to run it
>>>>>  - kernel can be upstream defconfig 5.10
>>>>>
>>>>
>>>> I tried to reproduce the issue on Xilinx QEMU as per the steps shared above
>>>> but I am not observing any issue on Xilinx QEMU.
>>>
>>> I tried on QEMU and it doesn't repro. I cannot explain why it works on
>>> QEMU and it fails on real hardware.
>>>
>>>
>>>> I also tested and confirmed on QEMU that SMMU is configured correctly
>>>> for specifically StreamID “ 0x877” and for other streamIDs.
>>>>
>>>> I check the xen.dtb shared by you and found out the there is no "stream-id-cells”
>>>> property in the master device but the "mmu-masters" property is present in the
>>>> smmu node. For legacy smmu binding we need both "stream-id-cells” and "mmu-masters”.
>>>> If you need to add the new smmu binding please add the "iommu-cells”
>>>> property in the smmu node and the “iommus” property in the master device.
>>>
>>> In regards to the missing "stream-id-cells" property, I shared the wrong
>>> dtb before, sorry. I was running a number of tests and I might have
>>> picked the wrong file. The proper dtb comes with "stream-id-cells" for
>>> the 0x877 device, see attached.
>>>
>>>
>>>
>>>> Can you please share the xen boot logs with me so that I can debug further why the error is observed?
>>>
>>> See attached. I did some debugging and discovered that it crashes while
>>> accessing master->of_node in find_smmu_master. If I revert your series,
>>> the crash goes away. It is very strange because your patches don't touch
>>> find_smmu_master or insert_smmu_master directly.
>>>
>>> I did a git reset --hard on the commit "xen/arm: smmuv1: Add a stream
>>> map entry iterator" and it worked, which points to "xen/arm: smmuv1:
>>> Intelligent SMR allocation" being the problem, even if I have the revert
>>> cherry-picked on top. Maybe the revert is not reverting enough?
>>>
>>> After this test, I switched back to staging and did:
>>> git revert 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
>>> git revert 0435784cc75dcfef3b5f59c29deb1dbb84265ddb
>>>
>>> And it worked! So the issue truly is that
>>> 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a doesn't revert "enough".
>>> See "full-revert" for the patch reverting the remaining code. That on
>>> top of staging fixes boot for me.
>


[-- Attachment #1.2: Type: text/html, Size: 7928 bytes --]

[-- Attachment #2: 0001-xen-arm-smmuv1-Fixed-SMR-allocation.patch --]
[-- Type: application/octet-stream, Size: 1582 bytes --]

From 7fbd3b278a5db095da1b2d163b1aca5da7d5bb93 Mon Sep 17 00:00:00 2001
Message-Id: <7fbd3b278a5db095da1b2d163b1aca5da7d5bb93.1624464053.git.rahul.singh@arm.com>
From: Rahul Singh <rahul.singh@arm.com>
Date: Wed, 23 Jun 2021 16:54:55 +0100
Subject: [PATCH] xen/arm: smmuv1: Fixed SMR allocation

SMR allocation should be based on number of supported stream matching/
indexing register for each SMMU device.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
---
 xen/drivers/passthrough/arm/smmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index ab5b1bc434..8ac782041f 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -149,6 +149,7 @@ typedef enum irqreturn irqreturn_t;
 #define kzalloc(size, flags)		_xzalloc(size, sizeof(void *))
 #define devm_kzalloc(dev, size, flags)	_xzalloc(size, sizeof(void *))
 #define kmalloc_array(size, n, flags)	_xmalloc_array(size, sizeof(void *), n)
+#define kzalloc_array(size, n, flags)	_xzalloc_array(size, sizeof(void *), n)
 
 static void __iomem *devm_ioremap_resource(struct device *dev,
 					   struct resource *res)
@@ -2185,7 +2186,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		smmu->smr_mask_mask = smr >> SMR_MASK_SHIFT;
 
 		/* Zero-initialised to mark as invalid */
-		smmu->smrs = devm_kzalloc(smmu->dev, sizeof(*smmu->smrs), GFP_KERNEL);
+		smmu->smrs = kzalloc_array(sizeof(*smmu->smrs), size, GFP_KERNEL);
 		if (!smmu->smrs)
 			return -ENOMEM;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: smmuv1 breakage
  2021-06-23 16:15         ` Rahul Singh
@ 2021-06-23 20:57           ` Stefano Stabellini
  0 siblings, 0 replies; 7+ messages in thread
From: Stefano Stabellini @ 2021-06-23 20:57 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Stefano Stabellini, edgar.iglesias, xen-devel, Bertrand Marquis,
	Julien Grall, fnuv

[-- Attachment #1: Type: text/plain, Size: 5856 bytes --]

On Wed, 23 Jun 2021, Rahul Singh wrote:
> Hi Stefano,
> 
> > On 23 Jun 2021, at 9:09 am, Rahul Singh <Rahul.Singh@arm.com> wrote:
> >
> > Hi Stefano,
> >
> >> On 22 Jun 2021, at 10:06 pm, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >>
> >> Hi Rahul,
> >>
> >> Do you have an opinion on how we should move forward on this?
> >>
> >> Do you think it is OK to go for a full revert of "xen/arm: smmuv1:
> >> Intelligent SMR allocation" or do you think it is best to go with an
> >> alternative fix? If so, do you have something in mind?
> >>
> >
> > Sorry for the late reply I was working on another high-priority task.
> > I will work on this will try to fix the issue. I will update you within 2-3 days.
> 
> I again checked my patches and found out that while allocating SMR I by mistake
> allocated one SMR for each SMMU device but we have to allocate the number of
> SMR based on supported stream matching register for each SMMU device.
> 
> This might be causing the issue. As I don’t have any Xilinx hardware and on
> QEMU/Juno issue is not reproducible.Can you please test the attached patch and
> let me know if it works.

Yes this solves the issue for me, thank you!!


Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>


> >
> > Regards,
> > Rahul
> >
> >>
> >>
> >> On Tue, 15 Jun 2021, Stefano Stabellini wrote:
> >>> On Tue, 15 Jun 2021, Rahul Singh wrote:
> >>>> Hi Stefano
> >>>>
> >>>>> On 15 Jun 2021, at 3:21 am, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >>>>>
> >>>>> Hi Rahul,
> >>>>>
> >>>>> Unfortunately, after bisecting, I discovered a few more breakages due to
> >>>>> your smmuv1 series (commits e889809b .. 3e6047ddf) on Xilinx ZynqMP. I
> >>>>> attached the DTB as reference. Please note that I made sure to
> >>>>> cherry-pick "xen/arm: smmuv1: Revert associating the group pointer with
> >>>>> the S2CR" during bisection. So the errors are present also on staging.
> >>>>>
> >>>>> The first breakage is an error at boot time in smmu.c#find_smmu_master,
> >>>>> see log1. I think it is due to the lack of ability to parse the new smmu
> >>>>> bindings in the old smmu driver.
> >>>>>
> >>>>> After removing all the "smmus" and "#stream-id-cells" properties in
> >>>>> device tree, I get past the previous error, everything seems to be OK at
> >>>>> early boot, but I actually get SMMU errors as soon as dom0 starting
> >>>>> using devices:
> >>>>>
> >>>>> (XEN) smmu: /smmu@fd800000: Unexpected global fault, this could be serious
> >>>>> (XEN) smmu: /smmu@fd800000:     GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000877, GFSYNR2 0x00000000
> >>>>
> >>>> This fault is "Unidentified stream fault” for StreamID “ 0x877” that means SMMU SMR is not configured for streamID “0x877"
> >>>>
> >>>>
> >>>>> [   10.419681] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
> >>>>> [   10.426452] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>>>>
> >>>>> Do you think you'll be able to help fix them?
> >>>>>
> >>>>>
> >>>>> You should be able to reproduce the two issues using Xilinx QEMU (but to
> >>>>> be honest I haven't tested it on QEMU yet, I was testing on real
> >>>>> hardware):
> >>>>> - clone and compile xilinx QEMU https://github.com/Xilinx/qemu.git
> >>>>> ./configure  --target-list=aarch64-softmmu
> >>>>> make
> >>>>> - clone and build git://github.com/Xilinx/qemu-devicetrees.git
> >>>>> - use the attached script to run it
> >>>>>  - kernel can be upstream defconfig 5.10
> >>>>>
> >>>>
> >>>> I tried to reproduce the issue on Xilinx QEMU as per the steps shared above
> >>>> but I am not observing any issue on Xilinx QEMU.
> >>>
> >>> I tried on QEMU and it doesn't repro. I cannot explain why it works on
> >>> QEMU and it fails on real hardware.
> >>>
> >>>
> >>>> I also tested and confirmed on QEMU that SMMU is configured correctly
> >>>> for specifically StreamID “ 0x877” and for other streamIDs.
> >>>>
> >>>> I check the xen.dtb shared by you and found out the there is no "stream-id-cells”
> >>>> property in the master device but the "mmu-masters" property is present in the
> >>>> smmu node. For legacy smmu binding we need both "stream-id-cells” and "mmu-masters”.
> >>>> If you need to add the new smmu binding please add the "iommu-cells”
> >>>> property in the smmu node and the “iommus” property in the master device.
> >>>
> >>> In regards to the missing "stream-id-cells" property, I shared the wrong
> >>> dtb before, sorry. I was running a number of tests and I might have
> >>> picked the wrong file. The proper dtb comes with "stream-id-cells" for
> >>> the 0x877 device, see attached.
> >>>
> >>>
> >>>
> >>>> Can you please share the xen boot logs with me so that I can debug further why the error is observed?
> >>>
> >>> See attached. I did some debugging and discovered that it crashes while
> >>> accessing master->of_node in find_smmu_master. If I revert your series,
> >>> the crash goes away. It is very strange because your patches don't touch
> >>> find_smmu_master or insert_smmu_master directly.
> >>>
> >>> I did a git reset --hard on the commit "xen/arm: smmuv1: Add a stream
> >>> map entry iterator" and it worked, which points to "xen/arm: smmuv1:
> >>> Intelligent SMR allocation" being the problem, even if I have the revert
> >>> cherry-picked on top. Maybe the revert is not reverting enough?
> >>>
> >>> After this test, I switched back to staging and did:
> >>> git revert 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a
> >>> git revert 0435784cc75dcfef3b5f59c29deb1dbb84265ddb
> >>>
> >>> And it worked! So the issue truly is that
> >>> 9f6cd4983715cb31f0ea540e6bbb63f799a35d8a doesn't revert "enough".
> >>> See "full-revert" for the patch reverting the remaining code. That on
> >>> top of staging fixes boot for me.
> >
> 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-23 20:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-15  2:21 smmuv1 breakage Stefano Stabellini
2021-06-15 16:28 ` Rahul Singh
2021-06-16  1:20   ` Stefano Stabellini
2021-06-22 21:06     ` Stefano Stabellini
2021-06-23  8:09       ` Rahul Singh
2021-06-23 16:15         ` Rahul Singh
2021-06-23 20:57           ` Stefano Stabellini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.