On Tue, Jun 20, 2017 at 10:58:47AM +0300, Sagi Grimberg wrote:
>
> > > Hi Robert,
> > >
> > > > I ran into this with 4.9.32 when I rebooted the target. I tested
> > > > 4.12-rc6 and this particular error seems to have been resolved, but I
> > > > now get a new one on the initiator. This one doesn't seem as
> > > > impactful.
> > > >
> > > > [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
> > > > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> > > > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> > > > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> > > > [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
> > >
> > > Max, Leon,
> > >
> > > Care to parse this syndrome for us? ;)
> >
> > Here the parsed output, it says that it was access to mkey which is
> > free.
> >
> > ======== cqe_with_error ========
> > wqe_id : 0x0
> > srqn_usr_index : 0x0
> > byte_cnt : 0x0
> > hw_error_syndrome : 0x93
> > hw_syndrome_type : 0x0
> > vendor_error_syndrome : 0x52
>
> Can you share the check that correlates to the vendor+hw syndrome?

mkey.free == 1

>
> > syndrome : LOCAL_PROTECTION_ERROR (0x4)
> > s_wqe_opcode : SEND (0xa)
>
> That's interesting, the opcode is a send operation. I'm assuming
> that this is immediate-data write? Robert, did this happen when
> you issued >4k writes to the target?
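
For anyone following along, most of the fields in the parsed output quoted above can be read straight off the last line of the raw dump. Below is a rough sketch of that decoding. The byte offsets are an assumption inferred from matching the quoted values (0x93, 0x0, 0x52, 0x4, 0xa) against the dump words, the low 24 bits of the third word being a QP number is likewise an assumption, and decode_err_cqe_tail is a made-up helper name, not an existing tool.

```python
import struct


def decode_err_cqe_tail(words):
    """Decode the last 16 bytes of an error CQE dump (four big-endian hex words).

    Offsets are inferred from the parsed output in this thread, not from a spec.
    """
    raw = b"".join(struct.pack(">I", int(w, 16)) for w in words)

    # Second word (bytes 4..7 of this line): four one-byte syndrome fields.
    hw_err, hw_type, vendor, syndrome = raw[4], raw[5], raw[6], raw[7]

    # Third word: opcode in the top byte; remaining 24 bits assumed to be the QPN.
    opcode_qpn = struct.unpack(">I", raw[8:12])[0]

    return {
        "hw_error_syndrome": hw_err,
        "hw_syndrome_type": hw_type,
        "vendor_error_syndrome": vendor,
        "syndrome": syndrome,
        "s_wqe_opcode": opcode_qpn >> 24,
        "qpn": opcode_qpn & 0xFFFFFF,
    }


fields = decode_err_cqe_tail("00000000 93005204 0a0001bd 45c8e0d2".split())
for name, value in fields.items():
    print(f"{name:22s}: {value:#x}")
```

Feeding it the last dump line reproduces the syndrome and opcode values from the parsed output, which is a quick sanity check when the parsing tool is not at hand.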