FastLinQ: possible duplicate flush of FastReg and LocalInv

* FastLinQ: possible duplicate flush of FastReg and LocalInv
@ 2021-03-16 19:58 Chuck Lever III
  2021-03-17  8:54 ` Bernard Metzler
  2021-03-17 15:14 ` Chuck Lever III
  0 siblings, 2 replies; 6+ messages in thread
From: Chuck Lever III @ 2021-03-16 19:58 UTC
  To: linux-rdma

Hi-

I've been trying to track down some crashes when running NFS/RDMA
tests over FastLinQ devices in iWARP mode. To make it stressful,
I've enabled disconnect injection, where rpcrdma injects a
connection disconnect every so often.

As part of a disconnect event, the Receive and Send queues are
drained. Sometimes I see a duplicate flush for one or more of the
memory registration ops. This is not a big deal for FastReg
because its completion handler is basically a no-op.

But for LocalInv this is a problem. On a flushed completion, the
MR is destroyed. If the completion occurs again, of course, all
kinds of badness happen because we're DMA-unmapping twice,
touching memory that has already been freed, and deleting from a
list_head that is already poisoned.
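
To make the failure mode concrete, here is a minimal user-space sketch of
a flush handler that tolerates a second completion for the same WR. It is
not the xprtrdma code: struct demo_mr, demo_mr_release() and
demo_localinv_done() are hypothetical names, and the DMA unmap is
simulated with a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion status, loosely modelled on ib_wc_status. */
enum demo_wc_status { DEMO_WC_SUCCESS, DEMO_WC_WR_FLUSH_ERR };

/* Hypothetical stand-in for a memory region tracked by the ULP. */
struct demo_mr {
	bool dma_mapped;  /* true while the MR still holds a DMA mapping */
	bool released;    /* set once the flush path has torn the MR down */
	int unmap_count;  /* how many times the simulated unmap has run */
};

/* Tear the MR down at most once; returns true if this call did the work. */
static bool demo_mr_release(struct demo_mr *mr)
{
	assert(mr != NULL);
	if (mr->released)
		return false;  /* duplicate completion: nothing left to do */
	mr->released = true;
	if (mr->dma_mapped) {
		mr->dma_mapped = false;
		mr->unmap_count++;  /* stands in for the real DMA unmap */
	}
	return true;
}

/* LocalInv-style completion handler with a guard against duplicates. */
static bool demo_localinv_done(struct demo_mr *mr, enum demo_wc_status status)
{
	if (status == DEMO_WC_SUCCESS)
		return false;  /* normal completion: MR stays alive for reuse */
	return demo_mr_release(mr);
}
```

Calling demo_localinv_done() twice with DEMO_WC_WR_FLUSH_ERR unmaps only
once. Note that a guard like this only helps while the MR's memory is
still valid; once the MR has actually been freed, the second completion
touches freed memory before the flag can even be read, so it is no
substitute for the provider delivering exactly one completion per WR.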

The last straw is that wc_localinv_done calls the generic RPC layer
to indicate that an RPC Reply is ready. The duplicate flush
dereferences one or more NULL pointers.

Doesn't the verbs API contract stipulate that every posted WR gets
exactly one completion? I don't see this behavior with other
providers.

Thanks for any advice.

--
Chuck Lever
