nvmet panics during high load

* nvmet panics during high load
@ 2017-07-26  9:33 Alon Horev
  2017-07-27 13:05 ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Alon Horev @ 2017-07-26  9:33 UTC (permalink / raw)

Hi All,

This is my first post on this mailing list. Let me know if this is the
wrong place or format to post bugs in.

We're running nvmef using RDMA on kernel 4.11.8.
We found a zero-dereference bug in nvmet during high load and
identified the root cause:
The location according to current linux master (fd2b2c57) is
drivers/nvme/target/rdma.c at function nvmet_rdma_get_rsp line 170.
list_first_entry is called on the list of free responses (free_rsps)
which is empty and obviously unexpected. I added an assert to validate
that and also tested a hack that enlarges the queue times 10 and it
seemed to solve it.
It's probably not a leak but a miscalculation of the size of the queue
(queue->recv_queue_size * 2). Can anyone explain the rationale behind
this calculation? Is the queue assumed to never be empty?

I'd happily submit a patch. Just want to make sure it's the right one.

Thanks, Alon Horev

^ permalink raw reply	[flat|nested] 7+ messages in thread