* blk_mq_reinit_tagset during NVMEoF port toggling
       [not found] <85228a2d-053f-cf79-292e-725aad6ee5fa@mellanox.com>
@ 2017-08-28  7:40 ` Sagi Grimberg
  2017-08-28  8:34   ` Israel Rukshin
  0 siblings, 1 reply; 4+ messages in thread
From: Sagi Grimberg @ 2017-08-28  7:40 UTC


> Hi guys,

Hi Max, CCing linux-nvme.

> We have encountered a bug during our port toggling test with MP using
> NVMEoF over RDMA (1 IO queue reproduces it quickly).
> We have been receiving local protection error dumps after failing back
> to the port that became active again (it's not the retransmission issue
> we fixed in the past). After debugging it we saw that the requests have
> gone through the reinit process (dereg_mr/alloc_mr).
> But somehow req->mr->need_inval is still true at the beginning of the
> nvme_rdma_queue_rq function. This shouldn't happen, since we should have
> performed the dereg_mr/alloc_mr in the reinit func and set it to false.
> We don't see this issue in kernels older than 4.11, so before bisecting:

Which code base is this, Max?

Is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?

If so, it is possible that not all requests are being reinitialized.
Can you reproduce with the following applied?
--
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d0be72ccb091..420ef106057e 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -302,8 +302,10 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
                         continue;

                 for (j = 0; j < tags->nr_tags; j++) {
-                       if (!tags->static_rqs[j])
+                       if (!tags->static_rqs[j]) {
+                               pr_info("passing rq %d\n", j);
                                 continue;
+                       }

                         ret = set->ops->reinit_request(set->driver_data,
                                                 tags->static_rqs[j]);
--


* blk_mq_reinit_tagset during NVMEoF port toggling
  2017-08-28  7:40 ` blk_mq_reinit_tagset during NVMEoF port toggling Sagi Grimberg
@ 2017-08-28  8:34   ` Israel Rukshin
  2017-08-28 13:30     ` Sagi Grimberg
  0 siblings, 1 reply; 4+ messages in thread
From: Israel Rukshin @ 2017-08-28  8:34 UTC


On 8/28/2017 10:40 AM, Sagi Grimberg wrote:
>> Hi guys,
>
> Hi Max, CCing linux-nvme.
Hi Sagi,
>
> Which code base is this max?
The code base is kernel 4.13.0-rc3.
>
> is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
Yes.
>
> if so, maybe it is possible that not all requests are being 
> reinitialized.
> Can you reproduce with the following applied:
We reproduced this issue with similar prints added, and none of them fired.
blk_mq_reinit_tagset() went over all the static requests.

Thanks,
Israel.


* blk_mq_reinit_tagset during NVMEoF port toggling
  2017-08-28  8:34   ` Israel Rukshin
@ 2017-08-28 13:30     ` Sagi Grimberg
  2017-08-28 15:49       ` Max Gurtovoy
  0 siblings, 1 reply; 4+ messages in thread
From: Sagi Grimberg @ 2017-08-28 13:30 UTC



> We reproduced this issue with similar prints and we didn't see them.
> blk_mq_reinit_tagset() went over all the static requests.

If mr->need_inval is true in queue_rq, it means that reinit was not
called on that request. Did you see a request that performed reinit but
still had need_inval == true?


* blk_mq_reinit_tagset during NVMEoF port toggling
  2017-08-28 13:30     ` Sagi Grimberg
@ 2017-08-28 15:49       ` Max Gurtovoy
  0 siblings, 0 replies; 4+ messages in thread
From: Max Gurtovoy @ 2017-08-28 15:49 UTC




On 8/28/2017 4:30 PM, Sagi Grimberg wrote:
> If mr->need_inval is true in queue_rq it means that reinit was not
> called on it, did you see a request that performed reinit but
> still had need_inval == true?

No. The requests that are "reinited" are the static_rqs.
From what I saw in the code, the MP has a cloned request queue that it
uses, but those requests are not "reinited" (only the original requests are).
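
To make the suspected gap concrete, here is a minimal userspace sketch of the behaviour described above. It is a toy model, not kernel code: the toy_* types and functions are hypothetical stand-ins, and it assumes (as the message above suggests, without confirming it) that the request reaching queue_rq is one that lives outside the tag set's static_rqs array. The loop mirrors the shape of the one in the diff earlier in the thread: it walks static_rqs, skips empty slots, and reinitializes the rest, so any request outside that array keeps a stale need_inval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct toy_mr {
	bool need_inval;	/* true: the MR still needs invalidating */
};

struct toy_rq {
	struct toy_mr mr;
};

struct toy_tags {
	struct toy_rq **static_rqs;	/* preallocated requests, may hold NULL slots */
	size_t nr_tags;
};

/* Models the per-request reinit: a freshly allocated MR has nothing to invalidate. */
static int toy_reinit_request(struct toy_rq *rq)
{
	rq->mr.need_inval = false;
	return 0;
}

/*
 * Walks only the static_rqs array, skipping empty slots.
 * Returns the number of requests that were reinitialized.
 */
static size_t toy_reinit_tagset(struct toy_tags *tags)
{
	size_t done = 0;

	assert(tags != NULL);
	for (size_t j = 0; j < tags->nr_tags; j++) {
		if (!tags->static_rqs[j])
			continue;
		if (toy_reinit_request(tags->static_rqs[j]) == 0)
			done++;
	}
	return done;
}
```

In this model, a toy_rq that is not referenced from static_rqs is never visited by toy_reinit_tagset(), so its need_inval flag stays true even though every static request was reinitialized. That would match both observations in the thread: the debug prints never fire, yet a request still arrives with need_inval set.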

