* blk_mq_reinit_tagset during NVMEoF port toggling
[not found] <85228a2d-053f-cf79-292e-725aad6ee5fa@mellanox.com>
@ 2017-08-28 7:40 ` Sagi Grimberg
2017-08-28 8:34 ` Israel Rukshin
From: Sagi Grimberg @ 2017-08-28 7:40 UTC (permalink / raw)
> Hi guys,
Hi Max, CCing linux-nvme.
> We have encountered a bug during our port toggling test with MP using
> NVMEoF over RDMA (a single IO queue reproduces it quickly).
> We have been receiving local protection error dumps after failing back
> to the port that became active again (it's not the retransmission issue
> we fixed in the past). After debugging it we saw that the requests had
> gone through the reinit process (dereg_mr/alloc_mr).
> But somehow req->mr->need_inval is still true at the beginning of the
> nvme_rdma_queue_rq function. This shouldn't happen, since we should have
> performed the dereg_mr/alloc_mr in the reinit func and set it to false.
> We don't see this issue in kernels older than 4.11, so before bisecting:
Which code base is this, Max?
Is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
If so, it is possible that not all requests are being reinitialized.
Can you reproduce with the following applied:
--
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d0be72ccb091..420ef106057e 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -302,8 +302,10 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
 			continue;

 		for (j = 0; j < tags->nr_tags; j++) {
-			if (!tags->static_rqs[j])
+			if (!tags->static_rqs[j]) {
+				pr_info("passing rq %d\n", j);
 				continue;
+			}

 			ret = set->ops->reinit_request(set->driver_data,
 						       tags->static_rqs[j]);
--
* blk_mq_reinit_tagset during NVMEoF port toggling
2017-08-28 7:40 ` blk_mq_reinit_tagset during NVMEoF port toggling Sagi Grimberg
@ 2017-08-28 8:34 ` Israel Rukshin
2017-08-28 13:30 ` Sagi Grimberg
From: Israel Rukshin @ 2017-08-28 8:34 UTC (permalink / raw)
On 8/28/2017 10:40 AM, Sagi Grimberg wrote:
>> Hi guys,
>
> Hi Max, CCing linux-nvme.
Hi Sagi,
>
>> We have encountered a bug during our port toggling test with MP using
>> NVMEoF over RDMA (a single IO queue reproduces it quickly).
>> We have been receiving local protection error dumps after failing
>> back to the port that became active again (it's not the
>> retransmission issue we fixed in the past). After debugging it we saw
>> that the requests had gone through the reinit process (dereg_mr/alloc_mr).
>> But somehow req->mr->need_inval is still true at the beginning of the
>> nvme_rdma_queue_rq function. This shouldn't happen, since we should
>> have performed the dereg_mr/alloc_mr in the reinit func and set it to
>> false.
>> We don't see this issue in kernels older than 4.11, so before bisecting:
>
> Which code base is this, Max?
The code base is kernel 4.13.0-rc3.
>
> Is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
Yes.
>
> If so, it is possible that not all requests are being
> reinitialized.
> Can you reproduce with the following applied:
We reproduced this issue with similar prints and we didn't see them.
blk_mq_reinit_tagset() went over all the static requests.
Thanks,
Israel.
* blk_mq_reinit_tagset during NVMEoF port toggling
2017-08-28 8:34 ` Israel Rukshin
@ 2017-08-28 13:30 ` Sagi Grimberg
2017-08-28 15:49 ` Max Gurtovoy
From: Sagi Grimberg @ 2017-08-28 13:30 UTC (permalink / raw)
>>> Hi guys,
>>
>> Hi Max, CCing linux-nvme.
> Hi Sagi,
>>
>>> We have encountered a bug during our port toggling test with MP using
>>> NVMEoF over RDMA (a single IO queue reproduces it quickly).
>>> We have been receiving local protection error dumps after failing
>>> back to the port that became active again (it's not the
>>> retransmission issue we fixed in the past). After debugging it we saw
>>> that the requests had gone through the reinit process (dereg_mr/alloc_mr).
>>> But somehow req->mr->need_inval is still true at the beginning of the
>>> nvme_rdma_queue_rq function. This shouldn't happen, since we should
>>> have performed the dereg_mr/alloc_mr in the reinit func and set it to
>>> false.
>>> We don't see this issue in kernels older than 4.11, so before bisecting:
>>
>> Which code base is this, Max?
> The code base is kernel 4.13.0-rc3.
>>
>> Is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
> Yes.
>>
>> If so, it is possible that not all requests are being
>> reinitialized.
>> Can you reproduce with the following applied:
> We reproduced this issue with similar prints and we didn't see them.
> blk_mq_reinit_tagset() went over all the static requests.
If mr->need_inval is true in queue_rq, it means that reinit was not
called on that request. Did you see a request that performed reinit but
still had need_inval == true?
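
The invariant behind this question can be sketched in a small userspace model. This is not kernel code: `model_mr`, `model_rq`, `model_tags` and the functions below are hypothetical stand-ins, and the model assumes only what the thread states, namely that reinit visits every non-NULL entry of `static_rqs` and leaves `need_inval` false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the memory region state attached to a request. */
struct model_mr {
	bool need_inval;
};

/* Hypothetical stand-in for a request; reinit_count records reinit visits. */
struct model_rq {
	struct model_mr mr;
	unsigned int reinit_count;
};

/* Hypothetical stand-in for the tags of one hardware queue. */
struct model_tags {
	size_t nr_tags;
	struct model_rq **static_rqs;
};

/* Models the dereg_mr/alloc_mr step: a fresh MR needs no invalidation. */
static void model_reinit_request(struct model_rq *rq)
{
	rq->mr.need_inval = false;
	rq->reinit_count++;
}

/* Visits every allocated static request; returns how many slots were skipped. */
static size_t model_reinit_tags(struct model_tags *tags)
{
	size_t j, skipped = 0;

	assert(tags != NULL);
	for (j = 0; j < tags->nr_tags; j++) {
		if (!tags->static_rqs[j]) {
			skipped++;	/* the debug patch would print here */
			continue;
		}
		model_reinit_request(tags->static_rqs[j]);
	}
	return skipped;
}

/* Counts static requests that would still reach queue_rq with need_inval set. */
static size_t model_count_stale(const struct model_tags *tags)
{
	size_t j, stale = 0;

	assert(tags != NULL);
	for (j = 0; j < tags->nr_tags; j++) {
		if (tags->static_rqs[j] && tags->static_rqs[j]->mr.need_inval)
			stale++;
	}
	return stale;
}
```

In this model, any request the loop visited ends with `need_inval` false, so a request that still has it set in queue_rq was either skipped or never in the table.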
* blk_mq_reinit_tagset during NVMEoF port toggling
2017-08-28 13:30 ` Sagi Grimberg
@ 2017-08-28 15:49 ` Max Gurtovoy
From: Max Gurtovoy @ 2017-08-28 15:49 UTC (permalink / raw)
On 8/28/2017 4:30 PM, Sagi Grimberg wrote:
>
>>>> Hi guys,
>>>
>>> Hi Max, CCing linux-nvme.
>> Hi Sagi,
>>>
>>>> We have encountered a bug during our port toggling test with MP
>>>> using NVMEoF over RDMA (a single IO queue reproduces it quickly).
>>>> We have been receiving local protection error dumps after failing
>>>> back to the port that became active again (it's not the
>>>> retransmission issue we fixed in the past). After debugging it we
>>>> saw that the requests had gone through the reinit process
>>>> (dereg_mr/alloc_mr).
>>>> But somehow req->mr->need_inval is still true at the beginning
>>>> of the nvme_rdma_queue_rq function. This shouldn't happen, since we
>>>> should have performed the dereg_mr/alloc_mr in the reinit func and
>>>> set it to false.
>>>> We don't see this issue in kernels older than 4.11, so before bisecting:
>>>
>>> Which code base is this, Max?
>> The code base is kernel 4.13.0-rc3.
>>>
>>> Is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
>> Yes.
>>>
>>> If so, it is possible that not all requests are being
>>> reinitialized.
>>> Can you reproduce with the following applied:
>> We reproduced this issue with similar prints and we didn't see them.
>> blk_mq_reinit_tagset() went over all the static requests.
>
> If mr->need_inval is true in queue_rq, it means that reinit was not
> called on that request. Did you see a request that performed reinit but
> still had need_inval == true?
No. The requests that are "reinited" are the static_rqs.
From what I saw in the code, MP uses a cloned request queue, and those
cloned requests are not "reinited" (only the original requests are).
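
A rough userspace model of this explanation follows. It assumes the reading above is right, that is, that the requests MP uses are not entries in the tag set's `static_rqs`. All names (`mp_rq`, `mp_tagset`, `mp_reinit_static`, `mp_is_static`) are hypothetical, and nothing here is taken from the multipath or nvme-rdma sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request and its MR invalidation flag. */
struct mp_rq {
	bool need_inval;
};

/* Hypothetical stand-in for a tag set's static request table. */
struct mp_tagset {
	size_t nr_tags;
	struct mp_rq **static_rqs;
};

/* Walks only the static table, as the reinit loop does; returns requests reset. */
static size_t mp_reinit_static(struct mp_tagset *set)
{
	size_t i, n = 0;

	assert(set != NULL);
	for (i = 0; i < set->nr_tags; i++) {
		if (!set->static_rqs[i])
			continue;
		set->static_rqs[i]->need_inval = false;
		n++;
	}
	return n;
}

/* True if rq is reachable from the static table, so reinit would visit it. */
static bool mp_is_static(const struct mp_tagset *set, const struct mp_rq *rq)
{
	size_t i;

	assert(set != NULL);
	for (i = 0; i < set->nr_tags; i++) {
		if (set->static_rqs[i] == rq)
			return true;
	}
	return false;
}
```

A request for which `mp_is_static()` returns false keeps whatever `need_inval` value it had before `mp_reinit_static()` ran, which matches the symptom reported at the top of the thread under the stated assumption.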