* RE: I/O error on dd commands
@ 2016-12-08 0:45 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
[not found] ` <45c451dc71cc42b5bb24e385a160249a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-08 0:45 UTC (permalink / raw)
To: maxg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
Hi Max,
> Any interesting logs in dmesg ?
I enabled dynamic debug and found the following log in dmesg on the target.
[ 584.274459] rdma_rxe: qp#17 state = GET_REQ
[ 584.274461] rdma_rxe: qp#17 state = CHK_PSN
[ 584.274462] rdma_rxe: qp#17 state = CHK_OP_SEQ
[ 584.274464] rdma_rxe: qp#17 state = CHK_OP_VALID
[ 584.274465] rdma_rxe: qp#17 state = CHK_RESOURCE
[ 584.274467] rdma_rxe: qp#17 state = CHK_LENGTH
[ 584.274468] rdma_rxe: qp#17 state = CHK_RKEY
[ 584.274470] rdma_rxe: qp#17 state = EXECUTE
[ 584.274473] rdma_rxe: qp#17 state = COMPLETE
[ 584.274475] rdma_rxe: qp#17 state = ACKNOWLEDGE
[ 584.274496] rdma_rxe: qp#17 state = CLEANUP
[ 584.274498] rdma_rxe: qp#17 state = DONE
[ 584.274499] rdma_rxe: qp#17 state = GET_REQ
[ 584.274500] rdma_rxe: qp#17 state = EXIT
[ 584.275561] nvmet_rdma: IB send queue full (needed 1): queue 0 cntlid 1
thanks,
Haruo.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: I/O error on dd commands
[not found] ` <45c451dc71cc42b5bb24e385a160249a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
@ 2016-12-08 9:25 ` Max Gurtovoy
[not found] ` <50499552-0842-a9f0-e46b-30b427b29ca9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Max Gurtovoy @ 2016-12-08 9:25 UTC (permalink / raw)
To: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil,
monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 12/8/2016 2:45 AM, Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil@public.gmane.org wrote:
> Hi Max,
>
>> Any interesting logs in dmesg ?
>
> I enabled dynamic debug and found the following log in dmesg on the target.
>
> [ 584.274459] rdma_rxe: qp#17 state = GET_REQ
> [ 584.274461] rdma_rxe: qp#17 state = CHK_PSN
> [ 584.274462] rdma_rxe: qp#17 state = CHK_OP_SEQ
> [ 584.274464] rdma_rxe: qp#17 state = CHK_OP_VALID
> [ 584.274465] rdma_rxe: qp#17 state = CHK_RESOURCE
> [ 584.274467] rdma_rxe: qp#17 state = CHK_LENGTH
> [ 584.274468] rdma_rxe: qp#17 state = CHK_RKEY
> [ 584.274470] rdma_rxe: qp#17 state = EXECUTE
> [ 584.274473] rdma_rxe: qp#17 state = COMPLETE
> [ 584.274475] rdma_rxe: qp#17 state = ACKNOWLEDGE
> [ 584.274496] rdma_rxe: qp#17 state = CLEANUP
> [ 584.274498] rdma_rxe: qp#17 state = DONE
> [ 584.274499] rdma_rxe: qp#17 state = GET_REQ
> [ 584.274500] rdma_rxe: qp#17 state = EXIT
> [ 584.275561] nvmet_rdma: IB send queue full (needed 1): queue 0 cntlid 1
>
> thanks,
> Haruo.
>
Very weird. Can you also print "queue->host_qid" in nvmet_rdma's
nvmet_rdma_execute_command ("IB send queue full (needed 1): queue_idx 0
cntlid 1 queue_qid ????")?
It seems like it's the admin queue, but I'm not sure.
Moni,
can you advise regarding the rdma_rxe logs?
thanks,
Max.
* RE: I/O error on dd commands
[not found] ` <50499552-0842-a9f0-e46b-30b427b29ca9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-12-16 4:25 ` Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
0 siblings, 0 replies; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-16 4:25 UTC (permalink / raw)
To: maxg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
Hi Max and Moni,
The "IB send queue full" issue reproduces easily on vanilla kernel 4.9.
> Very weird. Can you also print "queue->host_qid" in nvmet_rdma's
> nvmet_rdma_execute_command ("IB send queue full (needed 1): queue_idx 0
> cntlid 1 queue_qid ????")?
>
> It seems like it's the admin queue, but I'm not sure.
>
> Moni,
> can you advise regarding rdma_rxe logs ?
"queue->host_qid" is now printed. If there is any other information that
would help debug this issue, please let me know.
thanks,
Haruo.
* Re: I/O error on dd commands
[not found] ` <0980d1e6e7d34e098900ee293aa0e487-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
@ 2016-12-07 10:47 ` Max Gurtovoy
0 siblings, 0 replies; 7+ messages in thread
From: Max Gurtovoy @ 2016-12-07 10:47 UTC (permalink / raw)
To: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil,
monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Any interesting logs in dmesg?
On 12/6/2016 7:55 AM, Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil@public.gmane.org wrote:
> Hi Max,
>
> Thank you for your reply.
> This issue reproduces on vanilla 4.9-rc8.
>
>> can you try to repro it with iSER ?
>
> I'm sorry. I can't try iSER.
>
>> what is your backing store device ?
>
> I am using a TOSHIBA Enterprise SSD PX04PMB160.
> https://toshiba.semicon-storage.com/ap-en/product/storage-products/enterprise-ssd.html
>
>> does this happen with 1k bs only, or with other bs as well?
>
> This issue also reproduces with 512k/1024k/2048k/4096k/8192k bs.
> It occurs on more than one machine, so it is not a NIC failure.
> The machines are connected directly, so it is not a hub failure either.
>
>> rxe_req.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>> --- linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c.orig 2016-12-05 10:11:38.000000000 +0900
>> +++ linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c 2016-12-05 10:15:43.000000000 +0900
>> @@ -705,12 +705,12 @@ next_wqe:
>> skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
>> if (unlikely(!skb)) {
>> pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
>> - goto err;
>> + goto err1;
>> }
>>
>> if (fill_packet(qp, wqe, &pkt, skb, payload)) {
>> pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
>> - goto err;
>> + goto err2;
>> }
>>
>> /*
>> @@ -734,15 +734,16 @@ next_wqe:
>> goto exit;
>> }
>>
>> - goto err;
>> + goto err1;
>> }
>>
>> update_state(qp, wqe, &pkt, payload);
>>
>> goto next_wqe;
>>
>> -err:
>> +err2:
>> kfree_skb(skb);
>> +err1:
>> wqe->status = IB_WC_LOC_PROT_ERR;
>> wqe->state = wqe_state_error;
>>
>
> Although it is unrelated to this issue, please apply this patch.
>
> thanks,
> Haruo.
>
* RE: I/O error on dd commands
@ 2016-12-06 5:55 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
[not found] ` <0980d1e6e7d34e098900ee293aa0e487-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-06 5:55 UTC (permalink / raw)
To: maxg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
Hi Max,
Thank you for your reply.
This issue reproduces on vanilla 4.9-rc8.
> can you try to repro it with iSER ?
I'm sorry. I can't try iSER.
> what is your backing store device ?
I am using a TOSHIBA Enterprise SSD PX04PMB160.
https://toshiba.semicon-storage.com/ap-en/product/storage-products/enterprise-ssd.html
> does this happen with 1k bs only, or with other bs as well?
This issue also reproduces with 512k/1024k/2048k/4096k/8192k bs.
It occurs on more than one machine, so it is not a NIC failure.
The machines are connected directly, so it is not a hub failure either.
> rxe_req.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
> --- linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c.orig 2016-12-05 10:11:38.000000000 +0900
> +++ linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c 2016-12-05 10:15:43.000000000 +0900
> @@ -705,12 +705,12 @@ next_wqe:
> skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
> if (unlikely(!skb)) {
> pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
> - goto err;
> + goto err1;
> }
>
> if (fill_packet(qp, wqe, &pkt, skb, payload)) {
> pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
> - goto err;
> + goto err2;
> }
>
> /*
> @@ -734,15 +734,16 @@ next_wqe:
> goto exit;
> }
>
> - goto err;
> + goto err1;
> }
>
> update_state(qp, wqe, &pkt, payload);
>
> goto next_wqe;
>
> -err:
> +err2:
> kfree_skb(skb);
> +err1:
> wqe->status = IB_WC_LOC_PROT_ERR;
> wqe->state = wqe_state_error;
>
Although it is unrelated to this issue, please apply this patch.
thanks,
Haruo.
* Re: I/O error on dd commands
[not found] ` <b8b48041a15444bc9c62176d6807433a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
@ 2016-12-05 10:12 ` Max Gurtovoy
0 siblings, 0 replies; 7+ messages in thread
From: Max Gurtovoy @ 2016-12-05 10:12 UTC (permalink / raw)
To: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil,
monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 12/5/2016 7:41 AM, Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil@public.gmane.org wrote:
> Hi Moni,
>
> Does the rxe driver in vanilla 4.9-rc6 work correctly?
> When I test reads and writes with the dd command, I get the following errors.
>
> (read)
> # dd if=/dev/nvme0n1 of=<readfile>.bin bs=1024 count=10000 iflag=direct
>
> blk_update_request: I/O error, dev nvme0n1, sector 1860
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> or
>
> nvme nvme0: failed nvme_keep_alive_end_io error=16391
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> (write)
> # dd if=<writefile>.bin of=/dev/nvme0n1 bs=1024 count=10000 oflag=direct
>
> blk_update_request: I/O error, dev nvme0n1, sector 1860
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> or
>
> nvme nvme0: failed nvme_keep_alive_end_io error=16391
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> I'd like to investigate the root cause of this error. Are there any ideas?
Hi Haruo,
Can you try to reproduce it with iSER?
What is your backing store device?
Does this happen with 1k bs only, or with other bs as well?
thanks,
Max.
* I/O error on dd commands
@ 2016-12-05 5:41 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
[not found] ` <b8b48041a15444bc9c62176d6807433a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-05 5:41 UTC (permalink / raw)
To: monis-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
Hi Moni,
Does the rxe driver in vanilla 4.9-rc6 work correctly?
When I test reads and writes with the dd command, I get the following errors.
(read)
# dd if=/dev/nvme0n1 of=<readfile>.bin bs=1024 count=10000 iflag=direct
blk_update_request: I/O error, dev nvme0n1, sector 1860
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected
or
nvme nvme0: failed nvme_keep_alive_end_io error=16391
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected
(write)
# dd if=<writefile>.bin of=/dev/nvme0n1 bs=1024 count=10000 oflag=direct
blk_update_request: I/O error, dev nvme0n1, sector 1860
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected
or
nvme nvme0: failed nvme_keep_alive_end_io error=16391
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected
I'd like to investigate the root cause of this error. Are there any ideas?
(PS)
I'm reviewing the rxe driver.
I found a bug in the release of the skb on the error paths of rxe_requester().
Is my patch right?
rxe_req.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c.orig 2016-12-05 10:11:38.000000000 +0900
+++ linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c 2016-12-05 10:15:43.000000000 +0900
@@ -705,12 +705,12 @@ next_wqe:
skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
if (unlikely(!skb)) {
pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
- goto err;
+ goto err1;
}
if (fill_packet(qp, wqe, &pkt, skb, payload)) {
pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
- goto err;
+ goto err2;
}
/*
@@ -734,15 +734,16 @@ next_wqe:
goto exit;
}
- goto err;
+ goto err1;
}
update_state(qp, wqe, &pkt, payload);
goto next_wqe;
-err:
+err2:
kfree_skb(skb);
+err1:
wqe->status = IB_WC_LOC_PROT_ERR;
wqe->state = wqe_state_error;
--
Haruo