All of lore.kernel.org
* RE: I/O error on dd commands
@ 2016-12-08  0:45 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
       [not found] ` <45c451dc71cc42b5bb24e385a160249a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-08  0:45 UTC (permalink / raw)
  To: maxg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil

Hi Max,

> Any interesting logs in dmesg ?

Dynamic debug was enabled, and the following log appeared in dmesg on the target.

[  584.274459] rdma_rxe: qp#17 state = GET_REQ
[  584.274461] rdma_rxe: qp#17 state = CHK_PSN
[  584.274462] rdma_rxe: qp#17 state = CHK_OP_SEQ
[  584.274464] rdma_rxe: qp#17 state = CHK_OP_VALID
[  584.274465] rdma_rxe: qp#17 state = CHK_RESOURCE
[  584.274467] rdma_rxe: qp#17 state = CHK_LENGTH
[  584.274468] rdma_rxe: qp#17 state = CHK_RKEY
[  584.274470] rdma_rxe: qp#17 state = EXECUTE
[  584.274473] rdma_rxe: qp#17 state = COMPLETE
[  584.274475] rdma_rxe: qp#17 state = ACKNOWLEDGE
[  584.274496] rdma_rxe: qp#17 state = CLEANUP
[  584.274498] rdma_rxe: qp#17 state = DONE
[  584.274499] rdma_rxe: qp#17 state = GET_REQ
[  584.274500] rdma_rxe: qp#17 state = EXIT
[  584.275561] nvmet_rdma: IB send queue full (needed 1): queue 0 cntlid 1
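
For context, the "IB send queue full" message is the signature of send-queue credit accounting running dry: the target tries to reserve send work-queue entries for a command and fails. Below is a minimal user-space model of that pattern; the structure, field names, and credit arithmetic are assumptions for illustration only, not the actual nvmet_rdma code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of send-queue credit accounting: each command needs one
 * send WQE plus one per RDMA operation. If the credit counter would
 * go negative, the command cannot be posted now, the credits are
 * given back, and the caller must retry later. All names here are
 * illustrative, not the real nvmet_rdma structures. */
struct model_queue {
	atomic_int sq_wr_avail;  /* remaining send-queue credits */
	int idx;                 /* queue index */
	int cntlid;              /* controller id */
};

static bool model_try_post(struct model_queue *q, int n_rdma)
{
	int needed = 1 + n_rdma;

	if (atomic_fetch_sub(&q->sq_wr_avail, needed) - needed < 0) {
		fprintf(stderr,
			"IB send queue full (needed %d): queue %d cntlid %d\n",
			needed, q->idx, q->cntlid);
		atomic_fetch_add(&q->sq_wr_avail, needed); /* undo reservation */
		return false;
	}
	return true;
}
```

In this model, "needed 1" would correspond to a command with no additional RDMA work requests attached; whether that mapping holds in the real driver is exactly what the debugging below tries to establish.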

thanks,
Haruo.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: I/O error on dd commands
       [not found] ` <45c451dc71cc42b5bb24e385a160249a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
@ 2016-12-08  9:25   ` Max Gurtovoy
       [not found]     ` <50499552-0842-a9f0-e46b-30b427b29ca9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Max Gurtovoy @ 2016-12-08  9:25 UTC (permalink / raw)
  To: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil,
	monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 12/8/2016 2:45 AM, Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil@public.gmane.org wrote:
> Hi Max,
>
>> Any interesting logs in dmesg ?
>
> Dynamic debug was enabled, and the following log appeared in dmesg on the target.
>
> [  584.274459] rdma_rxe: qp#17 state = GET_REQ
> [  584.274461] rdma_rxe: qp#17 state = CHK_PSN
> [  584.274462] rdma_rxe: qp#17 state = CHK_OP_SEQ
> [  584.274464] rdma_rxe: qp#17 state = CHK_OP_VALID
> [  584.274465] rdma_rxe: qp#17 state = CHK_RESOURCE
> [  584.274467] rdma_rxe: qp#17 state = CHK_LENGTH
> [  584.274468] rdma_rxe: qp#17 state = CHK_RKEY
> [  584.274470] rdma_rxe: qp#17 state = EXECUTE
> [  584.274473] rdma_rxe: qp#17 state = COMPLETE
> [  584.274475] rdma_rxe: qp#17 state = ACKNOWLEDGE
> [  584.274496] rdma_rxe: qp#17 state = CLEANUP
> [  584.274498] rdma_rxe: qp#17 state = DONE
> [  584.274499] rdma_rxe: qp#17 state = GET_REQ
> [  584.274500] rdma_rxe: qp#17 state = EXIT
> [  584.275561] nvmet_rdma: IB send queue full (needed 1): queue 0 cntlid 1
>
> thanks,
> Haruo.
>

Very weird. Can you also print "queue->host_qid" in nvmet_rdma's
nvmet_rdma_execute_command (IB send queue full (needed 1): queue_idx 0
cntlid 1 queue_qid ????)?

It seems like it's the admin queue, but I'm not sure.
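
The extended diagnostic being requested might look like the following user-space sketch; the field names (queue_idx, cntlid, queue_qid/host_qid) follow the wording in this thread and are assumptions, not the actual nvmet_rdma structure members.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the "queue full" diagnostic with the host-provided queue id
 * appended. Field names are assumptions taken from the wording of
 * this thread, not the real nvmet_rdma struct. */
static int format_sq_full_msg(char *buf, size_t len, int needed,
			      int queue_idx, int cntlid, int host_qid)
{
	return snprintf(buf, len,
			"nvmet_rdma: IB send queue full (needed %d): "
			"queue_idx %d cntlid %d queue_qid %d",
			needed, queue_idx, cntlid, host_qid);
}
```

A host_qid of 0 would confirm the admin-queue suspicion; a non-zero value would point at an I/O queue instead.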

Moni,
can you advise regarding rdma_rxe logs ?

thanks,
Max.


* RE: I/O error on dd commands
       [not found]     ` <50499552-0842-a9f0-e46b-30b427b29ca9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-12-16  4:25       ` Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
  0 siblings, 0 replies; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-16  4:25 UTC (permalink / raw)
  To: maxg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil

Hi Max and Moni,

The "IB send queue full" issue is easily reproduced on a vanilla 4.9 kernel.

> Very weird. Can you also print "queue->host_qid" in nvmet_rdma's
> nvmet_rdma_execute_command (IB send queue full (needed 1): queue_idx 0
> cntlid 1 queue_qid ????)?
> 
> It seems like it's the admin queue, but I'm not sure.
> 
> Moni,
> can you advise regarding rdma_rxe logs ?

"queue->host_qid" is now printed. If there is any other information I can
gather to debug this issue, please let me know.

thanks,
Haruo.



* Re: I/O error on dd commands
       [not found] ` <0980d1e6e7d34e098900ee293aa0e487-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
@ 2016-12-07 10:47   ` Max Gurtovoy
  0 siblings, 0 replies; 7+ messages in thread
From: Max Gurtovoy @ 2016-12-07 10:47 UTC (permalink / raw)
  To: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil,
	monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Any interesting logs in dmesg ?

On 12/6/2016 7:55 AM, Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil@public.gmane.org wrote:
> Hi Max,
>
> Thank you for your reply.
> This issue is reproduced on vanilla 4.9-rc8.
>
>> can you try to repro it with iSER ?
>
> I'm sorry. I can't try iSER.
>
>> what is your backing store device ?
>
> I am currently using a TOSHIBA Enterprise SSD PX04PMB160.
> https://toshiba.semicon-storage.com/ap-en/product/storage-products/enterprise-ssd.html
>
> does this happen in 1k bs only or in different bs as well?
>
> The reproduction of this issue is confirmed with 512k/1024k/2048k/4096k/8192k bs as well.
> It occurs on more than one machine, so it isn't a NIC failure.
> The LAN connection is direct (so it isn't a hub failure, either).
>
>> rxe_req.c |    9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>> --- linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c.orig      2016-12-05 10:11:38.000000000 +0900
>> +++ linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c   2016-12-05 10:15:43.000000000 +0900
>> @@ -705,12 +705,12 @@ next_wqe:
>>         skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
>>         if (unlikely(!skb)) {
>>                 pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
>> -               goto err;
>> +               goto err1;
>>         }
>>
>>         if (fill_packet(qp, wqe, &pkt, skb, payload)) {
>>                 pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
>> -               goto err;
>> +               goto err2;
>>         }
>>
>>         /*
>> @@ -734,15 +734,16 @@ next_wqe:
>>                         goto exit;
>>                 }
>>
>> -               goto err;
>> +               goto err1;
>>         }
>>
>>         update_state(qp, wqe, &pkt, payload);
>>
>>         goto next_wqe;
>>
>> -err:
>> +err2:
>>         kfree_skb(skb);
>> +err1:
>>         wqe->status = IB_WC_LOC_PROT_ERR;
>>         wqe->state = wqe_state_error;
>>
>
> It's unrelated to this issue, but please apply this patch.
>
> thanks,
> Haruo.
>


* RE: I/O error on dd commands
@ 2016-12-06  5:55 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
       [not found] ` <0980d1e6e7d34e098900ee293aa0e487-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-06  5:55 UTC (permalink / raw)
  To: maxg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil

Hi Max,

Thank you for your reply.
This issue is reproduced on vanilla 4.9-rc8.

> can you try to repro it with iSER ?

I'm sorry. I can't try iSER.

> what is your backing store device ?

I am currently using a TOSHIBA Enterprise SSD PX04PMB160.
https://toshiba.semicon-storage.com/ap-en/product/storage-products/enterprise-ssd.html

> does this happen in 1k bs only or in different bs as well?

The reproduction of this issue is confirmed with 512k/1024k/2048k/4096k/8192k bs as well.
It occurs on more than one machine, so it isn't a NIC failure.
The LAN connection is direct (so it isn't a hub failure, either).

> rxe_req.c |    9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
> --- linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c.orig      2016-12-05 10:11:38.000000000 +0900
> +++ linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c   2016-12-05 10:15:43.000000000 +0900
> @@ -705,12 +705,12 @@ next_wqe:
>         skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
>         if (unlikely(!skb)) {
>                 pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
> -               goto err;
> +               goto err1;
>         }
> 
>         if (fill_packet(qp, wqe, &pkt, skb, payload)) {
>                 pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
> -               goto err;
> +               goto err2;
>         }
> 
>         /*
> @@ -734,15 +734,16 @@ next_wqe:
>                         goto exit;
>                 }
> 
> -               goto err;
> +               goto err1;
>         }
> 
>         update_state(qp, wqe, &pkt, payload);
> 
>         goto next_wqe;
> 
> -err:
> +err2:
>         kfree_skb(skb);
> +err1:
>         wqe->status = IB_WC_LOC_PROT_ERR;
>         wqe->state = wqe_state_error;
> 

It's unrelated to this issue, but please apply this patch.

thanks,
Haruo.



* Re: I/O error on dd commands
       [not found] ` <b8b48041a15444bc9c62176d6807433a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
@ 2016-12-05 10:12   ` Max Gurtovoy
  0 siblings, 0 replies; 7+ messages in thread
From: Max Gurtovoy @ 2016-12-05 10:12 UTC (permalink / raw)
  To: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil,
	monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 12/5/2016 7:41 AM, Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil@public.gmane.org wrote:
> Hi Moni,
>
> Does the rxe driver in vanilla 4.9-rc6 work fine?
> When read and write are tested with the dd command, the following errors occur.
>
> (read)
> # dd if=/dev/nvme0n1 of=<readfile>.bin bs=1024 count=10000 iflag=direct
>
> blk_update_request: I/O error, dev nvme0n1, sector 1860
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> or
>
> nvme nvme0: failed nvme_keep_alive_end_io error=16391
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> (write)
> # dd if=<writefile>.bin of=/dev/nvme0n1 bs=1024 count=10000 oflag=direct
>
> blk_update_request: I/O error, dev nvme0n1, sector 1860
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> or
>
> nvme nvme0: failed nvme_keep_alive_end_io error=16391
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
>
> I'd like to investigate the root cause of this error; are there any ideas?

Hi Haruo,

can you try to repro it with iSER?
what is your backing store device?
does this happen in 1k bs only or in different bs as well?

thanks,
Max.


* I/O error on dd commands
@ 2016-12-05  5:41 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
       [not found] ` <b8b48041a15444bc9c62176d6807433a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil @ 2016-12-05  5:41 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil

Hi Moni,

Does the rxe driver in vanilla 4.9-rc6 work fine?
When read and write are tested with the dd command, the following errors occur.

(read)
# dd if=/dev/nvme0n1 of=<readfile>.bin bs=1024 count=10000 iflag=direct

blk_update_request: I/O error, dev nvme0n1, sector 1860
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected

or

nvme nvme0: failed nvme_keep_alive_end_io error=16391
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected

(write)
# dd if=<writefile>.bin of=/dev/nvme0n1 bs=1024 count=10000 oflag=direct

blk_update_request: I/O error, dev nvme0n1, sector 1860
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected

or

nvme nvme0: failed nvme_keep_alive_end_io error=16391
nvme nvme0: reconnecting in 10 seconds
nvme nvme0: Successfully reconnected

I'd like to investigate the root cause of this error; are there any ideas?

(PS)
I'm checking the rxe driver.
A typo was found in the release of the skb in rxe_requester().
Is my patch right?

rxe_req.c |    9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c.orig      2016-12-05 10:11:38.000000000 +0900
+++ linux-4.9-rc7/drivers/infiniband/sw/rxe/rxe_req.c   2016-12-05 10:15:43.000000000 +0900
@@ -705,12 +705,12 @@ next_wqe:
        skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
        if (unlikely(!skb)) {
                pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
-               goto err;
+               goto err1;
        }

        if (fill_packet(qp, wqe, &pkt, skb, payload)) {
                pr_debug("qp#%d Error during fill packet\n", qp_num(qp));
-               goto err;
+               goto err2;
        }

        /*
@@ -734,15 +734,16 @@ next_wqe:
                        goto exit;
                }

-               goto err;
+               goto err1;
        }

        update_state(qp, wqe, &pkt, payload);

        goto next_wqe;

-err:
+err2:
        kfree_skb(skb);
+err1:
        wqe->status = IB_WC_LOC_PROT_ERR;
        wqe->state = wqe_state_error;


--
Haruo




Thread overview: 7+ messages
2016-12-08  0:45 I/O error on dd commands Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
     [not found] ` <45c451dc71cc42b5bb24e385a160249a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
2016-12-08  9:25   ` Max Gurtovoy
     [not found]     ` <50499552-0842-a9f0-e46b-30b427b29ca9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-12-16  4:25       ` Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
  -- strict thread matches above, loose matches on Subject: below --
2016-12-06  5:55 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
     [not found] ` <0980d1e6e7d34e098900ee293aa0e487-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
2016-12-07 10:47   ` Max Gurtovoy
2016-12-05  5:41 Tomita.Haruo-IGagC74glE2s6Rmoc/2Z03gSJqDPrsil
     [not found] ` <b8b48041a15444bc9c62176d6807433a-DqNMWkYM789gYsFYm0uEO7jjLBE8jN/0@public.gmane.org>
2016-12-05 10:12   ` Max Gurtovoy
