From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============6110690550518472111==" MIME-Version: 1.0 From: Yang, Ziye Subject: [SPDK] Re: Print backtrace in SPDK Date: Thu, 10 Sep 2020 01:30:40 +0000 Message-ID: In-Reply-To: ADC0394B-2634-42CA-85B7-94181FB9B426@vmware.com List-ID: To: spdk@lists.01.org --===============6110690550518472111== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Hi Wenhua, If the errno from sendmsg in _sock_flush is (ENOBUFS, 105), I think the iss= ue you faced may be similar with this issue (https://github.com/spdk/spdk/i= ssues/1592). You may try the patch provided by Jeffry Molanus (https://rev= iew.spdk.io/gerrit/c/spdk/spdk/+/4129) and see whether this patch can fix y= our issue. = Thanks. Best Regards Ziye Yang = -----Original Message----- From: Wenhua Liu = Sent: Wednesday, August 26, 2020 12:52 PM To: Storage Performance Development Kit Subject: [SPDK] Re: Print backtrace in SPDK I did not check errno. The only thing I knew is _sock_flush returns -1 whic= h is the return value of sendmsg. Thanks, -Wenhua =EF=BB=BFOn 8/25/20, 9:31 PM, "Yang, Ziye" wrote: Hi Wenhua, What's error number when you see that sendmsg function returns -1 when = you use posix socket implmentation? = Best Regards Ziye Yang = -----Original Message----- From: Wenhua Liu = Sent: Wednesday, August 26, 2020 12:27 PM To: Storage Performance Development Kit Subject: [SPDK] Re: Print backtrace in SPDK Hi Ziye, Back to April/May, I used SPDK 20.01 (the first release supported FUSED= operation) in a VM and ran into this issue once in a while. Recently, in order to test NVMe Abort, I updated the SPDK in that VM to= 20.07 and I started seeing this issue consistently. Maybe this is because = the change at our side that makes the issue easier to reproduce. I spent a lot time debugging this issue and found in wire data, the TCP= /IP FIN flag is set in the TCP packet in response to an NVME READ command. = As FIN flag is set when closing TCP connection. With this information, I fo= und it's the function nvmf_tcp_close_qpair close the TCP connection. To fig= ure out how this function is called, I wanted to print stack trace but coul= d not find a way, so I sent an email to the SPDK community asking for a sol= ution. Later I used some other way and figured out the call path which poin= ts where the problem happens. I noticed the zero copy thing and tried to disable it but did not help = (I can try it again to confirm). I started thinking if my VM itself has pro= blem. I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the probl= em still exists in this new target. As I could not find how sendmsg works a= nd I noticed there is a uring based socket implementation. I wanted to give= it a try so I asked you. I will let you know if disabling zero copy will help. Thanks, -Wenhua On 8/25/20, 6:52 PM, "Yang, Ziye" wrote: Hi Wenhua, Did you reproduce the issue you mentioned in last email with same V= M environment (OS) and same SPDK version? You mention that there is no iss= ue with uring, but there is issue with posix on the same SPDK version? Can= you reproduce the issue with latest version in SPDK master branch. I think that the current difference with uring and posix is: For th= e posix implementation, it uses the zero copy feature. Could you do some ex= periments to disable the zero copy feature manually in posix.c like the fol= lowing shows. Then we can firstly eliminate whether there is issue with zer= o copy feature on the target side. Thanks. #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY) //#define SPDK_ZEROCOPY #endif Best Regards Ziye Yang = -----Original Message----- From: Wenhua Liu = Sent: Wednesday, August 26, 2020 8:20 AM To: Storage Performance Development Kit Subject: [SPDK] Re: Print backtrace in SPDK Hi Ziye, I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.= 44 ~spdk$ cat /proc/version_signature Ubuntu 5.4.0-42.46-generic 5.4.44 ~/s= pdk$ I downloaded, buit and installed liburing from source. git clone https://nam04.safelinks.protection.outlook.com/?url=3Dhtt= ps%3A%2F%2Fgithub.com%2Faxboe%2Fliburing.git&data=3D02%7C01%7Cliuw%40vm= ware.com%7C2b9ea5455283477b7a6808d84978df30%7Cb39138ca3cee4b4aa4d6cd83d9dd6= 2f0%7C0%7C0%7C637340130778719667&sdata=3DWTKvtbsfdhTSKT%2FxQAylx8QwMdwx= UtKtdZeyNlIE9gk%3D&reserved=3D0 After switching to uring sock implementation, the "connection rese= t by peer" problem is gone. I tried to power on and shutdown my testing VM = and did not see one single "connection reset by peer" issue. Before this, e= very time, I powered on my testing VM, there were multiple "connection rese= t by peer" failures happened. Actually, I had this issue back to April/May. At that time, I could= not identify/corelate how the issue happened and did not drill down. This = time, the issue happened so frequently. This helped me dig out more informa= tion. In summary, it seems the posix sock implementation may have some pr= oblem. I'm not sure if this is generic or specific for running SPDK in VM. = The issue might also be related to our initiator implementation. Thanks, -Wenhua On 8/24/20, 12:33 AM, "Yang, Ziye" wrote: Hi Wenhua, You need to compile spdk with --with-uring option. And you nee= d to = 1 Download the liburing and install it by yourself. 2 Check your kernel version. Uring socket implementation depend= s on the kernel (> 5.4.3). What's you kernel version in the VM? Thanks. Best Regards Ziye Yang = -----Original Message----- From: Wenhua Liu = Sent: Monday, August 24, 2020 3:19 PM To: Storage Performance Development Kit Subject: [SPDK] Re: Print backtrace in SPDK Hi Ziye, I'm using SPDK NVMe-oF target. I used some other way and figured out the following call path: posix_sock_group_impl_poll -> _sock_flush <------------------ failed -> spdk_sock_abort_requests -> _pdu_write_done -> nvmf_tcp_qpair_disconnect -> spdk_nvmf_qpair_disconnect -> _nvmf_qpair_destroy -> spdk_nvmf_poll_group_remove -> nvmf_transport_poll_group_remove -> nvmf_tcp_poll_group_remove -> spdk_sock_group_remove_sock -> posix_sock_group_impl_remove_sock -> spdk_sock_abort_requests -> _nvmf_ctrlr_free_from_qpair -> _nvmf_transport_qpair_fini -> nvmf_transport_qpair_fini -> nvmf_tcp_close_qpair -> spdk_sock_close The _sock_flush calls sendmsg to write the data to the socket. = It's sendmsg failing with return value -1. I captured wire data. In Wiresha= rk, I can see the READ command has been received by the target as a TCP pac= ket. As the response to this TCP packet, a TCP packet with FIN flag set is = sent to the initiator. The FIN is to close the socket connection. I'm running SPDK target inside a VM. My NVMe/TCP initiator runs= inside another VM. I'm going to try with another SPDK target which runs on= a physical machine. By the way, I noticed there is a uring based sock implementatio= n, how do I switch to this sock implementation. It seems the default is po= six sock implementation. Thanks, -Wenhua = On 8/23/20, 9:55 PM, "Yang, Ziye" wrote: Hi Wenhua, Which applications are you using from SPDK? = 1 SPDK NVMe-oF target in target side? 2 SPDK NVMe perf or others? For nvmf_tcp_close_qpair will be called in the following po= ssible cases (not all listed) for TCP transport. But it will be called by s= pdk_nvmf_qpair_disconnect as the entry. 1 qpair is not in polling group spdk_nvmf_qpair_disconnect nvmf_transport_qpair_fini 2 spdk_nvmf_qpair_disconnect .... _nvmf_qpair_destroy nvmf_transport_qpair_fini .. nvmf_tcp_close_qpair 3 spdk_nvmf_qpair_disconnect .... _nvmf_qpair_destroy _nvmf_ctrlr_free_from_qpair = _nvmf_transport_qpair_fini .. nvmf_tcp_close_qpair spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disc= onnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases: (1) _pdu_write_done (if there is error for write); (2) nvmf_tcp_qpair_handle_timeout.( No response from initia= tor in 30s if targets sends c2h_term_req) (3) nvmf_tcp_capsule_cmd_hdr_handle. (Cannot get tcp req) (4) nvmf_tcp_sock_cb. TCP PDU related handling issue. = Also in lib/nvmf/ctrlr.c Target side has a timer poller: nv= mf_ctrlr_keep_alive_poll. If there is no keep alive command sent from host,= it will call spdk_nvmf_qpair_disconnect in related polling group assoicate= d with the controller. Best Regards Ziye Yang = -----Original Message----- From: Wenhua Liu = Sent: Saturday, August 22, 2020 3:15 PM To: Storage Performance Development Kit Subject: [SPDK] Print backtrace in SPDK Hi, Does anyone know if there is a function in SPDK that prints= the backtrace? I run into a =E2=80=9CConnection Reset by Peer=E2=80=9D iss= ue on host side when testing NVMe/TCP. I identified it=E2=80=99s because so= me queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but= I could not figure out how/why this function is called. I thought if the b= acktrace can be printed when calling this function, it might be helpful to = me to find the root cause. Thanks, -Wenhua _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org _______________________________________________ SPDK mailing list -- spdk(a)lists.01.org To unsubscribe send an email to spdk-leave(a)lists.01.org --===============6110690550518472111==--