* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-30  7:52 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-08-30  7:52 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 15850 bytes --]

Hi Wenhua,

Thanks. It would be better if you could reproduce your issue in an easy way and then submit an issue on GitHub. Then the community can help you.

Sent from my iPad

> On Aug 30, 2020, at 2:05 PM, Wenhua Liu <liuw(a)vmware.com> wrote:
> 
> Hi Ziye,
> 
> I tested the patch you provided. It does not help. The problem still exists.
> 
> Thanks,
> -Wenhua
> 
> On 8/26/20, 10:09 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
> 
>    Hi Wenhua,
> 
>    Thanks for your continued verification. So it seems there is an issue with zero-copy support in the SPDK posix socket implementation on the target side.
> 
> 
> 
> 
>    Best Regards
>    Ziye Yang 
> 
>    -----Original Message-----
>    From: Wenhua Liu <liuw(a)vmware.com> 
>    Sent: Thursday, August 27, 2020 1:05 PM
>    To: Storage Performance Development Kit <spdk(a)lists.01.org>
>    Subject: [SPDK] Re: Print backtrace in SPDK
> 
>    Hi Ziye,
> 
>    I have verified that after disabling zero copy, the problem is gone. The following is the change I made to disable zero copy.
> 
>    spdk$ git diff module/sock/posix/posix.c
>    diff --git a/module/sock/posix/posix.c b/module/sock/posix/posix.c
>    index 4eb1bf106..7b77289bb 100644
>    --- a/module/sock/posix/posix.c
>    +++ b/module/sock/posix/posix.c
>    @@ -53,9 +53,9 @@
>     #define MIN_SO_SNDBUF_SIZE (2 * 1024 * 1024)
>     #define IOV_BATCH_SIZE 64
>    
>    -#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
>    -#define SPDK_ZEROCOPY
>    -#endif
>    +//#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
>    +//#define SPDK_ZEROCOPY
>    +//#endif
> 
>     struct spdk_posix_sock {
>            struct spdk_sock        base;
>    ~/spdk$
> 
>    With this change, I powered the VM on and shut it down 8 times and did not see a single "Connection Reset by Peer" issue. Without the change, I powered the VM on and shut it down 4 times, and every time I saw at least one "Connection Reset by Peer" error on every IO queue (4 IO queues in total).
> 
>    Thanks,
>    -Wenhua
> 
>    On 8/25/20, 9:51 PM, "Wenhua Liu" <liuw(a)vmware.com> wrote:
> 
>        I did not check errno. The only thing I knew was that _sock_flush returned -1, which is the return value of sendmsg.
> 
>        Thanks,
>        -Wenhua
> 
>        On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
> 
>            Hi  Wenhua,
> 
>            What's the error number when the sendmsg function returns -1 with the posix socket implementation?
> 
> 
> 
> 
>            Best Regards
>            Ziye Yang 
> 
>            -----Original Message-----
>            From: Wenhua Liu <liuw(a)vmware.com> 
>            Sent: Wednesday, August 26, 2020 12:27 PM
>            To: Storage Performance Development Kit <spdk(a)lists.01.org>
>            Subject: [SPDK] Re: Print backtrace in SPDK
> 
>            Hi Ziye,
> 
>            Back in April/May, I used SPDK 20.01 (the first release that supported FUSED operations) in a VM and ran into this issue once in a while.
> 
>            Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and started seeing this issue consistently. Maybe a change on our side makes the issue easier to reproduce.
> 
>            I spent a lot of time debugging this issue and found in the wire data that the TCP FIN flag is set in the TCP packet sent in response to an NVMe READ command; the FIN flag is set when closing a TCP connection. With this information, I found that it is the function nvmf_tcp_close_qpair that closes the TCP connection. To figure out how this function is called, I wanted to print a stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path, which points to where the problem happens.
> 
>            I noticed the zero-copy feature and tried disabling it, but it did not help (I can try it again to confirm). I started wondering whether my VM itself had a problem, so I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists on this new target. As I could not find out how sendmsg works, and I noticed there is a uring-based socket implementation, I wanted to give it a try, so I asked you.
> 
>            I will let you know whether disabling zero copy helps.
> 
>            Thanks,
>            -Wenhua
> 
>            On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
> 
>                Hi Wenhua,
> 
>                Did you reproduce the issue you mentioned in your last email with the same VM environment (OS) and the same SPDK version? You mention that there is no issue with uring, but there is an issue with posix on the same SPDK version? Can you reproduce the issue with the latest version on the SPDK master branch?
> 
>                I think the current difference between uring and posix is that the posix implementation uses the zero-copy feature. Could you do some experiments to disable the zero-copy feature manually in posix.c, as the following shows? Then we can first determine whether there is an issue with the zero-copy feature on the target side. Thanks.
> 
>                #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
>                //#define SPDK_ZEROCOPY
>                #endif
> 
> 
> 
> 
>                Best Regards
>                Ziye Yang 
> 
>                -----Original Message-----
>                From: Wenhua Liu <liuw(a)vmware.com> 
>                Sent: Wednesday, August 26, 2020 8:20 AM
>                To: Storage Performance Development Kit <spdk(a)lists.01.org>
>                Subject: [SPDK] Re: Print backtrace in SPDK
> 
>                Hi Ziye,
> 
>                I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44:
>                ~/spdk$ cat /proc/version_signature
>                Ubuntu 5.4.0-42.46-generic 5.4.44
>                ~/spdk$
> 
>                I downloaded, built and installed liburing from source:
>                git clone https://github.com/axboe/liburing.git
> 
>                After switching to the uring sock implementation, the "connection reset by peer" problem is gone. I powered on and shut down my testing VM and did not see a single "connection reset by peer" issue. Before this, every time I powered on my testing VM, multiple "connection reset by peer" failures happened.
> 
>                Actually, I had this issue back in April/May. At that time, I could not identify/correlate how the issue happened and did not drill down. This time, the issue happened so frequently that it helped me dig out more information.
> 
>                In summary, it seems the posix sock implementation may have some problem. I'm not sure whether this is generic or specific to running SPDK in a VM. The issue might also be related to our initiator implementation.
> 
>                Thanks,
>                -Wenhua
> 
> 
>                On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
> 
>                    Hi Wenhua,
> 
>                    You need to compile SPDK with the --with-uring option. And you need to:
>                    1. Download liburing and install it yourself.
>                    2. Check your kernel version. The uring socket implementation depends on the kernel (> 5.4.3).
> 
>                    What's your kernel version in the VM?
> 
>                    Thanks.
> 
> 
> 
> 
>                    Best Regards
>                    Ziye Yang 
> 
>                    -----Original Message-----
>                    From: Wenhua Liu <liuw(a)vmware.com> 
>                    Sent: Monday, August 24, 2020 3:19 PM
>                    To: Storage Performance Development Kit <spdk(a)lists.01.org>
>                    Subject: [SPDK] Re: Print backtrace in SPDK
> 
>                    Hi Ziye,
> 
>                    I'm using SPDK NVMe-oF target.
> 
>                    I used some other way and figured out the following call path:
>                    posix_sock_group_impl_poll
>                    -> _sock_flush    <------------------ failed
>                    -> spdk_sock_abort_requests
>                       -> _pdu_write_done
>                          -> nvmf_tcp_qpair_disconnect
>                             -> spdk_nvmf_qpair_disconnect
>                                -> _nvmf_qpair_destroy
>                                   -> spdk_nvmf_poll_group_remove
>                                      -> nvmf_transport_poll_group_remove
>                                         -> nvmf_tcp_poll_group_remove
>                                            -> spdk_sock_group_remove_sock
>                                               -> posix_sock_group_impl_remove_sock
>                                                  -> spdk_sock_abort_requests
>                                   -> _nvmf_ctrlr_free_from_qpair
>                                      -> _nvmf_transport_qpair_fini
>                                         -> nvmf_transport_qpair_fini
>                                            -> nvmf_tcp_close_qpair
>                                               -> spdk_sock_close
> 
>                    _sock_flush calls sendmsg to write the data to the socket. It is sendmsg that fails with a return value of -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. In response to this TCP packet, a TCP packet with the FIN flag set is sent to the initiator. The FIN is to close the socket connection.
> 
>                    I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.
> 
>                    By the way, I noticed there is a uring-based sock implementation; how do I switch to it? It seems the default is the posix sock implementation.
> 
>                    Thanks,
>                    -Wenhua 
> 
>                    On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
> 
>                        Hi Wenhua,
> 
>                        Which applications are you using from SPDK?
>                        1. The SPDK NVMe-oF target on the target side?
>                        2. SPDK NVMe perf or others?
> 
>                        nvmf_tcp_close_qpair will be called in the following possible cases (not all are listed) for the TCP transport, but it is always reached with spdk_nvmf_qpair_disconnect as the entry point.
> 
>                        1  qpair is not in polling group
>                        spdk_nvmf_qpair_disconnect
>                            nvmf_transport_qpair_fini
> 
>                        2  spdk_nvmf_qpair_disconnect
>                                ....
>                            _nvmf_qpair_destroy
>                                nvmf_transport_qpair_fini
>                                    ..
>                                    nvmf_tcp_close_qpair
> 
> 
>                        3  spdk_nvmf_qpair_disconnect
>                                ....
>                            _nvmf_qpair_destroy
>                                _nvmf_ctrlr_free_from_qpair    
>                                    _nvmf_transport_qpair_fini
>                                        ..
>                                        nvmf_tcp_close_qpair
> 
> 
>                        spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:
> 
>                        (1) _pdu_write_done (if there is a write error);
>                        (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
>                        (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
>                        (4) nvmf_tcp_sock_cb (TCP PDU related handling issue).
> 
> 
>                        Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect on the related polling group associated with the controller.
> 
> 
>                        Best Regards
>                        Ziye Yang 
> 
>                        -----Original Message-----
>                        From: Wenhua Liu <liuw(a)vmware.com> 
>                        Sent: Saturday, August 22, 2020 3:15 PM
>                        To: Storage Performance Development Kit <spdk(a)lists.01.org>
>                        Subject: [SPDK] Print backtrace in SPDK
> 
>                        Hi,
> 
>                        Does anyone know if there is a function in SPDK that prints the backtrace?
> 
>                        I ran into a “Connection Reset by Peer” issue on the host side when testing NVMe/TCP. I identified that it is because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought that if the backtrace could be printed when this function is called, it might help me find the root cause.
> 
>                        Thanks,
>                        -Wenhua

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-09-10  1:30 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-09-10  1:30 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 12252 bytes --]

Hi Wenhua,

If the errno from sendmsg in _sock_flush is ENOBUFS (105), I think the issue you faced may be similar to this issue (https://github.com/spdk/spdk/issues/1592). You may try the patch provided by Jeffry Molanus (https://review.spdk.io/gerrit/c/spdk/spdk/+/4129) and see whether it fixes your issue.
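
Checking errno right at the failing sendmsg() call is the quickest way to confirm whether the failure matches that issue. A minimal generic sketch (illustrative only, not SPDK's actual _sock_flush code; the wrapper name is made up):

/*
 * Illustrative only (not SPDK's _sock_flush): log errno when sendmsg() fails
 * so the failure can be matched against known issues such as ENOBUFS (105).
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static ssize_t
sendmsg_with_errno_log(int fd, struct msghdr *msg, int flags)
{
        ssize_t rc = sendmsg(fd, msg, flags);

        if (rc < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
                /* EAGAIN/EWOULDBLOCK only mean "retry later" on a non-blocking socket. */
                fprintf(stderr, "sendmsg failed: errno=%d (%s)\n", errno, strerror(errno));
        }

        return rc;
}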


Thanks.

Best Regards
Ziye Yang 

-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com> 
Sent: Wednesday, August 26, 2020 12:52 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK

I did not check errno. The only thing I knew was that _sock_flush returned -1, which is the return value of sendmsg.

Thanks,
-Wenhua

On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi  Wenhua,

     What's the error number when the sendmsg function returns -1 with the posix socket implementation?




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Wednesday, August 26, 2020 12:27 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

     Back in April/May, I used SPDK 20.01 (the first release that supported FUSED operations) in a VM and ran into this issue once in a while.

     Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and started seeing this issue consistently. Maybe a change on our side makes the issue easier to reproduce.

     I spent a lot of time debugging this issue and found in the wire data that the TCP FIN flag is set in the TCP packet sent in response to an NVMe READ command; the FIN flag is set when closing a TCP connection. With this information, I found that it is the function nvmf_tcp_close_qpair that closes the TCP connection. To figure out how this function is called, I wanted to print a stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path, which points to where the problem happens.

     I noticed the zero-copy feature and tried disabling it, but it did not help (I can try it again to confirm). I started wondering whether my VM itself had a problem, so I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists on this new target. As I could not find out how sendmsg works, and I noticed there is a uring-based socket implementation, I wanted to give it a try, so I asked you.

     I will let you know whether disabling zero copy helps.

    Thanks,
    -Wenhua

    On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi Wenhua,

         Did you reproduce the issue you mentioned in your last email with the same VM environment (OS) and the same SPDK version? You mention that there is no issue with uring, but there is an issue with posix on the same SPDK version? Can you reproduce the issue with the latest version on the SPDK master branch?

         I think the current difference between uring and posix is that the posix implementation uses the zero-copy feature. Could you do some experiments to disable the zero-copy feature manually in posix.c, as the following shows? Then we can first determine whether there is an issue with the zero-copy feature on the target side. Thanks.

        #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
        //#define SPDK_ZEROCOPY
        #endif




        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Wednesday, August 26, 2020 8:20 AM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Re: Print backtrace in SPDK

        Hi Ziye,

         I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44:
         ~/spdk$ cat /proc/version_signature
         Ubuntu 5.4.0-42.46-generic 5.4.44
         ~/spdk$

         I downloaded, built and installed liburing from source:
         git clone https://github.com/axboe/liburing.git

         After switching to the uring sock implementation, the "connection reset by peer" problem is gone. I powered on and shut down my testing VM and did not see a single "connection reset by peer" issue. Before this, every time I powered on my testing VM, multiple "connection reset by peer" failures happened.

         Actually, I had this issue back in April/May. At that time, I could not identify/correlate how the issue happened and did not drill down. This time, the issue happened so frequently that it helped me dig out more information.

         In summary, it seems the posix sock implementation may have some problem. I'm not sure whether this is generic or specific to running SPDK in a VM. The issue might also be related to our initiator implementation.

        Thanks,
        -Wenhua


        On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi Wenhua,

             You need to compile SPDK with the --with-uring option. And you need to:
             1. Download liburing and install it yourself.
             2. Check your kernel version. The uring socket implementation depends on the kernel (> 5.4.3).

             What's your kernel version in the VM?
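
             A generic way to check this programmatically, assuming a glibc/Linux build (a sketch only, not an SPDK API), is uname(2). Note that distribution kernels such as Ubuntu's "5.4.0-42-generic" hide the upstream patch level in the release string, so /proc/version_signature is the more reliable check there:

             /* Sketch: verify the running kernel is at least maj.min.patch
              * (e.g. 5.4.3) before relying on the io_uring based socket module. */
             #include <stdio.h>
             #include <sys/utsname.h>

             static int
             kernel_at_least(int maj, int min, int patch)
             {
                     struct utsname u;
                     int a = 0, b = 0, c = 0;

                     if (uname(&u) != 0) {
                             return 0;
                     }
                     /* u.release looks like "5.4.0-42-generic". */
                     sscanf(u.release, "%d.%d.%d", &a, &b, &c);
                     if (a != maj) {
                             return a > maj;
                     }
                     if (b != min) {
                             return b > min;
                     }
                     return c >= patch;
             }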

            Thanks.




            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Monday, August 24, 2020 3:19 PM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Re: Print backtrace in SPDK

            Hi Ziye,

            I'm using SPDK NVMe-oF target.

            I used some other way and figured out the following call path:
            posix_sock_group_impl_poll
            -> _sock_flush    <------------------ failed
            -> spdk_sock_abort_requests
               -> _pdu_write_done
                  -> nvmf_tcp_qpair_disconnect
                     -> spdk_nvmf_qpair_disconnect
                        -> _nvmf_qpair_destroy
                           -> spdk_nvmf_poll_group_remove
                              -> nvmf_transport_poll_group_remove
                                 -> nvmf_tcp_poll_group_remove
                                    -> spdk_sock_group_remove_sock
                                       -> posix_sock_group_impl_remove_sock
                                          -> spdk_sock_abort_requests
                           -> _nvmf_ctrlr_free_from_qpair
                              -> _nvmf_transport_qpair_fini
                                 -> nvmf_transport_qpair_fini
                                    -> nvmf_tcp_close_qpair
                                       -> spdk_sock_close

             _sock_flush calls sendmsg to write the data to the socket. It is sendmsg that fails with a return value of -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. In response to this TCP packet, a TCP packet with the FIN flag set is sent to the initiator. The FIN is to close the socket connection.

             I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.

             By the way, I noticed there is a uring-based sock implementation; how do I switch to it? It seems the default is the posix sock implementation.

            Thanks,
            -Wenhua 

            On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                Hi Wenhua,

                 Which applications are you using from SPDK?
                 1. The SPDK NVMe-oF target on the target side?
                 2. SPDK NVMe perf or others?

                 nvmf_tcp_close_qpair will be called in the following possible cases (not all are listed) for the TCP transport, but it is always reached with spdk_nvmf_qpair_disconnect as the entry point.

                1  qpair is not in polling group
                spdk_nvmf_qpair_disconnect
                	nvmf_transport_qpair_fini

                2  spdk_nvmf_qpair_disconnect
                		....
                	_nvmf_qpair_destroy
                		nvmf_transport_qpair_fini
                			..
                			nvmf_tcp_close_qpair


                3  spdk_nvmf_qpair_disconnect
                		....
                	_nvmf_qpair_destroy
                		_nvmf_ctrlr_free_from_qpair	
                			_nvmf_transport_qpair_fini
                				..
                				nvmf_tcp_close_qpair


                spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

                 (1) _pdu_write_done (if there is a write error);
                 (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
                 (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
                 (4) nvmf_tcp_sock_cb (TCP PDU related handling issue).


                 Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect on the related polling group associated with the controller.


                Best Regards
                Ziye Yang 

                -----Original Message-----
                From: Wenhua Liu <liuw(a)vmware.com> 
                Sent: Saturday, August 22, 2020 3:15 PM
                To: Storage Performance Development Kit <spdk(a)lists.01.org>
                Subject: [SPDK] Print backtrace in SPDK

                Hi,

                Does anyone know if there is a function in SPDK that prints the backtrace?

                 I ran into a “Connection Reset by Peer” issue on the host side when testing NVMe/TCP. I identified that it is because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought that if the backtrace could be printed when this function is called, it might help me find the root cause.
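
                 One option, assuming a glibc system, is the execinfo API; the sketch below (generic glibc code, not an SPDK helper) can be dropped temporarily into a function like nvmf_tcp_close_qpair to dump the call stack:

                 /* Generic glibc sketch (not an SPDK API): print the current call
                  * stack to stderr.  Link with -rdynamic, or run the addresses
                  * through addr2line, to get readable symbol names. */
                 #include <execinfo.h>
                 #include <unistd.h>

                 static void
                 print_backtrace(void)
                 {
                         void *frames[64];
                         int nframes = backtrace(frames, 64);

                         /* Writes one line per frame directly to stderr (fd 2). */
                         backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
                 }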

                Thanks,
                -Wenhua

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-30  6:04 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-30  6:04 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 14958 bytes --]

Hi Ziye,

I tested the patch you provided. It does not help. The problem still exists.

Thanks,
-Wenhua

On 8/26/20, 10:09 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

     Thanks for your continued verification. So it seems there is an issue with zero-copy support in the SPDK posix socket implementation on the target side.




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Thursday, August 27, 2020 1:05 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

     I have verified that after disabling zero copy, the problem is gone. The following is the change I made to disable zero copy.

     spdk$ git diff module/sock/posix/posix.c
     diff --git a/module/sock/posix/posix.c b/module/sock/posix/posix.c
     index 4eb1bf106..7b77289bb 100644
     --- a/module/sock/posix/posix.c
     +++ b/module/sock/posix/posix.c
     @@ -53,9 +53,9 @@
      #define MIN_SO_SNDBUF_SIZE (2 * 1024 * 1024)
      #define IOV_BATCH_SIZE 64
     
     -#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
     -#define SPDK_ZEROCOPY
     -#endif
     +//#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
     +//#define SPDK_ZEROCOPY
     +//#endif

     struct spdk_posix_sock {
            struct spdk_sock        base;
    ~/spdk$

     With this change, I powered the VM on and shut it down 8 times and did not see a single "Connection Reset by Peer" issue. Without the change, I powered the VM on and shut it down 4 times, and every time I saw at least one "Connection Reset by Peer" error on every IO queue (4 IO queues in total).

    Thanks,
    -Wenhua

    On 8/25/20, 9:51 PM, "Wenhua Liu" <liuw(a)vmware.com> wrote:

         I did not check errno. The only thing I knew was that _sock_flush returned -1, which is the return value of sendmsg.

        Thanks,
        -Wenhua

        On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi  Wenhua,

             What's the error number when the sendmsg function returns -1 with the posix socket implementation?




            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Wednesday, August 26, 2020 12:27 PM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Re: Print backtrace in SPDK

            Hi Ziye,

             Back in April/May, I used SPDK 20.01 (the first release that supported FUSED operations) in a VM and ran into this issue once in a while.

             Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and started seeing this issue consistently. Maybe a change on our side makes the issue easier to reproduce.

             I spent a lot of time debugging this issue and found in the wire data that the TCP FIN flag is set in the TCP packet sent in response to an NVMe READ command; the FIN flag is set when closing a TCP connection. With this information, I found that it is the function nvmf_tcp_close_qpair that closes the TCP connection. To figure out how this function is called, I wanted to print a stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path, which points to where the problem happens.

             I noticed the zero-copy feature and tried disabling it, but it did not help (I can try it again to confirm). I started wondering whether my VM itself had a problem, so I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists on this new target. As I could not find out how sendmsg works, and I noticed there is a uring-based socket implementation, I wanted to give it a try, so I asked you.

             I will let you know whether disabling zero copy helps.

            Thanks,
            -Wenhua

            On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                Hi Wenhua,

                 Did you reproduce the issue you mentioned in your last email with the same VM environment (OS) and the same SPDK version? You mention that there is no issue with uring, but there is an issue with posix on the same SPDK version? Can you reproduce the issue with the latest version on the SPDK master branch?

                 I think the current difference between uring and posix is that the posix implementation uses the zero-copy feature. Could you do some experiments to disable the zero-copy feature manually in posix.c, as the following shows? Then we can first determine whether there is an issue with the zero-copy feature on the target side. Thanks.

                #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
                //#define SPDK_ZEROCOPY
                #endif




                Best Regards
                Ziye Yang 

                -----Original Message-----
                From: Wenhua Liu <liuw(a)vmware.com> 
                Sent: Wednesday, August 26, 2020 8:20 AM
                To: Storage Performance Development Kit <spdk(a)lists.01.org>
                Subject: [SPDK] Re: Print backtrace in SPDK

                Hi Ziye,

                 I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44:
                 ~/spdk$ cat /proc/version_signature
                 Ubuntu 5.4.0-42.46-generic 5.4.44
                 ~/spdk$

                 I downloaded, built and installed liburing from source:
                 git clone https://github.com/axboe/liburing.git

                 After switching to the uring sock implementation, the "connection reset by peer" problem is gone. I powered on and shut down my testing VM and did not see a single "connection reset by peer" issue. Before this, every time I powered on my testing VM, multiple "connection reset by peer" failures happened.

                 Actually, I had this issue back in April/May. At that time, I could not identify/correlate how the issue happened and did not drill down. This time, the issue happened so frequently that it helped me dig out more information.

                 In summary, it seems the posix sock implementation may have some problem. I'm not sure whether this is generic or specific to running SPDK in a VM. The issue might also be related to our initiator implementation.

                Thanks,
                -Wenhua


                On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                    Hi Wenhua,

                     You need to compile SPDK with the --with-uring option. And you need to:
                     1. Download liburing and install it yourself.
                     2. Check your kernel version. The uring socket implementation depends on the kernel (> 5.4.3).

                     What's your kernel version in the VM?

                    Thanks.




                    Best Regards
                    Ziye Yang 

                    -----Original Message-----
                    From: Wenhua Liu <liuw(a)vmware.com> 
                    Sent: Monday, August 24, 2020 3:19 PM
                    To: Storage Performance Development Kit <spdk(a)lists.01.org>
                    Subject: [SPDK] Re: Print backtrace in SPDK

                    Hi Ziye,

                    I'm using SPDK NVMe-oF target.

                    I used some other way and figured out the following call path:
                    posix_sock_group_impl_poll
                    -> _sock_flush    <------------------ failed
                    -> spdk_sock_abort_requests
                       -> _pdu_write_done
                          -> nvmf_tcp_qpair_disconnect
                             -> spdk_nvmf_qpair_disconnect
                                -> _nvmf_qpair_destroy
                                   -> spdk_nvmf_poll_group_remove
                                      -> nvmf_transport_poll_group_remove
                                         -> nvmf_tcp_poll_group_remove
                                            -> spdk_sock_group_remove_sock
                                               -> posix_sock_group_impl_remove_sock
                                                  -> spdk_sock_abort_requests
                                   -> _nvmf_ctrlr_free_from_qpair
                                      -> _nvmf_transport_qpair_fini
                                         -> nvmf_transport_qpair_fini
                                            -> nvmf_tcp_close_qpair
                                               -> spdk_sock_close

                     _sock_flush calls sendmsg to write the data to the socket. It is sendmsg that fails with a return value of -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. In response to this TCP packet, a TCP packet with the FIN flag set is sent to the initiator. The FIN is to close the socket connection.

                     I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.

                     By the way, I noticed there is a uring-based sock implementation; how do I switch to it? It seems the default is the posix sock implementation.

                    Thanks,
                    -Wenhua 

                    On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                        Hi Wenhua,

                         Which applications are you using from SPDK?
                         1. The SPDK NVMe-oF target on the target side?
                         2. SPDK NVMe perf or others?

                         nvmf_tcp_close_qpair will be called in the following possible cases (not all are listed) for the TCP transport, but it is always reached with spdk_nvmf_qpair_disconnect as the entry point.

                        1  qpair is not in polling group
                        spdk_nvmf_qpair_disconnect
                        	nvmf_transport_qpair_fini

                        2  spdk_nvmf_qpair_disconnect
                        		....
                        	_nvmf_qpair_destroy
                        		nvmf_transport_qpair_fini
                        			..
                        			nvmf_tcp_close_qpair


                        3  spdk_nvmf_qpair_disconnect
                        		....
                        	_nvmf_qpair_destroy
                        		_nvmf_ctrlr_free_from_qpair	
                        			_nvmf_transport_qpair_fini
                        				..
                        				nvmf_tcp_close_qpair


                        spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

                         (1) _pdu_write_done (if there is a write error);
                         (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
                         (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
                         (4) nvmf_tcp_sock_cb (TCP PDU related handling issue).


                         Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect on the related polling group associated with the controller.


                        Best Regards
                        Ziye Yang 

                        -----Original Message-----
                        From: Wenhua Liu <liuw(a)vmware.com> 
                        Sent: Saturday, August 22, 2020 3:15 PM
                        To: Storage Performance Development Kit <spdk(a)lists.01.org>
                        Subject: [SPDK] Print backtrace in SPDK

                        Hi,

                        Does anyone know if there is a function in SPDK that prints the backtrace?

                         I ran into a “Connection Reset by Peer” issue on the host side when testing NVMe/TCP. I identified that it is because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought that if the backtrace could be printed when this function is called, it might help me find the root cause.

                        Thanks,
                        -Wenhua


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-27  5:09 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-08-27  5:09 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 13896 bytes --]

Hi Wenhua,

Thanks for your continued verification. So it seems there is an issue with zero-copy support in the SPDK posix socket implementation on the target side.




Best Regards
Ziye Yang 

-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com> 
Sent: Thursday, August 27, 2020 1:05 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK

Hi Ziye,

I have verified that after disabling zero copy, the problem is gone. The following is the change I made to disable zero copy.

spdk$ git diff module/sock/posix/posix.c
diff --git a/module/sock/posix/posix.c b/module/sock/posix/posix.c
index 4eb1bf106..7b77289bb 100644
--- a/module/sock/posix/posix.c
+++ b/module/sock/posix/posix.c
@@ -53,9 +53,9 @@
 #define MIN_SO_SNDBUF_SIZE (2 * 1024 * 1024)
 #define IOV_BATCH_SIZE 64
 
-#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
-#define SPDK_ZEROCOPY
-#endif
+//#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
+//#define SPDK_ZEROCOPY
+//#endif
 
 struct spdk_posix_sock {
        struct spdk_sock        base;
~/spdk$

With this change, I powered the VM on and shut it down 8 times and did not see a single "Connection Reset by Peer" issue. Without the change, I powered the VM on and shut it down 4 times, and every time I saw at least one "Connection Reset by Peer" error on every IO queue (4 IO queues in total).

Thanks,
-Wenhua
 
On 8/25/20, 9:51 PM, "Wenhua Liu" <liuw(a)vmware.com> wrote:

     I did not check errno. The only thing I knew was that _sock_flush returned -1, which is the return value of sendmsg.

    Thanks,
    -Wenhua

    On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi  Wenhua,

         What's the error number when the sendmsg function returns -1 with the posix socket implementation?




        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Wednesday, August 26, 2020 12:27 PM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Re: Print backtrace in SPDK

        Hi Ziye,

         Back in April/May, I used SPDK 20.01 (the first release that supported FUSED operations) in a VM and ran into this issue once in a while.

         Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and started seeing this issue consistently. Maybe a change on our side makes the issue easier to reproduce.

         I spent a lot of time debugging this issue and found in the wire data that the TCP FIN flag is set in the TCP packet sent in response to an NVMe READ command; the FIN flag is set when closing a TCP connection. With this information, I found that it is the function nvmf_tcp_close_qpair that closes the TCP connection. To figure out how this function is called, I wanted to print a stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path, which points to where the problem happens.

         I noticed the zero-copy feature and tried disabling it, but it did not help (I can try it again to confirm). I started wondering whether my VM itself had a problem, so I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists on this new target. As I could not find out how sendmsg works, and I noticed there is a uring-based socket implementation, I wanted to give it a try, so I asked you.

         I will let you know whether disabling zero copy helps.

        Thanks,
        -Wenhua

        On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi Wenhua,

             Did you reproduce the issue you mentioned in your last email with the same VM environment (OS) and the same SPDK version? You mention that there is no issue with uring, but there is an issue with posix on the same SPDK version? Can you reproduce the issue with the latest version on the SPDK master branch?

             I think the current difference between uring and posix is that the posix implementation uses the zero-copy feature. Could you do some experiments to disable the zero-copy feature manually in posix.c, as the following shows? Then we can first determine whether there is an issue with the zero-copy feature on the target side. Thanks.

            #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
            //#define SPDK_ZEROCOPY
            #endif




            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Wednesday, August 26, 2020 8:20 AM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Re: Print backtrace in SPDK

            Hi Ziye,

             I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44:
             ~/spdk$ cat /proc/version_signature
             Ubuntu 5.4.0-42.46-generic 5.4.44
             ~/spdk$

             I downloaded, built and installed liburing from source:
             git clone https://github.com/axboe/liburing.git

             After switching to the uring sock implementation, the "connection reset by peer" problem is gone. I powered on and shut down my testing VM and did not see a single "connection reset by peer" issue. Before this, every time I powered on my testing VM, multiple "connection reset by peer" failures happened.

             Actually, I had this issue back in April/May. At that time, I could not identify/correlate how the issue happened and did not drill down. This time, the issue happened so frequently that it helped me dig out more information.

             In summary, it seems the posix sock implementation may have some problem. I'm not sure whether this is generic or specific to running SPDK in a VM. The issue might also be related to our initiator implementation.

            Thanks,
            -Wenhua


            On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                Hi Wenhua,

                 You need to compile SPDK with the --with-uring option. And you need to:
                 1. Download liburing and install it yourself.
                 2. Check your kernel version. The uring socket implementation depends on the kernel (> 5.4.3).

                 What's your kernel version in the VM?

                Thanks.




                Best Regards
                Ziye Yang 

                -----Original Message-----
                From: Wenhua Liu <liuw(a)vmware.com> 
                Sent: Monday, August 24, 2020 3:19 PM
                To: Storage Performance Development Kit <spdk(a)lists.01.org>
                Subject: [SPDK] Re: Print backtrace in SPDK

                Hi Ziye,

                I'm using SPDK NVMe-oF target.

                I used some other way and figured out the following call path:
                posix_sock_group_impl_poll
                -> _sock_flush    <------------------ failed
                -> spdk_sock_abort_requests
                   -> _pdu_write_done
                      -> nvmf_tcp_qpair_disconnect
                         -> spdk_nvmf_qpair_disconnect
                            -> _nvmf_qpair_destroy
                               -> spdk_nvmf_poll_group_remove
                                  -> nvmf_transport_poll_group_remove
                                     -> nvmf_tcp_poll_group_remove
                                        -> spdk_sock_group_remove_sock
                                           -> posix_sock_group_impl_remove_sock
                                              -> spdk_sock_abort_requests
                               -> _nvmf_ctrlr_free_from_qpair
                                  -> _nvmf_transport_qpair_fini
                                     -> nvmf_transport_qpair_fini
                                        -> nvmf_tcp_close_qpair
                                           -> spdk_sock_close

                 _sock_flush calls sendmsg to write the data to the socket. It is sendmsg that fails with a return value of -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. In response to this TCP packet, a TCP packet with the FIN flag set is sent to the initiator. The FIN is to close the socket connection.

                 I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.

                 By the way, I noticed there is a uring-based sock implementation; how do I switch to it? It seems the default is the posix sock implementation.

                Thanks,
                -Wenhua 

                On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                    Hi Wenhua,

                     Which applications are you using from SPDK?
                     1. The SPDK NVMe-oF target on the target side?
                     2. SPDK NVMe perf or others?

                     nvmf_tcp_close_qpair will be called in the following possible cases (not all are listed) for the TCP transport, but it is always reached with spdk_nvmf_qpair_disconnect as the entry point.

                    1  qpair is not in polling group
                    spdk_nvmf_qpair_disconnect
                    	nvmf_transport_qpair_fini

                    2  spdk_nvmf_qpair_disconnect
                    		....
                    	_nvmf_qpair_destroy
                    		nvmf_transport_qpair_fini
                    			..
                    			nvmf_tcp_close_qpair


                    3  spdk_nvmf_qpair_disconnect
                    		....
                    	_nvmf_qpair_destroy
                    		_nvmf_ctrlr_free_from_qpair	
                    			_nvmf_transport_qpair_fini
                    				..
                    				nvmf_tcp_close_qpair


                    spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

                     (1) _pdu_write_done (if there is a write error);
                     (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
                     (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
                     (4) nvmf_tcp_sock_cb (TCP PDU related handling issue).


                     Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect on the related polling group associated with the controller.


                    Best Regards
                    Ziye Yang 

                    -----Original Message-----
                    From: Wenhua Liu <liuw(a)vmware.com> 
                    Sent: Saturday, August 22, 2020 3:15 PM
                    To: Storage Performance Development Kit <spdk(a)lists.01.org>
                    Subject: [SPDK] Print backtrace in SPDK

                    Hi,

                    Does anyone know if there is a function in SPDK that prints the backtrace?

                     I ran into a “Connection Reset by Peer” issue on the host side when testing NVMe/TCP. I identified that it is because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought that if the backtrace could be printed when this function is called, it might help me find the root cause.

                    Thanks,
                    -Wenhua

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-27  5:04 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-27  5:04 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 13336 bytes --]

Hi Ziye,

I have verified that after disabling zero copy, the problem is gone. The following is the change I made to disable zero copy.

spdk$ git diff module/sock/posix/posix.c
diff --git a/module/sock/posix/posix.c b/module/sock/posix/posix.c
index 4eb1bf106..7b77289bb 100644
--- a/module/sock/posix/posix.c
+++ b/module/sock/posix/posix.c
@@ -53,9 +53,9 @@
 #define MIN_SO_SNDBUF_SIZE (2 * 1024 * 1024)
 #define IOV_BATCH_SIZE 64
 
-#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
-#define SPDK_ZEROCOPY
-#endif
+//#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
+//#define SPDK_ZEROCOPY
+//#endif
 
 struct spdk_posix_sock {
        struct spdk_sock        base;
~/spdk$

With this change, I did VM power-on and shutdown 8 times and did not see a single "Connection Reset by Peer" issue. Without the change, I did VM power-on and shutdown 4 times, and every time I saw at least one "Connection Reset by Peer" error on every IO queue (4 IO queues in total).
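
For background, the guard commented out above only controls whether the posix module opts its sockets into the kernel's zero-copy transmit path. The following is a generic Linux sketch of that path, not SPDK code, and it assumes a kernel and libc that expose SO_ZEROCOPY and MSG_ZEROCOPY (exactly what the original #if tests):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Generic MSG_ZEROCOPY send on an already-connected TCP socket.
 * Completion notifications arrive later on the socket's error queue
 * (recvmsg with MSG_ERRQUEUE); the pages backing msg must stay
 * untouched until then. */
static ssize_t
send_zerocopy(int fd, struct msghdr *msg)
{
	int one = 1;

	/* The socket has to opt in before MSG_ZEROCOPY takes effect. */
	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) != 0) {
		fprintf(stderr, "SO_ZEROCOPY: %s\n", strerror(errno));
		return -1;
	}

	return sendmsg(fd, msg, MSG_ZEROCOPY);
}

Because completions are delivered asynchronously on the error queue, the zero-copy path exercises more kernel machinery than a plain sendmsg, which is why disabling it is a useful way to isolate the problem.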

Thanks,
-Wenhua
 
On 8/25/20, 9:51 PM, "Wenhua Liu" <liuw(a)vmware.com> wrote:

    I did not check errno. The only thing I knew is _sock_flush returns -1 which is the return value of sendmsg.

    Thanks,
    -Wenhua

    On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi  Wenhua,

        What's error number when you see that sendmsg function returns -1 when you use posix socket implmentation? 




        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Wednesday, August 26, 2020 12:27 PM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Re: Print backtrace in SPDK

        Hi Ziye,

        Back to April/May, I used SPDK 20.01 (the first release supported FUSED operation) in a VM and ran into this issue once in a while.

        Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and I started seeing this issue consistently. Maybe this is because the change at our side that makes the issue easier to reproduce.

        I spent a lot time debugging this issue and found in wire data, the TCP/IP FIN flag is set in the TCP packet in response to an NVME READ command. As FIN flag is set when closing TCP connection. With this information, I found it's the function nvmf_tcp_close_qpair close the TCP connection. To figure out how this function is called, I wanted to print stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path which points where the problem happens.

        I noticed the zero copy thing and tried to disable it but did not help (I can try it again to confirm). I started thinking if my VM itself has problem. I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists in this new target. As I could not find how sendmsg works and I noticed there is a uring based socket implementation. I wanted to give it a try so I asked you.

        I will let you know if disabling zero copy will help.

        Thanks,
        -Wenhua

        On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi Wenhua,

            Did you reproduce the issue you mentioned in last email with same VM environment (OS) and same SPDK version?  You mention that there is no issue with uring, but there is issue with posix on the same SPDK version?  Can you reproduce the issue with latest version in SPDK master branch.

            I think that the current difference with uring and posix is: For the posix implementation, it uses the zero copy feature. Could you do some experiments to disable the zero copy feature manually in posix.c like the following shows. Then we can firstly eliminate whether there is issue with zero copy feature on the target side. Thanks.

            #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
            //#define SPDK_ZEROCOPY
            #endif




            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Wednesday, August 26, 2020 8:20 AM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Re: Print backtrace in SPDK

            Hi Ziye,

            I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44 ~spdk$ cat /proc/version_signature Ubuntu 5.4.0-42.46-generic 5.4.44 ~/spdk$

            I downloaded, buit and installed liburing from source.
            git clone https://github.com/axboe/liburing.git

            After switching to uring sock implementation,  the "connection reset by peer" problem is gone. I tried to power on and shutdown my testing VM and did not see one single "connection reset by peer" issue. Before this, every time, I powered on my testing VM, there were multiple "connection reset by peer" failures happened.

            Actually, I had this issue back to April/May. At that time, I could not identify/corelate how the issue happened and did not drill down. This time, the issue happened so frequently. This helped me dig out more information.

            In summary, it seems the posix sock implementation may have some problem. I'm not sure if this is generic or specific for running SPDK in VM. The issue might also be related to our initiator implementation.

            Thanks,
            -Wenhua


            On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                Hi Wenhua,

                You need to compile SPDK with the --with-uring option. You also need to:
                1. Download liburing and install it yourself.
                2. Check your kernel version; the uring socket implementation requires a kernel newer than 5.4.3.

                What's your kernel version in the VM?

                Thanks.




                Best Regards
                Ziye Yang 

                -----Original Message-----
                From: Wenhua Liu <liuw(a)vmware.com> 
                Sent: Monday, August 24, 2020 3:19 PM
                To: Storage Performance Development Kit <spdk(a)lists.01.org>
                Subject: [SPDK] Re: Print backtrace in SPDK

                Hi Ziye,

                I'm using SPDK NVMe-oF target.

                I used some other way and figured out the following call path:
                posix_sock_group_impl_poll
                -> _sock_flush    <------------------ failed
                -> spdk_sock_abort_requests
                   -> _pdu_write_done
                      -> nvmf_tcp_qpair_disconnect
                         -> spdk_nvmf_qpair_disconnect
                            -> _nvmf_qpair_destroy
                               -> spdk_nvmf_poll_group_remove
                                  -> nvmf_transport_poll_group_remove
                                     -> nvmf_tcp_poll_group_remove
                                        -> spdk_sock_group_remove_sock
                                           -> posix_sock_group_impl_remove_sock
                                              -> spdk_sock_abort_requests
                               -> _nvmf_ctrlr_free_from_qpair
                                  -> _nvmf_transport_qpair_fini
                                     -> nvmf_transport_qpair_fini
                                        -> nvmf_tcp_close_qpair
                                           -> spdk_sock_close

                _sock_flush calls sendmsg to write the data to the socket; it is sendmsg that fails with return value -1. I captured wire data. In Wireshark, I can see the READ command was received by the target as a TCP packet. In response to that packet, a TCP packet with the FIN flag set is sent to the initiator; the FIN closes the socket connection.

                I'm running SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try with another SPDK target which runs on a physical machine.

                By the way, I noticed there is a uring-based sock implementation. How do I switch to it? It seems the default is the posix sock implementation.

                Thanks,
                -Wenhua 

                On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                    Hi Wenhua,

                    Which applications are you using from SPDK?
                    1. The SPDK NVMe-oF target on the target side?
                    2. SPDK NVMe perf or others?

                    nvmf_tcp_close_qpair will be called in the following possible cases (not all are listed) for the TCP transport, but it will always be called via spdk_nvmf_qpair_disconnect as the entry point.

                    1  qpair is not in polling group
                    spdk_nvmf_qpair_disconnect
                    	nvmf_transport_qpair_fini

                    2  spdk_nvmf_qpair_disconnect
                    		....
                    	_nvmf_qpair_destroy
                    		nvmf_transport_qpair_fini
                    			..
                    			nvmf_tcp_close_qpair


                    3  spdk_nvmf_qpair_disconnect
                    		....
                    	_nvmf_qpair_destroy
                    		_nvmf_ctrlr_free_from_qpair	
                    			_nvmf_transport_qpair_fini
                    				..
                    				nvmf_tcp_close_qpair


                    spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

                    (1) _pdu_write_done (if there is error for write);
                    (2) nvmf_tcp_qpair_handle_timeout.( No response from initiator in 30s if targets sends c2h_term_req)
                    (3) nvmf_tcp_capsule_cmd_hdr_handle. (Cannot get tcp req)
                    (4) nvmf_tcp_sock_cb.   TCP PDU related handling issue. 


                    Also in lib/nvmf/ctrlr.c Target side has a timer poller: nvmf_ctrlr_keep_alive_poll. If there is no keep alive command sent from host, it will call spdk_nvmf_qpair_disconnect in related polling group assoicated with the controller.


                    Best Regards
                    Ziye Yang 

                    -----Original Message-----
                    From: Wenhua Liu <liuw(a)vmware.com> 
                    Sent: Saturday, August 22, 2020 3:15 PM
                    To: Storage Performance Development Kit <spdk(a)lists.01.org>
                    Subject: [SPDK] Print backtrace in SPDK

                    Hi,

                    Does anyone know if there is a function in SPDK that prints the backtrace?

                    I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

                    Thanks,
                    -Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-26  4:51 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-26  4:51 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 11511 bytes --]

I did not check errno. The only thing I knew is that _sock_flush returns -1, which is the return value of sendmsg.

Thanks,
-Wenhua

On 8/25/20, 9:31 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi  Wenhua,

    What's error number when you see that sendmsg function returns -1 when you use posix socket implmentation? 




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Wednesday, August 26, 2020 12:27 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

    Back to April/May, I used SPDK 20.01 (the first release supported FUSED operation) in a VM and ran into this issue once in a while.

    Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and I started seeing this issue consistently. Maybe this is because the change at our side that makes the issue easier to reproduce.

    I spent a lot time debugging this issue and found in wire data, the TCP/IP FIN flag is set in the TCP packet in response to an NVME READ command. As FIN flag is set when closing TCP connection. With this information, I found it's the function nvmf_tcp_close_qpair close the TCP connection. To figure out how this function is called, I wanted to print stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path which points where the problem happens.

    I noticed the zero copy thing and tried to disable it but did not help (I can try it again to confirm). I started thinking if my VM itself has problem. I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists in this new target. As I could not find how sendmsg works and I noticed there is a uring based socket implementation. I wanted to give it a try so I asked you.

    I will let you know if disabling zero copy will help.

    Thanks,
    -Wenhua

    On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi Wenhua,

        Did you reproduce the issue you mentioned in last email with same VM environment (OS) and same SPDK version?  You mention that there is no issue with uring, but there is issue with posix on the same SPDK version?  Can you reproduce the issue with latest version in SPDK master branch.

        I think that the current difference with uring and posix is: For the posix implementation, it uses the zero copy feature. Could you do some experiments to disable the zero copy feature manually in posix.c like the following shows. Then we can firstly eliminate whether there is issue with zero copy feature on the target side. Thanks.

        #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
        //#define SPDK_ZEROCOPY
        #endif




        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Wednesday, August 26, 2020 8:20 AM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Re: Print backtrace in SPDK

        Hi Ziye,

        I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44 ~spdk$ cat /proc/version_signature Ubuntu 5.4.0-42.46-generic 5.4.44 ~/spdk$

        I downloaded, buit and installed liburing from source.
         git clone https://github.com/axboe/liburing.git

        After switching to uring sock implementation,  the "connection reset by peer" problem is gone. I tried to power on and shutdown my testing VM and did not see one single "connection reset by peer" issue. Before this, every time, I powered on my testing VM, there were multiple "connection reset by peer" failures happened.

        Actually, I had this issue back to April/May. At that time, I could not identify/corelate how the issue happened and did not drill down. This time, the issue happened so frequently. This helped me dig out more information.

        In summary, it seems the posix sock implementation may have some problem. I'm not sure if this is generic or specific for running SPDK in VM. The issue might also be related to our initiator implementation.

        Thanks,
        -Wenhua


        On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi Wenhua,

            You need to compile spdk with --with-uring option.  And you need to 
            1 Download the liburing and install it by yourself.
            2 Check your kernel version. Uring socket implementation depends on the kernel (> 5.4.3).

            What's you kernel version in the VM?

            Thanks.




            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Monday, August 24, 2020 3:19 PM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Re: Print backtrace in SPDK

            Hi Ziye,

            I'm using SPDK NVMe-oF target.

            I used some other way and figured out the following call path:
            posix_sock_group_impl_poll
            -> _sock_flush    <------------------ failed
            -> spdk_sock_abort_requests
               -> _pdu_write_done
                  -> nvmf_tcp_qpair_disconnect
                     -> spdk_nvmf_qpair_disconnect
                        -> _nvmf_qpair_destroy
                           -> spdk_nvmf_poll_group_remove
                              -> nvmf_transport_poll_group_remove
                                 -> nvmf_tcp_poll_group_remove
                                    -> spdk_sock_group_remove_sock
                                       -> posix_sock_group_impl_remove_sock
                                          -> spdk_sock_abort_requests
                           -> _nvmf_ctrlr_free_from_qpair
                              -> _nvmf_transport_qpair_fini
                                 -> nvmf_transport_qpair_fini
                                    -> nvmf_tcp_close_qpair
                                       -> spdk_sock_close

            The _sock_flush calls sendmsg to write the data to the socket. It's sendmsg failing with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. As the response to this TCP packet, a TCP packet with FIN flag set is sent to the initiator. The FIN is to close the socket connection.

            I'm running SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try with another SPDK target which runs on a physical machine.

            By the way, I noticed there is a uring based sock implementation,  how do I switch to this sock implementation. It seems the default is posix sock implementation.

            Thanks,
            -Wenhua 

            On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

                Hi Wenhua,

                Which applications are you using from SPDK?  
                1 SPDK NVMe-oF target in target side?
                2  SPDK NVMe perf or others?

                For nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for TCP transport. But it will be called by spdk_nvmf_qpair_disconnect as the entry.

                1  qpair is not in polling group
                spdk_nvmf_qpair_disconnect
                	nvmf_transport_qpair_fini

                2  spdk_nvmf_qpair_disconnect
                		....
                	_nvmf_qpair_destroy
                		nvmf_transport_qpair_fini
                			..
                			nvmf_tcp_close_qpair


                3  spdk_nvmf_qpair_disconnect
                		....
                	_nvmf_qpair_destroy
                		_nvmf_ctrlr_free_from_qpair	
                			_nvmf_transport_qpair_fini
                				..
                				nvmf_tcp_close_qpair


                spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

                (1) _pdu_write_done (if there is error for write);
                (2) nvmf_tcp_qpair_handle_timeout.( No response from initiator in 30s if targets sends c2h_term_req)
                (3) nvmf_tcp_capsule_cmd_hdr_handle. (Cannot get tcp req)
                (4) nvmf_tcp_sock_cb.   TCP PDU related handling issue. 


                Also in lib/nvmf/ctrlr.c Target side has a timer poller: nvmf_ctrlr_keep_alive_poll. If there is no keep alive command sent from host, it will call spdk_nvmf_qpair_disconnect in related polling group assoicated with the controller.


                Best Regards
                Ziye Yang 

                -----Original Message-----
                From: Wenhua Liu <liuw(a)vmware.com> 
                Sent: Saturday, August 22, 2020 3:15 PM
                To: Storage Performance Development Kit <spdk(a)lists.01.org>
                Subject: [SPDK] Print backtrace in SPDK

                Hi,

                Does anyone know if there is a function in SPDK that prints the backtrace?

                I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

                Thanks,
                -Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-26  4:31 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-08-26  4:31 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10583 bytes --]

Hi Wenhua,

What's the errno value when sendmsg returns -1 with the posix socket implementation?
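
If it is easier, something like the following hypothetical wrapper (not existing SPDK code) could temporarily replace the sendmsg call in _sock_flush to capture it:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical debug wrapper: same sendmsg call, but it logs errno
 * (for example ECONNRESET, EPIPE or ENOBUFS) when the call fails. */
static ssize_t
sendmsg_logged(int fd, struct msghdr *msg, int flags)
{
	ssize_t rc = sendmsg(fd, msg, flags);

	if (rc < 0) {
		fprintf(stderr, "sendmsg(fd=%d, flags=0x%x) failed: errno=%d (%s)\n",
			fd, flags, errno, strerror(errno));
	}
	return rc;
}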




Best Regards
Ziye Yang 

-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com> 
Sent: Wednesday, August 26, 2020 12:27 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK

Hi Ziye,

Back to April/May, I used SPDK 20.01 (the first release supported FUSED operation) in a VM and ran into this issue once in a while.

Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and I started seeing this issue consistently. Maybe this is because the change at our side that makes the issue easier to reproduce.

I spent a lot time debugging this issue and found in wire data, the TCP/IP FIN flag is set in the TCP packet in response to an NVME READ command. As FIN flag is set when closing TCP connection. With this information, I found it's the function nvmf_tcp_close_qpair close the TCP connection. To figure out how this function is called, I wanted to print stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used some other way and figured out the call path which points where the problem happens.

I noticed the zero copy thing and tried to disable it but did not help (I can try it again to confirm). I started thinking if my VM itself has problem. I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists in this new target. As I could not find how sendmsg works and I noticed there is a uring based socket implementation. I wanted to give it a try so I asked you.

I will let you know if disabling zero copy will help.

Thanks,
-Wenhua

On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

    Did you reproduce the issue you mentioned in last email with same VM environment (OS) and same SPDK version?  You mention that there is no issue with uring, but there is issue with posix on the same SPDK version?  Can you reproduce the issue with latest version in SPDK master branch.

    I think that the current difference with uring and posix is: For the posix implementation, it uses the zero copy feature. Could you do some experiments to disable the zero copy feature manually in posix.c like the following shows. Then we can firstly eliminate whether there is issue with zero copy feature on the target side. Thanks.

    #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
    //#define SPDK_ZEROCOPY
    #endif




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Wednesday, August 26, 2020 8:20 AM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

    I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44 ~spdk$ cat /proc/version_signature Ubuntu 5.4.0-42.46-generic 5.4.44 ~/spdk$

    I downloaded, buit and installed liburing from source.
     git clone https://github.com/axboe/liburing.git

    After switching to uring sock implementation,  the "connection reset by peer" problem is gone. I tried to power on and shutdown my testing VM and did not see one single "connection reset by peer" issue. Before this, every time, I powered on my testing VM, there were multiple "connection reset by peer" failures happened.

    Actually, I had this issue back to April/May. At that time, I could not identify/corelate how the issue happened and did not drill down. This time, the issue happened so frequently. This helped me dig out more information.

    In summary, it seems the posix sock implementation may have some problem. I'm not sure if this is generic or specific for running SPDK in VM. The issue might also be related to our initiator implementation.

    Thanks,
    -Wenhua


    On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi Wenhua,

        You need to compile spdk with --with-uring option.  And you need to 
        1 Download the liburing and install it by yourself.
        2 Check your kernel version. Uring socket implementation depends on the kernel (> 5.4.3).

        What's you kernel version in the VM?

        Thanks.




        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Monday, August 24, 2020 3:19 PM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Re: Print backtrace in SPDK

        Hi Ziye,

        I'm using SPDK NVMe-oF target.

        I used some other way and figured out the following call path:
        posix_sock_group_impl_poll
        -> _sock_flush    <------------------ failed
        -> spdk_sock_abort_requests
           -> _pdu_write_done
              -> nvmf_tcp_qpair_disconnect
                 -> spdk_nvmf_qpair_disconnect
                    -> _nvmf_qpair_destroy
                       -> spdk_nvmf_poll_group_remove
                          -> nvmf_transport_poll_group_remove
                             -> nvmf_tcp_poll_group_remove
                                -> spdk_sock_group_remove_sock
                                   -> posix_sock_group_impl_remove_sock
                                      -> spdk_sock_abort_requests
                       -> _nvmf_ctrlr_free_from_qpair
                          -> _nvmf_transport_qpair_fini
                             -> nvmf_transport_qpair_fini
                                -> nvmf_tcp_close_qpair
                                   -> spdk_sock_close

        The _sock_flush calls sendmsg to write the data to the socket. It's sendmsg failing with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. As the response to this TCP packet, a TCP packet with FIN flag set is sent to the initiator. The FIN is to close the socket connection.

        I'm running SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try with another SPDK target which runs on a physical machine.

        By the way, I noticed there is a uring based sock implementation,  how do I switch to this sock implementation. It seems the default is posix sock implementation.

        Thanks,
        -Wenhua 

        On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi Wenhua,

            Which applications are you using from SPDK?  
            1 SPDK NVMe-oF target in target side?
            2  SPDK NVMe perf or others?

            For nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for TCP transport. But it will be called by spdk_nvmf_qpair_disconnect as the entry.

            1  qpair is not in polling group
            spdk_nvmf_qpair_disconnect
            	nvmf_transport_qpair_fini

            2  spdk_nvmf_qpair_disconnect
            		....
            	_nvmf_qpair_destroy
            		nvmf_transport_qpair_fini
            			..
            			nvmf_tcp_close_qpair


            3  spdk_nvmf_qpair_disconnect
            		....
            	_nvmf_qpair_destroy
            		_nvmf_ctrlr_free_from_qpair	
            			_nvmf_transport_qpair_fini
            				..
            				nvmf_tcp_close_qpair


            spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

            (1) _pdu_write_done (if there is error for write);
            (2) nvmf_tcp_qpair_handle_timeout.( No response from initiator in 30s if targets sends c2h_term_req)
            (3) nvmf_tcp_capsule_cmd_hdr_handle. (Cannot get tcp req)
            (4) nvmf_tcp_sock_cb.   TCP PDU related handling issue. 


            Also in lib/nvmf/ctrlr.c Target side has a timer poller: nvmf_ctrlr_keep_alive_poll. If there is no keep alive command sent from host, it will call spdk_nvmf_qpair_disconnect in related polling group assoicated with the controller.


            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Saturday, August 22, 2020 3:15 PM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Print backtrace in SPDK

            Hi,

            Does anyone know if there is a function in SPDK that prints the backtrace?

            I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

            Thanks,
            -Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-26  4:27 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-26  4:27 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10053 bytes --]

Hi Ziye,

Back in April/May, I used SPDK 20.01 (the first release that supported FUSED operations) in a VM and ran into this issue once in a while.

Recently, in order to test NVMe Abort, I updated the SPDK in that VM to 20.07 and started seeing this issue consistently. Maybe this is because of a change on our side that makes the issue easier to reproduce.

I spent a lot of time debugging this issue and found, in the wire data, that the TCP FIN flag is set in the TCP packet sent in response to an NVMe READ command; the FIN flag is set when closing a TCP connection. With this information, I found it is the function nvmf_tcp_close_qpair that closes the TCP connection. To figure out how this function is called, I wanted to print a stack trace but could not find a way, so I sent an email to the SPDK community asking for a solution. Later I used another way and figured out the call path, which points to where the problem happens.
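
For reference, glibc's execinfo interface can produce such a trace. A minimal sketch of a debug helper that could temporarily be added at the top of nvmf_tcp_close_qpair (an assumption for debugging, not an existing SPDK API), with the binary built with -rdynamic so the symbol names resolve:

#include <execinfo.h>
#include <stdio.h>

/* Illustrative debug helper: dump the current call stack to stderr.
 * Calling it from nvmf_tcp_close_qpair shows who triggered the close. */
static void
dump_backtrace(void)
{
	void *frames[32];
	int nframes = backtrace(frames, 32);

	backtrace_symbols_fd(frames, nframes, fileno(stderr));
}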

I noticed the zero copy code and tried to disable it, but it did not help (I can try it again to confirm). I started wondering whether my VM itself has a problem, so I set up another VM with Ubuntu 20.04.1 and SPDK 20.07, but the problem still exists on this new target. As I could not figure out why sendmsg fails, and I noticed there is a uring-based socket implementation, I wanted to give it a try, so I asked you.

I will let you know if disabling zero copy will help.

Thanks,
-Wenhua

On 8/25/20, 6:52 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

    Did you reproduce the issue you mentioned in last email with same VM environment (OS) and same SPDK version?  You mention that there is no issue with uring, but there is issue with posix on the same SPDK version?  Can you reproduce the issue with latest version in SPDK master branch.

    I think that the current difference with uring and posix is: For the posix implementation, it uses the zero copy feature. Could you do some experiments to disable the zero copy feature manually in posix.c like the following shows. Then we can firstly eliminate whether there is issue with zero copy feature on the target side. Thanks.

    #if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
    //#define SPDK_ZEROCOPY
    #endif




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Wednesday, August 26, 2020 8:20 AM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

    I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44 ~spdk$ cat /proc/version_signature Ubuntu 5.4.0-42.46-generic 5.4.44 ~/spdk$

    I downloaded, buit and installed liburing from source.
     git clone https://github.com/axboe/liburing.git

    After switching to uring sock implementation,  the "connection reset by peer" problem is gone. I tried to power on and shutdown my testing VM and did not see one single "connection reset by peer" issue. Before this, every time, I powered on my testing VM, there were multiple "connection reset by peer" failures happened.

    Actually, I had this issue back to April/May. At that time, I could not identify/corelate how the issue happened and did not drill down. This time, the issue happened so frequently. This helped me dig out more information.

    In summary, it seems the posix sock implementation may have some problem. I'm not sure if this is generic or specific for running SPDK in VM. The issue might also be related to our initiator implementation.

    Thanks,
    -Wenhua


    On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi Wenhua,

        You need to compile spdk with --with-uring option.  And you need to 
        1 Download the liburing and install it by yourself.
        2 Check your kernel version. Uring socket implementation depends on the kernel (> 5.4.3).

        What's you kernel version in the VM?

        Thanks.




        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Monday, August 24, 2020 3:19 PM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Re: Print backtrace in SPDK

        Hi Ziye,

        I'm using SPDK NVMe-oF target.

        I used some other way and figured out the following call path:
        posix_sock_group_impl_poll
        -> _sock_flush    <------------------ failed
        -> spdk_sock_abort_requests
           -> _pdu_write_done
              -> nvmf_tcp_qpair_disconnect
                 -> spdk_nvmf_qpair_disconnect
                    -> _nvmf_qpair_destroy
                       -> spdk_nvmf_poll_group_remove
                          -> nvmf_transport_poll_group_remove
                             -> nvmf_tcp_poll_group_remove
                                -> spdk_sock_group_remove_sock
                                   -> posix_sock_group_impl_remove_sock
                                      -> spdk_sock_abort_requests
                       -> _nvmf_ctrlr_free_from_qpair
                          -> _nvmf_transport_qpair_fini
                             -> nvmf_transport_qpair_fini
                                -> nvmf_tcp_close_qpair
                                   -> spdk_sock_close

        The _sock_flush calls sendmsg to write the data to the socket. It's sendmsg failing with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. As the response to this TCP packet, a TCP packet with FIN flag set is sent to the initiator. The FIN is to close the socket connection.

        I'm running SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try with another SPDK target which runs on a physical machine.

        By the way, I noticed there is a uring based sock implementation,  how do I switch to this sock implementation. It seems the default is posix sock implementation.

        Thanks,
        -Wenhua 

        On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

            Hi Wenhua,

            Which applications are you using from SPDK?  
            1 SPDK NVMe-oF target in target side?
            2  SPDK NVMe perf or others?

            For nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for TCP transport. But it will be called by spdk_nvmf_qpair_disconnect as the entry.

            1  qpair is not in polling group
            spdk_nvmf_qpair_disconnect
            	nvmf_transport_qpair_fini

            2  spdk_nvmf_qpair_disconnect
            		....
            	_nvmf_qpair_destroy
            		nvmf_transport_qpair_fini
            			..
            			nvmf_tcp_close_qpair


            3  spdk_nvmf_qpair_disconnect
            		....
            	_nvmf_qpair_destroy
            		_nvmf_ctrlr_free_from_qpair	
            			_nvmf_transport_qpair_fini
            				..
            				nvmf_tcp_close_qpair


            spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

            (1) _pdu_write_done (if there is error for write);
            (2) nvmf_tcp_qpair_handle_timeout.( No response from initiator in 30s if targets sends c2h_term_req)
            (3) nvmf_tcp_capsule_cmd_hdr_handle. (Cannot get tcp req)
            (4) nvmf_tcp_sock_cb.   TCP PDU related handling issue. 


            Also in lib/nvmf/ctrlr.c Target side has a timer poller: nvmf_ctrlr_keep_alive_poll. If there is no keep alive command sent from host, it will call spdk_nvmf_qpair_disconnect in related polling group assoicated with the controller.


            Best Regards
            Ziye Yang 

            -----Original Message-----
            From: Wenhua Liu <liuw(a)vmware.com> 
            Sent: Saturday, August 22, 2020 3:15 PM
            To: Storage Performance Development Kit <spdk(a)lists.01.org>
            Subject: [SPDK] Print backtrace in SPDK

            Hi,

            Does anyone know if there is a function in SPDK that prints the backtrace?

            I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

            Thanks,
            -Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-26  1:50 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-08-26  1:50 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7689 bytes --]

Hi Wenhua,

Did you reproduce the issue you mentioned in the last email with the same VM environment (OS) and the same SPDK version? You mention that there is no issue with uring but there is an issue with posix on the same SPDK version? Can you reproduce the issue with the latest version on the SPDK master branch?

I think the current difference between uring and posix is that the posix implementation uses the zero copy feature. Could you do some experiments and disable the zero copy feature manually in posix.c as the following shows? Then we can first determine whether there is an issue with the zero copy feature on the target side. Thanks.

#if defined(SO_ZEROCOPY) && defined(MSG_ZEROCOPY)
//#define SPDK_ZEROCOPY
#endif




Best Regards
Ziye Yang 

-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com> 
Sent: Wednesday, August 26, 2020 8:20 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK

Hi Ziye,

I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44 ~spdk$ cat /proc/version_signature Ubuntu 5.4.0-42.46-generic 5.4.44 ~/spdk$

I downloaded, buit and installed liburing from source.
git clone https://github.com/axboe/liburing.git

After switching to uring sock implementation,  the "connection reset by peer" problem is gone. I tried to power on and shutdown my testing VM and did not see one single "connection reset by peer" issue. Before this, every time, I powered on my testing VM, there were multiple "connection reset by peer" failures happened.

Actually, I had this issue back to April/May. At that time, I could not identify/corelate how the issue happened and did not drill down. This time, the issue happened so frequently. This helped me dig out more information.

In summary, it seems the posix sock implementation may have some problem. I'm not sure if this is generic or specific for running SPDK in VM. The issue might also be related to our initiator implementation.

Thanks,
-Wenhua


On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

    You need to compile spdk with --with-uring option.  And you need to 
    1 Download the liburing and install it by yourself.
    2 Check your kernel version. Uring socket implementation depends on the kernel (> 5.4.3).

    What's you kernel version in the VM?

    Thanks.




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Monday, August 24, 2020 3:19 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

    I'm using SPDK NVMe-oF target.

    I used some other way and figured out the following call path:
    posix_sock_group_impl_poll
    -> _sock_flush    <------------------ failed
    -> spdk_sock_abort_requests
       -> _pdu_write_done
          -> nvmf_tcp_qpair_disconnect
             -> spdk_nvmf_qpair_disconnect
                -> _nvmf_qpair_destroy
                   -> spdk_nvmf_poll_group_remove
                      -> nvmf_transport_poll_group_remove
                         -> nvmf_tcp_poll_group_remove
                            -> spdk_sock_group_remove_sock
                               -> posix_sock_group_impl_remove_sock
                                  -> spdk_sock_abort_requests
                   -> _nvmf_ctrlr_free_from_qpair
                      -> _nvmf_transport_qpair_fini
                         -> nvmf_transport_qpair_fini
                            -> nvmf_tcp_close_qpair
                               -> spdk_sock_close

    The _sock_flush calls sendmsg to write the data to the socket. It's sendmsg failing with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. As the response to this TCP packet, a TCP packet with FIN flag set is sent to the initiator. The FIN is to close the socket connection.

    I'm running SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try with another SPDK target which runs on a physical machine.

    By the way, I noticed there is a uring based sock implementation,  how do I switch to this sock implementation. It seems the default is posix sock implementation.

    Thanks,
    -Wenhua 

    On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi Wenhua,

        Which applications are you using from SPDK?  
        1 SPDK NVMe-oF target in target side?
        2  SPDK NVMe perf or others?

        For nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for TCP transport. But it will be called by spdk_nvmf_qpair_disconnect as the entry.

        1  qpair is not in polling group
        spdk_nvmf_qpair_disconnect
        	nvmf_transport_qpair_fini

        2  spdk_nvmf_qpair_disconnect
        		....
        	_nvmf_qpair_destroy
        		nvmf_transport_qpair_fini
        			..
        			nvmf_tcp_close_qpair


        3  spdk_nvmf_qpair_disconnect
        		....
        	_nvmf_qpair_destroy
        		_nvmf_ctrlr_free_from_qpair	
        			_nvmf_transport_qpair_fini
        				..
        				nvmf_tcp_close_qpair


        spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

        (1) _pdu_write_done (if there is error for write);
        (2) nvmf_tcp_qpair_handle_timeout.( No response from initiator in 30s if targets sends c2h_term_req)
        (3) nvmf_tcp_capsule_cmd_hdr_handle. (Cannot get tcp req)
        (4) nvmf_tcp_sock_cb.   TCP PDU related handling issue. 


        Also in lib/nvmf/ctrlr.c Target side has a timer poller: nvmf_ctrlr_keep_alive_poll. If there is no keep alive command sent from host, it will call spdk_nvmf_qpair_disconnect in related polling group assoicated with the controller.


        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Saturday, August 22, 2020 3:15 PM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Print backtrace in SPDK

        Hi,

        Does anyone know if there is a function in SPDK that prints the backtrace?

        I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

        Thanks,
        -Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-26  0:20 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-26  0:20 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6564 bytes --]

Hi Ziye,

I'm using Ubuntu-20.04.1. The Linux kernel version seems to be 5.4.44
~spdk$ cat /proc/version_signature 
Ubuntu 5.4.0-42.46-generic 5.4.44
~/spdk$

I downloaded, built and installed liburing from source.
git clone https://github.com/axboe/liburing.git
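
The remaining steps are roughly the following (a sketch; the install prefix and the need for ldconfig may vary):

cd liburing
./configure
make
sudo make install        # installs to /usr/local by default; run ldconfig if needed

# rebuild SPDK with the uring sock module enabled
cd ../spdk
./configure --with-uring
make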

After switching to the uring sock implementation, the "connection reset by peer" problem is gone. I powered on and shut down my testing VM and did not see a single "connection reset by peer" issue. Before this, every time I powered on my testing VM, there were multiple "connection reset by peer" failures.

Actually, I had this issue back in April/May. At that time, I could not identify/correlate how the issue happened and did not drill down. This time the issue happened so frequently that it helped me dig out more information.

In summary, it seems the posix sock implementation may have a problem. I'm not sure whether this is generic or specific to running SPDK in a VM. The issue might also be related to our initiator implementation.

Thanks,
-Wenhua


On 8/24/20, 12:33 AM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

    You need to compile spdk with --with-uring option.  And you need to 
    1 Download the liburing and install it by yourself.
    2 Check your kernel version. Uring socket implementation depends on the kernel (> 5.4.3).

    What's you kernel version in the VM?

    Thanks.




    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Monday, August 24, 2020 3:19 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Re: Print backtrace in SPDK

    Hi Ziye,

    I'm using SPDK NVMe-oF target.

    I used some other way and figured out the following call path:
    posix_sock_group_impl_poll
    -> _sock_flush    <------------------ failed
    -> spdk_sock_abort_requests
       -> _pdu_write_done
          -> nvmf_tcp_qpair_disconnect
             -> spdk_nvmf_qpair_disconnect
                -> _nvmf_qpair_destroy
                   -> spdk_nvmf_poll_group_remove
                      -> nvmf_transport_poll_group_remove
                         -> nvmf_tcp_poll_group_remove
                            -> spdk_sock_group_remove_sock
                               -> posix_sock_group_impl_remove_sock
                                  -> spdk_sock_abort_requests
                   -> _nvmf_ctrlr_free_from_qpair
                      -> _nvmf_transport_qpair_fini
                         -> nvmf_transport_qpair_fini
                            -> nvmf_tcp_close_qpair
                               -> spdk_sock_close

    The _sock_flush calls sendmsg to write the data to the socket. It's sendmsg failing with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. As the response to this TCP packet, a TCP packet with FIN flag set is sent to the initiator. The FIN is to close the socket connection.

    I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.

    By the way, I noticed there is a uring-based sock implementation. How do I switch to it? It seems the default is the posix sock implementation.

    Thanks,
    -Wenhua 

    On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

        Hi Wenhua,

        Which SPDK applications are you using?
        1. The SPDK NVMe-oF target on the target side?
        2. SPDK NVMe perf or others?

        nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for the TCP transport, with spdk_nvmf_qpair_disconnect as the entry point.

        1  qpair is not in polling group
        spdk_nvmf_qpair_disconnect
        	nvmf_transport_qpair_fini

        2  spdk_nvmf_qpair_disconnect
        		....
        	_nvmf_qpair_destroy
        		nvmf_transport_qpair_fini
        			..
        			nvmf_tcp_close_qpair


        3  spdk_nvmf_qpair_disconnect
        		....
        	_nvmf_qpair_destroy
        		_nvmf_ctrlr_free_from_qpair	
        			_nvmf_transport_qpair_fini
        				..
        				nvmf_tcp_close_qpair


        spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

        (1) _pdu_write_done (if there is a write error);
        (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
        (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
        (4) nvmf_tcp_sock_cb (TCP PDU handling issue).


        Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect in the related polling group associated with the controller.


        Best Regards
        Ziye Yang 

        -----Original Message-----
        From: Wenhua Liu <liuw(a)vmware.com> 
        Sent: Saturday, August 22, 2020 3:15 PM
        To: Storage Performance Development Kit <spdk(a)lists.01.org>
        Subject: [SPDK] Print backtrace in SPDK

        Hi,

        Does anyone know if there is a function in SPDK that prints the backtrace?

        I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

        Thanks,
        -Wenhua
        _______________________________________________
        SPDK mailing list -- spdk(a)lists.01.org
        To unsubscribe send an email to spdk-leave(a)lists.01.org

    _______________________________________________
    SPDK mailing list -- spdk(a)lists.01.org
    To unsubscribe send an email to spdk-leave(a)lists.01.org


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-24  7:32 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-08-24  7:32 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4923 bytes --]

Hi Wenhua,

You need to compile SPDK with the --with-uring option. And you need to:
1. Download liburing and install it yourself.
2. Check your kernel version. The uring socket implementation depends on the kernel (> 5.4.3).

What's your kernel version in the VM?

Thanks.




Best Regards
Ziye Yang 

-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com> 
Sent: Monday, August 24, 2020 3:19 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: Print backtrace in SPDK

Hi Ziye,

I'm using SPDK NVMe-oF target.

I used some other way and figured out the following call path:
posix_sock_group_impl_poll
-> _sock_flush    <------------------ failed
-> spdk_sock_abort_requests
   -> _pdu_write_done
      -> nvmf_tcp_qpair_disconnect
         -> spdk_nvmf_qpair_disconnect
            -> _nvmf_qpair_destroy
               -> spdk_nvmf_poll_group_remove
                  -> nvmf_transport_poll_group_remove
                     -> nvmf_tcp_poll_group_remove
                        -> spdk_sock_group_remove_sock
                           -> posix_sock_group_impl_remove_sock
                              -> spdk_sock_abort_requests
               -> _nvmf_ctrlr_free_from_qpair
                  -> _nvmf_transport_qpair_fini
                     -> nvmf_transport_qpair_fini
                        -> nvmf_tcp_close_qpair
                           -> spdk_sock_close

_sock_flush calls sendmsg to write the data to the socket, and it is sendmsg that fails with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. In response to this TCP packet, a TCP packet with the FIN flag set is sent to the initiator. The FIN closes the socket connection.

I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.

By the way, I noticed there is a uring-based sock implementation. How do I switch to it? It seems the default is the posix sock implementation.

Thanks,
-Wenhua 

On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

    Which SPDK applications are you using?
    1. The SPDK NVMe-oF target on the target side?
    2. SPDK NVMe perf or others?

    nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for the TCP transport, with spdk_nvmf_qpair_disconnect as the entry point.

    1  qpair is not in polling group
    spdk_nvmf_qpair_disconnect
    	nvmf_transport_qpair_fini

    2  spdk_nvmf_qpair_disconnect
    		....
    	_nvmf_qpair_destroy
    		nvmf_transport_qpair_fini
    			..
    			nvmf_tcp_close_qpair


    3  spdk_nvmf_qpair_disconnect
    		....
    	_nvmf_qpair_destroy
    		_nvmf_ctrlr_free_from_qpair	
    			_nvmf_transport_qpair_fini
    				..
    				nvmf_tcp_close_qpair


    spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

    (1) _pdu_write_done (if there is a write error);
    (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
    (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
    (4) nvmf_tcp_sock_cb (TCP PDU handling issue).


    Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect in the related polling group associated with the controller.


    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Saturday, August 22, 2020 3:15 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Print backtrace in SPDK

    Hi,

    Does anyone know if there is a function in SPDK that prints the backtrace?

    I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

    Thanks,
    -Wenhua
    _______________________________________________
    SPDK mailing list -- spdk(a)lists.01.org
    To unsubscribe send an email to spdk-leave(a)lists.01.org

_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-24  7:18 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-24  7:18 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4242 bytes --]

Hi Ziye,

I'm using SPDK NVMe-oF target.

I used some other way and figured out the following call path:
posix_sock_group_impl_poll
-> _sock_flush    <------------------ failed
-> spdk_sock_abort_requests
   -> _pdu_write_done
      -> nvmf_tcp_qpair_disconnect
         -> spdk_nvmf_qpair_disconnect
            -> _nvmf_qpair_destroy
               -> spdk_nvmf_poll_group_remove
                  -> nvmf_transport_poll_group_remove
                     -> nvmf_tcp_poll_group_remove
                        -> spdk_sock_group_remove_sock
                           -> posix_sock_group_impl_remove_sock
                              -> spdk_sock_abort_requests
               -> _nvmf_ctrlr_free_from_qpair
                  -> _nvmf_transport_qpair_fini
                     -> nvmf_transport_qpair_fini
                        -> nvmf_tcp_close_qpair
                           -> spdk_sock_close

_sock_flush calls sendmsg to write the data to the socket, and it is sendmsg that fails with return value -1. I captured wire data. In Wireshark, I can see the READ command has been received by the target as a TCP packet. In response to this TCP packet, a TCP packet with the FIN flag set is sent to the initiator. The FIN closes the socket connection.
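
The failure reason will be in errno at the point where sendmsg returns -1, so it is worth logging it right there. Below is a minimal, illustrative sketch of such a wrapper; the helper name and log format are illustrative and not part of the SPDK source, but placing the equivalent strerror(errno) call next to the existing sendmsg call in _sock_flush would show what error the kernel actually reports.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Illustrative helper, not part of SPDK: wrap sendmsg() and log errno on
 * failure so the reason for the -1 return (e.g. ECONNRESET, EPIPE, ENOBUFS)
 * shows up in the target log. */
static ssize_t
sendmsg_logged(int fd, const struct msghdr *msg, int flags)
{
	ssize_t rc;

	rc = sendmsg(fd, msg, flags);
	if (rc < 0) {
		fprintf(stderr, "sendmsg(fd=%d) failed: errno=%d (%s)\n",
			fd, errno, strerror(errno));
	}
	return rc;
}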

I'm running the SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try another SPDK target that runs on a physical machine.

By the way, I noticed there is a uring-based sock implementation. How do I switch to it? It seems the default is the posix sock implementation.

Thanks,
-Wenhua 

On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:

    Hi Wenhua,

    Which SPDK applications are you using?
    1. The SPDK NVMe-oF target on the target side?
    2. SPDK NVMe perf or others?

    nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for the TCP transport, with spdk_nvmf_qpair_disconnect as the entry point.

    1  qpair is not in polling group
    spdk_nvmf_qpair_disconnect
    	nvmf_transport_qpair_fini

    2  spdk_nvmf_qpair_disconnect
    		....
    	_nvmf_qpair_destroy
    		nvmf_transport_qpair_fini
    			..
    			nvmf_tcp_close_qpair


    3  spdk_nvmf_qpair_disconnect
    		....
    	_nvmf_qpair_destroy
    		_nvmf_ctrlr_free_from_qpair	
    			_nvmf_transport_qpair_fini
    				..
    				nvmf_tcp_close_qpair


    spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

    (1) _pdu_write_done (if there is a write error);
    (2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
    (3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
    (4) nvmf_tcp_sock_cb (TCP PDU handling issue).


    Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect in the related polling group associated with the controller.


    Best Regards
    Ziye Yang 

    -----Original Message-----
    From: Wenhua Liu <liuw(a)vmware.com> 
    Sent: Saturday, August 22, 2020 3:15 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: [SPDK] Print backtrace in SPDK

    Hi,

    Does anyone know if there is a function in SPDK that prints the backtrace?

    I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

    Thanks,
    -Wenhua
    _______________________________________________
    SPDK mailing list -- spdk(a)lists.01.org
    To unsubscribe send an email to spdk-leave(a)lists.01.org


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-24  4:55 Yang, Ziye
  0 siblings, 0 replies; 15+ messages in thread
From: Yang, Ziye @ 2020-08-24  4:55 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

Hi Wenhua,

Which SPDK applications are you using?
1. The SPDK NVMe-oF target on the target side?
2. SPDK NVMe perf or others?

nvmf_tcp_close_qpair will be called in the following possible cases (not all listed) for the TCP transport, with spdk_nvmf_qpair_disconnect as the entry point.

1  qpair is not in polling group
spdk_nvmf_qpair_disconnect
	nvmf_transport_qpair_fini

2  spdk_nvmf_qpair_disconnect
		....
	_nvmf_qpair_destroy
		nvmf_transport_qpair_fini
			..
			nvmf_tcp_close_qpair


3  spdk_nvmf_qpair_disconnect
		....
	_nvmf_qpair_destroy
		_nvmf_ctrlr_free_from_qpair	
			_nvmf_transport_qpair_fini
				..
				nvmf_tcp_close_qpair


spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:

(1) _pdu_write_done (if there is a write error);
(2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
(3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
(4) nvmf_tcp_sock_cb (TCP PDU handling issue).


Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect in the related polling group associated with the controller.


Best Regards
Ziye Yang 

-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com> 
Sent: Saturday, August 22, 2020 3:15 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Print backtrace in SPDK

Hi,

Does anyone know if there is a function in SPDK that prints the backtrace?

I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.

Thanks,
-Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-23  2:53 Wenhua Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Wenhua Liu @ 2020-08-23  2:53 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3627 bytes --]

Thanks, Andrey, for the suggestion. backtrace(3) seems to be the right approach, but by itself it is not sufficient.

There is a function, rte_dump_stack, which calls backtrace and backtrace_symbols. I copied it into nvmf_tcp_close_qpair and modified it slightly, as below:
+       {
+               void *func[BACKTRACE_SIZE];
+               char **symb = NULL;
+               int size;
+       
+               size = backtrace(func, BACKTRACE_SIZE);
+               symb = backtrace_symbols(func, size);
+       
+               if (symb == NULL)
+                       return;
+       
+               while (size > 0) {
+                       SPDK_ERRLOG("%d: [%s]\n", size, symb[size - 1]);
+                       size --;
+               }
+       
+               free(symb);
+       }

The output I got looks like this:
[2020-08-22 21:44:34.404823] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 13: [build/bin/nvmf_tgt(+0x126fe) [0x5627642236fe]]
[2020-08-22 21:44:34.404857] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 12: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fed1369d1e3]]
[2020-08-22 21:44:34.404868] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 11: [build/bin/nvmf_tgt(+0x128e9) [0x5627642238e9]]
[2020-08-22 21:44:34.404876] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 10: [build/bin/nvmf_tgt(+0xd9f32) [0x5627642eaf32]]
[2020-08-22 21:44:34.404882] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 9: [build/bin/nvmf_tgt(+0xdc096) [0x5627642ed096]]
[2020-08-22 21:44:34.404889] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 8: [build/bin/nvmf_tgt(+0xdbc73) [0x5627642ecc73]]
[2020-08-22 21:44:34.404895] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 7: [build/bin/nvmf_tgt(+0xdbb5d) [0x5627642ecb5d]]
[2020-08-22 21:44:34.404902] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 6: [build/bin/nvmf_tgt(+0xe35da) [0x5627642f45da]]
[2020-08-22 21:44:34.404908] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 5: [build/bin/nvmf_tgt(+0xe3054) [0x5627642f4054]]
[2020-08-22 21:44:34.404915] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 4: [build/bin/nvmf_tgt(+0xe2d94) [0x5627642f3d94]]
[2020-08-22 21:44:34.404922] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 3: [build/bin/nvmf_tgt(+0xca3b7) [0x5627642db3b7]]
[2020-08-22 21:44:34.404928] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 2: [build/bin/nvmf_tgt(+0xd09e8) [0x5627642e19e8]]
[2020-08-22 21:44:34.404935] tcp.c:2395:nvmf_tcp_close_qpair: *ERROR*: 1: [build/bin/nvmf_tgt(+0xd8158) [0x5627642e9158]]

This is not very helpful.
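
For reference, those offsets can still be resolved: per backtrace(3), backtrace_symbols only produces names for symbols in the dynamic symbol table, so linking the binary with -rdynamic makes function names appear, and an offset such as +0xd8158 can be fed to addr2line against build/bin/nvmf_tgt (with a build that has debug info) to recover the function and source line. A self-contained sketch of the same idea, with those caveats noted in comments; the helper below is illustrative, not SPDK code:

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define BACKTRACE_SIZE 32

/* Illustrative sketch, not SPDK code: dump the current call stack.
 * backtrace_symbols() resolves names only for symbols in the dynamic
 * symbol table (see backtrace(3)), so link with -rdynamic to get
 * function names instead of bare "+0x..." offsets; with a -g build the
 * offsets can also be mapped to source lines via addr2line. */
static void
print_backtrace(void)
{
	void *frames[BACKTRACE_SIZE];
	char **symbols;
	int i, n;

	n = backtrace(frames, BACKTRACE_SIZE);
	symbols = backtrace_symbols(frames, n);
	if (symbols == NULL) {
		return;
	}
	for (i = 0; i < n; i++) {
		fprintf(stderr, "%2d: %s\n", i, symbols[i]);
	}
	free(symbols);
}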

Thanks,
-Wenhua


On 8/22/20, 12:58 AM, "Andrey Kuzmin" <andrey.v.kuzmin(a)gmail.com> wrote:

    On Sat, Aug 22, 2020, 10:14 Wenhua Liu <liuw(a)vmware.com> wrote:

    > Hi,
    >
    > Does anyone know if there is a function in SPDK that prints the backtrace?
    >

    backtrace(3) lets you do that.

    Regards,
    Andrey


    > I run into a “Connection Reset by Peer” issue on host side when testing
    > NVMe/TCP. I identified it’s because some queue pairs are closed
    > unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out
    > how/why this function is called. I thought if the backtrace can be printed
    > when calling this function, it might be helpful to me to find the root
    > cause.
    >
    > Thanks,
    > -Wenhua
    > _______________________________________________
    > SPDK mailing list -- spdk(a)lists.01.org
    > To unsubscribe send an email to spdk-leave(a)lists.01.org
    >
    _______________________________________________
    SPDK mailing list -- spdk(a)lists.01.org
    To unsubscribe send an email to spdk-leave(a)lists.01.org


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [SPDK] Re: Print backtrace in SPDK
@ 2020-08-22  7:57 Andrey Kuzmin
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Kuzmin @ 2020-08-22  7:57 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On Sat, Aug 22, 2020, 10:14 Wenhua Liu <liuw(a)vmware.com> wrote:

> Hi,
>
> Does anyone know if there is a function in SPDK that prints the backtrace?
>

backtrace(3) lets you do that.

Regards,
Andrey


> I run into a “Connection Reset by Peer” issue on host side when testing
> NVMe/TCP. I identified it’s because some queue pairs are closed
> unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out
> how/why this function is called. I thought if the backtrace can be printed
> when calling this function, it might be helpful to me to find the root
> cause.
>
> Thanks,
> -Wenhua
> _______________________________________________
> SPDK mailing list -- spdk(a)lists.01.org
> To unsubscribe send an email to spdk-leave(a)lists.01.org
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-09-10  1:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-30  7:52 [SPDK] Re: Print backtrace in SPDK Yang, Ziye
  -- strict thread matches above, loose matches on Subject: below --
2020-09-10  1:30 Yang, Ziye
2020-08-30  6:04 Wenhua Liu
2020-08-27  5:09 Yang, Ziye
2020-08-27  5:04 Wenhua Liu
2020-08-26  4:51 Wenhua Liu
2020-08-26  4:31 Yang, Ziye
2020-08-26  4:27 Wenhua Liu
2020-08-26  1:50 Yang, Ziye
2020-08-26  0:20 Wenhua Liu
2020-08-24  7:32 Yang, Ziye
2020-08-24  7:18 Wenhua Liu
2020-08-24  4:55 Yang, Ziye
2020-08-23  2:53 Wenhua Liu
2020-08-22  7:57 Andrey Kuzmin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.