From: Harris, James R <james.r.harris at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem
Date: Mon, 01 Apr 2019 21:27:04 +0000
Message-ID: <25CD21CD-F692-4C8B-B3B3-FE7CD47C9DEC@intel.com>
In-Reply-To: AM6PR04MB512748A9107588845108831989550@AM6PR04MB5127.eurprd04.prod.outlook.com


Hi Shahar,

On v18.10.x, _nvmf_ctrlr_destruct calls spdk_nvmf_ctrlr_destruct, and spdk_nvmf_ctrlr_destruct ends with free(ctrlr).  So the compiler is doing a tail call optimization.  The call stack makes sense now.
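To illustrate why free() can show up directly under the reactor loop in a backtrace, here is a minimal sketch of the pattern. The names (struct ctrlr, ctrlr_destruct, ctrlr_destruct_event, destruct_calls) are hypothetical stand-ins, not the actual SPDK code, and whether the calls really become jumps depends on the compiler and optimization level.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the real controller structure. */
struct ctrlr {
    int id;
};

/* Counts destruct calls; only here so the sketch can be checked. */
int destruct_calls;

/* Ends with free(c): an optimizing compiler may emit this last call
 * as a jump to free() instead of a call, so this function leaves no
 * stack frame behind while free() runs. */
void ctrlr_destruct(struct ctrlr *c)
{
    destruct_calls++;
    free(c);
}

/* Event callback whose last action is calling ctrlr_destruct(): this
 * can also be turned into a jump. If both calls are optimized this way,
 * a backtrace taken inside free() shows the caller of this callback
 * (the event loop) as the frame directly above free(). */
void ctrlr_destruct_event(void *arg1, void *arg2)
{
    (void)arg2;
    ctrlr_destruct(arg1);
}
```

Calling the callback through a function pointer does not prevent this: the optimization happens inside the callee, so the indirect call from the event loop is still a normal call, and only the frames below it disappear.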

Seeing malloc_printerr in the backtrace indicates that most likely this ctrlr pointer was double-freed.  I see this spdk_nvmf_ctrlr_destruct call chain has been modified since v18.10, possibly to fix this issue.  I’m explicitly adding Seth here to get his take.

Thanks,

-Jim


From: Shahar Salzman <shahar.salzman(a)kaminario.com>
Date: Monday, April 1, 2019 at 2:16 PM
To: James Harris <james.r.harris(a)intel.com>, Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Looks like the problem is in _nvmf_ctrlr_destruct.
Since the event memory is not accessible, I copied each event's function pointer into a separate on-stack array, and nullified the entry after the function returned successfully.

Looking at _nvmf_ctrlr_destruct, I see that it indeed calls free.
I'll get to the bottom of things tomorrow, but I still can't understand how the compiler has been able to optimize the function calls in between, as they are all function pointers...

#5  0x000000000047320f in _spdk_event_queue_run_batch (arg=0x7fff54008640) at reactor.c:207
        event = <value optimized out>
        count = 8
        i = <value optimized out>
        events = {0x2000fe77e640, 0x2000fe77f240, 0x2000fe77f900, 0x2000fe77f540, 0x2000fe77ef40, 0x2000fe77f180, 0x2000fe77e580, 0x2000fe77e880}
        fns = {0, 0, 0, 0, 0x46aef0 <_nvmf_ctrlr_destruct>, 0, 0, 0}
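
The debugging trick described above (record the function pointer in an on-stack array just before the call, clear it once the call returns) could look roughly like the sketch below. This is not the actual reactor.c code: struct event, event_fn, run_batch, count_event and BATCH_SIZE are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define BATCH_SIZE 8

typedef void (*event_fn)(void *arg1, void *arg2);

/* Hypothetical stand-in for the reactor's event structure. */
struct event {
    event_fn fn;
    void *arg1;
    void *arg2;
};

/* Sample handler: increments the int that arg1 points to. */
void count_event(void *arg1, void *arg2)
{
    (void)arg2;
    (*(int *)arg1)++;
}

/* Runs up to BATCH_SIZE events and returns how many were run. */
size_t run_batch(struct event **events, size_t count)
{
    /* volatile forces every store to memory, so the array contents
     * are visible in a core dump even in an optimized build. */
    event_fn volatile fns[BATCH_SIZE] = { NULL };
    size_t i;

    if (count > BATCH_SIZE) {
        count = BATCH_SIZE;
    }

    for (i = 0; i < count; i++) {
        struct event *e = events[i];

        fns[i] = e->fn;          /* remember what is about to run */
        e->fn(e->arg1, e->arg2);
        fns[i] = NULL;           /* the call returned, clear the slot */
    }

    return count;
}
```

If a handler aborts, the only non-NULL slot in fns identifies the function that was running, even when the event object itself can no longer be read.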

________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Shahar Salzman <shahar.salzman(a)kaminario.com>
Sent: Monday, April 1, 2019 9:48 PM
To: Harris, James R; Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Thanks for the tips!

I just recreated the issue with a smaller application; I will try to recreate it with the SPDK app.
The SPDK stack is 18.10.x, but the application is ours, which is why the top frames are not familiar.

I already tried to print the events, but I am unable to print them, possibly since they have already been freed.

I will try to recreate this with a debug build, and then I can get a better idea as to the functions being run on the event.


________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Harris, James R <james.r.harris(a)intel.com>
Sent: Monday, April 1, 2019 7:02 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem



On 4/1/19, 8:15 AM, "SPDK on behalf of Shahar Salzman" <spdk-bounces(a)lists.01.org on behalf of shahar.salzman(a)kaminario.com> wrote:

    Hi,

    I am running with spdk 18.10.3, and crashing during delete_nvmf_subsystem.
    This happens pretty consistently (every 2-3 attempts) when I do the delete while the initiator is still connected.

    I saw that bug #235 has the same stack trace as I am seeing, but I already have the fix.
    I am digging into this, but would be glad to get some pointers as to where to start looking, or more specifically, how you knew to look into the lib/nvmf controller code based on this error.

Hi Shahar,

Stack frames #8 and #9 suggest an out-of-tree implementation of some parts of the reactor framework.  If so, are you able to reproduce this issue with the stock SPDK NVMe-oF target?

Can you go to frame #5 and print event->fn?  The callstack indicates this is calling free() directly, but that doesn't really make sense here.  I would suggest you start looking there for clues.

Regards,

-Jim


    This is my stack:
    (gdb) bt
    #0  0x00007f338ed46495 in raise () from /lib64/libc.so.6
    #1  0x00007f338ed47c75 in abort () from /lib64/libc.so.6
    #2  0x00007f338ed843a7 in __libc_message () from /lib64/libc.so.6
    #3  0x00007f338ed89dee in malloc_printerr () from /lib64/libc.so.6
    #4  0x00007f338ed8cc80 in _int_free () from /lib64/libc.so.6
    #5  0x000000000047ca07 in _spdk_event_queue_run_batch (arg=0x7f32dc008640) at reactor.c:205
    #6  _spdk_reactor_run (arg=0x7f32dc008640) at reactor.c:502
    #7  0x000000000047ccac in spdk_reactors_start () at reactor.c:677
    #8  0x00000000007a942e in km_spdk_target_reactor_thread (arg=0x1da4b160) at km_spdk_target_reactor.c:143
    #9  0x00000000007b28e1 in km_thread_log (ctxt=0x7f32e0001ea0) at km_thread_pthreads.c:79
    #10 0x00007f338fb85aa1 in start_thread () from /lib64/libpthread.so.0
    #11 0x00007f338edfcbcd in clone () from /lib64/libc.so.6


    I will open a bug on github for this once I collect all the information.

    Shahar
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk


