* [dm-devel] [QUESTION] multipathd crash when stopping
From: lixiaokeng @ 2021-01-26  6:50 UTC
  To: Christophe Varoqui, Martin Wilck, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen

When stopping multipathd (systemctl restart multipathd.service),
multipathd occasionally crashes. We have not been able to reproduce it
on demand.

Here is the stack:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000ffff87d9e81c in __GI_abort () at abort.c:79
#2 0x0000ffff87dd7818 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0xffff87e97888 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x0000ffff87dddf6c in malloc_printerr (
str=str@entry=0xffff87e950d0 "free(): invalid pointer") at malloc.c:5389
#4 0x0000ffff87ddf780 in _int_free (av=0xffff87ed7a58 <main_arena>, p=0xffff80000070,
have_lock=0) at malloc.c:4172
#5 0x0000ffff880f55a8 in internal_hashmap_clear (h=h@entry=0xffff80027980,
default_free_key=, default_free_value=)
at ../src/basic/hashmap.c:902
#6 0x0000ffff880f56a0 in internal_hashmap_free (h=,
default_free_key=, default_free_value=,
default_free_value=, default_free_key=, h=)
at ../src/basic/hashmap.c:874
#7 0x0000ffff880f582c in ordered_hashmap_free_free_free () at ../src/basic/hashmap.h:118
#8 device_free (device=0xffff80027820) at ../src/libsystemd/sd-device/sd-device.c:68
#9 sd_device_unref (p=) at ../src/libsystemd/sd-device/sd-device.c:78
#10 0x0000ffff88100978 in sd_device_unrefp () at ../src/systemd/sd-device.h:118
#11 device_new_from_nulstr (len=, nulstr=0xffff877f93d0 "",
ret=) at ../src/libsystemd/sd-device/device-private.c:448
#12 device_monitor_receive_device (m=0xffff80000b20, ret=ret@entry=0xffff877fb388)
at ../src/libsystemd/sd-device/device-monitor.c:447
#13 0x0000ffff881028a4 in udev_monitor_receive_sd_device (ret=0xffff877fb388,
udev_monitor=0xffff80000c70) at ../src/libudev/libudev-monitor.c:207
#14 udev_monitor_receive_device (udev_monitor=0xffff80000c70,
udev_monitor@entry=0xffff877fb3a0) at ../src/libudev/libudev-monitor.c:253
#15 0x0000ffff881a3478 in uevent_listen (udev=0xffff877fbf40) at uevent.c:853
#16 0x0000aaaadc524514 in ueventloop (ap=0xffffc4134bd0) at main.c:1518
#17 0x0000ffff880827ac in start_thread (arg=0xffff8821e380) at pthread_create.c:486
#18 0x0000ffff87e3c47c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

There's a strange phenomenon here. In frame 11, nulstr=0xffff877f93d0 is
an empty string (""). But in frame 12, examining the buffer with
x/32bs (uint8_t*) &buf.raw[bufpos] shows:
0xffff877f9360: "ACTION"
0xffff877f9367: "change"
0xffff877f936e: "DEVPATH"
0xffff877f9376: "/devices/virtual/block/dm-69"
0xffff877f9393: "SUBSYSTEM"
0xffff877f939d: "block"
0xffff877f93a3: "DM_COOKIE"
0xffff877f93ad: "23068672"
0xffff877f93b6: "DEVNAME"
0xffff877f93be: "/dev/dm-69"
0xffff877f93c9: "DEVTYPE"
0xffff877f93d1: "disk"
0xffff877f93d6: "SEQNUM"
0xffff877f93dd: "14437"
0xffff877f93e3: "USEC_INITIALIZED"
0xffff877f93f4: "8213096220"
0xffff877f93ff: "MAJOR"
0xffff877f9405: "253"
0xffff877f9409: "MINOR"
0xffff877f940f: "69"
0xffff877f9412: "DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG"
0xffff877f9438: "1"
0xffff877f943a: "DM_UDEV_PRIMARY_SOURCE_FLAG"
0xffff877f9456: "1"
0xffff877f9458: "DM_SUBSYSTEM_UDEV_FLAG0"
0xffff877f9470: "1"
0xffff877f9472: "DM_ACTIVATION"
0xffff877f9480: "0"
0xffff877f9482: "DM_NAME"
0xffff877f948a: "36e02861100592fcc99ad3c3800000195"
0xffff877f94ac: "DM_UUID"
0xffff877f94b4: "mpath-36e02861100592fcc99ad3c3800000195"

We suspected the udev API at first. However, hashmap is a common data
structure in systemd, and systemd itself has never shown the same call
stack. Can someone help me?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

* Re: [dm-devel] [QUESTION] multipathd crash when stopping
From: Martin Wilck @ 2021-01-26  8:34 UTC
  To: lixiaokeng, Christophe Varoqui, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen

On Tue, 2021-01-26 at 14:50 +0800, lixiaokeng wrote:
> When stopping multipathd (systemctl restart multipathd.service),
> there
> is a multipathd crash occasionally(not reproduced).

If this happens only during shutdown, it might be related to thread
cancellation. Can you try disabling cancellation for the
udev_monitor_receive_device() call?


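    int oldstate;

    /* don't let a cancellation request interrupt libudev */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    udev_monitor_receive_device(...);

    /* restore the previous state, then act on any pending cancel */
    pthread_setcancelstate(oldstate, NULL);
    pthread_testcancel();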
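For anyone who wants to try the cancel-state pattern from the snippet above outside multipathd, here is a minimal, self-contained sketch. It is an illustration under stated assumptions, not multipathd code: receive_event() is a hypothetical stand-in for udev_monitor_receive_device(), and the sleep durations are arbitrary. What it shows is that with cancellation disabled around the call, a pending pthread_cancel() only takes effect at pthread_testcancel(), after the wrapped call has run to completion.

```c
/*
 * Sketch of deferring cancellation around a call that must not be
 * interrupted. receive_event() is a hypothetical stand-in for a library
 * call such as udev_monitor_receive_device(); it is not part of libudev.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

static int started, finished;   /* read by the joiner after pthread_join() */

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);       /* nanosleep() is a cancellation point */
}

/* Hypothetical receive call: allocates, blocks briefly, frees. */
static void receive_event(void)
{
    char *buf;

    started++;
    buf = malloc(64);
    sleep_ms(1);                /* would be a cancellation point if enabled */
    free(buf);
    finished++;
}

static void *listener(void *arg)
{
    int oldstate;

    (void)arg;
    for (;;) {
        /* No cancellation while the "library" call is running. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
        receive_event();
        pthread_setcancelstate(oldstate, NULL);
        /* A pending cancel request takes effect here, between calls. */
        pthread_testcancel();
    }
    return NULL;
}

/* Returns 1 if the thread was cancelled and no call was cut short. */
static int run_cancel_state_demo(void)
{
    pthread_t tid;
    void *res;

    if (pthread_create(&tid, NULL, listener, NULL) != 0)
        return 0;
    sleep_ms(20);
    pthread_cancel(tid);
    pthread_join(tid, &res);
    return res == PTHREAD_CANCELED && started == finished;
}
```

The trade-off of this design is that shutdown has to wait for the wrapped call to return, so a call that blocks for a long time delays the thread's exit by the same amount.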

In general, running multipathd under valgrind might help find the
issue.

But valgrind will slow down multipathd drastically, so timings will
change, and it's not guaranteed that the problem will still be
reproducible. Alternatively, you can work with -fsanitize=address, but
in this specific case you'd need to compile libudev with this option,
too.

Martin

* Re: [dm-devel] [QUESTION] multipathd crash when stopping
From: lixiaokeng @ 2021-01-26  9:23 UTC
  To: Martin Wilck, Christophe Varoqui, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen


> In general, running multipathd under valgrind might help finding the
> issue. 
> 
> But valgrind will slow down multipathd drastically, so timings will
> change, and it's not granted that the problem will still be
> reproducable. Alternatively, you can work with -fsanitize=address, but
> in this specific case you'd need to compile libudev with this option,
> too.
> 
> Martin
> 

   Thanks very much. Your suggestions are very helpful.
   The problem has been reproduced, and the bug seems to be the one
shown in https://bugzilla.redhat.com/show_bug.cgi?id=1293594.


Regards,
Lixiaokeng

* Re: [dm-devel] [QUESTION] multipathd crash when stopping
From: Martin Wilck @ 2021-01-26  9:28 UTC
  To: lixiaokeng, Christophe Varoqui, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen

On Tue, 2021-01-26 at 17:23 +0800, lixiaokeng wrote:
> 
> > In general, running multipathd under valgrind might help finding
> > the
> > issue. 
> > 
> > But valgrind will slow down multipathd drastically, so timings will
> > change, and it's not granted that the problem will still be
> > reproducable. Alternatively, you can work with -fsanitize=address,
> > but
> > in this specific case you'd need to compile libudev with this
> > option,
> > too.
> > 
> > Martin
> > 
> 
>    Thanks very much. Your suggestions is very helpful.
>    The problem reproduced and the bug seems that shown in
> https://bugzilla.redhat.com/show_bug.cgi?id=1293594.

Really? I don't see a connection to your case there. It's about
glusterfs and libgcc...

Martin

* Re: [dm-devel] [QUESTION] multipathd crash when stopping
From: lixiaokeng @ 2021-01-28 13:17 UTC
  To: Martin Wilck, Christophe Varoqui, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen


>>
>>    Thanks very much. Your suggestions is very helpful.
>>    The problem reproduced and the bug seems that shown in
>> https://bugzilla.redhat.com/show_bug.cgi?id=1293594.
> 
> Really? I don't see a connection to your case there. It's about
> glusterfs and libgcc...
> 
> Martin
> 
> 
Hi Martin:

    After changing the code as you suggested, the multipathd crash
happens easily.
Stack:
#0  0x0000ffffb6118f4c in aarch64_fallback_frame_state (context=0xffffb523f200, context=0xffffb523f200, fs=0xffffb523e700) at ./md-unwind-support.h:74
#1  uw_frame_state_for (context=context@entry=0xffffb523f200, fs=fs@entry=0xffffb523e700) at ../../../libgcc/unwind-dw2.c:1257
#2  0x0000ffffb6119ef4 in _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0xffffb52403b0, context=context@entry=0xffffb523f200) at ../../../libgcc/unwind.inc:155
#3  0x0000ffffb611a284 in _Unwind_ForcedUnwind (exc=0xffffb52403b0, stop=stop@entry=0xffffb64846c0 <unwind_stop>, stop_argument=0xffffb523f630) at ../../../libgcc/unwind.inc:207
#4  0x0000ffffb6484860 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#5  0x0000ffffb6482d08 in __do_cancel () at pthreadP.h:304
#6  __GI___pthread_testcancel () at pthread_testcancel.c:26
#7  0x0000ffffb5c528e8 in ?? ()

This issue seems to be different from the one I described first.
Do you think they are related?
Could udev_device_unref cause a double free like the one in the first
issue?

Regards,
Lixiaokeng

* Re: [dm-devel] [QUESTION] multipathd crash when stopping
From: Martin Wilck @ 2021-01-28 16:18 UTC
  To: lixiaokeng, Christophe Varoqui, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen

On Thu, 2021-01-28 at 21:17 +0800, lixiaokeng wrote:
> 
> > > 
> > >    Thanks very much. Your suggestions is very helpful.
> > >    The problem reproduced and the bug seems that shown in
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1293594.
> > 
> > Really? I don't see a connection to your case there. It's about
> > glusterfs and libgcc...
> > 
> > Martin
> > 
> > 
> Hi Martin:
> 
>     Change code as your suggestion, multipathd crash happens easly.
> Stack:
> #0  0x0000ffffb6118f4c in aarch64_fallback_frame_state
> (context=0xffffb523f200, context=0xffffb523f200, fs=0xffffb523e700)
> at ./md-unwind-support.h:74
> #1  uw_frame_state_for (context=context@entry=0xffffb523f200, 
> fs=fs@entry=0xffffb523e700) at ../../../libgcc/unwind-dw2.c:1257
> #2  0x0000ffffb6119ef4 in _Unwind_ForcedUnwind_Phase2
> (exc=exc@entry=0xffffb52403b0, context=context@entry=0xffffb523f200)
> at ../../../libgcc/unwind.inc:155
> #3  0x0000ffffb611a284 in _Unwind_ForcedUnwind (exc=0xffffb52403b0, 
> stop=stop@entry=0xffffb64846c0 <unwind_stop>,
> stop_argument=0xffffb523f630) at ../../../libgcc/unwind.inc:207
> #4  0x0000ffffb6484860 in __GI___pthread_unwind (buf=<optimized out>)
> at unwind.c:121
> #5  0x0000ffffb6482d08 in __do_cancel () at pthreadP.h:304
> #6  __GI___pthread_testcancel () at pthread_testcancel.c:26
> #7  0x0000ffffb5c528e8 in ?? ()
> 
> This issue seems being different from that I described firstly.
> Do you think they are related?

Yes. The thread crashes when it's cancelled. I think we have a problem
in a cleanup handler that we installed with pthread_cleanup_push().


> Will udev_device_unref lead to double free about first issue?

Any double free could cause it if it happens in a pthread cleanup
handler (in other words, if the cleanup handler frees something that
had already been freed).
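To make the failure mode concrete, here is a minimal sketch, not taken from multipathd, of a cleanup handler written so that it cannot free memory twice. All names (struct worker_state, cleanup_free(), worker()) are hypothetical. The idea it illustrates: the normal path sets the pointer to NULL immediately after freeing it, so if the thread is later cancelled at a cancellation point (the second stack trace shows cancellation being acted on in pthread_testcancel()), the handler calls free(NULL), which is a no-op, instead of freeing a stale pointer. The state lives outside the thread function so that the handler never depends on locals modified after pthread_cleanup_push().

```c
/*
 * Sketch of a cancellation cleanup handler that cannot double-free.
 * All names here are hypothetical; this is not multipathd code.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

struct worker_state {
    char *buf;          /* resource the cleanup handler may have to release */
    int handler_runs;   /* read by the joiner after pthread_join() */
};

/* Frees whatever buf currently holds, then clears it. */
static void cleanup_free(void *arg)
{
    struct worker_state *st = arg;

    free(st->buf);      /* free(NULL) does nothing */
    st->buf = NULL;
    st->handler_runs++;
}

static void *worker(void *arg)
{
    struct worker_state *st = arg;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L };  /* 1 ms */

    pthread_cleanup_push(cleanup_free, st);
    for (;;) {
        st->buf = malloc(128);
        /* ... work with st->buf ... */
        free(st->buf);
        st->buf = NULL;         /* without this, the handler would free it again */
        nanosleep(&ts, NULL);   /* cancellation point */
    }
    pthread_cleanup_pop(0);     /* unreachable, but push/pop must be paired */
    return NULL;
}

/* Returns 1 if the thread was cancelled and the handler ran exactly once. */
static int run_cleanup_demo(void)
{
    struct worker_state st = { .buf = NULL, .handler_runs = 0 };
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 20000000L };  /* 20 ms */
    pthread_t tid;
    void *res;

    if (pthread_create(&tid, NULL, worker, &st) != 0)
        return 0;
    nanosleep(&ts, NULL);
    pthread_cancel(tid);
    pthread_join(tid, &res);
    return res == PTHREAD_CANCELED && st.handler_runs == 1 && st.buf == NULL;
}
```

The same reasoning carries over to reference-counted objects: an unref in a cleanup handler must not run on an object that the normal path has already released.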

What code base exactly were you using?

Martin

* Re: [dm-devel] [QUESTION] multipathd crash when stopping
From: Martin Wilck @ 2021-01-28 21:09 UTC
  To: lixiaokeng, Christophe Varoqui, Benjamin Marzinski,
	dm-devel mailing list
  Cc: linfeilong, hexiaowen

On Thu, 2021-01-28 at 21:17 +0800, lixiaokeng wrote:
> 
> This issue seems being different from that I described firstly.
> Do you think they are related?
> Will udev_device_unref lead to double free about first issue?

I just sent a patch; please try it.

Martin