From: Martin Wilck <mwilck@suse.com>
To: lixiaokeng <lixiaokeng@huawei.com>,
	Benjamin Marzinski <bmarzins@redhat.com>,
	Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] [PATCH] multipathd: avoid crash in uevent_cleanup()
Date: Mon, 08 Feb 2021 12:03:05 +0100
Message-ID: <7b2c571eb7ff9d54c51037a4fae87796ead1144e.camel@suse.com>
In-Reply-To: <6c80ccbe-0c35-aef8-e95b-97acd06a3487@huawei.com>

On Mon, 2021-02-08 at 18:49 +0800, lixiaokeng wrote:
> 
> 
> On 2021/2/8 17:50, Martin Wilck wrote:
> > On Mon, 2021-02-08 at 15:41 +0800, lixiaokeng wrote:
> > > 
> > > Hi Martin,
> > > 
> > > > There is a _cleanup_ in device_new_from_nulstr. If uevent_thr
> > > > exits in device_new_from_nulstr before some keys have been
> > > > appended to sd_device, the _cleanup_ will be called, which
> > > > makes multipathd crash with the stack.
> > > 
> > > When I use your advice,
> > > 
> > > 
> > > On 2021/1/26 16:34, Martin Wilck wrote:
> > > >     int oldstate;
> > > > 
> > > >     pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
> > > > 
> > > >     udev_monitor_receive_device(...)
> > > > 
> > > >     pthread_setcancelstate(oldstate, NULL);
> > > >     pthread_testcancel();
> > > 
> > > this coredump does not seem to appear anymore (several hours with
> > > test scripts).
> > 
> > > Thanks for your continued hard work on this, but I can't follow
> > > you. In this post:
> > 
> > https://listman.redhat.com/archives/dm-devel/2021-January/msg00396.html
> > 
> > you said that this advice did _not_ help. Please clarify.
> > 
> 
> Hi Martin,
> At that time, I did not know how the crash occurred in the systemd
> interface.
> There were still some crashes with pthread_testcancel(), for example
> #0  0x0000ffffb6118f4c in aarch64_fallback_frame_state
> (context=0xffffb523f200, context=0xffffb523f200, fs=0xffffb523e700)
> at ./md-unwind-support.h:74
> #1  uw_frame_state_for (context=context@entry=0xffffb523f200, 
> fs=fs@entry=0xffffb523e700) at ../../../libgcc/unwind-dw2.c:1257
> #2  0x0000ffffb6119ef4 in _Unwind_ForcedUnwind_Phase2
> (exc=exc@entry=0xffffb52403b0, context=context@entry=0xffffb523f200)
> at ../../../libgcc/unwind.inc:155
> #3  0x0000ffffb611a284 in _Unwind_ForcedUnwind (exc=0xffffb52403b0, 
> stop=stop@entry=0xffffb64846c0 <unwind_stop>,
> stop_argument=0xffffb523f630) at ../../../libgcc/unwind.inc:207
> #4  0x0000ffffb6484860 in __GI___pthread_unwind (buf=<optimized out>)
> at unwind.c:121
> #5  0x0000ffffb6482d08 in __do_cancel () at pthreadP.h:304
> #6  __GI___pthread_testcancel () at pthread_testcancel.c:26
> #7  0x0000ffffb5c528e8 in ?? ()
> 

I still don't fully understand. Above you said "this coredump doesn't
seem to appear any more". Am I understanding correctly that you
observed *other* core dumps instead?

The uw_frame_state_for() stack looks healthy (I learned that just
recently from one of our experts in the area). Most probably the actual
crash occurred in another thread in this case. It would be interesting
to look at a core dump.

The point of my suggestion was not the pthread_testcancel(), but the
blocking of thread cancellation during udev_monitor_receive_device().

> I thought these crashes might be related to the crash in the systemd
> interface.
> 
> However, after analyzing the coredump and discussing with the
> community, I think these may be independent issues. So I tested it
> again. ?? and _Unwind_XXX crashes still exist, but there is no crash
> in device_monitor_receive_device.

The "best" solution would probably be to generally disallow
cancellation, and only run pthread_testcancel() at certain points in
the code where we might block (and know that being cancelled would be
safe). That would not only make multipathd safer from crashing, it
would also enable us to remove hundreds of ugly
pthread_cleanup_push()/pop() calls from our code.
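
To illustrate the idea, here is a minimal, self-contained sketch. It is
not multipathd code: safe_cancel_point(), worker(), on_cancel() and
demo_cancel_at_safe_points() are names made up for this example, and the
busy loop merely stands in for a call that must not be interrupted. The
thread runs with cancellation disabled and only honours a pending cancel
at one explicitly chosen point.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names below are hypothetical, for illustration only. */
static atomic_int started;          /* worker has entered its loop */
static atomic_int in_critical;      /* worker is inside uninterruptible work */
static atomic_int cancelled_inside; /* set if cancellation hit mid-work */

/* The only place where this thread may be cancelled. */
static void safe_cancel_point(void)
{
	int old;

	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
	/* Invariant of this sketch: the thread normally runs uncancellable. */
	assert(old == PTHREAD_CANCEL_DISABLE);
	pthread_testcancel();	/* thread exits here if a cancel is pending */
	pthread_setcancelstate(old, NULL);
}

/* Cleanup handler: records whether we were cancelled at an unsafe time. */
static void on_cancel(void *arg)
{
	(void)arg;
	if (atomic_load(&in_critical))
		atomic_store(&cancelled_inside, 1);
}

static void *worker(void *arg)
{
	int old;

	(void)arg;
	/* Disallow cancellation for the whole lifetime of the thread. */
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
	pthread_cleanup_push(on_cancel, NULL);
	for (;;) {
		atomic_store(&in_critical, 1);
		atomic_store(&started, 1);
		/* Stand-in for work that must run to completion. */
		for (volatile int i = 0; i < 1000; i++)
			;
		atomic_store(&in_critical, 0);
		safe_cancel_point();
	}
	pthread_cleanup_pop(0);	/* not reached; pairs with the push above */
	return NULL;
}

/* Returns 1 if the worker was cancelled, and only at the safe point. */
int demo_cancel_at_safe_points(void)
{
	pthread_t tid;
	void *res;

	if (pthread_create(&tid, NULL, worker, NULL) != 0)
		return 0;
	while (!atomic_load(&started))
		sched_yield();
	pthread_cancel(tid);	/* only marks the cancel as pending */
	if (pthread_join(tid, &res) != 0)
		return 0;
	return res == PTHREAD_CANCELED && !atomic_load(&cancelled_inside);
}
```

Note that pthread_testcancel() has no effect while cancellation is
disabled, which is why the helper enables it briefly. A thread that
blocks without ever reaching safe_cancel_point() would never terminate
in this scheme.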

Finding all these points would be a challenge though, and if we don't
find them, we risk hanging on exit again, which is bad too, and was
just recently improved.

Regards
Martin





