From: Benjamin Marzinski <bmarzins@redhat.com>
To: mwilck@suse.com
Cc: lixiaokeng@huawei.com, dm-devel@redhat.com
Subject: Re: [PATCH 01/23] multipathd: uxlsnr: avoid deadlock on exit
Date: Fri, 25 Sep 2020 20:52:07 -0500	[thread overview]
Message-ID: <20200926015207.GJ3384@octiron.msp.redhat.com> (raw)
In-Reply-To: <20200924134054.14632-2-mwilck@suse.com>

On Thu, Sep 24, 2020 at 03:40:32PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> The uxlsnr wouldn't always release the client lock when cancelled,
> causing a deadlock in uxsock_cleanup(). While this hasn't been
> caused by commit 3d611a2, the deadlock seems to have become much
> more likely after that patch. Solving this means that we have to
> treat reallocation failure of the pollfd array differently.
> We will now just ignore any clients above the last valid pfd index.
> That's a minor problem, as we're in an OOM situation anyway.
> 
> Moreover, client_lock is not a "struct lock", but a plain
> pthread_mutex_t.
> 
> Fixes: 3d611a2 ("multipathd: cancel threads early during shutdown")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/uxlsnr.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index 1c5ce9d..d47ba1a 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -35,6 +35,7 @@
>  #include "config.h"
>  #include "mpath_cmd.h"
>  #include "time-util.h"
> +#include "util.h"
>  
>  #include "main.h"
>  #include "cli.h"
> @@ -116,7 +117,7 @@ static void _dead_client(struct client *c)
>  
>  static void dead_client(struct client *c)
>  {
> -	pthread_cleanup_push(cleanup_lock, &client_lock);
> +	pthread_cleanup_push(cleanup_mutex, &client_lock);
>  	pthread_mutex_lock(&client_lock);
>  	_dead_client(c);
>  	pthread_cleanup_pop(1);
> @@ -306,6 +307,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, long ux_sock,
>  
>  		/* setup for a poll */
>  		pthread_mutex_lock(&client_lock);
> +		pthread_cleanup_push(cleanup_mutex, &client_lock);
>  		num_clients = 0;
>  		list_for_each_entry(c, &clients, node) {
>  			num_clients++;
> @@ -322,14 +324,13 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, long ux_sock,
>  						sizeof(struct pollfd));
>  			}
>  			if (!new) {
> -				pthread_mutex_unlock(&client_lock);
>  				condlog(0, "%s: failed to realloc %d poll fds",
>  					"uxsock", 2 + num_clients);
> -				sched_yield();
> -				continue;
> +				num_clients = old_clients;

OK, I'm getting way into the theoretical weeds here, but I believe that
realloc() is technically allowed to return NULL even when it shrinks an
allocation. In that case num_clients would be too big. Later in this
function, when we loop through num_clients

                for (i = 2; i < num_clients + 2; i++) {
                        if (polls[i].revents & POLLIN) {
 
we could look at an unused polls entry, since its revents doesn't get
cleared. It's also possible that the fd of this unused entry matches the
fd of an existing client. Then we could try to get a packet from a
client that isn't sending one, and kill that client. Yes, this will
almost certainly never happen. But if we either zero out the revents
field or loop only over the number of entries we actually polled, it
can't happen at all.

-Ben

> +			} else {
> +				old_clients = num_clients;
> +				polls = new;
>  			}
> -			old_clients = num_clients;
> -			polls = new;
>  		}
>  		polls[0].fd = ux_sock;
>  		polls[0].events = POLLIN;
> @@ -347,8 +348,10 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, long ux_sock,
>  			polls[i].fd = c->fd;
>  			polls[i].events = POLLIN;
>  			i++;
> +			if (i >= 2 + num_clients)
> +				break;
>  		}
> -		pthread_mutex_unlock(&client_lock);
> +		pthread_cleanup_pop(1);
>  
>  		/* most of our life is spent in this call */
>  		poll_count = ppoll(polls, i, &sleep_time, &mask);
> -- 
> 2.28.0

