From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Benjamin Marzinski" Subject: Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout Date: Fri, 5 Oct 2018 12:02:59 -0500 Message-ID: <20181005170259.GD3172@octiron.msp.redhat.com> References: <1537571127-10143-1-git-send-email-bmarzins@redhat.com> <1537571127-10143-2-git-send-email-bmarzins@redhat.com> <20181004163118.GA3172@octiron.msp.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com To: Martin Wilck Cc: mwilck+gmail@suse.de, device-mapper development List-Id: dm-devel.ids On Fri, Oct 05, 2018 at 12:11:18PM +0200, Martin Wilck wrote: > On Thu, 2018-10-04 at 11:31 -0500, Benjamin Marzinski wrote: > > It's been a while, and I'm not exactly sure what I was thinking here. > > But IIRC the idea was that if the state isn't set yet, then the old > > thread could mess with the results of a future thread. Also, it was > > to > > avoid a corner case where the tur checker was called, started the > > thread, got the result before the checker exitted, cancelled the > > thread > > and returned the result and then was called very shortly afterwards, > > before the thread could clean itself up. In that case, the right > > thing > > to do (I thought) would be to start a new thread, because the other > > one > > would be ending soon. In truth, starting a new thread at all is > > probably a bad idea, since the old thread will still mess with the > > checker context on exit. = > > = > > A better idea might be to simply fail back to a syncronous call to > > tur_check() when you notice a cancelled thread that hasn't exitted. > > That can cause all the other checkers to get delayed, but at least in > > that case you are still checking paths. The other option is to do as > > before and just not check this path, which won't effect the other > > checkers. It seems very unlikely that the thread could remain > > uncancelled forever, especially if the path was usable. > > > > thoughts? > = > Generally speaking, we're deeply in the realm of highly unlikely > situations I would say. But we should get it right eventually. > = > Maybe we can add logic to the tur thread to keep its hands off the > context if it's a "zombie", like below (just a thought, untested)? This still wouldn't stop a thread from racing with new thread creation to change ct->holders or ct->running. -Ben > Martin > = > = > diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c > index bf8486d3..e8493ca8 100644 > --- a/libmultipath/checkers/tur.c > +++ b/libmultipath/checkers/tur.c > @@ -219,11 +219,18 @@ static void cleanup_func(void *data) > rcu_unregister_thread(); > } > = > +#define tur_thread_quit_unless_owner(__ct) \ > + if (__ct->thread !=3D pthread_self()) { \ > + pthread_mutex_unlock(&__ct->lock); \ > + pthread_exit(NULL); \ > + } > + > static void copy_msg_to_tcc(void *ct_p, const char *msg) > { > struct tur_checker_context *ct =3D ct_p; > = > pthread_mutex_lock(&ct->lock); > + tur_thread_quit_unless_owner(ct); > strlcpy(ct->message, msg, sizeof(ct->message)); > pthread_mutex_unlock(&ct->lock); > } > @@ -243,6 +250,7 @@ static void *tur_thread(void *ctx) > = > /* TUR checker start up */ > pthread_mutex_lock(&ct->lock); > + tur_thread_quit_unless_owner(ct); > ct->state =3D PATH_PENDING; > ct->message[0] =3D '\0'; > pthread_mutex_unlock(&ct->lock); > @@ -252,6 +260,7 @@ static void *tur_thread(void *ctx) > = > /* TUR checker done */ > pthread_mutex_lock(&ct->lock); > + tur_thread_quit_unless_owner(ct); > ct->state =3D state; > pthread_cond_signal(&ct->active); > pthread_mutex_unlock(&ct->lock); > = > = > = > = > -- = > Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 > SUSELinux GmbH, GF: Felix Imend=F6rffer, Jane Smithard, Graham Norton > HRB 21284 (AG N=FCrnberg) > =