Date: Thu, 16 Sep 2021 11:10:58 -0500
From: Benjamin Marzinski
To: Martin Wilck
Cc: lixiaokeng@huawei.com, dm-devel@redhat.com, Chongyun Wu
Subject: Re: [dm-devel] [PATCH 31/35] multipathd: uxlsnr: add idle notification
Message-ID: <20210916161058.GB3087@octiron.msp.redhat.com>
References: <20210910114120.13665-1-mwilck@suse.com> <20210910114120.13665-32-mwilck@suse.com> <20210916041452.GQ3087@octiron.msp.redhat.com> <541494688720eb07d17f73e41a7877718208ea10.camel@suse.com> <20210916150638.GX3087@octiron.msp.redhat.com>

On Thu, Sep 16, 2021 at 05:54:14PM +0200, Martin Wilck wrote:
> On Thu, 2021-09-16 at 10:06 -0500, Benjamin Marzinski wrote:
> > On Thu, Sep 16, 2021 at 10:54:19AM +0200, Martin Wilck wrote:
> > > On Wed, 2021-09-15 at 23:14 -0500, Benjamin Marzinski wrote:
> > > > On Fri, Sep 10, 2021 at 01:41:16PM +0200, mwilck@suse.com wrote:
> > > > > From: Martin Wilck
> > > > >
> > > > > The previous patches added the state machine and the timeout handling, but there was no wakeup mechanism for the uxlsnr for cases where client connections were waiting for the vecs lock.
> > > > >
> > > > > This patch uses the previously introduced wakeup mechanism of struct mutex_lock for this purpose. Processes which unlock the "global" vecs lock send an event on an eventfd which the uxlsnr loop polls.
> > > > >
> > > > > As we are now woken up to service client handlers that are waiting not for input but for the lock, we need to set up the pollfds differently and iterate over all clients when handling events, not only over the ones that are receiving. The hangup handling changes, too: we have to look at every client, even if one has hung up. Note that I don't take the client_lock for the loop in uxsock_listen(); it's not necessary, and it will be removed elsewhere in a follow-up patch.
> > > > >
> > > > > With this in place, the lock no longer needs to be taken in execute_handler(). The uxlsnr only ever calls trylock() on the vecs lock, avoiding any waiting for other threads to finish.
> > > > >
> > > > > Signed-off-by: Martin Wilck
> > > > > ---
> > > > >  multipathd/uxlsnr.c | 211 ++++++++++++++++++++++++++++++--------------------
> > > > >  1 file changed, 143 insertions(+), 68 deletions(-)
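For readers who want to picture the mechanism, the eventfd wakeup described in the quoted commit message boils down to roughly the pattern below. This is only a minimal sketch: the names (wakeup_fd, vecs_lock, post_wakeup, listener_iteration) are illustrative, not the actual multipathd symbols, and the real uxlsnr additionally tracks per-client state that is omitted here.

    /*
     * Sketch of an eventfd + ppoll() + trylock() wakeup.  Illustrative
     * names only; this is not the multipathd implementation.
     */
    #define _GNU_SOURCE
    #include <sys/eventfd.h>
    #include <poll.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    static int wakeup_fd = -1;
    pthread_mutex_t vecs_lock = PTHREAD_MUTEX_INITIALIZER;

    int wakeup_init(void)
    {
            wakeup_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
            return wakeup_fd;
    }

    /* Called by any thread right after it drops vecs_lock. */
    void post_wakeup(void)
    {
            uint64_t one = 1;

            if (write(wakeup_fd, &one, sizeof(one)) < 0) {
                    /* EAGAIN just means the counter is already non-zero. */
            }
    }

    /* One listener iteration: sleep only in ppoll(), never in lock(). */
    void listener_iteration(struct pollfd *client_fds, int n_clients)
    {
            struct pollfd fds[n_clients + 1];
            uint64_t cnt;

            fds[0].fd = wakeup_fd;
            fds[0].events = POLLIN;
            for (int i = 0; i < n_clients; i++)
                    fds[i + 1] = client_fds[i];

            if (ppoll(fds, n_clients + 1, NULL, NULL) <= 0)
                    return;

            if (fds[0].revents & POLLIN)
                    (void)read(wakeup_fd, &cnt, sizeof(cnt));  /* drain counter */

            /* A lock holder just finished (or a client sent data): try, don't wait. */
            if (pthread_mutex_trylock(&vecs_lock) == 0) {
                    /* ... run the handlers of clients waiting for the lock ... */
                    pthread_mutex_unlock(&vecs_lock);
            }
    }

The point is the one the commit message makes: the listener never blocks on the lock, so a slow lock holder can only delay the clients that actually need the lock, not the socket handling itself.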
> > > >
> > > > I do worry that if there are, for instance, a lot of uevents coming in, this could starve the uxlsnr thread, since other threads could be grabbing and releasing the vecs lock; but if the lock is usually held, the uxlsnr thread might never try to grab it when it's free, and it will keep losing its place in line. Also, every time the vecs lock is dropped between ppoll() calls, a wakeup will be triggered, even if the lock was grabbed by something else before the ppoll thread runs.
> > >
> > > I've thought about this too. It's true that the ppoll() -> pthread_mutex_trylock() sequence will never acquire the lock if some other thread calls lock() at the same time.
> > >
> > > If multiple processes call lock(), the "winner" of the lock is random. Thus, in a way, this change actually adds some predictability: the uxlsnr will step back if some other process is actively trying to grab the lock. IMO that's the right thing to do in almost all situations.
> > >
> > > We don't need to worry about "thundering herd" issues, because the number of threads that might wait on the lock is rather small. In the worst case, three threads (checker, dmevents handler, and uevent dispatcher, plus the uxlsnr in ppoll()) wait for the lock at the same time. Usually one of them will have it grabbed. On systems that lack dmevent polling, the number of waiting threads may be higher, but AFAICS it's a very rare condition to have hundreds of dmevents delivered to different maps simultaneously, and if that happens, it's probably correct to have them serviced quickly.
> > >
> > > The uevent dispatcher doesn't hold the lock continuously; it's taken and released for every event handled. Thus the uxlsnr has a real chance to jump in between uevents. The same holds for the dmevents thread: it takes the lock separately for every map affected. The only piece of code that holds the lock for an extended period of time (except reconfigure(), where it's unavoidable) is the path checker (that's bad, and next on the todo list).
> > >
> > > The really "important" commands (shutdown, reconfigure) don't take the lock and return immediately; the lock is no issue for them. I don't see any other cli command that needs to be served before uevents or dm events.
> > >
> > > I haven't been able to test this on huge configurations with thousands of LUNs, but I tested with artificial delays in the checker loop, uevent handlers, and the dmevent handler, with lots of clients querying the daemon in parallel, and saw that clients were handled very nicely. Some timeouts are inevitable (e.g. if the checker simply holds the lock longer than the uxsock_timeout), but that is no regression.
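To make the per-event locking pattern above concrete, a hypothetical dispatcher loop could look like the sketch below. next_event() and handle_event() are assumed placeholders, and vecs_lock/post_wakeup() reuse the illustrative names from the earlier sketch; the actual multipathd uevent and dmevents code is organized differently.

    /* Hypothetical per-event dispatcher: the lock is held for one event at a
     * time, and the listener is woken after every unlock, so its trylock()
     * has a chance to win between events. */
    #include <pthread.h>
    #include <stddef.h>

    struct event;                            /* opaque here */
    extern struct event *next_event(void);   /* assumed: blocks until an event arrives */
    extern void handle_event(struct event *ev);
    extern pthread_mutex_t vecs_lock;        /* from the earlier sketch */
    extern void post_wakeup(void);           /* from the earlier sketch */

    void *dispatcher_loop(void *arg)
    {
            struct event *ev;

            (void)arg;
            while ((ev = next_event()) != NULL) {
                    pthread_mutex_lock(&vecs_lock);   /* may block, unlike the uxlsnr */
                    handle_event(ev);                 /* work done under the lock */
                    pthread_mutex_unlock(&vecs_lock);
                    post_wakeup();                    /* wake the listener's ppoll() */
            }
            return NULL;
    }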
> > >
> > > Bottom line: I believe that because this patch reduces the busy-wait time, clients will be served more reliably and more quickly than before (more precisely: both the average and the standard deviation of the service delay will improve, and timeouts will occur less frequently). I encourage everyone to experiment and see if reality shows that I'm wrong.
> > >
> > > > I suppose the only way to deal with that would be to move the locking commands to a list handled by a separate thread, so that it could block without stalling the non-locking commands.
> > >
> > > Not sure if I understand correctly, just in case: non-locking commands are never stalled with my patch.
> >
> > I realize that. I was saying that you could avoid starvation while still allowing non-locking commands to complete by moving the locking commands to a separate thread, which would block on the lock. I didn't consider a ticketing system. Ideally, the checker loop would have the lowest priority, since it isn't responding to any event and is usually just verifying that nothing has changed. But you make a good point that when we are getting a lot of events, and the uxlsnr loop has a chance of getting starved, we probably want to prioritize the event handling anyway.
>
> I have also thought about using additional threads for handling cli commands. One could either use a single thread, similar to the udev listener/dispatcher pair (your suggestion, IIUC), or one thread per (blocking) client.
>
> Moving client handling into separate thread(s) avoids the complexity of the state machine and the eventfd-based wakeup. On the flip side, it introduces new multithreading-related complexity (of which we already have our fair share). Client tasks running lock(&vecs->lock) in order to serve commands like "multipathd show paths" might then starve event handling, which would be worse than the other way around, IMO.
>
> Eventually, I found the idea of the poll/wakeup loop with no additional threads more appealing, and more suitable for the task. But I admit that it's a matter of personal taste. I tend to use pthreads as little as possible ;-).
>
> So how do we proceed?

I think your argument that we'll only risk starving the uxlsnr thread when it makes sense to prioritize other threads is a good one. So I'm OK with this trylock() solution.

-Ben

> Regards,
> Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel