From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8204EC4361B for ; Thu, 17 Dec 2020 18:27:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 52BDD238E8 for ; Thu, 17 Dec 2020 18:27:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730220AbgLQS1E (ORCPT ); Thu, 17 Dec 2020 13:27:04 -0500 Received: from esa2.hc3370-68.iphmx.com ([216.71.145.153]:44561 "EHLO esa2.hc3370-68.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727368AbgLQS1E (ORCPT ); Thu, 17 Dec 2020 13:27:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=citrix.com; s=securemail; t=1608229623; h=subject:to:cc:references:from:message-id:date: mime-version:in-reply-to:content-transfer-encoding; bh=0JnxpV5kSnzrC5FbbL8YNG1KAvQnfLMRnXpPN2OIrj0=; b=MS3qYUNMQpDO8xkDeo3i1YyqqQwGGm3BY/7rUGBqTxZ18mhFqO4d1DtD EaQEQ+MqC9R25/4B8rqbA/Jc7cENKYQvyisHEvhEQLd4CORqTPTwYfnAj dn61t+5LkkSl7salj3absNdhLCA8lvLqq332FtDFYSxf6MgfDRvWgm1E2 Y=; Authentication-Results: esa2.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none IronPort-SDR: 00tXMVKz+bqsvoJATU8UhkG0eBARLmv5IpviZufx/bZc56Toigz1exXkaq/60al/HMtc0VNN9g k1QKMN0UehcqDT62F4WnbUjWK+u/kwbUPTkrS1h/l3YEhMC+8o7G9iRsiXIsYI9DydJJGpDWyd QSC5FwiSno+b9sn3UcJbmC9EQ1kdZMvZmZZJh+jJ/YWqaqU9+gLhru6eCHDx8A6/NMalhCeFkQ CwWXMpJnNZqRs2epeeF571ULEE5+T1kMsOrBOWiF3YoWDij5uJKUfkJ71GQKwF5sOV5fwk9E13 3Zs= X-SBRS: 5.2 X-MesageID: 33507226 X-Ironport-Server: esa2.hc3370-68.iphmx.com X-Remote-IP: 162.221.158.21 X-Policy: $RELAYED X-IronPort-AV: E=Sophos;i="5.78,428,1599537600"; d="scan'208";a="33507226" Subject: Re: [PATCH v2] xen/xenbus: make xs_talkv() interruptible To: =?UTF-8?B?SsO8cmdlbiBHcm/Dnw==?= , , CC: Boris Ostrovsky , Stefano Stabellini References: <20201215111055.3810-1-jgross@suse.com> <2deac9ce-0c27-a472-7d51-b91a640d92ed@citrix.com> <8d26b752-b7ba-159f-5bed-bb015a06d819@suse.com> From: Andrew Cooper Message-ID: <2414c191-ff55-e446-b555-c9d0ccca6b93@citrix.com> Date: Thu, 17 Dec 2020 18:25:52 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <8d26b752-b7ba-159f-5bed-bb015a06d819@suse.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Content-Language: en-GB X-ClientProxiedBy: AMSPEX02CAS01.citrite.net (10.69.22.112) To FTLPEX02CL03.citrite.net (10.13.108.165) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 16/12/2020 08:21, Jürgen Groß wrote: > On 15.12.20 21:59, Andrew Cooper wrote: >> On 15/12/2020 11:10, Juergen Gross wrote: >>> In case a process waits for any Xenstore action in the xenbus driver >>> it should be interruptible by signals. >>> >>> Signed-off-by: Juergen Gross >>> --- >>> V2: >>> - don't special case SIGKILL as libxenstore is handling -EINTR fine >>> --- >>>   drivers/xen/xenbus/xenbus_xs.c | 9 ++++++++- >>>   1 file changed, 8 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/xen/xenbus/xenbus_xs.c >>> b/drivers/xen/xenbus/xenbus_xs.c >>> index 3a06eb699f33..17c8f8a155fd 100644 >>> --- a/drivers/xen/xenbus/xenbus_xs.c >>> +++ b/drivers/xen/xenbus/xenbus_xs.c >>> @@ -205,8 +205,15 @@ static bool test_reply(struct xb_req_data *req) >>>     static void *read_reply(struct xb_req_data *req) >>>   { >>> +    int ret; >>> + >>>       do { >>> -        wait_event(req->wq, test_reply(req)); >>> +        ret = wait_event_interruptible(req->wq, test_reply(req)); >>> + >>> +        if (ret == -ERESTARTSYS && signal_pending(current)) { >>> +            req->msg.type = XS_ERROR; >>> +            return ERR_PTR(-EINTR); >>> +        } >> >> So now I can talk fully about the situations which lead to this, I think >> there is a bit more complexity. >> >> It turns out there are a number of issues related to running a Xen >> system with no xenstored. >> >> 1) If a xenstore-write occurs during startup before init-xenstore-domain >> runs, the former blocks on /dev/xen/xenbus waiting for xenstored to >> reply, while the latter blocks on /dev/xen/xenbus_backend when trying to >> tell the dom0 kernel that xenstored is in dom1.  This effectively >> deadlocks the system. > > This should be easy to solve: any request to /dev/xen/xenbus should > block upfront in case xenstored isn't up yet (could e.g. wait > interruptible until xenstored_ready is non-zero). I'm not sure that that would fix the problem.  The problem is that setting the ring details via /dev/xen/xenbus_backend blocks, which prevents us launching the xenstored stubdomain, which prevents the earlier xenbus write being completed. So long as /dev/xen/xenbus_backend doesn't block, there's no problem with other /dev/xen/xenbus activity being pending briefly. Looking at the current logic, I'm not completely convinced.  Even finding a filled-in evtchn/gfn doesn't mean that xenstored is actually ready. There are 3 possible cases. 1) PV guest, and details in start_info 2) HVM guest, and details in HVM_PARAMs 3) No details (expected for dom0).  Something in userspace must provide details at a later point. So the setup phases go from nothing, to having ring details, to finding the ring working. I think it would be prudent to try reading a key between having details and declaring the xenstored_ready.  Any activity, even XS_ERROR, indicates that the other end of the ring is listening. > >> 2) If xenstore-watch is running when xenstored dies, it spins at 100% >> cpu usage making no system calls at all.  This is caused by bad error >> handling from xs_watch(), and attempting to debug found: > > Can you expand on "bad error handling from xs_watch()", please? do_watch() has     for ( ... ) { // defaults to an infinite loop         vec = xs_read_watch();         if (vec == NULL)             continue;         ...     } My next plan was to experiment with break instead of continue, which I'll get to at some point. > >> >> 3) (this issue).  If anyone starts xenstore-watch with no xenstored >> running at all, it blocks in D in the kernel. > > Should be handled with solution for 1). > >> >> The cause is the special handling for watch/unwatch commands which, >> instead of just queuing up the data for xenstore, explicitly waits for >> an OK for registering the watch.  This causes a write() system call to >> block waiting for a non-existent entity to reply. >> >> So while this patch does resolve the major usability issue I found (I >> can't even SIGINT and get my terminal back), I think there are issues. >> >> The reason why XS_WATCH/XS_UNWATCH are special cased is because they do >> require special handling.  The main kernel thread for processing >> incoming data from xenstored does need to know how to associate each >> async XS_WATCH_EVENT to the caller who watched the path. >> >> Therefore, depending on when this cancellation hits, we might be in any >> of the following states: >> >> 1) the watch is queued in the kernel, but not even sent to xenstored yet >> 2) the watch is queued in the xenstored ring, but not acted upon >> 3) the watch is queued in the xenstored ring, and the xenstored has seen >> it but not replied yet >> 4) the watch has been processed, but the XS_WATCH reply hasn't been >> received yet >> 5) the watch has been processed, and the XS_WATCH reply received >> >> State 5 (and a little bit) is the normal success path when xenstored has >> acted upon the request, and the internal kernel infrastructure is set up >> appropriately to handle XS_WATCH_EVENTs. >> >> States 1 and 2 can be very common if there is no xenstored (or at least, >> it hasn't started up yet).  In reality, there is either no xenstored, or >> it is up and running (and for a period of time during system startup, >> these cases occur in sequence). > > Yes. this is the reason we can't just reject a user request if xenstored > hasn't been detected yet: it could be just starting. Right, and I'm not suggesting that we'd want to reject accesses while xenstored is starting up. > >> >> As soon as the XS_WATCH event has been written into the xenstored ring, >> it is not safe to cancel.  You've committed to xenstored processing the >> request (if it is up). > > I'm not sure this is true. Cancelling it might result in a stale watch > in xenstored, but there shouldn't be a problem related to that. In case > that watch fires the event will normally be discarded by the kernel as > no matching watch is found in the kernel's data. In case a new watch > has been setup with the same struct xenbus_watch address (which is used > as the token), then this new watch might fire without the node of the > new watch having changed, but spurious watch events are defined to be > okay (OTOH the path in the event might look strange to the handler). Watches are a quota'd resource in (at least some) xenstored configurations.  Losing track of the registration is a resource leak, even if the kernel can filter and discard the unexpected watch events. >> If xenstored is actually up and running, its fine and necessary to >> block.  The request will be processed in due course (timing subject to >> the client and server load).  If xenstored isn't up, blocking isn't ok. >> >> Therefore, I think we need to distinguish "not yet on the ring" from "on >> the ring", as our distinction as to whether cancelling is safe, and >> ensure we don't queue anything on the ring before we're sure xenstored >> has started up. >> >> Does this make sense? > > Basically, yes. Great.  If I get any time, I'll try to look into some fixes along the above lines. ~Andrew