From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1EA50C433F5 for ; Thu, 13 Jan 2022 11:20:15 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.257079.441541 (Exim 4.92) (envelope-from ) id 1n7y93-0003kR-Tp; Thu, 13 Jan 2022 11:19:53 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 257079.441541; Thu, 13 Jan 2022 11:19:53 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1n7y93-0003kK-Qr; Thu, 13 Jan 2022 11:19:53 +0000 Received: by outflank-mailman (input) for mailman id 257079; Thu, 13 Jan 2022 11:19:52 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1n7y92-0003kD-CB for xen-devel@lists.xenproject.org; Thu, 13 Jan 2022 11:19:52 +0000 Received: from smarthost01c.ixn.mail.zen.net.uk (smarthost01c.ixn.mail.zen.net.uk [212.23.1.22]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id b7f374fe-7462-11ec-bcf3-e9554a921baa; Thu, 13 Jan 2022 12:19:48 +0100 (CET) Received: from [217.155.64.189] (helo=mail0.xen.dingwall.me.uk) by smarthost01c.ixn.mail.zen.net.uk with esmtpsa (TLS1.0:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1n7y8y-0002Gu-68 for xen-devel@lists.xenproject.org; Thu, 13 Jan 2022 11:19:48 +0000 Received: from localhost (localhost [IPv6:::1]) by mail0.xen.dingwall.me.uk (Postfix) with ESMTP id C70AE271F4B for ; Thu, 13 Jan 2022 11:19:47 +0000 (GMT) Received: from mail0.xen.dingwall.me.uk ([127.0.0.1]) by localhost (mail0.xen.dingwall.me.uk [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9UnyDRkFJgoY for ; Thu, 13 Jan 2022 11:19:47 +0000 (GMT) Received: from ghoul.dingwall.me.uk (ghoul.dingwall.me.uk [192.168.1.200]) by dingwall.me.uk (Postfix) with ESMTP id 977AE271F48 for ; Thu, 13 Jan 2022 11:19:46 +0000 (GMT) Received: by ghoul.dingwall.me.uk (Postfix, from userid 1000) id 82A7988A; Thu, 13 Jan 2022 11:19:46 +0000 (GMT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: b7f374fe-7462-11ec-bcf3-e9554a921baa X-Virus-Scanned: Debian amavisd-new at dingwall.me.uk Date: Thu, 13 Jan 2022 11:19:46 +0000 From: James Dingwall To: xen-devel@lists.xenproject.org Subject: possible kernel/libxl race with xl network-attach Message-ID: <20220113111946.GA4133739@dingwall.me.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Originating-smarthost01c-IP: [217.155.64.189] Feedback-ID: 217.155.64.189 Hi, I have been trying to debug a problem where a vif with the backend in a driver domain is added to dom0. Intermittently the hotplug script is not invoked by libxl (running as xl devd) in the driver domain. By enabling some debug for the driver domain kernel and libxl I have these messages: driver domain kernel (Ubuntu 5.4.0-92-generic): [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully created xenvif [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: /local/domain/0/device/vif/0 -> Initialising [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: /local/domain/0/device/vif/0 -> Connected [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: backend/vif/0/0 -> Connected xl devd (Xen 4.14.3): 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event epath=/local/domain/2/backend/vif/0/0/state 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, parent 0x5633ac567f90 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event epath=/local/domain/2/backend/vif/0/0/state 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting state 4 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event epath=/local/domain/2/backend/vif/0/0/state 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, parent 0x5633ac567f90 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event epath=/local/domain/2/backend/vif/0/0/state 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting state 4 2022-01-13 01:39:51 UTC libxl: debug: libxl_aoutils.c:88:xswait_timeout_callback: backend /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state) 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 timed out 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: deregister unregistered 2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: deregister unregistered 2022-01-13 01:39:51 UTC libxl: error: libxl_device.c:1105:device_backend_callback: unable to add device with path /local/domain/2/backend/vif/0/0 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: deregister unregistered 2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: device /local/domain/2/backend/vif/0/0 add failed 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy the xenstore content for the backend: # xenstore-ls /local/domain/2/backend/vif/0 0 = "" frontend = "/local/domain/0/device/vif/0" frontend-id = "0" online = "1" state = "4" script = "/etc/xen/scripts/vif-zynstra" vifname = "dom0.0" mac = "00:16:3e:6c:de:82" bridge = "cluster" handle = "0" type = "vif" feature-sg = "1" feature-gso-tcpv4 = "1" feature-gso-tcpv6 = "1" feature-ipv6-csum-offload = "1" feature-rx-copy = "1" feature-rx-flip = "0" feature-multicast-control = "1" feature-dynamic-multicast-control = "1" feature-split-event-channels = "1" multi-queue-max-queues = "2" feature-ctrl-ring = "1" hotplug-status = "connected" My guess is that the libxl callback is started waiting for the backend state key to be set to XenbusStateInitWait (2) but the frontend in dom0 has already triggered the backend to transition to XenbusStateConnected (4) and therefore it does not successfully complete. Does this seem a reasonable explanation for the problem and what would the best approach to try and solve it? Thanks, James