All of lore.kernel.org
 help / color / mirror / Atom feed
* possible kernel/libxl race with xl network-attach
@ 2022-01-13 11:19 James Dingwall
  2022-01-13 16:11 ` Roger Pau Monné
  0 siblings, 1 reply; 7+ messages in thread
From: James Dingwall @ 2022-01-13 11:19 UTC (permalink / raw)
  To: xen-devel

Hi,

I have been trying to debug a problem where a vif with the backend in a 
driver domain is added to dom0.  Intermittently the hotplug script is 
not invoked by libxl (running as xl devd) in the driver domain.  By 
enabling some debug for the driver domain kernel and libxl I have these 
messages:

driver domain kernel (Ubuntu 5.4.0-92-generic):

[Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully created xenvif
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: /local/domain/0/device/vif/0 -> Initialising
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: /local/domain/0/device/vif/0 -> Connected
[Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: backend/vif/0/0 -> Connected

xl devd (Xen 4.14.3):

2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting state 4
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting state 4
2022-01-13 01:39:51 UTC libxl: debug: libxl_aoutils.c:88:xswait_timeout_callback: backend /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: deregister unregistered
2022-01-13 01:39:51 UTC libxl: error: libxl_device.c:1105:device_backend_callback: unable to add device with path /local/domain/2/backend/vif/0/0
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: device /local/domain/2/backend/vif/0/0 add failed
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy

the xenstore content for the backend:

# xenstore-ls /local/domain/2/backend/vif/0
0 = ""
 frontend = "/local/domain/0/device/vif/0"
 frontend-id = "0"
 online = "1"
 state = "4"
 script = "/etc/xen/scripts/vif-zynstra"
 vifname = "dom0.0"
 mac = "00:16:3e:6c:de:82"
 bridge = "cluster"
 handle = "0"
 type = "vif"
 feature-sg = "1"
 feature-gso-tcpv4 = "1"
 feature-gso-tcpv6 = "1"
 feature-ipv6-csum-offload = "1"
 feature-rx-copy = "1"
 feature-rx-flip = "0"
 feature-multicast-control = "1"
 feature-dynamic-multicast-control = "1"
 feature-split-event-channels = "1"
 multi-queue-max-queues = "2"
 feature-ctrl-ring = "1"
 hotplug-status = "connected"

My guess is that the libxl callback is started waiting for the backend 
state key to be set to XenbusStateInitWait (2) but the frontend in dom0 
has already triggered the backend to transition to XenbusStateConnected 
(4) and therefore it does not successfully complete.

Does this seem a reasonable explanation for the problem and what would 
the best approach to try and solve it?

Thanks,
James


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-24 13:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13 11:19 possible kernel/libxl race with xl network-attach James Dingwall
2022-01-13 16:11 ` Roger Pau Monné
2022-01-21 13:34   ` James Dingwall
2022-01-21 14:00     ` Roger Pau Monné
2022-01-21 15:05       ` James Dingwall
2022-01-24  9:07         ` Roger Pau Monné
2022-01-24 13:38           ` James Dingwall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.