All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Error reporting for hotplug scripts
@ 2007-02-07  2:55 Daniel P. Berrange
  2007-02-07 14:35 ` Ian Pratt
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel P. Berrange @ 2007-02-07  2:55 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3262 bytes --]

The current way the hotplug scripts report errors is sub-optimal, rarely
giving the user enough information to diagnose the problem. This is 
particularly true of the networking scripts whose typical failure mode
is very obsecure. eg, if I configure a guest to use bridge7 which does
not exist, I will be told:

  # xm create demo
  ....60 seconds passes...
  Error: Device 0 (vif) could not be connected. Hotplug scripts not working.

Having looked at the hotplug scripts & the way things tie into XenD it looks
like we can do much better. Currently one of 3 things happens:

 - The script explicitly calls 'fatal' which sets the 'hotplug-status'
   field in xenstore to 'error'. The real error message is sent to
   syslog, which if you're lucky, may end up in /var/log/messages

 - A command exits with non-zero exit status, resulting in the signal
   trap being executed. This in turn calls 'fatal', with no useful
   error messages

 - For block devices where a sharing mode violation is detected, the
   'hotplug-status' is set to 'busy' and a real error message is also
   set in xenstored under the 'hotplug-error' field.

The latter case is the interesting one, because when it sees a 'busy' status
code, XenD will extract the actual error message from 'hotplug-error' and
feed it all the way back to the end user/tool. 

Basically I want the first two cases to also behave like the error, so that
whenever anything goes wrong in the hotplug scripts, the user always gets
a useful error message back. It also avoids having to wait for the 60 second
hotplug timeout.

The first case was easy to fix in this way - simply change XenD so that
it always looks for 'hotplug-error' for 'fatal' codes as well as 'busy'
codes. 

The second case is a little tricker because we have to identify commands
in the hotplug scripts which are likely to fail, and add explicit handling
to enable meaningful error feedback. I've done such analysis for the 
vif-bridge script and realized that the most likely cause of failure for
any command in this script is a missing bridge device. It is trivial to
do an upfront check for existance of the bridge device and immediately
feed an error back to the user. 

I'm using 'ip link show $bridge' to detect whether the bridge exists or
not. I expect this is reasonably portable, but as always its worth people
double-checking it works on their particular distro.

I'm attaching a patch which implements both of these fixups. The net
result is that now when creating a guest with a config refering to a
non-existant bridge device I see:

  # xm create demo
  Error: Device 0 (vif) could not be connected. Could not find bridge device xenbr7

There may well be other places in the hotplug scripts which need fixing
up - we can address these as they turn up if we still see people getting
the generic 'Hotplug scripts no working' message.

    Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 

[-- Attachment #2: xen-hotplug-error-reporting.patch --]
[-- Type: text/plain, Size: 3957 bytes --]

diff -r e212203d7d34 tools/python/xen/xend/server/DevController.py
--- a/tools/python/xen/xend/server/DevController.py	Tue Feb 06 23:01:35 2007 +0000
+++ b/tools/python/xen/xend/server/DevController.py	Fri Feb 02 10:32:00 2007 -0500
@@ -153,9 +153,9 @@ class DevController:
         log.debug("Waiting for %s.", devid)
 
         if not self.hotplug:
-            return 
-        
-        status = self.waitForBackend(devid)
+            return
+
+        (status, err) = self.waitForBackend(devid)
 
         if status == Timeout:
             self.destroyDevice(devid, False)
@@ -165,25 +165,22 @@ class DevController:
 
         elif status == Error:
             self.destroyDevice(devid, False)
-            raise VmError("Device %s (%s) could not be connected. "
-                          "Backend device not found." %
-                          (devid, self.deviceClass))
-
+            if err is None:
+                raise VmError("Device %s (%s) could not be connected. "
+                              "Backend device not found." %
+                              (devid, self.deviceClass))
+            else:
+                raise VmError("Device %s (%s) could not be connected. "
+                              "%s" % (devid, self.deviceClass, err))
         elif status == Missing:
             # Don't try to destroy the device; it's already gone away.
             raise VmError("Device %s (%s) could not be connected. "
                           "Device not found." % (devid, self.deviceClass))
 
         elif status == Busy:
-            err = None
-            frontpath = self.frontendPath(devid)
-            backpath = xstransact.Read(frontpath, "backend")
-            if backpath:
-                err = xstransact.Read(backpath, HOTPLUG_ERROR_NODE)
-            if not err:
+            self.destroyDevice(devid, False)
+            if err is None:
                 err = "Busy."
-                
-            self.destroyDevice(devid, False)
             raise VmError("Device %s (%s) could not be connected.\n%s" %
                           (devid, self.deviceClass, err))
 
@@ -478,17 +475,21 @@ class DevController:
         frontpath = self.frontendPath(devid)
         backpath = xstransact.Read(frontpath, "backend")
 
+
         if backpath:
             statusPath = backpath + '/' + HOTPLUG_STATUS_NODE
             ev = Event()
             result = { 'status': Timeout }
-            
+
             xswatch(statusPath, hotplugStatusCallback, ev, result)
 
             ev.wait(DEVICE_CREATE_TIMEOUT)
-            return result['status']
-        else:
-            return Missing
+
+            err = xstransact.Read(backpath, HOTPLUG_ERROR_NODE)
+
+            return (result['status'], err)
+        else:
+            return (Missing, None)
 
 
     def backendPath(self, backdom, devid):
diff -r e212203d7d34 tools/examples/vif-bridge
--- a/tools/examples/vif-bridge	Tue Feb 06 23:01:35 2007 +0000
+++ b/tools/examples/vif-bridge	Tue Feb 06 21:04:31 2007 -0500
@@ -46,6 +46,13 @@ then
   fi
 fi
 
+RET=0
+ip link show $bridge 1>/dev/null 2>&1 || RET=1
+if [ "$RET" -eq 1 ]
+then
+    fatal "Could not find bridge device $bridge"
+fi
+
 case "$command" in
     online)
 	setup_bridge_port "$vif"
diff -r e212203d7d34 tools/examples/xen-hotplug-common.sh
--- a/tools/examples/xen-hotplug-common.sh	Tue Feb 06 23:01:35 2007 +0000
+++ b/tools/examples/xen-hotplug-common.sh	Tue Feb 06 21:03:49 2007 -0500
@@ -28,14 +28,15 @@ unset $(set | grep ^LC_ | cut -d= -f1)
 unset $(set | grep ^LC_ | cut -d= -f1)
 
 fatal() {
-  xenstore_write "$XENBUS_PATH"/hotplug-status error
+  xenstore_write "$XENBUS_PATH/hotplug-error" "$*" \
+                 "$XENBUS_PATH/hotplug-status" error
   log err "$@"
   exit 1
 }
 
 success() {
   # Tell DevController that backend is "connected"
-  xenstore_write "$XENBUS_PATH"/hotplug-status connected
+  xenstore_write "$XENBUS_PATH/hotplug-status" connected
 }
 
 do_or_die() {

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 2+ messages in thread

* RE: [PATCH] Error reporting for hotplug scripts
  2007-02-07  2:55 [PATCH] Error reporting for hotplug scripts Daniel P. Berrange
@ 2007-02-07 14:35 ` Ian Pratt
  0 siblings, 0 replies; 2+ messages in thread
From: Ian Pratt @ 2007-02-07 14:35 UTC (permalink / raw)
  To: Daniel P. Berrange, xen-devel

> The current way the hotplug scripts report errors is sub-optimal,
rarely
> giving the user enough information to diagnose the problem. This is
> particularly true of the networking scripts whose typical failure mode
> is very obsecure. 
...
> I'm attaching a patch which implements both of these fixups. 

That's useful progress, thanks!

Please can folk using some other distros try this patch out and provide
feedback.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2007-02-07 14:35 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-07  2:55 [PATCH] Error reporting for hotplug scripts Daniel P. Berrange
2007-02-07 14:35 ` Ian Pratt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.