* [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes)
@ 2019-01-25 14:50 Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 1/5] flight_otherjob: Use confess rather than die Ian Jackson
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 14:50 UTC
  To: xen-devel; +Cc: Ian Jackson, Roger Pau Monné

This replaces the `DO NOT APPLY' patch 26/26 from the part 1 series.
It also contains some other fixes/improvements.

Ian Jackson (5):
  flight_otherjob: Use confess rather than die
  power: ssh: Fix handling of $delay
  power: ssh: Reduce timeout for script fragment
  power: ts-freebsd-host-install: Use power_reboot_attempts
  power: ssh: Wait for the target to appear to go down

 Osstest.pm              |  4 +++-
 Osstest/PDU/ssh.pm      |  4 ++--
 ts-freebsd-host-install | 33 +++++++++++++++++++++++++--------
 3 files changed, 30 insertions(+), 11 deletions(-)

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [OSSTEST PATCH 1/5] flight_otherjob: Use confess rather than die
  2019-01-25 14:50 [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes) Ian Jackson
@ 2019-01-25 14:50 ` Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 2/5] power: ssh: Fix handling of $delay Ian Jackson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 14:50 UTC
  To: xen-devel; +Cc: Ian Jackson

When this error trips it is usually because the call site looked up an
unset runvar and it can be hard to tell what that runvar was.

If we use confess we will at least find out the calling line number...

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 Osstest.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Osstest.pm b/Osstest.pm
index 85a6e78b..92b1a0ea 100644
--- a/Osstest.pm
+++ b/Osstest.pm
@@ -22,6 +22,7 @@ use warnings;
 use POSIX;
 use File::Basename;
 use IO::File;
+use Carp;
 
 BEGIN {
     use Exporter ();
@@ -370,7 +371,7 @@ sub flight_otherjob ($$) {
     my ($thisflight, $otherflightjob) = @_;    
     return $otherflightjob =~ m/^([^.]+)\.([^.]+)$/ ? ($1,$2) :
            $otherflightjob =~ m/^\.?([^.]+)$/ ? ($thisflight,$1) :
-           die "$otherflightjob ?";
+           confess "$otherflightjob ?";
 }
 
 sub other_revision_job_suffix ($$) {
@@ -444,3 +445,4 @@ sub show_abs_time ($) {
 }
 
 1;
+
-- 
2.11.0
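
Editorial sketch, not part of the patch: the parsing rule visible in the diff context above, rewritten in Python rather than osstest's Perl. The regular expressions are taken from the diff; the choice of ValueError is an assumption of this sketch. Python exceptions carry a traceback by default, which is roughly what switching from die to Carp's confess adds on the Perl side: the failure report shows the call site, not only the bad value.

```python
import re


def flight_otherjob(thisflight, otherflightjob):
    """Split a 'flight.job' spec into (flight, job).

    A bare 'job' or '.job' refers to a job in thisflight.
    """
    m = re.fullmatch(r"([^.]+)\.([^.]+)", otherflightjob)
    if m:
        return m.group(1), m.group(2)
    m = re.fullmatch(r"\.?([^.]+)", otherflightjob)
    if m:
        return thisflight, m.group(1)
    # An empty spec (for instance from an unset runvar) ends up here.
    raise ValueError(f"{otherflightjob!r} ?")
```

An empty string, the likely result of looking up an unset runvar, matches neither pattern and raises, so the traceback then identifies which caller passed it.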



* [OSSTEST PATCH 2/5] power: ssh: Fix handling of $delay
  2019-01-25 14:50 [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes) Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 1/5] flight_otherjob: Use confess rather than die Ian Jackson
@ 2019-01-25 14:50 ` Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 3/5] power: ssh: Reduce timeout for script fragment Ian Jackson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 14:50 UTC
  To: xen-devel; +Cc: Ian Jackson

The script fragment contains a reference to $delay, which is a Perl
variable, not a variable in the script fragment.  We therefore must
not ''-quote the here-document, so that Perl interpolates $delay.

Without this, the ssh method will often fail spuriously: the exiting
parent (which will signal success back to the osstest controller)
races with the attempt to reboot.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 Osstest/PDU/ssh.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
index ac1eb919..cfcf8f85 100644
--- a/Osstest/PDU/ssh.pm
+++ b/Osstest/PDU/ssh.pm
@@ -47,7 +47,7 @@ sub pdu_power_state {
 
     my $delay = 5;
 
-    target_cmd_root($mo->{Host}, <<'END', 60);
+    target_cmd_root($mo->{Host}, <<END, 60);
  set -e
  type reboot
  exec >>/var/log/osstest-reboot.log
-- 
2.11.0
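
Editorial sketch, not part of the patch: a Python analogue of the quoting difference, using string.Template to stand in for Perl's here-document interpolation. The SCRIPT text is hypothetical; the real fragment is only partly visible in the diff context. With the quoted form, the literal text "$delay" is sent to the target, where no such shell variable is set; with the unquoted form, the controller substitutes the value before sending.

```python
from string import Template

# Hypothetical script fragment; only the use of $delay matters here.
SCRIPT = "sleep $delay\nreboot\n"


def render_quoted(script, delay):
    """Analogue of Perl's <<'END': the text is passed through verbatim."""
    return script


def render_interpolated(script, delay):
    """Analogue of Perl's <<END: $delay is replaced on the controller."""
    return Template(script).substitute(delay=delay)
```

For delay = 5, render_quoted still contains the literal "$delay", whereas render_interpolated produces "sleep 5".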



* [OSSTEST PATCH 3/5] power: ssh: Reduce timeout for script fragment
  2019-01-25 14:50 [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes) Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 1/5] flight_otherjob: Use confess rather than die Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 2/5] power: ssh: Fix handling of $delay Ian Jackson
@ 2019-01-25 14:50 ` Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 4/5] power: ts-freebsd-host-install: Use power_reboot_attempts Ian Jackson
  2019-01-25 14:50 ` [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down Ian Jackson
  4 siblings, 0 replies; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 14:50 UTC
  To: xen-devel; +Cc: Ian Jackson

The script fragment is really not going to take a minute to run;
probably much less.  A shorter timeout will save time when we fall back.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 Osstest/PDU/ssh.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
index cfcf8f85..16410937 100644
--- a/Osstest/PDU/ssh.pm
+++ b/Osstest/PDU/ssh.pm
@@ -47,7 +47,7 @@ sub pdu_power_state {
 
     my $delay = 5;
 
-    target_cmd_root($mo->{Host}, <<END, 60);
+    target_cmd_root($mo->{Host}, <<END, 30);
  set -e
  type reboot
  exec >>/var/log/osstest-reboot.log
-- 
2.11.0



* [OSSTEST PATCH 4/5] power: ts-freebsd-host-install: Use power_reboot_attempts
  2019-01-25 14:50 [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes) Ian Jackson
                   ` (2 preceding siblings ...)
  2019-01-25 14:50 ` [OSSTEST PATCH 3/5] power: ssh: Reduce timeout for script fragment Ian Jackson
@ 2019-01-25 14:50 ` Ian Jackson
  2019-01-25 17:16   ` Roger Pau Monné
  2019-01-25 14:50 ` [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down Ian Jackson
  4 siblings, 1 reply; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 14:50 UTC
  To: xen-devel; +Cc: Ian Jackson, Roger Pau Monné

We look at the installer environment uptime, to
 | check that this is the installer environment
as requested by the comment
 | in particular $await must only succeed if the host really did
 | reboot into the boot environment that $await expects.
near the top of power_reboot_attempts

CC: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 ts-freebsd-host-install | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/ts-freebsd-host-install b/ts-freebsd-host-install
index 53daeefc..3c3e9c34 100755
--- a/ts-freebsd-host-install
+++ b/ts-freebsd-host-install
@@ -259,14 +259,31 @@ END
 }
 
 # Switch off, setup PXE and switch on to the installer
-power_state($ho, 0);
-setup_netboot_installer();
-power_cycle_sleep($ho);
-power_state($ho, 1);
-
-# Wait for the host to finish booting
-logm("Waiting for the installer to boot");
-await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
+power_reboot_attempts($ho, sub {
+    setup_netboot_installer();
+}, sub {
+    # Wait for the host to finish booting
+    logm("Waiting for the installer to boot");
+    my $wait_start = time;
+    await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
+
+    # We want to check that we actually rebooted.  We do this by
+    # comparing the (putative) installer environment's uptime,
+    # with the time we spent waiting for it to appear.
+    my $timeoutput = target_cmd_output_root($ho,
+        'date +%s; sysctl -n kern.boottime');
+    logm("got:\n$timeoutput");
+    $timeoutput =~ s{^(\d+)\n}{} or die "date: $timeoutput ?";
+    my $target_now = $1;
+    $timeoutput =~ m{\ssec\s?=\s?(\d+)\b} or die "sysctl: $timeoutput ?";
+    my $target_boottime = $1;
+
+    my $uptime = $target_now - $target_boottime;
+    my $elapsed = time - $wait_start;
+    logm("uptime=$uptime elapsed=$elapsed");
+    $uptime < $elapsed or die "uptime >= elapsed";
+
+}, undef, 'install');
 
 if ($bootonly) {
     hostprop_putative_record($ho, "MemdiskAppend", $memdisk_append)
-- 
2.11.0
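
Editorial sketch, not part of the patch: the uptime check from the diff above, rewritten in Python rather than osstest's Perl. The regular expressions mirror the ones in the patch. The sample sysctl line in the usage note is an assumption about the kern.boottime output format, and the function names are made up for this sketch.

```python
import re


def parse_uptime(output):
    """Return target uptime in seconds from the combined output of
    'date +%s; sysctl -n kern.boottime'."""
    m = re.match(r"(\d+)\n", output)
    if not m:
        raise ValueError(f"date: {output!r} ?")
    target_now = int(m.group(1))
    rest = output[m.end():]
    m = re.search(r"\ssec\s?=\s?(\d+)\b", rest)
    if not m:
        raise ValueError(f"sysctl: {rest!r} ?")
    target_boottime = int(m.group(1))
    return target_now - target_boottime


def rebooted_during_wait(uptime, elapsed):
    """True if the target booted after we started waiting for it."""
    return uptime < elapsed
```

Given the (assumed) output "1548425999\n{ sec = 1548425879, usec = 100044 } Fri Jan 25 14:17:59 2019\n", parse_uptime yields 120 seconds. If we waited 300 seconds, the host must have booted during the wait; an uptime of 4000 seconds would mean we are still talking to the previous environment.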



* [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down
  2019-01-25 14:50 [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes) Ian Jackson
                   ` (3 preceding siblings ...)
  2019-01-25 14:50 ` [OSSTEST PATCH 4/5] power: ts-freebsd-host-install: Use power_reboot_attempts Ian Jackson
@ 2019-01-25 14:50 ` Ian Jackson
  2019-01-25 17:22   ` Roger Pau Monné
  4 siblings, 1 reply; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 14:50 UTC
  To: xen-devel; +Cc: Ian Jackson, Roger Pau Monné

When we `power on' with the ssh method, we actually run ssh reboot.

On some systems (notably, FreeBSD) the kernel does not simply reboot
immediately even with the runes we provide here, ie for FreeBSD
  reboot -nq
Eg, I have seen reboots with several messages like this:
  Jan 25 14:17:59.100044 Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done

This can result in the ssh method failing spuriously, because the
`power on' appears to complete while the host is still up in the
previous environment.  In one of my test runs I saw an ssh to the host
succeed, and print the uptime (of the existing environment), between
the reboot command being issued and the host actually rebooting.

So, wait (up to just over a minute) until the host does not respond to
ping.  (target_await_down runs ping -c 5.)

CC: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 Osstest/PDU/ssh.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
index 16410937..d68d3880 100644
--- a/Osstest/PDU/ssh.pm
+++ b/Osstest/PDU/ssh.pm
@@ -62,7 +62,7 @@ sub pdu_power_state {
  )&
 END
 
-    sleep($delay);
+    target_await_down($mo->{Host}, $delay + 70);
 }
 
 sub instantaneous {
-- 
2.11.0
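
Editorial sketch, not part of the patch: the general shape of "wait until the host stops responding, with a deadline", in Python rather than osstest's Perl. This is not the implementation of target_await_down, whose internals are not shown in this thread beyond the note that it runs ping -c 5. The is_up probe, the polling interval, and the injectable clock and sleep parameters are all assumptions of this sketch.

```python
import time


def await_down(is_up, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_up() until it returns False.

    Raises TimeoutError if the target still answers once timeout
    seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        if not is_up():
            return
        if clock() >= deadline:
            raise TimeoutError(f"target still up after {timeout}s")
        sleep(interval)
```

With $delay = 5 as in the patch context, the timeout passed to target_await_down is 75 seconds, matching the "just over a minute" in the commit message. Unlike a fixed sleep, this returns as soon as the host disappears and fails loudly if it never does.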



* Re: [OSSTEST PATCH 4/5] power: ts-freebsd-host-install: Use power_reboot_attempts
  2019-01-25 14:50 ` [OSSTEST PATCH 4/5] power: ts-freebsd-host-install: Use power_reboot_attempts Ian Jackson
@ 2019-01-25 17:16   ` Roger Pau Monné
  0 siblings, 0 replies; 9+ messages in thread
From: Roger Pau Monné @ 2019-01-25 17:16 UTC
  To: Ian Jackson; +Cc: xen-devel

On Fri, Jan 25, 2019 at 02:50:45PM +0000, Ian Jackson wrote:
> We look at the installer environment uptime, to
>  | check that this is the installer environment
> as requested by the comment
>  | in particular $await must only succeed if the host really did
>  | reboot into the boot environment that $await expects.
> near the top of power_reboot_attempts
> 
> CC: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks!


* Re: [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down
  2019-01-25 14:50 ` [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down Ian Jackson
@ 2019-01-25 17:22   ` Roger Pau Monné
  2019-01-25 17:29     ` Ian Jackson
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2019-01-25 17:22 UTC
  To: Ian Jackson; +Cc: xen-devel

On Fri, Jan 25, 2019 at 02:50:46PM +0000, Ian Jackson wrote:
> When we `power on' with the ssh method, we actually run ssh reboot.
> 
> On some systems (notably, FreeBSD) the kernel does not simply reboot
> immediately even with the runes we provide here, ie for FreeBSD
>   reboot -nq
> Eg, I have seen reboots with several messages like this:
>   Jan 25 14:17:59.100044 Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done

So it seems like reboot -nq doesn't behave as expected...

> This can result in the ssh method failing spuriously, because the
> `power on' appears to complete while the host is still up in the
> previous environment.  In one of my test runs I saw an ssh to the host
> succeed, and print the uptime (of the existing environment), between
> the reboot command being issued and the host actually rebooting.
> 
> So, wait (up to just over a minute) until the host does not respond to
> ping.  (target_await_down runs ping -c 5.)
> 
> CC: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.


* Re: [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down
  2019-01-25 17:22   ` Roger Pau Monné
@ 2019-01-25 17:29     ` Ian Jackson
  0 siblings, 0 replies; 9+ messages in thread
From: Ian Jackson @ 2019-01-25 17:29 UTC
  To: Roger Pau Monne; +Cc: xen-devel

Roger Pau Monne writes ("Re: [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down"):
> On Fri, Jan 25, 2019 at 02:50:46PM +0000, Ian Jackson wrote:
> > On some systems (notably, FreeBSD) the kernel does not simply reboot
> > immediately even with the runes we provide here, ie for FreeBSD
> >   reboot -nq
> > Eg, I have seen reboots with several messages like this:
> >   Jan 25 14:17:59.100044 Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
> 
> So it seems like reboot -nq doesn't behave as expected...

Mmm.  Ah well.

Thanks for the reviews.

Ian.
