All of lore.kernel.org
 help / color / mirror / Atom feed
* [XTF PATCH] xtf-runner: fix two synchronisation issues
@ 2016-07-29 12:07 Wei Liu
  2016-07-29 12:43 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Wei Liu @ 2016-07-29 12:07 UTC (permalink / raw)
  To: Xen-devel; +Cc: andrew.cooper3, Wei Liu, Ian Jackson

There were two synchronisation issues for the old code:

1. There was no guarantee that guest console was ready before "xl
   console" invocation.
2. There was no guarantee that runner wouldn't not exit before all test
   guests were gone.

For xtf-runner to be suitable to run in an automatic testing system like
osstest, we need to eliminate those two issues, otherwise we risk seeing
failures that are caused by races.

To fix these two issues:

1. Use "xl create -F" to force the process that creates a guest to
   stay until the guest is dead. With this we can wait for that
   particular subprocess in runner to make sure the guest is gone.
2. Poll xenstore guest console node to make sure console is available
   before "xl console" invocation.

To avoid the runner loops forever in that console checking loop, use
SIGALRM to make it time out after 5 seconds.

Redirect output that we're not interested in to DEVNULL.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xtf-runner | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/xtf-runner b/xtf-runner
index 66b2fb7..3296cdf 100755
--- a/xtf-runner
+++ b/xtf-runner
@@ -7,7 +7,7 @@ xtf-runner - A utility for enumerating and running XTF tests.
 Currently assumes the presence and availability of the `xl` toolstack.
 """
 
-import sys, os, os.path as path
+import sys, os, os.path as path, signal
 
 from optparse import OptionParser
 from subprocess import Popen, PIPE, call as subproc_call, check_output
@@ -17,6 +17,11 @@ try:
 except ImportError:
     import simplejson as json
 
+try:
+    from subprocess import DEVNULL
+except ImportError:
+    DEVNULL = open(os.devnull, 'wb')
+
 # All results of a test, keep in sync with C code report.h.
 # Note that warning is not a result on its own.
 all_results = ('SUCCESS', 'SKIP', 'ERROR', 'FAILURE')
@@ -41,6 +46,10 @@ all_environments = pv_environments + hvm_environments
 class RunnerError(Exception):
     """ Errors relating to xtf-runner itself """
 
+def sigalrm_handler(signum, frame):
+    """ Signal handler for SIGALRM """
+    raise RunnerError("Operation timed out")
+
 def open_test_info():
     """ Open and collate each test-info.json """
 
@@ -132,24 +141,53 @@ def list_tests(args):
 
         print name
 
+# A list of processes that we need to wait until they exits.
+wait_list = []
 
 def run_test(test):
     """ Run a specific test """
 
+    console_path = '/local/domain/0/backend/console'
+    # Setup watch for console
+    cmd = ['xenstore-watch', console_path]
+    print "Executing '%s'" % (" ".join(cmd), )
+    xs_watch = Popen(cmd, stdout = PIPE, stderr = PIPE)
+
     _, _, name = test.split('-', 2)
 
     cfg = path.join("tests", name, test + ".cfg")
 
-    cmd = ['xl', 'create', '-p', cfg]
+    cmd = ['xl', 'create', '-Fp', cfg]
 
     print "Executing '%s'" % (" ".join(cmd), )
-    rc = subproc_call(cmd)
-    if rc:
-        raise RunnerError("Failed to create VM")
+    wait = Popen(cmd, stdout = DEVNULL, stderr = DEVNULL)
+    wait_list.append(wait)
+
+    # Wait up to 5 seconds for console to show up
+    signal.alarm(5)
+    while True:
+        l = xs_watch.stdout.readline()
+        domid = l[len(console_path)+1:].split('/')[0]
+        if domid == '': continue
+
+        cmd = ['xenstore-read', '/local/domain/'+domid+'/name']
+        print "Executing '%s'" % (" ".join(cmd), )
+        gname = check_output(cmd).splitlines()[0]
+        if gname != test: continue
+
+        cmd = ['xenstore-read', '/local/domain/'+domid+'/console/tty']
+        print "Executing '%s'" % (" ".join(cmd), )
+        con = check_output(cmd).splitlines()[0]
+        if con != '': break
+
+    # Tear down watch and timer
+    signal.alarm(0)
+    xs_watch.kill()
 
     cmd = ['xl', 'console', test]
     print "Executing '%s'" % (" ".join(cmd), )
     console = Popen(cmd, stdout = PIPE)
+    wait_list.append(console)
 
     cmd = ['xl', 'unpause', test]
     print "Executing '%s'" % (" ".join(cmd), )
@@ -327,12 +365,17 @@ def main():
     opts, args = parser.parse_args()
 
     if opts.list_tests:
-        return list_tests(args)
+        ret = list_tests(args)
     else:
-        return run_tests(args)
+        ret = run_tests(args)
+
+    for child in wait_list: child.wait()
+
+    return ret
 
 
 if __name__ == "__main__":
+    signal.signal(signal.SIGALRM, sigalrm_handler)
     try:
         sys.exit(main())
     except RunnerError, e:
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 12:07 [XTF PATCH] xtf-runner: fix two synchronisation issues Wei Liu
@ 2016-07-29 12:43 ` Andrew Cooper
  2016-07-29 12:58   ` Wei Liu
  2016-07-29 13:27 ` [XTF PATCH] xtf-runner: fix two synchronisation issues Andrew Cooper
  2016-07-29 14:21 ` Ian Jackson
  2 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 12:43 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Ian Jackson

On 29/07/16 13:07, Wei Liu wrote:
> There were two synchronisation issues for the old code:
>
> 1. There was no guarantee that guest console was ready before "xl
>    console" invocation.
> 2. There was no guarantee that runner wouldn't not exit before all test
>    guests were gone.

Sorry, but I can't parse this.

The runner existing before xl has torn down the guest is very
deliberate, because some part of hvm guests is terribly slow to tear
down; waiting synchronously for teardown tripled the wallclock time to
run a load of tests back-to-back.

> @@ -132,24 +141,53 @@ def list_tests(args):
>  
>          print name
>  
> +# A list of processes that we need to wait until they exits.
> +wait_list = []
>  
>  def run_test(test):
>      """ Run a specific test """
>  
> +    console_path = '/local/domain/0/backend/console'
> +    # Setup watch for console
> +    cmd = ['xenstore-watch', console_path]
> +    print "Executing '%s'" % (" ".join(cmd), )
> +    xs_watch = Popen(cmd, stdout = PIPE, stderr = PIPE)
> +
>      _, _, name = test.split('-', 2)
>  
>      cfg = path.join("tests", name, test + ".cfg")
>  
> -    cmd = ['xl', 'create', '-p', cfg]
> +    cmd = ['xl', 'create', '-Fp', cfg]
>  
>      print "Executing '%s'" % (" ".join(cmd), )
> -    rc = subproc_call(cmd)
> -    if rc:
> -        raise RunnerError("Failed to create VM")
> +    wait = Popen(cmd, stdout = DEVNULL, stderr = DEVNULL)

I would name this "create" rather than "wait"

> +    wait_list.append(wait)
> +
> +    # Wait up to 5 seconds for console to show up
> +    signal.alarm(5)
> +    while True:
> +        l = xs_watch.stdout.readline()
> +        domid = l[len(console_path)+1:].split('/')[0]
> +        if domid == '': continue

Please put continues and breaks on newlines.

if not domid:
    continue

> +
> +        cmd = ['xenstore-read', '/local/domain/'+domid+'/name']
> +        print "Executing '%s'" % (" ".join(cmd), )
> +        gname = check_output(cmd).splitlines()[0]
> +        if gname != test: continue
> +
> +        cmd = ['xenstore-read', '/local/domain/'+domid+'/console/tty']
> +        print "Executing '%s'" % (" ".join(cmd), )
> +        con = check_output(cmd).splitlines()[0]
> +        if con != '': break

Somewhere in this loop, you should poll the call to xl create.  In the
case that there is an xl configuration error, this setup will wait for 5
seconds, then discard all trace of the error.

> +
> +    # Tear down watch and timer
> +    signal.alarm(0)
> +    xs_watch.kill()
>  
>      cmd = ['xl', 'console', test]
>      print "Executing '%s'" % (" ".join(cmd), )
>      console = Popen(cmd, stdout = PIPE)
> +    wait_list.append(console)
>  
>      cmd = ['xl', 'unpause', test]
>      print "Executing '%s'" % (" ".join(cmd), )
> @@ -327,12 +365,17 @@ def main():
>      opts, args = parser.parse_args()
>  
>      if opts.list_tests:
> -        return list_tests(args)
> +        ret = list_tests(args)
>      else:
> -        return run_tests(args)
> +        ret = run_tests(args)
> +
> +    for child in wait_list: child.wait()

On a newline please.

You should use stdout=PIPE, stderr=STDOUT (to link stdout and stderr to
the same fd) and .communicate() to get the results.  In any case that a
child exists non-zero, you mustn't discard the error.  As a result, I
don't think you need DEVNULL any more.

~Andrew

> +
> +    return ret
>  
>  
>  if __name__ == "__main__":
> +    signal.signal(signal.SIGALRM, sigalrm_handler)
>      try:
>          sys.exit(main())
>      except RunnerError, e:


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 12:43 ` Andrew Cooper
@ 2016-07-29 12:58   ` Wei Liu
  2016-07-29 13:06     ` Andrew Cooper
  2016-07-29 14:31     ` Ian Jackson
  0 siblings, 2 replies; 45+ messages in thread
From: Wei Liu @ 2016-07-29 12:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Ian Jackson

On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> On 29/07/16 13:07, Wei Liu wrote:
> > There were two synchronisation issues for the old code:
> >
> > 1. There was no guarantee that guest console was ready before "xl
> >    console" invocation.
> > 2. There was no guarantee that runner wouldn't not exit before all test

s/not//

> >    guests were gone.
> 
> Sorry, but I can't parse this.
> 
> The runner existing before xl has torn down the guest is very
> deliberate, because some part of hvm guests is terribly slow to tear
> down; waiting synchronously for teardown tripled the wallclock time to
> run a load of tests back-to-back.
> 

Then you won't know if a guest is leaked or it is being slowly destroyed
when a dead guest shows up in the snapshot of 'xl list'.

Also consider that would make back-to-back tests that happen to have a
guest that has the same name as the one in previous test fail.

I don't think getting blocked for a few more seconds is a big issue.
It's is important to eliminate such race conditions so that osstest can
work properly.

> > @@ -132,24 +141,53 @@ def list_tests(args):
> >  
> >          print name
> >  
> > +# A list of processes that we need to wait until they exits.
> > +wait_list = []
> >  
> >  def run_test(test):
> >      """ Run a specific test """
> >  
> > +    console_path = '/local/domain/0/backend/console'
> > +    # Setup watch for console
> > +    cmd = ['xenstore-watch', console_path]
> > +    print "Executing '%s'" % (" ".join(cmd), )
> > +    xs_watch = Popen(cmd, stdout = PIPE, stderr = PIPE)
> > +
> >      _, _, name = test.split('-', 2)
> >  
> >      cfg = path.join("tests", name, test + ".cfg")
> >  
> > -    cmd = ['xl', 'create', '-p', cfg]
> > +    cmd = ['xl', 'create', '-Fp', cfg]
> >  
> >      print "Executing '%s'" % (" ".join(cmd), )
> > -    rc = subproc_call(cmd)
> > -    if rc:
> > -        raise RunnerError("Failed to create VM")
> > +    wait = Popen(cmd, stdout = DEVNULL, stderr = DEVNULL)
> 
> I would name this "create" rather than "wait"
> 

NP.

> > +    wait_list.append(wait)
> > +
> > +    # Wait up to 5 seconds for console to show up
> > +    signal.alarm(5)
> > +    while True:
> > +        l = xs_watch.stdout.readline()
> > +        domid = l[len(console_path)+1:].split('/')[0]
> > +        if domid == '': continue
> 
> Please put continues and breaks on newlines.
> 
> if not domid:
>     continue

Fixed.

> 
> > +
> > +        cmd = ['xenstore-read', '/local/domain/'+domid+'/name']
> > +        print "Executing '%s'" % (" ".join(cmd), )
> > +        gname = check_output(cmd).splitlines()[0]
> > +        if gname != test: continue
> > +
> > +        cmd = ['xenstore-read', '/local/domain/'+domid+'/console/tty']
> > +        print "Executing '%s'" % (" ".join(cmd), )
> > +        con = check_output(cmd).splitlines()[0]
> > +        if con != '': break
> 
> Somewhere in this loop, you should poll the call to xl create.  In the
> case that there is an xl configuration error, this setup will wait for 5
> seconds, then discard all trace of the error.
> 

Right. I will see what I can do here.

> > +
> > +    # Tear down watch and timer
> > +    signal.alarm(0)
> > +    xs_watch.kill()
> >  
> >      cmd = ['xl', 'console', test]
> >      print "Executing '%s'" % (" ".join(cmd), )
> >      console = Popen(cmd, stdout = PIPE)
> > +    wait_list.append(console)
> >  
> >      cmd = ['xl', 'unpause', test]
> >      print "Executing '%s'" % (" ".join(cmd), )
> > @@ -327,12 +365,17 @@ def main():
> >      opts, args = parser.parse_args()
> >  
> >      if opts.list_tests:
> > -        return list_tests(args)
> > +        ret = list_tests(args)
> >      else:
> > -        return run_tests(args)
> > +        ret = run_tests(args)
> > +
> > +    for child in wait_list: child.wait()
> 
> On a newline please.
> 
> You should use stdout=PIPE, stderr=STDOUT (to link stdout and stderr to
> the same fd) and .communicate() to get the results.  In any case that a
> child exists non-zero, you mustn't discard the error.  As a result, I
> don't think you need DEVNULL any more.
> 

OK, this makes sense.

Wei.

> ~Andrew
> 
> > +
> > +    return ret
> >  
> >  
> >  if __name__ == "__main__":
> > +    signal.signal(signal.SIGALRM, sigalrm_handler)
> >      try:
> >          sys.exit(main())
> >      except RunnerError, e:
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 12:58   ` Wei Liu
@ 2016-07-29 13:06     ` Andrew Cooper
  2016-07-29 13:12       ` Wei Liu
  2016-07-29 14:31     ` Ian Jackson
  1 sibling, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 13:06 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Ian Jackson

On 29/07/16 13:58, Wei Liu wrote:
> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
>> On 29/07/16 13:07, Wei Liu wrote:
>>> There were two synchronisation issues for the old code:
>>>
>>> 1. There was no guarantee that guest console was ready before "xl
>>>    console" invocation.
>>> 2. There was no guarantee that runner wouldn't not exit before all test
> s/not//
>
>>>    guests were gone.
>> Sorry, but I can't parse this.
>>
>> The runner existing before xl has torn down the guest is very
>> deliberate, because some part of hvm guests is terribly slow to tear
>> down; waiting synchronously for teardown tripled the wallclock time to
>> run a load of tests back-to-back.
>>
> Then you won't know if a guest is leaked or it is being slowly destroyed
> when a dead guest shows up in the snapshot of 'xl list'.
>
> Also consider that would make back-to-back tests that happen to have a
> guest that has the same name as the one in previous test fail.

test names are globally unique, so this isn't an issue.

Also, the wait for `xl console` to complete shows that @releasedomain
has been fired for the domain.

>
> I don't think getting blocked for a few more seconds is a big issue.
> It's is important to eliminate such race conditions so that osstest can
> work properly.

Amortising the teardown cost in the background is fine (which is what
your "for child in wait_list" ends up doing), but an extra few seconds
per test is very important to be avoided for manual use of XTF.

>>> +
>>> +        cmd = ['xenstore-read', '/local/domain/'+domid+'/name']
>>> +        print "Executing '%s'" % (" ".join(cmd), )
>>> +        gname = check_output(cmd).splitlines()[0]
>>> +        if gname != test: continue
>>> +
>>> +        cmd = ['xenstore-read', '/local/domain/'+domid+'/console/tty']
>>> +        print "Executing '%s'" % (" ".join(cmd), )
>>> +        con = check_output(cmd).splitlines()[0]
>>> +        if con != '': break
>> Somewhere in this loop, you should poll the call to xl create.  In the
>> case that there is an xl configuration error, this setup will wait for 5
>> seconds, then discard all trace of the error.
>>
> Right. I will see what I can do here.

Popen().poll() seems to do what you want.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 13:06     ` Andrew Cooper
@ 2016-07-29 13:12       ` Wei Liu
  2016-07-29 13:23         ` Andrew Cooper
  0 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-07-29 13:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Ian Jackson

On Fri, Jul 29, 2016 at 02:06:56PM +0100, Andrew Cooper wrote:
> On 29/07/16 13:58, Wei Liu wrote:
> > On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> >> On 29/07/16 13:07, Wei Liu wrote:
> >>> There were two synchronisation issues for the old code:
> >>>
> >>> 1. There was no guarantee that guest console was ready before "xl
> >>>    console" invocation.
> >>> 2. There was no guarantee that runner wouldn't not exit before all test
> > s/not//
> >
> >>>    guests were gone.
> >> Sorry, but I can't parse this.
> >>
> >> The runner existing before xl has torn down the guest is very
> >> deliberate, because some part of hvm guests is terribly slow to tear
> >> down; waiting synchronously for teardown tripled the wallclock time to
> >> run a load of tests back-to-back.
> >>
> > Then you won't know if a guest is leaked or it is being slowly destroyed
> > when a dead guest shows up in the snapshot of 'xl list'.
> >
> > Also consider that would make back-to-back tests that happen to have a
> > guest that has the same name as the one in previous test fail.
> 
> test names are globally unique, so this isn't an issue.
> 
> Also, the wait for `xl console` to complete shows that @releasedomain
> has been fired for the domain.
> 

Are you suggesting waiting for "xl console" only is good enough?

> >
> > I don't think getting blocked for a few more seconds is a big issue.
> > It's is important to eliminate such race conditions so that osstest can
> > work properly.
> 
> Amortising the teardown cost in the background is fine (which is what
> your "for child in wait_list" ends up doing), but an extra few seconds
> per test is very important to be avoided for manual use of XTF.
> 

So let's default to not wait and add an option to wait. Osstest will use
that option.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 13:12       ` Wei Liu
@ 2016-07-29 13:23         ` Andrew Cooper
  2016-07-29 13:26           ` Wei Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 13:23 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Ian Jackson

On 29/07/16 14:12, Wei Liu wrote:
> On Fri, Jul 29, 2016 at 02:06:56PM +0100, Andrew Cooper wrote:
>> On 29/07/16 13:58, Wei Liu wrote:
>>> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
>>>> On 29/07/16 13:07, Wei Liu wrote:
>>>>> There were two synchronisation issues for the old code:
>>>>>
>>>>> 1. There was no guarantee that guest console was ready before "xl
>>>>>    console" invocation.
>>>>> 2. There was no guarantee that runner wouldn't not exit before all test
>>> s/not//
>>>
>>>>>    guests were gone.
>>>> Sorry, but I can't parse this.
>>>>
>>>> The runner existing before xl has torn down the guest is very
>>>> deliberate, because some part of hvm guests is terribly slow to tear
>>>> down; waiting synchronously for teardown tripled the wallclock time to
>>>> run a load of tests back-to-back.
>>>>
>>> Then you won't know if a guest is leaked or it is being slowly destroyed
>>> when a dead guest shows up in the snapshot of 'xl list'.
>>>
>>> Also consider that would make back-to-back tests that happen to have a
>>> guest that has the same name as the one in previous test fail.
>> test names are globally unique, so this isn't an issue.
>>
>> Also, the wait for `xl console` to complete shows that @releasedomain
>> has been fired for the domain.
>>
> Are you suggesting waiting for "xl console" only is good enough?

The "stdout, _ = console.communicate()" line waits for `xl console` to
exit.  (In fact, `xl` exec()'s `xenconosle`)

As we never put a CTRL-] into stdin, `xl console` only exits when the
PTY shuts, which is when xenconsoled decided the domain has terminated.

So, yes - I think this is safe.

>
>>> I don't think getting blocked for a few more seconds is a big issue.
>>> It's is important to eliminate such race conditions so that osstest can
>>> work properly.
>> Amortising the teardown cost in the background is fine (which is what
>> your "for child in wait_list" ends up doing), but an extra few seconds
>> per test is very important to be avoided for manual use of XTF.
>>
> So let's default to not wait and add an option to wait. Osstest will use
> that option.

The code you currently have should be fine (in this respect).  This way,
you are tearing down a dying domain at the same time as starting the
next one up, which is fine.

What I want to avoid is waiting for complete teardown before starting
the next domain up.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 13:23         ` Andrew Cooper
@ 2016-07-29 13:26           ` Wei Liu
  0 siblings, 0 replies; 45+ messages in thread
From: Wei Liu @ 2016-07-29 13:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Ian Jackson

On Fri, Jul 29, 2016 at 02:23:20PM +0100, Andrew Cooper wrote:
> On 29/07/16 14:12, Wei Liu wrote:
> > On Fri, Jul 29, 2016 at 02:06:56PM +0100, Andrew Cooper wrote:
> >> On 29/07/16 13:58, Wei Liu wrote:
> >>> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> >>>> On 29/07/16 13:07, Wei Liu wrote:
> >>>>> There were two synchronisation issues for the old code:
> >>>>>
> >>>>> 1. There was no guarantee that guest console was ready before "xl
> >>>>>    console" invocation.
> >>>>> 2. There was no guarantee that runner wouldn't not exit before all test
> >>> s/not//
> >>>
> >>>>>    guests were gone.
> >>>> Sorry, but I can't parse this.
> >>>>
> >>>> The runner existing before xl has torn down the guest is very
> >>>> deliberate, because some part of hvm guests is terribly slow to tear
> >>>> down; waiting synchronously for teardown tripled the wallclock time to
> >>>> run a load of tests back-to-back.
> >>>>
> >>> Then you won't know if a guest is leaked or it is being slowly destroyed
> >>> when a dead guest shows up in the snapshot of 'xl list'.
> >>>
> >>> Also consider that would make back-to-back tests that happen to have a
> >>> guest that has the same name as the one in previous test fail.
> >> test names are globally unique, so this isn't an issue.
> >>
> >> Also, the wait for `xl console` to complete shows that @releasedomain
> >> has been fired for the domain.
> >>
> > Are you suggesting waiting for "xl console" only is good enough?
> 
> The "stdout, _ = console.communicate()" line waits for `xl console` to
> exit.  (In fact, `xl` exec()'s `xenconosle`)
> 
> As we never put a CTRL-] into stdin, `xl console` only exits when the
> PTY shuts, which is when xenconsoled decided the domain has terminated.
> 
> So, yes - I think this is safe.
> 

TBH I would rather to be explicit in this regard. I would rather stick
with checking "xl create".

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 12:07 [XTF PATCH] xtf-runner: fix two synchronisation issues Wei Liu
  2016-07-29 12:43 ` Andrew Cooper
@ 2016-07-29 13:27 ` Andrew Cooper
  2016-07-29 14:21 ` Ian Jackson
  2 siblings, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 13:27 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Ian Jackson

On 29/07/16 13:07, Wei Liu wrote:
>  
>  if __name__ == "__main__":
> +    signal.signal(signal.SIGALRM, sigalrm_handler)
>      try:
>          sys.exit(main())
>      except RunnerError, e:

One final thing before I forget, this addition should be in main(),
rather than here.

This bit of code is just to allow main() to work per its normal
expectations.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 12:07 [XTF PATCH] xtf-runner: fix two synchronisation issues Wei Liu
  2016-07-29 12:43 ` Andrew Cooper
  2016-07-29 13:27 ` [XTF PATCH] xtf-runner: fix two synchronisation issues Andrew Cooper
@ 2016-07-29 14:21 ` Ian Jackson
  2016-07-29 14:25   ` Wei Liu
  2016-07-29 14:26   ` Andrew Cooper
  2 siblings, 2 replies; 45+ messages in thread
From: Ian Jackson @ 2016-07-29 14:21 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, andrew.cooper3

Wei Liu writes ("[XTF PATCH] xtf-runner: fix two synchronisation issues"):
> There were two synchronisation issues for the old code:
> 
> 1. There was no guarantee that guest console was ready before "xl
>    console" invocation.

Is this not a bug in xl console ?

> 2. Poll xenstore guest console node to make sure console is available
>    before "xl console" invocation.

Users of things like xl shouldn't need to prat about in xenstore too.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:21 ` Ian Jackson
@ 2016-07-29 14:25   ` Wei Liu
  2016-07-29 14:35     ` Ian Jackson
  2016-07-29 14:26   ` Andrew Cooper
  1 sibling, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-07-29 14:25 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Xen-devel, Wei Liu, andrew.cooper3

On Fri, Jul 29, 2016 at 03:21:52PM +0100, Ian Jackson wrote:
> Wei Liu writes ("[XTF PATCH] xtf-runner: fix two synchronisation issues"):
> > There were two synchronisation issues for the old code:
> > 
> > 1. There was no guarantee that guest console was ready before "xl
> >    console" invocation.
> 
> Is this not a bug in xl console ?
> 

I don't think so. It gives up when it can't get tty from xenstore. I
think that's reasonable.

> > 2. Poll xenstore guest console node to make sure console is available
> >    before "xl console" invocation.
> 
> Users of things like xl shouldn't need to prat about in xenstore too.
> 

What does this mean?

Wei.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:21 ` Ian Jackson
  2016-07-29 14:25   ` Wei Liu
@ 2016-07-29 14:26   ` Andrew Cooper
  1 sibling, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 14:26 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu; +Cc: Xen-devel

On 29/07/16 15:21, Ian Jackson wrote:
> Wei Liu writes ("[XTF PATCH] xtf-runner: fix two synchronisation issues"):
>> There were two synchronisation issues for the old code:
>>
>> 1. There was no guarantee that guest console was ready before "xl
>>    console" invocation.
> Is this not a bug in xl console ?
>
>> 2. Poll xenstore guest console node to make sure console is available
>>    before "xl console" invocation.
> Users of things like xl shouldn't need to prat about in xenstore too.

Ideally, what xtf-runner wants is for `xl create -c $FOO.cfg` to work as
one would logically intend, but it doesn't.  It does a create, then
falls into the weeds trying to locate the console for a domain which has
already exited.

As a workaround, xtf-runner currently does

`xl create -p $foo.cfg`
`xl console $test &`
`xl unpause $text`
wait on xl console to complete.

but it turns out that this even isn't good enough.

xl should definitely be fixed, but xtf-runner needs to be able to cope
with older broken xl as well.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 12:58   ` Wei Liu
  2016-07-29 13:06     ` Andrew Cooper
@ 2016-07-29 14:31     ` Ian Jackson
  2016-07-29 14:55       ` Wei Liu
                         ` (3 more replies)
  1 sibling, 4 replies; 45+ messages in thread
From: Ian Jackson @ 2016-07-29 14:31 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> > The runner existing before xl has torn down the guest is very
> > deliberate, because some part of hvm guests is terribly slow to tear
> > down; waiting synchronously for teardown tripled the wallclock time to
> > run a load of tests back-to-back.
> 
> Then you won't know if a guest is leaked or it is being slowly destroyed
> when a dead guest shows up in the snapshot of 'xl list'.
> 
> Also consider that would make back-to-back tests that happen to have a
> guest that has the same name as the one in previous test fail.
> 
> I don't think getting blocked for a few more seconds is a big issue.
> It's is important to eliminate such race conditions so that osstest can
> work properly.

IMO the biggest reason for waiting for teardown is that that will make
it possible to accurately identify the xtf test which was responsible
for the failure if a test reveals a bug which causes problems for the
whole host.

Suppose there is a test T1 which, in buggy hypervisors, creates an
anomalous data structure, such that the hypervisor crashes when T1's
guest is finally torn down.

If we start to run the next test T2 immediately we see success output
from T1, we will observe the host crashing "due to T2", and T1 would
be regarded as having succeeded.

This is why in an in-person conversation with Wei yesterday I
recommended that osstest should after each xtf test (i) wait for
everything to be torn down and (ii) then check that the dom0 is still
up.  (And these two activities are regarded as part of the preceding
test step.)

If this leads to over-consumption of machine resources because this
serialisation is too slow then the right approach would be explicit
parallelisation in osstest.  That would still mean that in the
scenario above, T1 would be regarded as having failed, because T1
wouldn't be regarded as having passed until osstest had seen that all
of T1's cleanup had been done and the host was still up.  (T2 would
_also_ be regarded as failed, and that might look like a heisenbug,
but that would be tolerable.)

Wei: I need to check what happens with multiple failing test steps in
the same job.  Specifically, I need to check which one the bisector
is likely to try to attack.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:25   ` Wei Liu
@ 2016-07-29 14:35     ` Ian Jackson
  2016-07-29 14:46       ` Wei Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2016-07-29 14:35 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, andrew.cooper3

Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
> On Fri, Jul 29, 2016 at 03:21:52PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("[XTF PATCH] xtf-runner: fix two synchronisation issues"):
> > > There were two synchronisation issues for the old code:
> > > 
> > > 1. There was no guarantee that guest console was ready before "xl
> > >    console" invocation.
> > 
> > Is this not a bug in xl console ?
> 
> I don't think so. It gives up when it can't get tty from xenstore. I
> think that's reasonable.

If you say
  xl create /etc/xen/foo.cfg
  xl console foo
then surely you shouldn't find that xl console fails because of some
race.

> > > 2. Poll xenstore guest console node to make sure console is available
> > >    before "xl console" invocation.
> > 
> > Users of things like xl shouldn't need to prat about in xenstore too.
> 
> What does this mean?

I mean that xenstore is part of the implementation of libxl, not part
of its public interface.  In this sense it is a bit like libxc.
libxl callers should (in general) not look at xenstore - and of
course they should therefore;2~ not need to do so.

If the xtf runner needs to look in xenstore (other than perhaps if
it's doing strange things there as part of its test cases) then that
means there is some interface or capability missing in libxl.

OTOH I'm slightly confused because I was under the impression that
there was a polling loop in xl console (xenconsole) already.

Maybe I don't understand your problem.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:35     ` Ian Jackson
@ 2016-07-29 14:46       ` Wei Liu
  0 siblings, 0 replies; 45+ messages in thread
From: Wei Liu @ 2016-07-29 14:46 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Xen-devel, Wei Liu, andrew.cooper3

On Fri, Jul 29, 2016 at 03:35:06PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
> > On Fri, Jul 29, 2016 at 03:21:52PM +0100, Ian Jackson wrote:
> > > Wei Liu writes ("[XTF PATCH] xtf-runner: fix two synchronisation issues"):
> > > > There were two synchronisation issues for the old code:
> > > > 
> > > > 1. There was no guarantee that guest console was ready before "xl
> > > >    console" invocation.
> > > 
> > > Is this not a bug in xl console ?
> > 
> > I don't think so. It gives up when it can't get tty from xenstore. I
> > think that's reasonable.
> 
> If you say
>   xl create /etc/xen/foo.cfg
>   xl console foo
> then surely you shouldn't find that xl console fails because of some
> race.
> 

What if the guest just doesn't have / want a console? In that case there
will never be a tty node in xenstore, so xl console should wait forever
for one? What is the expectation for this case?

> > > > 2. Poll xenstore guest console node to make sure console is available
> > > >    before "xl console" invocation.
> > > 
> > > Users of things like xl shouldn't need to prat about in xenstore too.
> > 
> > What does this mean?
> 
> I mean that xenstore is part of the implementation of libxl, not part
> of its public interface.  In this sense it is a bit like libxc.
> libxl callers should (in general) not look at xenstore - and of
> course they should therefore;2~ not need to do so.
> 

In this particular case, all nodes that xtf-runner code reles on are
documented, so I already consider them public interfaces.

> If the xtf runner needs to look in xenstore (other than perhaps if
> it's doing strange things there as part of its test cases) then that
> means there is some interface or capability missing in libxl.
> 

I don't disagree.

But the missing capability won't be backported to older version of Xen.
I don't want to eliminate the possibility for running xtf on older
versions of Xen.

> OTOH I'm slightly confused because I was under the impression that
> there was a polling loop in xl console (xenconsole) already.
> 
> Maybe I don't understand your problem.
> 

The issue is that when it tries to get tty from xenstore and there isn't
one, it gives up. Your polling loop is something else.

http://osstest.xs.citrite.net/~osstest/testlogs/logs/66855/test-xtf-amd64-amd64-1/10.ts-xtf-fep.log

Wei.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:31     ` Ian Jackson
@ 2016-07-29 14:55       ` Wei Liu
  2016-07-29 16:18         ` Ian Jackson
  2016-07-29 15:05       ` Andrew Cooper
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-07-29 14:55 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Fri, Jul 29, 2016 at 03:31:30PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
> > On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> > > The runner existing before xl has torn down the guest is very
> > > deliberate, because some part of hvm guests is terribly slow to tear
> > > down; waiting synchronously for teardown tripled the wallclock time to
> > > run a load of tests back-to-back.
> > 
> > Then you won't know if a guest is leaked or it is being slowly destroyed
> > when a dead guest shows up in the snapshot of 'xl list'.
> > 
> > Also consider that would make back-to-back tests that happen to have a
> > guest that has the same name as the one in previous test fail.
> > 
> > I don't think getting blocked for a few more seconds is a big issue.
> > It's is important to eliminate such race conditions so that osstest can
> > work properly.
> 
> IMO the biggest reason for waiting for teardown is that that will make
> it possible to accurately identify the xtf test which was responsible
> for the failure if a test reveals a bug which causes problems for the
> whole host.
> 
> Suppose there is a test T1 which, in buggy hypervisors, creates an
> anomalous data structure, such that the hypervisor crashes when T1's
> guest is finally torn down.
> 
> If we start to run the next test T2 immediately we see success output
> from T1, we will observe the host crashing "due to T2", and T1 would
> be regarded as having succeeded.
> 
> This is why in an in-person conversation with Wei yesterday I
> recommended that osstest should after each xtf test (i) wait for
> everything to be torn down and (ii) then check that the dom0 is still
> up.  (And these two activities are regarded as part of the preceding
> test step.)
> 
> If this leads to over-consumption of machine resources because this
> serialisation is too slow then the right approach would be explicit
> parallelisation in osstest.  That would still mean that in the
> scenario above, T1 would be regarded as having failed, because T1
> wouldn't be regarded as having passed until osstest had seen that all
> of T1's cleanup had been done and the host was still up.  (T2 would
> _also_ be regarded as failed, and that might look like a heisenbug,
> but that would be tolerable.)
> 
> Wei: I need to check what happens with multiple failing test steps in
> the same job.  Specifically, I need to check which one the bisector
> is likely to try to attack.
> 

Yes. I think my current code can meet both you and Andrew's
requirement.

1. The runner waits for all tests to finish, which amortise the clean up
   time. This is what Andrew needs.
2. In osstest, we run one test case at a time. So "all tests" is only
   one test. This is what you need.

Wei.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:31     ` Ian Jackson
  2016-07-29 14:55       ` Wei Liu
@ 2016-07-29 15:05       ` Andrew Cooper
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
  2016-08-01 14:04       ` [XTF PATCH] xtf-runner: use xl create -Fc directly Wei Liu
  3 siblings, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 15:05 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu; +Cc: Xen-devel

On 29/07/16 15:31, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
>> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
>>> The runner existing before xl has torn down the guest is very
>>> deliberate, because some part of hvm guests is terribly slow to tear
>>> down; waiting synchronously for teardown tripled the wallclock time to
>>> run a load of tests back-to-back.
>> Then you won't know if a guest is leaked or it is being slowly destroyed
>> when a dead guest shows up in the snapshot of 'xl list'.
>>
>> Also consider that would make back-to-back tests that happen to have a
>> guest that has the same name as the one in previous test fail.
>>
>> I don't think getting blocked for a few more seconds is a big issue.
>> It's is important to eliminate such race conditions so that osstest can
>> work properly.
> IMO the biggest reason for waiting for teardown is that that will make
> it possible to accurately identify the xtf test which was responsible
> for the failure if a test reveals a bug which causes problems for the
> whole host.

That is perfectly reasonable.

>
> Suppose there is a test T1 which, in buggy hypervisors, creates an
> anomalous data structure, such that the hypervisor crashes when T1's
> guest is finally torn down.
>
> If we start to run the next test T2 immediately we see success output
> from T1, we will observe the host crashing "due to T2", and T1 would
> be regarded as having succeeded.
>
> This is why in an in-person conversation with Wei yesterday I
> recommended that osstest should after each xtf test (i) wait for
> everything to be torn down and (ii) then check that the dom0 is still
> up.  (And these two activities are regarded as part of the preceding
> test step.)

That is also my understanding of how the intended OSSTest integration is
going to work.

OSSTest asks `./xtf-runner --list` for all tests, then iterates over all
tests, running them one at a time, with suitable liveness checks
inbetween.  This is not using xtf-runner's ability to run multiple tests
back to back.


The dev usecase on the other hand is for something like, for checking a
test case refactoring or new bit of functionality.

$ ./xtf-runner sefltest
<snip>
Combined test results:
test-pv64-selftest                       SUCCESS
test-pv32pae-selftest                    SUCCESS
test-hvm64-selftest                      SUCCESS
test-hvm32pae-selftest                   SUCCESS
test-hvm32pse-selftest                   SUCCESS
test-hvm32-selftest                      SUCCESS


FWIW, I have just put a synchronous wait in to demonstrate.

Without wait:

$ time ./xtf-runner sefltest
<snip>

real    0m0.571s
user    0m0.060s
sys    0m0.228s

With wait:
$ time ./xtf-runner sefltest
<snip>

real    0m8.870s
user    0m0.048s
sys    0m0.280s


That is more than 8 wallclock seconds elapsed where nothing useful is
happening from the point of view of a human using ./xtf-runner.  All of
this time is spent between @releaseDomain and `xl create -F` finally
exiting.

>
> If this leads to over-consumption of machine resources because this
> serialisation is too slow then the right approach would be explicit
> parallelisation in osstest.  That would still mean that in the
> scenario above, T1 would be regarded as having failed, because T1
> wouldn't be regarded as having passed until osstest had seen that all
> of T1's cleanup had been done and the host was still up.  (T2 would
> _also_ be regarded as failed, and that might look like a heisenbug,
> but that would be tolerable.)

OSSTest shouldn't run multiple tests at once, and I have taken exactly
the same decision for XenRT.  Easy identification of what went bang is
the most important properly in these cases.

We are going to have to get to a vast test library before the wallclock
time of XTF tests approaches anything similar to installing a VM from
scratch.  I am not worried at the moment.

>
> Wei: I need to check what happens with multiple failing test steps in
> the same job.  Specifically, I need to check which one the bisector
> is likely to try to attack.

For individual XTF tests, it is entirely possible that every failure is
from a different change, so should be treated individually.

Having said that, it is also quite likely that, given a lot of similar
microckernels, one hypervisor bug would take a large number out at once,
and we really don't want to bisect each individual XTF test.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 14:55       ` Wei Liu
@ 2016-07-29 16:18         ` Ian Jackson
  2016-07-29 16:35           ` Andrew Cooper
  2016-07-29 16:41           ` Wei Liu
  0 siblings, 2 replies; 45+ messages in thread
From: Ian Jackson @ 2016-07-29 16:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

The three of us had an IRL conversation.  Here is what I think we
agreed:

* We intend to make the XTF runner capable of reading
  xenconsoled-created logfiles.  Both XenRT and osstest configure
  xenconsoled appropriately.

  The xtf runner will need to

     - on each test, wait for the test domain to shutdown, and then

     - look backwards through the xenconsoled logfile for the
       indication that the domain started (eg, a banner printed by the
       domain, or message from xenconsoled), and parse the relevant
       output there.  (This is needed because starting the same-named
       domain multiple times results in concatenations of the console
       output in the xenconsoled guest output logfile.)

     - arrange for the guest to be preserved on crash (at least)
       so that if the domain crashed, we don't risk parsing the
       output from a previous run.  Instead we can see it having
       crashed and report that as a failure.

* We intend to make `xl create -c' work to find all of the output,
  by (i) starting the domain paused (ii) spawning xenconsoled
  (iii) awaiting a new startup indication from xenconsoled
  (iv) unpausing the domain.  This will be done in xen-unstable.

  I propose the following startup protocol: xl runs
     xenconsole --startup-notify-fd=FD
  where FD is the writing end of a pipe.  xl waits for xenconsole to
  write the byte 0x00 to the FD.  If xenconsoled crashes, xl can tell
  by the EOF on the pipe.  (This approach saves xl from having to try
  to wait for either SIGCHLD or fd readability.)

  (The systemd startup notification protocol is too complex to
  reimplement and would therefore introduce a dependency on
  libsystemd's sd_notify, which would be awkward.  There is also the
  upstart SIGSTOP protocol but it could interact badly with an
  interactive user who uses ^Z.)

  The xtf runner would also be able to use `xl create -c' and simply
  expect to see all the console output.

  (We also discussed making xenconsole print something to its stdout
  or stderr when it has successfully connected, and when it
  disconnects.  While we're editing xenconsole it would probably be
  nice to do this, but with our plans it's not needed for XTF.)

As a result, the xtf runner can be used with a default install of
xen-unstable.  For older versions of Xen it will be necessary to
reconfigure xenconsoled, if the user wants to get reliable pass/fail
reports from the xtf runner.

* We discussed changing xenconsoled to not tear down the guest console
  until the guest is destroyed, rather than already tearing it down
  when the guest is shut down.  This is not now needed for the above,
  but I still think it would be nice.  However it is done, it should
  arrange that `xl console' doesn't hang waiting for further output
  from a crashed or shutdown domain (but perhaps should wait for
  output from a rebooting domain?).  It would probably be a good idea
  to put this work item in a bucket with `overhaul the console stuff'.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 16:18         ` Ian Jackson
@ 2016-07-29 16:35           ` Andrew Cooper
  2016-07-29 16:41           ` Wei Liu
  1 sibling, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2016-07-29 16:35 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu; +Cc: Xen-devel

On 29/07/16 17:18, Ian Jackson wrote:
> The three of us had an IRL conversation.  Here is what I think we
> agreed:
>
> * We intend to make the XTF runner capable of reading
>   xenconsoled-created logfiles.  Both XenRT and osstest configure
>   xenconsoled appropriately.
>
>   The xtf runner will need to
>
>      - on each test, wait for the test domain to shutdown, and then
>
>      - look backwards through the xenconsoled logfile for the
>        indication that the domain started (eg, a banner printed by the
>        domain, or message from xenconsoled), and parse the relevant
>        output there.  (This is needed because starting the same-named
>        domain multiple times results in concatenations of the console
>        output in the xenconsoled guest output logfile.)
>
>      - arrange for the guest to be preserved on crash (at least)
>        so that if the domain crashed, we don't risk parsing the
>        output from a previous run.  Instead we can see it having
>        crashed and report that as a failure.

I suppose it is worth noting that this is by far and away the easiest
solution (we can think of) which also works with older versions of Xen.

>
> * We intend to make `xl create -c' work to find all of the output,
>   by (i) starting the domain paused (ii) spawning xenconsoled
>   (iii) awaiting a new startup indication from xenconsoled
>   (iv) unpausing the domain.  This will be done in xen-unstable.
>
>   I propose the following startup protocol: xl runs
>      xenconsole --startup-notify-fd=FD
>   where FD is the writing end of a pipe.  xl waits for xenconsole to
>   write the byte 0x00 to the FD.  If xenconsoled crashes, xl can tell
>   by the EOF on the pipe.  (This approach saves xl from having to try
>   to wait for either SIGCHLD or fd readability.)
>
>   (The systemd startup notification protocol is too complex to
>   reimplement and would therefore introduce a dependency on
>   libsystemd's sd_notify, which would be awkward.  There is also the
>   upstart SIGSTOP protocol but it could interact badly with an
>   interactive user who uses ^Z.)
>
>   The xtf runner would also be able to use `xl create -c' and simply
>   expect to see all the console output.
>
>   (We also discussed making xenconsole print something to its stdout
>   or stderr when it has successfully connected, and when it
>   disconnects.  While we're editing xenconsole it would probably be
>   nice to do this, but with our plans it's not needed for XTF.)
>
> As a result, the xtf runner can be used with a default install of
> xen-unstable.  For older versions of Xen it will be necessary to
> reconfigure xenconsoled, if the user wants to get reliable pass/fail
> reports from the xtf runner.

I expect that the common usecase will be that people will develop tests
against unstable, and only automated test systems will be running tests
against older versions.

I don't expect it to be common for a human to need to develop tests
against older versions, but if someone does need to, there is at least a
way of doing so.

>
> * We discussed changing xenconsoled to not tear down the guest console
>   until the guest is destroyed, rather than already tearing it down
>   when the guest is shut down.  This is not now needed for the above,
>   but I still think it would be nice.  However it is done, it should
>   arrange that `xl console' doesn't hang waiting for further output
>   from a crashed or shutdown domain (but perhaps should wait for
>   output from a rebooting domain?).  It would probably be a good idea
>   to put this work item in a bucket with `overhaul the console stuff'.

+1.

As for the rest of the email, this matches my understanding from the
conversation.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
  2016-07-29 16:18         ` Ian Jackson
  2016-07-29 16:35           ` Andrew Cooper
@ 2016-07-29 16:41           ` Wei Liu
  1 sibling, 0 replies; 45+ messages in thread
From: Wei Liu @ 2016-07-29 16:41 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, Xen-devel

Thanks for writing this up.

The whole email described accurately what we discussed.

On Fri, Jul 29, 2016 at 05:18:10PM +0100, Ian Jackson wrote:
> The three of us had an IRL conversation.  Here is what I think we
> agreed:
> 
> * We intend to make the XTF runner capable of reading
>   xenconsoled-created logfiles.  Both XenRT and osstest configure
>   xenconsoled appropriately.
> 
>   The xtf runner will need to
> 
>      - on each test, wait for the test domain to shutdown, and then
> 
>      - look backwards through the xenconsoled logfile for the
>        indication that the domain started (eg, a banner printed by the
>        domain, or message from xenconsoled), and parse the relevant
>        output there.  (This is needed because starting the same-named
>        domain multiple times results in concatenations of the console
>        output in the xenconsoled guest output logfile.)
> 
>      - arrange for the guest to be preserved on crash (at least)
>        so that if the domain crashed, we don't risk parsing the
>        output from a previous run.  Instead we can see it having
>        crashed and report that as a failure.
> 
> * We intend to make `xl create -c' work to find all of the output,
>   by (i) starting the domain paused (ii) spawning xenconsoled
>   (iii) awaiting a new startup indication from xenconsoled
>   (iv) unpausing the domain.  This will be done in xen-unstable.
> 
>   I propose the following startup protocol: xl runs
>      xenconsole --startup-notify-fd=FD
>   where FD is the writing end of a pipe.  xl waits for xenconsole to
>   write the byte 0x00 to the FD.  If xenconsoled crashes, xl can tell
>   by the EOF on the pipe.  (This approach saves xl from having to try
>   to wait for either SIGCHLD or fd readability.)
> 
>   (The systemd startup notification protocol is too complex to
>   reimplement and would therefore introduce a dependency on
>   libsystemd's sd_notify, which would be awkward.  There is also the
>   upstart SIGSTOP protocol but it could interact badly with an
>   interactive user who uses ^Z.)
> 
>   The xtf runner would also be able to use `xl create -c' and simply
>   expect to see all the console output.
> 
>   (We also discussed making xenconsole print something to its stdout
>   or stderr when it has successfully connected, and when it
>   disconnects.  While we're editing xenconsole it would probably be
>   nice to do this, but with our plans it's not needed for XTF.)
> 

FTR, s/xenconsoled/xenconsole/ in this part.

I'm going to ditch this patch and look into implementing this. With this
I can ditch all the crazy gross hack in this patch.

> As a result, the xtf runner can be used with a default install of
> xen-unstable.  For older versions of Xen it will be necessary to
> reconfigure xenconsoled, if the user wants to get reliable pass/fail
> reports from the xtf runner.
> 
> * We discussed changing xenconsoled to not tear down the guest console
>   until the guest is destroyed, rather than already tearing it down
>   when the guest is shut down.  This is not now needed for the above,
>   but I still think it would be nice.  However it is done, it should
>   arrange that `xl console' doesn't hang waiting for further output
>   from a crashed or shutdown domain (but perhaps should wait for
>   output from a rebooting domain?).  It would probably be a good idea
>   to put this work item in a bucket with `overhaul the console stuff'.
> 
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC PATCH 0/8] Fix console synchronisation issues
  2016-07-29 14:31     ` Ian Jackson
  2016-07-29 14:55       ` Wei Liu
  2016-07-29 15:05       ` Andrew Cooper
@ 2016-08-01 13:16       ` Wei Liu
  2016-08-01 13:16         ` [RFC PATCH 1/8] tools/console: fix help string in client Wei Liu
                           ` (7 more replies)
  2016-08-01 14:04       ` [XTF PATCH] xtf-runner: use xl create -Fc directly Wei Liu
  3 siblings, 8 replies; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

As discussed, this series implements the xen console client startup protocol.
Somewhat related, I need to fix libxl_console_exec as well.

With this series:

1. libxl will wait for tty node to show up before exec'ing console client.
2. xl will wait for console client to be ready before unpausing guest.

Document the protocol and do so clean up as well.

RFC because I haven't paied much attention to error handling etc.

Wei.

Wei Liu (8):
  tools/console: fix help string in client
  tools/console: introduce --start-notify-fd option for console client
  libxl: factor out libxl__console_tty_path
  libxl: wait up to 5s in libxl_console_exec for xenconsoled
  libxl: libxl_{primary_,}console_exec now take notify_fd argument
  docs: document xenconsole startup protocol
  xl: use xenconsole startup protocol
  tools/console: remove 5s bodge in console client

 docs/misc/console.txt       |   6 +++
 tools/console/client/main.c |  65 +++++++++++++++-------------
 tools/libxl/libxl.c         | 100 +++++++++++++++++++++++++++++++++-----------
 tools/libxl/libxl.h         |  43 +++++++++++++++++--
 tools/libxl/xl_cmdimpl.c    |  28 +++++++++++--
 5 files changed, 182 insertions(+), 60 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC PATCH 1/8] tools/console: fix help string in client
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:40           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client Wei Liu
                           ` (6 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

There is no short '-t' option.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/console/client/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index f660e10..be39700 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -76,7 +76,7 @@ static void usage(const char *program) {
 	       "\n"
 	       "  -h, --help       display this help and exit\n"
 	       "  -n, --num N      use console number N\n"
-	       "  -t, --type TYPE  console type. must be 'pv' or 'serial'\n"
+	       "  --type TYPE      console type. must be 'pv' or 'serial'\n"
 	       , program);
 }
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
  2016-08-01 13:16         ` [RFC PATCH 1/8] tools/console: fix help string in client Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:43           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 3/8] libxl: factor out libxl__console_tty_path Wei Liu
                           ` (5 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

The console client will write 0x00 to that fd before entering console
loop to indicate its readiness.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/console/client/main.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index be39700..2751eb7 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -77,6 +77,7 @@ static void usage(const char *program) {
 	       "  -h, --help       display this help and exit\n"
 	       "  -n, --num N      use console number N\n"
 	       "  --type TYPE      console type. must be 'pv' or 'serial'\n"
+	       "  --start-notify-fd N file descriptor used to notify parent\n"
 	       , program);
 }
 
@@ -327,10 +328,12 @@ int main(int argc, char **argv)
 	int ch;
 	unsigned int num = 0;
 	int opt_ind=0;
+	int start_notify_fd = -1;
 	struct option lopt[] = {
 		{ "type",     1, 0, 't' },
 		{ "num",     1, 0, 'n' },
 		{ "help",    0, 0, 'h' },
+		{ "start-notify-fd", 1, 0, 's' },
 		{ 0 },
 
 	};
@@ -364,6 +367,9 @@ int main(int argc, char **argv)
 				exit(EINVAL);
 			}
 			break;
+		case 's':
+			start_notify_fd = atoi(optarg);
+			break;
 		default:
 			fprintf(stderr, "Invalid argument\n");
 			fprintf(stderr, "Try `%s --help' for more information.\n", 
@@ -462,6 +468,15 @@ int main(int argc, char **argv)
 		init_term(STDIN_FILENO, &stdin_old_attr);
 		atexit(restore_term_stdin); /* if this fails, oh dear */
 	}
+
+	if (start_notify_fd != -1) {
+		const char msg[] = { 0x00 };
+		if (write(start_notify_fd, msg, 1) != 1)
+			err(errno, "Could not notify parent with fd %d",
+			    start_notify_fd);
+		close(start_notify_fd);
+	}
+
 	console_loop(spty, xs, path, interactive);
 
 	free(path);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 3/8] libxl: factor out libxl__console_tty_path
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
  2016-08-01 13:16         ` [RFC PATCH 1/8] tools/console: fix help string in client Wei Liu
  2016-08-01 13:16         ` [RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:44           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled Wei Liu
                           ` (4 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

No user yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl.c | 57 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 760ebca..0bc805e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1742,6 +1742,40 @@ static void domain_destroy_domid_cb(libxl__egc *egc,
     dis->callback(egc, dis, rc);
 }
 
+static int libxl__console_tty_path(libxl__gc *gc, uint32_t domid, int cons_num,
+                                   libxl_console_type type, char **tty_path)
+{
+    int rc;
+    char *dom_path;
+
+    dom_path = libxl__xs_get_dompath(gc, domid);
+    if (!dom_path) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    switch (type) {
+    case LIBXL_CONSOLE_TYPE_SERIAL:
+        *tty_path = GCSPRINTF("%s/serial/0/tty", dom_path);
+        rc = 0;
+        break;
+    case LIBXL_CONSOLE_TYPE_PV:
+        if (cons_num == 0)
+            *tty_path = GCSPRINTF("%s/console/tty", dom_path);
+        else
+            *tty_path = GCSPRINTF("%s/device/console/%d/tty", dom_path,
+                                  cons_num);
+        rc = 0;
+        break;
+    default:
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+out:
+    return rc;
+}
+
 int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
                        libxl_console_type type)
 {
@@ -1773,30 +1807,13 @@ int libxl_console_get_tty(libxl_ctx *ctx, uint32_t domid, int cons_num,
                           libxl_console_type type, char **path)
 {
     GC_INIT(ctx);
-    char *dom_path;
     char *tty_path;
     char *tty;
     int rc;
 
-    dom_path = libxl__xs_get_dompath(gc, domid);
-    if (!dom_path) {
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    switch (type) {
-    case LIBXL_CONSOLE_TYPE_SERIAL:
-        tty_path = GCSPRINTF("%s/serial/0/tty", dom_path);
-        break;
-    case LIBXL_CONSOLE_TYPE_PV:
-        if (cons_num == 0)
-            tty_path = GCSPRINTF("%s/console/tty", dom_path);
-        else
-            tty_path = GCSPRINTF("%s/device/console/%d/tty", dom_path,
-                                cons_num);
-        break;
-    default:
-        rc = ERROR_INVAL;
+    rc = libxl__console_tty_path(gc, domid, cons_num, type, &tty_path);
+    if (rc) {
+        LOG(ERROR, "Failed to get tty path for domain %d\n", domid);
         goto out;
     }
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
                           ` (2 preceding siblings ...)
  2016-08-01 13:16         ` [RFC PATCH 3/8] libxl: factor out libxl__console_tty_path Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:48           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument Wei Liu
                           ` (3 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

Wait until the tty node is available before exec'ing xenconsole.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0bc805e..a611b29 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1780,10 +1780,14 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
                        libxl_console_type type)
 {
     GC_INIT(ctx);
+    int rc;
     char *p = GCSPRINTF("%s/xenconsole", libxl__private_bindir_path());
     char *domid_s = GCSPRINTF("%d", domid);
     char *cons_num_s = GCSPRINTF("%d", cons_num);
     char *cons_type_s;
+    char *tty_path;
+    const char *tty;
+    int retry = 50;             /* wait for 50 * 0.1 = 5 s for xenconsoled */
 
     switch (type) {
     case LIBXL_CONSOLE_TYPE_PV:
@@ -1793,6 +1797,28 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
         cons_type_s = "serial";
         break;
     default:
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* Make sure tty shows up before exec'ing xenconsole client */
+    rc = libxl__console_tty_path(gc, domid, cons_num, type, &tty_path);
+    if (rc) goto out;
+
+    while (retry-- > 0) {
+        rc = libxl__xs_read_checked(gc, XBT_NULL, tty_path, &tty);
+        if (rc) goto out;
+
+        if (tty != NULL && tty[0] != '\0')
+            break;
+
+        LOG(DEBUG, "TTY is not yet ready, retry");
+        usleep(100000);
+    }
+
+    if (retry == 0 && (!tty || tty[0] == '\0')) {
+        LOG(ERROR, "Failed to get tty from xenstore\n");
+        rc = ERROR_FAIL;
         goto out;
     }
 
@@ -1800,7 +1826,7 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
 
 out:
     GC_FREE;
-    return ERROR_FAIL;
+    return rc;
 }
 
 int libxl_console_get_tty(libxl_ctx *ctx, uint32_t domid, int cons_num,
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
                           ` (3 preceding siblings ...)
  2016-08-01 13:16         ` [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:49           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 6/8] docs: document xenconsole startup protocol Wei Liu
                           ` (2 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

The new argument will be passed down to xenconsole process, which then
uses it to notify readiness.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl.c | 15 +++++++++++----
 tools/libxl/libxl.h | 43 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index a611b29..7f2f2b6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1777,13 +1777,14 @@ out:
 }
 
 int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
-                       libxl_console_type type)
+                       libxl_console_type type, int notify_fd)
 {
     GC_INIT(ctx);
     int rc;
     char *p = GCSPRINTF("%s/xenconsole", libxl__private_bindir_path());
     char *domid_s = GCSPRINTF("%d", domid);
     char *cons_num_s = GCSPRINTF("%d", cons_num);
+    char *notify_fd_s;
     char *cons_type_s;
     char *tty_path;
     const char *tty;
@@ -1822,7 +1823,13 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
         goto out;
     }
 
-    execl(p, p, domid_s, "--num", cons_num_s, "--type", cons_type_s, (void *)NULL);
+    if (notify_fd != -1) {
+        notify_fd_s = GCSPRINTF("%d", notify_fd);
+        execl(p, p, domid_s, "--num", cons_num_s, "--type", cons_type_s,
+              "--start-notify-fd", notify_fd_s, (void *)NULL);
+    } else
+        execl(p, p, domid_s, "--num", cons_num_s, "--type", cons_type_s,
+              (void *)NULL);
 
 out:
     GC_FREE;
@@ -1894,7 +1901,7 @@ out:
     return rc;
 }
 
-int libxl_primary_console_exec(libxl_ctx *ctx, uint32_t domid_vm)
+int libxl_primary_console_exec(libxl_ctx *ctx, uint32_t domid_vm, int notify_fd)
 {
     uint32_t domid;
     int cons_num;
@@ -1903,7 +1910,7 @@ int libxl_primary_console_exec(libxl_ctx *ctx, uint32_t domid_vm)
 
     rc = libxl__primary_console_find(ctx, domid_vm, &domid, &cons_num, &type);
     if ( rc ) return rc;
-    return libxl_console_exec(ctx, domid, cons_num, type);
+    return libxl_console_exec(ctx, domid, cons_num, type, notify_fd);
 }
 
 int libxl_primary_console_get_tty(libxl_ctx *ctx, uint32_t domid_vm,
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 48a43ce..f821e05 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -67,6 +67,14 @@
  * the same $(XEN_VERSION) (e.g. throughout a major release).
  */
 
+/* LIBXL_HAVE_CONSOLE_NOTIFY_FD
+ *
+ * If this is defined, libxl_console_exec and
+ * libxl_primary_console_exe take a notify_fd parameter. That
+ * parameter will be used to notify the caller that the console is connected.
+ */
+#define LIBXL_HAVE_CONSOLE_NOTIFY_FD 1
+
 /* LIBXL_HAVE_CONST_COPY_AND_LENGTH_FUNCTIONS
  *
  * If this is defined, the copy functions have constified src parameter and the
@@ -1413,15 +1421,44 @@ int libxl_wait_for_free_memory(libxl_ctx *ctx, uint32_t domid, uint32_t memory_k
 int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs);
 
 int libxl_vncviewer_exec(libxl_ctx *ctx, uint32_t domid, int autopass);
-int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, libxl_console_type type);
+
+/*
+ * If notify_fd is not -1, xenconsole will write 0x00 to it to nofity
+ * the caller that it has connected to the guest console.
+ */
+int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
+                       libxl_console_type type, int notify_fd);
 /* libxl_primary_console_exec finds the domid and console number
  * corresponding to the primary console of the given vm, then calls
  * libxl_console_exec with the right arguments (domid might be different
  * if the guest is using stubdoms).
  * This function can be called after creating the device model, in
  * case of HVM guests, and before libxl_run_bootloader in case of PV
- * guests using pygrub. */
-int libxl_primary_console_exec(libxl_ctx *ctx, uint32_t domid_vm);
+ * guests using pygrub.
+ * If notify_fd is not -1, xenconsole will write 0x00 to it to nofity
+ * the caller that it has connected to the guest console.
+ */
+int libxl_primary_console_exec(libxl_ctx *ctx, uint32_t domid_vm,
+                               int notify_fd);
+
+#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040800
+
+static inline int libxl_console_exec_0x040700(libxl_ctx *ctx,
+                                              uint32_t domid, int cons_num,
+                                              libxl_console_type type)
+{
+    return libxl_console_exec(ctx, domid, cons_num, type, -1);
+}
+#define libxl_console_exec libxl_console_exec_0x040700
+
+static inline libxl_primary_console_exec_0x040700(libxl_ctx *ctx,
+                                                  uint32_t domid_vm)
+{
+    return libxl_primary_console_exec(ctx, domid_vm, -1);
+}
+#define libxl_primary_console_exec libxl_primary_console_exec_0x040700
+
+#endif
 
 /* libxl_console_get_tty retrieves the specified domain's console tty path
  * and stores it in path. Caller is responsible for freeing the memory.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 6/8] docs: document xenconsole startup protocol
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
                           ` (4 preceding siblings ...)
  2016-08-01 13:16         ` [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:52           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 7/8] xl: use " Wei Liu
  2016-08-01 13:16         ` [RFC PATCH 8/8] tools/console: remove 5s bodge in console client Wei Liu
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 docs/misc/console.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/misc/console.txt b/docs/misc/console.txt
index ed7b795..0da295e 100644
--- a/docs/misc/console.txt
+++ b/docs/misc/console.txt
@@ -124,3 +124,9 @@ can only have one pv console with xenstored as backend (the stubdom
 could provide pv console backends to the hvm guest but then it would
 need another pv console connection for each console backend to export
 the pty to dom0).
+
+The xenconsole program supports a very simple protocol to notify parent about
+its readiness. If xenconsole (the client) is exec'ed and has been given a fd
+(normally the write end of a pipe) via --start-noitify-fd option, it will
+write 0x00 to that fd after connecting to the guest but before entering the
+event loop. Parent can read fromt the read end of the fd to know the status.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 7/8] xl: use xenconsole startup protocol
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
                           ` (5 preceding siblings ...)
  2016-08-01 13:16         ` [RFC PATCH 6/8] docs: document xenconsole startup protocol Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:55           ` Ian Jackson
  2016-08-01 13:16         ` [RFC PATCH 8/8] tools/console: remove 5s bodge in console client Wei Liu
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

If user asks xl to automatically connect to console when creating a
guest, use the new startup protocol before trying to unpause domain so
that we don't lose any console output.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/xl_cmdimpl.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 51dc7a0..bd64149 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2728,6 +2728,7 @@ static void autoconnect_console(libxl_ctx *ctx_ignored,
                                 libxl_event *ev, void *priv)
 {
     uint32_t bldomid = ev->domid;
+    int notify_fd = *(int*)priv; /* write end of the notification pipe */
 
     libxl_event_free(ctx, ev);
 
@@ -2740,7 +2741,7 @@ static void autoconnect_console(libxl_ctx *ctx_ignored,
     postfork();
 
     sleep(1);
-    libxl_primary_console_exec(ctx, bldomid);
+    libxl_primary_console_exec(ctx, bldomid, notify_fd);
     /* Do not return. xl continued in child process */
     perror("xl: unable to exec console client");
     _exit(1);
@@ -2810,6 +2811,7 @@ static int create_domain(struct domain_create *dom_info)
     int restore_fd_to_close = -1;
     int send_back_fd = -1;
     const libxl_asyncprogress_how *autoconnect_console_how;
+    int notify_pipe[2] = { -1, -1 };
     struct save_file_header hdr;
     uint32_t domid_soft_reset = INVALID_DOMID;
 
@@ -2997,7 +2999,15 @@ start:
 
     libxl_asyncprogress_how autoconnect_console_how_buf;
     if ( dom_info->console_autoconnect ) {
+        if (pipe(notify_pipe)) {
+            fprintf(stderr,
+                    "Failed to create console notification pipe, errno %d\n",
+                    errno);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
         autoconnect_console_how_buf.callback = autoconnect_console;
+        autoconnect_console_how_buf.for_callback = &notify_pipe[1];
         autoconnect_console_how = &autoconnect_console_how_buf;
     }else{
         autoconnect_console_how = 0;
@@ -3047,6 +3057,18 @@ start:
         restore_fd_to_close = -1;
     }
 
+    if (autoconnect_console_how) {
+        char buf[1];
+        if (read(notify_pipe[0], buf, 1) != 1 && buf[0] != 0x00) {
+            fprintf(stderr,
+                    "Failed to get notification from xenconsole, errno %d\n",
+                    errno);
+        }
+        close(notify_pipe[0]);
+        close(notify_pipe[1]);
+        notify_pipe[0] = notify_pipe[1] = -1;
+    }
+
     if (!paused)
         libxl_domain_unpause(ctx, domid);
 
@@ -3754,9 +3776,9 @@ int main_console(int argc, char **argv)
 
     domid = find_domain(argv[optind]);
     if (!type)
-        libxl_primary_console_exec(ctx, domid);
+        libxl_primary_console_exec(ctx, domid, -1);
     else
-        libxl_console_exec(ctx, domid, num, type);
+        libxl_console_exec(ctx, domid, num, type, -1);
     fprintf(stderr, "Unable to attach console\n");
     return EXIT_FAILURE;
 }
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
                           ` (6 preceding siblings ...)
  2016-08-01 13:16         ` [RFC PATCH 7/8] xl: use " Wei Liu
@ 2016-08-01 13:16         ` Wei Liu
  2016-08-05 15:57           ` Ian Jackson
  7 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-01 13:16 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu

The bug described in the comment has been fixed in libxl. Since console
client is a private binary and libxl is the only supported toolstack in
upstream, remove the bodge to simplify code.

Note that the xs watch is still needed because the console client
relies on it to tell if the tty is gone.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/console/client/main.c | 48 +++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index 2751eb7..275b49a 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -96,38 +96,34 @@ void cfmakeraw(struct termios *termios_p)
 }
 #endif
 
-static int get_pty_fd(struct xs_handle *xs, char *path, int seconds)
+static int get_pty_fd(struct xs_handle *xs, char *path)
 /* Check for a pty in xenstore, open it and return its fd.
  * Assumes there is already a watch set in the store for this path. */
 {
 	struct timeval tv;
 	fd_set watch_fdset;
 	int xs_fd = xs_fileno(xs), pty_fd = -1;
-	int start, now;
 	unsigned int len = 0;
 	char *pty_path, **watch_paths;
 
-	start = now = time(NULL);
-	do {
-		tv.tv_usec = 0;
-		tv.tv_sec = (start + seconds) - now;
-		FD_ZERO(&watch_fdset);
-		FD_SET(xs_fd, &watch_fdset);
-		if (select(xs_fd + 1, &watch_fdset, NULL, NULL, &tv)) {
-			/* Read the watch to drain the buffer */
-			watch_paths = xs_read_watch(xs, &len);
-			free(watch_paths);
-			/* We only watch for one thing, so no need to 
-			 * disambiguate: just read the pty path */
-			pty_path = xs_read(xs, XBT_NULL, path, &len);
-			if (pty_path != NULL && pty_path[0] != '\0') {
-				pty_fd = open(pty_path, O_RDWR | O_NOCTTY);
-				if (pty_fd == -1)
-					warn("Could not open tty `%s'", pty_path);
-			}
-			free(pty_path);
+	tv.tv_usec = 0;
+	tv.tv_sec = 0;
+	FD_ZERO(&watch_fdset);
+	FD_SET(xs_fd, &watch_fdset);
+	if (select(xs_fd + 1, &watch_fdset, NULL, NULL, &tv)) {
+		/* Read the watch to drain the buffer */
+		watch_paths = xs_read_watch(xs, &len);
+		free(watch_paths);
+		/* We only watch for one thing, so no need to
+		 * disambiguate: just read the pty path */
+		pty_path = xs_read(xs, XBT_NULL, path, &len);
+		if (pty_path != NULL && pty_path[0] != '\0') {
+			pty_fd = open(pty_path, O_RDWR | O_NOCTTY);
+			if (pty_fd == -1)
+				warn("Could not open tty `%s'", pty_path);
 		}
-	} while (pty_fd == -1 && (now = time(NULL)) < start + seconds);
+		free(pty_path);
+	}
 
 #ifdef __sun__
 	if (pty_fd != -1) {
@@ -201,7 +197,7 @@ static int console_loop(int fd, struct xs_handle *xs, char *pty_path,
 		}
 
 		if (FD_ISSET(xs_fileno(xs), &fds)) {
-			int newfd = get_pty_fd(xs, pty_path, 0);
+			int newfd = get_pty_fd(xs, pty_path);
 			if (fd != -1)
 				close(fd);
                         if (newfd == -1) 
@@ -454,11 +450,7 @@ int main(int argc, char **argv)
 		err(errno, "Can't set watch for console pty");
 	xsfd = xs_fileno(xs);
 
-	/* Wait a little bit for tty to appear.  There is a race
-	   condition that occurs after xend creates a domain.  This code
-	   might be running before consoled has noticed the new domain
-	   and setup a pty for it. */ 
-        spty = get_pty_fd(xs, path, 5);
+        spty = get_pty_fd(xs, path);
 	if (spty == -1) {
 		err(errno, "Could not read tty from store");
 	}
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [XTF PATCH] xtf-runner: use xl create -Fc directly
  2016-07-29 14:31     ` Ian Jackson
                         ` (2 preceding siblings ...)
  2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
@ 2016-08-01 14:04       ` Wei Liu
  3 siblings, 0 replies; 45+ messages in thread
From: Wei Liu @ 2016-08-01 14:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: Wei Liu, Ian Jackson, andrew.cooper3

Now that xl create -c is fixed in xen-unstable, there is no need to keep
the hack to get guest console output anymore.

Use xl create -Fc directly, then wait for the xl process to exit. Print
any error as it occurs.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xtf-runner | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/xtf-runner b/xtf-runner
index 7c38db8..eeabc03 100755
--- a/xtf-runner
+++ b/xtf-runner
@@ -132,6 +132,7 @@ def list_tests(args):
 
         print name
 
+create_list = []
 
 def run_test(test):
     """ Run a specific test """
@@ -140,27 +141,17 @@ def run_test(test):
 
     cfg = path.join("tests", name, test + ".cfg")
 
-    cmd = ['xl', 'create', '-p', cfg]
-
-    print "Executing '%s'" % (" ".join(cmd), )
-    rc = subproc_call(cmd)
-    if rc:
-        raise RunnerError("Failed to create VM")
-
-    cmd = ['xl', 'console', test]
+    cmd = ['xl', 'create', '-Fc', cfg]
     print "Executing '%s'" % (" ".join(cmd), )
-    console = Popen(cmd, stdout = PIPE)
+    guest = Popen(cmd, stdout = PIPE, stderr = PIPE)
+    create_list.append(guest)
 
-    cmd = ['xl', 'unpause', test]
-    print "Executing '%s'" % (" ".join(cmd), )
-    rc = subproc_call(cmd)
-    if rc:
-        raise RunnerError("Failed to unpause VM")
+    # stdout is console output, stderr is xl output
+    stdout, stderr = guest.communicate()
 
-    stdout, _ = console.communicate()
-
-    if console.returncode:
-        raise RunnerError("Failed to obtain VM console")
+    if guest.returncode:
+        print stderr
+        raise RunnerError("Failed to communicate with guest")
 
     lines = stdout.splitlines()
 
@@ -327,9 +318,14 @@ def main():
     opts, args = parser.parse_args()
 
     if opts.list_tests:
-        return list_tests(args)
+        ret = list_tests(args)
     else:
-        return run_tests(args)
+        ret = run_tests(args)
+
+    for child in create_list:
+        child.wait()
+
+    return ret
 
 
 if __name__ == "__main__":
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 1/8] tools/console: fix help string in client
  2016-08-01 13:16         ` [RFC PATCH 1/8] tools/console: fix help string in client Wei Liu
@ 2016-08-05 15:40           ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:40 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 1/8] tools/console: fix help string in client"):
> There is no short '-t' option.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client
  2016-08-01 13:16         ` [RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client Wei Liu
@ 2016-08-05 15:43           ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:43 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client"):
> The console client will write 0x00 to that fd before entering console
> loop to indicate its readiness.

This could sensibly be in a comment in the code.

> +	if (start_notify_fd != -1) {
> +		const char msg[] = { 0x00 };
                ^ static

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 3/8] libxl: factor out libxl__console_tty_path
  2016-08-01 13:16         ` [RFC PATCH 3/8] libxl: factor out libxl__console_tty_path Wei Liu
@ 2016-08-05 15:44           ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Ian Jackson, Andrew Cooper

Wei Liu writes ("[RFC PATCH 3/8] libxl: factor out libxl__console_tty_path"):
> No user yet.

You mean no _other_ user!

With that change to the commit message,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>


Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled
  2016-08-01 13:16         ` [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled Wei Liu
@ 2016-08-05 15:48           ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:48 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled"):
> Wait until the tty node is available before exec'ing xenconsole.

You shouldn't poll.  We have a perfectly good xenstore watch
mechanism.

You could either:

(a) do something ad-hoc with poll(), the xenstore fd, etc.

(b) make libxl_console_exec by a weird special case which starts an AO
but always synchronously (ie, doesn't take an ao_how), and then use
the existing libxl__xswait and a callback.

Doing the latter would mean that you would already have done half (a
third?) of the work for a future libxl_console_get_fd function.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument
  2016-08-01 13:16         ` [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument Wei Liu
@ 2016-08-05 15:49           ` Ian Jackson
  2016-08-05 15:50             ` Ian Jackson
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 5/8] libxl: libxl_{primary_,}console_exec now take notify_fd argument"):
> The new argument will be passed down to xenconsole process, which then
> uses it to notify readiness.
...
>  int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
> -                       libxl_console_type type)
> +                       libxl_console_type type, int notify_fd)

Urgh.  I don't see a better way to do this though, given the existence
of libxl_console_exec.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument
  2016-08-05 15:49           ` Ian Jackson
@ 2016-08-05 15:50             ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:50 UTC (permalink / raw)
  To: Wei Liu, Xen-devel, Andrew Cooper

Ian Jackson writes ("Re: [RFC PATCH 5/8] libxl: libxl_{primary_,}console_exec now take notify_fd argument"):
> Wei Liu writes ("[RFC PATCH 5/8] libxl: libxl_{primary_,}console_exec now take notify_fd argument"):
> > The new argument will be passed down to xenconsole process, which then
> > uses it to notify readiness.
> ...
> >  int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
> > -                       libxl_console_type type)
> > +                       libxl_console_type type, int notify_fd)
> 
> Urgh.  I don't see a better way to do this though, given the existence
> of libxl_console_exec.

By which I mean,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 6/8] docs: document xenconsole startup protocol
  2016-08-01 13:16         ` [RFC PATCH 6/8] docs: document xenconsole startup protocol Wei Liu
@ 2016-08-05 15:52           ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:52 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 6/8] docs: document xenconsole startup protocol"):
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Oh!  Forget my comment on patch 02. about the commit message and/or
comment.  Maybe you want to squash this one into there, but either
way:

> +The xenconsole program supports a very simple protocol to notify parent about
> +its readiness. If xenconsole (the client) is exec'ed and has been given a fd
> +(normally the write end of a pipe) via --start-noitify-fd option, it will
                                                  ^^^^^^^ not
> +write 0x00 to that fd after connecting to the guest but before entering the
> +event loop. Parent can read fromt the read end of the fd to know the status.
                                   ^

With the two typos fixed,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 7/8] xl: use xenconsole startup protocol
  2016-08-01 13:16         ` [RFC PATCH 7/8] xl: use " Wei Liu
@ 2016-08-05 15:55           ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:55 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 7/8] xl: use xenconsole startup protocol"):
> If user asks xl to automatically connect to console when creating a
> guest, use the new startup protocol before trying to unpause domain so
> that we don't lose any console output.

Most of the logic here LGTM.

> @@ -2997,7 +2999,15 @@ start:
>  
>      libxl_asyncprogress_how autoconnect_console_how_buf;
>      if ( dom_info->console_autoconnect ) {
> +        if (pipe(notify_pipe)) {

Use libxl_pipe.

> +    if (autoconnect_console_how) {
> +        char buf[1];
> +        if (read(notify_pipe[0], buf, 1) != 1 && buf[0] != 0x00) {
> +            fprintf(stderr,
> +                    "Failed to get notification from xenconsole, errno %d\n",
> +                    errno);

As you say in your 0/, poor error handling.  You need to print
the value of buf, if relevant, and handle EOF properly.

> +        }
> +        close(notify_pipe[0]);
> +        close(notify_pipe[1]);
> +        notify_pipe[0] = notify_pipe[1] = -1;
> +    }

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-01 13:16         ` [RFC PATCH 8/8] tools/console: remove 5s bodge in console client Wei Liu
@ 2016-08-05 15:57           ` Ian Jackson
  2016-08-05 16:16             ` Wei Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 15:57 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("[RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> The bug described in the comment has been fixed in libxl. Since console
> client is a private binary and libxl is the only supported toolstack in
> upstream, remove the bodge to simplify code.

I don't think this is correct.

In particular, what about
   xl create /etc/xen/foo.cfg
   xl console foo
?

I think in this case xenconsole still needs to wait.

TBH I wonder if this means that
 [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled
is actually wrong, despite my ack.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 15:57           ` Ian Jackson
@ 2016-08-05 16:16             ` Wei Liu
  2016-08-05 16:18               ` Ian Jackson
  0 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-05 16:16 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Xen-devel, Wei Liu, Andrew Cooper

On Fri, Aug 05, 2016 at 04:57:33PM +0100, Ian Jackson wrote:
> Wei Liu writes ("[RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> > The bug described in the comment has been fixed in libxl. Since console
> > client is a private binary and libxl is the only supported toolstack in
> > upstream, remove the bodge to simplify code.
> 
> I don't think this is correct.
> 
> In particular, what about
>    xl create /etc/xen/foo.cfg
>    xl console foo
> ?
> 
> I think in this case xenconsole still needs to wait.
> 

xl console will call libxl_console_exec which should wait for the tty
node to appear before exec'ing xenconsole.

> TBH I wonder if this means that
>  [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled
> is actually wrong, despite my ack.
> 

You didn't ack that one. Maybe you mean something else?

Wei.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 16:16             ` Wei Liu
@ 2016-08-05 16:18               ` Ian Jackson
  2016-08-05 16:28                 ` Wei Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 16:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> On Fri, Aug 05, 2016 at 04:57:33PM +0100, Ian Jackson wrote:
> > In particular, what about
> >    xl create /etc/xen/foo.cfg
> >    xl console foo
> > ?
> > 
> > I think in this case xenconsole still needs to wait.
> 
> xl console will call libxl_console_exec which should wait for the tty
> node to appear before exec'ing xenconsole.

Well, except that xenconsole is also on the path.  I (now) don't
understand why this functionality needs to be moved from xenconsole to
libxl.

> > TBH I wonder if this means that
> >  [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled
> > is actually wrong, despite my ack.
> > 
> 
> You didn't ack that one. Maybe you mean something else?

I meant "despite my implicit approval of the principle".

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 16:18               ` Ian Jackson
@ 2016-08-05 16:28                 ` Wei Liu
  2016-08-05 16:32                   ` Ian Jackson
  0 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-05 16:28 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Xen-devel, Wei Liu, Andrew Cooper

On Fri, Aug 05, 2016 at 05:18:47PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> > On Fri, Aug 05, 2016 at 04:57:33PM +0100, Ian Jackson wrote:
> > > In particular, what about
> > >    xl create /etc/xen/foo.cfg
> > >    xl console foo
> > > ?
> > > 
> > > I think in this case xenconsole still needs to wait.
> > 
> > xl console will call libxl_console_exec which should wait for the tty
> > node to appear before exec'ing xenconsole.
> 
> Well, except that xenconsole is also on the path.  I (now) don't
> understand why this functionality needs to be moved from xenconsole to
> libxl.
> 

There needs to be a entity that waits for the node to appear. I thought
libxl was in a better position.  That being said, I think let xenconsole
wait would also work.

I don't have a strong preference on how to proceed.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 16:28                 ` Wei Liu
@ 2016-08-05 16:32                   ` Ian Jackson
  2016-08-05 16:36                     ` Wei Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2016-08-05 16:32 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> On Fri, Aug 05, 2016 at 05:18:47PM +0100, Ian Jackson wrote:
> > Well, except that xenconsole is also on the path.  I (now) don't
> > understand why this functionality needs to be moved from xenconsole to
> > libxl.
> 
> There needs to be a entity that waits for the node to appear. I thought
> libxl was in a better position.  That being said, I think let xenconsole
> wait would also work.

My point is that xenconsole seems to have the functionality already.
You are moving it to libxl.

The sole visible effect of this is to make xenconsole unreliable for
people who invoke it directly.

Or have I misunderstood ?

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 16:32                   ` Ian Jackson
@ 2016-08-05 16:36                     ` Wei Liu
  2016-08-05 17:23                       ` Wei Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-05 16:36 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Xen-devel, Wei Liu, Andrew Cooper

On Fri, Aug 05, 2016 at 05:32:30PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> > On Fri, Aug 05, 2016 at 05:18:47PM +0100, Ian Jackson wrote:
> > > Well, except that xenconsole is also on the path.  I (now) don't
> > > understand why this functionality needs to be moved from xenconsole to
> > > libxl.
> > 
> > There needs to be a entity that waits for the node to appear. I thought
> > libxl was in a better position.  That being said, I think let xenconsole
> > wait would also work.
> 
> My point is that xenconsole seems to have the functionality already.
> You are moving it to libxl.
> 

Yes, that's what I did.

> The sole visible effect of this is to make xenconsole unreliable for
> people who invoke it directly.
> 

No, it should (at least by design) still be reliable since the libxl now
waits before exec xenconsole.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 16:36                     ` Wei Liu
@ 2016-08-05 17:23                       ` Wei Liu
  2016-08-08 10:07                         ` Ian Jackson
  0 siblings, 1 reply; 45+ messages in thread
From: Wei Liu @ 2016-08-05 17:23 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Xen-devel, Wei Liu, Andrew Cooper

On Fri, Aug 05, 2016 at 05:36:08PM +0100, Wei Liu wrote:
> On Fri, Aug 05, 2016 at 05:32:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> > > On Fri, Aug 05, 2016 at 05:18:47PM +0100, Ian Jackson wrote:
> > > > Well, except that xenconsole is also on the path.  I (now) don't
> > > > understand why this functionality needs to be moved from xenconsole to
> > > > libxl.
> > > 
> > > There needs to be a entity that waits for the node to appear. I thought
> > > libxl was in a better position.  That being said, I think let xenconsole
> > > wait would also work.
> > 
> > My point is that xenconsole seems to have the functionality already.
> > You are moving it to libxl.
> > 
> 
> Yes, that's what I did.
> 
> > The sole visible effect of this is to make xenconsole unreliable for
> > people who invoke it directly.
> > 
> 
> No, it should (at least by design) still be reliable since the libxl now
> waits before exec xenconsole.

Now I have decided to drop patch 4 -- the less patch the better. :-)

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client
  2016-08-05 17:23                       ` Wei Liu
@ 2016-08-08 10:07                         ` Ian Jackson
  0 siblings, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2016-08-08 10:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, Andrew Cooper

Wei Liu writes ("Re: [RFC PATCH 8/8] tools/console: remove 5s bodge in console client"):
> On Fri, Aug 05, 2016 at 05:36:08PM +0100, Wei Liu wrote:
> > On Fri, Aug 05, 2016 at 05:32:30PM +0100, Ian Jackson wrote:
> > > My point is that xenconsole seems to have the functionality already.
> > > You are moving it to libxl.
> > 
> > Yes, that's what I did.

... but my implicit question was: why ?

> > > The sole visible effect of this is to make xenconsole unreliable for
> > > people who invoke it directly.
> > 
> > No, it should (at least by design) still be reliable since the libxl now
> > waits before exec xenconsole.

And, when someone invokes xenconsole directly, libxl is not involved.

> Now I have decided to drop patch 4 -- the less patch the better. :-)

Good, thanks.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-08-08 10:07 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-29 12:07 [XTF PATCH] xtf-runner: fix two synchronisation issues Wei Liu
2016-07-29 12:43 ` Andrew Cooper
2016-07-29 12:58   ` Wei Liu
2016-07-29 13:06     ` Andrew Cooper
2016-07-29 13:12       ` Wei Liu
2016-07-29 13:23         ` Andrew Cooper
2016-07-29 13:26           ` Wei Liu
2016-07-29 14:31     ` Ian Jackson
2016-07-29 14:55       ` Wei Liu
2016-07-29 16:18         ` Ian Jackson
2016-07-29 16:35           ` Andrew Cooper
2016-07-29 16:41           ` Wei Liu
2016-07-29 15:05       ` Andrew Cooper
2016-08-01 13:16       ` [RFC PATCH 0/8] Fix console " Wei Liu
2016-08-01 13:16         ` [RFC PATCH 1/8] tools/console: fix help string in client Wei Liu
2016-08-05 15:40           ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 2/8] tools/console: introduce --start-notify-fd option for console client Wei Liu
2016-08-05 15:43           ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 3/8] libxl: factor out libxl__console_tty_path Wei Liu
2016-08-05 15:44           ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 4/8] libxl: wait up to 5s in libxl_console_exec for xenconsoled Wei Liu
2016-08-05 15:48           ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 5/8] libxl: libxl_{primary_, }console_exec now take notify_fd argument Wei Liu
2016-08-05 15:49           ` Ian Jackson
2016-08-05 15:50             ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 6/8] docs: document xenconsole startup protocol Wei Liu
2016-08-05 15:52           ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 7/8] xl: use " Wei Liu
2016-08-05 15:55           ` Ian Jackson
2016-08-01 13:16         ` [RFC PATCH 8/8] tools/console: remove 5s bodge in console client Wei Liu
2016-08-05 15:57           ` Ian Jackson
2016-08-05 16:16             ` Wei Liu
2016-08-05 16:18               ` Ian Jackson
2016-08-05 16:28                 ` Wei Liu
2016-08-05 16:32                   ` Ian Jackson
2016-08-05 16:36                     ` Wei Liu
2016-08-05 17:23                       ` Wei Liu
2016-08-08 10:07                         ` Ian Jackson
2016-08-01 14:04       ` [XTF PATCH] xtf-runner: use xl create -Fc directly Wei Liu
2016-07-29 13:27 ` [XTF PATCH] xtf-runner: fix two synchronisation issues Andrew Cooper
2016-07-29 14:21 ` Ian Jackson
2016-07-29 14:25   ` Wei Liu
2016-07-29 14:35     ` Ian Jackson
2016-07-29 14:46       ` Wei Liu
2016-07-29 14:26   ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.