* reboot driver domain, vifX.Y = NO-CARRIER?
@ 2018-04-27 15:03 Jason Cooper
  2018-04-27 15:11 ` Andrew Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-04-27 15:03 UTC (permalink / raw)
  To: xen-devel

All,

On Gentoo Xen 4.9.1, I've been creating minimal Linux DomU's to create a
virtual, segregated network infrastructure.  This has been going really
well, and I'm slowly progressing toward a self-updating system.

My main snag has to do with re-attaching VMs to a driver domain after
rebooting the driver domain.  e.g.


                              +-----+
                            /-| VM1 |
                           /  +-----+
       +----+   +-------+ /   +-----+
ISP ---| SW |---| GW/FW |-----| VM2 |
       +----+   +-------+ \   +-----+
        DD        DD       \  +-----+
                            \-| VMN |
                              +-----+

So, in this diagram, SW, GW/FW, and VM1 are mini-VMs.  VM2 and the rest
are full-fledged Linux PV VMs.

Only SW and GW/FW are driver domains.  SW has the physical NIC via
pci-passthrough.  There are actually 7 GW/FW mini-VMs (for 7 public IPs,
and 7 different networks), and a trunk mini-VM that aren't shown.

The problem occurs when I reboot a driver domain.  Regardless of the
type of guest attached to it, I'm unable to re-establish connectivity
between the driver domain and the re-attached guest.  e.g. I reboot
GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
get:

$ ip link
...
11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff

In the driver domain.  At this point, absolutely no packets flow between
the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
reboot the PV guests.  After that, networking is fine.

Any thoughts?

thx,

Jason.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 15:03 reboot driver domain, vifX.Y = NO-CARRIER? Jason Cooper
@ 2018-04-27 15:11 ` Andrew Cooper
  2018-04-27 15:35   ` Jason Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2018-04-27 15:11 UTC (permalink / raw)
  To: Jason Cooper, xen-devel

On 27/04/18 16:03, Jason Cooper wrote:
> All,
>
> On Gentoo Xen 4.9.1, I've been creating minimal Linux DomU's to create a
> virtual, segregated network infrastructure.  This has been going really
> well, and I'm slowly progressing toward a self-updating system.
>
> My main snag has to do with re-attaching VMs to a driver domain after
> rebooting the driver domain.  e.g.
>
>
>                               +-----+
>                             /-| VM1 |
>                            /  +-----+
>        +----+   +-------+ /   +-----+
> ISP ---| SW |---| GW/FW |-----| VM2 |
>        +----+   +-------+ \   +-----+
>         DD        DD       \  +-----+
>                             \-| VMN |
>                               +-----+
>
> So, in this diagram, SW, GW/FW, and VM1 are mini-VMs.  VM2, and the rest
> are full fledged Linux PV VMs.
>
> Only SW, and GW/FW are driver domains.  SW has the physical nic via
> pci-passthrough.  There are actually 7 GW/FW mini-VMs (for 7 public IPs,
> and 7 different networks), and a trunk mini-VM that aren't shown.
>
> The problem occurs when I reboot a driver domain.  Regardless of the
> type of guest attached to it, I'm unable to re-establish connectivity
> between the driver domain and the re-attached guest.  e.g. I reboot
> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> get:
>
> $ ip link
> ...
> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>
> In the driver domain.  At this point, absolutely no packets flow between
> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> reboot the PV guests.  After that, networking is fine.
>
> Any thoughts?

XenServer found this when we investigated using device driver domains in
a similar way.

The underlying problem is that the frontend/backend setup in xenstore
encodes the domid in the path, and changing that isn't transparent to the
guest at all.
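
To make the domid dependence concrete, here is a minimal sketch, assuming
the conventional xenstore layout for a vif.  The function names are
hypothetical and not part of libxl.  For the vif20.1 interface in the
report, the frontend is domid 20 and the device id is 1; the backend
domids used in the note below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the backend directory of a vif.  Assumed layout:
 *   /local/domain/<backend domid>/backend/vif/<frontend domid>/<devid>
 * Returns 0 on success, -1 if the buffer is too small.
 */
int vif_backend_path(char *buf, size_t len,
                     unsigned be_domid, unsigned fe_domid, unsigned devid)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/local/domain/%u/backend/vif/%u/%u",
                     be_domid, fe_domid, devid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Format the node in the frontend's directory whose value points at the
 * backend directory.  Assumed layout:
 *   /local/domain/<frontend domid>/device/vif/<devid>/backend
 */
int vif_frontend_backend_node(char *buf, size_t len,
                              unsigned fe_domid, unsigned devid)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/local/domain/%u/device/vif/%u/backend",
                     fe_domid, devid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

If the driver domain was domid 5 and comes back as domid 9, the backend
directory for the same guest moves from /local/domain/5/... to
/local/domain/9/..., while a frontend that cached the old string keeps
looking at the old location.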

The best idea we came up with was to reboot the driver domain and reuse
its old domid, at which point all the xenstore paths would remain
valid.  There is support in Xen for explicitly choosing the domid of a
domain, but I don't think that it is wired up sensibly in xl.

~Andrew


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 15:11 ` Andrew Cooper
@ 2018-04-27 15:35   ` Jason Cooper
  2018-04-27 15:52     ` Andrew Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-04-27 15:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

Hi Andrew,

On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:03, Jason Cooper wrote:
> > The problem occurs when I reboot a driver domain.  Regardless of the
> > type of guest attached to it, I'm unable to re-establish connectivity
> > between the driver domain and the re-attached guest.  e.g. I reboot
> > GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> > get:
> >
> > $ ip link
> > ...
> > 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
> >     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >
> > In the driver domain.  At this point, absolutely no packets flow between
> > the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> > reboot the PV guests.  After that, networking is fine.
> >
> > Any thoughts?
> 
> The underlying problem is that the frontend/backend setup in xenstore
> encodes the domid in path, and changing that isn't transparent to the
> guest at all.

Oh joy.  Would seem to make more sense to use the domain name or the
uuid...

> The best idea we came up with was to reboot the driver domain and reuse
> its old domid, at which point all the xenstore paths would remain
> valid.  There is support in Xen for explicitly choosing the domid of a
> domain, but I don't think that it is wired up sensibly in xl.

hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
how to reuse the domid?

The solution I see with my current, limited understanding could be to
change the path for the guest via xenstore-write.  Although I suspect
there's more going on underneath the hood than I'm currently aware of.

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 15:35   ` Jason Cooper
@ 2018-04-27 15:52     ` Andrew Cooper
  2018-04-27 16:14       ` Jason Cooper
  2018-04-27 16:56       ` Wei Liu
  0 siblings, 2 replies; 28+ messages in thread
From: Andrew Cooper @ 2018-04-27 15:52 UTC (permalink / raw)
  To: Jason Cooper; +Cc: xen-devel

On 27/04/18 16:35, Jason Cooper wrote:
> Hi Andrew,
>
> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
>> On 27/04/18 16:03, Jason Cooper wrote:
>>> The problem occurs when I reboot a driver domain.  Regardless of the
>>> type of guest attached to it, I'm unable to re-establish connectivity
>>> between the driver domain and the re-attached guest.  e.g. I reboot
>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
>>> get:
>>>
>>> $ ip link
>>> ...
>>> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>>
>>> In the driver domain.  At this point, absolutely no packets flow between
>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
>>> reboot the PV guests.  After that, networking is fine.
>>>
>>> Any thoughts?
>> The underlying problem is that the frontend/backend setup in xenstore
>> encodes the domid in path, and changing that isn't transparent to the
>> guest at all.
> Oh joy.  Would seem to make more sense to use the domain name or the
> uuid...

domids are also used in the grant and event hypercall interfaces with Xen.

There is no way this horse is being put back in its stable...

>
>> The best idea we came up with was to reboot the driver domain and reuse
>> its old domid, at which point all the xenstore paths would remain
>> valid.  There is support in Xen for explicitly choosing the domid of a
>> domain, but I don't think that it is wired up sensibly in xl.
> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> how to reuse the domid?

xc_domain_create() takes a domid value by pointer.  Passing a value
other than zero will cause Xen to use that domid, rather than
searching for the next free domid.

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b5e27a7..7866092 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
             goto out;
         }
 
+        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
         ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
                                &xc_config);
         if (ret < 0) {

This gross hack may get you somewhere (Entirely untested).
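
The one-liner in the diff hands whatever is in the environment straight
to atoi().  A slightly more defensive parse might look like the sketch
below, which is equally untested against libxl.  parse_override_domid
and the SKETCH_ constant are hypothetical names, and the 0x7FF0 limit is
an assumption based on Xen's DOMID_FIRST_RESERVED.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: domids at or above this value are reserved by Xen. */
#define SKETCH_DOMID_FIRST_RESERVED 0x7FF0u

/*
 * Parse a requested domid from a string such as
 * getenv("OVERRIDE_DOMID").  Returns 0, meaning "let Xen pick the next
 * free domid", for NULL, empty, malformed, or out-of-range input.
 */
uint32_t parse_override_domid(const char *s)
{
    char *end;
    unsigned long v;

    if (s == NULL || *s == '\0')
        return 0;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v >= SKETCH_DOMID_FIRST_RESERVED)
        return 0;

    return (uint32_t)v;
}
```

Falling back to 0 keeps the default behaviour when the variable is unset
or garbage, which matches the intent of the `?: "0"` in the diff.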

~Andrew


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 15:52     ` Andrew Cooper
@ 2018-04-27 16:14       ` Jason Cooper
  2018-04-27 16:58         ` Wei Liu
  2018-04-27 17:02         ` Andrew Cooper
  2018-04-27 16:56       ` Wei Liu
  1 sibling, 2 replies; 28+ messages in thread
From: Jason Cooper @ 2018-04-27 16:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:35, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:03, Jason Cooper wrote:
> >>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>> type of guest attached to it, I'm unable to re-establish connectivity
> >>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>> get:
> >>>
> >>> $ ip link
> >>> ...
> >>> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
> >>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>
> >>> In the driver domain.  At this point, absolutely no packets flow between
> >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>> reboot the PV guests.  After that, networking is fine.
> >>>
> >>> Any thoughts?
> >> The underlying problem is that the frontend/backend setup in xenstore
> >> encodes the domid in path, and changing that isn't transparent to the
> >> guest at all.
> > Oh joy.  Would seem to make more sense to use the domain name or the
> > uuid...
> 
> domids are also used in the grant and event hypercall interfaces with Xen.
> 
> There is no way this horse is being put back in its stable...

:-(

> >> The best idea we came up with was to reboot the driver domain and reuse
> >> its old domid, at which point all the xenstore paths would remain
> >> valid.  There is support in Xen for explicitly choosing the domid of a
> >> domain, but I don't think that it is wired up sensibly in xl.
> > hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> > how to reuse the domid?
> 
> xc_domain_create() takes a domid value by pointer.  Passing a value
> other than zero will cause Xen to use that domid, rather than by
> searching for the next free domid.
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index b5e27a7..7866092 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
>              goto out;
>          }
>  
> +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
>          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
>                                 &xc_config);
>          if (ret < 0) {
> 
> This gross hack may get you somewhere (Entirely untested).

Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
series adding a 'domid' field to the domain config file would be
rejected outright?  That would allow callers of xl to use key=value for
reboot scripts like mine, and also allow for a static domid setup of the
driver domains if folks want that.

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 15:52     ` Andrew Cooper
  2018-04-27 16:14       ` Jason Cooper
@ 2018-04-27 16:56       ` Wei Liu
  1 sibling, 0 replies; 28+ messages in thread
From: Wei Liu @ 2018-04-27 16:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jason Cooper, Wei Liu

On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:35, Jason Cooper wrote:
> > Hi Andrew,
> >
> > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:03, Jason Cooper wrote:
> >>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>> type of guest attached to it, I'm unable to re-establish connectivity
> >>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>> get:
> >>>
> >>> $ ip link
> >>> ...
> >>> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
> >>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>
> >>> In the driver domain.  At this point, absolutely no packets flow between
> >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>> reboot the PV guests.  After that, networking is fine.
> >>>
> >>> Any thoughts?
> >> The underlying problem is that the frontend/backend setup in xenstore
> >> encodes the domid in path, and changing that isn't transparent to the
> >> guest at all.
> > Oh joy.  Would seem to make more sense to use the domain name or the
> > uuid...
> 
> domids are also used in the grant and event hypercall interfaces with Xen.

If the frontend manages to go through a disconnect/reconnect cycle, the
grant table and event channels aren't going to be a problem?

Wei.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 16:14       ` Jason Cooper
@ 2018-04-27 16:58         ` Wei Liu
  2018-04-27 17:27           ` Jason Cooper
  2018-04-27 17:02         ` Andrew Cooper
  1 sibling, 1 reply; 28+ messages in thread
From: Wei Liu @ 2018-04-27 16:58 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Fri, Apr 27, 2018 at 04:14:16PM +0000, Jason Cooper wrote:
> On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> > On 27/04/18 16:35, Jason Cooper wrote:
> > > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> > >> On 27/04/18 16:03, Jason Cooper wrote:
> > >>> The problem occurs when I reboot a driver domain.  Regardless of the
> > >>> type of guest attached to it, I'm unable to re-establish connectivity
> > >>> between the driver domain and the re-attached guest.  e.g. I reboot
> > >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> > >>> get:
> > >>>
> > >>> $ ip link
> > >>> ...
> > >>> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
> > >>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> > >>>
> > >>> In the driver domain.  At this point, absolutely no packets flow between
> > >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> > >>> reboot the PV guests.  After that, networking is fine.
> > >>>
> > >>> Any thoughts?
> > >> The underlying problem is that the frontend/backend setup in xenstore
> > >> encodes the domid in path, and changing that isn't transparent to the
> > >> guest at all.
> > > Oh joy.  Would seem to make more sense to use the domain name or the
> > > uuid...
> > 
> > domids are also used in the grant and event hypercall interfaces with Xen.
> > 
> > There is no way this horse is being put back in its stable...
> 
> :-(
> 
> > >> The best idea we came up with was to reboot the driver domain and reuse
> > >> its old domid, at which point all the xenstore paths would remain
> > >> valid.  There is support in Xen for explicitly choosing the domid of a
> > >> domain, but I don't think that it is wired up sensibly in xl.
> > > hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> > > how to reuse the domid?
> > 
> > xc_domain_create() takes a domid value by pointer.  Passing a value
> > other than zero will cause Xen to use that domid, rather than by
> > searching for the next free domid.
> > 
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index b5e27a7..7866092 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> >              goto out;
> >          }
> >  
> > +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> >          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> >                                 &xc_config);
> >          if (ret < 0) {
> > 
> > This gross hack may get you somewhere (Entirely untested).
> 
> Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> series adding a 'domid' field to the domain config file would be
> rejected outright?  That would allow callers of xl to use key=value for
> reboot scripts like mine, and also allow for a static domid setup of the
> driver domains if folks want that.

Seems a bit hacky to me. You also need to reserve a set of domids
beforehand?

Wei.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 16:14       ` Jason Cooper
  2018-04-27 16:58         ` Wei Liu
@ 2018-04-27 17:02         ` Andrew Cooper
  2018-04-27 17:13           ` Wei Liu
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2018-04-27 17:02 UTC (permalink / raw)
  To: Jason Cooper; +Cc: xen-devel

On 27/04/18 17:14, Jason Cooper wrote:
> On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
>> On 27/04/18 16:35, Jason Cooper wrote:
>>> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
>>>> On 27/04/18 16:03, Jason Cooper wrote:
>>>>> The problem occurs when I reboot a driver domain.  Regardless of the
>>>>> type of guest attached to it, I'm unable to re-establish connectivity
>>>>> between the driver domain and the re-attached guest.  e.g. I reboot
>>>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
>>>>> get:
>>>>>
>>>>> $ ip link
>>>>> ...
>>>>> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
>>>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>>>>
>>>>> In the driver domain.  At this point, absolutely no packets flow between
>>>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
>>>>> reboot the PV guests.  After that, networking is fine.
>>>>>
>>>>> Any thoughts?
>>>> The underlying problem is that the frontend/backend setup in xenstore
>>>> encodes the domid in path, and changing that isn't transparent to the
>>>> guest at all.
>>> Oh joy.  Would seem to make more sense to use the domain name or the
>>> uuid...
>> domids are also used in the grant and event hypercall interfaces with Xen.
>>
>> There is no way this horse is being put back in its stable...
> :-(
>
>>>> The best idea we came up with was to reboot the driver domain and reuse
>>>> its old domid, at which point all the xenstore paths would remain
>>>> valid.  There is support in Xen for explicitly choosing the domid of a
>>>> domain, but I don't think that it is wired up sensibly in xl.
>>> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
>>> how to reuse the domid?
>> xc_domain_create() takes a domid value by pointer.  Passing a value
>> other than zero will cause Xen to use that domid, rather than by
>> searching for the next free domid.
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index b5e27a7..7866092 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
>>              goto out;
>>          }
>>  
>> +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
>>          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
>>                                 &xc_config);
>>          if (ret < 0) {
>>
>> This gross hack may get you somewhere (Entirely untested).
> Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> series adding a 'domid' field to the domain config file would be
> rejected outright?  That would allow callers of xl to use key=value for
> reboot scripts like mine, and also allow for a static domid setup of the
> driver domains if folks want that.

That question would have to be deferred to the toolstack maintainers,
but some ability to manage exact domid's would be a very good thing.

Having a domid= field would allow for very fine-grained control, but
probably more control than most people want.  Alternatively, having some
kind of "reuse_domid" field which booted the domain normally once,
recorded its domid, and reused that on reboot might be rather more useful.

~Andrew


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 17:02         ` Andrew Cooper
@ 2018-04-27 17:13           ` Wei Liu
  2018-04-30 15:22             ` Ian Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: Wei Liu @ 2018-04-27 17:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jason Cooper, Wei Liu

On Fri, Apr 27, 2018 at 06:02:46PM +0100, Andrew Cooper wrote:
> On 27/04/18 17:14, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:35, Jason Cooper wrote:
> >>> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >>>> On 27/04/18 16:03, Jason Cooper wrote:
> >>>>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>>>> type of guest attached to it, I'm unable to re-establish connectivity
> >>>>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>>>> get:
> >>>>>
> >>>>> $ ip link
> >>>>> ...
> >>>>> 11: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
> >>>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>>>
> >>>>> In the driver domain.  At this point, absolutely no packets flow between
> >>>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>>>> reboot the PV guests.  After that, networking is fine.
> >>>>>
> >>>>> Any thoughts?
> >>>> The underlying problem is that the frontend/backend setup in xenstore
> >>>> encodes the domid in path, and changing that isn't transparent to the
> >>>> guest at all.
> >>> Oh joy.  Would seem to make more sense to use the domain name or the
> >>> uuid...
> >> domids are also used in the grant and event hypercall interfaces with Xen.
> >>
> >> There is no way this horse is being put back in its stable...
> > :-(
> >
> >>>> The best idea we came up with was to reboot the driver domain and reuse
> >>>> its old domid, at which point all the xenstore paths would remain
> >>>> valid.  There is support in Xen for explicitly choosing the domid of a
> >>>> domain, but I don't think that it is wired up sensibly in xl.
> >>> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> >>> how to reuse the domid?
> >> xc_domain_create() takes a domid value by pointer.  Passing a value
> >> other than zero will cause Xen to use that domid, rather than by
> >> searching for the next free domid.
> >>
> >> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> >> index b5e27a7..7866092 100644
> >> --- a/tools/libxl/libxl_create.c
> >> +++ b/tools/libxl/libxl_create.c
> >> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> >>              goto out;
> >>          }
> >>  
> >> +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> >>          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> >>                                 &xc_config);
> >>          if (ret < 0) {
> >>
> >> This gross hack may get you somewhere (Entirely untested).
> > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > series adding a 'domid' field to the domain config file would be
> > rejected outright?  That would allow callers of xl to use key=value for
> > reboot scripts like mine, and also allow for a static domid setup of the
> > driver domains if folks want that.
> 
> That question would have to be deferred to the toolstack maintainers,
> but some ability to manage exact domid's would be a very good thing.
> 
> Having a domid= field would allow for very fine grain control, but
> probably more control than most people want.  Alternatively, having some
> kind of "reuse_domid" field which booted the domain normally once,
> recorded its domid, and reused that on reboot might be rather more useful.
> 

To implement reuse_domid in a sane way, either the toolstack needs to
manage all domids and always set the domid when creating a domain, or
the hypervisor needs to cooperate by providing an interface to reserve /
pre-allocate domids.

Either should be doable. We should think a bit more about which approach
is better.
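
As a rough illustration of the first option (the toolstack tracking
every domid itself), a reservation table could look like the sketch
below.  Everything here is hypothetical: the struct, the function names,
and the lowest-free allocation policy are for illustration only and do
not describe how Xen or libxl allocate domids today.  The 0x7FF0 limit
is an assumption based on Xen's DOMID_FIRST_RESERVED.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: usable domids are 1 .. 0x7FEF. */
#define SKETCH_DOMID_LIMIT 0x7FF0u

struct domid_pool {
    bool running[SKETCH_DOMID_LIMIT];   /* a domain currently has this id */
    bool reserved[SKETCH_DOMID_LIMIT];  /* held back for a later reboot   */
};

void pool_init(struct domid_pool *p)
{
    memset(p, 0, sizeof *p);
}

/* Claim the lowest domid that is neither running nor reserved; 0 if none. */
uint32_t pool_alloc_any(struct domid_pool *p)
{
    for (uint32_t id = 1; id < SKETCH_DOMID_LIMIT; id++) {
        if (!p->running[id] && !p->reserved[id]) {
            p->running[id] = true;
            return id;
        }
    }
    return 0;
}

/* Claim one specific domid, e.g. the one a driver domain had before. */
bool pool_alloc_exact(struct domid_pool *p, uint32_t id)
{
    if (id == 0 || id >= SKETCH_DOMID_LIMIT || p->running[id])
        return false;
    p->running[id] = true;
    return true;
}

/* The domain is gone.  keep=true holds the id for pool_alloc_exact(). */
void pool_release(struct domid_pool *p, uint32_t id, bool keep)
{
    if (id == 0 || id >= SKETCH_DOMID_LIMIT)
        return;
    p->running[id] = false;
    p->reserved[id] = keep;
}
```

Releasing with keep=true is what stops an unrelated create from taking
the driver domain's id while it is down; it only helps if every domain
creation goes through the same table.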

Wei.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 16:58         ` Wei Liu
@ 2018-04-27 17:27           ` Jason Cooper
  2018-05-01 11:29             ` Wei Liu
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-04-27 17:27 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, xen-devel

Hi Wei Liu,

On Fri, Apr 27, 2018 at 05:58:17PM +0100, Wei Liu wrote:
> On Fri, Apr 27, 2018 at 04:14:16PM +0000, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
...
> > > xc_domain_create() takes a domid value by pointer.  Passing a value
> > > other than zero will cause Xen to use that domid, rather than by
> > > searching for the next free domid.
> > > 
> > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > index b5e27a7..7866092 100644
> > > --- a/tools/libxl/libxl_create.c
> > > +++ b/tools/libxl/libxl_create.c
> > > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> > >              goto out;
> > >          }
> > >  
> > > +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> > >          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> > >                                 &xc_config);
> > >          if (ret < 0) {
> > > 
> > > This gross hack may get you somewhere (Entirely untested).
> > 
> > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > series adding a 'domid' field to the domain config file would be
> > rejected outright?  That would allow callers of xl to use key=value for
> > reboot scripts like mine, and also allow for a static domid setup of the
> > driver domains if folks want that.
> 
> Seems a bit hacky to me. You also need to reserve a set of domids
> beforehand?

My thought of creating a domid config file variable was to do just as
you say, reserve specific domids for specific guests.  I could even
trigger an error if domid is set when driver_domain isn't.

Actually, I could slightly overload driver_domain, changing it from a bool
to a 'static domid'.  0 = not a driver domain, >0 is its static domid
assignment.

For backwards compatibility, 1 = next domid available, and >1 would be
the static domid.  I'm not sure if I like that though.

The racy part is when a driver domain is shut down: how does a create
thread know that that domid is reserved?

third option, tri-state:

driver_domain = 0   # not a driver domain
driver_domain = 1   # is a driver domain, use next avail domid
driver_domain = 2   # is a driver domain, re-use domid

Honestly, I'm not really liking any of these.  Perhaps 'xl
network-detach ...' should be doing a better job of cleaning up?  Or,
'xl network-attach ...' should do a better job of re-attaching?

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 17:13           ` Wei Liu
@ 2018-04-30 15:22             ` Ian Jackson
  2018-04-30 16:16               ` Jason Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2018-04-30 15:22 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Jason Cooper, xen-devel

Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> To implement reuse_domid in a sane way, either the toolstack needs to
> manage all domids and always sets domid when creating domain or the
> hypervisor needs to cooperate -- to have interface to reserve /
> pre-allocate domids.

I think this is entirely the wrong approach.

I think the right answer is that this is simply a bug in the
frontends.  frontends should cope if the backend path pointer in the
frontend directory is updated, and should start reading the new
backend instead.
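
The behaviour described here can be sketched as follows.  This is a
hypothetical model only, not how any existing netfront driver is
written: struct fe_state and fe_backend_changed are invented names, and
the function stands in for whatever a frontend would do when a watch on
its "backend" node fires.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device frontend state. */
struct fe_state {
    char backend[128];  /* backend directory the frontend last used */
    bool connected;
};

/*
 * Compare the freshly read backend path with the cached one.  Returns
 * true if the frontend should drop its connection and start reading
 * the new backend directory; false if nothing changed or the new path
 * does not fit in the cache.
 */
bool fe_backend_changed(struct fe_state *fe, const char *new_path)
{
    if (strlen(new_path) >= sizeof fe->backend)
        return false;
    if (strcmp(fe->backend, new_path) == 0)
        return false;

    snprintf(fe->backend, sizeof fe->backend, "%s", new_path);
    fe->connected = false;
    return true;
}
```

A caller that gets true back would then redo the usual connect handshake
against the new directory, which is the step that fails in the setup
reported at the top of the thread.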

I'm a bit surprised that this doesn't already work.

Ian.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 15:22             ` Ian Jackson
@ 2018-04-30 16:16               ` Jason Cooper
  2018-04-30 16:26                 ` Ian Jackson
                                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Jason Cooper @ 2018-04-30 16:16 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, xen-devel

Hi Ian,

On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > To implement reuse_domid in a sane way, either the toolstack needs to
> > manage all domids and always sets domid when creating domain or the
> > hypervisor needs to cooperate -- to have interface to reserve /
> > pre-allocate domids.
> 
> I think this is entirely the wrong approach.

Whew.  Glad I didn't start hacking yet...

> I think the right answer is that this is simply a bug in the
> frontends.  frontends should cope if the backend path pointer in the
> frontend directory is updated, and should start reading the new
> backend instead.

Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
"When a driver domain is rebooted (domid changed), previously connected
client domUs can't gain network connectivity to/through the driver
domain via 'xl network-attach client_domu mac=... bridge=...
backend=drv_dom'"

This is due to the fact that the frontend net driver doesn't / can't
follow the backend driver to the new domid in xenstore.

Does that sound right?

> I'm a bit surprised that this doesn't already work.

I'm currently running Xen 4.9.1 as patched in the standard Gentoo
ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
portage, until I nail this down.  I'm happy to move to 4.10 if needed.

Do you think this is something that is definitely fixed in a more recent
version of Xen?  I'm happy to test if so.  Is there a commit id I can
look for?


thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 16:16               ` Jason Cooper
@ 2018-04-30 16:26                 ` Ian Jackson
  2018-04-30 18:14                   ` Jason Cooper
  2018-04-30 16:38                 ` George Dunlap
  2018-05-01 11:50                 ` Wei Liu
  2 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2018-04-30 16:26 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Andrew Cooper, Wei Liu, xen-devel

Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > > To implement reuse_domid in a sane way, either the toolstack needs to
> > > manage all domids and always sets domid when creating domain or the
> > > hypervisor needs to cooperate -- to have interface to reserve /
> > > pre-allocate domids.
> > 
> > I think this is entirely the wrong approach.
> 
> Whew.  Glad I didn't start hacking yet...

Well, it might be that you end up having to use this fixed-domid thing
as a workaround :-/.

> > I think the right answer is that this is simply a bug in the
> > frontends.  frontends should cope if the backend path pointer in the
> > frontend directory is updated, and should start reading the new
> > backend instead.
> 
> Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"
> 
> This is due to the fact that the frontend net driver doesn't / can't
> follow the backend driver to the new domid in xenstore.

Yes.

> > I'm a bit surprised that this doesn't already work.
> 
> I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
> portage, until I nail this down.  I'm happy to move to 4.10 if needed.
> 
> Do you think this is something that is definitely fixed in a more recent
> version of Xen?  I'm happy to test if so.  Is there a commit id I can
> look for?

In my view (which others may disagree with), this is not a bug in Xen
but in the Linux kernel frontend.  So changing the Xen version won't
help.

Ian.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 16:16               ` Jason Cooper
  2018-04-30 16:26                 ` Ian Jackson
@ 2018-04-30 16:38                 ` George Dunlap
  2018-04-30 18:17                   ` Jason Cooper
  2018-05-01 11:50                 ` Wei Liu
  2 siblings, 1 reply; 28+ messages in thread
From: George Dunlap @ 2018-04-30 16:38 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Ian Jackson, Wei Liu, xen-devel, Andrew Cooper

On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <xen@lakedaemon.net> wrote:
> Hi Ian,
>
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
>> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
>> > To implement reuse_domid in a sane way, either the toolstack needs to
>> > manage all domids and always sets domid when creating domain or the
>> > hypervisor needs to cooperate -- to have interface to reserve /
>> > pre-allocate domids.
>>
>> I think this is entirely the wrong approach.
>
> Whew.  Glad I didn't start hacking yet...
>
>> I think the right answer is that this is simply a bug in the
>> frontends.  frontends should cope if the backend path pointer in the
>> frontend directory is updated, and should start reading the new
>> backend instead.
>
> Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"

Hang on -- just to clarify, something like the following doesn't work
(or wouldn't, you suspect, work)?

* Start driver domain
* Start domU A with no network
* xl network-attach A backend=drv_dom
* [do some stuff]
* xl network-detach A [network devid]
* Restart driver domain
* xl network-attach A backend=drv_dom
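
For reference, the detach / restart / re-attach part of that sequence
could be sketched as a small shell function.  This is only an
illustration: the function name reattach_cycle, the XL override and the
config-file argument are assumptions made for the sketch; the xl
subcommands are the ones already named in this thread.

```shell
#!/bin/sh
# Sketch of the detach / restart / re-attach steps listed above.
# XL may be overridden (e.g. XL="echo xl") to print the commands
# instead of running them.
XL="${XL:-xl}"

# reattach_cycle GUEST DEVID DRV_DOM DRV_CFG
#   GUEST    name of the domU whose vif is cycled (e.g. A)
#   DEVID    network devid to detach
#   DRV_DOM  name of the driver domain
#   DRV_CFG  path to the driver domain's config file (assumption)
reattach_cycle() {
	local guest="$1" devid="$2" drv_dom="$3" drv_cfg="$4"

	# $XL is left unquoted on purpose so an override such as
	# "echo xl" is split into command and argument.
	$XL network-detach "$guest" "$devid" || return 1
	$XL shutdown -w "$drv_dom" || return 2
	$XL create "$drv_cfg" || return 3
	$XL network-attach "$guest" "backend=$drv_dom" || return 4
}
```

Running it with XL="echo xl" first shows exactly which commands would be
issued, without touching any domain.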

 -George


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 16:26                 ` Ian Jackson
@ 2018-04-30 18:14                   ` Jason Cooper
  2018-05-01 11:20                     ` Wei Liu
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-04-30 18:14 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Mon, Apr 30, 2018 at 05:26:38PM +0100, Ian Jackson wrote:
> Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
...
> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
> > 
> > This is due to the fact that the frontend net driver doesn't / can't
> > follow the backend driver to the new domid in xenstore.
> 
> Yes.
> 
> > > I'm a bit surprised that this doesn't already work.
> > 
> > I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> > ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
> > portage, until I nail this down.  I'm happy to move to 4.10 if needed.
> > 
> > Do you think this is something that is definitely fixed in a more recent
> > version of Xen?  I'm happy to test if so.  Is there a commit id I can
> > look for?
> 
> I think that in my view (which others may disagree with) this is not a
> bug in Xen but in the Linux kernel frontend.  So changing the Xen
> version won't help.

I'm running vanilla v4.16.4 based on allnoconfig in all of these
mini-domUs.  It doesn't look like there have been any pertinent recent
changes in drivers/net/xen-netfront.c since v4.16.

Based on an initial scan of the code, it looks like xen-netback watches
for hotplug events on the frontend (xen-netback/xenbus.c:1041-1046 in
connect()).  xen-netfront.c:1995-2036, netback_changed(), is the
registered callback for netfront.

Is the xenbus netback/netfront state machine documented anywhere?
include/xen/interface/io/netif.h has a great description of tx/rx queue
setup and teardown, but doesn't seem to have anything specific to the
high-level signalling that 'xl network-attach' would cause.

Any pointers?

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 16:38                 ` George Dunlap
@ 2018-04-30 18:17                   ` Jason Cooper
  2018-04-30 18:23                     ` Jason Cooper
  2018-05-01 10:25                     ` George Dunlap
  0 siblings, 2 replies; 28+ messages in thread
From: Jason Cooper @ 2018-04-30 18:17 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Jackson, Wei Liu, xen-devel, Andrew Cooper

Hi George,

On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <xen@lakedaemon.net> wrote:
> > Hi Ian,
> >
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> >> > To implement reuse_domid in a sane way, either the toolstack needs to
> >> > manage all domids and always sets domid when creating domain or the
> >> > hypervisor needs to cooperate -- to have interface to reserve /
> >> > pre-allocate domids.
> >>
> >> I think this is entirely the wrong approach.
> >
> > Whew.  Glad I didn't start hacking yet...
> >
> >> I think the right answer is that this is simply a bug in the
> >> frontends.  frontends should cope if the backend path pointer in the
> >> frontend directory is updated, and should start reading the new
> >> backend instead.
> >
> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
> 
> Hang on -- just to clarify, something like the following doesn't work
> (or wouldn't, you suspect, work)?
> 
> * Start driver domain
> * Start domU A with no network

My setup is different here.  I include the vif = [... backend=...]
declaration in my domain config.

> * xl network-attach A backend=drv_dom

So I don't do this step manually.

> * [do some stuff]
> * xl network-detach A [network devid]
> * Restart driver domain
> * xl network-attach A backend=drv_dom

Otherwise, this is all correct.  Then I get the NO-CARRIER in domU A.

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 18:17                   ` Jason Cooper
@ 2018-04-30 18:23                     ` Jason Cooper
  2018-05-01 10:25                     ` George Dunlap
  1 sibling, 0 replies; 28+ messages in thread
From: Jason Cooper @ 2018-04-30 18:23 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Jackson, Andrew Cooper, Wei Liu, xen-devel

correction:

On Mon, Apr 30, 2018 at 06:17:54PM +0000, Jason Cooper wrote:
> On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> > On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <xen@lakedaemon.net> wrote:
> > > Hi Ian,
> > >
> > > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > >> > To implement reuse_domid in a sane way, either the toolstack needs to
> > >> > manage all domids and always sets domid when creating domain or the
> > >> > hypervisor needs to cooperate -- to have interface to reserve /
> > >> > pre-allocate domids.
> > >>
> > >> I think this is entirely the wrong approach.
> > >
> > > Whew.  Glad I didn't start hacking yet...
> > >
> > >> I think the right answer is that this is simply a bug in the
> > >> frontends.  frontends should cope if the backend path pointer in the
> > >> frontend directory is updated, and should start reading the new
> > >> backend instead.
> > >
> > > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > > "When a driver domain is rebooted (domid changed), previously connected
> > > client domUs can't gain network connectivity to/through the driver
> > > domain via 'xl network-attach client_domu mac=... bridge=...
> > > backend=drv_dom'"
> > 
> > Hang on -- just to clarify, something like the following doesn't work
> > (or wouldn't, you suspect, work)?
> > 
> > * Start driver domain
> > * Start domU A with no network
> 
> My setup is different here.  I include the vif = [... backend=...]
> declaration in my domain config.
> 
> > * xl network-attach A backend=drv_dom
> 
> So I don't do this step manually.
> 
> > * [do some stuff]
> > * xl network-detach A [network devid]
> > * Restart driver domain
> > * xl network-attach A backend=drv_dom
> 
> Otherwise, this is all correct.  Then I get the NO-CARRIER in domU A.

Sorry, I get NO-CARRIER in the just rebooted driver domain.  And the
interface is still UP in domU A.

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 18:17                   ` Jason Cooper
  2018-04-30 18:23                     ` Jason Cooper
@ 2018-05-01 10:25                     ` George Dunlap
  2018-05-01 12:37                       ` Jason Cooper
  1 sibling, 1 reply; 28+ messages in thread
From: George Dunlap @ 2018-05-01 10:25 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Ian Jackson, Wei Liu, xen-devel, Andrew Cooper

On Mon, Apr 30, 2018 at 7:17 PM, Jason Cooper <xen@lakedaemon.net> wrote:
> Hi George,
>
> On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
>> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <xen@lakedaemon.net> wrote:
>> > Hi Ian,
>> >
>> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
>> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
>> >> > To implement reuse_domid in a sane way, either the toolstack needs to
>> >> > manage all domids and always sets domid when creating domain or the
>> >> > hypervisor needs to cooperate -- to have interface to reserve /
>> >> > pre-allocate domids.
>> >>
>> >> I think this is entirely the wrong approach.
>> >
>> > Whew.  Glad I didn't start hacking yet...
>> >
>> >> I think the right answer is that this is simply a bug in the
>> >> frontends.  frontends should cope if the backend path pointer in the
>> >> frontend directory is updated, and should start reading the new
>> >> backend instead.
>> >
>> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
>> > "When a driver domain is rebooted (domid changed), previously connected
>> > client domUs can't gain network connectivity to/through the driver
>> > domain via 'xl network-attach client_domu mac=... bridge=...
>> > backend=drv_dom'"
>>
>> Hang on -- just to clarify, something like the following doesn't work
>> (or wouldn't, you suspect, work)?
>>
>> * Start driver domain
>> * Start domU A with no network
>
> My setup is different here.  I include the vif = [... backend=...]
> declaration in my domain config.
>
>> * xl network-attach A backend=drv_dom
>
> So I don't do this step manually.

Right, but you do the detach manually (as well as the subsequent
attach after the driver domain restarts)?

>
>> * [do some stuff]
>> * xl network-detach A [network devid]
>> * Restart driver domain
>> * xl network-attach A backend=drv_dom
[snip]
> Sorry, I get NO-CARRIER in the just rebooted driver domain.  And the
> interface is still UP in domU A.

Wait, that sounds like a different problem than the one we thought you
were talking about.  You're saying that the driver domain is losing
connection to the *physical* network after reboot?  That sounds more
like an issue with PCI passthrough than with the PV networking
protocol.

So what happens if you do the following:

* Boot your driver domain (but don't connect any guests)
* From your driver domain, ping an off-host IP
* Reboot the driver domain
* Try pinging an off-host IP again

It sounds like maybe the second ping will fail?

 -George


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 18:14                   ` Jason Cooper
@ 2018-05-01 11:20                     ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2018-05-01 11:20 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Ian Jackson, Wei Liu, xen-devel, Andrew Cooper

On Mon, Apr 30, 2018 at 06:14:15PM +0000, Jason Cooper wrote:
> On Mon, Apr 30, 2018 at 05:26:38PM +0100, Ian Jackson wrote:
> > Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> ...
> > > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > > "When a driver domain is rebooted (domid changed), previously connected
> > > client domUs can't gain network connectivity to/through the driver
> > > domain via 'xl network-attach client_domu mac=... bridge=...
> > > backend=drv_dom'"
> > > 
> > > This is due to the fact that the frontend net driver doesn't / can't
> > > follow the backend driver to the new domid in xenstore.
> > 
> > Yes.
> > 
> > > > I'm a bit surprised that this doesn't already work.
> > > 
> > > I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> > > ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
> > > portage, until I nail this down.  I'm happy to move to 4.10 if needed.
> > > 
> > > Do you think this is something that is definitely fixed in a more recent
> > > version of Xen?  I'm happy to test if so.  Is there a commit id I can
> > > look for?
> > 
> > I think that in my view (which others may disagree with) this is not a
> > bug in Xen but in the Linux kernel frontend.  So changing the Xen
> > version won't help.
> 
> I'm running vanilla v4.16.4 based on allnoconfig in all of these
> mini-domUs.  It doesn't look like there have been any pertinent recent
> changes in drivers/net/xen-netfront.c since v4.16.
> 
> Based on an initial scan of the code, it looks like xen-netback watches
> for hotplug events on the frontend (xen-netback/xenbus.c:1041-1046 in
> connect()).  xen-netfront.c:1995-2036, netback_changed(), is the
> registered callback for netfront.
> 
> Is the xenbus netback/netfront state machine documented anywhere?
> include/xen/interface/io/netif.h has a great description of tx/rx queue
> setup and teardown, but doesn't seem to have anything specific to the
> high-level signalling that 'xl network-attach' would cause.
> 

Netback state machine is in
drivers/net/xen-netback/xenbus.c:set_backend_state.
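
When looking at this from dom0, the "state" key in each frontend and
backend directory in xenstore holds a numeric XenbusState value.  As a
rough aid, the helper below maps those numbers to the names used in
Xen's public io/xenbus.h header.  The function name is made up for
illustration, and the example path in the comment uses placeholder
variables.

```shell
#!/bin/sh
# Map a numeric XenbusState value to its name (values as defined in
# Xen's public io/xenbus.h).  xenbus_state_name is a hypothetical helper.
xenbus_state_name() {
	case "$1" in
		0) echo Unknown ;;
		1) echo Initialising ;;
		2) echo InitWait ;;
		3) echo Initialised ;;
		4) echo Connected ;;
		5) echo Closing ;;
		6) echo Closed ;;
		7) echo Reconfiguring ;;
		8) echo Reconfigured ;;
		*) echo "invalid($1)"; return 1 ;;
	esac
}

# Possible use from dom0 (the variables are placeholders):
#   xenbus_state_name "$(xenstore-read \
#       /local/domain/$BACKEND_DOMID/backend/vif/$FRONTEND_DOMID/$DEVID/state)"
```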

But honestly I don't think that solves the general issue. It is a bit
unfortunate that Xen drivers don't have a unified state machine.

Wei.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-27 17:27           ` Jason Cooper
@ 2018-05-01 11:29             ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2018-05-01 11:29 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Fri, Apr 27, 2018 at 05:27:29PM +0000, Jason Cooper wrote:
> Hi Wei Liu,
> 
> On Fri, Apr 27, 2018 at 05:58:17PM +0100, Wei Liu wrote:
> > On Fri, Apr 27, 2018 at 04:14:16PM +0000, Jason Cooper wrote:
> > > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> ...
> > > > xc_domain_create() takes a domid value by pointer.  Passing a value
> > > > other than zero will cause Xen to use that domid, rather than by
> > > > searching for the next free domid.
> > > > 
> > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > > index b5e27a7..7866092 100644
> > > > --- a/tools/libxl/libxl_create.c
> > > > +++ b/tools/libxl/libxl_create.c
> > > > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc,
> > > > libxl_domain_config *d_config,
> > > >              goto out;
> > > >          }
> > > >  
> > > > +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> > > >          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> > > >                                 &xc_config);
> > > >          if (ret < 0) {
> > > > 
> > > > This gross hack may get you somewhere (Entirely untested).
> > > 
> > > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > > series adding a 'domid' field to the domain config file would be
> > > rejected outright?  That would allow callers of xl to use key=value for
> > > reboot scripts like mine, and also allow for a static domid setup of the
> > > driver domains if folks want that.
> > 
> > > Seems a bit hacky to me. You also need to reserve a set of domids
> > > beforehand?
> 
> My thought of creating a domid config file variable was to do just as
> you say, reserve specific domids for specific guests.  I could even
> trigger an error if domid is set when driver_domain isn't.
> 
> Actually, I could slightly overload driver_domain, changing from a bool
> to a 'static domid'.  0 = not a driver domain, >0 is its static domid
> assignment.
> 
> For backwards compatibility, 1 = next domid available, and >1 would be
> the static domid.  I'm not sure if I like that though.
> 
> The racy part is when a driver domain is shut down: how does a create
> thread know that its domid is reserved?

If a driver domain shuts down and another domain gets allocated that
domain id, your whole system is hosed.

It is even worse if you consider the security implications: a
potentially malicious guest can impersonate the driver domain and see
other guests' data.

> 
> third option, tri-state:
> 
> driver_domain = 0   # not a driver domain
> driver_domain = 1   # is a driver domain, use next avail domid
> driver_domain = 2   # is a driver domain, re-use domid
> 

Let's shelve this UI discussion for now. I will have a look at the other
subthread.

Wei.

> Honestly, I'm not really liking any of these.  Perhaps 'xl
> network-detach ...' should be doing a better job of cleaning up?  Or,
> 'xl network-attach ...' should do a better job of re-attaching?
> 
> thx,
> 
> Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-04-30 16:16               ` Jason Cooper
  2018-04-30 16:26                 ` Ian Jackson
  2018-04-30 16:38                 ` George Dunlap
@ 2018-05-01 11:50                 ` Wei Liu
  2018-05-01 12:49                   ` Jason Cooper
  2 siblings, 1 reply; 28+ messages in thread
From: Wei Liu @ 2018-05-01 11:50 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Ian Jackson, Wei Liu, xen-devel, Andrew Cooper

On Mon, Apr 30, 2018 at 04:16:09PM +0000, Jason Cooper wrote:
> Hi Ian,
> 
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > > To implement reuse_domid in a sane way, either the toolstack needs to
> > > manage all domids and always sets domid when creating domain or the
> > > hypervisor needs to cooperate -- to have interface to reserve /
> > > pre-allocate domids.
> > 
> > I think this is entirely the wrong approach.
> 
> Whew.  Glad I didn't start hacking yet...
> 
> > I think the right answer is that this is simply a bug in the
> > frontends.  frontends should cope if the backend path pointer in the
> > frontend directory is updated, and should start reading the new
> > backend instead.
> 
> Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"

This seems to be different from what I originally understood. I thought
you were just expecting the frontend to reconnect automatically.

At the risk of asking the obvious question: drv_dom is the name, not
the numeric domid, right?

> 
> This is due to the fact that the frontend net driver doesn't / can't
> follow the backend driver to the new domid in xenstore.
> 

This is strange. A new udev event should be initiated in DomU, which
will then scan xenstore for a _new_ network device. There should be a
new device from DomU's PoV, which means it doesn't need to know what the
backend domid is. This should already be handled by the core xenbus
driver.

Also "backend-id" is already in a device's xenstore tree.
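
One way to check this from dom0 is to compare the "backend-id" recorded
in the frontend's device directory with the driver domain's current
domid.  The sketch below assumes the usual frontend layout
/local/domain/<domid>/device/vif/<devid>/; the function name and the
XL / XENSTORE_READ overrides are hypothetical and exist only so the
logic can be tried without a live host.

```shell
#!/bin/sh
# Overridable command names (assumption, for dry runs and testing).
XL="${XL:-xl}"
XENSTORE_READ="${XENSTORE_READ:-xenstore-read}"

# vif_backend_is_current GUEST DEVID DRV_DOM
# Returns 0 when the frontend's recorded backend-id equals the driver
# domain's current domid, 1 when it differs, 2 on lookup failure.
vif_backend_is_current() {
	local fe_domid be_domid recorded

	fe_domid="$($XL domid "$1")" || return 2
	be_domid="$($XL domid "$3")" || return 2
	recorded="$($XENSTORE_READ \
		"/local/domain/$fe_domid/device/vif/$2/backend-id")" || return 2

	[ "$recorded" = "$be_domid" ]
}
```

A mismatch after the re-attach would suggest the frontend still points
at the old driver domain instance.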

Wei.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-01 10:25                     ` George Dunlap
@ 2018-05-01 12:37                       ` Jason Cooper
  2018-05-01 12:53                         ` Jason Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-05-01 12:37 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Jackson, Andrew Cooper, Wei Liu, xen-devel

Morning George,

On Tue, May 01, 2018 at 11:25:06AM +0100, George Dunlap wrote:
> On Mon, Apr 30, 2018 at 7:17 PM, Jason Cooper <xen@lakedaemon.net> wrote:
> > On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> >> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <xen@lakedaemon.net> wrote:
> >> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> >> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> >> >> > To implement reuse_domid in a sane way, either the toolstack needs to
> >> >> > manage all domids and always sets domid when creating domain or the
> >> >> > hypervisor needs to cooperate -- to have interface to reserve /
> >> >> > pre-allocate domids.
> >> >>
> >> >> I think this is entirely the wrong approach.
> >> >
> >> > Whew.  Glad I didn't start hacking yet...
> >> >
> >> >> I think the right answer is that this is simply a bug in the
> >> >> frontends.  frontends should cope if the backend path pointer in the
> >> >> frontend directory is updated, and should start reading the new
> >> >> backend instead.
> >> >
> >> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> >> > "When a driver domain is rebooted (domid changed), previously connected
> >> > client domUs can't gain network connectivity to/through the driver
> >> > domain via 'xl network-attach client_domu mac=... bridge=...
> >> > backend=drv_dom'"
> >>
> >> Hang on -- just to clarify, something like the following doesn't work
> >> (or wouldn't, you suspect, work)?
> >>
> >> * Start driver domain
> >> * Start domU A with no network
> >
> > My setup is different here.  I include the vif = [... backend=...]
> > declaration in my domain config.
> >
> >> * xl network-attach A backend=drv_dom
> >
> > So I don't do this step manually.
> 
> Right, but you do the detach manually (as well as the subsequent
> attach after the driver domain
> 
> >
> >> * [do some stuff]
> >> * xl network-detach A [network devid]
> >> * Restart driver domain
> >> * xl network-attach A backend=drv_dom
> [snip]
> > Sorry, I get NO-CARRIER in the just rebooted driver domain.  And the
> > interface is still UP in domU A.
> 
> Wait, that sounds like a different problem than the one we thought you
> were talking about.  You're saying that the driver domain is losing
> connection to the *physical* network after reboot?

No, this has nothing to do with the physical nic that is
pci-passthrough'd.  It's as my subject line says: vifX.Y gets
NO-CARRIER.  Here's a snippet from 'ip link'

12: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
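
To spot this state quickly in the driver domain, 'ip link' output like
the above can be filtered for vif interfaces whose flags include
NO-CARRIER.  This is just a convenience sketch; the function name is
invented, and it assumes the vifX.Y naming shown in the snippet.

```shell
#!/bin/sh
# Read `ip link` output on stdin and print the name of every vifX.Y
# interface whose flag list contains NO-CARRIER.
no_carrier_vifs() {
	awk '/^[0-9]+: vif[0-9]+\.[0-9]+[:@]/ && /<[^>]*NO-CARRIER[^>]*>/ {
		name = $2
		sub(/[:@].*$/, "", name)   # strip trailing ":" or "@peer:"
		print name
	}'
}

# Possible use inside the driver domain:
#   ip link | no_carrier_vifs
```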

> So what happens if you do the following:
> 
> * Boot your driver domain (but don't connect any guests)
> * From your driver domain, ping an off-host IP
> * Reboot the driver domain
> * Try pinging an off-host IP again
> 
> It sounds like maybe the second ping will fail?

I assume this is for debugging the (hopefully clarified) non-existent
problem with pci-passthrough.  fwiw, this particular driver domain is in
the middle of the diagram I did earlier in the thread.  It's a netfront
client to a driver domain which does have the pci-passthrough.

When I was first digging into this, I started a thread on xen-users [1].
I've attached my xl-reboot.sh script here so you can see exactly what
I'm attempting to do:

------------------------------->8----------------------------------------
#!/bin/bash

if [ $# -ne 1 ]; then
	echo >&2 "Usage: ${0##*/} domain"
	exit 1
fi

DOM="$1"

# get the domain id
DOMID="$(xl domid "$DOM")"
[[ "$DOMID" =~ ^[0-9]+$ ]] || exit 1

# holds one "guest bridge=... backend=..." line per detached vif
tmp="$(mktemp)"

# loop through frontends
while read -r frontend <&4; do
	while read -r vif <&5; do
		if [ -z "$vif" ]; then
			# stale frontend
			echo >&2 "WARN: stale frontend ($frontend), removing"
			xenstore-rm "/local/domain/$DOMID/backend/vif/$frontend"
			continue
		fi

		# store info for afterwards
		front="$(xl domname "$frontend")"
		bridge="$(xenstore-read "/local/domain/$DOMID/backend/vif/$frontend/$vif/bridge")"
		if [ -n "$front" ] && [[ "$bridge" =~ br[0-9][0-9]* ]]; then
			echo "$front bridge=$bridge backend=$DOM" >>"$tmp"

			# remove the vif
			echo >&2 "Removing $vif from $front"
			xl -f network-detach "$front" "$vif"
		fi
	done 5< <(xenstore-list "/local/domain/$DOMID/backend/vif/$frontend")
done 4< <(xenstore-list "/local/domain/$DOMID/backend/vif")

# reboot the domain
xl shutdown -w "$DOM" || exit 2
sleep 1
xl create -c "$DOM" || exit 3

# nothing was detached, so nothing to re-attach
if [ ! -s "$tmp" ]; then
	rm -f "$tmp"
	exit 0
fi

# reattach everything
echo
while read -r ln <&4; do
	echo >&2 "re-attach [$ln]"
	# $ln is intentionally unquoted: it expands to several arguments
	xl network-attach $ln || exit 4
done 4<"$tmp"

rm -f "$tmp"

exit 0
------------------------------->8----------------------------------------

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-01 11:50                 ` Wei Liu
@ 2018-05-01 12:49                   ` Jason Cooper
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Cooper @ 2018-05-01 12:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Jackson, Andrew Cooper, xen-devel

Hi Wei Liu,

On Tue, May 01, 2018 at 12:50:13PM +0100, Wei Liu wrote:
> On Mon, Apr 30, 2018 at 04:16:09PM +0000, Jason Cooper wrote:
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
...
> > > I think the right answer is that this is simply a bug in the
> > > frontends.  frontends should cope if the backend path pointer in the
> > > frontend directory is updated, and should start reading the new
> > > backend instead.
> > 
> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
> 
> This seems to be different from what I originally understood. I thought
> you were just expecting the frontend to reconnect automatically.

When I call 'xl network-attach ... backend=drv_dom', yes.

> At the risk of asking the obvious question: drv_dom is the name not
> numeric domid, right?

Correct.

> > This is due to the fact that the frontend net driver doesn't / can't
> > follow the backend driver to the new domid in xenstore.
> > 
> 
> This is strange. A new udev event should be initiated in DomU. It will
> then scan xenstore for a _new_ network device. There should be a new
> device from DomU's PoV, which means it doesn't need to know what the
> backend domid is. This should already be handled by the core xenbus driver.

So, when I do 'xl network-detach ...; xl reboot drv_dom; xl
network-attach ...', that should be the equivalent of pulling out the
network card for the DomU?  I was envisioning it to be more akin to
unplugging the network cable and then plugging it back in.

My rootfs is >90% busybox atm, so I'm using mdev.  To my knowledge, I've
set up the mdev hotplug scripts correctly and set mdev as the binary for
the kernel to call on hotplug events.  So it *should* be removing and
adding the device if called.
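
For anyone wanting to double-check that part of a busybox guest, here is a
minimal sketch. It assumes the kernel exposes /proc/sys/kernel/hotplug
(which needs the uevent helper enabled), and the function name
hotplug_is_mdev is made up for illustration:

```bash
# Return 0 if the kernel hotplug helper is mdev, 1 if it is something
# else (or empty), 2 if the setting cannot be read at all.
# $1 = file holding the helper path (defaults to /proc/sys/kernel/hotplug)
hotplug_is_mdev() {
	local f="${1:-/proc/sys/kernel/hotplug}" helper
	[ -r "$f" ] || return 2
	read -r helper <"$f"
	case "$helper" in
		*/mdev|mdev) return 0 ;;
		*) return 1 ;;
	esac
}
```

Calling hotplug_is_mdev with no argument inside the guest checks the live
setting.  After a network-attach, the new frontend should also show up
under /sys/bus/xen/devices/ if the guest noticed the device at all.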

ftr, the full DomUs (Gentoo and Debian) both exhibit this problem as
clients to drv_dom, and I've not messed with their network / hotplug
setups beyond standard network configuration.

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-01 12:37                       ` Jason Cooper
@ 2018-05-01 12:53                         ` Jason Cooper
  2018-05-04 22:13                           ` Rich Persaud
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-05-01 12:53 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Jackson, Wei Liu, xen-devel, Andrew Cooper

add the link to xen-users thread of me talking to myself.  :-))

On Tue, May 01, 2018 at 12:37:51PM +0000, Jason Cooper wrote:
> When I was first digging into this, I started a thread on xen-users [1],
> I've attached my xl-reboot.sh script here so you can see exactly what
> I'm attempting to do:

[1] https://marc.info/?l=xen-users&m=152389443206023&w=2



* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-01 12:53                         ` Jason Cooper
@ 2018-05-04 22:13                           ` Rich Persaud
  2018-05-04 23:03                             ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Persaud @ 2018-05-04 22:13 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Wei Liu, jandryuk, Andrew Cooper, George Dunlap, marmarek,
	xen-devel, Ian Jackson

> On May 1, 2018, at 08:53, Jason Cooper <xen@lakedaemon.net> wrote:
> 
> add the link to xen-users thread of me talking to myself.  :-))
> 
>> On Tue, May 01, 2018 at 12:37:51PM +0000, Jason Cooper wrote:
>> When I was first digging into this, I started a thread on xen-users [1],
>> I've attached my xl-reboot.sh script here so you can see exactly what
>> I'm attempting to do:
> 
> [1] https://marc.info/?l=xen-users&m=152389443206023&w=2

You may want to look at the code (toolstack and/or frontend-backend drivers) for Qubes and OpenXT, both of which use network driver domains and support wired/wireless networks.  

Operational restart of a measured, non-persistent driver domain (instead of host) is a benefit of Xen disaggregation architectures.

Rich

* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-04 22:13                           ` Rich Persaud
@ 2018-05-04 23:03                             ` Marek Marczykowski-Górecki
  2018-05-06 15:45                               ` Jason Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-05-04 23:03 UTC (permalink / raw)
  To: Rich Persaud
  Cc: Wei Liu, jandryuk, Andrew Cooper, George Dunlap, Jason Cooper,
	xen-devel, Ian Jackson


On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
> > On May 1, 2018, at 08:53, Jason Cooper <xen@lakedaemon.net> wrote:
> > 
> > add the link to xen-users thread of me talking to myself.  :-))
> > 
> >> On Tue, May 01, 2018 at 12:37:51PM +0000, Jason Cooper wrote:
> >> When I was first digging into this, I started a thread on xen-users [1],
> >> I've attached my xl-reboot.sh script here so you can see exactly what
> >> I'm attempting to do:
> > 
> > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
> 
> You may want to look at the code (toolstack and/or frontend-backend drivers) for Qubes and OpenXT, both of which use network driver domains and support wired/wireless networks.  
> 
> Operational restart of a measured, non-persistent driver domain (instead of host) is a benefit of Xen disaggregation architectures.

In Qubes, on backend restart, we do the equivalent of xl network-detach
&& xl network-attach (as you do in xl-reboot.sh). xl itself doesn't
provide any place to plug in such a script, but we use libvirt, which
provides events. Also, we have full control over the domain config
(libvirt XML), so we don't need to extract the vif list from xenstore...

The problem you describe looks related to
https://lkml.org/lkml/2018/2/28/289, but the fix is included in 4.16...
There was also a related libxl patch:
https://xen.markmail.org/thread/6qbgmwyjqsshjus7
(but it applies to the case where you first shut down the backend and
only then do xl network-detach)

Do you have xl devd running in your driver domain? Without that, xl
network-attach won't work (AFAIR udev isn't used here anymore).

Also note that backend shutdown/restart/crash was a source of many
problems in the frontend kernel and toolstack in the past. Even simple
dynamic network-attach/detach is sometimes problematic for the frontend.
Links:
https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
problem)
https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
+ libvirt)
https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
problem)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-04 23:03                             ` Marek Marczykowski-Górecki
@ 2018-05-06 15:45                               ` Jason Cooper
  2018-05-07 12:04                                 ` Jason Andryuk
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2018-05-06 15:45 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Wei Liu, jandryuk, Andrew Cooper, George Dunlap, Rich Persaud,
	xen-devel, Ian Jackson

Hi Marek,

On Sat, May 05, 2018 at 01:03:15AM +0200, Marek Marczykowski-Górecki wrote:
> On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
> > > On May 1, 2018, at 08:53, Jason Cooper <xen@lakedaemon.net> wrote:
> > > 
> > > add the link to xen-users thread of me talking to myself.  :-))
> > > 
> > >> On Tue, May 01, 2018 at 12:37:51PM +0000, Jason Cooper wrote:
> > >> When I was first digging into this, I started a thread on xen-users [1],
> > >> I've attached my xl-reboot.sh script here so you can see exactly what
> > >> I'm attempting to do:
> > > 
> > > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
> > 
> > You may want to look at the code (toolstack and/or frontend-backend
> > drivers) for Qubes and OpenXT, both of which use network driver
> > domains and support wired/wireless networks.  
> > 
> > Operational restart of a measured, non-persistent driver domain
> > (instead of host) is a benefit of Xen disaggregation architectures.
> 
> In Qubes, on backend restart, we do the equivalent of xl network-detach
> && xl network-attach (as you do in xl-reboot.sh). xl itself doesn't
> provide any place to plug in such a script, but we use libvirt, which
> provides events. Also, we have full control over the domain config
> (libvirt XML), so we don't need to extract the vif list from xenstore...
> 
> The problem you describe looks related to
> https://lkml.org/lkml/2018/2/28/289, but the fix is included in 4.16...
> There was also a related libxl patch:
> https://xen.markmail.org/thread/6qbgmwyjqsshjus7
> (but it applies to the case where you first shut down the backend and
> only then do xl network-detach)
> 
> Do you have xl devd running in your driver domain? Without that, xl
> network-attach won't work (AFAIR udev isn't used here anymore).

Yes, I've now modified the init script (xendomains in Gentoo) to create
a key /tool/vmstatus/$domname/status, start the domU, loop until it gets
its domid, and -chmod the key.  It then does a -watch on that key.  In
the domU, *after* xl devd is started, it writes "online" to that key.

This allows me to automatically bring up the driver domains, and make
sure they're ready for connections before proceeding to booting the next
VM.  This only occurs when the host boots.

After the driver domains are up, the rest of the domains are started in
parallel.
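
Roughly, the handshake looks like the sketch below.  This is not the
actual init script: the function names and the initial "booting" value
are invented for illustration, "-chmod" is taken to mean xenstore-chmod,
and the sketch polls with xenstore-read instead of blocking in
xenstore-watch to keep it short.

```bash
# Sketch only: function names and the "booting" placeholder are hypothetical.

# Path of the readiness key for a given domain name.
status_key() {
	printf '/tool/vmstatus/%s/status\n' "$1"
}

# dom0 side: start a driver domain and block until it reports "online".
# $1 = domain name, $2 = xl config file for that domain
start_and_wait() {
	local dom="$1" cfg="$2" key domid
	key="$(status_key "$dom")"

	xenstore-write "$key" booting || return 1
	xl create "$cfg" || return 1

	# loop until xl can resolve the name to a domid
	until domid="$(xl domid "$dom" 2>/dev/null)" && [ -n "$domid" ]; do
		sleep 1
	done

	# dom0 owns the key (no access for others); the new domain may read/write it
	xenstore-chmod "$key" n0 "b$domid" || return 1

	# poll until the guest has written "online"
	until [ "$(xenstore-read "$key" 2>/dev/null)" = online ]; do
		sleep 1
	done
}

# driver-domain side: start xl devd first, then announce readiness.
# $1 = this domain's own name
report_online() {
	xl devd || return 1
	xenstore-write "$(status_key "$1")" online
}
```

The point of the ordering in report_online is that "online" is only
written once xl devd has been launched, so dom0 never moves on to the
next VM before the backend can service network-attach requests.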

> Also note that backend shutdown/restart/crash was a source of many
> problems in the frontend kernel and toolstack in the past. Even simple
> dynamic network-attach/detach is sometimes problematic for the frontend.
> Links:
> https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
> problem)
> https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
> + libvirt)
> https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
> problem)

Mmm, clearly the state machine and its implementation need some
review.  I'm building v4.16.7 and we'll see how it goes for my use case.

Thanks for all the pointers!

thx,

Jason.


* Re: reboot driver domain, vifX.Y = NO-CARRIER?
  2018-05-06 15:45                               ` Jason Cooper
@ 2018-05-07 12:04                                 ` Jason Andryuk
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Andryuk @ 2018-05-07 12:04 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Wei Liu, Andrew Cooper, George Dunlap,
	Marek Marczykowski-Górecki, Rich Persaud, Ian Jackson,
	xen-devel

Hi Jason,

On Sun, May 6, 2018 at 11:45 AM, Jason Cooper <xen@lakedaemon.net> wrote:
> Hi Marek,
>
> On Sat, May 05, 2018 at 01:03:15AM +0200, Marek Marczykowski-Górecki wrote:
>> On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
>> > > On May 1, 2018, at 08:53, Jason Cooper <xen@lakedaemon.net> wrote:
>> > >
>> > > add the link to xen-users thread of me talking to myself.  :-))
>> > >
>> > >> On Tue, May 01, 2018 at 12:37:51PM +0000, Jason Cooper wrote:
>> > >> When I was first digging into this, I started a thread on xen-users [1],
>> > >> I've attached my xl-reboot.sh script here so you can see exactly what
>> > >> I'm attempting to do:
>> > >
>> > > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
>> >
>> > You may want to look at the code (toolstack and/or frontend-backend
>> > drivers) for Qubes and OpenXT, both of which use network driver
>> > domains and support wired/wireless networks.
>> >
>> > Operational restart of a measured, non-persistent driver domain
>> > (instead of host) is a benefit of Xen disaggregation architectures.
>>
>> In Qubes, on backend restart, we do the equivalent of xl network-detach
>> && xl network-attach (as you do in xl-reboot.sh). xl itself doesn't
>> provide any place to plug in such a script, but we use libvirt, which
>> provides events. Also, we have full control over the domain config
>> (libvirt XML), so we don't need to extract the vif list from xenstore...

OpenXT does the xl network-detach && xl network-attach in its own
daemon: https://github.com/OpenXT/network/blob/master/nwd/Main.hs#L767

>> The problem you describe looks related to
>> https://lkml.org/lkml/2018/2/28/289, but the fix is included in 4.16...
>> There was also a related libxl patch:
>> https://xen.markmail.org/thread/6qbgmwyjqsshjus7
>> (but it applies to the case where you first shut down the backend and
>> only then do xl network-detach)
>>
>> Do you have xl devd running in your driver domain? Without that, xl
>> network-attach won't work (AFAIR udev isn't used here anymore).
>
> Yes, I've now modified the init script (xendomains in Gentoo) to create
> a key /tool/vmstatus/$domname/status, start the domU, loop until it gets
> its domid, and -chmod the key.  It then does a -watch on that key.  In
> the domU, *after* xl devd is started, it writes "online" to that key.
>
> This allows me to automatically bring up the driver domains, and make
> sure they're ready for connections before proceeding to booting the next
> VM.  This only occurs when the host boots.
>
> After the driver domains are up, the rest of the domains are started in
> parallel.
>
>> Also note that backend shutdown/restart/crash was a source of many
>> problems in the frontend kernel and toolstack in the past. Even simple
>> dynamic network-attach/detach is sometimes problematic for the frontend.
>> Links:
>> https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
>> problem)
>> https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
>> + libvirt)
>> https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
>> problem)
>
> Mmm, clearly the state machine and its implementation need some
> review.  I'm building v4.16.7 and we'll see how it goes for my use case.

OpenXT has some patches for reconnecting netfront after the netback
domain is rebooted to a new domid:
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-kernel/linux/4.14/patches/netfront-support-backend-relocate.patch
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-kernel/linux/4.14/patches/xenbus-move-otherend-watches-on-relocate.patch

I'm not too familiar with those, so they may be specific to the OpenXT
networking code.

Jason, when you see the vif NO-CARRIER, how do the frontend and
backend XenStore entries look?  Do the domids match up, and is the pair
in state 4 -> XenbusStateConnected?
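
One way to collect that from dom0 is sketched below.  It assumes the
usual XenStore layout for vifs (/local/domain/<front>/device/vif/<devid>
and /local/domain/<back>/backend/vif/<front>/<devid>); the function names
are made up for illustration, and the state names follow the XenbusState
enum in Xen's public io/xenbus.h header.

```bash
# Map a numeric XenbusState value to its name.
xenbus_state_name() {
	case "$1" in
		0) echo Unknown ;;
		1) echo Initialising ;;
		2) echo InitWait ;;
		3) echo Initialised ;;
		4) echo Connected ;;
		5) echo Closing ;;
		6) echo Closed ;;
		7) echo Reconfiguring ;;
		8) echo Reconfigured ;;
		*) echo "invalid($1)" ;;
	esac
}

# Print both ends of one vif so the domids and states can be compared.
# $1 = frontend domain name, $2 = vif devid, $3 = backend (driver domain) name
show_vif_pair() {
	local fid bid fe be
	fid="$(xl domid "$1")" || return 1
	bid="$(xl domid "$3")" || return 1
	fe="/local/domain/$fid/device/vif/$2"
	be="/local/domain/$bid/backend/vif/$fid/$2"

	echo "frontend $fe"
	echo "  backend-id = $(xenstore-read "$fe/backend-id") (expect $bid)"
	echo "  backend    = $(xenstore-read "$fe/backend") (expect $be)"
	echo "  state      = $(xenbus_state_name "$(xenstore-read "$fe/state")")"
	echo "backend $be"
	echo "  frontend-id = $(xenstore-read "$be/frontend-id") (expect $fid)"
	echo "  state       = $(xenbus_state_name "$(xenstore-read "$be/state")")"
}
```

If the frontend's backend-id still shows the driver domain's old domid,
or either side is not Connected, that would narrow down where the
reconnect stalls.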

Regards,
Jason

Thread overview: 28+ messages
2018-04-27 15:03 reboot driver domain, vifX.Y = NO-CARRIER? Jason Cooper
2018-04-27 15:11 ` Andrew Cooper
2018-04-27 15:35   ` Jason Cooper
2018-04-27 15:52     ` Andrew Cooper
2018-04-27 16:14       ` Jason Cooper
2018-04-27 16:58         ` Wei Liu
2018-04-27 17:27           ` Jason Cooper
2018-05-01 11:29             ` Wei Liu
2018-04-27 17:02         ` Andrew Cooper
2018-04-27 17:13           ` Wei Liu
2018-04-30 15:22             ` Ian Jackson
2018-04-30 16:16               ` Jason Cooper
2018-04-30 16:26                 ` Ian Jackson
2018-04-30 18:14                   ` Jason Cooper
2018-05-01 11:20                     ` Wei Liu
2018-04-30 16:38                 ` George Dunlap
2018-04-30 18:17                   ` Jason Cooper
2018-04-30 18:23                     ` Jason Cooper
2018-05-01 10:25                     ` George Dunlap
2018-05-01 12:37                       ` Jason Cooper
2018-05-01 12:53                         ` Jason Cooper
2018-05-04 22:13                           ` Rich Persaud
2018-05-04 23:03                             ` Marek Marczykowski-Górecki
2018-05-06 15:45                               ` Jason Cooper
2018-05-07 12:04                                 ` Jason Andryuk
2018-05-01 11:50                 ` Wei Liu
2018-05-01 12:49                   ` Jason Cooper
2018-04-27 16:56       ` Wei Liu
