* opening tap devices that are created in a container
@ 2018-07-05 14:20 Jason Baron
  2018-07-05 16:10 ` Daniel P. Berrangé
  2018-07-05 16:24 ` [libvirt] " Roman Mohr
  0 siblings, 2 replies; 11+ messages in thread
From: Jason Baron @ 2018-07-05 14:20 UTC
  To: netdev
  Cc: David S. Miller, libvir-list, rmohr, fabiand, berrange,
	Eric W. Biederman

Hi,

Opening tap devices, such as macvtap, that are created in containers is
problematic because the interface for opening tap devices is
/dev/tapNN, and devtmpfs is not typically mounted inside a container
since it's not namespace aware. It is possible to do a mknod() in the
container once the tap devices are created. However, since the tap
devices are created dynamically, it's not possible to allow access to
certain major/minor numbers a priori, since we don't know what these
are going to be. In addition, it's desirable not to allow the mknod
capability in containers. This behavior, I think, is somewhat
inconsistent with the tuntap driver, where one can create tuntap
devices inside a container by first opening /dev/net/tun and then use
them by supplying the tuntap device name via ioctl(TUNSETIFF). And
since TUNSETIFF validates the network namespace, one is limited to
opening network devices that belong to one's current network namespace.
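
For readers less familiar with the tuntap path described above, here is a
minimal Python sketch of it. The helper names (build_ifreq, open_tuntap) are
made up for illustration; the constants are the usual values from
<linux/if_tun.h>, and the 40-byte request size assumes a 64-bit Linux
struct ifreq.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h>; IFREQ_SIZE assumes 64-bit Linux.
TUNSETIFF = 0x400454CA   # _IOW('T', 202, int)
IFF_TAP = 0x0002         # layer-2 (Ethernet) device
IFF_NO_PI = 0x1000       # no packet-information header on each frame
IFNAMSIZ = 16
IFREQ_SIZE = 40


def build_ifreq(name: str, flags: int) -> bytes:
    """Pack the leading fields of struct ifreq: ifr_name, then ifr_flags."""
    raw = name.encode("ascii")
    if len(raw) >= IFNAMSIZ:
        raise ValueError("interface name too long: %r" % name)
    # "16sH" covers ifr_name and ifr_flags; the rest is zero padding.
    return struct.pack("16sH", raw, flags).ljust(IFREQ_SIZE, b"\0")


def open_tuntap(name: str, flags: int = IFF_TAP | IFF_NO_PI) -> int:
    """Open /dev/net/tun and bind the fd to `name` in the caller's netns."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, flags))
    except OSError:
        os.close(fd)
        raise
    return fd
```

Only /dev/net/tun has to be visible in the container here; the device is
selected by name, and no per-device node is involved.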

Here are some options for this issue that I wanted to get feedback on.
I'm also wondering if anybody else has run into this.

1)

Don't create the tap device, such as macvtap, in the container. Instead,
create the tap device outside of the container and then move it into the
desired container network namespace. In addition, do a mknod() for the
corresponding /dev/tapNN device from outside the container before doing
chroot().

This solution still doesn't allow tap devices to be created inside the
container. Thus, in the case of kubevirt, which runs libvirtd inside of
a container, it would mean changing libvirtd to open existing tap
devices (as opposed to the current behavior of creating new ones). This
would not require any kernel changes but, as mentioned, seems
inconsistent with the tuntap interface.
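
The mknod() step of option 1 could look roughly like the sketch below. It
assumes the privileged helper already knows the container's root directory,
the ifindex used for the NN in the node name, and the major/minor pair of the
tap character device; how those are discovered is not shown. Both function
names are hypothetical.

```python
import os
import stat


def tap_node_spec(rootfs: str, ifindex: int, major: int, minor: int):
    """Return (path, mode, device) for the node to create under rootfs."""
    path = os.path.join(rootfs, "dev", "tap%d" % ifindex)
    mode = stat.S_IFCHR | 0o600          # character device, owner rw only
    return path, mode, os.makedev(major, minor)


def create_tap_node(rootfs: str, ifindex: int, major: int, minor: int) -> str:
    """Create the node. Needs CAP_MKNOD, so it runs outside the container."""
    path, mode, device = tap_node_spec(rootfs, ifindex, major, minor)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    os.mknod(path, mode, device)
    return path
```

Keeping the path computation separate from the privileged call makes the
naming logic checkable without CAP_MKNOD.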

2)

Add a new kernel interface for tap devices similar to how /dev/net/tun
currently works. It might be nice to use TUNSETIFF for tap devices, but
because tap devices have different fops they can't be easily switched
after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
the tap device name is supplied and a new fd (distinct from the fd
returned by the open of /dev/net/tun) is returned as an output field as
part of the new ioctl parameter.

It may not make sense to have this new ioctl call for /dev/net/tun since
it's really about opening a tap device, so it may make sense to introduce
it as part of a new device, such as /dev/net/tap. This new ioctl could
be used for macvtap and ipvtap (or any tap device). I think it might
also improve performance for tuntap devices themselves if they are
opened this way: currently all tun operations such as read() and
write() take a reference count on the underlying tuntap device, since it
can be changed via TUNSETIFF. I tested this interface out, so I can
provide the kernel changes if that's helpful for clarification.
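
To make the shape of the proposed argument concrete, here is a purely
illustrative Python sketch. Nothing in it is an existing kernel interface:
the layout (a 16-byte name followed by an int fd) and both helper names are
assumptions, and no ioctl number is defined because none exists.

```python
import struct

IFNAMSIZ = 16
# Hypothetical layout: char name[IFNAMSIZ]; int fd;  (NOT a real kernel struct)
REQ_FORMAT = "16si"


def pack_request(name: str) -> bytearray:
    """Build the in/out buffer: name filled in, fd preset to -1."""
    raw = name.encode("ascii")
    if len(raw) >= IFNAMSIZ:
        raise ValueError("interface name too long: %r" % name)
    return bytearray(struct.pack(REQ_FORMAT, raw, -1))


def unpack_reply(buf: bytearray):
    """Read back (name, fd) after the kernel would have filled in fd."""
    raw, fd = struct.unpack(REQ_FORMAT, bytes(buf))
    return raw.rstrip(b"\0").decode("ascii"), fd
```

A mutable bytearray is used because fcntl.ioctl() writes results back into
mutable buffers in place, which is how an output field like the new fd would
reach the caller.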

Thanks,

-Jason


* Re: opening tap devices that are created in a container
  2018-07-05 14:20 opening tap devices that are created in a container Jason Baron
@ 2018-07-05 16:10 ` Daniel P. Berrangé
  2018-07-09 20:56   ` Jason Baron
  2018-07-05 16:24 ` [libvirt] " Roman Mohr
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel P. Berrangé @ 2018-07-05 16:10 UTC
  To: Jason Baron
  Cc: netdev, David S. Miller, libvir-list, rmohr, Fabian Deutsch,
	Eric W. Biederman

On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> Hi,
> 
> Opening tap devices, such as macvtap, that are created in containers is
> problematic because the interface for opening tap devices is via
> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> its not namespace aware. It is possible to do a mknod() in the
> container, once the tap devices are created, however, since the tap
> devices are created dynamically its not possible to apriori allow access
> to certain major/minor numbers, since we don't know what these are going
> to be. In addition, its desirable to not allow the mknod capability in
> containers. This behavior, I think is somewhat inconsistent with the
> tuntap driver where one can create tuntap devices inside a container by
> first opening /dev/net/tun and then using them by supplying the tuntap
> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> network namespace, one is limited to opening network devices that belong
> to your current network namespace.
> 
> Here are some options to this issue, that I wanted to get feedback
> about, and just wondering if anybody else has run into this.
> 
> 1)
> 
> Don't create the tap device, such as macvtap in the container. Instead,
> create the tap device outside of the container and then move it into the
> desired container network namespace. In addition, do a mknod() for the
> corresponding /dev/tapNN device from outside the container before doing
> chroot().
> 
> This solution still doesn't allow tap devices to be created inside the
> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> a container, it would mean changing libvirtd to open existing tap
> devices (as opposed to the current behavior of creating new ones). This
> would not require any kernel changes, but as mentioned seems
> inconsistent with the tuntap interface.

Presumably the /dev/tapNN  device name also changes when you rename
the tap device interface using SIOCSIFNAME ?

eg if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
when moving it into the container, it would be /dev/eth0 inside the
container ?

Anyway, given that this /dev/tapNN approach is what exists today,
libvirt will likely want to implement support for this regardless
in order to support existing kernels.

> 2)
> 
> Add a new kernel interface for tap devices similar to how /dev/net/tun
> currently works. It might be nice to use TUNSETIFF for tap devices, but
> because tap devices have different fops they can't be easily switched
> after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
> the tap device name is supplied and a new fd (distinct from the fd
> returned by the open of /dev/net/tun) is returned as an output field as
> part of the new ioctl parameter.
> 
> It may not make sense to have this new ioctl call for /dev/net/tun since
> its really about opening a tap device, so it may make sense to introduce
> it as part of a new device, such as /dev/net/tap. This new ioctl could
> be used for macvtap and ipvtap (or any tap device). I think it might
> also improve performance for tuntap devices themselves, if they are
> opened this way since currently all tun operations such as read() and
> write() take a reference count on the underlying tuntap device, since it
> can be changed via TUNSETIFF. I tested this interface out, so I can
> provide the kernel changes if that's helpful for clarification.

Either /dev/net/tun with a new ioctl, or /dev/net/tap with TUNSETIFF
would be workable from libvirt's POV.

One slight complication with either of the solutions above is that
libvirt won't know whether it is given a TAP or a MACVTAP device.
It'll only be given the device name. So with code today we would
probably have to first try /dev/tapNNN and if that doesn't exist
then try /dev/net/tun with TUNSETIFF.
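
A rough sketch of that probing order, assuming the caller has already turned
the device name into an ifindex. The function name and the returned labels
are made up for illustration, and the `exists` parameter is only there so the
decision can be exercised without real device nodes.

```python
import os


def pick_open_strategy(ifindex: int, exists=os.path.exists):
    """Prefer /dev/tapNN (macvtap); otherwise fall back to /dev/net/tun."""
    node = "/dev/tap%d" % ifindex
    if exists(node):
        return "macvtap", node
    # No per-device node: treat the name as a tuntap device and let the
    # caller issue TUNSETIFF on /dev/net/tun.
    return "tuntap", "/dev/net/tun"
```

For example, pick_open_strategy(24) yields the /dev/tap24 path only when that
node is present in the caller's mount namespace.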

If adding a new /dev/net/tap, something that could seamlessly accept
either a TAP or MACVTAP nic name would be nice.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [libvirt] opening tap devices that are created in a container
  2018-07-05 14:20 opening tap devices that are created in a container Jason Baron
  2018-07-05 16:10 ` Daniel P. Berrangé
@ 2018-07-05 16:24 ` Roman Mohr
  2018-07-08  6:01   ` Martin Kletzander
  1 sibling, 1 reply; 11+ messages in thread
From: Roman Mohr @ 2018-07-05 16:24 UTC
  To: jbaron; +Cc: fabiand, libvir-list, netdev, ebiederm, davem



On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:

> Hi,
>
> Opening tap devices, such as macvtap, that are created in containers is
> problematic because the interface for opening tap devices is via
> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> its not namespace aware. It is possible to do a mknod() in the
> container, once the tap devices are created, however, since the tap
> devices are created dynamically its not possible to apriori allow access
> to certain major/minor numbers, since we don't know what these are going
> to be. In addition, its desirable to not allow the mknod capability in
> containers. This behavior, I think is somewhat inconsistent with the
> tuntap driver where one can create tuntap devices inside a container by
> first opening /dev/net/tun and then using them by supplying the tuntap
> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> network namespace, one is limited to opening network devices that belong
> to your current network namespace.
>
> Here are some options to this issue, that I wanted to get feedback
> about, and just wondering if anybody else has run into this.
>
> 1)
>
> Don't create the tap device, such as macvtap in the container. Instead,
> create the tap device outside of the container and then move it into the
> desired container network namespace. In addition, do a mknod() for the
> corresponding /dev/tapNN device from outside the container before doing
> chroot().
>
> This solution still doesn't allow tap devices to be created inside the
> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> a container, it would mean changing libvirtd to open existing tap
> devices (as opposed to the current behavior of creating new ones). This
> would not require any kernel changes, but as mentioned seems
> inconsistent with the tuntap interface.
>

For KubeVirt, apart from how exactly the device ends up in the container, I
would want to pursue a way where all network preparations which require
privileges happen from a privileged process *outside* of the container,
like CNI solutions do. They run outside, have privileges, and then create
devices in the right network/mount namespace or move them there. The final
goal for KubeVirt is that our pod with the qemu process is completely
unprivileged and privileged setup happens from outside.

As a consequence, and depending on which route Dan pursues with the
restructured libvirt, I would assume that either a privileged libvirtd-part
outside of containers creates the devices by entering the right namespaces,
or that libvirt in the container can consume pre-created tun/tap devices,
like qemu.

Best Regards,
Roman


>
> 2)
>
> Add a new kernel interface for tap devices similar to how /dev/net/tun
> currently works. It might be nice to use TUNSETIFF for tap devices, but
> because tap devices have different fops they can't be easily switched
> after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
> the tap device name is supplied and a new fd (distinct from the fd
> returned by the open of /dev/net/tun) is returned as an output field as
> part of the new ioctl parameter.
>
> It may not make sense to have this new ioctl call for /dev/net/tun since
> its really about opening a tap device, so it may make sense to introduce
> it as part of a new device, such as /dev/net/tap. This new ioctl could
> be used for macvtap and ipvtap (or any tap device). I think it might
> also improve performance for tuntap devices themselves, if they are
> opened this way since currently all tun operations such as read() and
> write() take a reference count on the underlying tuntap device, since it
> can be changed via TUNSETIFF. I tested this interface out, so I can
> provide the kernel changes if that's helpful for clarification.
>
> Thanks,
>
> -Jason
>





* Re: [libvirt] opening tap devices that are created in a container
  2018-07-05 16:24 ` [libvirt] " Roman Mohr
@ 2018-07-08  6:01   ` Martin Kletzander
  2018-07-09 21:00     ` Jason Baron
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Kletzander @ 2018-07-08  6:01 UTC
  To: Roman Mohr; +Cc: jbaron, fabiand, libvir-list, netdev, ebiederm, davem


On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
>On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
>
>> Hi,
>>
>> Opening tap devices, such as macvtap, that are created in containers is
>> problematic because the interface for opening tap devices is via
>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
>> its not namespace aware. It is possible to do a mknod() in the
>> container, once the tap devices are created, however, since the tap
>> devices are created dynamically its not possible to apriori allow access
>> to certain major/minor numbers, since we don't know what these are going
>> to be. In addition, its desirable to not allow the mknod capability in
>> containers. This behavior, I think is somewhat inconsistent with the
>> tuntap driver where one can create tuntap devices inside a container by
>> first opening /dev/net/tun and then using them by supplying the tuntap
>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
>> network namespace, one is limited to opening network devices that belong
>> to your current network namespace.
>>
>> Here are some options to this issue, that I wanted to get feedback
>> about, and just wondering if anybody else has run into this.
>>
>> 1)
>>
>> Don't create the tap device, such as macvtap in the container. Instead,
>> create the tap device outside of the container and then move it into the
>> desired container network namespace. In addition, do a mknod() for the
>> corresponding /dev/tapNN device from outside the container before doing
>> chroot().
>>
>> This solution still doesn't allow tap devices to be created inside the
>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
>> a container, it would mean changing libvirtd to open existing tap
>> devices (as opposed to the current behavior of creating new ones). This
>> would not require any kernel changes, but as mentioned seems
>> inconsistent with the tuntap interface.
>>
>
>For KubeVirt, apart from how exactly the device ends up in the container, I
>would want to pursue a way where all network preparations which require
>privileges happens from a privileged process *outside* of the container.
>Like CNI solutions do it. They run outside, have privileges and then create
>devices in the right network/mount namespace or move them there. The final
>goal for KubeVirt is that our pod with the qemu process is completely
>unprivileged and privileged setup happens from outside.
>
>As a consequence, and depending on which route Dan pursues with the
>restructured libvirt, I would assume that either a privileged libvirtd-part
>outside of containers creates the devices by entering the right namespaces,
>or that libvirt in the container can consume pre-created tun/tap devices,
>like qemu.
>

That would be nice, but as far as I understand there will always be a need for
some privileges if you want to use a tap device.  It's nice that CNI does that
and all the containers can run unprivileged, but that's because they do not open
the tap device and they do not do any privileged operations on it.  But QEMU
needs to.  So the only way would be passing an opened fd to the container or
opening the tap device there and making the fd usable for one process in the
container.  Is this already supported for some type of containers in some way?

Martin



* Re: opening tap devices that are created in a container
  2018-07-05 16:10 ` Daniel P. Berrangé
@ 2018-07-09 20:56   ` Jason Baron
  2018-07-10  8:46     ` Daniel P. Berrangé
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Baron @ 2018-07-09 20:56 UTC
  To: Daniel P. Berrangé
  Cc: netdev, David S. Miller, libvir-list, rmohr, Fabian Deutsch,
	Eric W. Biederman



On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
>> Hi,
>>
>> Opening tap devices, such as macvtap, that are created in containers is
>> problematic because the interface for opening tap devices is via
>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
>> its not namespace aware. It is possible to do a mknod() in the
>> container, once the tap devices are created, however, since the tap
>> devices are created dynamically its not possible to apriori allow access
>> to certain major/minor numbers, since we don't know what these are going
>> to be. In addition, its desirable to not allow the mknod capability in
>> containers. This behavior, I think is somewhat inconsistent with the
>> tuntap driver where one can create tuntap devices inside a container by
>> first opening /dev/net/tun and then using them by supplying the tuntap
>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
>> network namespace, one is limited to opening network devices that belong
>> to your current network namespace.
>>
>> Here are some options to this issue, that I wanted to get feedback
>> about, and just wondering if anybody else has run into this.
>>
>> 1)
>>
>> Don't create the tap device, such as macvtap in the container. Instead,
>> create the tap device outside of the container and then move it into the
>> desired container network namespace. In addition, do a mknod() for the
>> corresponding /dev/tapNN device from outside the container before doing
>> chroot().
>>
>> This solution still doesn't allow tap devices to be created inside the
>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
>> a container, it would mean changing libvirtd to open existing tap
>> devices (as opposed to the current behavior of creating new ones). This
>> would not require any kernel changes, but as mentioned seems
>> inconsistent with the tuntap interface.
> 
> Presumably the /dev/tapNN  device name also changes when you rename
> the tap device interface using SIOCSIFNAME ?
> 

I don't think so. The NN is the ifindex of the device; changing the
device name does not affect the ifindex.

> eg if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
> when moving it into the container, it would be /dev/eth0 inside the
> container ?
> 

When moving it into the container the ifindex can change since the
ifindex range is per-namespace (not global).
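
Since the NN has to be looked up in whichever namespace the device currently
lives in, the lookup can be sketched as below. The function name is
hypothetical; socket.if_nametoindex() resolves the name against the calling
process's network namespace, so it has to run after the device has been
moved.

```python
import socket


def tap_path_for(ifname: str) -> str:
    """Map an interface name to /dev/tapNN using this namespace's ifindex."""
    return "/dev/tap%d" % socket.if_nametoindex(ifname)
```

An unknown name raises OSError, which also covers the case where the device
is still in a different namespace.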

> Anyway, given that this /dev/tapNN approach is what exists today,
> libvirt will likely want to implement support for this regardless
> in order to support existing kernels.

Ok, in this case whatever created the tap device outside of the
container would pass the name of the device to libvirt and make sure
that the /dev/tapNN device was set up correctly in the container. I
believe this differs from how libvirt works today in that libvirt would
need to be modified to open an existing device (I think it currently
always creates new ones).

> 
>> 2)
>>
>> Add a new kernel interface for tap devices similar to how /dev/net/tun
>> currently works. It might be nice to use TUNSETIFF for tap devices, but
>> because tap devices have different fops they can't be easily switched
>> after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
>> the tap device name is supplied and a new fd (distinct from the fd
>> returned by the open of /dev/net/tun) is returned as an output field as
>> part of the new ioctl parameter.
>>
>> It may not make sense to have this new ioctl call for /dev/net/tun since
>> its really about opening a tap device, so it may make sense to introduce
>> it as part of a new device, such as /dev/net/tap. This new ioctl could
>> be used for macvtap and ipvtap (or any tap device). I think it might
>> also improve performance for tuntap devices themselves, if they are
>> opened this way since currently all tun operations such as read() and
>> write() take a reference count on the underlying tuntap device, since it
>> can be changed via TUNSETIFF. I tested this interface out, so I can
>> provide the kernel changes if that's helpful for clarification.
> 
> Either /dev/net/tun with a new ioctl, or /dev/net/tap with TUNSETIFF
> would be workable from libvirt's POV.
>

So the TUNSETIFF interface isn't ideal from a kernel performance pov,
because it means that the read and write paths have to take a reference
to the underlying device (since it can be changed out asynchronously).
So the interface I was proposing was a new ioctl that could return a new
fd (not the one returned by the initial open()).

> One slight complication with either of the solutions above is that
> libvirt won't know whether it is given a TAP or a MACVTAP device.
> It'll only be given the device name. So with code today we would
> probably have to first try /dev/tapNNN and if that doesn't exist
> then try /dev/net/tun with TUNSETIFF.
>

Hmm, doesn't libvirt make this distinction today?


> If adding a new /dev/net/tap, something that could seamlessly accept
> either a TAP or MACVTAP nic name would be nice.
> 
>

I think if we added a new ioctl() as I proposed it could accept either
type of nic.

Thanks,

-Jason


* Re: [libvirt] opening tap devices that are created in a container
  2018-07-08  6:01   ` Martin Kletzander
@ 2018-07-09 21:00     ` Jason Baron
  2018-07-10  8:47       ` Daniel P. Berrangé
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jason Baron @ 2018-07-09 21:00 UTC
  To: Martin Kletzander, Roman Mohr
  Cc: libvir-list, fabiand, davem, ebiederm, netdev



On 07/08/2018 02:01 AM, Martin Kletzander wrote:
> On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
>> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
>>
>>> Hi,
>>>
>>> Opening tap devices, such as macvtap, that are created in containers is
>>> problematic because the interface for opening tap devices is via
>>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
>>> its not namespace aware. It is possible to do a mknod() in the
>>> container, once the tap devices are created, however, since the tap
>>> devices are created dynamically its not possible to apriori allow access
>>> to certain major/minor numbers, since we don't know what these are going
>>> to be. In addition, its desirable to not allow the mknod capability in
>>> containers. This behavior, I think is somewhat inconsistent with the
>>> tuntap driver where one can create tuntap devices inside a container by
>>> first opening /dev/net/tun and then using them by supplying the tuntap
>>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
>>> network namespace, one is limited to opening network devices that belong
>>> to your current network namespace.
>>>
>>> Here are some options to this issue, that I wanted to get feedback
>>> about, and just wondering if anybody else has run into this.
>>>
>>> 1)
>>>
>>> Don't create the tap device, such as macvtap in the container. Instead,
>>> create the tap device outside of the container and then move it into the
>>> desired container network namespace. In addition, do a mknod() for the
>>> corresponding /dev/tapNN device from outside the container before doing
>>> chroot().
>>>
>>> This solution still doesn't allow tap devices to be created inside the
>>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
>>> a container, it would mean changing libvirtd to open existing tap
>>> devices (as opposed to the current behavior of creating new ones). This
>>> would not require any kernel changes, but as mentioned seems
>>> inconsistent with the tuntap interface.
>>>
>>
>> For KubeVirt, apart from how exactly the device ends up in the
>> container, I would want to pursue a way where all network
>> preparations which require privileges happens from a privileged
>> process *outside* of the container. Like CNI solutions do it. They
>> run outside, have privileges and then create devices in the right
>> network/mount namespace or move them there. The final goal for
>> KubeVirt is that our pod with the qemu process is completely
>> unprivileged and privileged setup happens from outside.
>>
>> As a consequence, and depending on which route Dan pursues with the
>> restructured libvirt, I would assume that either a privileged
>> libvirtd-part outside of containers creates the devices by entering
>> the right namespaces, or that libvirt in the container can consume
>> pre-created tun/tap devices, like qemu.
>>
> 
> That would be nice, but as far as I understand there will always be a
> need for some privileges if you want to use a tap device.  It's nice
> that CNI does that and all the containers can run unprivileged, but
> that's because they do not open the tap device and they do not do any
> privileged operations on it.  But QEMU needs to.  So the only way
> would be passing an opened fd to the container or opening the tap
> device there and making the fd usable for one process in the
> container.  Is this already supported for some type of containers in
> some way?
> 
> Martin

Hi,

So another option here, call it #3, is to pass open fds via unix sockets.
If there are privileged operations that QEMU is trying to do with the fd
though, how will opening it first and then passing it to an unprivileged
QEMU address that? Is the opener doing those operations first?
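
For reference, passing an open fd over a unix socket uses SCM_RIGHTS
ancillary data. Below is a minimal Python sketch of just that mechanism; the
function names are made up, and it does not address which operations the
opener would have to perform before handing the fd over.

```python
import array
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one open descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    # One byte of ordinary data is sent along so that a stream socket
    # has a message to attach the control data to.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one descriptor; the number is new in the receiving process."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
            return fds[0]
    raise RuntimeError("no descriptor received")
```

The privileged side would call send_fd() with the opened tap fd, and the
unprivileged side would call recv_fd() on its end of the socket.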

Thanks,

-Jason


* Re: opening tap devices that are created in a container
  2018-07-09 20:56   ` Jason Baron
@ 2018-07-10  8:46     ` Daniel P. Berrangé
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel P. Berrangé @ 2018-07-10  8:46 UTC
  To: Jason Baron
  Cc: netdev, David S. Miller, libvir-list, rmohr, Fabian Deutsch,
	Eric W. Biederman

On Mon, Jul 09, 2018 at 04:56:04PM -0400, Jason Baron wrote:
> 
> 
> On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> >> Hi,
> >>
> >> Opening tap devices, such as macvtap, that are created in containers is
> >> problematic because the interface for opening tap devices is via
> >> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> >> its not namespace aware. It is possible to do a mknod() in the
> >> container, once the tap devices are created, however, since the tap
> >> devices are created dynamically its not possible to apriori allow access
> >> to certain major/minor numbers, since we don't know what these are going
> >> to be. In addition, its desirable to not allow the mknod capability in
> >> containers. This behavior, I think is somewhat inconsistent with the
> >> tuntap driver where one can create tuntap devices inside a container by
> >> first opening /dev/net/tun and then using them by supplying the tuntap
> >> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> >> network namespace, one is limited to opening network devices that belong
> >> to your current network namespace.
> >>
> >> Here are some options to this issue, that I wanted to get feedback
> >> about, and just wondering if anybody else has run into this.
> >>
> >> 1)
> >>
> >> Don't create the tap device, such as macvtap in the container. Instead,
> >> create the tap device outside of the container and then move it into the
> >> desired container network namespace. In addition, do a mknod() for the
> >> corresponding /dev/tapNN device from outside the container before doing
> >> chroot().
> >>
> >> This solution still doesn't allow tap devices to be created inside the
> >> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >> a container, it would mean changing libvirtd to open existing tap
> >> devices (as opposed to the current behavior of creating new ones). This
> >> would not require any kernel changes, but as mentioned seems
> >> inconsistent with the tuntap interface.
> > 
> > Presumably the /dev/tapNN  device name also changes when you rename
> > the tap device interface using SIOCSIFNAME ?
> > 
> 
> I don't think so. the NN is the ifindex of the device- changing the
> device name does not affect the ifindex.

Ah right, that makes sense.

> > eg if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
> > when moving it into the container, it would be /dev/eth0 inside the
> > container ?
> > 
> 
> When moving it into the container the ifindex can change since the
> ifindex range is per-namespace (not global).

Oh, that's interesting, I hadn't realized that.

> > Anyway, given that this /dev/tapNN approach is what exists today,
> > libvirt will likely want to implement support for this regardless
> > in order to support existing kernels.
> 
> Ok, in this case whatever created the tap device outside of the
> container would pass the name of the device to libvirt and make sure
> that the /dev/tapNN device was setup correctly in the container. I
> believe this differs from how libvirt works today in that libvirt would
> need to be modified to open an existing device (I think it currently
> always creates new ones).

Libvirt can use a pre-created TAP device today, but not a pre-created
MACVTAP, so supporting the latter is new code for us no matter what.

> > One slight complication with either of the solutions above is that
> > libvirt won't know whether it is given a TAP or a MACVTAP device.
> > It'll only be given the device name. So with code today we would
> > probably have to first try /dev/tapNNN and if that doesn't exist
> > then try /dev/net/tun with TUNSETIFF.
> >
> 
> hmmm. doesn't libvirt make this distinction today?

No need to make the distinction yet, since we only support pre-created
TAP devices right now. In cases where we create the devices ourselves,
we already know what is what.

> > If adding a new /dev/net/tap, something that could seamlessly accept
> > either a TAP or MACVTAP nic name would be nice.
> > 
> >
> 
> I think if we added a new ioctl() as I proposed it could accept either
> type of nic.

OK, that would be nice.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [libvirt] opening tap devices that are created in a container
  2018-07-09 21:00     ` Jason Baron
@ 2018-07-10  8:47       ` Daniel P. Berrangé
  2018-07-17 11:45       ` Martin Kletzander
       [not found]       ` <20180711101005.GA13392@wheatley>
  2 siblings, 0 replies; 11+ messages in thread
From: Daniel P. Berrangé @ 2018-07-10  8:47 UTC
  To: Jason Baron
  Cc: fabiand, libvir-list, netdev, Roman Mohr, ebiederm,
	Martin Kletzander, davem

On Mon, Jul 09, 2018 at 05:00:49PM -0400, Jason Baron wrote:
> 
> 
> On 07/08/2018 02:01 AM, Martin Kletzander wrote:
> > On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
> >> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Opening tap devices, such as macvtap, that are created in containers is
> >>> problematic because the interface for opening tap devices is via
> >>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> >>> it's not namespace aware. It is possible to do a mknod() in the
> >>> container, once the tap devices are created, however, since the tap
> >>> devices are created dynamically it's not possible to a priori allow access
> >>> to certain major/minor numbers, since we don't know what these are going
> >>> to be. In addition, it's desirable to not allow the mknod capability in
> >>> containers. This behavior, I think is somewhat inconsistent with the
> >>> tuntap driver where one can create tuntap devices inside a container by
> >>> first opening /dev/net/tun and then using them by supplying the tuntap
> >>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> >>> network namespace, one is limited to opening network devices that belong
> >>> to your current network namespace.
> >>>
> >>> Here are some options to this issue, that I wanted to get feedback
> >>> about, and just wondering if anybody else has run into this.
> >>>
> >>> 1)
> >>>
> >>> Don't create the tap device, such as macvtap in the container. Instead,
> >>> create the tap device outside of the container and then move it into the
> >>> desired container network namespace. In addition, do a mknod() for the
> >>> corresponding /dev/tapNN device from outside the container before doing
> >>> chroot().
> >>>
> >>> This solution still doesn't allow tap devices to be created inside the
> >>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >>> a container, it would mean changing libvirtd to open existing tap
> >>> devices (as opposed to the current behavior of creating new ones). This
> >>> would not require any kernel changes, but as mentioned seems
> >>> inconsistent with the tuntap interface.
> >>>
> >>
> >> For KubeVirt, apart from how exactly the device ends up in the
> >> container, I
> >> would want to pursue a way where all network preparations which require
> >> privileges happens from a privileged process *outside* of the container.
> >> Like CNI solutions do it. They run outside, have privileges and then
> >> create
> >> devices in the right network/mount namespace or move them there. The
> >> final
> >> goal for KubeVirt is that our pod with the qemu process is completely
> >> unprivileged and privileged setup happens from outside.
> >>
> >> As a consequence, and depending on which route Dan pursues with the
> >> restructured libvirt, I would assume that either a privileged
> >> libvirtd-part
> >> outside of containers creates the devices by entering the right
> >> namespaces,
> >> or that libvirt in the container can consume pre-created tun/tap devices,
> >> like qemu.
> >>
> > 
> > That would be nice, but as far as I understand there will always be a
> > need for
> > some privileges if you want to use a tap device.  It's nice that CNI
> > does that
> > and all the containers can run unprivileged, but that's because they do
> > not open
> > the tap device and they do not do any privileged operations on it.  But
> > QEMU
> > needs to.  So the only way would be passing an opened fd to the
> > container or
> > opening the tap device there and making the fd usable for one process in
> > the
> > container.  Is this already supported for some type of containers in
> > some way?
> > 
> > Martin
> 
> Hi,
> 
> So another option here call it #3 is to pass open fds via unix sockets.
> If there are privileged operations that QEMU is trying to do with the fd
> though, how will opening it first and then passing it to an unprivileged
> QEMU address that? Is the opener doing those operations first?

From libvirt's POV, it would be preferable to be able to open the
macvtap device by name inside the container, rather than having to
accept a pre-opened FD from the application.
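
For comparison, the pre-opened FD alternative (option #3 in the quoted
text) relies on SCM_RIGHTS ancillary data over a unix socket. A minimal
Python sketch follows, with hypothetical helper names send_tap_fd and
recv_tap_fd; sending the interface name as the ordinary payload is purely
an assumption made for illustration.

```python
import array
import socket


def send_tap_fd(sock: socket.socket, fd: int, ifname: str) -> None:
    """Send one open descriptor plus the interface name it refers to."""
    fds = array.array("i", [fd])
    sock.sendmsg(
        [ifname.encode()],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)],
    )


def recv_tap_fd(sock: socket.socket):
    """Receive (ifname, fd); the fd is a new descriptor in this process."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        64, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if len(fds) != 1:
        raise RuntimeError("expected exactly one descriptor")
    return data.decode(), fds[0]
```

The receiver ends up with its own descriptor for the same open file, so
the sender can close its copy once the message has been sent.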


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [libvirt] opening tap devices that are created in a container
       [not found]       ` <20180711101005.GA13392@wheatley>
@ 2018-07-12  3:33         ` Jason Baron
  2018-07-17 11:58         ` Roman Mohr
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Baron @ 2018-07-12  3:33 UTC (permalink / raw)
  To: nert
  Cc: fabiand, libvir-list, netdev, Roman Mohr, ebiederm, davem, Laine Stump



On 07/11/2018 06:10 AM, nert@wheatley wrote:
> On Mon, Jul 09, 2018 at 05:00:49PM -0400, Jason Baron wrote:
>>
>>
>> On 07/08/2018 02:01 AM, Martin Kletzander wrote:
>>> On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
>>>> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Opening tap devices, such as macvtap, that are created in
>>>>> containers is
>>>>> problematic because the interface for opening tap devices is via
>>>>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
>>>>> it's not namespace aware. It is possible to do a mknod() in the
>>>>> container, once the tap devices are created, however, since the tap
>>>>> devices are created dynamically it's not possible to a priori allow
>>>>> access
>>>>> to certain major/minor numbers, since we don't know what these are
>>>>> going
>>>>> to be. In addition, it's desirable to not allow the mknod capability in
>>>>> containers. This behavior, I think is somewhat inconsistent with the
>>>>> tuntap driver where one can create tuntap devices inside a
>>>>> container by
>>>>> first opening /dev/net/tun and then using them by supplying the tuntap
>>>>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates
>>>>> the
>>>>> network namespace, one is limited to opening network devices that
>>>>> belong
>>>>> to your current network namespace.
>>>>>
>>>>> Here are some options to this issue, that I wanted to get feedback
>>>>> about, and just wondering if anybody else has run into this.
>>>>>
>>>>> 1)
>>>>>
>>>>> Don't create the tap device, such as macvtap in the container.
>>>>> Instead,
>>>>> create the tap device outside of the container and then move it
>>>>> into the
>>>>> desired container network namespace. In addition, do a mknod() for the
>>>>> corresponding /dev/tapNN device from outside the container before
>>>>> doing
>>>>> chroot().
>>>>>
>>>>> This solution still doesn't allow tap devices to be created inside the
>>>>> container. Thus, in the case of kubevirt, which runs libvirtd
>>>>> inside of
>>>>> a container, it would mean changing libvirtd to open existing tap
>>>>> devices (as opposed to the current behavior of creating new ones).
>>>>> This
>>>>> would not require any kernel changes, but as mentioned seems
>>>>> inconsistent with the tuntap interface.
>>>>>
>>>>
>>>> For KubeVirt, apart from how exactly the device ends up in the
>>>> container, I
>>>> would want to pursue a way where all network preparations which require
>>>> privileges happens from a privileged process *outside* of the
>>>> container.
>>>> Like CNI solutions do it. They run outside, have privileges and then
>>>> create
>>>> devices in the right network/mount namespace or move them there. The
>>>> final
>>>> goal for KubeVirt is that our pod with the qemu process is completely
>>>> unprivileged and privileged setup happens from outside.
>>>>
>>>> As a consequence, and depending on which route Dan pursues with the
>>>> restructured libvirt, I would assume that either a privileged
>>>> libvirtd-part
>>>> outside of containers creates the devices by entering the right
>>>> namespaces,
>>>> or that libvirt in the container can consume pre-created tun/tap
>>>> devices,
>>>> like qemu.
>>>>
>>>
>>> That would be nice, but as far as I understand there will always be a
>>> need for
>>> some privileges if you want to use a tap device.  It's nice that CNI
>>> does that
>>> and all the containers can run unprivileged, but that's because they do
>>> not open
>>> the tap device and they do not do any privileged operations on it.  But
>>> QEMU
>>> needs to.  So the only way would be passing an opened fd to the
>>> container or
>>> opening the tap device there and making the fd usable for one process in
>>> the
>>> container.  Is this already supported for some type of containers in
>>> some way?
>>>
>>> Martin
>>
>> Hi,
>>
>> So another option here call it #3 is to pass open fds via unix sockets.
>> If there are privileged operations that QEMU is trying to do with the fd
>> though, how will opening it first and then passing it to an unprivileged
>> QEMU address that? Is the opener doing those operations first?
>>
> 
> Sorry for the confusion, but QEMU is not doing any privileged
> operations.  I got
> confused by the fact that anyone can open and do a R/W on a tap device. 
> But it
> looks like that's on purpose.  No capabilities are needed for opening
> /dev/net/tun and calling ioctl(TUNSETIFF) with existing name and then
> doing R/W
> operations on it.  It just works.
> 
> Correct me if I'm wrong, but to sum it all up, the only things that we
> need to
> figure out (which might possibly be solved by ideas in the other thread)
> are:
> 
> tap:
> - Existence of /dev/net/tun
> - Having permissions to open it (0666 by default, shouldn't be a big deal)
> - Knowing the device name
> 
> macvtap:
> - Existence of /dev/tapXX
> - Having permissions to open /dev/tapXX
> - One of the following:
>  - Knowing the device name (and being able to translate it using a
> netlink socket)
>  - Knowing the device index
> 

Right - from the device name one can grab the device index using
SIOCGIFINDEX and then use that to access /dev/tap{device index}. Since
devtmpfs is not mounted in containers (it's not namespaced) and mknod()
is, I think, often not allowed there, the mknod() has to be done by a
privileged process when the container is being created. In addition,
libvirtd would need to be changed to open this existing device
(currently it only opens macvtap devices that it creates). This is
option #1.
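
As a rough Python sketch of option #1: the name-to-index lookup via
SIOCGIFINDEX, plus the privileged mknod() step. The helper names are
hypothetical, 0x8933 is SIOCGIFINDEX from <linux/sockios.h>, the 40-byte
buffer assumes the 64-bit struct ifreq layout, and copying the device
number from an existing host-side node is just one assumed way to learn
the major/minor pair.

```python
import fcntl
import os
import socket
import stat
import struct

SIOCGIFINDEX = 0x8933  # from <linux/sockios.h>


def ifindex_via_ioctl(ifname: str) -> int:
    """Translate an interface name to its index in the caller's netns."""
    raw = ifname.encode()
    if len(raw) >= 16:
        raise ValueError("interface name too long for IFNAMSIZ")
    # struct ifreq image: 16-byte name, int ifr_ifindex, zero padding.
    req = struct.pack("16si20x", raw, 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        res = fcntl.ioctl(sock.fileno(), SIOCGIFINDEX, req)
    return struct.unpack_from("i", res, 16)[0]


def macvtap_node_path(ifindex: int, dev_dir: str = "/dev") -> str:
    """Path of the character device for the macvtap with this index."""
    return os.path.join(dev_dir, f"tap{ifindex}")


def mirror_node(host_node: str, container_dev_dir: str) -> str:
    """Privileged step: recreate a host char device node in the container's /dev."""
    st = os.stat(host_node)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{host_node} is not a character device")
    target = os.path.join(container_dev_dir, os.path.basename(host_node))
    # Reuse the host node's major/minor; needs CAP_MKNOD.
    os.mknod(target, stat.S_IFCHR | 0o660, st.st_rdev)
    return target
```

The lookup has to run in the network namespace that owns the interface,
while mirror_node would run in the privileged setup process outside the
container.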

> The rest should be an implementation detail.
> 
> Am I right?  Did I miss anything?

I don't think so. I'm interested in whether option #1 is workable, or
whether there is interest in option #2, which is to do something like
/dev/net/tun in the kernel for macvtap devices.

Since, as Daniel pointed out, something like option #1 is going to be
needed anyway to work on older kernels, it seems like the best option?

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [libvirt] opening tap devices that are created in a container
  2018-07-09 21:00     ` Jason Baron
  2018-07-10  8:47       ` Daniel P. Berrangé
@ 2018-07-17 11:45       ` Martin Kletzander
       [not found]       ` <20180711101005.GA13392@wheatley>
  2 siblings, 0 replies; 11+ messages in thread
From: Martin Kletzander @ 2018-07-17 11:45 UTC (permalink / raw)
  To: Jason Baron
  Cc: Roman Mohr, fdeutsch, libvir-list, netdev, ebiederm, davem, Laine Stump


[Not sure who got this message, but it probably didn't get anywhere due to one
 mailserver, so resending to make sure]

On Mon, Jul 09, 2018 at 05:00:49PM -0400, Jason Baron wrote:
>On 07/08/2018 02:01 AM, Martin Kletzander wrote:
>> On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
>>> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Opening tap devices, such as macvtap, that are created in containers is
>>>> problematic because the interface for opening tap devices is via
>>>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
>>>> it's not namespace aware. It is possible to do a mknod() in the
>>>> container, once the tap devices are created, however, since the tap
>>>> devices are created dynamically it's not possible to a priori allow access
>>>> to certain major/minor numbers, since we don't know what these are going
>>>> to be. In addition, it's desirable to not allow the mknod capability in
>>>> containers. This behavior, I think is somewhat inconsistent with the
>>>> tuntap driver where one can create tuntap devices inside a container by
>>>> first opening /dev/net/tun and then using them by supplying the tuntap
>>>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
>>>> network namespace, one is limited to opening network devices that belong
>>>> to your current network namespace.
>>>>
>>>> Here are some options to this issue, that I wanted to get feedback
>>>> about, and just wondering if anybody else has run into this.
>>>>
>>>> 1)
>>>>
>>>> Don't create the tap device, such as macvtap in the container. Instead,
>>>> create the tap device outside of the container and then move it into the
>>>> desired container network namespace. In addition, do a mknod() for the
>>>> corresponding /dev/tapNN device from outside the container before doing
>>>> chroot().
>>>>
>>>> This solution still doesn't allow tap devices to be created inside the
>>>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
>>>> a container, it would mean changing libvirtd to open existing tap
>>>> devices (as opposed to the current behavior of creating new ones). This
>>>> would not require any kernel changes, but as mentioned seems
>>>> inconsistent with the tuntap interface.
>>>>
>>>
>>> For KubeVirt, apart from how exactly the device ends up in the
>>> container, I
>>> would want to pursue a way where all network preparations which require
>>> privileges happens from a privileged process *outside* of the container.
>>> Like CNI solutions do it. They run outside, have privileges and then
>>> create
>>> devices in the right network/mount namespace or move them there. The
>>> final
>>> goal for KubeVirt is that our pod with the qemu process is completely
>>> unprivileged and privileged setup happens from outside.
>>>
>>> As a consequence, and depending on which route Dan pursues with the
>>> restructured libvirt, I would assume that either a privileged
>>> libvirtd-part
>>> outside of containers creates the devices by entering the right
>>> namespaces,
>>> or that libvirt in the container can consume pre-created tun/tap devices,
>>> like qemu.
>>>
>>
>> That would be nice, but as far as I understand there will always be a
>> need for
>> some privileges if you want to use a tap device.  It's nice that CNI
>> does that
>> and all the containers can run unprivileged, but that's because they do
>> not open
>> the tap device and they do not do any privileged operations on it.  But
>> QEMU
>> needs to.  So the only way would be passing an opened fd to the
>> container or
>> opening the tap device there and making the fd usable for one process in
>> the
>> container.  Is this already supported for some type of containers in
>> some way?
>>
>> Martin
>
>Hi,
>
>So another option here call it #3 is to pass open fds via unix sockets.
>If there are privileged operations that QEMU is trying to do with the fd
>though, how will opening it first and then passing it to an unprivileged
>QEMU address that? Is the opener doing those operations first?
>

Sorry for the confusion, but QEMU is not doing any privileged operations.  I got
confused by the fact that anyone can open and do a R/W on a tap device.  But it
looks like that's on purpose.  No capabilities are needed for opening
/dev/net/tun and calling ioctl(TUNSETIFF) with existing name and then doing R/W
operations on it.  It just works.

Correct me if I'm wrong, but to sum it all up, the only things that we need to
figure out (which might possibly be solved by ideas in the other thread) are:

tap:
- Existence of /dev/net/tun
- Having permissions to open it (0666 by default, shouldn't be a big deal)
- Knowing the device name

macvtap:
- Existence of /dev/tapXX
- Having permissions to open /dev/tapXX
- One of the following:
  - Knowing the device name (and being able to translate it using a netlink socket)
  - Knowing the device index
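
The checklist above can be turned into a small preflight check. This is a
hypothetical Python sketch (missing_prerequisites is an invented name)
that only looks at the device node and its permissions; for macvtap it
assumes the caller already knows the device index.

```python
import os


def missing_prerequisites(kind, ifindex=None, dev_dir="/dev"):
    """Return a list of problems; an empty list means the checklist passes."""
    if kind == "tap":
        # All tap devices share the single clone node.
        node = os.path.join(dev_dir, "net", "tun")
    elif kind == "macvtap":
        if ifindex is None:
            return ["macvtap needs the device index (or a name to translate)"]
        # Each macvtap has its own node, named after its ifindex.
        node = os.path.join(dev_dir, f"tap{ifindex}")
    else:
        raise ValueError(f"unknown device kind: {kind!r}")

    problems = []
    if not os.path.exists(node):
        problems.append(f"{node} does not exist")
    elif not os.access(node, os.R_OK | os.W_OK):
        problems.append(f"no read/write permission on {node}")
    return problems
```

Something like missing_prerequisites("macvtap", ifindex=12) could be run
inside the container before starting QEMU to report which item of the
list is not yet satisfied.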

The rest should be an implementation detail.

Am I right?  Did I miss anything?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [libvirt] opening tap devices that are created in a container
       [not found]       ` <20180711101005.GA13392@wheatley>
  2018-07-12  3:33         ` Jason Baron
@ 2018-07-17 11:58         ` Roman Mohr
  1 sibling, 0 replies; 11+ messages in thread
From: Roman Mohr @ 2018-07-17 11:58 UTC (permalink / raw)
  To: Martin Kletzander
  Cc: fabiand, libvir-list, netdev, jbaron, ebiederm, davem, laine


On Wed, Jul 11, 2018 at 12:10 PM <nert@wheatley> wrote:

> On Mon, Jul 09, 2018 at 05:00:49PM -0400, Jason Baron wrote:
> >
> >
> >On 07/08/2018 02:01 AM, Martin Kletzander wrote:
> >> On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
> >>> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Opening tap devices, such as macvtap, that are created in containers
> is
> >>>> problematic because the interface for opening tap devices is via
> >>>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> >>>> it's not namespace aware. It is possible to do a mknod() in the
> >>>> container, once the tap devices are created, however, since the tap
> >>>> devices are created dynamically it's not possible to a priori allow
> access
> >>>> to certain major/minor numbers, since we don't know what these are
> going
> >>>> to be. In addition, it's desirable to not allow the mknod capability in
> >>>> containers. This behavior, I think is somewhat inconsistent with the
> >>>> tuntap driver where one can create tuntap devices inside a container
> by
> >>>> first opening /dev/net/tun and then using them by supplying the tuntap
> >>>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates
> the
> >>>> network namespace, one is limited to opening network devices that
> belong
> >>>> to your current network namespace.
> >>>>
> >>>> Here are some options to this issue, that I wanted to get feedback
> >>>> about, and just wondering if anybody else has run into this.
> >>>>
> >>>> 1)
> >>>>
> >>>> Don't create the tap device, such as macvtap in the container.
> Instead,
> >>>> create the tap device outside of the container and then move it into
> the
> >>>> desired container network namespace. In addition, do a mknod() for the
> >>>> corresponding /dev/tapNN device from outside the container before
> doing
> >>>> chroot().
> >>>>
> >>>> This solution still doesn't allow tap devices to be created inside the
> >>>> container. Thus, in the case of kubevirt, which runs libvirtd inside
> of
> >>>> a container, it would mean changing libvirtd to open existing tap
> >>>> devices (as opposed to the current behavior of creating new ones).
> This
> >>>> would not require any kernel changes, but as mentioned seems
> >>>> inconsistent with the tuntap interface.
> >>>>
> >>>
> >>> For KubeVirt, apart from how exactly the device ends up in the
> >>> container, I
> >>> would want to pursue a way where all network preparations which require
> >>> privileges happens from a privileged process *outside* of the
> container.
> >>> Like CNI solutions do it. They run outside, have privileges and then
> >>> create
> >>> devices in the right network/mount namespace or move them there. The
> >>> final
> >>> goal for KubeVirt is that our pod with the qemu process is completely
> >>> unprivileged and privileged setup happens from outside.
> >>>
> >>> As a consequence, and depending on which route Dan pursues with the
> >>> restructured libvirt, I would assume that either a privileged
> >>> libvirtd-part
> >>> outside of containers creates the devices by entering the right
> >>> namespaces,
> >>> or that libvirt in the container can consume pre-created tun/tap
> devices,
> >>> like qemu.
> >>>
> >>
> >> That would be nice, but as far as I understand there will always be a
> >> need for
> >> some privileges if you want to use a tap device.  It's nice that CNI
> >> does that
> >> and all the containers can run unprivileged, but that's because they do
> >> not open
> >> the tap device and they do not do any privileged operations on it.  But
> >> QEMU
> >> needs to.  So the only way would be passing an opened fd to the
> >> container or
> >> opening the tap device there and making the fd usable for one process in
> >> the
> >> container.  Is this already supported for some type of containers in
> >> some way?
> >>
> >> Martin
> >
> >Hi,
> >
> >So another option here call it #3 is to pass open fds via unix sockets.
> >If there are privileged operations that QEMU is trying to do with the fd
> >though, how will opening it first and then passing it to an unprivileged
> >QEMU address that? Is the opener doing those operations first?
> >
>
> Sorry for the confusion, but QEMU is not doing any privileged operations.
> I got
> confused by the fact that anyone can open and do a R/W on a tap device.
> But it
> looks like that's on purpose.  No capabilities are needed for opening
> /dev/net/tun and calling ioctl(TUNSETIFF) with existing name and then
> doing R/W
> operations on it.  It just works.
>
> Correct me if I'm wrong, but to sum it all up, the only things that we
> need to
> figure out (which might possibly be solved by ideas in the other thread)
> are:
>
> tap:
> - Existence of /dev/net/tun
> - Having permissions to open it (0666 by default, shouldn't be a big deal)
> - Knowing the device name
>
> macvtap:
> - Existence of /dev/tapXX
> - Having permissions to open /dev/tapXX
> - One of the following:
>   - Knowing the device name (and being able to translate it using a
> netlink socket)
>   - Knowing the device index
>
> The rest should be an implementation detail.
>
> Am I right?  Did I miss anything?


At least for the KubeVirt use case, those sound like the things we would
need to solve in order to do the networking setup the way the Container
Network Interface implementations do it in k8s.

Best Regards,
Roman

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-07-17 12:18 UTC | newest]

Thread overview: 11+ messages
2018-07-05 14:20 opening tap devices that are created in a container Jason Baron
2018-07-05 16:10 ` Daniel P. Berrangé
2018-07-09 20:56   ` Jason Baron
2018-07-10  8:46     ` Daniel P. Berrangé
2018-07-05 16:24 ` [libvirt] " Roman Mohr
2018-07-08  6:01   ` Martin Kletzander
2018-07-09 21:00     ` Jason Baron
2018-07-10  8:47       ` Daniel P. Berrangé
2018-07-17 11:45       ` Martin Kletzander
     [not found]       ` <20180711101005.GA13392@wheatley>
2018-07-12  3:33         ` Jason Baron
2018-07-17 11:58         ` Roman Mohr
