From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel P. Berrangé
Subject: Re: opening tap devices that are created in a container
Date: Thu, 5 Jul 2018 17:10:15 +0100
Message-ID: <20180705160938.GK3814@redhat.com>
References: <6a8d7673-0ed7-5920-cc3a-d5d68dbc547c@akamai.com>
Reply-To: Daniel P. Berrangé
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: "netdev@vger.kernel.org", "David S. Miller", libvir-list@redhat.com, rmohr@redhat.com, Fabian Deutsch, "Eric W. Biederman"
To: Jason Baron
Return-path:
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:58222 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753801AbeGEQKU (ORCPT ); Thu, 5 Jul 2018 12:10:20 -0400
Content-Disposition: inline
In-Reply-To: <6a8d7673-0ed7-5920-cc3a-d5d68dbc547c@akamai.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> Hi,
>
> Opening tap devices, such as macvtap, that are created in containers is
> problematic because the interface for opening tap devices is via
> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> its not namespace aware. It is possible to do a mknod() in the
> container, once the tap devices are created, however, since the tap
> devices are created dynamically its not possible to apriori allow access
> to certain major/minor numbers, since we don't know what these are going
> to be. In addition, its desirable to not allow the mknod capability in
> containers. This behavior, I think is somewhat inconsistent with the
> tuntap driver where one can create tuntap devices inside a container by
> first opening /dev/net/tun and then using them by supplying the tuntap
> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> network namespace, one is limited to opening network devices that belong
> to your current network namespace.
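For reference, the tuntap flow described in the quoted paragraph (open
/dev/net/tun, then name the interface via TUNSETIFF) can be sketched
roughly as below. This is only an illustration, not libvirt's code: the
helper names build_ifreq and open_tap_by_name are made up for this
sketch, the constants are copied by hand from <linux/if_tun.h>, and the
40-byte struct ifreq size assumes 64-bit Linux.

```python
import fcntl
import os
import struct

# Values copied from <linux/if_tun.h> so the sketch is self-contained.
# TUNSETIFF is _IOW('T', 202, int); this numeric value holds on x86 and
# arm, while some architectures encode the direction bits differently.
TUNSETIFF = 0x400454CA
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_ifreq(name: str, flags: int) -> bytes:
    """Pack the part of struct ifreq that TUNSETIFF reads.

    Layout: 16-byte NUL-padded name, a 16-bit flags field, then padding
    up to 40 bytes (assumed sizeof(struct ifreq) on 64-bit Linux).
    """
    raw = name.encode()
    if len(raw) >= IFNAMSIZ:  # the name must leave room for a trailing NUL
        raise ValueError("interface name too long")
    return struct.pack("16sH22x", raw, flags)


def open_tap_by_name(name: str, tun_path: str = "/dev/net/tun") -> int:
    """Open the tun clone device and bind the new fd to interface `name`."""
    fd = os.open(tun_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, IFF_TAP | IFF_NO_PI))
    except OSError:
        os.close(fd)
        raise
    return fd
```

Calling open_tap_by_name("tap0") needs /dev/net/tun to be present in the
container and sufficient privileges for the interface; no device node
lookup by major/minor number is involved.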
>
> Here are some options to this issue, that I wanted to get feedback
> about, and just wondering if anybody else has run into this.
>
> 1)
>
> Don't create the tap device, such as macvtap in the container. Instead,
> create the tap device outside of the container and then move it into the
> desired container network namespace. In addition, do a mknod() for the
> corresponding /dev/tapNN device from outside the container before doing
> chroot().
>
> This solution still doesn't allow tap devices to be created inside the
> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> a container, it would mean changing libvirtd to open existing tap
> devices (as opposed to the current behavior of creating new ones). This
> would not require any kernel changes, but as mentioned seems
> inconsistent with the tuntap interface.

Presumably the /dev/tapNN device name also changes when you rename the
tap device interface using SIOCSIFNAME? E.g. if it was /dev/tap24 in
the host and you called SIOCSIFNAME(eth0) when moving it into the
container, it would be /dev/eth0 inside the container?

Anyway, given that this /dev/tapNN approach is what exists today,
libvirt will likely want to implement support for this regardless, in
order to support existing kernels.

> 2)
>
> Add a new kernel interface for tap devices similar to how /dev/net/tun
> currently works. It might be nice to use TUNSETIFF for tap devices, but
> because tap devices have different fops they can't be easily switched
> after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
> the tap device name is supplied and a new fd (distinct from the fd
> returned by the open of /dev/net/tun) is returned as an output field as
> part of the new ioctl parameter.
>
> It may not make sense to have this new ioctl call for /dev/net/tun since
> its really about opening a tap device, so it may make sense to introduce
> it as part of a new device, such as /dev/net/tap.
> This new ioctl could
> be used for macvtap and ipvtap (or any tap device). I think it might
> also improve performance for tuntap devices themselves, if they are
> opened this way since currently all tun operations such as read() and
> write() take a reference count on the underlying tuntap device, since it
> can be changed via TUNSETIFF. I tested this interface out, so I can
> provide the kernel changes if that's helpful for clarification.

Either /dev/net/tun with a new ioctl, or /dev/net/tap with TUNSETIFF,
would be workable from libvirt's POV.

One slight complication with either of the solutions above is that
libvirt won't know whether it is given a TAP or a MACVTAP device. It'll
only be given the device name. So with code today we would probably have
to first try /dev/tapNNN and, if that doesn't exist, then try
/dev/net/tun with TUNSETIFF.

If adding a new /dev/net/tap, something that could seamlessly accept
either a TAP or MACVTAP NIC name would be nice.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
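
The probing order mentioned above (try /dev/tapNNN first, fall back to
/dev/net/tun with TUNSETIFF) could look roughly like the sketch below.
It is a hedged illustration only: it assumes NNN is the interface index
of the named device, and the function names, the "macvtap"/"tuntap"
labels and the injectable `exists` parameter are all invented for this
example.

```python
import os
import socket
from typing import Callable, Tuple

TUN_CLONE_DEV = "/dev/net/tun"


def tap_chardev_path(ifindex: int, dev_dir: str = "/dev") -> str:
    """Return the per-device node path, assuming it is named tap<ifindex>."""
    return os.path.join(dev_dir, "tap%d" % ifindex)


def pick_backend(
    ifindex: int,
    dev_dir: str = "/dev",
    exists: Callable[[str], bool] = os.path.exists,
) -> Tuple[str, str]:
    """Decide which device node to open for an interface.

    Returns ("macvtap", path) when the per-device node exists, otherwise
    ("tuntap", "/dev/net/tun"), where the caller would go on to issue
    TUNSETIFF with the interface name.
    """
    path = tap_chardev_path(ifindex, dev_dir)
    if exists(path):
        return ("macvtap", path)
    return ("tuntap", TUN_CLONE_DEV)


def pick_backend_for_name(name: str, dev_dir: str = "/dev") -> Tuple[str, str]:
    """Resolve the interface name to its index, then probe as above."""
    return pick_backend(socket.if_nametoindex(name), dev_dir)
```

The `exists` hook is there only so the decision can be exercised without
real device nodes; a caller that only has the device name would use
pick_backend_for_name() and then open the returned path.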