Jason Gunthorpe wrote on 01/20/2016 08:17:01 PM:

> On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote:
> > Jason Gunthorpe wrote on 01/19/2016 06:51:07 PM:
> >
> > > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote:
> > > > +	pdev = platform_device_register_simple("tpm_vtpm", vtpm_dev->dev_num,
> > > > +	                                       NULL, 0);
> > >
> > > This seems strange, something like this should not be creating
> > > platform_devices.
> >
> > Should it be a misc_device then?
>
> No. Check what other virtual devices are doing.. register_chrdev maybe?
>
> Generally speaking, the tpm core should not insist on a parent device
> the way it does. That is an artifact of the clean up I did of the
> drivers that was never fully completed. To make the driver migration
> easier the TPM core is using devm in places it probably shouldn't, but
> unwinding that requires more difficult all-driver changes.
>
> If you get stuck here you will have to fix the core code.
>
> > > > +	vtpm_dev->chip = tpmm_chip_alloc_pattern(vtpm_dev->pdev,
> > > > +	                                         &vtpm_tpm_ops,
> > > > +	                                         VTPM_DEV_PREFIX_CLIENT"%d");
> > >
> > > I agree with Jarkko, I don't see why these deserve special names..
> >
> > I did that because it becomes confusing when there are many /dev/tpm%d
> > type of devices. This way we can distinguish between a hardware
> > /dev/tpm%d and all these types of devices.
>
> That isn't really consistent with how linux kernel naming works these
> days. Use a udev rule in userspace if you really want this.

Ok, then I will just try to call them /dev/tpm%d.

> > > Presumably some namespace magic can be used to make them show up as
> > > tpm0 in a container?
> >
> > The magic is to have the container management stack create the device
> > pair. From the ioctl it learns the names of the devices that were
> > created; it then finds out the major/minor number of the created
> > device and has /dev/tpm0 with that major/minor created in the
> > container's /dev directory.
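The container-setup step described above could be sketched in shell. The device name and rootfs path here are illustrative assumptions; in reality the names come back from the create ioctl:

```shell
# Hypothetical sketch: the management stack learned the client device
# name (here /dev/vtpmc1) from the create ioctl; it reads the device's
# major/minor and exposes it as /dev/tpm0 in the container's /dev.
dev=/dev/vtpmc1                               # name reported by the ioctl (assumed)
rootfs=/var/lib/containers/demo/rootfs        # container root (assumed)
major=$((16#$(stat -c '%t' "$dev")))          # stat prints the major in hex
minor=$((16#$(stat -c '%T' "$dev")))          # ditto for the minor
mknod "$rootfs/dev/tpm0" c "$major" "$minor"  # needs CAP_MKNOD
chmod 600 "$rootfs/dev/tpm0"
```

The hex-to-decimal conversion matters because `stat %t`/`%T` print hexadecimal, while `mknod` expects plain numbers.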
> Except that isn't good enough - the IMA kernel side doesn't know that
> this tpm is now acting as the 'main' 'default' TPM.

Hooking the vTPM up to IMA requires another patch that I haven't shown since IMA namespacing isn't public yet. Basically we implement another ioctl() that is to be called before the clone() in order to 'reserve' a vtpm device pair for the calling process. During the clone() call the IMA namespacing code can query the vTPM driver for a 'reserved' device pair. Hooking IMA up after the clone() may also work, but in the case of docker/golang it's better to do this before, since the language libraries do a lot automatically after the clone.

> > The issue with sysfs is that sysfs is showing directory entries for
> > all vTPMs in all containers.
>
> That is not a problem to solve here. The sysfs will also show entries
> for hardware tpms too. You need some kind of namespace related solution
> that works for all tpms.
>
> I don't know what became of it, but there used to be a 'device
> namespaces' patch series that dealt with this kind of problem.

That would be good to have now; for now, the best way for me is to work around the problem entirely -- not registering with sysfs.

> Again, I suspect strongly this will need some kind of TPM/IMA
> namespace to actually do what you want.

Here things aren't so clear to me, either. Sysfs should really only show the devices that are relevant to a container, but it seems to show a lot more than that (the serial bus, i8042, etc.) that doesn't necessarily have relevance inside the container. So if there's a hardware TPM on the system, it probably shouldn't show its sysfs entries in the container. If we add a vTPM for the container, it would ideally show up as tpm0 inside the container's sysfs entries, but it doesn't do that unless it is the first device created and there's no hardware TPM. Since all of this isn't possible with sysfs, I made the choice to hide the device from sysfs entirely.
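The device-cgroup confinement mentioned later in this thread can be sketched with the cgroup-v1 devices controller. The cgroup name and the 248:1 major:minor below are illustrative assumptions, not values from the patch:

```shell
# Hypothetical sketch: confine a container to its own vtpm client
# chardev via the cgroup-v1 devices controller (requires root).
cg=/sys/fs/cgroup/devices/demo-container      # cgroup name assumed
mkdir -p "$cg"
echo 'a *:* rwm' > "$cg/devices.deny"         # start from deny-all
echo 'c 248:1 rw' > "$cg/devices.allow"       # the container's TPM chardev (assumed numbers)
echo "$container_pid" > "$cg/cgroup.procs"    # move the container's init task in
```

With such a setup, mknod'ing some other TPM's device node inside the container would still not grant access to it.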
If the hw tpm0 shows, fine, we can report BIOS measurements then. It seems like device subsystem namespacing would be needed to realize part of this, so that multiple tpm0s could exist, one in each device namespace.

> > If we don't want users to see the PCRs of other containers we set
> > this flag. The side effect of setting this flag is that the
> > container will not be able to see its own TPM device, either, but
> > presumably the user has commands to read PCRs etc.
>
> Erm, sysfs is just an obvious symptom, the container could also just
> create the dev node and access any tpm it likes, or use some future
> (?) IMA API with a tpm device number.

No, the device cgroup prevents that. Only the major/minor number of its dedicated TPM (client) chardev can be accessed.

> > VTPM_FLAG_NO_LOG: Containers boot/start without firmware that
> > would write an ACPI log. So in most cases one will choose the
> > VTPM_FLAG_NO_LOG flag since there is no log. Maybe that's a flag
> > the user shouldn't have control over.
>
> ? But there is no ACPI handle so this should already be naturally
> defeated by the core code. Why a flag?
>
> > > Another way to handle this is to return the FD for the server side of
> > > the new TPM here and then route accesses from the /dev/tpmX side to
> > > that FD. Avoid the extra char device. There is some kernel
> > > infrastructure that helps do this (anon fd or something, IIRC)
> >
> > Hm, I would prefer to keep the server-side char device since that way
> > one isn't tied to fd passing, and the TPM could terminate and be
> > restarted later on top of that server-side device while keeping the
> > client-side device usable beyond the server side shutting down.
>
> Don't be afraid of FD passing.
>
> vTPM daemon restart isn't really feasible anyhow as TPMs are
> stateful. The kernel is not prepared to survive a HW TPM reboot. If
> that problem were to be solved somehow then a scheme like systemd's
> FDSTORE=1 to keep the fd alive during a daemon crash could be used.
>
> The huge downside to using a master side dev node is that these things
> will leak. Killing the vtpm daemon will not clean up the slave
> interface and another command and sysadmin interaction will be needed
> to sort out the inevitable mess.

This is a scenario that works, though. The server side opened /dev/vtpms1, then closed it, and the client side disappeared: both devices /dev/vtpms1 and /dev/vtpmc1 were removed. The client side then shows the following:

# ls -l /proc/self/fd
total 0
lrwx------. 1 root root 64 Jan 20 21:34 0 -> /dev/pts/1
lrwx------. 1 root root 64 Jan 20 21:34 1 -> /dev/pts/1
lrwx------. 1 root root 64 Jan 20 21:34 100 -> /dev/vtpmc1 (deleted)
lrwx------. 1 root root 64 Jan 20 21:34 2 -> /dev/pts/1
lr-x------. 1 root root 64 Jan 20 21:34 3 -> /proc/23266/fd
# echo -en '\x00\xc1\x00\x00\x00\x0a\x00\x00\x00\x00' >&100
-bash: echo: write error: Input/output error

Once the client did an 'exec 100>&-', i.e. closed the fd, /dev/vtpmc1 could actually reappear. There's a 'keepalive' flag in this vtpm driver that keeps the device pair around even if the server side shuts down. If that had been set, /dev/vtpmc1 wouldn't have been deleted upon server-side closure.

> > > Then del becomes close() on the special fd
> >
> > And with that the client side device would *have* to close as well
> > because the backend now is inaccessible?
>
> The client side would see failures on all TPM commands it tries to
>
> Jason
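As an aside, the 10 bytes written to fd 100 in the transcript above form a minimal TPM 1.2 request header: a 2-byte tag (0x00c1, TPM_TAG_RQU_COMMAND), a 4-byte big-endian total length (0x0000000a = 10), and a 4-byte ordinal (0 here, not a real command, which is enough to provoke the I/O error):

```shell
# Dump the 10-byte TPM 1.2 request header used in the transcript:
#   bytes 0-1: tag 0x00c1 (TPM_TAG_RQU_COMMAND)
#   bytes 2-5: total length 0x0000000a (10 bytes)
#   bytes 6-9: ordinal 0x00000000 (not a valid command ordinal)
printf '\x00\xc1\x00\x00\x00\x0a\x00\x00\x00\x00' | od -An -tx1
```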