Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote on 01/20/2016 10:21:15 PM:

>
> On Wed, Jan 20, 2016 at 10:01:38PM -0500, Stefan Berger wrote:
> > > On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote:
> > > >    Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/19/2016
> > > >    06:51:07 PM:
> > > >    >
> > > >    > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote:
> > > >    > > +   pdev = platform_device_register_simple("tpm_vtpm", vtpm_dev->dev_num,
> > > >    > > +                      NULL, 0);
> > > >    >
> > > >    > This seems strange, something like this should not be creating
> > > >    > platform_devices.
> > > >    Should it be a misc_device then ?
> > >
> > > No. Check what other virtual devices are doing..
> >
> > register_chrdev maybe?
>
> ?? that doesn't replace a platform_device

>
> > > Except that isn't good enough - the IMA kernel side doesn't know that this
> > > tpm is now acting as the 'main' 'default' TPM.
> >
> > Hooking the vTPM to IMA requires another patch that I haven't shown since IMA
> > namespacing isn't public yet. Basically we implement another ioctl() that is to
> > be called before the clone() in order to 'reserve' a vtpm device pair for the
> > calling process. During the clone() call IMA namespacing code can query the
> > vTPM driver for a 'reserved' device pair. Hooking IMA up after the clone() may
> > also work but in case of docker/golang it's better to do this before since the
> > language libraries do a lot after the clone automatically.
>
> That sounds very complicated, wouldn't you just specify the TPM index to use
> in the IMA namespace when it is created?


The IMA namespace is created as part of clone(). You cannot pass anything via clone(), so you have to do it either before or immediately after. If after is too late for whatever reason, you have to do it before.
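
To illustrate the ordering argument, here is a rough userspace sketch. It is only an analogy, not the driver or IMA code: fork() stands in for clone(), and reserve_pair() is a hypothetical stand-in for the 'reserve' ioctl() that uses a pipe so the example runs without the vTPM driver. The point it shows is that state set up before the fork is already in place when the child starts, without passing anything to the call itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the 'reserve a vtpm device pair' ioctl().
 * A pipe is used only so the sketch runs without the vTPM driver. */
static int reserve_pair(int fds[2])
{
	return pipe(fds);
}

/* Returns 1 if the child can use the reservation made before the fork,
 * 0 if it cannot, and -1 on error. */
int reserved_before_fork_is_visible(void)
{
	int fds[2];
	int status;
	char byte = 0;
	pid_t pid;

	if (reserve_pair(fds) < 0)
		return -1;

	/* Stands in for clone(): nothing is handed over as an argument. */
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: the parent's reservation is already in place. */
		close(fds[0]);
		_exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
	}

	/* Parent: check that the child really used the reserved pair. */
	close(fds[1]);
	if (read(fds[0], &byte, 1) != 1)
		byte = 0;
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return byte == 'x' && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling reserved_before_fork_is_visible() from a small main() should return 1. In the real patch the reservation would be looked up by the IMA namespacing code during clone() rather than inherited as a file descriptor.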

>
> > So here things aren't so clear to me, either. Sysfs should really only show the
> > devices that are relevant to a container, but it seems to show a lot more than
>
> I think what you are missing is that nobody uses mainline containers
> for the kind of strong isolation you are thinking about. Out-of-tree
> patches are used by those people and, as I understand it, they cover
> all these issues.


?? Out-of-tree patches?

>
> So, in mainline, the correct thing to do is nothing, and realize that
> people who would care about pcr isolation between containers won't run
> mainline anyhow. Work with the out-of-tree people to make sure things
> work properly until they get the other bits in mainline.


Now one just needs to find those people.
Basically, you suggest ignoring the potential leakage between containers and just registering with sysfs?

>
> At least that is my impression of the state of affairs, someone more
> involved may know better.
>
> > > The huge downside to using a master side dev node is that these things
> > > will leak. Killing the vtpm daemon will not clean up the slave
> > > interface and another command and sysadmin interaction will be needed
> > > to sort out the inevitable mess.
> >
> > This is a scenario that works, though.
>
> So you already implemented the same semantics as I talked about, a clean
> up on close?


It's part of the existing patch, yes. A flag gives the choice of keeping the pair around; if the flag is not set and the server close()s, the pair disappears.
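
As a rough userspace analogy of these semantics (again, not the driver code), the sketch below uses a socketpair() in place of the vtpm device pair. The function name and the 'keep' parameter are made up for illustration; 'keep' mimics the flag by holding an extra reference to the server side, so that the close() does not tear the pair down.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the client side observes the teardown after the server
 * close()s, 0 if the pair is still alive, and -1 on error.
 * 'keep' is a hypothetical stand-in for the "keep it around" flag. */
int client_sees_teardown(int keep)
{
	int sv[2];
	int held = -1;
	int ret;
	char c;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (keep) {
		/* An extra reference keeps the server side alive. */
		held = dup(sv[0]);
		if (held < 0) {
			close(sv[0]);
			close(sv[1]);
			return -1;
		}
	}

	/* The server close()s its end. */
	close(sv[0]);

	/* Non-blocking read on the client end: 0 means the peer is gone,
	 * EAGAIN/EWOULDBLOCK means it is alive but has sent nothing. */
	n = recv(sv[1], &c, 1, MSG_DONTWAIT);
	if (n == 0)
		ret = 1;
	else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		ret = 0;
	else
		ret = -1;

	if (held >= 0)
		close(held);
	close(sv[1]);
	return ret;
}
```

Here client_sees_teardown(0) should return 1 (the pair goes away with the server's close()) and client_sees_teardown(1) should return 0 (the pair is kept around).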

>
> Then just return the fd like I said.


Is there any driver that can be used as an example?

>
> auto-delete a master char dev on close is a very strange API, don't do
> that.


What I called cleanup can be triggered by the vTPM closing /dev/vtpms%d, i.e., the server side. What is the master for you? Is it /dev/vtpmx, which we run the ioctls on?

  Stefan

>
> Jason
>