* Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Daniel P. Berrange @ 2012-08-30 21:18 UTC
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Eric W. Biederman

One of the features that SystemD folks have asked us to fix in LXC is
to make sure that /proc/sys/kernel/random/boot_id changes each time a
container is started.

The current semantics are that this file produces a new random UUID each
time the host OS is booted. Obviously each time we start a container now,
they just see the host's random boot_id, so from a container's POV this
does not change each time it starts.
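
For readers who have not used this interface, here is a minimal C sketch of how a program reads boot_id, assuming the usual format of a 36-character textual UUID followed by a newline. The helper names read_boot_id() and boot_id_looks_valid() are hypothetical, not an existing library API.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BOOT_ID_PATH "/proc/sys/kernel/random/boot_id"
#define UUID_STR_LEN 36 /* 8-4-4-4-12 hex digits plus four dashes */

/* True if s has the textual 8-4-4-4-12 hex UUID layout. */
bool boot_id_looks_valid(const char *s)
{
    if (strlen(s) != UUID_STR_LEN)
        return false;
    for (size_t i = 0; i < UUID_STR_LEN; i++) {
        bool dash = (i == 8 || i == 13 || i == 18 || i == 23);
        if (dash ? s[i] != '-' : !isxdigit((unsigned char)s[i]))
            return false;
    }
    return true;
}

/* Read the current boot_id into buf, dropping the trailing newline.
 * Returns 0 on success, -1 if the file is missing or malformed. */
int read_boot_id(char *buf, size_t len)
{
    FILE *f = fopen(BOOT_ID_PATH, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return boot_id_looks_valid(buf) ? 0 : -1;
}
```

Because the kernel generates the value once per host boot, two calls to read_boot_id() return the same UUID whether they run on the host or inside a container on that host, which is the behavior described above.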

There seems to be general agreement that, aside from the PID directories,
changes to data in  proc should be done by a FUSE filesystem overlay of
some kind. We could use that mechanism to fix 'boot_id' in userspace, but
I'm wondering if this is a better candidate for dealing with in kernel
space, since as well as the /proc/sys tree, the data is also visible via
the sysctl() system call which a FUSE overlay won't address.

The kernel doesn't have a real concept of a 'container' to associate
a boot_id value with as such, but maybe it is reasonable to associate
a boot_id value with each PID namespace ?

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Eric W. Biederman @ 2012-08-30 22:15 UTC
  To: Daniel P. Berrange; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

"Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> One of the features that SystemD folks have asked us to fix in LXC is
> to make sure that /proc/sys/kernel/random/boot_id changes each time a
> container is started.

There may be a good reason for this.  Most of the time what I have seen
of kernel requests from the direction of SystemD is that while there may
be a real problem, usually their imagined solution is not a
particularly good solution.  So a description of the problem is needed.

Justifying something with just "SystemD wants this" is a good way to get
a nack.

> The current semantics are that this file produces a new random UUID each
> time the host OS is booted. Obviously each time we start a container now,
> they just see the host's random boot_id, so from a container's POV this
> does not change each time it starts.

That is correct.  As I recall the contract with boot_id is to provide
a unique per boot value to assist in dealing with boots etc.  I seem
to recall emacs uses the combination of hostname+boot_id to help
generate unique lock file names.
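
A minimal sketch of the hostname+boot_id idea, assuming a hypothetical lock-owner tag of the form host:boot_id:pid. This is not emacs's actual lock-file code (later in this thread Eric finds that emacs 24.1 does not read boot_id at all); it only shows why the combination is unique per boot: a tag written during one boot carries that boot's ID.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build a lock-owner tag of the hypothetical form "host:boot_id:pid".
 * Returns 0 on success, -1 if host contains the ':' separator or if
 * out is too small to hold the whole tag. */
int make_lock_tag(char *out, size_t outlen,
                  const char *host, const char *boot_id, long pid)
{
    if (strchr(host, ':') != NULL)
        return -1;
    int n = snprintf(out, outlen, "%s:%s:%ld", host, boot_id, pid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Same, filling in this machine's hostname and this process's PID. */
int make_local_lock_tag(char *out, size_t outlen, const char *boot_id)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0'; /* termination is not guaranteed on truncation */
    return make_lock_tag(out, outlen, host, boot_id, (long)getpid());
}
```

In practice the boot_id argument would be the string read from /proc/sys/kernel/random/boot_id.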

I would definitely need a refresher on how boot_id is used in practice
by applications other than SystemD before I could suggest a good design.

There is also a question of uptime.

> There seems to be general agreement that, aside from the PID directories,
> changes to data in  proc should be done by a FUSE filesystem overlay of
> some kind.

No.  I have yet to see a justification for using FUSE in containers on
top of proc files.

I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
instead of providing a proper mechanism to tell applications how
parallel they can/should be.

For hacks and controversial ideas FUSE is good because it makes it
someone else's problem and it means it isn't something we have to
support in the kernel for the indefinite future.  At the same time in
general a FUSE solution does not really solve anything; it just sort of
papers over a problem.

For some problems papering over them is good enough; other problems
really should be solved properly.

> We could use that mechanism to fix 'boot_id' in userspace, but
> I'm wondering if this is a better candidate for dealing with in kernel
> space, since as well as the /proc/sys tree, the data is also visible via
> the sysctl() system call which a FUSE overlay won't address.

Any application that uses the sysctl() system call needs to be fixed.
When I looked years ago, the number of applications using sysctl() could
be counted on one hand; most of those uses were in the Fedora installer,
and the Fedora installer has since stopped using sysctl().

> The kernel doesn't have a real concept of a 'container' to associate
> a boot_id value with as such, but maybe it is reasonable to associate
> a boot_id value with each PID namespace ?

There is also the question of uptime and clocks and things like that.

The UTS namespace might be a more reasonable place to tack on that kind
of extended functionality.

Just changing boot_id itself and not all of the other bits that track
when we have booted does not seem reasonable.
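
boot_id is only one of the values that say when the system booted. As a rough illustration, assuming a Linux host, the sketch below reads seconds-since-boot two ways, from /proc/uptime and from the Linux-specific CLOCK_BOOTTIME clock. Absent any per-container mechanism, a process in a container gets the host's numbers from both, just as it gets the host's boot_id. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Parse the first field of /proc/uptime (seconds since boot); -1.0 if none. */
double parse_uptime(const char *text)
{
    char *end;
    double secs = strtod(text, &end);
    return end == text ? -1.0 : secs;
}

/* Seconds since boot as reported by /proc/uptime, or -1.0 on failure. */
double uptime_from_proc(void)
{
    char buf[128];
    FILE *f = fopen("/proc/uptime", "r");
    if (f == NULL)
        return -1.0;
    char *ok = fgets(buf, (int)sizeof buf, f);
    fclose(f);
    return ok != NULL ? parse_uptime(buf) : -1.0;
}

/* The same quantity from CLOCK_BOOTTIME, which (unlike CLOCK_MONOTONIC)
 * keeps counting during suspend. Returns -1.0 if unavailable. */
double uptime_from_clock(void)
{
#ifdef CLOCK_BOOTTIME
    struct timespec ts;
    if (clock_gettime(CLOCK_BOOTTIME, &ts) == 0)
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#endif
    return -1.0;
}
```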

Once we can sort out the details a kernel implementation should be quite
trivial.  It just requires the appropriate sysctl registration dance.

Eric

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Daniel P. Berrange @ 2012-08-30 22:50 UTC
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> 
> > One of the features that SystemD folks have asked us to fix in LXC is
> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
> > container is started.
> 
> There may be a good reason for this.  Most of the time what I have seen
> of kernel requests from the direction of SystemD is that while there may
> be a real problem, usually their imagined solution is not a
> particularly good solution.  So a description of the problem is needed.
> 
> Justifying something with just "SystemD wants this" is a good way to get
> a nack.

SystemD records log messages for all system services in its journal.
It can show you all log messages for the current service execution,
all log messages for a service since system boot, or all log messages
ever. The boot_id value is used as a unique tag to allow grouping of
the log messages per system boot. When we run systemd inside a container,
we want the grouping of log messages generated by services inside the
container to reflect the container's boot, not the host's boot.
Hence the desire to have the boot_id value reflect when a container is
booted.
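
To make the grouping concrete: each journal entry records a _BOOT_ID field, and "messages since boot" is a filter on that field. The sketch below models this with a hypothetical in-memory struct log_entry rather than the real journal API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a journal entry. */
struct log_entry {
    const char *boot_id; /* boot_id in effect when the entry was written */
    const char *message;
};

/* Count entries recorded during the boot identified by boot_id. */
size_t count_for_boot(const struct log_entry *e, size_t n, const char *boot_id)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(e[i].boot_id, boot_id) == 0)
            count++;
    return count;
}

/* Count distinct boots in the log (O(n^2), fine for a sketch). */
size_t count_distinct_boots(const struct log_entry *e, size_t n)
{
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && strcmp(e[j].boot_id, e[i].boot_id) != 0)
            j++;
        if (j == i) /* first occurrence of this boot_id */
            distinct++;
    }
    return distinct;
}
```

With a per-container boot_id, two container starts produce two groups; if every entry carries the host's boot_id instead, count_distinct_boots() reports a single boot and the two runs blur together.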

> > The current semantics are that this file produces a new random UUID each
> > time the host OS is booted. Obviously each time we start a container now,
> > they just see the host's random boot_id, so from a container's POV this
> > does not change each time it starts.
> 
> That is correct.  As I recall the contract with boot_id is to provide
> a unique per boot value to assist in dealing with boots etc.  I seem
> to recall emacs uses the combination of hostname+boot_id to help
> generate unique lock file names.
> 
> I would definitely need a refresher on how boot_id is used in practice
> by applications other than SystemD before I could suggest a good design.
> 
> There is also a question of uptime.

Agreed. As you say, this is one of many other /proc values needing
virtualization for containers.

> > There seems to be general agreement that, aside from the PID directories,
> > changes to data in  proc should be done by a FUSE filesystem overlay of
> > some kind.
> 
> No.  I have yet to see a justification for using FUSE in containers on
> top of proc files.
> 
> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
> instead of providing a proper mechanism to tell applications how
> parallel they can/should be.
> 
> For hacks and controversial ideas FUSE is good because it makes it
> someone else's problem and it means it isn't something we have to
> support in the kernel for the indefinite future.  At the same time in
> general a FUSE solution does not really solve anything; it just sort of
> papers over a problem.
> 
> For some problems papering over them is good enough; other problems
> really should be solved properly.

Ok, well I guess things aren't as clear cut as I understood then. I've
been told that FUSE was the desired approach to dealing with all the
various files in /proc that might need changing for containers. Personally
I don't much care what approach is used - if the kernel wants to do more
stuff, that's fine with me from a libvirt LXC POV. I'll just follow whatever
the consensus is in this area.

> > We could use that mechanism to fix 'boot_id' in userspace, but
> > I'm wondering if this is a better candidate for dealing with in kernel
> > space, since as well as the /proc/sys tree, the data is also visible via
> > the sysctl() system call which a FUSE overlay won't address.
> 
> Any application that uses the sysctl() system call needs to be fixed.
> When I looked years ago, the number of applications using sysctl() could
> be counted on one hand; most of those uses were in the Fedora installer,
> and the Fedora installer has since stopped using sysctl().

Ok, I did wonder whether anyone would actually use sysctl() instead
of reading /proc/sys. If we can ignore sysctl(), that gives us more
options.

> > The kernel doesn't have a real concept of a 'container' to associate
> > a boot_id value with as such, but maybe it is reasonable to associate
> > a boot_id value with each PID namespace ?
> 
> There is also the question of uptime and clocks and things like that.
> 
> The UTS namespace might be a more reasonable place to tack on that kind
> of extended functionality.
>
> Just changing boot_id itself and not all of the other bits that track
> when we have booted does not seem reasonable.
> 
> Once we can sort out the details a kernel implementation should be quite
> trivial.  It just requires the appropriate sysctl registration dance.

Ok, I'll try to identify a list of other related parts which need changing
wrt boot.

Thanks for the feedback.

Regards,
Daniel

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Daniel P. Berrange @ 2012-08-30 23:22 UTC
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> 
> > One of the features that SystemD folks have asked us to fix in LXC is
> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
> > container is started.
> 
> There may be a good reason for this.  Most of the time what I have seen
> of kernel requests from the direction of SystemD is that while there may
> be a real problem, usually their imagined solution is not a
> particularly good solution.  So a description of the problem is needed.
> 
> Justifying something with just "SystemD wants this" is a good way to get
> a nack.
> 
> > The current semantics are that this file produces a new random UUID each
> > time the host OS is booted. Obviously each time we start a container now,
> > they just see the host's random boot_id, so from a container's POV this
> > does not change each time it starts.
> 
> That is correct.  As I recall the contract with boot_id is to provide
> a unique per boot value to assist in dealing with boots etc.  I seem
> to recall emacs uses the combination of hostname+boot_id to help
> generate unique lock file names.
> 
> I would definitely need a refresher on how boot_id is used in practice
> by applications other than SystemD before I could suggest a good design.

This post seems to describe what emacs wants boot_id for:

  http://marc.info/?l=linux-kernel&m=93613053109494&w=2

With this info, I think emacs inside a container would expect the boot_id
to change each time the container is started, so that it can detect stale
locks from an emacs instance in a previous boot of the container.
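
A sketch of the stale-lock test described here, reusing a hypothetical host:boot_id:pid tag format (an assumption, not emacs's real format). A lock is treated as coming from a previous boot only when its recorded boot_id differs from the current one; a lock from the current boot would still need a PID liveness check, which is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Given a lock tag of the hypothetical form "host:boot_id:pid", report
 * whether it was written during a different boot than current_boot_id.
 * Malformed tags return false: we do not guess that they are stale.
 * Same-boot locks still need a separate PID check (not shown). */
bool lock_from_previous_boot(const char *tag, const char *current_boot_id)
{
    const char *start = strchr(tag, ':');
    if (start == NULL)
        return false;
    start++;                          /* first char of the boot_id field */
    const char *end = strchr(start, ':');
    if (end == NULL)
        return false;
    size_t len = (size_t)(end - start);
    return len != strlen(current_boot_id) ||
           strncmp(start, current_boot_id, len) != 0;
}
```

If the container inherits the host's boot_id, a lock left behind by a previous start of the container compares equal and is never reported as coming from a previous boot.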

Daniel

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Eric W. Biederman @ 2012-08-31  0:13 UTC
  To: Daniel P. Berrange; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

"Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> 
>> > One of the features that SystemD folks have asked us to fix in LXC is
>> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> > container is started.
>> 
>> There may be a good reason for this.  Most of the time what I have seen
>> of kernel requests from the direction of SystemD is that while there may
>> be a real problem, usually their imagined solution is not a
>> particularly good solution.  So a description of the problem is needed.
>> 
>> Justifying something with just "SystemD wants this" is a good way to get
>> a nack.
>
> SystemD records log messages for all system services in its journal.
> It can show you all log messages for the current service execution,
> all log messages for a service since system boot, or all log messages
> ever. The boot_id value is used as a unique tag to allow grouping of
> the log messages per system boot. When we run systemd inside a container,
> we want the grouping of log messages generated by services inside the
> container to reflect the container's boot, not the host's boot.
> Hence the desire to have the boot_id value reflect when a container is
> booted.

Since SystemD post-dates containers and since the logging feature is not
currently in wide use, that use case is completely non-persuasive.

So far this just sounds like a plain SystemD bug and something that can
be easily changed at this point in time.

It has been a long time, but my fuzzy memory says that the original
boot_id justification was based on use cases that could not be solved
any other way.

My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
that inspired the implementation of boot_id.  However reading the
current emacs source code it appears emacs gave up before boot_id
was implemented and stats /var/run/random-seed (which we seem to
have removed) or looks in wtmp or utmp for the latest boot record.
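
The utmp/wtmp fallback can be sketched with the POSIX getutxent() family, looking for BOOT_TIME records. This illustrates the technique only and is not taken from the emacs source; the function names are hypothetical, and whether a container's utmp holds its own boot record depends on whether the container's init writes one.

```c
#include <stddef.h>
#include <time.h>
#include <utmpx.h>

/* Latest BOOT_TIME timestamp among n utmpx records, or -1 if none. */
time_t latest_boot_in(const struct utmpx *recs, size_t n)
{
    time_t latest = -1;
    for (size_t i = 0; i < n; i++) {
        time_t t = (time_t)recs[i].ut_tv.tv_sec;
        if (recs[i].ut_type == BOOT_TIME && t > latest)
            latest = t;
    }
    return latest;
}

/* Scan the system utmp database with the POSIX getutxent() API. */
time_t latest_boot_from_utmp(void)
{
    time_t latest = -1;
    struct utmpx *u;
    setutxent();
    while ((u = getutxent()) != NULL) {
        time_t t = latest_boot_in(u, 1);
        if (t > latest)
            latest = t;
    }
    endutxent();
    return latest;
}
```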

I did a quick grep through the binaries on my system and I could not
find anything using /proc/sys/kernel/random/boot_id.

That suggests to me that the proper solution is to actually just remove
boot_id.

Hmm.  And then there is another interesting detail: what should boot_id
return after the processes have migrated from one system to another?

>> > The current semantics are that this file produces a new random UUID each
>> > time the host OS is booted. Obviously each time we start a container now,
>> > they just see the host's random boot_id, so from a container's POV this
>> > does not change each time it starts.
>> 
>> That is correct.  As I recall the contract with boot_id is to provide
>> a unique per boot value to assist in dealing with boots etc.  I seem
>> to recall emacs uses the combination of hostname+boot_id to help
>> generate unique lock file names.
>> 
>> I would definitely need a refresher on how boot_id is used in practice
>> by applications other than SystemD before I could suggest a good design.
>> 
>> There is also a question of uptime.
>
> Agreed. As you say, this is one of many other /proc values needing
> virtualization for containers.

If you think of it as virtualization and you figure the requirement is
to exactly replicate a non-containerized system you won't come up with
suggestions that make sense to implement.

For the most part the semantics of namespaces exist to support process
migration.

>> > There seems to be general agreement that, aside from the PID directories,
>> > changes to data in  proc should be done by a FUSE filesystem overlay of
>> > some kind.
>> 
>> No.  I have yet to see a justification for using FUSE in containers on
>> top of proc files.
>> 
>> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
>> instead of providing a proper mechanism to tell applications how
>> parallel they can/should be.
>> 
>> For hacks and controversial ideas FUSE is good because it makes it
>> someone else's problem and it means it isn't something we have to
>> support in the kernel for the indefinite future.  At the same time in
>> general a FUSE solution does not really solve anything; it just sort of
>> papers over a problem.
>> 
>> For some problems papering over them is good enough; other problems
>> really should be solved properly.
>
> Ok, well I guess things aren't as clear cut as I understood then. I've
> been told that FUSE was the desired approach to dealing with all the
> various files in /proc that might need changing for containers. Personally
> I don't much care what approach is used - if the kernel wants to do more
> stuff, that's fine with me from a libvirt LXC POV. I'll just follow whatever
> the consensus is in this area.

Largely what I have seen is a bunch of half thought out hacks and the
consensus being (ick don't bother me...).  In which case FUSE is a good
answer as it doesn't obligate anyone to maintain or care about the code,
except those who want the hack.

>> > We could use that mechanism to fix 'boot_id' in userspace, but
>> > I'm wondering if this is a better candidate for dealing with in kernel
>> > space, since as well as the /proc/sys tree, the data is also visible via
>> > the sysctl() system call which a FUSE overlay won't address.
>> 
>> Any application that uses the sysctl() system call needs to be fixed.
>> When I looked years ago, the number of applications using sysctl() could
>> be counted on one hand; most of those uses were in the Fedora installer,
>> and the Fedora installer has since stopped using sysctl().
>
> Ok, I did wonder whether anyone would actually use sysctl() instead
> of reading /proc/sys. If we can ignore sysctl(), that gives us more
> options.

Most definitely.  The warning sysctl spews when you figure out how
to call it should be a good clue.

>> > The kernel doesn't have a real concept of a 'container' to associate
>> > a boot_id value with as such, but maybe it is reasonable to associate
>> > a boot_id value with each PID namespace ?
>> 
>> There is also the question of uptime and clocks and things like that.
>> 
>> The UTS namespace might be a more reasonable place to tack on that kind
>> of extended functionality.
>>
>> Just changing boot_id itself and not all of the other bits that track
>> when we have booted does not seem reasonable.
>> 
>> Once we can sort out the details a kernel implementation should be quite
>> trivial.  It just requires the appropriate sysctl registration dance.
>
> Ok, I'll try to identify a list of other related parts which need changing
> wrt boot.
>
> Thanks for the feedback.

I hope it helps.

There may be a justification and a good case for messing with boot_id
but I don't currently see it.

What I see (so far) is SystemD unnecessarily tying itself to Linux
implementation details.

Eric

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Eric W. Biederman @ 2012-08-31  0:18 UTC
  To: Daniel P. Berrange; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

"Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> 
>> > One of the features that SystemD folks have asked us to fix in LXC is
>> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> > container is started.
>> 
>> There may be a good reason for this.  Most of the time what I have seen
>> of kernel requests from the direction of SystemD is that while there may
>> be a real problem, usually their imagined solution is not a
>> particularly good solution.  So a description of the problem is needed.
>> 
>> Justifying something with just "SystemD wants this" is a good way to get
>> a nack.
>> 
>> > The current semantics are that this file produces a new random UUID each
>> > time the host OS is booted. Obviously each time we start a container now,
>> > they just see the host's random boot_id, so from a container's POV this
>> > does not change each time it starts.
>> 
>> That is correct.  As I recall the contract with boot_id is to provide
>> a unique per boot value to assist in dealing with boots etc.  I seem
>> to recall emacs uses the combination of hostname+boot_id to help
>> generate unique lock file names.
>> 
>> I would definitely need a refresher on how boot_id is used in practice
>> by applications other than SystemD before I could suggest a good design.
>
> This post seems to describe what emacs wants boot_id for:
>
>   http://marc.info/?l=linux-kernel&m=93613053109494&w=2
>
> With this info, I think emacs inside a container would expect the boot_id
> to change each time the container is started, so that it can detect stale
> locks from an emacs instance in a previous boot of the container.

Thanks, that patch does clarify the original purpose.

Unfortunately, the lines of communication were crossed, because emacs
24.1 most certainly does not use /proc/sys/kernel/random/boot_id.

Eric

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Serge Hallyn @ 2012-08-31 13:25 UTC
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> 
> > One of the features that SystemD folks have asked us to fix in LXC is
> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
> > container is started.
> 
> There may be a good reason for this.  Most of the time what I have seen
> of kernel requests from the direction of SystemD is that while there may
> be a real problem, usually their imagined solution is not a
> particularly good solution.  So a description of the problem is needed.
> 
> Justifying something with just "SystemD wants this" is a good way to get
> a nack.
> 
> > The current semantics are that this file produces a new random UUID each
> > time the host OS is booted. Obviously each time we start a container now,
> > they just see the host's random boot_id, so from a container's POV this
> > does not change each time it starts.
> 
> That is correct.  As I recall the contract with boot_id is to provide
> a unique per boot value to assist in dealing with boots etc.  I seem
> to recall emacs uses the combination of hostname+boot_id to help
> generate unique lock file names.
> 
> I would definitely need a refresher on how boot_id is used in practice
> by applications other than SystemD before I could suggest a good design.
> 
> There is also a question of uptime.
> 
> > There seems to be general agreement that, aside from the PID directories,
> > changes to data in  proc should be done by a FUSE filesystem overlay of
> > some kind.
> 
> No.  I have yet to see a justification for using FUSE in containers on
> top of proc files.
> 
> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
> instead of providing a proper mechanism to tell applications how
> parallel they can/should be.

Should we be talking about a new set of library functions to gather info
like available memory and cpus, etc?  The library functions could internally
take into account the per-host procfiles, cgroup info, and, as the need
arises, new sources of information.
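
One function such a library might contain, sketched under the assumption that it reads the Linux Cpus_allowed_list field of /proc/self/status (which reflects the affinity mask and cpuset restrictions) and falls back to the host-wide online CPU count. The names count_cpu_list() and usable_cpus() are hypothetical; the proposal above does not define an API, and CPU bandwidth limits from the cpu cgroup controller are not considered here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Count CPUs in a kernel "cpu list" string such as "0-3,8,10-11".
 * Leading whitespace is skipped; returns -1 on malformed input. */
long count_cpu_list(const char *list)
{
    long total = 0;
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p)
            return -1;
        long hi = lo;
        if (*end == '-') {            /* a range such as "10-11" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
        }
        total += hi - lo + 1;
        p = end;
        if (*p == ',')
            p++;
    }
    return total;
}

/* CPUs this process may run on, falling back to the host-wide count. */
long usable_cpus(void)
{
    char line[4096];
    long n = -1;
    FILE *f = fopen("/proc/self/status", "r");
    if (f != NULL) {
        while (fgets(line, sizeof line, f) != NULL) {
            if (strncmp(line, "Cpus_allowed_list:", 18) == 0) {
                n = count_cpu_list(line + 18);
                break;
            }
        }
        fclose(f);
    }
    if (n <= 0)
        n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

Unlike counting entries in /proc/cpuinfo, this reports what the process can actually use when a cpuset confines it, which is the kind of per-source logic such a library could hide from applications.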

-serge

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Glauber Costa @ 2012-09-03  7:52 UTC
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 08/31/2012 02:15 AM, Eric W. Biederman wrote:
> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> 
>> One of the features that SystemD folks have asked us to fix in LXC is
>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> container is started.
> 
> There may be a good reason for this.  Most of the time what I have seen
> of kernel requests from the direction of SystemD is that while there may
> be a real problem, usually their imagined solution is not a
> particularly good solution.  So a description of the problem is needed.
> 
> Justifying something with just "SystemD wants this" is a good way to get
> a nack.
> 
>> The current semantics are that this file produces a new random UUID each
>> time the host OS is booted. Obviously each time we start a container now,
>> they just see the host's random boot_id, so from a container's POV this
>> does not change each time it starts.
> 
> That is correct.  As I recall the contract with boot_id is to provide
> a unique per boot value to assist in dealing with boots etc.  I seem
> to recall emacs uses the combination of hostname+boot_id to help
> generate unique lock file names.
> 
> I would definitely need a refresher on how boot_id is used in practice
> by applications other than SystemD before I could suggest a good design.
> 
> There is also a question of uptime.
> 
>> There seems to be general agreement that, aside from the PID directories,
>> changes to data in  proc should be done by a FUSE filesystem overlay of
>> some kind.
> 
> No.  I have yet to see a justification for using FUSE in containers on
> top of proc files.
> 
> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
> instead of providing a proper mechanism to tell applications how
> parallel they can/should be.
> 
> For hacks and controversial ideas FUSE is good because it makes it
> someone else's problem and it means it isn't something we have to
> support in the kernel for the indefinite future.  At the same time in
> general a FUSE solution does not really solve anything; it just sort of
> papers over a problem.
> 

Two main points here:

I would love to see this solved transparently by the kernel. In all
previous attempts, this seemed to fall under "controversial idea" as you
described above.

It is hard to call it "consensus", indeed, but the FUSE approach has
quite a lot of traction (I am far from the only one), specifically
because it allows us to present userspace with something in the meantime.
Besides, the interface is just the standard /proc interface. If the
kernel eventually does something here, we just have to replace it.

As for a mechanism for communicating with applications, that is completely
orthogonal. Applications will keep requiring the standard interface for
a while. We can, however, do a lot better with a library for those,
using some other non-standard mechanism that helps containers and
non-containers alike. But that would be in addition to the standard
interface, at least for the time being.

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
From: Glauber Costa @ 2012-09-03  7:53 UTC
  To: Serge Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 08/31/2012 05:25 PM, Serge Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>
>>> One of the features that SystemD folks have asked us to fix in LXC is
>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>> container is started.
>>
>> There may be a good reason for this.  Most of the time what I have seen
>> of kernel requests from the direction of SystemD is that while there may
>> be a real problem, usually their imagined solution is not a
>> particularly good solution.  So a description of the problem is needed.
>>
>> Justifying something with just "SystemD wants this" is a good way to get
>> a nack.
>>
>>> The current semantics are that this file produces a new random UUID each
>>> time the host OS is booted. Obviously each time we start a container now,
>>> they just see the host's random boot_id, so from a container's POV this
>>> does not change each time it starts.
>>
>> That is correct.  As I recall the contract with boot_id is to provide
>> a unique per boot value to assist in dealing with boots etc.  I seem
>> to recall emacs uses the combination of hostname+boot_id to help
>> generate unique lock files names.
>>
>> I would definitely need a refresher on how boot_id is used in practice
>> by applications other than SystemD before I could suggest a good design.
>>
>> There is also a question of uptime.
>>
>>> There seems to be general agreement that, aside from the PID directories,
>>> changes to data in proc should be done by a FUSE filesystem overlay of
>>> some kind.
>>
>> No.  I have yet to see a justification for using FUSE in containers on
>> top of proc files.
>>
>> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
>> instead of providing a proper mechanism to tell applications how
>> parallel they can/should be.
> 
> Should we be talking about a new set of library functions to gather info
> like available memory and cpus, etc?  The library functions could internally
> take into account the per-host procfiles, cgroup info, and, as the need
> arises, new sources of information.
> 

I think this makes a lot of sense.
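A library along the lines Serge describes could, for example, answer "how many CPUs may I use?" by combining the host's view with the container's cgroup. The C sketch below assumes the caller has already read /sys/devices/system/cpu/online and the container's cpuset.cpus file into strings; count_cpus_in_list() and available_cpus() are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a kernel "cpulist" string such as "0-3,6,8-9\n",
 * the format used by /sys/devices/system/cpu/online and by a cpuset
 * cgroup's cpuset.cpus file.  Returns the count, or -1 if malformed.
 * Hypothetical helper, not an existing library API.
 */
int count_cpus_in_list(const char *list)
{
    assert(list != NULL);
    int total = 0;
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        if (!isdigit((unsigned char)*p))
            return -1;
        long first = strtol(p, &end, 10);
        long last = first;
        p = end;
        if (*p == '-') {                 /* a range such as "8-9" */
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            last = strtol(p, &end, 10);
            p = end;
        }
        if (last < first)
            return -1;
        total += (int)(last - first + 1);
        if (*p == ',')
            p++;                         /* move to the next list element */
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return total;
}

/*
 * Prefer the container's cpuset when it is readable and narrower than the
 * host's online set; otherwise fall back to the host value.
 */
int available_cpus(const char *host_online, const char *cgroup_cpus)
{
    int host = count_cpus_in_list(host_online);
    int cg = cgroup_cpus != NULL ? count_cpus_in_list(cgroup_cpus) : -1;
    if (cg > 0 && (host < 0 || cg < host))
        return cg;
    return host;
}
```

For instance, available_cpus("0-7\n", "2-3\n") reports 2, while passing NULL for the cgroup string falls back to the host's 8.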


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]             ` <87bohrhqai.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-09-03  7:56               ` Glauber Costa
       [not found]                 ` <5044629C.3030909-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-09-03  7:56 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> 
>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>
>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>> container is started.
>>>
>>> There may be a good reason for this.  Most of the time, what I have seen
>>> of kernel requests from the direction of SystemD is that, while there may
>>> be a real problem, their imagined solution is usually not a
>>> particularly good solution.  So a description of the problem is needed.
>>>
>>> Justifying something with just "SystemD wants this" is a good way to get
>>> a nack.
>>
>> SystemD records log messages for all system services in its journal.
>> It can show you all log messages for the current service execution,
>> all log messages for a service since system boot, or all log messages
>> ever. The boot_id value is used as a unique tag to allow grouping of
>> the log messages per system boot. When we run systemd inside a container,
>> we want that grouping of log messages generated by services inside
>> the container to take account of the container boot, not the host boot.
>> Hence the desire to have the boot_id value reflect when a container is
>> booted.
>
> Since SystemD post-dates containers and since the logging feature is not
> currently in wide use, that use case is completely non-persuasive.
>
> So far this just sounds like a plain SystemD bug and something that can
> be easily changed at this point in time.
>
> It has been a long time, but my fuzzy memory says that the original
> boot_id justification was based on use cases that could not be solved
> any other way.
>
> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
> that inspired the implementation of boot_id.  However, reading the
> current emacs source code, it appears emacs gave up before boot_id
> was implemented and stats /var/run/random-seed (which we seem to
> have removed) or looks in wtmp or utmp for the latest boot record.
>
> I did a quick grep through the binaries on my system and I could not
> find anything using /proc/sys/random/boot_id.
>
> That suggests to me that the proper solution is to actually just remove
> boot_id.
>
> Hmm.  And then there is another interesting detail: what should boot_id
> return after the processes have migrated from one system to another?
> 

Since this would be a per-boot id, this clearly has to be carried over
with migration, along with all the tons of data we already carry.


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                 ` <5044629C.3030909-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-03 19:48                   ` Eric W. Biederman
       [not found]                     ` <87r4qi6g6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2012-09-03 19:48 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> 
>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>
>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>> container is started.
>>>>
>>>> There may be a good reason for this.  Most of the time, what I have seen
>>>> of kernel requests from the direction of SystemD is that, while there may
>>>> be a real problem, their imagined solution is usually not a
>>>> particularly good solution.  So a description of the problem is needed.
>>>>
>>>> Justifying something with just "SystemD wants this" is a good way to get
>>>> a nack.
>>>
>>> SystemD records log messages for all system services in its journal.
>>> It can show you all log messages for the current service execution,
>>> all log messages for a service since system boot, or all log messages
>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>> the log messages per system boot. When we run systemd inside a container,
>>> we want that grouping of log messages generated by services inside
>>> the container to take account of the container boot, not the host boot.
>>> Hence the desire to have the boot_id value reflect when a container is
>>> booted.
>>
>> Since SystemD post-dates containers and since the logging feature is not
>> currently in wide use, that use case is completely non-persuasive.
>>
>> So far this just sounds like a plain SystemD bug and something that can
>> be easily changed at this point in time.
>>
>> It has been a long time, but my fuzzy memory says that the original
>> boot_id justification was based on use cases that could not be solved
>> any other way.
>>
>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>> that inspired the implementation of boot_id.  However, reading the
>> current emacs source code, it appears emacs gave up before boot_id
>> was implemented and stats /var/run/random-seed (which we seem to
>> have removed) or looks in wtmp or utmp for the latest boot record.
>>
>> I did a quick grep through the binaries on my system and I could not
>> find anything using /proc/sys/random/boot_id.
>>
>> That suggests to me that the proper solution is to actually just remove
>> boot_id.
>>
>> Hmm.  And then there is another interesting detail: what should boot_id
>> return after the processes have migrated from one system to another?
>> 
>
> Since this would be a per-boot id, this clearly has to be carried over
> with migration, along with all the tons of data we already carry.

The twist, of course, is what a boot means.  If we are really after
machine boots, then the current behavior is correct.

Looking back in the archives, the desired behavior appears to be a value
that can be used to see if a pid value must be stale.

As a stale pid detector, boot_id is pretty lousy.  Pids can still be
reused.

Still, its role as a stale pid detector makes it clear which namespace
boot_id should be in and how we should treat boot_id upon migration.

You can only serve as a stale pid detector if you are in the pid
namespace.

So at this point patches are welcome.  Hopefully with a summary
of the discussion.
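One way to read the stale-pid role is sketched below in C, under an assumed pidfile format that stores the boot_id next to the pid (the format, struct pid_record and both function names are hypothetical). A boot_id mismatch proves the recorded pid is stale; a match proves nothing, which is the "pids can still be reused" caveat above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pidfile layout: "<boot_id> <pid>\n". */
struct pid_record {
    char boot_id[37];   /* 36-character UUID string plus NUL */
    long pid;
};

/* Parse one pidfile line; returns true on success. */
bool parse_pid_record(const char *line, struct pid_record *rec)
{
    assert(line != NULL && rec != NULL);
    return sscanf(line, "%36s %ld", rec->boot_id, &rec->pid) == 2;
}

/*
 * A recorded pid is certainly stale if it was written under a different
 * boot_id.  If the boot_id matches, the pid may still be stale, because
 * pids are reused within a boot, so the caller has to check further.
 */
bool pid_must_be_stale(const struct pid_record *rec, const char *current_boot_id)
{
    return strcmp(rec->boot_id, current_boot_id) != 0;
}
```

With a per-pid-namespace boot_id, the same comparison would also work for pidfiles written inside a container.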

Eric


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                     ` <87r4qi6g6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-09-04  8:42                       ` Glauber Costa
       [not found]                         ` <5045BF05.9050707-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2012-09-04 14:44                       ` Serge Hallyn
  1 sibling, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-09-04  8:42 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 09/03/2012 11:48 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> 
>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>
>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>>
>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>>> container is started.
>>>>>
>>>>> There may be a good reason for this.  Most of the time, what I have seen
>>>>> of kernel requests from the direction of SystemD is that, while there may
>>>>> be a real problem, their imagined solution is usually not a
>>>>> particularly good solution.  So a description of the problem is needed.
>>>>>
>>>>> Justifying something with just "SystemD wants this" is a good way to get
>>>>> a nack.
>>>>
>>>> SystemD records log messages for all system services in its journal.
>>>> It can show you all log messages for the current service execution,
>>>> all log messages for a service since system boot, or all log messages
>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>>> the log messages per system boot. When we run systemd inside a container,
>>>> we want that grouping of log messages generated by services inside
>>>> the container to take account of the container boot, not the host boot.
>>>> Hence the desire to have the boot_id value reflect when a container is
>>>> booted.
>>>
>>> Since SystemD post-dates containers and since the logging feature is not
>>> currently in wide use, that use case is completely non-persuasive.
>>>
>>> So far this just sounds like a plain SystemD bug and something that can
>>> be easily changed at this point in time.
>>>
>>> It has been a long time, but my fuzzy memory says that the original
>>> boot_id justification was based on use cases that could not be solved
>>> any other way.
>>>
>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>>> that inspired the implementation of boot_id.  However, reading the
>>> current emacs source code, it appears emacs gave up before boot_id
>>> was implemented and stats /var/run/random-seed (which we seem to
>>> have removed) or looks in wtmp or utmp for the latest boot record.
>>>
>>> I did a quick grep through the binaries on my system and I could not
>>> find anything using /proc/sys/random/boot_id.
>>>
>>> That suggests to me that the proper solution is to actually just remove
>>> boot_id.
>>>
>>> Hmm.  And then there is another interesting detail: what should boot_id
>>> return after the processes have migrated from one system to another?
>>>
>>
>> Since this would be a per-boot id, this clearly has to be carried over
>> with migration, along with all the tons of data we already carry.
> 
> The twist, of course, is what a boot means.  If we are really after
> machine boots, then the current behavior is correct.
>
> Looking back in the archives, the desired behavior appears to be a value
> that can be used to see if a pid value must be stale.
>
> As a stale pid detector, boot_id is pretty lousy.  Pids can still be
> reused.
>
> Still, its role as a stale pid detector makes it clear which namespace
> boot_id should be in and how we should treat boot_id upon migration.
> 
> You can only serve as a stale pid detector if you are in the pid
> namespace.
> 
> So at this point patches are welcome.  Hopefully with a summary
> of the discussion.
> 
> Eric
> 

Your discussion about boot_id being a limited solution is totally valid.
But it is orthogonal to the question of whether or not a container
should have it.

I took a look at this, and I think the kernel should be in a perfect
position to do it. So far, FUSE is welcome for things that are really
ill-defined in the kernel, such as data coming from cgroups, which have
no concept of visibility.

boot_id as a pid namespace id is a very well-defined concept. We just
need an interface to set it up to make it stable across migration. Maybe
we can accept writes to this file as valid, provided the pid namespace
has only the init process.

Then any tool could clone, mount proc, set this id, and continue
normally. Any objections?


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                         ` <5045BF05.9050707-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-04  9:16                           ` Glauber Costa
       [not found]                             ` <5045C707.9020001-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2012-09-04  9:20                           ` Eric W. Biederman
  1 sibling, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-09-04  9:16 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 09/04/2012 12:42 PM, Glauber Costa wrote:
> boot_id as a pid namespace id is a very well-defined concept. We just
> need an interface to set it up to make it stable across migration. Maybe
> we can accept writes to this file as valid, provided the pid namespace
> has only the init process.
>
> Then any tool could clone, mount proc, set this id, and continue
> normally. Any objections?

OK, the above is totally jet-lag-induced garbage. I totally forgot this
is a sysctl interface.

We do per-netns sysctls just fine; why can't we do them here as well?


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                         ` <5045BF05.9050707-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2012-09-04  9:16                           ` Glauber Costa
@ 2012-09-04  9:20                           ` Eric W. Biederman
       [not found]                             ` <878vcq5ekx.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2012-09-04  9:20 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> The twist, of course, is what a boot means.  If we are really after
>> machine boots, then the current behavior is correct.
>>
>> Looking back in the archives, the desired behavior appears to be a value
>> that can be used to see if a pid value must be stale.
>>
>> As a stale pid detector, boot_id is pretty lousy.  Pids can still be
>> reused.
>>
>> Still, its role as a stale pid detector makes it clear which namespace
>> boot_id should be in and how we should treat boot_id upon migration.
>> 
>> You can only serve as a stale pid detector if you are in the pid
>> namespace.
>> 
>> So at this point patches are welcome.  Hopefully with a summary
>> of the discussion.
>> 
>> Eric
>> 
>
> Your discussion about boot_id being a limited solution is totally valid.
> But it is orthogonal to the question of whether or not a container
> should have it.

The important part is that boot_id as originally conceived is an
identifier to be used in conjunction with pids.  Therefore boot_id is a
proper pid_namespace component, and there are no semantic problems with
putting it in the kernel.

> I took a look at this, and I think the kernel should be in a perfect
> position to do it.

Agreed.

> boot_id as a pid namespace id is a very well-defined concept.

Agreed.

A reference to the history and the definition needs to be in the patch
description.

> We just
> need an interface to set it up to make it stable across migration. Maybe
> we can accept writes to this file as valid, provided the pid namespace
> has only the init process.

I think the easy solution is to take advantage of the fact that boot_id
is initialized lazily.  Don't allow writes if boot_id has already been
read.
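The "no writes after the first read" rule can be modelled in userspace as follows. This is an illustrative C sketch, not kernel code: struct pidns_boot_id, boot_id_read(), boot_id_write() and placeholder_uuid() are hypothetical, and returning -EBUSY for a late write is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-pid-namespace state; not the kernel's real structures. */
struct pidns_boot_id {
    char value[37];     /* 36-character UUID string plus NUL */
    bool initialized;   /* value has been generated or written */
    bool read_once;     /* someone has read it, so it is now frozen */
};

/* Source of fresh UUID strings; the kernel would format random bytes. */
typedef void (*uuid_fn)(char out[37]);

/* Deterministic placeholder generator, for illustration only. */
static void placeholder_uuid(char out[37])
{
    strcpy(out, "00000000-0000-4000-8000-000000000000");
}

/* Read path: generate lazily on first access, then freeze the value. */
const char *boot_id_read(struct pidns_boot_id *ns, uuid_fn gen)
{
    assert(ns != NULL && gen != NULL);
    if (!ns->initialized) {
        gen(ns->value);
        ns->initialized = true;
    }
    ns->read_once = true;
    return ns->value;
}

/*
 * Write path, e.g. for restoring a migrated container: only allowed
 * before anyone has read the value.  Returns 0 or a negative errno.
 */
int boot_id_write(struct pidns_boot_id *ns, const char *uuid)
{
    assert(ns != NULL && uuid != NULL);
    if (ns->read_once)
        return -EBUSY;
    if (strlen(uuid) != 36)
        return -EINVAL;
    memcpy(ns->value, uuid, 37);        /* copies the terminating NUL too */
    ns->initialized = true;
    return 0;
}
```

A restore tool would call the write path before anything in the new namespace has read boot_id; after the first read, the value stays fixed for the life of the namespace.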

> Then any tool could clone, mount proc, set this id, and continue
> normally. Any objections?

My biggest concern is that creating multiple pid namespaces might give
normal users a way to drain the entropy pool that we don't otherwise
allow them.

This is important to look at because, with a little luck, we will have
unprivileged creation of user namespaces and pid namespaces in the near
future.

Eric


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                             ` <5045C707.9020001-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-04  9:53                               ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2012-09-04  9:53 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 09/04/2012 12:42 PM, Glauber Costa wrote:
>> boot_id as a pid namespace id is a very well-defined concept. We just
>> need an interface to set it up to make it stable across migration. Maybe
>> we can accept writes to this file as valid, provided the pid namespace
>> has only the init process.
>>
>> Then any tool could clone, mount proc, set this id, and continue
>> normally. Any objections?
>
> OK, the above is totally jet-lag-induced garbage. I totally forgot this
> is a sysctl interface.
>
> We do per-netns sysctls just fine; why can't we do them here as well?

Yes. This is a sysctl.

The definition of boot_id is that it is for detecting stale pids.
So it should be per-pid-namespace, not per-netns.

The sysctl infrastructure supports per-pid-namespace sysctls as
easily as per-netns sysctls.

Well, almost as easily, as the glue code to write a register_pidns_sysctl
hasn't been written.  However, the existing hack of looking at
current works fine for the moment as well.
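The "look at current" approach can be illustrated with a simplified userspace C model, loosely patterned on how the IPC sysctls copy their table entry and point the copy at the caller's namespace. All types and functions here (struct pid_ns, struct task, struct ctl_entry, serve_entry(), pidns_boot_id_handler()) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Heavily simplified, hypothetical models of kernel objects. */
struct pid_ns {
    const char *boot_id;          /* per-namespace value */
};

struct task {
    struct pid_ns *pid_ns;        /* namespace the task's pid lives in */
};

struct ctl_entry {
    const char *procname;         /* e.g. "boot_id" */
    const char *data;             /* value served for this entry */
};

/* Generic handler: serves whatever the entry's data points at. */
static const char *serve_entry(const struct ctl_entry *entry)
{
    return entry->data;
}

/*
 * One globally registered entry; on each access, copy it and point the
 * copy's data at the pid namespace of the calling task ("current").
 */
const char *pidns_boot_id_handler(const struct ctl_entry *table,
                                  const struct task *current)
{
    assert(table != NULL && current != NULL && current->pid_ns != NULL);
    struct ctl_entry local = *table;      /* never modify the shared entry */
    local.data = current->pid_ns->boot_id;
    return serve_entry(&local);
}
```

Tasks in the same pid namespace see the same value, tasks in different namespaces see different ones, and the shared entry is never modified.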

Ultimately I want to get us to /proc/<pid>/sys/ so we can look at
each process's sysctls and tweak them.  But that isn't this week's
project.

Eric


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                             ` <878vcq5ekx.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-09-04 12:08                               ` Daniel P. Berrange
  2012-09-04 15:28                               ` Serge Hallyn
  1 sibling, 0 replies; 26+ messages in thread
From: Daniel P. Berrange @ 2012-09-04 12:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Sep 04, 2012 at 02:20:46AM -0700, Eric W. Biederman wrote:
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> >> The twist, of course, is what a boot means.  If we are really after
> >> machine boots, then the current behavior is correct.
> >>
> >> Looking back in the archives, the desired behavior appears to be a value
> >> that can be used to see if a pid value must be stale.
> >>
> >> As a stale pid detector, boot_id is pretty lousy.  Pids can still be
> >> reused.
> >>
> >> Still, its role as a stale pid detector makes it clear which namespace
> >> boot_id should be in and how we should treat boot_id upon migration.
> >> 
> >> You can only serve as a stale pid detector if you are in the pid
> >> namespace.
> >> 
> >> So at this point patches are welcome.  Hopefully with a summary
> >> of the discussion.
> >
> > Your discussion about boot_id being a limited solution is totally valid.
> > But it is orthogonal to the question of whether or not a container
> > should have it.
> 
> The important part is that boot_id as originally conceived is an
> identifier to be used in conjunction with pids.  Therefore boot_id is a
> proper pid_namespace component, and there are no semantic problems with
> putting it in the kernel.


> > boot_id as a pid namespace id is a very well-defined concept.
> 
> Agreed.
> 
> A reference to the history and the definition needs to be in the patch
> description.

> > Then any tool could clone, mount proc, set this id, and continue
> > normally. Any objections?
>
> My biggest concern is that creating multiple pid namespaces might give
> normal users a way to drain the entropy pool that we don't otherwise
> allow them.
>
> This is important to look at because, with a little luck, we will have
> unprivileged creation of user namespaces and pid namespaces in the near
> future.

Unprivileged users can already ask the kernel to generate random UUIDs
on demand, e.g.:

  $ sysctl kernel.random.uuid
  kernel.random.uuid = 76199778-8f6d-4fae-a45d-2c4cc0bca62a
  $ sysctl kernel.random.uuid
  kernel.random.uuid = 00c7d637-f94d-4df6-82c3-34ad97dd782e
  ...etc...

So allocating a new random UUID for boot_id each time a pid namespace
is created does not appear to make the situation worse with respect to entropy
usage by unprivileged users.
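Each new boot_id would be just another random UUID of the kind shown above. A minimal C sketch of the formatting step, assuming the caller already holds 16 random bytes (format_uuid_v4() is a hypothetical name): it sets the RFC 4122 version-4 and variant bits, which is why the third group of both example values starts with 4 and the fourth group with one of 8, 9, a or b.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn 16 random bytes into a version-4 UUID string (8-4-4-4-12 hex digits).
 * Where the bytes come from, and what that costs the entropy pool, is left
 * to the caller.
 */
void format_uuid_v4(unsigned char bytes[16], char out[37])
{
    assert(bytes != NULL && out != NULL);
    bytes[6] = (unsigned char)((bytes[6] & 0x0F) | 0x40);  /* version 4 */
    bytes[8] = (unsigned char)((bytes[8] & 0x3F) | 0x80);  /* RFC 4122 variant */
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             bytes[0], bytes[1], bytes[2], bytes[3],
             bytes[4], bytes[5], bytes[6], bytes[7],
             bytes[8], bytes[9], bytes[10], bytes[11],
             bytes[12], bytes[13], bytes[14], bytes[15]);
    assert(strlen(out) == 36);
}
```

The expensive part is obtaining the 16 random bytes; the formatting itself is the same whether the UUID comes from kernel.random.uuid or from a new pid namespace.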


Regards,
Daniel


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]           ` <504461F1.1090400-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-04 14:42             ` Serge Hallyn
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Hallyn @ 2012-09-04 14:42 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 08/31/2012 05:25 PM, Serge Hallyn wrote:
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >>
> >>> One of the features that SystemD folks have asked us to fix in LXC, is
> >>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
> >>> container is started.
> >>
> >> There may be a good reason for this.  Most of the time, what I have seen
> >> of kernel requests from the direction of SystemD is that, while there may
> >> be a real problem, their imagined solution is usually not a
> >> particularly good solution.  So a description of the problem is needed.
> >>
> >> Justifying something with just "SystemD wants this" is a good way to get
> >> a nack.
> >>
> >>> The current semantics are that this file produces a new random UUID each
> >>> time the host OS is booted. Obviously each time we start a container now,
> >>> they just see the host's random boot_id, so from a container's POV this
> >>> does not change each time it starts.
> >>
> >> That is correct.  As I recall, the contract with boot_id is to provide
> >> a unique per-boot value to assist in dealing with boots, etc.  I seem
> >> to recall that emacs uses the combination of hostname+boot_id to help
> >> generate unique lock file names.
> >>
> >> I would definitely need a refresher on how boot_id is used in practice
> >> by applications other than SystemD before I could suggest a good design.
> >>
> >> There is also a question of uptime.
> >>
> >>> There seems to be general agreement that, aside from the PID directories,
> >>> changes to data in proc should be done by a FUSE filesystem overlay of
> >>> some kind.
> >>
> >> No.  I have yet to see a justification for using FUSE in containers on
> >> top of proc files.
> >>
> >> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
> >> instead of providing a proper mechanism to tell applications how
> >> parallel they can/should be.
> > 
> > Should we be talking about a new set of library functions to gather info
> > like available memory and cpus, etc?  The library functions could internally
> > take into account the per-host procfiles, cgroup info, and, as the need
> > arises, new sources of information.
> > 
> 
> I think this makes a lot of sense.

Cool.  I can't do it today, but it might be worth starting a list (on the wiki?)
of information userspace wants (uptime, cpuinfo, meminfo, etc.) so we can come
up with a reasonable API.
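As a sketch of one entry on such a list, container "uptime" could be derived from data the host already exposes. The C below assumes the library reads the host's /proc/uptime and the starttime field (field 22) of /proc/<pid>/stat for the container's init; parse_uptime() and container_uptime() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the first field (seconds since host boot) of a /proc/uptime line. */
int parse_uptime(const char *line, double *seconds)
{
    assert(line != NULL && seconds != NULL);
    return sscanf(line, "%lf", seconds) == 1 ? 0 : -1;
}

/*
 * Hypothetical "container uptime": host uptime minus the moment the
 * container's init started.  init_start_ticks is the starttime field
 * (field 22) of /proc/<init-pid>/stat, in clock ticks since host boot;
 * ticks_per_s would come from sysconf(_SC_CLK_TCK).
 */
double container_uptime(double host_uptime_s,
                        unsigned long long init_start_ticks, long ticks_per_s)
{
    assert(ticks_per_s > 0);
    double started_at = (double)init_start_ticks / (double)ticks_per_s;
    double up = host_uptime_s - started_at;
    return up > 0.0 ? up : 0.0;   /* clamp inconsistent input to zero */
}
```

With a 100 Hz tick, an init that started 50000 ticks after host boot, on a host that has been up 1000 seconds, gives a container uptime of 500 seconds. Because starttime is measured against the host's boot, this inherits the migration question raised earlier in the thread.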

-serge


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                     ` <87r4qi6g6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-09-04  8:42                       ` Glauber Costa
@ 2012-09-04 14:44                       ` Serge Hallyn
  2012-09-04 14:45                         ` Glauber Costa
  1 sibling, 1 reply; 26+ messages in thread
From: Serge Hallyn @ 2012-09-04 14:44 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> 
> > On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
> >> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >> 
> >>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
> >>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >>>>
> >>>>> One of the features that SystemD folks have asked us to fix in LXC, is
> >>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
> >>>>> container is started.
> >>>>
> >>>> There may be a good reason for this.  Most of the time, what I have seen
> >>>> of kernel requests from the direction of SystemD is that, while there may
> >>>> be a real problem, their imagined solution is usually not a
> >>>> particularly good solution.  So a description of the problem is needed.
> >>>>
> >>>> Justifying something with just "SystemD wants this" is a good way to get
> >>>> a nack.
> >>>
> >>> SystemD records log messages for all system services in its journal.
> >>> It can show you all log messages for the current service execution,
> >>> all log messages for a service since system boot, or all log messages
> >>> ever. The boot_id value is used as a unique tag to allow grouping of
> >>> the log messages per system boot. When we run systemd inside a container,
> >>> we want that grouping of log messages generated by services inside
> >>> the container to take account of the container boot, not the host boot.
> >>> Hence the desire to have the boot_id value reflect when a container is
> >>> booted.
> >>
> >> Since SystemD post-dates containers and since the logging feature is not
> >> currently in wide use, that use case is completely non-persuasive.
> >>
> >> So far this just sounds like a plain SystemD bug and something that can
> >> be easily changed at this point in time.
> >>
> >> It has been a long time, but my fuzzy memory says that the original
> >> boot_id justification was based on use cases that could not be solved
> >> any other way.
> >>
> >> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
> >> that inspired the implementation of boot_id.  However, reading the
> >> current emacs source code, it appears emacs gave up before boot_id
> >> was implemented and stats /var/run/random-seed (which we seem to
> >> have removed) or looks in wtmp or utmp for the latest boot record.
> >>
> >> I did a quick grep through the binaries on my system and I could not
> >> find anything using /proc/sys/random/boot_id.
> >>
> >> That suggests to me that the proper solution is to actually just remove
> >> boot_id.
> >>
> >> Hmm.  And then there is another interesting detail: what should boot_id
> >> return after the processes have migrated from one system to another?
> >> 
> >
> > Since this would be a per-boot id, this clearly has to be carried over
> > with migration, along with all the tons of data we already carry.
> 
> The twist, of course, is what a boot means.  If we are really after
> machine boots, then the current behavior is correct.
>
> Looking back in the archives, the desired behavior appears to be a value
> that can be used to see if a pid value must be stale.
>
> As a stale pid detector, boot_id is pretty lousy.  Pids can still be
> reused.
>
> Still, its role as a stale pid detector makes it clear which namespace
> boot_id should be in and how we should treat boot_id upon migration.
> 
> You can only serve as a stale pid detector if you are in the pid
> namespace.
> 
> So at this point patches are welcome.  Hopefully with a summary
> of the discussion.

I don't understand why this should be provided by the kernel, especially
given that we've proven that everyone really wants this to be per-container
as well.

So why not just have init, on startup, create a /run/boot_id file, perhaps
by sha1summing the time at which it started plus some nonce?
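Serge's userspace alternative can be sketched in a few lines of C. As an assumption on our part, this version swaps the suggested sha1 of the start time plus a nonce for 16 bytes read directly from /dev/urandom (hashing adds little when the input is already random), and it writes the result atomically through a temporary file and rename(). The function names and the plain 32-hex-digit format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fill buf with n bytes from /dev/urandom; returns 0 on success. */
static int read_random(unsigned char *buf, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}

/*
 * Write "id\n" to path atomically: write path.tmp first, then rename() it
 * over path, so readers see either the old file or the complete new one.
 */
static int write_boot_id_file(const char *path, const char *id)
{
    char tmp[4096];
    assert(path != NULL && id != NULL);
    if (strlen(path) + sizeof ".tmp" > sizeof tmp)
        return -1;
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", id) > 0;
    ok = (fclose(f) == 0) && ok;        /* always close; fail if either step failed */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Generate a fresh 128-bit id as 32 hex digits and store it at path. */
int create_boot_id(const char *path, char out[33])
{
    unsigned char raw[16];
    if (read_random(raw, sizeof raw) != 0)
        return -1;
    for (size_t i = 0; i < sizeof raw; i++)
        snprintf(out + 2 * i, 3, "%02x", raw[i]);
    return write_boot_id_file(path, out);
}
```

At startup, init would call something like create_boot_id("/run/boot_id", id) once; this only gives per-container values if each container has its own /run.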


* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
  2012-09-04 14:44                       ` Serge Hallyn
@ 2012-09-04 14:45                         ` Glauber Costa
       [not found]                           ` <50461421.7030305-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-09-04 14:45 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 09/04/2012 06:44 PM, Serge Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>>
>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>
>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>>>
>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>>>> container is started.
>>>>>>
>>>>>> There may be a good reason for this.  Most of the time what I have seen
>>>>>> of kernel requests from the direction of SystemD is that while there may
>>>>>> be a real problem but usually their imagined solution is not a
>>>>>> particularly good solution.  So a description of the problem is needed.
>>>>>>
>>>>>> Justifying something with just SystemD wants this is a good way to get
>>>>>> a nack.
>>>>>
>>>>> SystemD records log messages for all system services in their journal.
>>>>> They can show you all log messages for the current service execution,
>>>>> all log messages for a service since system boot, or all log messages
>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>>>> the log messages per system boot. When we run systemd inside a container
>>>>> we want to get that grouping of log messages generated by services inside
>>>>> the container, to take account of the container boot, not the host boot.
>>>>> Hence the desire to have the boot_id value reflect when a container is
>>>>> booted.
>>>>
>>>> Since SystemD post-dates containers and since the logging feature is not
>>>> currently in wide use that use case is completely non-persuasive.
>>>>
>>>> So far this just sounds like a plain SystemD bug and something that can
>>>> be easily changed at this point in time.
>>>>
>>>> It has been a long time but my fuzzy memory says that the original
>>>> boot_id justification was based on use cases that could not be solved
>>>> any other way.
>>>>
>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>>>> that inspired the implementation of boot_id.  However reading the
>>>> current emacs source code it appears emacs gave up before boot_id
>>>> was implemented and stats /var/run/random-seed (which we seem to
>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>>>>
>>>> I did a quick grep through the binaries on my system and I could not
>>>> find anything using /proc/sys/random/boot_id.
>>>>
>>>> That suggests to me that the proper solution is to actually just remove
>>>> boot_id.
>>>>
>>>> Hmm.  And then there is other interesting detail.  What should boot_id
>>>> return after the processes have migrated from one system to another.
>>>>
>>>
>>> Since this would be a per-boot id, this clearly has to be carried over
>>> with migration, along with all the tons of data we already carry.
>>
>> The twist of course is what does a boot mean.  If we are really after
>> machine boots then the current behavior is correct.
>>
>> Looking back in the archives the desired behavior appears to be a value
>> that can be used to see if a pid value must be stale.
>>
>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
>> reused.
>>
>> Still a role as a stale pid detector makes it clear which namespace
>> boot_id should be in and how we should treat boot_id upon migration.
>>
>> You can only serve as a stale pid detector if you are in the pid
>> namespace.
>>
>> So at this point patches are welcome.  Hopefully with a summary
>> of the discussion.
> 
> I don't understand why this should be provided by the kernel.  Especially
> given that we've proven that everyone really wants this to be per-container
> as well.
> 
> So why not just have init, on startup, create a /run/boot_id file, perhaps
> by sha1summing the time at which it started perhaps plus some nonce?
> 
Why shouldn't it be provided by the kernel? That is the real question.

The way I see it, every file we need to setup from the outside is a
hassle. Among many other things, it is just asking for duplication of
efforts among multiple userspaces.

netns does this for its proc files. The only reason we don't do it for
cgroups-driven file, is that the semantics is very ill-defined. For this
file, it doesn't seem to be the case.

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                           ` <50461421.7030305-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-04 15:25                             ` Serge Hallyn
  2012-09-04 15:31                               ` Glauber Costa
  0 siblings, 1 reply; 26+ messages in thread
From: Serge Hallyn @ 2012-09-04 15:25 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> >>
> >>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
> >>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >>>>
> >>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
> >>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >>>>>>
> >>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
> >>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
> >>>>>>> container is started.
> >>>>>>
> >>>>>> There may be a good reason for this.  Most of the time what I have seen
> >>>>>> of kernel requests from the direction of SystemD is that while there may
> >>>>>> be a real problem but usually their imagined solution is not a
> >>>>>> particularly good solution.  So a description of the problem is needed.
> >>>>>>
> >>>>>> Justifying something with just SystemD wants this is a good way to get
> >>>>>> a nack.
> >>>>>
> >>>>> SystemD records log messages for all system services in their journal.
> >>>>> They can show you all log messages for the current service execution,
> >>>>> all log messages for a service since system boot, or all log messages
> >>>>> ever. The boot_id value is used as a unique tag to allow grouping of
> >>>>> the log messages per system boot. When we run systemd inside a container
> >>>>> we want to get that grouping of log messages generated by services inside
> >>>>> the container, to take account of the container boot, not the host boot.
> >>>>> Hence the desire to have the boot_id value reflect when a container is
> >>>>> booted.
> >>>>
> >>>> Since SystemD post-dates containers and since the logging feature is not
> >>>> currently in wide use that use case is completely non-persuasive.
> >>>>
> >>>> So far this just sounds like a plain SystemD bug and something that can
> >>>> be easily changed at this point in time.
> >>>>
> >>>> It has been a long time but my fuzzy memory says that the original
> >>>> boot_id justification was based on use cases that could not be solved
> >>>> any other way.
> >>>>
> >>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
> >>>> that inspired the implementation of boot_id.  However reading the
> >>>> current emacs source code it appears emacs gave up before boot_id
> >>>> was implemented and stats /var/run/random-seed (which we seem to
> >>>> have removed) or looks in wtmp or utmp for the latest boot record.
> >>>>
> >>>> I did a quick grep through the binaries on my system and I could not
> >>>> find anything using /proc/sys/random/boot_id.
> >>>>
> >>>> That suggests to me that the proper solution is to actually just remove
> >>>> boot_id.
> >>>>
> >>>> Hmm.  And then there is other interesting detail.  What should boot_id
> >>>> return after the processes have migrated from one system to another.
> >>>>
> >>>
> >>> Since this would be a per-boot id, this clearly has to be carried over
> >>> with migration, along with all the tons of data we already carry.
> >>
> >> The twist of course is what does a boot mean.  If we are really after
> >> machine boots then the current behavior is correct.
> >>
> >> Looking back in the archives the desired behavior appears to be a value
> >> that can be used to see if a pid value must be stale.
> >>
> >> As a stale pid detector boot_id is pretty lousy.  Pids can still be
> >> reused.
> >>
> >> Still a role as a stale pid detector makes it clear which namespace
> >> boot_id should be in and how we should treat boot_id upon migration.
> >>
> >> You can only serve as a stale pid detector if you are in the pid
> >> namespace.
> >>
> >> So at this point patches are welcome.  Hopefully with a summary
> >> of the discussion.
> > 
> > I don't understand why this should be provided by the kernel.  Especially
> > given that we've proven that everyone really wants this to be per-container
> > as well.
> > 
> > So why not just have init, on startup, create a /run/boot_id file, perhaps
> > by sha1summing the time at which it started perhaps plus some nonce?
> > 
> Why shouldn't it be provided by the kernel? That is the real question.

Because it's not the right place.  The origin of this thread proves that
people want a per-init, not per-kernel, value.

> The way I see it, every file we need to setup from the outside is a
> hassle. Among many other things, it is just asking for duplication of
> efforts among multiple userspaces.
> 
> netns does this for its proc files. The only reason we don't do it for
> cgroups-driven file, is that the semantics is very ill-defined. For this
> file, it doesn't seem to be the case.

But it is the case.  How do you intend to have the kernel decide what
value to put in there for a process in a container, or in a chroot?

-serge

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                             ` <878vcq5ekx.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-09-04 12:08                               ` Daniel P. Berrange
@ 2012-09-04 15:28                               ` Serge Hallyn
  1 sibling, 0 replies; 26+ messages in thread
From: Serge Hallyn @ 2012-09-04 15:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> >> The twist of course is what does a boot mean.  If we are really after
> >> machine boots then the current behavior is correct.
> >> 
> >> Looking back in the archives the desired behavior appears to be a value
> >> that can be used to see if a pid value must be stale.
> >> 
> >> As a stale pid detector boot_id is pretty lousy.  Pids can still be
> >> reused.
> >> 
> >> Still a role as a stale pid detector makes it clear which namespace
> >> boot_id should be in and how we should treat boot_id upon migration.
> >> 
> >> You can only serve as a stale pid detector if you are in the pid
> >> namespace.
> >> 
> >> So at this point patches are welcome.  Hopefully with a summary
> >> of the discussion.
> >> 
> >> Eric
> >> 
> >
> > Your discussion about boot_id being a limited solution is totally valid.
> > But it is orthogonal to the question of whether or not a container
> > should have it.
> 
> The important part is that boot_id as originally conceived is an
> identifier to be used in conjunction with pids.  Therefore boot_id is a
> proper pid_namespace component, and there are no semantic problems with
> putting it in the kernel.
> 
> > I took a look at this, and I think the kernel should be in perfect
> > position to do it.
> 
> Agreed.
> 
> > boot_id as a pid namespace id is a very well defined concept.
> 
> Agreed.
> 
> A reference to the history and the definition needs to be in the patch
> description.

Huh.  I must have glossed over that part of this thread.  If you both
agree to that, then I'm fine with it - I'll go read the history when the
new patch comes by.

thanks,
-serge

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
  2012-09-04 15:25                             ` Serge Hallyn
@ 2012-09-04 15:31                               ` Glauber Costa
       [not found]                                 ` <50461EBB.2050501-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-09-04 15:31 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 09/04/2012 07:25 PM, Serge Hallyn wrote:
> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>>>>
>>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>>>
>>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>>>>>
>>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>>>>>> container is started.
>>>>>>>>
>>>>>>>> There may be a good reason for this.  Most of the time what I have seen
>>>>>>>> of kernel requests from the direction of SystemD is that while there may
>>>>>>>> be a real problem but usually their imagined solution is not a
>>>>>>>> particularly good solution.  So a description of the problem is needed.
>>>>>>>>
>>>>>>>> Justifying something with just SystemD wants this is a good way to get
>>>>>>>> a nack.
>>>>>>>
>>>>>>> SystemD records log messages for all system services in their journal.
>>>>>>> They can show you all log messages for the current service execution,
>>>>>>> all log messages for a service since system boot, or all log messages
>>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>>>>>> the log messages per system boot. When we run systemd inside a container
>>>>>>> we want to get that grouping of log messages generated by services inside
>>>>>>> the container, to take account of the container boot, not the host boot.
>>>>>>> Hence the desire to have the boot_id value reflect when a container is
>>>>>>> booted.
>>>>>>
>>>>>> Since SystemD post-dates containers and since the logging feature is not
>>>>>> currently in wide use that use case is completely non-persuasive.
>>>>>>
>>>>>> So far this just sounds like a plain SystemD bug and something that can
>>>>>> be easily changed at this point in time.
>>>>>>
>>>>>> It has been a long time but my fuzzy memory says that the original
>>>>>> boot_id justification was based on use cases that could not be solved
>>>>>> any other way.
>>>>>>
>>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>>>>>> that inspired the implementation of boot_id.  However reading the
>>>>>> current emacs source code it appears emacs gave up before boot_id
>>>>>> was implemented and stats /var/run/random-seed (which we seem to
>>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>>>>>>
>>>>>> I did a quick grep through the binaries on my system and I could not
>>>>>> find anything using /proc/sys/random/boot_id.
>>>>>>
>>>>>> That suggests to me that the proper solution is to actually just remove
>>>>>> boot_id.
>>>>>>
>>>>>> Hmm.  And then there is other interesting detail.  What should boot_id
>>>>>> return after the processes have migrated from one system to another.
>>>>>>
>>>>>
>>>>> Since this would be a per-boot id, this clearly has to be carried over
>>>>> with migration, along with all the tons of data we already carry.
>>>>
>>>> The twist of course is what does a boot mean.  If we are really after
>>>> machine boots then the current behavior is correct.
>>>>
>>>> Looking back in the archives the desired behavior appears to be a value
>>>> that can be used to see if a pid value must be stale.
>>>>
>>>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
>>>> reused.
>>>>
>>>> Still a role as a stale pid detector makes it clear which namespace
>>>> boot_id should be in and how we should treat boot_id upon migration.
>>>>
>>>> You can only serve as a stale pid detector if you are in the pid
>>>> namespace.
>>>>
>>>> So at this point patches are welcome.  Hopefully with a summary
>>>> of the discussion.
>>>
>>> I don't understand why this should be provided by the kernel.  Especially
>>> given that we've proven that everyone really wants this to be per-container
>>> as well.
>>>
>>> So why not just have init, on startup, create a /run/boot_id file, perhaps
>>> by sha1summing the time at which it started perhaps plus some nonce?
>>>
>> Why shouldn't it be provided by the kernel? That is the real question.
> 
> Because it's not the right place.  The origin of this thread proves that
> people want a per-init, not per-kernel, value.
> 

Not all files provided by the kernel are "per-kernel". /proc/self is
full of per-namespace stuff.

>> The way I see it, every file we need to setup from the outside is a
>> hassle. Among many other things, it is just asking for duplication of
>> efforts among multiple userspaces.
>>
>> netns does this for its proc files. The only reason we don't do it for
>> cgroups-driven file, is that the semantics is very ill-defined. For this
>> file, it doesn't seem to be the case.
> 
> But it is the case.  How do you intend to have the kernel decide what
> value to put in there for a process in a container, or in a chroot?
> 

one value per pidns.
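
A toy model of "one value per pidns", assuming Python. The class names are entirely hypothetical, and this is not kernel code. Each pid namespace draws a random id when it is created, and a read of boot_id resolves against the reader's own namespace. Restarting a container therefore yields a fresh value, while the host's value stays put.

```python
import uuid
from typing import Optional


class PidNamespace:
    """Toy model, not kernel code: one boot_id per pid namespace."""

    def __init__(self, parent: Optional["PidNamespace"] = None) -> None:
        self.parent = parent
        # Drawn once, when the namespace (e.g. a container's init) is created.
        self.boot_id = str(uuid.uuid4())


class Task:
    def __init__(self, pid_ns: PidNamespace) -> None:
        self.pid_ns = pid_ns

    def read_boot_id(self) -> str:
        # Reads resolve against the reader's own pid namespace.
        return self.pid_ns.boot_id


# Usage: host and container see different values; "restarting" the
# container means creating a new pid namespace, which gets a new id.
host = PidNamespace()
container = PidNamespace(parent=host)
first_run = Task(container).read_boot_id()
container = PidNamespace(parent=host)
second_run = Task(container).read_boot_id()
```

This also gives a natural answer to the migration question raised earlier: the id travels with the namespace object rather than with the machine.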

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                                 ` <50461EBB.2050501-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-04 17:18                                   ` Serge E. Hallyn
       [not found]                                     ` <20120904171818.GA5334-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Serge E. Hallyn @ 2012-09-04 17:18 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 09/04/2012 07:25 PM, Serge Hallyn wrote:
> > Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> >> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
> >>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >>>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> >>>>
> >>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
> >>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >>>>>>
> >>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
> >>>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> >>>>>>>>
> >>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
> >>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
> >>>>>>>>> container is started.
> >>>>>>>>
> >>>>>>>> There may be a good reason for this.  Most of the time what I have seen
> >>>>>>>> of kernel requests from the direction of SystemD is that while there may
> >>>>>>>> be a real problem but usually their imagined solution is not a
> >>>>>>>> particularly good solution.  So a description of the problem is needed.
> >>>>>>>>
> >>>>>>>> Justifying something with just SystemD wants this is a good way to get
> >>>>>>>> a nack.
> >>>>>>>
> >>>>>>> SystemD records log messages for all system services in their journal.
> >>>>>>> They can show you all log messages for the current service execution,
> >>>>>>> all log messages for a service since system boot, or all log messages
> >>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
> >>>>>>> the log messages per system boot. When we run systemd inside a container
> >>>>>>> we want to get that grouping of log messages generated by services inside
> >>>>>>> the container, to take account of the container boot, not the host boot.
> >>>>>>> Hence the desire to have the boot_id value reflect when a container is
> >>>>>>> booted.
> >>>>>>
> >>>>>> Since SystemD post-dates containers and since the logging feature is not
> >>>>>> currently in wide use that use case is completely non-persuasive.
> >>>>>>
> >>>>>> So far this just sounds like a plain SystemD bug and something that can
> >>>>>> be easily changed at this point in time.
> >>>>>>
> >>>>>> It has been a long time but my fuzzy memory says that the original
> >>>>>> boot_id justification was based on use cases that could not be solved
> >>>>>> any other way.
> >>>>>>
> >>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
> >>>>>> that inspired the implementation of boot_id.  However reading the
> >>>>>> current emacs source code it appears emacs gave up before boot_id
> >>>>>> was implemented and stats /var/run/random-seed (which we seem to
> >>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
> >>>>>>
> >>>>>> I did a quick grep through the binaries on my system and I could not
> >>>>>> find anything using /proc/sys/random/boot_id.
> >>>>>>
> >>>>>> That suggests to me that the proper solution is to actually just remove
> >>>>>> boot_id.
> >>>>>>
> >>>>>> Hmm.  And then there is other interesting detail.  What should boot_id
> >>>>>> return after the processes have migrated from one system to another.
> >>>>>>
> >>>>>
> >>>>> Since this would be a per-boot id, this clearly has to be carried over
> >>>>> with migration, along with all the tons of data we already carry.
> >>>>
> >>>> The twist of course is what does a boot mean.  If we are really after
> >>>> machine boots then the current behavior is correct.
> >>>>
> >>>> Looking back in the archives the desired behavior appears to be a value
> >>>> that can be used to see if a pid value must be stale.
> >>>>
> >>>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
> >>>> reused.
> >>>>
> >>>> Still a role as a stale pid detector makes it clear which namespace
> >>>> boot_id should be in and how we should treat boot_id upon migration.
> >>>>
> >>>> You can only serve as a stale pid detector if you are in the pid
> >>>> namespace.
> >>>>
> >>>> So at this point patches are welcome.  Hopefully with a summary
> >>>> of the discussion.
> >>>
> >>> I don't understand why this should be provided by the kernel.  Especially
> >>> given that we've proven that everyone really wants this to be per-container
> >>> as well.
> >>>
> >>> So why not just have init, on startup, create a /run/boot_id file, perhaps
> >>> by sha1summing the time at which it started perhaps plus some nonce?
> >>>
> >> Why shouldn't it be provided by the kernel? That is the real question.
> > 
> > Because it's not the right place.  The origin of this thread proves that
> > people want a per-init, not per-kernel, value.
> > 
> 
> Not all files provided by the kernel are "per-kernel". /proc/self is
> full of per-namespace stuff.
> 
> >> The way I see it, every file we need to setup from the outside is a
> >> hassle. Among many other things, it is just asking for duplication of
> >> efforts among multiple userspaces.
> >>
> >> netns does this for its proc files. The only reason we don't do it for
> >> cgroups-driven file, is that the semantics is very ill-defined. For this
> >> file, it doesn't seem to be the case.
> > 
> > But it is the case.  How do you intend to have the kernel decide what
> > value to put in there for a process in a container, or in a chroot?
> > 
> 
> one value per pidns.

ok.  (So should it be called /proc/pidns_uuid?  Well, whatever.  No
objection from me - thanks.)

-serge

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                                     ` <20120904171818.GA5334-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2012-09-04 19:46                                       ` Eric W. Biederman
       [not found]                                         ` <87vcft1shu.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-09-05  7:59                                       ` Glauber Costa
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2012-09-04 19:46 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> On 09/04/2012 07:25 PM, Serge Hallyn wrote:
>> > Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> >> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
>> >>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >>>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> >>>>
>> >>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>> >>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> >>>>>>
>> >>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>> >>>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> >>>>>>>>
>> >>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>> >>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> >>>>>>>>> container is started.
>> >>>>>>>>
>> >>>>>>>> There may be a good reason for this.  Most of the time what I have seen
>> >>>>>>>> of kernel requests from the direction of SystemD is that while there may
>> >>>>>>>> be a real problem but usually their imagined solution is not a
>> >>>>>>>> particularly good solution.  So a description of the problem is needed.
>> >>>>>>>>
>> >>>>>>>> Justifying something with just SystemD wants this is a good way to get
>> >>>>>>>> a nack.
>> >>>>>>>
>> >>>>>>> SystemD records log messages for all system services in their journal.
>> >>>>>>> They can show you all log messages for the current service execution,
>> >>>>>>> all log messages for a service since system boot, or all log messages
>> >>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>> >>>>>>> the log messages per system boot. When we run systemd inside a container
>> >>>>>>> we want to get that grouping of log messages generated by services inside
>> >>>>>>> the container, to take account of the container boot, not the host boot.
>> >>>>>>> Hence the desire to have the boot_id value reflect when a container is
>> >>>>>>> booted.
>> >>>>>>
>> >>>>>> Since SystemD post-dates containers and since the logging feature is not
>> >>>>>> currently in wide use that use case is completely non-persuasive.
>> >>>>>>
>> >>>>>> So far this just sounds like a plain SystemD bug and something that can
>> >>>>>> be easily changed at this point in time.
>> >>>>>>
>> >>>>>> It has been a long time but my fuzzy memory says that the original
>> >>>>>> boot_id justification was based on use cases that could not be solved
>> >>>>>> any other way.
>> >>>>>>
>> >>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>> >>>>>> that inspired the implementation of boot_id.  However reading the
>> >>>>>> current emacs source code it appears emacs gave up before boot_id
>> >>>>>> was implemented and stats /var/run/random-seed (which we seem to
>> >>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>> >>>>>>
>> >>>>>> I did a quick grep through the binaries on my system and I could not
>> >>>>>> find anything using /proc/sys/random/boot_id.
>> >>>>>>
>> >>>>>> That suggests to me that the proper solution is to actually just remove
>> >>>>>> boot_id.
>> >>>>>>
>> >>>>>> Hmm.  And then there is other interesting detail.  What should boot_id
>> >>>>>> return after the processes have migrated from one system to another.
>> >>>>>>
>> >>>>>
>> >>>>> Since this would be a per-boot id, this clearly has to be carried over
>> >>>>> with migration, along with all the tons of data we already carry.
>> >>>>
>> >>>> The twist of course is what does a boot mean.  If we are really after
>> >>>> machine boots then the current behavior is correct.
>> >>>>
>> >>>> Looking back in the archives the desired behavior appears to be a value
>> >>>> that can be used to see if a pid value must be stale.
>> >>>>
>> >>>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
>> >>>> reused.
>> >>>>
>> >>>> Still a role as a stale pid detector makes it clear which namespace
>> >>>> boot_id should be in and how we should treat boot_id upon migration.
>> >>>>
>> >>>> You can only serve as a stale pid detector if you are in the pid
>> >>>> namespace.
>> >>>>
>> >>>> So at this point patches are welcome.  Hopefully with a summary
>> >>>> of the discussion.
>> >>>
>> >>> I don't understand why this should be provided by the kernel.  Especially
>> >>> given that we've proven that everyone really wants this to be per-container
>> >>> as well.
>> >>>
>> >>> So why not just have init, on startup, create a /run/boot_id file, perhaps
>> >>> by sha1summing the time at which it started perhaps plus some nonce?
>> >>>
>> >> Why shouldn't it be provided by the kernel? That is the real question.
>> > 
>> > Because it's not the right place.  The origin of this thread proves that
>> > people want a per-init, not per-kernel, value.
>> > 
>> 
>> Not all files provided by the kernel are "per-kernel". /proc/self is
>> full of per-namespace stuff.
>> 
>> >> The way I see it, every file we need to setup from the outside is a
>> >> hassle. Among many other things, it is just asking for duplication of
>> >> efforts among multiple userspaces.
>> >>
>> >> netns does this for its proc files. The only reason we don't do it for
>> >> cgroups-driven file, is that the semantics is very ill-defined. For this
>> >> file, it doesn't seem to be the case.
>> > 
>> > But it is the case.  How do you intend to have the kernel decide what
>> > value to put in there for a process in a container, or in a chroot?
>> > 
>> 
>> one value per pidns.
>
> ok.  (So should it be called /proc/pidns_uuid?  Well, whatever.  No
> objection from me - thanks.)

/proc/sys/kernel/boot_id.

Someday we will get the plumbing right in the kernel so that can be
/proc/sys -> /proc/self/sys and /proc/self/sys/kernel/boot_id

The origin of boot_id was so that emacs could implement distributed
locking in userspace by creating a symlink from .#filename to 
user-WI0L6dQK/Vr7saj2s7cPmQ@public.gmane.org:boot_id.
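
A rough sketch of the symlink-lock idea, assuming Python. The target format `user@host.pid:boot_id` and the helper names are hypothetical and chosen only for illustration; they are not a reproduction of what emacs actually writes. The approach relies on two properties. First, a symlink may dangle, so its target string can carry arbitrary owner data. Second, creating the symlink fails if the lock already exists, which makes creation an atomic test-and-set.

```python
import os
from typing import Tuple


def lock_path(filename: str) -> str:
    # The lock lives next to the file as ".#<basename>".
    directory, base = os.path.split(filename)
    return os.path.join(directory, ".#" + base)


def try_lock(filename: str, user: str, host: str, pid: int, boot_id: str) -> bool:
    """Atomically create the lock symlink; return False if it already exists."""
    # Hypothetical target format; the link is never followed, only read.
    target = f"{user}@{host}.{pid}:{boot_id}"
    try:
        os.symlink(target, lock_path(filename))
        return True
    except FileExistsError:
        return False


def lock_owner(filename: str) -> Tuple[str, int, str]:
    """Parse (user@host, pid, boot_id) back out of the lock target."""
    target = os.readlink(lock_path(filename))
    owner_pid, boot_id = target.rsplit(":", 1)
    owner, pid = owner_pid.rsplit(".", 1)
    return owner, int(pid), boot_id
```

A second editor that finds the lock can compare the recorded boot_id with the current one to decide whether the holder could still be alive.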

Ultimately emacs opted to just stat /var/run/random-seed or to grovel
through utmp or wtmp to find the last boot record.

Of course /var/run/random-seed is now named something like
/var/lib/urandom/random-seed as distributions continue their relentless
pursuit to break userspace.

But ultimately boot_id was defined as something you can use to detect
stale pids and stale lockfiles.  Since the original definition was
a uuid to detect stale pids, that seems a reasonable justification
for keeping it in the pid_namespace.  Boot_id isn't the best name in
that case but shrug.
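
A minimal sketch of using boot_id as a stale-pid detector, assuming Python on Linux. The function names are hypothetical. A mismatched boot_id proves that a recorded pid is stale. A matching boot_id only shows that some process currently holds that pid, which is a weak guarantee because, as noted earlier in the thread, pids can be reused.

```python
import os


def read_boot_id(path: str = "/proc/sys/kernel/random/boot_id") -> str:
    with open(path) as f:
        return f.read().strip()


def pid_is_stale(recorded_boot_id: str, recorded_pid: int, current_boot_id: str) -> bool:
    """Return True if a (boot_id, pid) pair recorded earlier is certainly stale."""
    if recorded_boot_id != current_boot_id:
        # Recorded under a different boot (or, under the per-pid-namespace
        # proposal, a different namespace instance).
        return True
    if recorded_pid <= 0:
        return True  # not a real process id
    try:
        os.kill(recorded_pid, 0)  # signal 0: existence check only, nothing sent
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # process exists but belongs to another user
    # Some process has this pid now, but it may be a reused pid, so False
    # means "possibly live", not "definitely the same process".
    return False
```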

Eric

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                                     ` <20120904171818.GA5334-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  2012-09-04 19:46                                       ` Eric W. Biederman
@ 2012-09-05  7:59                                       ` Glauber Costa
  1 sibling, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-09-05  7:59 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 09/04/2012 09:18 PM, Serge E. Hallyn wrote:
> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> On 09/04/2012 07:25 PM, Serge Hallyn wrote:
>>> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>>>> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
>>>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>>>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>>>>>>
>>>>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>>>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>>>>>
>>>>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>>>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>>>>>>>>>
>>>>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>>>>>>>> container is started.
>>>>>>>>>>
>>>>>>>>>> There may be a good reason for this.  Most of the time what I have seen
>>>>>>>>>> of kernel requests from the direction of SystemD is that while there may
>>>>>>>>>> be a real problem, their imagined solution is usually not a
>>>>>>>>>> particularly good solution.  So a description of the problem is needed.
>>>>>>>>>>
>>>>>>>>>> Justifying something with just SystemD wants this is a good way to get
>>>>>>>>>> a nack.
>>>>>>>>>
>>>>>>>>> SystemD records log messages for all system services in their journal.
>>>>>>>>> They can show you all log messages for the current service execution,
>>>>>>>>> all log messages for a service since system boot, or all log messages
>>>>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>>>>>>>> the log messages per system boot. When we run systemd inside a container
>>>>>>>>> we want to get that grouping of log messages generated by services inside
>>>>>>>>> the container, to take account of the container boot, not the host boot.
>>>>>>>>> Hence the desire to have the boot_id value reflect when a container is
>>>>>>>>> booted.
>>>>>>>>
>>>>>>>> Since SystemD post-dates containers and since the logging feature is not
>>>>>>>> currently in wide use that use case is completely non-persuasive.
>>>>>>>>
>>>>>>>> So far this just sounds like a plain SystemD bug and something that can
>>>>>>>> be easily changed at this point in time.
>>>>>>>>
>>>>>>>> It has been a long time, but my fuzzy memory says that the original
>>>>>>>> boot_id justification was based on use cases that could not be solved
>>>>>>>> any other way.
>>>>>>>>
>>>>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>>>>>>>> that inspired the implementation of boot_id.  However reading the
>>>>>>>> current emacs source code it appears emacs gave up before boot_id
>>>>>>>> was implemented and stats /var/run/random-seed (which we seem to
>>>>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>>>>>>>>
>>>>>>>> I did a quick grep through the binaries on my system and I could not
>>>>>>>> find anything using /proc/sys/kernel/random/boot_id.
>>>>>>>>
>>>>>>>> That suggests to me that the proper solution is to actually just remove
>>>>>>>> boot_id.
>>>>>>>>
>>>>>>>> Hmm.  And then there is other interesting detail.  What should boot_id
>>>>>>>> return after the processes have migrated from one system to another.
>>>>>>>>
>>>>>>>
>>>>>>> Since this would be a per-boot id, this clearly has to be carried over
>>>>>>> with migration, along with all the tons of data we already carry.
>>>>>>
>>>>>> The twist of course is what does a boot mean.  If we are really after
>>>>>> machine boots, then the current behavior is correct.
>>>>>>
>>>>>> Looking back in the archives the desired behavior appears to be a value
>>>>>> that can be used to see if a pid value must be stale.
>>>>>>
>>>>>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
>>>>>> reused.
>>>>>>
>>>>>> Still a role as a stale pid detector makes it clear which namespace
>>>>>> boot_id should be in and how we should treat boot_id upon migration.
>>>>>>
>>>>>> You can only serve as a stale pid detector if you are in the pid
>>>>>> namespace.
>>>>>>
>>>>>> So at this point patches are welcome.  Hopefully with a summary
>>>>>> of the discussion.
>>>>>
>>>>> I don't understand why this should be provided by the kernel.  Especially
>>>>> given that we've proven that everyone really wants this to be per-container
>>>>> as well.
>>>>>
>>>>> So why not just have init, on startup, create a /run/boot_id file, perhaps
>>>>> by sha1summing the time at which it started perhaps plus some nonce?
>>>>>
>>>> "Why shouldn't it be provided by the kernel?" is the real question.
>>>
>>> Because it's not the right place.  The origin of this thread proves that
>>> people want a per-init, not per-kernel, value.
>>>
>>
>> Not all files provided by the kernel are "per-kernel". /proc/self is
>> full of per-namespace stuff.
>>
>>>> The way I see it, every file we need to setup from the outside is a
>>>> hassle. Among many other things, it is just asking for duplication of
>>>> efforts among multiple userspaces.
>>>>
>>>> netns does this for its proc files. The only reason we don't do it for
>>>> cgroups-driven file, is that the semantics is very ill-defined. For this
>>>> file, it doesn't seem to be the case.
>>>
>>> But it is the case.  How do you intend to have the kernel decide what
>>> value to put in there for a process in a container, or in a chroot?
>>>
>>
>> one value per pidns.
> 
> ok.  (So should it be called /proc/pidns_uuid?  Well, whatever.  No
> objection from me - thanks.)
> 
> -serge
> 
For completeness, I believe it should stay where it lives today, but
become a symlink to /proc/self/boot_id.

That is consistent with what we already do for other values like this.

* Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
       [not found]                                         ` <87vcft1shu.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-09-05 12:10                                           ` Daniel P. Berrange
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrange @ 2012-09-05 12:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Sep 04, 2012 at 12:46:05PM -0700, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> >> Not all files provided by the kernel are "per-kernel". /proc/self is
> >> full of per-namespace stuff.
> >> 
> >> >> The way I see it, every file we need to setup from the outside is a
> >> >> hassle. Among many other things, it is just asking for duplication of
> >> >> efforts among multiple userspaces.
> >> >>
> >> >> netns does this for its proc files. The only reason we don't do it for
> >> >> cgroups-driven file, is that the semantics is very ill-defined. For this
> >> >> file, it doesn't seem to be the case.
> >> > 
> >> > But it is the case.  How do you intend to have the kernel decide what
> >> > value to put in there for a process in a container, or in a chroot?
> >> > 
> >> 
> >> one value per pidns.
> >
> > ok.  (So should it be called /proc/pidns_uuid?  Well, whatever.  No
> > objection from me - thanks.)
> 
> /proc/sys/kernel/boot_id.
> 
> Someday we will get the plumbing right in the kernel so that can be
> /proc/sys -> /proc/self/sys and /proc/self/sys/kernel/boot_id
> 
> The origin of boot_id was so that emacs could implement distributed
> locking in userspace by creating a symlink from .#filename to 
> user-WI0L6dQK/Vr7saj2s7cPmQ@public.gmane.org:boot_id.
> 
> Ultimately emacs opted to just stat /var/run/random-seed or to grovel
> through utmp or wtmp to find the last boot record.
> 
> Of course /var/run/random-seed is now named something like
> /var/lib/urandom/random-seed as distributions continue their relentless
> pursuit to break userspace.
> 
> But ultimately boot_id was defined as something you can use to detect
> stale pids and stale lockfiles.  Since the original definition was
> a uuid to detect stale pids, that seems a reasonable justification
> for keeping it in the pid_namespace.  Boot_id isn't the best name in
> that case but shrug.

Ok, so reading through this thread, my understanding is that any patch
for this needs to work as follows:

 - Associate '/proc/sys/kernel/random/boot_id' with the pid namespace

 - Allow boot_id to be written only if it has not yet been
   read in the current pid namespace (for the migration use case).

 - Lazily generate a UUID for boot_id on first read in the current pid
   namespace, unless a value has already been written.

 - Add a file to Documentation/ explaining the use case for the boot_id
   file and its semantics with respect to namespaces.
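The rules in the list above amount to a small state machine per pid namespace. Here is a minimal Python model of them. The names `PidnsBootId` and `BootIdAlreadyRead` are hypothetical, and the sketch makes no attempt at the locking that a real in-kernel version would presumably need for concurrent readers and writers.

```python
from __future__ import annotations

import uuid


class BootIdAlreadyRead(Exception):
    """Raised when a write arrives after the value has already been read."""


class PidnsBootId:
    """Hypothetical model of the proposed boot_id rules for one pid namespace."""

    def __init__(self) -> None:
        self._value: str | None = None
        self._read = False

    def read(self) -> str:
        # Lazily generate a UUID on first read, unless one was written first.
        if self._value is None:
            self._value = str(uuid.uuid4())
        self._read = True
        return self._value

    def write(self, value: str) -> None:
        # Writes are accepted only until the first read, so a migration tool
        # can restore the saved value before anything in the namespace sees it.
        if self._read:
            raise BootIdAlreadyRead("boot_id has already been read in this namespace")
        # uuid.UUID() rejects malformed input; str() normalizes the text form.
        self._value = str(uuid.UUID(value))
```

A migration tool would write the saved value into the new namespace before starting anything that might read it. Once any process has read the value, it stays fixed for the life of that namespace.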

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

Thread overview: 26+ messages
2012-08-30 21:18 Virtualizing /proc/sys/kernel/random/boot_id per container ? Daniel P. Berrange
     [not found] ` <20120830211832.GA3297-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-08-30 22:15   ` Eric W. Biederman
     [not found]     ` <878vcwjabu.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-08-30 22:50       ` Daniel P. Berrange
     [not found]         ` <20120830225002.GA9226-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-08-31  0:13           ` Eric W. Biederman
     [not found]             ` <87bohrhqai.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-09-03  7:56               ` Glauber Costa
     [not found]                 ` <5044629C.3030909-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-03 19:48                   ` Eric W. Biederman
     [not found]                     ` <87r4qi6g6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-09-04  8:42                       ` Glauber Costa
     [not found]                         ` <5045BF05.9050707-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-04  9:16                           ` Glauber Costa
     [not found]                             ` <5045C707.9020001-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-04  9:53                               ` Eric W. Biederman
2012-09-04  9:20                           ` Eric W. Biederman
     [not found]                             ` <878vcq5ekx.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-09-04 12:08                               ` Daniel P. Berrange
2012-09-04 15:28                               ` Serge Hallyn
2012-09-04 14:44                       ` Serge Hallyn
2012-09-04 14:45                         ` Glauber Costa
     [not found]                           ` <50461421.7030305-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-04 15:25                             ` Serge Hallyn
2012-09-04 15:31                               ` Glauber Costa
     [not found]                                 ` <50461EBB.2050501-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-04 17:18                                   ` Serge E. Hallyn
     [not found]                                     ` <20120904171818.GA5334-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-09-04 19:46                                       ` Eric W. Biederman
     [not found]                                         ` <87vcft1shu.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-09-05 12:10                                           ` Daniel P. Berrange
2012-09-05  7:59                                       ` Glauber Costa
2012-08-30 23:22       ` Daniel P. Berrange
     [not found]         ` <20120830232239.GE9226-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-08-31  0:18           ` Eric W. Biederman
2012-08-31 13:25       ` Serge Hallyn
2012-09-03  7:53         ` Glauber Costa
     [not found]           ` <504461F1.1090400-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-04 14:42             ` Serge Hallyn
2012-09-03  7:52       ` Glauber Costa