linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: clone3: allow creation of time namespace with offset
       [not found] <20200317083043.226593-1-areber@redhat.com>
@ 2020-03-17  9:40 ` Michael Kerrisk (man-pages)
  2020-03-17 14:23   ` Aleksa Sarai
       [not found] ` <CAK8P3a2-qQhpRdF0+iVrpp=vEvgwtndQL89CUm_QzoW2QYX1Jw@mail.gmail.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-03-17  9:40 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, Andrei Vagin, lkml, Mike Rapoport,
	Radostin Stoyanov, Arnd Bergmann, Cyrill Gorcunov,
	Thomas Gleixner, Linux API

[CC += linux-api; please CC on future versions]

On Tue, 17 Mar 2020 at 09:32, Adrian Reber <areber@redhat.com> wrote:
>
> This is an attempt to add time namespace support to clone3(). I am not
> really sure which way clone3() should handle time namespaces. The time
> namespace through /proc cannot be used with clone3() because the offsets
> for the time namespace need to be written before a process has been
> created in that time namespace. This means it is necessary to somehow
> tell clone3() the offsets for the clocks.
>
> The time namespace offers the possibility to set offsets for
> CLOCK_MONOTONIC and CLOCK_BOOTTIME. My first approach was to extend
> 'struct clone_args` with '__aligned_u64 monotonic_offset' and
> '__aligned_u64 boottime_offset'. The problem with this approach was that
> it was not possible to set nanoseconds for the clocks in the time
> namespace.
>
> One of the motivations for clone3() with CLONE_NEWTIME was to enable
> CRIU to restore a process in a time namespace with the corresponding
> offsets. And although the nanosecond value can probably never be
> restored to the same value it had during checkpointing, because the
> clock keeps on running between CRIU pausing all processes and CRIU
> actually reading the value of the clocks, the nanosecond value is still
> necessary for CRIU to not restore a process where the clock jumps back
> due to CRIU restoring it with a nanonsecond value that is too small.
>
> Requiring nanoseconds as well as seconds for two clocks during clone3()
> means that it would require 4 additional members to 'struct clone_args':
>
>         __aligned_u64 tls;
>         __aligned_u64 set_tid;
>         __aligned_u64 set_tid_size;
> +       __aligned_u64 boottime_offset_seconds;
> +       __aligned_u64 boottime_offset_nanoseconds;
> +       __aligned_u64 monotonic_offset_seconds;
> +       __aligned_u64 monotonic_offset_nanoseconds;
>  };
>
> To avoid four additional members to 'struct clone_args' this patchset
> uses another approach:
>
>         __aligned_u64 tls;
>         __aligned_u64 set_tid;
>         __aligned_u64 set_tid_size;
> +       __aligned_u64 timens_offset;
> +       __aligned_u64 timens_offset_size;
>  };
>
> timens_offset is a pointer to an array just as previously done with
> set_tid and timens_offset_size is the size of the array.
>
> The timens_offset array is expected to contain a struct like this:
>
> struct set_timens_offset {
>        int clockid;
>        struct timespec val;
> };
>
> This way it is possible to pass the information of multiple clocks with
> seconds and nanonseconds to clone3().
>
> To me this seems the better approach, but I am not totally convinced
> that it is the right thing. If there are other ideas how to pass two
> clock offsets with seconds and nanonseconds to clone3() I would be happy
> to hear other ideas.
>
>                 Adrian
>
>


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-17  9:40 ` clone3: allow creation of time namespace with offset Michael Kerrisk (man-pages)
@ 2020-03-17 14:23   ` Aleksa Sarai
  2020-03-17 16:09     ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Aleksa Sarai @ 2020-03-17 14:23 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Adrian Reber, Christian Brauner, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, Andrei Vagin, lkml, Mike Rapoport,
	Radostin Stoyanov, Arnd Bergmann, Cyrill Gorcunov,
	Thomas Gleixner, Linux API

[-- Attachment #1: Type: text/plain, Size: 2334 bytes --]

On 2020-03-17, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
> [CC += linux-api; please CC on future versions]
> 
> On Tue, 17 Mar 2020 at 09:32, Adrian Reber <areber@redhat.com> wrote:
> > Requiring nanoseconds as well as seconds for two clocks during clone3()
> > means that it would require 4 additional members to 'struct clone_args':
> >
> >         __aligned_u64 tls;
> >         __aligned_u64 set_tid;
> >         __aligned_u64 set_tid_size;
> > +       __aligned_u64 boottime_offset_seconds;
> > +       __aligned_u64 boottime_offset_nanoseconds;
> > +       __aligned_u64 monotonic_offset_seconds;
> > +       __aligned_u64 monotonic_offset_nanoseconds;
> >  };
> >
> > To avoid four additional members to 'struct clone_args' this patchset
> > uses another approach:
> >
> >         __aligned_u64 tls;
> >         __aligned_u64 set_tid;
> >         __aligned_u64 set_tid_size;
> > +       __aligned_u64 timens_offset;
> > +       __aligned_u64 timens_offset_size;
> >  };
> >
> > timens_offset is a pointer to an array just as previously done with
> > set_tid and timens_offset_size is the size of the array.
> >
> > The timens_offset array is expected to contain a struct like this:
> >
> > struct set_timens_offset {
> >        int clockid;
> >        struct timespec val;
> > };
> >
> > This way it is possible to pass the information of multiple clocks with
> > seconds and nanonseconds to clone3().
> >
> > To me this seems the better approach, but I am not totally convinced
> > that it is the right thing. If there are other ideas how to pass two
> > clock offsets with seconds and nanonseconds to clone3() I would be happy
> > to hear other ideas.

While I agree this does make the API cleaner, I am a little worried that
it risks killing some of the ideas we discussed for seccomp deep
inspection. In particular, having a pointer to variable-sized data
inside the struct means that now the cBPF program can't just be given a
copy of the struct data from userspace to check.

I'm sure it's a solveable problem (and it was one we were bound to run
into at some point), it'll just mean we'll need a more complicated way
of filtering such syscalls.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-17 14:23   ` Aleksa Sarai
@ 2020-03-17 16:09     ` Christian Brauner
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2020-03-17 16:09 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Michael Kerrisk (man-pages),
	Adrian Reber, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, lkml, Mike Rapoport,
	Radostin Stoyanov, Arnd Bergmann, Cyrill Gorcunov,
	Thomas Gleixner, Linux API

On Wed, Mar 18, 2020 at 01:23:50AM +1100, Aleksa Sarai wrote:
> On 2020-03-17, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
> > [CC += linux-api; please CC on future versions]
> > 
> > On Tue, 17 Mar 2020 at 09:32, Adrian Reber <areber@redhat.com> wrote:
> > > Requiring nanoseconds as well as seconds for two clocks during clone3()
> > > means that it would require 4 additional members to 'struct clone_args':
> > >
> > >         __aligned_u64 tls;
> > >         __aligned_u64 set_tid;
> > >         __aligned_u64 set_tid_size;
> > > +       __aligned_u64 boottime_offset_seconds;
> > > +       __aligned_u64 boottime_offset_nanoseconds;
> > > +       __aligned_u64 monotonic_offset_seconds;
> > > +       __aligned_u64 monotonic_offset_nanoseconds;
> > >  };
> > >
> > > To avoid four additional members to 'struct clone_args' this patchset
> > > uses another approach:
> > >
> > >         __aligned_u64 tls;
> > >         __aligned_u64 set_tid;
> > >         __aligned_u64 set_tid_size;
> > > +       __aligned_u64 timens_offset;
> > > +       __aligned_u64 timens_offset_size;
> > >  };
> > >
> > > timens_offset is a pointer to an array just as previously done with
> > > set_tid and timens_offset_size is the size of the array.
> > >
> > > The timens_offset array is expected to contain a struct like this:
> > >
> > > struct set_timens_offset {
> > >        int clockid;
> > >        struct timespec val;
> > > };
> > >
> > > This way it is possible to pass the information of multiple clocks with
> > > seconds and nanonseconds to clone3().
> > >
> > > To me this seems the better approach, but I am not totally convinced
> > > that it is the right thing. If there are other ideas how to pass two
> > > clock offsets with seconds and nanonseconds to clone3() I would be happy
> > > to hear other ideas.
> 
> While I agree this does make the API cleaner, I am a little worried that
> it risks killing some of the ideas we discussed for seccomp deep
> inspection. In particular, having a pointer to variable-sized data
> inside the struct means that now the cBPF program can't just be given a
> copy of the struct data from userspace to check.

I suggested two alternative approaches in a response to this. The
easiest one would be to simple assume that the struct doesn't change
size.
(But haven't we crossed that bridge with the set_tid array already?)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
       [not found] ` <CAK8P3a2-qQhpRdF0+iVrpp=vEvgwtndQL89CUm_QzoW2QYX1Jw@mail.gmail.com>
@ 2020-03-19  8:11   ` Adrian Reber
  2020-03-19  8:16     ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Adrian Reber @ 2020-03-19  8:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, Andrei Vagin, linux-kernel,
	Mike Rapoport, Radostin Stoyanov, Michael Kerrisk,
	Cyrill Gorcunov, Thomas Gleixner, Aleksa Sarai, Linux API

On Wed, Mar 18, 2020 at 11:18:53AM +0100, Arnd Bergmann wrote:
> On Tue, Mar 17, 2020 at 9:32 AM Adrian Reber <areber@redhat.com> wrote:
> >
> > This is an attempt to add time namespace support to clone3(). I am not
> > really sure which way clone3() should handle time namespaces. The time
> > namespace through /proc cannot be used with clone3() because the offsets
> > for the time namespace need to be written before a process has been
> > created in that time namespace. This means it is necessary to somehow
> > tell clone3() the offsets for the clocks.
> >
> > The time namespace offers the possibility to set offsets for
> > CLOCK_MONOTONIC and CLOCK_BOOTTIME. My first approach was to extend
> > 'struct clone_args` with '__aligned_u64 monotonic_offset' and
> > '__aligned_u64 boottime_offset'. The problem with this approach was that
> > it was not possible to set nanoseconds for the clocks in the time
> > namespace.
> >
> > One of the motivations for clone3() with CLONE_NEWTIME was to enable
> > CRIU to restore a process in a time namespace with the corresponding
> > offsets. And although the nanosecond value can probably never be
> > restored to the same value it had during checkpointing, because the
> > clock keeps on running between CRIU pausing all processes and CRIU
> > actually reading the value of the clocks, the nanosecond value is still
> > necessary for CRIU to not restore a process where the clock jumps back
> > due to CRIU restoring it with a nanonsecond value that is too small.
> >
> > Requiring nanoseconds as well as seconds for two clocks during clone3()
> > means that it would require 4 additional members to 'struct clone_args':
> >
> >         __aligned_u64 tls;
> >         __aligned_u64 set_tid;
> >         __aligned_u64 set_tid_size;
> > +       __aligned_u64 boottime_offset_seconds;
> > +       __aligned_u64 boottime_offset_nanoseconds;
> > +       __aligned_u64 monotonic_offset_seconds;
> > +       __aligned_u64 monotonic_offset_nanoseconds;
> >  };
> 
> Wouldn't it be sufficient to have the two nanosecond values, rather
> than both seconds and nanoseconds? With 64-bit nanoseconds
> you can represent several hundred years, and these would
> always start at zero during boot.

I like this. Just using nanoseconds will make it easier and should
indeed be enough.

> Regardless of this, I think you need a signed offset, not unsigned.

Right, that was just a quick test at some point.

Christian and I have also been discussing this a bit and Christian
prefers a pointer to a struct. Maybe something like this:

        __aligned_u64 tls;
        __aligned_u64 set_tid;
        __aligned_u64 set_tid_size;
+       __aligned_u64 timens_offset;
 };

With Arnd's idea of only using nanoseconds, timens_offset would then
contain something like this:

struct timens_offset {
	__aligned_s64 monotonic_offset_ns;
	__aligned_s64 boottime_offset_ns;
};

I kind of prefer adding boottime and monotonic directly to struct clone_args

        __aligned_u64 tls;
        __aligned_u64 set_tid;
        __aligned_u64 set_tid_size;
+       __aligned_s64 monotonic_offset_ns;
+       __aligned_s64 boottime_offset_ns;
 };

But setting the time namespace offset is probably something which does
not happen very often while using clone3(), so maybe the pointer to a
struct approach is better.

I will resend the patches using the pointer to a struct approach if
there are no other ideas how to do this.

		Adrian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-19  8:11   ` Adrian Reber
@ 2020-03-19  8:16     ` Arnd Bergmann
  2020-03-19 10:29       ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2020-03-19  8:16 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, Andrei Vagin, linux-kernel,
	Mike Rapoport, Radostin Stoyanov, Michael Kerrisk,
	Cyrill Gorcunov, Thomas Gleixner, Aleksa Sarai, Linux API

On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:

> With Arnd's idea of only using nanoseconds, timens_offset would then
> contain something like this:
>
> struct timens_offset {
>         __aligned_s64 monotonic_offset_ns;
>         __aligned_s64 boottime_offset_ns;
> };
>
> I kind of prefer adding boottime and monotonic directly to struct clone_args
>
>         __aligned_u64 tls;
>         __aligned_u64 set_tid;
>         __aligned_u64 set_tid_size;
> +       __aligned_s64 monotonic_offset_ns;
> +       __aligned_s64 boottime_offset_ns;
>  };

I would also prefer the second approach using two 64-bit integers
instead of a pointer, as it keeps the interface simpler to implement
and simpler to interpret by other tools.

      Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-19  8:16     ` Arnd Bergmann
@ 2020-03-19 10:29       ` Christian Brauner
  2020-03-20 18:33         ` Andrei Vagin
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2020-03-19 10:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Adrian Reber, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, linux-kernel, Mike Rapoport,
	Radostin Stoyanov, Michael Kerrisk, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> 
> > With Arnd's idea of only using nanoseconds, timens_offset would then
> > contain something like this:
> >
> > struct timens_offset {
> >         __aligned_s64 monotonic_offset_ns;
> >         __aligned_s64 boottime_offset_ns;
> > };
> >
> > I kind of prefer adding boottime and monotonic directly to struct clone_args
> >
> >         __aligned_u64 tls;
> >         __aligned_u64 set_tid;
> >         __aligned_u64 set_tid_size;
> > +       __aligned_s64 monotonic_offset_ns;
> > +       __aligned_s64 boottime_offset_ns;
> >  };
> 
> I would also prefer the second approach using two 64-bit integers
> instead of a pointer, as it keeps the interface simpler to implement
> and simpler to interpret by other tools.

Why I don't like has two reasons. There's the scenario where we have
added new extensions after the new boottime member and then we introduce
another offset. Then you'd be looking at:

__aligned_u64 tls;
__aligned_u64 set_tid;
__aligned_u64 set_tid_size;
+ __aligned_s64 monotonic_offset_ns;
+ __aligned_s64 boottime_offset_ns;
__aligned_s64 something_1
__aligned_s64 anything_2
+ __aligned_s64 sometime_offset_ns

which bothers me just by looking at it. That's in addition to adding two
new members to the struct when most people will never set CLONE_NEWTIME.
We'll also likely have more features in the future that will want to
pass down more info than we want to directly expose in struct
clone_args, e.g. for a long time I have been thinking about adding a
struct for CLONE_NEWUSER that allows you to specify the id mappings you
want the new user namespace to get. We surely don't want to force all
new info into the uppermost struct. So I'm not convinced we should here.

Christian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-19 10:29       ` Christian Brauner
@ 2020-03-20 18:33         ` Andrei Vagin
  2020-03-24 16:09           ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Andrei Vagin @ 2020-03-20 18:33 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Arnd Bergmann, Adrian Reber, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, linux-kernel, Mike Rapoport,
	Radostin Stoyanov, Michael Kerrisk, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> > 
> > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > contain something like this:
> > >
> > > struct timens_offset {
> > >         __aligned_s64 monotonic_offset_ns;
> > >         __aligned_s64 boottime_offset_ns;
> > > };
> > >
> > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > >
> > >         __aligned_u64 tls;
> > >         __aligned_u64 set_tid;
> > >         __aligned_u64 set_tid_size;
> > > +       __aligned_s64 monotonic_offset_ns;
> > > +       __aligned_s64 boottime_offset_ns;
> > >  };
> > 
> > I would also prefer the second approach using two 64-bit integers
> > instead of a pointer, as it keeps the interface simpler to implement
> > and simpler to interpret by other tools.
> 
> Why I don't like has two reasons. There's the scenario where we have
> added new extensions after the new boottime member and then we introduce
> another offset. Then you'd be looking at:
> 
> __aligned_u64 tls;
> __aligned_u64 set_tid;
> __aligned_u64 set_tid_size;
> + __aligned_s64 monotonic_offset_ns;
> + __aligned_s64 boottime_offset_ns;
> __aligned_s64 something_1
> __aligned_s64 anything_2
> + __aligned_s64 sometime_offset_ns
> 
> which bothers me just by looking at it. That's in addition to adding two
> new members to the struct when most people will never set CLONE_NEWTIME.
> We'll also likely have more features in the future that will want to
> pass down more info than we want to directly expose in struct
> clone_args, e.g. for a long time I have been thinking about adding a
> struct for CLONE_NEWUSER that allows you to specify the id mappings you
> want the new user namespace to get. We surely don't want to force all
> new info into the uppermost struct. So I'm not convinced we should here.

I think here we can start thinking about a netlink-like interface.

struct clone_args {
	....
	u64	attrs_offset;
}

struct clone_attr {
	u16 cla_len;
	u16 cla_type;
}


....

int parse_clone_attributes(struct kernel_clone_args *kargs, struct clone_args *args, size_t args_size)
{
	u64 off = args->attrs_offset;

	while (off < size) {
		struct clone_attr *attr;

		if (off + sizeof(struct clone_attr) uargs_size)
			return -EINVAL;

		attr = (struct clone_attr *) ((void *)args + off);

		if (attr->cla_type > CLONE_ATTR_TYPE_MAX)
			return -ENOSYS;

		kargs->attrs[attr->cla_type] = CLONE_ATTR_DATA(attr);
		off += CLONE_ATTR_LEN(attr);
	}

	return 0;
}

This interface doesn't suffer from problems what you enumerated before:

* clone_args contains only fields which are often used.
* per-feature attributes can be extended in a future without breaking
  backward compatibility.
* unused features don't affect clone3 argument size.
* seccomp-friendly (I am not 100% sure about this)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-20 18:33         ` Andrei Vagin
@ 2020-03-24 16:09           ` Christian Brauner
  2020-03-24 16:25             ` Adrian Reber
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2020-03-24 16:09 UTC (permalink / raw)
  To: Andrei Vagin, Arnd Bergmann, Adrian Reber
  Cc: Eric Biederman, Pavel Emelyanov, Oleg Nesterov, Dmitry Safonov,
	linux-kernel, Mike Rapoport, Radostin Stoyanov, Michael Kerrisk,
	Cyrill Gorcunov, Thomas Gleixner, Aleksa Sarai, Linux API

On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> > On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> > > 
> > > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > > contain something like this:
> > > >
> > > > struct timens_offset {
> > > >         __aligned_s64 monotonic_offset_ns;
> > > >         __aligned_s64 boottime_offset_ns;
> > > > };
> > > >
> > > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > > >
> > > >         __aligned_u64 tls;
> > > >         __aligned_u64 set_tid;
> > > >         __aligned_u64 set_tid_size;
> > > > +       __aligned_s64 monotonic_offset_ns;
> > > > +       __aligned_s64 boottime_offset_ns;
> > > >  };
> > > 
> > > I would also prefer the second approach using two 64-bit integers
> > > instead of a pointer, as it keeps the interface simpler to implement
> > > and simpler to interpret by other tools.
> > 
> > Why I don't like has two reasons. There's the scenario where we have
> > added new extensions after the new boottime member and then we introduce
> > another offset. Then you'd be looking at:
> > 
> > __aligned_u64 tls;
> > __aligned_u64 set_tid;
> > __aligned_u64 set_tid_size;
> > + __aligned_s64 monotonic_offset_ns;
> > + __aligned_s64 boottime_offset_ns;
> > __aligned_s64 something_1
> > __aligned_s64 anything_2
> > + __aligned_s64 sometime_offset_ns
> > 
> > which bothers me just by looking at it. That's in addition to adding two
> > new members to the struct when most people will never set CLONE_NEWTIME.
> > We'll also likely have more features in the future that will want to
> > pass down more info than we want to directly expose in struct
> > clone_args, e.g. for a long time I have been thinking about adding a
> > struct for CLONE_NEWUSER that allows you to specify the id mappings you
> > want the new user namespace to get. We surely don't want to force all
> > new info into the uppermost struct. So I'm not convinced we should here.
> 
> I think here we can start thinking about a netlink-like interface.

I think netlink is just not a great model for an API and I would not
want us to go down that route.

I kept thinking about this for a bit and I think that we will end up
growing more namespace-related functionality. So one thing that came to
my mind is the following layout:

struct {
	struct {
		__s64 monotonic;
		__s64 boot;
	} time;
} namespaces;

struct _clone_args {
	__aligned_u64 flags;
	__aligned_u64 pidfd;
	__aligned_u64 child_tid;
	__aligned_u64 parent_tid;
	__aligned_u64 exit_signal;
	__aligned_u64 stack;
	__aligned_u64 stack_size;
	__aligned_u64 tls;
	__aligned_u64 set_tid;
	__aligned_u64 set_tid_size;
	__aligned_u64 namespaces;
	__aligned_u64 namespaces_size;
};

Then when we end up adding id mapping support for CLONE_NEWUSER we can
extend this with:

struct {
	struct {
		__aligned_u64 monotonic;
		__aligned_u64 boot;
	} time;

	struct {
		/* id mapping members */
	} user;
} namespaces;

Thoughts? Other ideas?

Christian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-24 16:09           ` Christian Brauner
@ 2020-03-24 16:25             ` Adrian Reber
  2020-03-24 17:56               ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Adrian Reber @ 2020-03-24 16:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrei Vagin, Arnd Bergmann, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, linux-kernel, Mike Rapoport,
	Radostin Stoyanov, Michael Kerrisk, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
> On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> > On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> > > On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > > > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> > > > 
> > > > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > > > contain something like this:
> > > > >
> > > > > struct timens_offset {
> > > > >         __aligned_s64 monotonic_offset_ns;
> > > > >         __aligned_s64 boottime_offset_ns;
> > > > > };
> > > > >
> > > > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > > > >
> > > > >         __aligned_u64 tls;
> > > > >         __aligned_u64 set_tid;
> > > > >         __aligned_u64 set_tid_size;
> > > > > +       __aligned_s64 monotonic_offset_ns;
> > > > > +       __aligned_s64 boottime_offset_ns;
> > > > >  };
> > > > 
> > > > I would also prefer the second approach using two 64-bit integers
> > > > instead of a pointer, as it keeps the interface simpler to implement
> > > > and simpler to interpret by other tools.
> > > 
> > > Why I don't like has two reasons. There's the scenario where we have
> > > added new extensions after the new boottime member and then we introduce
> > > another offset. Then you'd be looking at:
> > > 
> > > __aligned_u64 tls;
> > > __aligned_u64 set_tid;
> > > __aligned_u64 set_tid_size;
> > > + __aligned_s64 monotonic_offset_ns;
> > > + __aligned_s64 boottime_offset_ns;
> > > __aligned_s64 something_1
> > > __aligned_s64 anything_2
> > > + __aligned_s64 sometime_offset_ns
> > > 
> > > which bothers me just by looking at it. That's in addition to adding two
> > > new members to the struct when most people will never set CLONE_NEWTIME.
> > > We'll also likely have more features in the future that will want to
> > > pass down more info than we want to directly expose in struct
> > > clone_args, e.g. for a long time I have been thinking about adding a
> > > struct for CLONE_NEWUSER that allows you to specify the id mappings you
> > > want the new user namespace to get. We surely don't want to force all
> > > new info into the uppermost struct. So I'm not convinced we should here.
> > 
> > I think here we can start thinking about a netlink-like interface.
> 
> I think netlink is just not a great model for an API and I would not
> want us to go down that route.
> 
> I kept thinking about this for a bit and I think that we will end up
> growing more namespace-related functionality. So one thing that came to
> my mind is the following layout:
> 
> struct {
> 	struct {
> 		__s64 monotonic;
> 		__s64 boot;
> 	} time;
> } namespaces;
> 
> struct _clone_args {
> 	__aligned_u64 flags;
> 	__aligned_u64 pidfd;
> 	__aligned_u64 child_tid;
> 	__aligned_u64 parent_tid;
> 	__aligned_u64 exit_signal;
> 	__aligned_u64 stack;
> 	__aligned_u64 stack_size;
> 	__aligned_u64 tls;
> 	__aligned_u64 set_tid;
> 	__aligned_u64 set_tid_size;
> 	__aligned_u64 namespaces;
> 	__aligned_u64 namespaces_size;
> };
> 
> Then when we end up adding id mapping support for CLONE_NEWUSER we can
> extend this with:
> 
> struct {
> 	struct {
> 		__aligned_u64 monotonic;
> 		__aligned_u64 boot;
> 	} time;
> 
> 	struct {
> 		/* id mapping members */
> 	} user;
> } namespaces;
> 
> Thoughts? Other ideas?

Works for me.

If we add the user namespace id mappings and then at some point a third
element for the time namespace appears it would also start to be mixed.
Just as you mentioned that a few mails ago.

> > > __aligned_u64 set_tid_size;
> > > + __aligned_s64 monotonic_offset_ns;
> > > + __aligned_s64 boottime_offset_ns;
> > > __aligned_s64 something_1
> > > __aligned_s64 anything_2
> > > + __aligned_s64 sometime_offset_ns

If we can live with something like this in the namespaces struct you
proposed, it works for me.

		Adrian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-24 16:25             ` Adrian Reber
@ 2020-03-24 17:56               ` Christian Brauner
  2020-03-25  7:58                 ` Adrian Reber
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2020-03-24 17:56 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Andrei Vagin, Arnd Bergmann, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, linux-kernel, Mike Rapoport,
	Radostin Stoyanov, Michael Kerrisk, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
> On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
> > On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> > > On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> > > > On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > > > > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> > > > > 
> > > > > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > > > > contain something like this:
> > > > > >
> > > > > > struct timens_offset {
> > > > > >         __aligned_s64 monotonic_offset_ns;
> > > > > >         __aligned_s64 boottime_offset_ns;
> > > > > > };
> > > > > >
> > > > > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > > > > >
> > > > > >         __aligned_u64 tls;
> > > > > >         __aligned_u64 set_tid;
> > > > > >         __aligned_u64 set_tid_size;
> > > > > > +       __aligned_s64 monotonic_offset_ns;
> > > > > > +       __aligned_s64 boottime_offset_ns;
> > > > > >  };
> > > > > 
> > > > > I would also prefer the second approach using two 64-bit integers
> > > > > instead of a pointer, as it keeps the interface simpler to implement
> > > > > and simpler to interpret by other tools.
> > > > 
> > > > Why I don't like has two reasons. There's the scenario where we have
> > > > added new extensions after the new boottime member and then we introduce
> > > > another offset. Then you'd be looking at:
> > > > 
> > > > __aligned_u64 tls;
> > > > __aligned_u64 set_tid;
> > > > __aligned_u64 set_tid_size;
> > > > + __aligned_s64 monotonic_offset_ns;
> > > > + __aligned_s64 boottime_offset_ns;
> > > > __aligned_s64 something_1
> > > > __aligned_s64 anything_2
> > > > + __aligned_s64 sometime_offset_ns
> > > > 
> > > > which bothers me just by looking at it. That's in addition to adding two
> > > > new members to the struct when most people will never set CLONE_NEWTIME.
> > > > We'll also likely have more features in the future that will want to
> > > > pass down more info than we want to directly expose in struct
> > > > clone_args, e.g. for a long time I have been thinking about adding a
> > > > struct for CLONE_NEWUSER that allows you to specify the id mappings you
> > > > want the new user namespace to get. We surely don't want to force all
> > > > new info into the uppermost struct. So I'm not convinced we should here.
> > > 
> > > I think here we can start thinking about a netlink-like interface.
> > 
> > I think netlink is just not a great model for an API and I would not
> > want us to go down that route.
> > 
> > I kept thinking about this for a bit and I think that we will end up
> > growing more namespace-related functionality. So one thing that came to
> > my mind is the following layout:
> > 
> > struct {
> > 	struct {
> > 		__s64 monotonic;
> > 		__s64 boot;
> > 	} time;
> > } namespaces;
> > 
> > struct _clone_args {
> > 	__aligned_u64 flags;
> > 	__aligned_u64 pidfd;
> > 	__aligned_u64 child_tid;
> > 	__aligned_u64 parent_tid;
> > 	__aligned_u64 exit_signal;
> > 	__aligned_u64 stack;
> > 	__aligned_u64 stack_size;
> > 	__aligned_u64 tls;
> > 	__aligned_u64 set_tid;
> > 	__aligned_u64 set_tid_size;
> > 	__aligned_u64 namespaces;
> > 	__aligned_u64 namespaces_size;
> > };
> > 
> > Then when we end up adding id mapping support for CLONE_NEWUSER we can
> > extend this with:
> > 
> > struct {
> > 	struct {
> > 		__aligned_u64 monotonic;
> > 		__aligned_u64 boot;

s/__aligned_u64/__s64/g

Sorry, leftover from my first draft.

> > 	} time;
> > 
> > 	struct {
> > 		/* id mapping members */
> > 	} user;
> > } namespaces;
> > 
> > Thoughts? Other ideas?
> 
> Works for me.
> 
> If we add the user namespace id mappings and then at some point a third
> element for the time namespace appears it would also start to be mixed.
> Just as you mentioned that a few mails ago.

I think you misunderstand me or I'm misunderstanding you. That new time
namespace member would go into struct time {} so

struct {
	struct {
		__s64 monotonic;
		__s64 boot;
		__s64 someothertime;
	} time;

	struct {
		/* id mapping members */
	} user;
} namespaces;

Christian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-24 17:56               ` Christian Brauner
@ 2020-03-25  7:58                 ` Adrian Reber
  2020-03-25 11:26                   ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Adrian Reber @ 2020-03-25  7:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrei Vagin, Arnd Bergmann, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, linux-kernel, Mike Rapoport,
	Radostin Stoyanov, Michael Kerrisk, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Tue, Mar 24, 2020 at 06:56:49PM +0100, Christian Brauner wrote:
> On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
> > On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
> > > On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> > > > On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> > > > > On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > > > > > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> > > > > > 
> > > > > > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > > > > > contain something like this:
> > > > > > >
> > > > > > > struct timens_offset {
> > > > > > >         __aligned_s64 monotonic_offset_ns;
> > > > > > >         __aligned_s64 boottime_offset_ns;
> > > > > > > };
> > > > > > >
> > > > > > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > > > > > >
> > > > > > >         __aligned_u64 tls;
> > > > > > >         __aligned_u64 set_tid;
> > > > > > >         __aligned_u64 set_tid_size;
> > > > > > > +       __aligned_s64 monotonic_offset_ns;
> > > > > > > +       __aligned_s64 boottime_offset_ns;
> > > > > > >  };
> > > > > > 
> > > > > > I would also prefer the second approach using two 64-bit integers
> > > > > > instead of a pointer, as it keeps the interface simpler to implement
> > > > > > and simpler to interpret by other tools.
> > > > > 
> > > > > Why I don't like has two reasons. There's the scenario where we have
> > > > > added new extensions after the new boottime member and then we introduce
> > > > > another offset. Then you'd be looking at:
> > > > > 
> > > > > __aligned_u64 tls;
> > > > > __aligned_u64 set_tid;
> > > > > __aligned_u64 set_tid_size;
> > > > > + __aligned_s64 monotonic_offset_ns;
> > > > > + __aligned_s64 boottime_offset_ns;
> > > > > __aligned_s64 something_1
> > > > > __aligned_s64 anything_2
> > > > > + __aligned_s64 sometime_offset_ns
> > > > > 
> > > > > which bothers me just by looking at it. That's in addition to adding two
> > > > > new members to the struct when most people will never set CLONE_NEWTIME.
> > > > > We'll also likely have more features in the future that will want to
> > > > > pass down more info than we want to directly expose in struct
> > > > > clone_args, e.g. for a long time I have been thinking about adding a
> > > > > struct for CLONE_NEWUSER that allows you to specify the id mappings you
> > > > > want the new user namespace to get. We surely don't want to force all
> > > > > new info into the uppermost struct. So I'm not convinced we should here.
> > > > 
> > > > I think here we can start thinking about a netlink-like interface.
> > > 
> > > I think netlink is just not a great model for an API and I would not
> > > want us to go down that route.
> > > 
> > > I kept thinking about this for a bit and I think that we will end up
> > > growing more namespace-related functionality. So one thing that came to
> > > my mind is the following layout:
> > > 
> > > struct {
> > > 	struct {
> > > 		__s64 monotonic;
> > > 		__s64 boot;
> > > 	} time;
> > > } namespaces;
> > > 
> > > struct _clone_args {
> > > 	__aligned_u64 flags;
> > > 	__aligned_u64 pidfd;
> > > 	__aligned_u64 child_tid;
> > > 	__aligned_u64 parent_tid;
> > > 	__aligned_u64 exit_signal;
> > > 	__aligned_u64 stack;
> > > 	__aligned_u64 stack_size;
> > > 	__aligned_u64 tls;
> > > 	__aligned_u64 set_tid;
> > > 	__aligned_u64 set_tid_size;
> > > 	__aligned_u64 namespaces;
> > > 	__aligned_u64 namespaces_size;
> > > };
> > > 
> > > Then when we end up adding id mapping support for CLONE_NEWUSER we can
> > > extend this with:
> > > 
> > > struct {
> > > 	struct {
> > > 		__aligned_u64 monotonic;
> > > 		__aligned_u64 boot;
> 
> s/__aligned_u64/__s64/g
> 
> Sorry, leftover from my first draft.
> 
> > > 	} time;
> > > 
> > > 	struct {
> > > 		/* id mapping members */
> > > 	} user;
> > > } namespaces;
> > > 
> > > Thoughts? Other ideas?
> > 
> > Works for me.
> > 
> > If we add the user namespace id mappings and then at some point a third
> > element for the time namespace appears it would also start to be mixed.
> > Just as you mentioned that a few mails ago.
> 
> I think you misunderstand me or I'm misunderstanding you. That new time
> namespace member would go into struct time {} so
> 
> struct {
> 	struct {
> 		__s64 monotonic;
> 		__s64 boot;
> 		__s64 someothertime;
> 	} time;
> 
> 	struct {
> 		/* id mapping members */
> 	} user;
> } namespaces;

My question was about how does the kernel know how 'struct namespaces'
is structured. How can an older kernel (which only is aware of two
clocks) deal with a, like in this example, third clock. Will the size
'__aligned_u64 namespaces_size' be used for versioning?

		Adrian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-25  7:58                 ` Adrian Reber
@ 2020-03-25 11:26                   ` Christian Brauner
  2020-04-01 11:40                     ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2020-03-25 11:26 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Andrei Vagin, Arnd Bergmann, Eric Biederman, Pavel Emelyanov,
	Oleg Nesterov, Dmitry Safonov, linux-kernel, Mike Rapoport,
	Radostin Stoyanov, Michael Kerrisk, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Wed, Mar 25, 2020 at 08:58:36AM +0100, Adrian Reber wrote:
> On Tue, Mar 24, 2020 at 06:56:49PM +0100, Christian Brauner wrote:
> > On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
> > > On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
> > > > On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> > > > > On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> > > > > > On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > > > > > > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> > > > > > > 
> > > > > > > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > > > > > > contain something like this:
> > > > > > > >
> > > > > > > > struct timens_offset {
> > > > > > > >         __aligned_s64 monotonic_offset_ns;
> > > > > > > >         __aligned_s64 boottime_offset_ns;
> > > > > > > > };
> > > > > > > >
> > > > > > > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > > > > > > >
> > > > > > > >         __aligned_u64 tls;
> > > > > > > >         __aligned_u64 set_tid;
> > > > > > > >         __aligned_u64 set_tid_size;
> > > > > > > > +       __aligned_s64 monotonic_offset_ns;
> > > > > > > > +       __aligned_s64 boottime_offset_ns;
> > > > > > > >  };
> > > > > > > 
> > > > > > > I would also prefer the second approach using two 64-bit integers
> > > > > > > instead of a pointer, as it keeps the interface simpler to implement
> > > > > > > and simpler to interpret by other tools.
> > > > > > 
> > > > > > Why I don't like has two reasons. There's the scenario where we have
> > > > > > added new extensions after the new boottime member and then we introduce
> > > > > > another offset. Then you'd be looking at:
> > > > > > 
> > > > > > __aligned_u64 tls;
> > > > > > __aligned_u64 set_tid;
> > > > > > __aligned_u64 set_tid_size;
> > > > > > + __aligned_s64 monotonic_offset_ns;
> > > > > > + __aligned_s64 boottime_offset_ns;
> > > > > > __aligned_s64 something_1
> > > > > > __aligned_s64 anything_2
> > > > > > + __aligned_s64 sometime_offset_ns
> > > > > > 
> > > > > > which bothers me just by looking at it. That's in addition to adding two
> > > > > > new members to the struct when most people will never set CLONE_NEWTIME.
> > > > > > We'll also likely have more features in the future that will want to
> > > > > > pass down more info than we want to directly expose in struct
> > > > > > clone_args, e.g. for a long time I have been thinking about adding a
> > > > > > struct for CLONE_NEWUSER that allows you to specify the id mappings you
> > > > > > want the new user namespace to get. We surely don't want to force all
> > > > > > new info into the uppermost struct. So I'm not convinced we should here.
> > > > > 
> > > > > I think here we can start thinking about a netlink-like interface.
> > > > 
> > > > I think netlink is just not a great model for an API and I would not
> > > > want us to go down that route.
> > > > 
> > > > I kept thinking about this for a bit and I think that we will end up
> > > > growing more namespace-related functionality. So one thing that came to
> > > > my mind is the following layout:
> > > > 
> > > > struct {
> > > > 	struct {
> > > > 		__s64 monotonic;
> > > > 		__s64 boot;
> > > > 	} time;
> > > > } namespaces;
> > > > 
> > > > struct _clone_args {
> > > > 	__aligned_u64 flags;
> > > > 	__aligned_u64 pidfd;
> > > > 	__aligned_u64 child_tid;
> > > > 	__aligned_u64 parent_tid;
> > > > 	__aligned_u64 exit_signal;
> > > > 	__aligned_u64 stack;
> > > > 	__aligned_u64 stack_size;
> > > > 	__aligned_u64 tls;
> > > > 	__aligned_u64 set_tid;
> > > > 	__aligned_u64 set_tid_size;
> > > > 	__aligned_u64 namespaces;
> > > > 	__aligned_u64 namespaces_size;
> > > > };
> > > > 
> > > > Then when we end up adding id mapping support for CLONE_NEWUSER we can
> > > > extend this with:
> > > > 
> > > > struct {
> > > > 	struct {
> > > > 		__aligned_u64 monotonic;
> > > > 		__aligned_u64 boot;
> > 
> > s/__aligned_u64/__s64/g
> > 
> > Sorry, leftover from my first draft.
> > 
> > > > 	} time;
> > > > 
> > > > 	struct {
> > > > 		/* id mapping members */
> > > > 	} user;
> > > > } namespaces;
> > > > 
> > > > Thoughts? Other ideas?
> > > 
> > > Works for me.
> > > 
> > > If we add the user namespace id mappings and then at some point a third
> > > element for the time namespace appears it would also start to be mixed.
> > > Just as you mentioned that a few mails ago.
> > 
> > I think you misunderstand me or I'm misunderstanding you. That new time
> > namespace member would go into struct time {} so
> > 
> > struct {
> > 	struct {
> > 		__s64 monotonic;
> > 		__s64 boot;
> > 		__s64 someothertime;
> > 	} time;
> > 
> > 	struct {
> > 		/* id mapping members */
> > 	} user;
> > } namespaces;
> 
> My question was about how does the kernel know how 'struct namespaces'
> is structured. How can an older kernel (which only is aware of two
> clocks) deal with a, like in this example, third clock. Will the size
> '__aligned_u64 namespaces_size' be used for versioning?

Yes, that would be the idea.

I don't want to give the impression that I think this is the best
solution. It's one solution that I think is feasible. But if we have
something better that is also future proof I'm happy to hear ideas.

Christian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-03-25 11:26                   ` Christian Brauner
@ 2020-04-01 11:40                     ` Michael Kerrisk (man-pages)
  2020-04-01 11:46                       ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-04-01 11:40 UTC (permalink / raw)
  To: Christian Brauner, Adrian Reber
  Cc: mtk.manpages, Andrei Vagin, Arnd Bergmann, Eric Biederman,
	Pavel Emelyanov, Oleg Nesterov, Dmitry Safonov, linux-kernel,
	Mike Rapoport, Radostin Stoyanov, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On 3/25/20 12:26 PM, Christian Brauner wrote:
> On Wed, Mar 25, 2020 at 08:58:36AM +0100, Adrian Reber wrote:
>> On Tue, Mar 24, 2020 at 06:56:49PM +0100, Christian Brauner wrote:
>>> On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
>>>> On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
>>>>> On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
>>>>>> On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
>>>>>>> On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
>>>>>>>> On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> With Arnd's idea of only using nanoseconds, timens_offset would then
>>>>>>>>> contain something like this:
>>>>>>>>>
>>>>>>>>> struct timens_offset {
>>>>>>>>>         __aligned_s64 monotonic_offset_ns;
>>>>>>>>>         __aligned_s64 boottime_offset_ns;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> I kind of prefer adding boottime and monotonic directly to struct clone_args
>>>>>>>>>
>>>>>>>>>         __aligned_u64 tls;
>>>>>>>>>         __aligned_u64 set_tid;
>>>>>>>>>         __aligned_u64 set_tid_size;
>>>>>>>>> +       __aligned_s64 monotonic_offset_ns;
>>>>>>>>> +       __aligned_s64 boottime_offset_ns;
>>>>>>>>>  };
>>>>>>>>
>>>>>>>> I would also prefer the second approach using two 64-bit integers
>>>>>>>> instead of a pointer, as it keeps the interface simpler to implement
>>>>>>>> and simpler to interpret by other tools.
>>>>>>>
>>>>>>> Why I don't like has two reasons. There's the scenario where we have
>>>>>>> added new extensions after the new boottime member and then we introduce
>>>>>>> another offset. Then you'd be looking at:
>>>>>>>
>>>>>>> __aligned_u64 tls;
>>>>>>> __aligned_u64 set_tid;
>>>>>>> __aligned_u64 set_tid_size;
>>>>>>> + __aligned_s64 monotonic_offset_ns;
>>>>>>> + __aligned_s64 boottime_offset_ns;
>>>>>>> __aligned_s64 something_1
>>>>>>> __aligned_s64 anything_2
>>>>>>> + __aligned_s64 sometime_offset_ns
>>>>>>>
>>>>>>> which bothers me just by looking at it. That's in addition to adding two
>>>>>>> new members to the struct when most people will never set CLONE_NEWTIME.
>>>>>>> We'll also likely have more features in the future that will want to
>>>>>>> pass down more info than we want to directly expose in struct
>>>>>>> clone_args, e.g. for a long time I have been thinking about adding a
>>>>>>> struct for CLONE_NEWUSER that allows you to specify the id mappings you
>>>>>>> want the new user namespace to get. We surely don't want to force all
>>>>>>> new info into the uppermost struct. So I'm not convinced we should here.
>>>>>>
>>>>>> I think here we can start thinking about a netlink-like interface.
>>>>>
>>>>> I think netlink is just not a great model for an API and I would not
>>>>> want us to go down that route.
>>>>>
>>>>> I kept thinking about this for a bit and I think that we will end up
>>>>> growing more namespace-related functionality. So one thing that came to
>>>>> my mind is the following layout:
>>>>>
>>>>> struct {
>>>>> 	struct {
>>>>> 		__s64 monotonic;
>>>>> 		__s64 boot;
>>>>> 	} time;
>>>>> } namespaces;
>>>>>
>>>>> struct _clone_args {
>>>>> 	__aligned_u64 flags;
>>>>> 	__aligned_u64 pidfd;
>>>>> 	__aligned_u64 child_tid;
>>>>> 	__aligned_u64 parent_tid;
>>>>> 	__aligned_u64 exit_signal;
>>>>> 	__aligned_u64 stack;
>>>>> 	__aligned_u64 stack_size;
>>>>> 	__aligned_u64 tls;
>>>>> 	__aligned_u64 set_tid;
>>>>> 	__aligned_u64 set_tid_size;
>>>>> 	__aligned_u64 namespaces;
>>>>> 	__aligned_u64 namespaces_size;
>>>>> };
>>>>>
>>>>> Then when we end up adding id mapping support for CLONE_NEWUSER we can
>>>>> extend this with:
>>>>>
>>>>> struct {
>>>>> 	struct {
>>>>> 		__aligned_u64 monotonic;
>>>>> 		__aligned_u64 boot;
>>>
>>> s/__aligned_u64/__s64/g
>>>
>>> Sorry, leftover from my first draft.
>>>
>>>>> 	} time;
>>>>>
>>>>> 	struct {
>>>>> 		/* id mapping members */
>>>>> 	} user;
>>>>> } namespaces;
>>>>>
>>>>> Thoughts? Other ideas?
>>>>
>>>> Works for me.
>>>>
>>>> If we add the user namespace id mappings and then at some point a third
>>>> element for the time namespace appears it would also start to be mixed.
>>>> Just as you mentioned that a few mails ago.
>>>
>>> I think you misunderstand me or I'm misunderstanding you. That new time
>>> namespace member would go into struct time {} so
>>>
>>> struct {
>>> 	struct {
>>> 		__s64 monotonic;
>>> 		__s64 boot;
>>> 		__s64 someothertime;
>>> 	} time;
>>>
>>> 	struct {
>>> 		/* id mapping members */
>>> 	} user;
>>> } namespaces;

So far, this seems like the least worst approach to me :-).

I think it's reasonable to assume that there will be another
time NS offset to add one day. I don't think anyone expected
CLOCK_BOOTIME (added in 2011) at the time that CLOCK_MONOTONIC
appeared (as part of the POSIX timers API in Linux 2.6.0 2003);
similarly, we probably can't conceive now what clock might be
added in the future that should also be governed by time
namespaces.


But...

>> My question was about how does the kernel know how 'struct namespaces'
>> is structured. How can an older kernel (which only is aware of two
>> clocks) deal with a, like in this example, third clock. Will the size
>> '__aligned_u64 namespaces_size' be used for versioning?
> 
> Yes, that would be the idea.

The idea implied here (if I understand correctly) of "binary chop
on the structure size to see what your kernel supports" is pretty
clunky, IMO. It's worth at least considering alternatives. 
For example, seccomp has a number of interesting interfaces to
discover what the running kernel supports [1]. Maybe it is worth
considering something similar for clone3()?

Cheers,

Michael

[1]
/proc/sys/kernel/seccomp/actions_avail (see seccomp(2))
SECCOMP_GET_ACTION_AVAIL (see secommp(2))
cgroups also provides something similar in the form of
/sys/kernel/cgroup/features (see cgroups(7))

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-04-01 11:40                     ` Michael Kerrisk (man-pages)
@ 2020-04-01 11:46                       ` Christian Brauner
  2020-04-01 12:15                         ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2020-04-01 11:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Adrian Reber, Andrei Vagin, Arnd Bergmann, Eric Biederman,
	Pavel Emelyanov, Oleg Nesterov, Dmitry Safonov, linux-kernel,
	Mike Rapoport, Radostin Stoyanov, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On Wed, Apr 01, 2020 at 01:40:25PM +0200, Michael Kerrisk (man-pages) wrote:
> On 3/25/20 12:26 PM, Christian Brauner wrote:
> > On Wed, Mar 25, 2020 at 08:58:36AM +0100, Adrian Reber wrote:
> >> On Tue, Mar 24, 2020 at 06:56:49PM +0100, Christian Brauner wrote:
> >>> On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
> >>>> On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
> >>>>> On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> >>>>>> On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> >>>>>>> On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> >>>>>>>> On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>>> With Arnd's idea of only using nanoseconds, timens_offset would then
> >>>>>>>>> contain something like this:
> >>>>>>>>>
> >>>>>>>>> struct timens_offset {
> >>>>>>>>>         __aligned_s64 monotonic_offset_ns;
> >>>>>>>>>         __aligned_s64 boottime_offset_ns;
> >>>>>>>>> };
> >>>>>>>>>
> >>>>>>>>> I kind of prefer adding boottime and monotonic directly to struct clone_args
> >>>>>>>>>
> >>>>>>>>>         __aligned_u64 tls;
> >>>>>>>>>         __aligned_u64 set_tid;
> >>>>>>>>>         __aligned_u64 set_tid_size;
> >>>>>>>>> +       __aligned_s64 monotonic_offset_ns;
> >>>>>>>>> +       __aligned_s64 boottime_offset_ns;
> >>>>>>>>>  };
> >>>>>>>>
> >>>>>>>> I would also prefer the second approach using two 64-bit integers
> >>>>>>>> instead of a pointer, as it keeps the interface simpler to implement
> >>>>>>>> and simpler to interpret by other tools.
> >>>>>>>
> >>>>>>> Why I don't like has two reasons. There's the scenario where we have
> >>>>>>> added new extensions after the new boottime member and then we introduce
> >>>>>>> another offset. Then you'd be looking at:
> >>>>>>>
> >>>>>>> __aligned_u64 tls;
> >>>>>>> __aligned_u64 set_tid;
> >>>>>>> __aligned_u64 set_tid_size;
> >>>>>>> + __aligned_s64 monotonic_offset_ns;
> >>>>>>> + __aligned_s64 boottime_offset_ns;
> >>>>>>> __aligned_s64 something_1
> >>>>>>> __aligned_s64 anything_2
> >>>>>>> + __aligned_s64 sometime_offset_ns
> >>>>>>>
> >>>>>>> which bothers me just by looking at it. That's in addition to adding two
> >>>>>>> new members to the struct when most people will never set CLONE_NEWTIME.
> >>>>>>> We'll also likely have more features in the future that will want to
> >>>>>>> pass down more info than we want to directly expose in struct
> >>>>>>> clone_args, e.g. for a long time I have been thinking about adding a
> >>>>>>> struct for CLONE_NEWUSER that allows you to specify the id mappings you
> >>>>>>> want the new user namespace to get. We surely don't want to force all
> >>>>>>> new info into the uppermost struct. So I'm not convinced we should here.
> >>>>>>
> >>>>>> I think here we can start thinking about a netlink-like interface.
> >>>>>
> >>>>> I think netlink is just not a great model for an API and I would not
> >>>>> want us to go down that route.
> >>>>>
> >>>>> I kept thinking about this for a bit and I think that we will end up
> >>>>> growing more namespace-related functionality. So one thing that came to
> >>>>> my mind is the following layout:
> >>>>>
> >>>>> struct {
> >>>>> 	struct {
> >>>>> 		__s64 monotonic;
> >>>>> 		__s64 boot;
> >>>>> 	} time;
> >>>>> } namespaces;
> >>>>>
> >>>>> struct _clone_args {
> >>>>> 	__aligned_u64 flags;
> >>>>> 	__aligned_u64 pidfd;
> >>>>> 	__aligned_u64 child_tid;
> >>>>> 	__aligned_u64 parent_tid;
> >>>>> 	__aligned_u64 exit_signal;
> >>>>> 	__aligned_u64 stack;
> >>>>> 	__aligned_u64 stack_size;
> >>>>> 	__aligned_u64 tls;
> >>>>> 	__aligned_u64 set_tid;
> >>>>> 	__aligned_u64 set_tid_size;
> >>>>> 	__aligned_u64 namespaces;
> >>>>> 	__aligned_u64 namespaces_size;
> >>>>> };
> >>>>>
> >>>>> Then when we end up adding id mapping support for CLONE_NEWUSER we can
> >>>>> extend this with:
> >>>>>
> >>>>> struct {
> >>>>> 	struct {
> >>>>> 		__aligned_u64 monotonic;
> >>>>> 		__aligned_u64 boot;
> >>>
> >>> s/__aligned_u64/__s64/g
> >>>
> >>> Sorry, leftover from my first draft.
> >>>
> >>>>> 	} time;
> >>>>>
> >>>>> 	struct {
> >>>>> 		/* id mapping members */
> >>>>> 	} user;
> >>>>> } namespaces;
> >>>>>
> >>>>> Thoughts? Other ideas?
> >>>>
> >>>> Works for me.
> >>>>
> >>>> If we add the user namespace id mappings and then at some point a third
> >>>> element for the time namespace appears it would also start to be mixed.
> >>>> Just as you mentioned that a few mails ago.
> >>>
> >>> I think you misunderstand me or I'm misunderstanding you. That new time
> >>> namespace member would go into struct time {} so
> >>>
> >>> struct {
> >>> 	struct {
> >>> 		__s64 monotonic;
> >>> 		__s64 boot;
> >>> 		__s64 someothertime;
> >>> 	} time;
> >>>
> >>> 	struct {
> >>> 		/* id mapping members */
> >>> 	} user;
> >>> } namespaces;
> 
> So far, this seems like the least worst approach to me :-).
> 
> I think it's reasonable to assume that there will be another
> time NS offset to add one day. I don't think anyone expected
> CLOCK_BOOTIME (added in 2011) at the time that CLOCK_MONOTONIC
> appeared (as part of the POSIX timers API in Linux 2.6.0 2003);
> similarly, we probably can't conceive now what clock might be
> added in the future that should also be governed by time
> namespaces.
> 
> 
> But...
> 
> >> My question was about how does the kernel know how 'struct namespaces'
> >> is structured. How can an older kernel (which only is aware of two
> >> clocks) deal with a, like in this example, third clock. Will the size
> >> '__aligned_u64 namespaces_size' be used for versioning?
> > 
> > Yes, that would be the idea.
> 
> The idea implied here (if I understand correctly) of "binary chop
> on the structure size to see what your kernel supports" is pretty
> clunky, IMO. It's worth at least considering alternatives. 
> For example, seccomp has a number of interesting interfaces to
> discover what the running kernel supports [1]. Maybe it is worth
> considering something similar for clone3()?

Yes, that's definitely something I've been thinking about and I've also
discussed this publicly on the libc-alpha mailing list since Florian
(Weimer) wants this for glibc. _But_ I didn't want to simply go my own
way and so Aleksa and I stuck our heads together and tried to come up
with a generic way that e.g. would work for both openat2() and for
clone3(), in fact we tried to come up with a generic way that could
potentially be used by any syscall that makes use of the new
copy_struct_from_user() helper that Aleksa and I pushed upstream. We
haven't yet gotten around to actually try and implementing our ideas.
I'll try to get around to this during this cycle (/me makes note to "sit
down" with Aleksa). We'll definitely take a look at seccomp and cgroups
too.

Christian

> 
> Cheers,
> 
> Michael
> 
> [1]
> /proc/sys/kernel/seccomp/actions_avail (see seccomp(2))
> SECCOMP_GET_ACTION_AVAIL (see secommp(2))
> cgroups also provides something similar in the form of
> /sys/kernel/cgroup/features (see cgroups(7))
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: clone3: allow creation of time namespace with offset
  2020-04-01 11:46                       ` Christian Brauner
@ 2020-04-01 12:15                         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-04-01 12:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: mtk.manpages, Adrian Reber, Andrei Vagin, Arnd Bergmann,
	Eric Biederman, Pavel Emelyanov, Oleg Nesterov, Dmitry Safonov,
	linux-kernel, Mike Rapoport, Radostin Stoyanov, Cyrill Gorcunov,
	Thomas Gleixner, Aleksa Sarai, Linux API

On 4/1/20 1:46 PM, Christian Brauner wrote:
> On Wed, Apr 01, 2020 at 01:40:25PM +0200, Michael Kerrisk (man-pages) wrote:
>> On 3/25/20 12:26 PM, Christian Brauner wrote:
>>> On Wed, Mar 25, 2020 at 08:58:36AM +0100, Adrian Reber wrote:
>>>> On Tue, Mar 24, 2020 at 06:56:49PM +0100, Christian Brauner wrote:
>>>>> On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
>>>>>> On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
>>>>>>> On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
>>>>>>>> On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
>>>>>>>>> On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
>>>>>>>>>> On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> With Arnd's idea of only using nanoseconds, timens_offset would then
>>>>>>>>>>> contain something like this:
>>>>>>>>>>>
>>>>>>>>>>> struct timens_offset {
>>>>>>>>>>>         __aligned_s64 monotonic_offset_ns;
>>>>>>>>>>>         __aligned_s64 boottime_offset_ns;
>>>>>>>>>>> };
>>>>>>>>>>>
>>>>>>>>>>> I kind of prefer adding boottime and monotonic directly to struct clone_args
>>>>>>>>>>>
>>>>>>>>>>>         __aligned_u64 tls;
>>>>>>>>>>>         __aligned_u64 set_tid;
>>>>>>>>>>>         __aligned_u64 set_tid_size;
>>>>>>>>>>> +       __aligned_s64 monotonic_offset_ns;
>>>>>>>>>>> +       __aligned_s64 boottime_offset_ns;
>>>>>>>>>>>  };
>>>>>>>>>>
>>>>>>>>>> I would also prefer the second approach using two 64-bit integers
>>>>>>>>>> instead of a pointer, as it keeps the interface simpler to implement
>>>>>>>>>> and simpler to interpret by other tools.
>>>>>>>>>
>>>>>>>>> Why I don't like has two reasons. There's the scenario where we have
>>>>>>>>> added new extensions after the new boottime member and then we introduce
>>>>>>>>> another offset. Then you'd be looking at:
>>>>>>>>>
>>>>>>>>> __aligned_u64 tls;
>>>>>>>>> __aligned_u64 set_tid;
>>>>>>>>> __aligned_u64 set_tid_size;
>>>>>>>>> + __aligned_s64 monotonic_offset_ns;
>>>>>>>>> + __aligned_s64 boottime_offset_ns;
>>>>>>>>> __aligned_s64 something_1
>>>>>>>>> __aligned_s64 anything_2
>>>>>>>>> + __aligned_s64 sometime_offset_ns
>>>>>>>>>
>>>>>>>>> which bothers me just by looking at it. That's in addition to adding two
>>>>>>>>> new members to the struct when most people will never set CLONE_NEWTIME.
>>>>>>>>> We'll also likely have more features in the future that will want to
>>>>>>>>> pass down more info than we want to directly expose in struct
>>>>>>>>> clone_args, e.g. for a long time I have been thinking about adding a
>>>>>>>>> struct for CLONE_NEWUSER that allows you to specify the id mappings you
>>>>>>>>> want the new user namespace to get. We surely don't want to force all
>>>>>>>>> new info into the uppermost struct. So I'm not convinced we should here.
>>>>>>>>
>>>>>>>> I think here we can start thinking about a netlink-like interface.
>>>>>>>
>>>>>>> I think netlink is just not a great model for an API and I would not
>>>>>>> want us to go down that route.
>>>>>>>
>>>>>>> I kept thinking about this for a bit and I think that we will end up
>>>>>>> growing more namespace-related functionality. So one thing that came to
>>>>>>> my mind is the following layout:
>>>>>>>
>>>>>>> struct {
>>>>>>> 	struct {
>>>>>>> 		__s64 monotonic;
>>>>>>> 		__s64 boot;
>>>>>>> 	} time;
>>>>>>> } namespaces;
>>>>>>>
>>>>>>> struct _clone_args {
>>>>>>> 	__aligned_u64 flags;
>>>>>>> 	__aligned_u64 pidfd;
>>>>>>> 	__aligned_u64 child_tid;
>>>>>>> 	__aligned_u64 parent_tid;
>>>>>>> 	__aligned_u64 exit_signal;
>>>>>>> 	__aligned_u64 stack;
>>>>>>> 	__aligned_u64 stack_size;
>>>>>>> 	__aligned_u64 tls;
>>>>>>> 	__aligned_u64 set_tid;
>>>>>>> 	__aligned_u64 set_tid_size;
>>>>>>> 	__aligned_u64 namespaces;
>>>>>>> 	__aligned_u64 namespaces_size;
>>>>>>> };
>>>>>>>
>>>>>>> Then when we end up adding id mapping support for CLONE_NEWUSER we can
>>>>>>> extend this with:
>>>>>>>
>>>>>>> struct {
>>>>>>> 	struct {
>>>>>>> 		__aligned_u64 monotonic;
>>>>>>> 		__aligned_u64 boot;
>>>>>
>>>>> s/__aligned_u64/__s64/g
>>>>>
>>>>> Sorry, leftover from my first draft.
>>>>>
>>>>>>> 	} time;
>>>>>>>
>>>>>>> 	struct {
>>>>>>> 		/* id mapping members */
>>>>>>> 	} user;
>>>>>>> } namespaces;
>>>>>>>
>>>>>>> Thoughts? Other ideas?
>>>>>>
>>>>>> Works for me.
>>>>>>
>>>>>> If we add the user namespace id mappings and then at some point a third
>>>>>> element for the time namespace appears it would also start to be mixed.
>>>>>> Just as you mentioned that a few mails ago.
>>>>>
>>>>> I think you misunderstand me or I'm misunderstanding you. That new time
>>>>> namespace member would go into struct time {} so
>>>>>
>>>>> struct {
>>>>> 	struct {
>>>>> 		__s64 monotonic;
>>>>> 		__s64 boot;
>>>>> 		__s64 someothertime;
>>>>> 	} time;
>>>>>
>>>>> 	struct {
>>>>> 		/* id mapping members */
>>>>> 	} user;
>>>>> } namespaces;
>>
>> So far, this seems like the least worst approach to me :-).
>>
>> I think it's reasonable to assume that there will be another
>> time NS offset to add one day. I don't think anyone expected
>> CLOCK_BOOTIME (added in 2011) at the time that CLOCK_MONOTONIC
>> appeared (as part of the POSIX timers API in Linux 2.6.0 2003);
>> similarly, we probably can't conceive now what clock might be
>> added in the future that should also be governed by time
>> namespaces.
>>
>>
>> But...
>>
>>>> My question was about how does the kernel know how 'struct namespaces'
>>>> is structured. How can an older kernel (which only is aware of two
>>>> clocks) deal with a, like in this example, third clock. Will the size
>>>> '__aligned_u64 namespaces_size' be used for versioning?
>>>
>>> Yes, that would be the idea.
>>
>> The idea implied here (if I understand correctly) of "binary chop
>> on the structure size to see what your kernel supports" is pretty
>> clunky, IMO. It's worth at least considering alternatives. 
>> For example, seccomp has a number of interesting interfaces to
>> discover what the running kernel supports [1]. Maybe it is worth
>> considering something similar for clone3()?
> 
> Yes, that's definitely something I've been thinking about and I've also
> discussed this publicly on the libc-alpha mailing list since Florian
> (Weimer) wants this for glibc. _But_ I didn't want to simply go my own
> way and so Aleksa and I stuck our heads together and tried to come up
> with a generic way that e.g. would work for both openat2() and for
> clone3(), in fact we tried to come up with a generic way that could
> potentially be used by any syscall that makes use of the new
> copy_struct_from_user() helper that Aleksa and I pushed upstream. We
> haven't yet gotten around to actually try and implementing our ideas.
> I'll try to get around to this during this cycle (/me makes note to "sit
> down" with Aleksa). We'll definitely take a look at seccomp and cgroups
> too.

Great! Thanks for thinking about this. Just by the way, on 
discussions such as the one you had with Florian, it would
be great if you CCed linux-api@ (and also for the patch series
that implements this idea).

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-04-01 12:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200317083043.226593-1-areber@redhat.com>
2020-03-17  9:40 ` clone3: allow creation of time namespace with offset Michael Kerrisk (man-pages)
2020-03-17 14:23   ` Aleksa Sarai
2020-03-17 16:09     ` Christian Brauner
     [not found] ` <CAK8P3a2-qQhpRdF0+iVrpp=vEvgwtndQL89CUm_QzoW2QYX1Jw@mail.gmail.com>
2020-03-19  8:11   ` Adrian Reber
2020-03-19  8:16     ` Arnd Bergmann
2020-03-19 10:29       ` Christian Brauner
2020-03-20 18:33         ` Andrei Vagin
2020-03-24 16:09           ` Christian Brauner
2020-03-24 16:25             ` Adrian Reber
2020-03-24 17:56               ` Christian Brauner
2020-03-25  7:58                 ` Adrian Reber
2020-03-25 11:26                   ` Christian Brauner
2020-04-01 11:40                     ` Michael Kerrisk (man-pages)
2020-04-01 11:46                       ` Christian Brauner
2020-04-01 12:15                         ` Michael Kerrisk (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).