All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
       [not found] <100149681.11244932.1588661282331.JavaMail.zimbra@redhat.com>
@ 2020-05-05  7:28 ` Jan Stancek
  2020-05-05  7:49   ` Florian Weimer
  2020-05-05  7:54   ` Andreas Schwab
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Stancek @ 2020-05-05  7:28 UTC (permalink / raw)
  To: ltp

Hi,

I'm seeing an issue with CLONE_IO and libc' clone() on ppc64le,
where flags parameter appears to be sign extended before it's passed
to kernel syscall.

This is an issue since kernel commit ef2c41cf38a7 ("clone3: allow
spawning processes into cgroups"), because there's now a check for
flag that userspace didn't intend to pass:

static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
        __acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
        if (!(kargs->flags & CLONE_INTO_CGROUP)) {   // CLONE_INTO_CGROUP == 0x200000000ULL
                kargs->cset = cset;
                return 0;
        }

        f = fget_raw(kargs->cgroup);
        if (!f) {
                ret = -EBADF;
                goto err;
        }

Reproducer:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>

char stack[2*1024*1024];

static int do_child(void *arg)
{
        printf("hello");
        return 0;
}

int main(void)
{
        clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
        return 0;
}

reliably hits EBADF with glibc-2.31.9000-12.fc33.ppc64le and 5.7.0-0.rc2 kernel:
  clone(child_stack=0x1011ffe0, flags=CLONE_IO|0xffffffff00000000|SIGCHLD) = -1 EBADF (Bad file descriptor)

Regards,
Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  7:28 ` [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7 Jan Stancek
@ 2020-05-05  7:49   ` Florian Weimer
  2020-05-05  7:59     ` Christian Brauner
  2020-05-05  8:32     ` Christian Brauner
  2020-05-05  7:54   ` Andreas Schwab
  1 sibling, 2 replies; 19+ messages in thread
From: Florian Weimer @ 2020-05-05  7:49 UTC (permalink / raw)
  To: ltp

* Jan Stancek via Libc-alpha:

> I'm seeing an issue with CLONE_IO and libc' clone() on ppc64le,
> where flags parameter appears to be sign extended before it's passed
> to kernel syscall.

Interesting, thanks for reporting this.  The manual page clearly
documents the interface as;

       int clone(int (*fn)(void *), void *child_stack,
                 int flags, void *arg, ...
                 /* pid_t *ptid, void *newtls, pid_t *ctid */ );

But the kernel uses unsigned long for clone_flags.  This looks like an
unintended userspace ABI breakage.

Rather than dropping the invalid flags check in the kernel (having the
check is valuable), I think the parameter should be changed to int or
unsigned int, or the flags check should be written in such a way that
it disregards bits that result from sign extensions: fail if
clone_flags != (int) clone_flags, otherwise set clone_flags = 0xFFFFFFFF.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  7:28 ` [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7 Jan Stancek
  2020-05-05  7:49   ` Florian Weimer
@ 2020-05-05  7:54   ` Andreas Schwab
  1 sibling, 0 replies; 19+ messages in thread
From: Andreas Schwab @ 2020-05-05  7:54 UTC (permalink / raw)
  To: ltp

On Mai 05 2020, Jan Stancek via Libc-alpha wrote:

> I'm seeing an issue with CLONE_IO and libc' clone() on ppc64le,
> where flags parameter appears to be sign extended before it's passed
> to kernel syscall.

Looks like the flags argument should be declared as unsigned int,
instead of int, both in glibc and in the kernel.

#define CLONE_IO		0x80000000	/* Clone io context */

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  7:49   ` Florian Weimer
@ 2020-05-05  7:59     ` Christian Brauner
  2020-05-05  8:02       ` Christian Brauner
  2020-05-05  8:32     ` Christian Brauner
  1 sibling, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2020-05-05  7:59 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 09:49:50AM +0200, Florian Weimer wrote:
> * Jan Stancek via Libc-alpha:
> 
> > I'm seeing an issue with CLONE_IO and libc' clone() on ppc64le,
> > where flags parameter appears to be sign extended before it's passed
> > to kernel syscall.
> 
> Interesting, thanks for reporting this.  The manual page clearly
> documents the interface as;
> 
>        int clone(int (*fn)(void *), void *child_stack,
>                  int flags, void *arg, ...
>                  /* pid_t *ptid, void *newtls, pid_t *ctid */ );
> 
> But the kernel uses unsigned long for clone_flags.  This looks like an
> unintended userspace ABI breakage.
> 
> Rather than dropping the invalid flags check in the kernel (having the
> check is valuable), I think the parameter should be changed to int or
> unsigned int, or the flags check should be written in such a way that
> it disregards bits that result from sign extensions: fail if
> clone_flags != (int) clone_flags, otherwise set clone_flags = 0xFFFFFFFF.

Let me take a quick look. This just affects legacy clone it seems. Maybe
we can do something to avoid that breakage without having to change
extensions. Legacy clone shouldn't care about invalid flag checking
since it never used to do that anyway, only clone3() does.

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  7:59     ` Christian Brauner
@ 2020-05-05  8:02       ` Christian Brauner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2020-05-05  8:02 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 09:59:38AM +0200, Christian Brauner wrote:
> On Tue, May 05, 2020 at 09:49:50AM +0200, Florian Weimer wrote:
> > * Jan Stancek via Libc-alpha:
> > 
> > > I'm seeing an issue with CLONE_IO and libc' clone() on ppc64le,
> > > where flags parameter appears to be sign extended before it's passed
> > > to kernel syscall.
> > 
> > Interesting, thanks for reporting this.  The manual page clearly
> > documents the interface as;
> > 
> >        int clone(int (*fn)(void *), void *child_stack,
> >                  int flags, void *arg, ...
> >                  /* pid_t *ptid, void *newtls, pid_t *ctid */ );
> > 
> > But the kernel uses unsigned long for clone_flags.  This looks like an
> > unintended userspace ABI breakage.
> > 
> > Rather than dropping the invalid flags check in the kernel (having the
> > check is valuable), I think the parameter should be changed to int or
> > unsigned int, or the flags check should be written in such a way that
> > it disregards bits that result from sign extensions: fail if
> > clone_flags != (int) clone_flags, otherwise set clone_flags = 0xFFFFFFFF.
> 
> Let me take a quick look. This just affects legacy clone it seems. Maybe
> we can do something to avoid that breakage without having to change
> extensions. Legacy clone shouldn't care about invalid flag checking
> since it never used to do that anyway, only clone3() does.

Oh I see. Ugh.

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  7:49   ` Florian Weimer
  2020-05-05  7:59     ` Christian Brauner
@ 2020-05-05  8:32     ` Christian Brauner
  2020-05-05  8:58       ` Jan Stancek
  2020-05-05  9:05       ` Florian Weimer
  1 sibling, 2 replies; 19+ messages in thread
From: Christian Brauner @ 2020-05-05  8:32 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 09:49:50AM +0200, Florian Weimer wrote:
> * Jan Stancek via Libc-alpha:
> 
> > I'm seeing an issue with CLONE_IO and libc' clone() on ppc64le,
> > where flags parameter appears to be sign extended before it's passed
> > to kernel syscall.
> 
> Interesting, thanks for reporting this.  The manual page clearly
> documents the interface as;
> 
>        int clone(int (*fn)(void *), void *child_stack,
>                  int flags, void *arg, ...
>                  /* pid_t *ptid, void *newtls, pid_t *ctid */ );
> 
> But the kernel uses unsigned long for clone_flags.  This looks like an
> unintended userspace ABI breakage.
> 
> Rather than dropping the invalid flags check in the kernel (having the
> check is valuable), I think the parameter should be changed to int or
> unsigned int, or the flags check should be written in such a way that
> it disregards bits that result from sign extensions: fail if
> clone_flags != (int) clone_flags, otherwise set clone_flags = 0xFFFFFFFF.

Hm, as you observed, the kernel always defines the flags argument as
unsigned long and afaict this has been the case since forever. So I'm
not sure why the userspace wrapper is defined as taking an int for
flags in the first place(?).
Be that as it may, the current legacy clone syscall has been switched
over a while ago to do:

 {
	struct kernel_clone_args args = {
		.flags		= (clone_flags & ~CSIGNAL),
		.pidfd		= parent_tidptr,
		.child_tid	= child_tidptr,
		.parent_tid	= parent_tidptr,
		.exit_signal	= (clone_flags & CSIGNAL),
		.stack		= newsp,
		.tls		= tls,
	};

	if (!legacy_clone_args_valid(&args))
		return -EINVAL;

	return _do_fork(&args);
}

where legacy_clone_args_valid() does:

bool legacy_clone_args_valid(const struct kernel_clone_args *kargs)
{
	/* clone(CLONE_PIDFD) uses parent_tidptr to return a pidfd */
	if ((kargs->flags & CLONE_PIDFD) &&
	    (kargs->flags & CLONE_PARENT_SETTID))
		return false;

	return true;
}

And I wonder if we should just do:

if (clone_flags & ~CLONE_LEGACY_FLAGS) /* 
	return -EINVAL;

but that might be a change in behavior, as legacy clone _never_ failed
when invalid values where passed.

(Note, that CLONE_LEGACY_FLAGS is already defined as
#define CLONE_LEGACY_FLAGS 0xffffffffULL
and used in clone3().)

So the better option might be to do what you suggested, Florian:
if (clone_flags & ~CLONE_LEGACY_FLAGS)
	clone_flags = CLONE_LEGACY_FLAGS?
and move on?

(Btw, iiuc, this would've always been a problem, right? In the sense that
userspace only got away with this because there were no additional flags
defined and couldn't.)

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  8:32     ` Christian Brauner
@ 2020-05-05  8:58       ` Jan Stancek
  2020-05-05  9:05       ` Florian Weimer
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Stancek @ 2020-05-05  8:58 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> (Btw, iiuc, this would've always been a problem, right? In the sense that
> userspace only got away with this because there were no additional flags
> defined and couldn't.)

I think so, 4.20 behaves same way, but kernel ignores any extra flags:
  clone(child_stack=0x1011ffe0, flags=CLONE_IO|0xffffffff00000000|SIGCHLD) = 1061


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  8:32     ` Christian Brauner
  2020-05-05  8:58       ` Jan Stancek
@ 2020-05-05  9:05       ` Florian Weimer
  2020-05-05  9:15         ` Christian Brauner
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2020-05-05  9:05 UTC (permalink / raw)
  To: ltp

* Christian Brauner:

> Hm, as you observed, the kernel always defines the flags argument as
> unsigned long and afaict this has been the case since forever. So I'm
> not sure why the userspace wrapper is defined as taking an int for
> flags in the first place(?).

We have different types in many places.  Sometimes this is required by
POSIX, sometimes not.  See the recent effort to fix the x32 interface
(mostly for size_t arguments).

Flags arguments that depend on the word size are incompatible with
portable system calls, so arguably what the kernel is doing here is
wrong: the additional bits can never be used for anything.

> And I wonder if we should just do:
>
> if (clone_flags & ~CLONE_LEGACY_FLAGS) /* 
> 	return -EINVAL;
>
> but that might be a change in behavior, as legacy clone _never_ failed
> when invalid values where passed.

Have any flags been added recently?

> (Note, that CLONE_LEGACY_FLAGS is already defined as
> #define CLONE_LEGACY_FLAGS 0xffffffffULL
> and used in clone3().)
>
> So the better option might be to do what you suggested, Florian:
> if (clone_flags & ~CLONE_LEGACY_FLAGS)
> 	clone_flags = CLONE_LEGACY_FLAGS?
> and move on?

Not sure what you are suggesting here.  Do you mean an unconditional
masking of excess bits?

  clone_flags &= CLONE_LEGACY_FLAGS;

I think I would prefer this:

  /* Userspace may have passed a sign-extended int value. */
  if (clone_flags != (int) clone_flags) /* 
 	return -EINVAL;
  clone_flags = (unsigned) clone_flags;

> (Btw, iiuc, this would've always been a problem, right? In the sense that
> userspace only got away with this because there were no additional flags
> defined and couldn't.)

It depends on how the clone system call wrapper is implemented, and
what the C function call ABI is like.  In some cases, you probably get
zero-extension, as expected by the kernel.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  9:05       ` Florian Weimer
@ 2020-05-05  9:15         ` Christian Brauner
  2020-05-05  9:36           ` Florian Weimer
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2020-05-05  9:15 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 11:05:37AM +0200, Florian Weimer wrote:
> * Christian Brauner:
> 
> > Hm, as you observed, the kernel always defines the flags argument as
> > unsigned long and afaict this has been the case since forever. So I'm
> > not sure why the userspace wrapper is defined as taking an int for
> > flags in the first place(?).
> 
> We have different types in many places.  Sometimes this is required by
> POSIX, sometimes not.  See the recent effort to fix the x32 interface
> (mostly for size_t arguments).
> 
> Flags arguments that depend on the word size are incompatible with
> portable system calls, so arguably what the kernel is doing here is
> wrong: the additional bits can never be used for anything.

I mean, the unsigned long is odd. Most syscalls are using unsigned int
and new ones are basically expected to.

> 
> > And I wonder if we should just do:
> >
> > if (clone_flags & ~CLONE_LEGACY_FLAGS) /* 
> > 	return -EINVAL;
> >
> > but that might be a change in behavior, as legacy clone _never_ failed
> > when invalid values where passed.
> 
> Have any flags been added recently?

/* Flags for the clone3() syscall. */
#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */

> 
> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> > and used in clone3().)
> >
> > So the better option might be to do what you suggested, Florian:
> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> > 	clone_flags = CLONE_LEGACY_FLAGS?
> > and move on?
> 
> Not sure what you are suggesting here.  Do you mean an unconditional
> masking of excess bits?
> 
>   clone_flags &= CLONE_LEGACY_FLAGS;
> 
> I think I would prefer this:
> 
>   /* Userspace may have passed a sign-extended int value. */
>   if (clone_flags != (int) clone_flags) /* 
>  	return -EINVAL;
>   clone_flags = (unsigned) clone_flags;

My worry is that this will cause regressions because clone() has never
failed on invalid flag values. I was looking for a way to not have this
problem. But given what you say below this change might be ok/worth
risking?

> 
> > (Btw, iiuc, this would've always been a problem, right? In the sense that
> > userspace only got away with this because there were no additional flags
> > defined and couldn't.)
> 
> It depends on how the clone system call wrapper is implemented, and
> what the C function call ABI is like.  In some cases, you probably get
> zero-extension, as expected by the kernel.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  9:15         ` Christian Brauner
@ 2020-05-05  9:36           ` Florian Weimer
  2020-05-05  9:58             ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2020-05-05  9:36 UTC (permalink / raw)
  To: ltp

* Christian Brauner:
>> Have any flags been added recently?
>
> /* Flags for the clone3() syscall. */
> #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */

Are those flags expected to be compatible with the legacy clone
interface on 64-bit architectures?

>> > (Note, that CLONE_LEGACY_FLAGS is already defined as
>> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
>> > and used in clone3().)
>> >
>> > So the better option might be to do what you suggested, Florian:
>> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
>> > 	clone_flags = CLONE_LEGACY_FLAGS?
>> > and move on?
>> 
>> Not sure what you are suggesting here.  Do you mean an unconditional
>> masking of excess bits?
>> 
>>   clone_flags &= CLONE_LEGACY_FLAGS;
>> 
>> I think I would prefer this:
>> 
>>   /* Userspace may have passed a sign-extended int value. */
>>   if (clone_flags != (int) clone_flags) /* 
>>  	return -EINVAL;
>>   clone_flags = (unsigned) clone_flags;
>
> My worry is that this will cause regressions because clone() has never
> failed on invalid flag values. I was looking for a way to not have this
> problem. But given what you say below this change might be ok/worth
> risking?

I was under the impression that current kernels perform such a check,
causing the problem with sign extension.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  9:36           ` Florian Weimer
@ 2020-05-05  9:58             ` Christian Brauner
  2020-05-05 10:21               ` Christian Brauner
  2020-05-05 11:08               ` Florian Weimer
  0 siblings, 2 replies; 19+ messages in thread
From: Christian Brauner @ 2020-05-05  9:58 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> * Christian Brauner:
> >> Have any flags been added recently?
> >
> > /* Flags for the clone3() syscall. */
> > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> 
> Are those flags expected to be compatible with the legacy clone
> interface on 64-bit architectures?

No, they are clone3() only. clone() is deprecated wrt to new features.

> 
> >> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> >> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> >> > and used in clone3().)
> >> >
> >> > So the better option might be to do what you suggested, Florian:
> >> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> >> > 	clone_flags = CLONE_LEGACY_FLAGS?
> >> > and move on?
> >> 
> >> Not sure what you are suggesting here.  Do you mean an unconditional
> >> masking of excess bits?
> >> 
> >>   clone_flags &= CLONE_LEGACY_FLAGS;
> >> 
> >> I think I would prefer this:
> >> 
> >>   /* Userspace may have passed a sign-extended int value. */
> >>   if (clone_flags != (int) clone_flags) /* 
> >>  	return -EINVAL;
> >>   clone_flags = (unsigned) clone_flags;
> >
> > My worry is that this will cause regressions because clone() has never
> > failed on invalid flag values. I was looking for a way to not have this
> > problem. But given what you say below this change might be ok/worth
> > risking?
> 
> I was under the impression that current kernels perform such a check,
> causing the problem with sign extension.

No, it doesn't, it never did. It only does it for clone3(). Legacy
clone() _never_ reported an error no matter if you passed garbage flags
or not. That's why we can't re-use clone() flags that have essentially
been removed in kernel version before I could even program. :) Unless
I'm misunderstanding what check you're referring to.

If I understood the original mail correctly, then the issue is caused by
an interaction with sign extension and a the new flag value
CLONE_INTO_CGROUP being defined.
So from what I gather from Jan's initial mail is that when clone() is
called on ppc64le with the CLONE_IO|SIGCHLD flag:
clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
that the sign extension causes bits to be set that raise the
CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
legacy clone() and clone3() the kernel will think that someone requested
CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
the only issue here then couldn't we just do:

clone_flags &= ~CLONE3_ONLY_FLAGS?

and move on, i.e. all future clone3() flags we'll just remove since we
can assume that they have been accidently set. Even if they have been
intentionally set we can just ignore them since that's in line with
legacy clone()'s (questionable) tradition of ignoring unknown flags.
Thoughts? Or am I missing some subtlety here?

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  9:58             ` Christian Brauner
@ 2020-05-05 10:21               ` Christian Brauner
  2020-05-05 11:34                 ` Florian Weimer
  2020-05-05 11:35                 ` Dmitry V. Levin
  2020-05-05 11:08               ` Florian Weimer
  1 sibling, 2 replies; 19+ messages in thread
From: Christian Brauner @ 2020-05-05 10:21 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 11:58:13AM +0200, Christian Brauner wrote:
> On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> > * Christian Brauner:
> > >> Have any flags been added recently?
> > >
> > > /* Flags for the clone3() syscall. */
> > > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> > > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> > 
> > Are those flags expected to be compatible with the legacy clone
> > interface on 64-bit architectures?
> 
> No, they are clone3() only. clone() is deprecated wrt to new features.
> 
> > 
> > >> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> > >> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> > >> > and used in clone3().)
> > >> >
> > >> > So the better option might be to do what you suggested, Florian:
> > >> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> > >> > 	clone_flags = CLONE_LEGACY_FLAGS?
> > >> > and move on?
> > >> 
> > >> Not sure what you are suggesting here.  Do you mean an unconditional
> > >> masking of excess bits?
> > >> 
> > >>   clone_flags &= CLONE_LEGACY_FLAGS;
> > >> 
> > >> I think I would prefer this:
> > >> 
> > >>   /* Userspace may have passed a sign-extended int value. */
> > >>   if (clone_flags != (int) clone_flags) /* 
> > >>  	return -EINVAL;
> > >>   clone_flags = (unsigned) clone_flags;
> > >
> > > My worry is that this will cause regressions because clone() has never
> > > failed on invalid flag values. I was looking for a way to not have this
> > > problem. But given what you say below this change might be ok/worth
> > > risking?
> > 
> > I was under the impression that current kernels perform such a check,
> > causing the problem with sign extension.
> 
> No, it doesn't, it never did. It only does it for clone3(). Legacy
> clone() _never_ reported an error no matter if you passed garbage flags
> or not. That's why we can't re-use clone() flags that have essentially
> been removed in kernel version before I could even program. :) Unless
> I'm misunderstanding what check you're referring to.
> 
> If I understood the original mail correctly, then the issue is caused by
> an interaction with sign extension and a the new flag value
> CLONE_INTO_CGROUP being defined.
> So from what I gather from Jan's initial mail is that when clone() is
> called on ppc64le with the CLONE_IO|SIGCHLD flag:
> clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> that the sign extension causes bits to be set that raise the
> CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> legacy clone() and clone3() the kernel will think that someone requested
> CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> the only issue here then couldn't we just do:
> 
> clone_flags &= ~CLONE3_ONLY_FLAGS?
> 
> and move on, i.e. all future clone3() flags we'll just remove since we
> can assume that they have been accidently set. Even if they have been
> intentionally set we can just ignore them since that's in line with
> legacy clone()'s (questionable) tradition of ignoring unknown flags.
> Thoughts? Or am I missing some subtlety here?

So essentially:

diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..e192089f133e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2569,12 +2569,15 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
                 unsigned long, tls)
 #endif
 {
+       /* Ignore the upper 32 bits. */
+       unsigned int flags = (clone_flags & 0xfffffff);
+
        struct kernel_clone_args args = {
-               .flags          = (clone_flags & ~CSIGNAL),
+               .flags          = (flags & ~CSIGNAL),
                .pidfd          = parent_tidptr,
                .child_tid      = child_tidptr,
                .parent_tid     = parent_tidptr,
-               .exit_signal    = (clone_flags & CSIGNAL),
+               .exit_signal    = (flags & CSIGNAL),
                .stack          = newsp,
                .tls            = tls,
        }

(Note that kernel_clone_args->flags is a 64 bit unsigned integer.)

Christian

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05  9:58             ` Christian Brauner
  2020-05-05 10:21               ` Christian Brauner
@ 2020-05-05 11:08               ` Florian Weimer
  2020-05-05 11:26                 ` Christian Brauner
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2020-05-05 11:08 UTC (permalink / raw)
  To: ltp

* Christian Brauner:

> On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
>> * Christian Brauner:
>> >> Have any flags been added recently?
>> >
>> > /* Flags for the clone3() syscall. */
>> > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
>> > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
>> 
>> Are those flags expected to be compatible with the legacy clone
>> interface on 64-bit architectures?
>
> No, they are clone3() only. clone() is deprecated wrt to new features.

But without masking the clone_flags argument, won't it be passed down
to the implementation which is also used to back clone3?

> If I understood the original mail correctly, then the issue is caused by
> an interaction with sign extension and a the new flag value
> CLONE_INTO_CGROUP being defined.

Could be, but for that to happen, the flag would have to be passed
down, no?

> So from what I gather from Jan's initial mail is that when clone() is
> called on ppc64le with the CLONE_IO|SIGCHLD flag:
> clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> that the sign extension causes bits to be set that raise the
> CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> legacy clone() and clone3() the kernel will think that someone requested
> CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> the only issue here then couldn't we just do:
>
> clone_flags &= ~CLONE3_ONLY_FLAGS?
>
> and move on, i.e. all future clone3() flags we'll just remove since we
> can assume that they have been accidently set. Even if they have been
> intentionally set we can just ignore them since that's in line with
> legacy clone()'s (questionable) tradition of ignoring unknown flags.
> Thoughts? Or am I missing some subtlety here?

What's the difference between

  clone_flags &= ~CLONE3_ONLY_FLAGS;

and

  clone_flags &= (unsigned) -1;

in this context?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05 11:08               ` Florian Weimer
@ 2020-05-05 11:26                 ` Christian Brauner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2020-05-05 11:26 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 01:08:02PM +0200, Florian Weimer wrote:
> * Christian Brauner:
> 
> > On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> >> * Christian Brauner:
> >> >> Have any flags been added recently?
> >> >
> >> > /* Flags for the clone3() syscall. */
> >> > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> >> > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> >> 
> >> Are those flags expected to be compatible with the legacy clone
> >> interface on 64-bit architectures?
> >
> > No, they are clone3() only. clone() is deprecated wrt to new features.
> 
> But without masking the clone_flags argument, won't it be passed down
> to the implementation which is also used to back clone3?

Yeah, the masking fix should definitely go in. That problem wasn't on
our radar initially at all.

> 
> > If I understood the original mail correctly, then the issue is caused by
> > an interaction with sign extension and a the new flag value
> > CLONE_INTO_CGROUP being defined.
> 
> Could be, but for that to happen, the flag would have to be passed
> down, no?

The strace output Jan posted:

clone(child_stack=0x1011ffe0, flags=CLONE_IO|0xffffffff00000000|SIGCHLD) = -1 EBADF (Bad file descriptor)

seems to suggest that 0xffffffff00000000 is filled in
CLONE_IO|0xffffffff00000000|SIGCHLD
which means

int main(int argc, char *argv[])
{
#define DUMMY_FLAG (CLONE_IO | 0xffffffff00000000 | SIGCHLD)
#ifndef CLONE_INTO_CGROUP
#define CLONE_INTO_CGROUP 0x200000000ULL
#endif

	printf("%d\n", (DUMMY_FLAG & CLONE_INTO_CGROUP) > 0);

	exit(EXIT_SUCCESS);
}

will return 1. In fact, it will also return 1 with CLONE_CLEAR_SIGHAND,
I believe.

> 
> > So from what I gather from Jan's initial mail is that when clone() is
> > called on ppc64le with the CLONE_IO|SIGCHLD flag:
> > clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> > that the sign extension causes bits to be set that raise the
> > CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> > legacy clone() and clone3() the kernel will think that someone requested
> > CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> > the only issue here then couldn't we just do:
> >
> > clone_flags &= ~CLONE3_ONLY_FLAGS?
> >
> > and move on, i.e. all future clone3() flags we'll just remove since we
> > can assume that they have been accidently set. Even if they have been
> > intentionally set we can just ignore them since that's in line with
> > legacy clone()'s (questionable) tradition of ignoring unknown flags.
> > Thoughts? Or am I missing some subtlety here?
> 
> What's the difference between
> 
>   clone_flags &= ~CLONE3_ONLY_FLAGS;
> 
> and
> 
>   clone_flags &= (unsigned) -1;
> 
> in this context?

Yeah, I just used CLONE3_ONLY_FLAGS as more descriptive example. Wdyt,
about the patch I proposed in the follow up mail?

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05 10:21               ` Christian Brauner
@ 2020-05-05 11:34                 ` Florian Weimer
  2020-05-05 11:35                 ` Dmitry V. Levin
  1 sibling, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2020-05-05 11:34 UTC (permalink / raw)
  To: ltp

* Christian Brauner:

> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8c700f881d92..e192089f133e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2569,12 +2569,15 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
>                  unsigned long, tls)
>  #endif
>  {
> +       /* Ignore the upper 32 bits. */
> +       unsigned int flags = (clone_flags & 0xfffffff);
> +
>         struct kernel_clone_args args = {
> -               .flags          = (clone_flags & ~CSIGNAL),
> +               .flags          = (flags & ~CSIGNAL),
>                 .pidfd          = parent_tidptr,
>                 .child_tid      = child_tidptr,
>                 .parent_tid     = parent_tidptr,
> -               .exit_signal    = (clone_flags & CSIGNAL),
> +               .exit_signal    = (flags & CSIGNAL),
>                 .stack          = newsp,
>                 .tls            = tls,
>         }
>
> (Note that kernel_clone_args->flags is a 64 bit unsigned integer.)

This looks reasonable to me, but I have not tested it.  I think it
will restore the expected no-check behavior for clone flags.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05 10:21               ` Christian Brauner
  2020-05-05 11:34                 ` Florian Weimer
@ 2020-05-05 11:35                 ` Dmitry V. Levin
  2020-05-05 11:43                   ` Christian Brauner
  1 sibling, 1 reply; 19+ messages in thread
From: Dmitry V. Levin @ 2020-05-05 11:35 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 12:21:54PM +0200, Christian Brauner wrote:
> On Tue, May 05, 2020 at 11:58:13AM +0200, Christian Brauner wrote:
> > On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> > > * Christian Brauner:
> > > >> Have any flags been added recently?
> > > >
> > > > /* Flags for the clone3() syscall. */
> > > > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> > > > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> > > 
> > > Are those flags expected to be compatible with the legacy clone
> > > interface on 64-bit architectures?
> > 
> > No, they are clone3() only. clone() is deprecated wrt to new features.
> > 
> > > 
> > > >> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> > > >> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> > > >> > and used in clone3().)
> > > >> >
> > > >> > So the better option might be to do what you suggested, Florian:
> > > >> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> > > >> > 	clone_flags = CLONE_LEGACY_FLAGS?
> > > >> > and move on?
> > > >> 
> > > >> Not sure what you are suggesting here.  Do you mean an unconditional
> > > >> masking of excess bits?
> > > >> 
> > > >>   clone_flags &= CLONE_LEGACY_FLAGS;
> > > >> 
> > > >> I think I would prefer this:
> > > >> 
> > > >>   /* Userspace may have passed a sign-extended int value. */
> > > >>   if (clone_flags != (int) clone_flags) /* 
> > > >>  	return -EINVAL;
> > > >>   clone_flags = (unsigned) clone_flags;
> > > >
> > > > My worry is that this will cause regressions because clone() has never
> > > > failed on invalid flag values. I was looking for a way to not have this
> > > > problem. But given what you say below this change might be ok/worth
> > > > risking?
> > > 
> > > I was under the impression that current kernels perform such a check,
> > > causing the problem with sign extension.
> > 
> > No, it doesn't, it never did. It only does it for clone3(). Legacy
> > clone() _never_ reported an error no matter if you passed garbage flags
> > or not. That's why we can't re-use clone() flags that have essentially
> > been removed in kernel version before I could even program. :) Unless
> > I'm misunderstanding what check you're referring to.
> > 
> > If I understood the original mail correctly, then the issue is caused by
> > an interaction with sign extension and a the new flag value
> > CLONE_INTO_CGROUP being defined.
> > So from what I gather from Jan's initial mail is that when clone() is
> > called on ppc64le with the CLONE_IO|SIGCHLD flag:
> > clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> > that the sign extension causes bits to be set that raise the
> > CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> > legacy clone() and clone3() the kernel will think that someone requested
> > CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> > the only issue here then couldn't we just do:
> > 
> > clone_flags &= ~CLONE3_ONLY_FLAGS?
> > 
> > and move on, i.e. all future clone3() flags we'll just remove since we
> > can assume that they have been accidently set. Even if they have been
> > intentionally set we can just ignore them since that's in line with
> > legacy clone()'s (questionable) tradition of ignoring unknown flags.
> > Thoughts? Or am I missing some subtlety here?
> 
> So essentially:
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8c700f881d92..e192089f133e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2569,12 +2569,15 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
>                  unsigned long, tls)
>  #endif
>  {
> +       /* Ignore the upper 32 bits. */
> +       unsigned int flags = (clone_flags & 0xfffffff);

Not enough f's.  What about
	unsigned int flags = (unsigned int) clone_flags;
instead?


-- 
ldv

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05 11:35                 ` Dmitry V. Levin
@ 2020-05-05 11:43                   ` Christian Brauner
  2020-05-05 11:49                     ` Dmitry V. Levin
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2020-05-05 11:43 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 02:35:14PM +0300, Dmitry V. Levin wrote:
> On Tue, May 05, 2020 at 12:21:54PM +0200, Christian Brauner wrote:
> > On Tue, May 05, 2020 at 11:58:13AM +0200, Christian Brauner wrote:
> > > On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> > > > * Christian Brauner:
> > > > >> Have any flags been added recently?
> > > > >
> > > > > /* Flags for the clone3() syscall. */
> > > > > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> > > > > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> > > > 
> > > > Are those flags expected to be compatible with the legacy clone
> > > > interface on 64-bit architectures?
> > > 
> > > No, they are clone3() only. clone() is deprecated wrt to new features.
> > > 
> > > > 
> > > > >> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> > > > >> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> > > > >> > and used in clone3().)
> > > > >> >
> > > > >> > So the better option might be to do what you suggested, Florian:
> > > > >> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> > > > >> > 	clone_flags = CLONE_LEGACY_FLAGS?
> > > > >> > and move on?
> > > > >> 
> > > > >> Not sure what you are suggesting here.  Do you mean an unconditional
> > > > >> masking of excess bits?
> > > > >> 
> > > > >>   clone_flags &= CLONE_LEGACY_FLAGS;
> > > > >> 
> > > > >> I think I would prefer this:
> > > > >> 
> > > > >>   /* Userspace may have passed a sign-extended int value. */
> > > > >>   if (clone_flags != (int) clone_flags) /* 
> > > > >>  	return -EINVAL;
> > > > >>   clone_flags = (unsigned) clone_flags;
> > > > >
> > > > > My worry is that this will cause regressions because clone() has never
> > > > > failed on invalid flag values. I was looking for a way to not have this
> > > > > problem. But given what you say below this change might be ok/worth
> > > > > risking?
> > > > 
> > > > I was under the impression that current kernels perform such a check,
> > > > causing the problem with sign extension.
> > > 
> > > No, it doesn't, it never did. It only does it for clone3(). Legacy
> > > clone() _never_ reported an error no matter if you passed garbage flags
> > > or not. That's why we can't re-use clone() flags that have essentially
> > > been removed in kernel version before I could even program. :) Unless
> > > I'm misunderstanding what check you're referring to.
> > > 
> > > If I understood the original mail correctly, then the issue is caused by
> > > an interaction with sign extension and a the new flag value
> > > CLONE_INTO_CGROUP being defined.
> > > So from what I gather from Jan's initial mail is that when clone() is
> > > called on ppc64le with the CLONE_IO|SIGCHLD flag:
> > > clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> > > that the sign extension causes bits to be set that raise the
> > > CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> > > legacy clone() and clone3() the kernel will think that someone requested
> > > CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> > > the only issue here then couldn't we just do:
> > > 
> > > clone_flags &= ~CLONE3_ONLY_FLAGS?
> > > 
> > > and move on, i.e. all future clone3() flags we'll just remove since we
> > > can assume that they have been accidently set. Even if they have been
> > > intentionally set we can just ignore them since that's in line with
> > > legacy clone()'s (questionable) tradition of ignoring unknown flags.
> > > Thoughts? Or am I missing some subtlety here?
> > 
> > So essentially:
> > 
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 8c700f881d92..e192089f133e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2569,12 +2569,15 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
> >                  unsigned long, tls)
> >  #endif
> >  {
> > +       /* Ignore the upper 32 bits. */
> > +       unsigned int flags = (clone_flags & 0xfffffff);
> 
> Not enough f's.  What about
> 	unsigned int flags = (unsigned int) clone_flags;
> instead?

Yeah, I guess that should do it. Though maybe:

u32 flags = (u32)clone_flags;

is more transparent since we're stating visually "we're capping this to
32 bits"?

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05 11:43                   ` Christian Brauner
@ 2020-05-05 11:49                     ` Dmitry V. Levin
  2020-05-05 11:57                       ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry V. Levin @ 2020-05-05 11:49 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 01:43:32PM +0200, Christian Brauner wrote:
> On Tue, May 05, 2020 at 02:35:14PM +0300, Dmitry V. Levin wrote:
> > On Tue, May 05, 2020 at 12:21:54PM +0200, Christian Brauner wrote:
> > > On Tue, May 05, 2020 at 11:58:13AM +0200, Christian Brauner wrote:
> > > > On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> > > > > * Christian Brauner:
> > > > > >> Have any flags been added recently?
> > > > > >
> > > > > > /* Flags for the clone3() syscall. */
> > > > > > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> > > > > > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> > > > > 
> > > > > Are those flags expected to be compatible with the legacy clone
> > > > > interface on 64-bit architectures?
> > > > 
> > > > No, they are clone3() only. clone() is deprecated wrt to new features.
> > > > 
> > > > > 
> > > > > >> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> > > > > >> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> > > > > >> > and used in clone3().)
> > > > > >> >
> > > > > >> > So the better option might be to do what you suggested, Florian:
> > > > > >> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> > > > > >> > 	clone_flags = CLONE_LEGACY_FLAGS?
> > > > > >> > and move on?
> > > > > >> 
> > > > > >> Not sure what you are suggesting here.  Do you mean an unconditional
> > > > > >> masking of excess bits?
> > > > > >> 
> > > > > >>   clone_flags &= CLONE_LEGACY_FLAGS;
> > > > > >> 
> > > > > >> I think I would prefer this:
> > > > > >> 
> > > > > >>   /* Userspace may have passed a sign-extended int value. */
> > > > > >>   if (clone_flags != (int) clone_flags) /* 
> > > > > >>  	return -EINVAL;
> > > > > >>   clone_flags = (unsigned) clone_flags;
> > > > > >
> > > > > > My worry is that this will cause regressions because clone() has never
> > > > > > failed on invalid flag values. I was looking for a way to not have this
> > > > > > problem. But given what you say below this change might be ok/worth
> > > > > > risking?
> > > > > 
> > > > > I was under the impression that current kernels perform such a check,
> > > > > causing the problem with sign extension.
> > > > 
> > > > No, it doesn't, it never did. It only does it for clone3(). Legacy
> > > > clone() _never_ reported an error no matter if you passed garbage flags
> > > > or not. That's why we can't re-use clone() flags that have essentially
> > > > been removed in kernel version before I could even program. :) Unless
> > > > I'm misunderstanding what check you're referring to.
> > > > 
> > > > If I understood the original mail correctly, then the issue is caused by
> > > > an interaction with sign extension and a the new flag value
> > > > CLONE_INTO_CGROUP being defined.
> > > > So from what I gather from Jan's initial mail is that when clone() is
> > > > called on ppc64le with the CLONE_IO|SIGCHLD flag:
> > > > clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> > > > that the sign extension causes bits to be set that raise the
> > > > CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> > > > legacy clone() and clone3() the kernel will think that someone requested
> > > > CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> > > > the only issue here then couldn't we just do:
> > > > 
> > > > clone_flags &= ~CLONE3_ONLY_FLAGS?
> > > > 
> > > > and move on, i.e. all future clone3() flags we'll just remove since we
> > > > can assume that they have been accidently set. Even if they have been
> > > > intentionally set we can just ignore them since that's in line with
> > > > legacy clone()'s (questionable) tradition of ignoring unknown flags.
> > > > Thoughts? Or am I missing some subtlety here?
> > > 
> > > So essentially:
> > > 
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index 8c700f881d92..e192089f133e 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -2569,12 +2569,15 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
> > >                  unsigned long, tls)
> > >  #endif
> > >  {
> > > +       /* Ignore the upper 32 bits. */
> > > +       unsigned int flags = (clone_flags & 0xfffffff);
> > 
> > Not enough f's.  What about
> > 	unsigned int flags = (unsigned int) clone_flags;
> > instead?
> 
> Yeah, I guess that should do it. Though maybe:
> 
> u32 flags = (u32)clone_flags;
> 
> is more transparent since we're stating visually "we're capping this to
> 32 bits"?

Yes, this should work as well.

I wonder whether we could just change the type of clone_flags to unsigned int
in this function.


-- 
ldv

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7
  2020-05-05 11:49                     ` Dmitry V. Levin
@ 2020-05-05 11:57                       ` Christian Brauner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2020-05-05 11:57 UTC (permalink / raw)
  To: ltp

On Tue, May 05, 2020 at 02:49:49PM +0300, Dmitry V. Levin wrote:
> On Tue, May 05, 2020 at 01:43:32PM +0200, Christian Brauner wrote:
> > On Tue, May 05, 2020 at 02:35:14PM +0300, Dmitry V. Levin wrote:
> > > On Tue, May 05, 2020 at 12:21:54PM +0200, Christian Brauner wrote:
> > > > On Tue, May 05, 2020 at 11:58:13AM +0200, Christian Brauner wrote:
> > > > > On Tue, May 05, 2020 at 11:36:36AM +0200, Florian Weimer wrote:
> > > > > > * Christian Brauner:
> > > > > > >> Have any flags been added recently?
> > > > > > >
> > > > > > > /* Flags for the clone3() syscall. */
> > > > > > > #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
> > > > > > > #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> > > > > > 
> > > > > > Are those flags expected to be compatible with the legacy clone
> > > > > > interface on 64-bit architectures?
> > > > > 
> > > > > No, they are clone3() only. clone() is deprecated wrt to new features.
> > > > > 
> > > > > > 
> > > > > > >> > (Note, that CLONE_LEGACY_FLAGS is already defined as
> > > > > > >> > #define CLONE_LEGACY_FLAGS 0xffffffffULL
> > > > > > >> > and used in clone3().)
> > > > > > >> >
> > > > > > >> > So the better option might be to do what you suggested, Florian:
> > > > > > >> > if (clone_flags & ~CLONE_LEGACY_FLAGS)
> > > > > > >> > 	clone_flags = CLONE_LEGACY_FLAGS?
> > > > > > >> > and move on?
> > > > > > >> 
> > > > > > >> Not sure what you are suggesting here.  Do you mean an unconditional
> > > > > > >> masking of excess bits?
> > > > > > >> 
> > > > > > >>   clone_flags &= CLONE_LEGACY_FLAGS;
> > > > > > >> 
> > > > > > >> I think I would prefer this:
> > > > > > >> 
> > > > > > >>   /* Userspace may have passed a sign-extended int value. */
> > > > > > >>   if (clone_flags != (int) clone_flags) /* 
> > > > > > >>  	return -EINVAL;
> > > > > > >>   clone_flags = (unsigned) clone_flags;
> > > > > > >
> > > > > > > My worry is that this will cause regressions because clone() has never
> > > > > > > failed on invalid flag values. I was looking for a way to not have this
> > > > > > > problem. But given what you say below this change might be ok/worth
> > > > > > > risking?
> > > > > > 
> > > > > > I was under the impression that current kernels perform such a check,
> > > > > > causing the problem with sign extension.
> > > > > 
> > > > > No, it doesn't, it never did. It only does it for clone3(). Legacy
> > > > > clone() _never_ reported an error no matter if you passed garbage flags
> > > > > or not. That's why we can't re-use clone() flags that have essentially
> > > > > been removed in kernel version before I could even program. :) Unless
> > > > > I'm misunderstanding what check you're referring to.
> > > > > 
> > > > > If I understood the original mail correctly, then the issue is caused by
> > > > > an interaction with sign extension and a the new flag value
> > > > > CLONE_INTO_CGROUP being defined.
> > > > > So from what I gather from Jan's initial mail is that when clone() is
> > > > > called on ppc64le with the CLONE_IO|SIGCHLD flag:
> > > > > clone(do_child, stack+1024*1024, CLONE_IO|SIGCHLD, NULL, NULL, NULL, NULL);
> > > > > that the sign extension causes bits to be set that raise the
> > > > > CLONE_INTO_CGROUP flag. And since the do_fork() codepath is the same for
> > > > > legacy clone() and clone3() the kernel will think that someone requested
> > > > > CLONE_INTO_CGROUP but hasn't passed a valid fd to a cgroup. If that is
> > > > > the only issue here then couldn't we just do:
> > > > > 
> > > > > clone_flags &= ~CLONE3_ONLY_FLAGS?
> > > > > 
> > > > > and move on, i.e. all future clone3() flags we'll just remove since we
> > > > > can assume that they have been accidently set. Even if they have been
> > > > > intentionally set we can just ignore them since that's in line with
> > > > > legacy clone()'s (questionable) tradition of ignoring unknown flags.
> > > > > Thoughts? Or am I missing some subtlety here?
> > > > 
> > > > So essentially:
> > > > 
> > > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > > index 8c700f881d92..e192089f133e 100644
> > > > --- a/kernel/fork.c
> > > > +++ b/kernel/fork.c
> > > > @@ -2569,12 +2569,15 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
> > > >                  unsigned long, tls)
> > > >  #endif
> > > >  {
> > > > +       /* Ignore the upper 32 bits. */
> > > > +       unsigned int flags = (clone_flags & 0xfffffff);
> > > 
> > > Not enough f's.  What about
> > > 	unsigned int flags = (unsigned int) clone_flags;
> > > instead?
> > 
> > Yeah, I guess that should do it. Though maybe:
> > 
> > u32 flags = (u32)clone_flags;
> > 
> > is more transparent since we're stating visually "we're capping this to
> > 32 bits"?
> 
> Yes, this should work as well.
> 
> I wonder whether we could just change the type of clone_flags to unsigned int
> in this function.

I think we should go with capping the flags argument for now since it'll
stop the bleeding and will likely be fairly uncontroversial.
Then we can bring up changing the syscall signature later which I bet is
going to meet more resistance.

Christian

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-05-05 11:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <100149681.11244932.1588661282331.JavaMail.zimbra@redhat.com>
2020-05-05  7:28 ` [LTP] [bug?] clone(CLONE_IO) failing after kernel commit commit ef2c41cf38a7 Jan Stancek
2020-05-05  7:49   ` Florian Weimer
2020-05-05  7:59     ` Christian Brauner
2020-05-05  8:02       ` Christian Brauner
2020-05-05  8:32     ` Christian Brauner
2020-05-05  8:58       ` Jan Stancek
2020-05-05  9:05       ` Florian Weimer
2020-05-05  9:15         ` Christian Brauner
2020-05-05  9:36           ` Florian Weimer
2020-05-05  9:58             ` Christian Brauner
2020-05-05 10:21               ` Christian Brauner
2020-05-05 11:34                 ` Florian Weimer
2020-05-05 11:35                 ` Dmitry V. Levin
2020-05-05 11:43                   ` Christian Brauner
2020-05-05 11:49                     ` Dmitry V. Levin
2020-05-05 11:57                       ` Christian Brauner
2020-05-05 11:08               ` Florian Weimer
2020-05-05 11:26                 ` Christian Brauner
2020-05-05  7:54   ` Andreas Schwab

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.