All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] thread fixes v5.7-rc5
@ 2020-05-14 17:04 Christian Brauner
  2020-05-14 18:07 ` Linus Torvalds
  2020-05-14 18:55 ` pr-tracker-bot
  0 siblings, 2 replies; 6+ messages in thread
From: Christian Brauner @ 2020-05-14 17:04 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List

Hey Linus,

/* Summary */
This contains a single fix for all exported legacy fork helpers to block
accidental access to clone3() features in the upper 32 bits of their
respective flags arguments.

I got Cced on a glibc issue where someone reported consistent failures for
the legacy clone() syscall on ppc64le when sign extension was performed
(Since the clone() syscall in glibc exposes the flags argument as an int
 whereas the kernel uses unsigned long.).

The legacy clone() syscall is odd in a bunch of ways and here two things
interact. First, legacy clone's flag argument is word-size dependent, i.e.
it's an unsigned long whereas most system calls with flag arguments use int
or unsigned int. Second, legacy clone() ignores unknown and deprecated
flags. The two of them taken together means that users on 64bit systems can
pass garbage for the upper 32bit of the clone() syscall since forever and
things would just work fine. The following program compiled on a 64bit
kernel prior to v5.7-rc1 will succeed and will fail post v5.7-rc1 with
EBADF:

int main(int argc, char *argv[])
{
        pid_t pid;

        /* Note that legacy clone() has different argument ordering on
         * different architectures so this won't work everywhere.
         *
         * Only set the upper 32 bits.
         */
        pid = syscall(__NR_clone, 0xffffffff00000000 | SIGCHLD,
                      NULL, NULL, NULL, NULL);
        if (pid < 0)
                exit(EXIT_FAILURE);
        if (pid == 0)
                exit(EXIT_SUCCESS);
        if (wait(NULL) != pid)
                exit(EXIT_FAILURE);

        exit(EXIT_SUCCESS);
}

Since legacy clone() couldn't be extended this was not a problem so far and
nobody really noticed or cared since nothing in the kernel ever bothered to
look at the upper 32 bits.

But once we introduced clone3() and expanded the flag argument in struct
clone_args to 64 bit we opened this can of worms. With the first flag-based
extension to clone3() making use of the upper 32 bits of the flag argument
we've effectively made it possible for the legacy clone() syscall to reach
clone3() only flags on accident. The sign extension scenario is just the odd
corner-case that we needed to figure this out.

The reason we just realized this now and not already when we introduced
CLONE_CLEAR_SIGHAND was that CLONE_INTO_CGROUP assumes that a valid cgroup
file descriptor has been given whereas CLONE_CLEAR_SIGHAND doesn't need to
verify anything. It just silently resets the signal handlers to SIG_DFL. So
the sign extension (or the user accidently passing garbage for the upper 32
bits) caused the CLONE_INTO_CGROUP bit to be raised and the kernel to error
out when it didn't find a valid cgroup file descriptor.

/* A final note */
Note, I'm also capping kernel_thread()'s flag argument mainly because none
of the new features make sense for kernel_thread() and we shouldn't risk
them being accidently activated however unlikely. If we wanted to, we could
even make kernel_thread() yell when an unknown flag has been set which it
doesn't do right now. But it's not worth risking this in a bugfix imho.

/* Testing */
All patches have seen exposure in linux-next and are based on v5.6-rc5.

/* Conflicts */
At the time of creating this pr no merge conflicts were reported.

The following changes since commit 0e698dfa282211e414076f9dc7e83c1c288314fd:

  Linux 5.7-rc4 (2020-05-03 14:56:04 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-2020-05-13

for you to fetch changes up to 3f2c788a13143620c5471ac96ac4f033fc9ac3f3:

  fork: prevent accidental access to clone3 features (2020-05-08 17:31:50 +0200)

Please consider pulling these changes from the signed for-linus-2020-05-13 tag.

Thanks!
Christian

----------------------------------------------------------------
for-linus-2020-05-13

----------------------------------------------------------------
Christian Brauner (1):
      fork: prevent accidental access to clone3 features

 kernel/fork.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] thread fixes v5.7-rc5
  2020-05-14 17:04 [GIT PULL] thread fixes v5.7-rc5 Christian Brauner
@ 2020-05-14 18:07 ` Linus Torvalds
  2020-05-14 18:21   ` Christian Brauner
  2020-05-14 18:55 ` pr-tracker-bot
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2020-05-14 18:07 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Linux Kernel Mailing List

On Thu, May 14, 2020 at 10:05 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> This contains a single fix for all exported legacy fork helpers to block
> accidental access to clone3() features in the upper 32 bits of their
> respective flags arguments.

I've taken this pull, but I really think the minimal and more
straightforward fix would have been to just make do_fork(),
kernel_thread() and clone() change the flags field from "unsigned
long" to "unsigned int".

I don't see why that wouldn't have fixed things, and it would have
been simpler and more obvious, I think.

Doesn't matter, I guess.

                   Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] thread fixes v5.7-rc5
  2020-05-14 18:07 ` Linus Torvalds
@ 2020-05-14 18:21   ` Christian Brauner
  2020-05-14 18:35     ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Brauner @ 2020-05-14 18:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On May 14, 2020 8:07:59 PM GMT+02:00, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Thu, May 14, 2020 at 10:05 AM Christian Brauner
><christian.brauner@ubuntu.com> wrote:
>>
>> This contains a single fix for all exported legacy fork helpers to
>block
>> accidental access to clone3() features in the upper 32 bits of their
>> respective flags arguments.
>
>I've taken this pull, but I really think the minimal and more
>straightforward fix would have been to just make do_fork(),
>kernel_thread() and clone() change the flags field from "unsigned
>long" to "unsigned int".
>
>I don't see why that wouldn't have fixed things, and it would have
>been simpler and more obvious, I think.
>
>Doesn't matter, I guess.
>
>                   Linus

Seemed weird to me to change something that's been exposed to userspace for that long.

Christian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] thread fixes v5.7-rc5
  2020-05-14 18:21   ` Christian Brauner
@ 2020-05-14 18:35     ` Linus Torvalds
  2020-05-14 18:54       ` Christian Brauner
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2020-05-14 18:35 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Linux Kernel Mailing List

On Thu, May 14, 2020 at 11:22 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> Seemed weird to me to change something that's been exposed to userspace for that long.

Well, the internal declarations aren't actually "exposed" to user
space - it's not like it's the declaration of the system call, that's
separate.

And we have done that before: we have had a lot of history of using
"unsigned long" to basically mean "register", and then ended up
cleaning up types afterwards.

In fact, if you look at the macros that do SYSCALL_DEFINE() (hint -
don't actually do it, you'll go mad) you'll see that magical
__SC_LONG() thing, which actually declares _all_ arguments as either
"unsigned long" or "unsigned long long".

That's the "native" representation of the low-level system call (it's
also marked "asmlinkage" etc).

We then end up casting them to the internal representation with that
__SC_CAST() macro.

So the actual types that we get are intentionally "cleaned up"
versions of the raw registers that were passed in.

But you really don't want to understand the __SYSCALL_DEFINEx() macro.
It's clever, but it really is the Cthulhu of macros. Just looking at
it might drive you mad.

                Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] thread fixes v5.7-rc5
  2020-05-14 18:35     ` Linus Torvalds
@ 2020-05-14 18:54       ` Christian Brauner
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2020-05-14 18:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Thu, May 14, 2020 at 11:35:29AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2020 at 11:22 AM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > Seemed weird to me to change something that's been exposed to userspace for that long.
> 
> Well, the internal declarations aren't actually "exposed" to user
> space - it's not like it's the declaration of the system call, that's
> separate.
> 
> And we have done that before: we have had a lot of history of using
> "unsigned long" to basically mean "register", and then ended up
> cleaning up types afterwards.

So this has been on my mind for a bit and the clone() bug here brought
this up again. I think it would be good if we could have a consensus
that all new system calls with flag arguments should default to
unsigned int as long as the flag argument is passed in a register; maybe
we could even change most legacy syscalls to unsigned int if safe. It's
not very transparent to userspace when looking at kernel sources why
system calls use unsigned long, int, or unsigned int and I doubt there's
much reason to it anyway apart from historical. But maybe I'm wrong;
that's not unusual. Or maybe it's not worth it. But I've been mulling
putting that into the extensible syscall design patch Aleksa and
I wrote and sent out a while ago:
https://lore.kernel.org/linux-doc/20191002151437.5367-1-christian.brauner@ubuntu.com/
right after copy_struct_from_user() landed. Maybe it's worth resending.

> 
> In fact, if you look at the macros that do SYSCALL_DEFINE() (hint -
> don't actually do it, you'll go mad) you'll see that magical
> __SC_LONG() thing, which actually declares _all_ arguments as either
> "unsigned long" or "unsigned long long".
> 
> That's the "native" representation of the low-level system call (it's
> also marked "asmlinkage" etc).

Right.

> 
> We then end up casting them to the internal representation with that
> __SC_CAST() macro.
> 
> So the actual types that we get are intentionally "cleaned up"
> versions of the raw registers that were passed in.

Yeah, of course.

> 
> But you really don't want to understand the __SYSCALL_DEFINEx() macro.
> It's clever, but it really is the Cthulhu of macros. Just looking at
> it might drive you mad.

That's very Wizard of Oz: "Pay no attention to the macro behind the
syscall declaration"; which means now I really _want_ to look at it. :)

Christian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] thread fixes v5.7-rc5
  2020-05-14 17:04 [GIT PULL] thread fixes v5.7-rc5 Christian Brauner
  2020-05-14 18:07 ` Linus Torvalds
@ 2020-05-14 18:55 ` pr-tracker-bot
  1 sibling, 0 replies; 6+ messages in thread
From: pr-tracker-bot @ 2020-05-14 18:55 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Linus Torvalds, Linux Kernel Mailing List

The pull request you sent on Thu, 14 May 2020 19:04:51 +0200:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-2020-05-13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8c1684bb81f173543599f1848c29a2a3b1ee5907

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-14 19:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-14 17:04 [GIT PULL] thread fixes v5.7-rc5 Christian Brauner
2020-05-14 18:07 ` Linus Torvalds
2020-05-14 18:21   ` Christian Brauner
2020-05-14 18:35     ` Linus Torvalds
2020-05-14 18:54       ` Christian Brauner
2020-05-14 18:55 ` pr-tracker-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.