linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] userns fixes for 4.17-rc2
@ 2018-06-19 11:23 Eric W. Biederman
  2018-06-20  1:16 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2018-06-19 11:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Containers, linux-fsdevel, linux-kernel, Alistair Strachan,
	Andrew Morton, Al Viro, David Howells, Oleg Nesterov,
	Alexey Dobriyan


Linus,

Please pull the userns-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-linus

   HEAD: 04035aa33a1258ca3c30f58138897ca3e97485f1 proc: Don't change mount options on remount failure.

Mount options for proc have been something of a mess since they were
added in the beginnging of 2012.  I compounded that in 2016 by merging
a change that in practice ignored the proc mount options except on remount.

Ordinarily noticing that in 2018 that had been broken for 2 years
without complaint I would think hmm "Can we just get rid of these
things".  Unfortunately it was someone who uses the proc hidepid option
that noticed this problem.  So fixed it must be.

I stared at this code for quite a while and I finally concluded that the
best course forward is to simply things and remove the internal kernel
mount of proc.  The internal mount of proc is directly responsible for
this regression and it has been the source of pain over the years.

The cost of this simplification is that proc_flush_task gains two more
atomic operations.

The upside is that proc is no longer special.  So following the same
idioms as filesystems will no longer be a problem.


While I was looking at the mount options of proc I found two more issues
that date back to the original change that added them.  A remount of
proc that fails did not return the proper error code.  A remount of proc
that fails could wind up changing the proc mount options.

I have personally tested all of these changed and verified everything
works correctly.  Alistair Strachan has tested and verified that
Android's use of proc's hidepid option works with this change.

I had hoped to let this sit a little longer in linux-next just in case
some of the build bots might turn up something I had missed.  But with
the parallel fscontext changes to proc that testing won't happen.

With linux-next useless, I figure the better part of valor is a pull
request that explains the reasons for this change and highlights the
subtle issues with mount option handling.  Hopefully something we can
solve with the new fscontext userspace API before it gets merged.


This also looks like time to revisit killing off sysctl syscall support.
I don't believe anyone compiles it into their kernel any more and
keeping the support made this change more difficult than necessary.

Eric W. Biederman (3):
      proc: Simplify and fix proc by removing the kernel mount
      proc: Change proc_parse_options to return an errno value
      proc: Don't change mount options on remount failure.

 arch/um/drivers/mconsole_kern.c |  4 +--
 fs/proc/base.c                  | 36 ++++++++++++++++++++-----
 fs/proc/inode.c                 | 17 +++++++++---
 fs/proc/internal.h              |  7 ++++-
 fs/proc/root.c                  | 58 ++++++++++++++++++++++-------------------
 include/linux/pid_namespace.h   |  3 +--
 include/linux/proc_ns.h         |  7 ++---
 kernel/pid.c                    |  8 ------
 kernel/pid_namespace.c          |  7 -----
 kernel/sysctl_binary.c          |  5 ++--
 10 files changed, 87 insertions(+), 65 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL] userns fixes for 4.17-rc2
  2018-06-19 11:23 [GIT PULL] userns fixes for 4.17-rc2 Eric W. Biederman
@ 2018-06-20  1:16 ` Linus Torvalds
  2018-06-25 20:14   ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2018-06-20  1:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Linux Kernel Mailing List,
	astrachan, Andrew Morton, Al Viro, David Howells, Oleg Nesterov,
	Alexey Dobriyan

On Tue, Jun 19, 2018 at 8:24 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I stared at this code for quite a while and I finally concluded that the
> best course forward is to simply things and remove the internal kernel
> mount of proc.  The internal mount of proc is directly responsible for
> this regression and it has been the source of pain over the years.

This is not the kind of patch that I'm willing to take outside the
merge window. This is *way* too subtle, and making sysctl do a
kern_mount()/kern_umount() seems odd.  The pid->count test also looks
potentially racy to me.

And even if we want to do all this, it damn well shouldn't be done in
one commit. The sysctl change could and should be done imdependently,
of the other ones, for example. That "remove kernel mount" commit
simply does too much in one go considering how subtle this is. If
there are problems, I want it to bisect to "oh, sysctl broke", not to
"that thing that removed the kernel mount broke something".

The "it's been broken two years" definitely argues for doing this
slowly and carefully, not this way.

                    Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL] userns fixes for 4.17-rc2
  2018-06-20  1:16 ` Linus Torvalds
@ 2018-06-25 20:14   ` Eric W. Biederman
  2018-06-25 22:39     ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2018-06-25 20:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Containers, linux-fsdevel, Linux Kernel Mailing List,
	astrachan, Andrew Morton, Al Viro, David Howells, Oleg Nesterov,
	Alexey Dobriyan

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Jun 19, 2018 at 8:24 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> I stared at this code for quite a while and I finally concluded that the
>> best course forward is to simply things and remove the internal kernel
>> mount of proc.  The internal mount of proc is directly responsible for
>> this regression and it has been the source of pain over the years.
>
> This is not the kind of patch that I'm willing to take outside the
> merge window. This is *way* too subtle, and making sysctl do a
> kern_mount()/kern_umount() seems odd.

I understand the feedback about breaking up the patch and the concern
about the race with pid->count.

I don't understand the feedback about only accepting something like this
during the merge window.  The entire point of my change was to remove
subtlety.  The code was very straight forward to test.

This is a silent regression of a security feature so it is possible some
people have upgraded their kernel and not noticed the regression but are
affected by the information leak not honoring hidepid introduces.  That
seems to me to be a candidate for stable and thus an rc kernel.

Would you prefer a patch that does less towards fixing the root cause
for now and to be backported to stable?

> The pid->count test also looks potentially racy to me.

The function proc_flush_task is already racy, it is just an optimization
that needs to work the vast majority of the time or we get lots of stale
useless cached dentries in proc.  So I don't think a little race
between testing pid->count and someone accessing a proc inode matters.
They could always perform the access after proc_flush_task is done
and before unhash_process runs, and achieve the same effect.

Though in retrospect my testing showed processes acessing proc self
from libc or something so the pid->count optimization never really
hit.  So it is probably better just to remove it.


The kern_mount/kern_umount are definitely odd and not my favorite.  But
the code does work.  It is my intention and hope that they can both the
uml and the sysctl(2) code can both be removed.  I need to double check
but I don't think there are even any enterprise kernels that enable
sysctl(2) support in the kernel any more.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL] userns fixes for 4.17-rc2
  2018-06-25 20:14   ` Eric W. Biederman
@ 2018-06-25 22:39     ` Linus Torvalds
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2018-06-25 22:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Linux Kernel Mailing List,
	astrachan, Andrew Morton, Al Viro, David Howells, Oleg Nesterov,
	Alexey Dobriyan

On Tue, Jun 26, 2018 at 4:15 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I don't understand the feedback about only accepting something like this
> during the merge window.  The entire point of my change was to remove
> subtlety.  The code was very straight forward to test.

But it *doesn't* remove subtlety - it's just subtle in a different
way. Now it's subtle in the odd pid counting instead, and how  it
generates temporary kernel mounts all the time.

And the problem with the old code apparently went unnoticed for years,
and only hits some very odd cases. So there's real risk from the
hange.

> This is a silent regression of a security feature so it is possible some
> people have upgraded their kernel and not noticed the regression but are
> affected by the information leak not honoring hidepid introduces.  That
> seems to me to be a candidate for stable and thus an rc kernel.

Definitely not in the form you posted it. It's simply not obvious enough.

> Would you prefer a patch that does less towards fixing the root cause
> for now and to be backported to stable?

It's not about whether this is "root cause" or not. It's simply about
the changes being subtle and unclear, and the commits mixing things up
and not being legible enough that anybody can just go "yeah, that's
safe and obviously a fix".

The odd pid counting optimization wasn't even *explained* for
chrissake.  Never mind that the sysctl access code was mixed in with
the other changes.

So the way to get this merged is to make each step obvious and
independent. At that point I may go "yeah, this fixes a real issue and
doesn't introduce new ones".

As it is, it's not at all obvious that it doesn't introduce new issues.

                Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-06-25 22:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-19 11:23 [GIT PULL] userns fixes for 4.17-rc2 Eric W. Biederman
2018-06-20  1:16 ` Linus Torvalds
2018-06-25 20:14   ` Eric W. Biederman
2018-06-25 22:39     ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).