linux-kernel.vger.kernel.org archive mirror
From: ebiederm@xmission.com (Eric W. Biederman)
To: Alistair Strachan <astrachan@google.com>
Cc: linux-fsdevel@vger.kernel.org,
	Seth Forshee <seth.forshee@canonical.com>,
	Djalal Harouni <tixxdz@gmail.com>,
	kernel-team@android.com, linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org
Subject: Re: [PATCH] proc: Fix parsing of mount parameters.
Date: Tue, 12 Jun 2018 09:59:36 -0500
Message-ID: <87fu1svynb.fsf@xmission.com>
In-Reply-To: <CANDihLFmc21Ox_pe9S+oimiZcHB-hQLkv4avF7nReu+QjjL2Zg@mail.gmail.com> (Alistair Strachan's message of "Mon, 11 Jun 2018 23:12:35 -0700")

Alistair Strachan <astrachan@google.com> writes:

> On Mon, Jun 11, 2018 at 6:22 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Alistair Strachan <astrachan@google.com> writes:
>>
>> > In commit e94591d0d90c "proc: Convert proc_mount to use mount_ns"
>> > the parsing of mount parameters for the proc filesystem was broken.
>> >
>> > The SB_KERNMOUNT for procfs happens via:
>> >
>> >   start_kernel()
>> >     rest_init()
>> >       kernel_thread()
>> >         _do_fork()
>> >            copy_process()
>> >              alloc_pid()
>> >                pid_ns_prepare_proc()
>> >                  kern_mount_data()
>> >                    proc_mount()
>> >                      mount_ns()
>> >
>> > In mount_ns(), the kernel calls proc_fill_super() only if the superblock
>> > has not previously been set up (i.e. the first mount reference),
>> > regardless of SB_KERNMOUNT. Because the call to proc_parse_options() had
>> > been moved inside here, and the SB_KERNMOUNT uses no mount options, the
>> > option parser became a no-op.
>> >
>> > When userspace later mounted procfs with e.g. hidepid=2, the options
>> > would be ignored.
>> >
>> > This change backs out a part of the original cleanup and parses the
>> > procfs mount options at every mount call. Because the options currently
>> > only update the pid_ns for the mount, they are applied for all mounts of
>> > proc by that pid or children of that pid, instantaneously. This is the
>> > same behavior as the original code.
>>
>> Two years for a regression to be reported is a little long.  I think that
>> gets out of the kneejerk immediate fix or revert phase and into thinking
>> a little bit about what makes sense in this code.
>
> Android has been using hidepid=2 for a while, but most shipping
> products were on 3.18 or 4.4 kernels. To us, it's a new problem.

All that says is that no one from Android has looked at or tested a
kernel (not even a stable one) since 4.4.  That does not work for
justifying changes/fixes to the kernel.  People working together does
not work too well when some of the people don't show up.

As an engineer I sympathize with your position.   Whoever made the
decision that Android won't care for anything except for the best
effort long term stable kernels has made your job a challenge.  I do
appreciate that Android at least periodically updates their kernels.

The bottom line though is that if this had been caught within a
release or so of its breaking, restoring the old userspace behavior
would not have been a question.  Now, after two years, there is the
question of whether other people have come to depend on the new behavior.

>> As we saw with devpts, there is a very real danger of someone mounting
>> a second instance of proc in a chroot and causing problems by either
>> strengthening or weakening the hidepid protections for the entire pid
>> namespace, if we go with your proposed change in behavior.
>
> I guess my change does change the behavior, but it's just back to the
> behavior which the kernel had for a good while (~v3.3 thru v4.7).
>
>> Ordinary block device filesystems (like ext4) avoid this problem by
>> allowing a second mount and by not parsing the mount options except
>> on remount.  What proc currently does.
>
> IMHO, they're not really comparable. You'll only get kernmounts of an
> ext4 filesystem when finding rootfs, and in that case the user knows
> about the mount and can see it in /proc/mounts, so they know to use -o
> remount,<whatever>.
>
> Since the first mount (where the options might have been respected) is
> *always* the kernmount done before init, with your change these mount
> options for procfs will never be respected. As userspace didn't yet
> mount /proc, it can't know /proc was already mounted, in order to know
> to use a remount to re-parse the options. The behavior was changed in
> a non-obvious way.

Please note that it is fundamentally required if the in-kernel superblocks
are going to be shared.
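
To make the sharing point concrete, here is a toy userspace model, not
kernel code.  It assumes one proc superblock per pid namespace and an
option parser that runs only when that superblock is first created.  The
names PidNamespace, parse_options and mount_proc are hypothetical and
exist only for this illustration.

```python
# Toy model (not kernel code): one superblock is shared per pid namespace,
# and options are parsed only when that superblock is first created.

class PidNamespace:
    def __init__(self):
        self.hide_pid = 0        # models the namespace-wide hidepid setting
        self.superblock = None   # shared proc superblock, created on first mount


def parse_options(ns, options):
    """Apply 'hidepid=N' from a comma-separated option string, if present."""
    for opt in filter(None, (options or "").split(",")):
        key, _, value = opt.partition("=")
        if key == "hidepid":
            ns.hide_pid = int(value)


def mount_proc(ns, options=None):
    """Model of parsing options only while setting up a new superblock."""
    if ns.superblock is None:
        ns.superblock = object()
        parse_options(ns, options)   # only reached on the first mount
    return ns.superblock


ns = PidNamespace()
kern_sb = mount_proc(ns)                # internal kernel mount, no options
user_sb = mount_proc(ns, "hidepid=2")   # later userspace mount
print(user_sb is kern_sb, ns.hide_pid)
```

In this model the userspace mount gets the superblock the internal mount
already created, so its hidepid=2 never reaches the parser.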

>> So I think it can be reasonably argued that the change in behavior
>> was an unintentional fix.
>>
>> I can see an argument for failing the mount of proc if mount options
>> are specified or if those mount options differ from the existing mount
>> options.
>>
>> proc_remount's call of proc_parse_options is definitely buggy as it can
>> partially succeed and change the pid namespace and return an error code.
>> That is bad error handling.
>>
>> There may be an argument for making these options available in something
>> other than a mount of proc.  As they are pid namespace wide.
>>
>> There may be an argument for multiple instances of proc so that it makes
>> sense to process these options during an ordinary mount.
>>
>>
>> Ultimately what I see is that this is a difficult area of semantics that
>> there is at least a little room for improvement on, but it is not
>> as simple as this proposed change.
>
> An alternative fix might be to ignore the super setup if done from a
> kernmount of procfs. IMO, this initial mount shouldn't be considered
> the first reference, because it will not pass the mount options and
> cannot be observed by userspace. Such a change looks complicated,
> though, and it would only be relevant to procfs. It might be better to
> roll back the cleanup and implement these semantics directly in the
> procfs code.

That would be straightforward if we could get rid of proc_mnt.  Then
there would not need to be an internal kernel mount.  Which would get us
there and that would probably be a nice cleanup.

We can hand wave away the uml mconsole and sysctl syscall uses of
proc_mnt.  The tricky case is proc_flush_task.  That requires someone
having a reference to the proc super block and it noticeably keeps the
amount of memory consumed by the proc filesystem down.  Enough so that
people notice when it is not performing well.

I have heard of people running Android in a chroot on Chromebooks, and
I have heard of people running normal Linux chroots on Android phones.
So chroots and people mounting proc for a second time are definitely
things to worry about in the real world.  So I strongly suspect the old
behavior is quite risky in those real world situations.
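
The risk can be sketched with another toy model, again not kernel code.
It assumes the old scheme in which every mount of proc re-parses its
options into state shared by the whole pid namespace.  The names
PidNamespace and mount_proc_parse_always are hypothetical.

```python
# Toy model (not kernel code): options are parsed on every mount and land
# in per-pid-namespace state, so the most recent mount wins for everyone.

class PidNamespace:
    def __init__(self):
        self.hide_pid = 0   # models the namespace-wide hidepid setting


def mount_proc_parse_always(ns, options=None):
    """Model of re-parsing the mount options at every mount call."""
    for opt in filter(None, (options or "").split(",")):
        key, _, value = opt.partition("=")
        if key == "hidepid":
            ns.hide_pid = int(value)
    return ns   # every mount is a view onto the same namespace state


ns = PidNamespace()
host_view = mount_proc_parse_always(ns, "hidepid=2")     # hardened host mount
chroot_view = mount_proc_parse_always(ns, "hidepid=0")   # second mount in a chroot
print(host_view.hide_pid)
```

Here the second mount silently weakens the setting seen through the first
one, because both are views of the same per-namespace state.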

At this moment I really hate mount options to proc.  They are quite
difficult to make work well and in a non-surprising way.
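
One of the surprises noted earlier in the thread is that option parsing
can partially succeed: it changes the pid namespace and then returns an
error.  A minimal sketch of the difference between mutating while parsing
and parsing first, then committing, follows.  It is a toy model, not the
kernel's parser, and every name in it is hypothetical.

```python
# Toy model (not kernel code) contrasting in-place parsing with
# parse-then-commit when a later option is invalid.

class PidNamespace:
    def __init__(self):
        self.hide_pid = 0
        self.pid_gid = 0


def parse_in_place(ns, options):
    """Mutates ns option by option, so an error leaves partial state behind."""
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key == "hidepid":
            ns.hide_pid = int(value)
        elif key == "gid":
            ns.pid_gid = int(value)
        else:
            raise ValueError(f"unknown option: {opt}")


def parse_then_commit(ns, options):
    """Validates every option first and only then updates ns."""
    staged = {}
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key == "hidepid":
            staged["hide_pid"] = int(value)
        elif key == "gid":
            staged["pid_gid"] = int(value)
        else:
            raise ValueError(f"unknown option: {opt}")
    for name, value in staged.items():
        setattr(ns, name, value)


def hide_pid_after(parser, options):
    """Run parser on a fresh namespace and report hide_pid, ignoring errors."""
    ns = PidNamespace()
    try:
        parser(ns, options)
    except ValueError:
        pass
    return ns.hide_pid


print(hide_pid_after(parse_in_place, "hidepid=2,bogus"))
print(hide_pid_after(parse_then_commit, "hidepid=2,bogus"))
```

With the in-place parser the failed call still leaves hide_pid changed;
with the staged version a failed call leaves the namespace untouched.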

Eric

Thread overview: 7+ messages
2018-06-11 19:57 [PATCH] proc: Fix parsing of mount parameters Alistair Strachan
2018-06-12  1:22 ` Eric W. Biederman
2018-06-12  6:12   ` Alistair Strachan
2018-06-12 14:59     ` Eric W. Biederman [this message]
2018-06-16  3:26       ` [CFT][PATCH] proc: Simplify and fix proc by removing the kernel mount Eric W. Biederman
2018-06-17  2:54         ` [PATCH v2] " Eric W. Biederman
2018-06-17  6:20           ` Alistair Strachan