Re: [PATCH v5 1/3] open: add close_range()

From: Kyle Evans <kevans@freebsd.org>
To: Szabolcs Nagy <nsz@port70.net>
Cc: Christian Brauner <christian.brauner@ubuntu.com>,
	torvalds@linux-foundation.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Victor Stinner <victor.stinner@gmail.com>,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, fweimer@redhat.com, jannh@google.com,
	oleg@redhat.com, arnd@arndb.de, shuah@kernel.org,
	dhowells@redhat.com, ldv@altlinux.org
Subject: Re: [PATCH v5 1/3] open: add close_range()
Date: Sat, 6 Jun 2020 09:43:36 -0500	[thread overview]
Message-ID: <CACNAnaEBgTspsw-yq2gc-9a0+tVfnx+g2z6RH5G6LwPYhaBNXA@mail.gmail.com> (raw)
In-Reply-To: <20200606115537.GB871552@port70.net>

On Sat, Jun 6, 2020 at 6:55 AM Szabolcs Nagy <nsz@port70.net> wrote:
>
> * Kyle Evans <kevans@freebsd.org> [2020-06-05 21:54:56 -0500]:
> > On Fri, Jun 5, 2020 at 9:55 AM Szabolcs Nagy <nsz@port70.net> wrote:
> > > this api needs a documentation patch if there isn't yet.
> > >
> > > currently there is no libc interface contract in place that
> > > says which calls may use libc internal fds e.g. i've seen
> > >
> > >   openlog(...) // opens libc internal syslog fd
> > >   ...
> > >   fork()
> > >   closefrom(...) // close syslog fd
> > >   open(...) // something that reuses the closed fd
> > >   syslog(...) // unsafe: uses the wrong fd
> > >   execve(...)
> > >
> > > syslog uses a libc internal fd that the user trampled on and
> > > this can go bad in many ways depending on what libc apis are
> > > used between closefrom (or equivalent) and exec.
> > >
> >
> > Documentation is good. :-) I think you'll find that while this example
> > seems to be innocuous on FreeBSD (and likely other *BSD), this is an
> > atypical scenario and generally not advised.  You would usually not
> > start closing until you're actually ready to exec/fail.
>
> it's a recent bug https://bugs.kde.org/show_bug.cgi?id=420921
>
> but not the first closefrom bug i saw: it is a fundamentally
> unsafe operation that frees resources owned by others.
>

Yes, close() is an inherently unsafe operation, and they managed this
bug without even having closefrom/close_range. I'm not entirely
convinced folks are going to spontaneously develop a need to massively
close things just because close_range exists. If they have a need,
they're already doing it with what they have available and causing
bugs like the above.

> > > > The code snippet above is one way of working around the problem that file
> > > > descriptors are not cloexec by default. This is aggravated by the fact that
> > > > we can't just switch them over without massively regressing userspace. For
> > >
> > > why is a switch_to_cloexec_range worse than close_range?
> > > the former seems safer to me. (and allows libc calls
> > > to be made between such switch and exec: libc internal
> > > fds have to be cloexec anyway)
> > >
> >
> > I wouldn't say it's worse, but it only solves half the problem. While
> > closefrom -> exec is the most common usage by a long shot, there are
> > also times (e.g. post-fork without intent to exec for a daemon/service
> > type) that you want to go ahead and close everything except maybe a
> > pipe fd that you've opened for IPC. While uncommon, there's no reason
> > this needs to devolve into a loop to close 'all the fds' when you can
> > instead introduce close_range to solve both the exec case and other
> > less common scenarios.
>
> the syslog example shows why post-fork closefrom without
> intent to exec does not work: there is no contract about
> which api calls behave like syslog, so calling anything
> after closefrom can be broken.
>

I think that example shows one scenario where it's not safe, that's
again in firmly "don't do that" territory. You can close arbitrary fds
very early or very late, but not somewhere in the middle of an even
remotely complex application. This problem exists with or without
close_range.

Like I said before, this is already done quite successfully now, along
with other not-even-forked uses. e.g. OpenSSH sshd will closefrom()
just after argument parsing:
https://github.com/openbsd/src/blob/master/usr.bin/ssh/sshd.c#L1582

> libc can introduce new api contracts e.g. that some apis
> don't use fds internally or after a closefrom call some
> apis behave differently, if this is the expected direction
> then it would be nice to propose that on the libc-coord
> at openwall.com list.
>

I suspect it's likely better to document that one should close
arbitrary fds very early or very late instead. Documenting which APIs
are inherently unsafe before/after would seem to be fraught with peril
-- you can enumerate what in libc is a potential problem, but there
are other libs in use by applications that will also use fds
internally and potentially cause problems, but we can't possibly raise
awareness of all of them. We can, however, raise awareness of the
valid and incredibly useful use-cases.

> > Coordination with libc is generally not much of an issue, because this
> > is really one of the last things you do before exec() or swiftly
> > failing miserably. Applications that currently loop over all fd <=
> > maxfd and close(fd) right now are subject to the very same
> > constraints, this is just a much more efficient way and
> > debugger-friendly way to accomplish it. You've absolutely not lived
> > life until you've had to watch thousands of close() calls painfully
> > scroll by in truss/strace.
>
> applications do a 'close all fds' operation because there
> is no alternative. (i think there are better ways to do
> this than looping: you can poll on the fds and only close
> the ones that didnt POLLNVAL, this should be more portable
> than /proc, but it's besides my point) optimizing this
> operation may not be the only way to achive whatever those
> applications are trying to do.

In most cases, those applications really are just trying to close all
fds other than the ones they want to stick around. Polling the fds
before trying to close them would just *double* the current problem
without fixing your other concern at all -- you still can't arbitrary
close open fds without understanding their provenance, and now you've
doubled the number of syscalls required just to close what you don't
need.

fdwalk() + close() is an (IMO) great alternative that's even more
flexible, but that still has its own problems.

Thanks,

Kyle Evans