From: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>
To: Stephen Kitt <steve@sk2.org>
Cc: linux-man@vger.kernel.org,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	linux-kernel@vger.kernel.org,
	Christian Brauner <christian.brauner@ubuntu.com>
Subject: Ping: [patch] close_range.2: new page documenting close_range(2)
Date: Fri, 18 Dec 2020 11:12:10 +0100
Message-ID: <6c32a7c3-4bed-8d5e-134f-47a4bd49dc78@gmail.com>
In-Reply-To: <47a388ca-bcd8-d917-0a0a-cdbd185d6998@gmail.com>

Hi Stephen,

Linux 5.10 was released recently.
Do you have any updates for this patch?

Thanks,

Alex

On 12/12/20 6:58 PM, Alejandro Colomar (man-pages) wrote:
> Hi Christian,
> 
> Makes sense to me.
> 
> Thanks,
> 
> Alex
> 
> On 12/12/20 1:14 PM, Christian Brauner wrote:
>> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>>> Hi Christian,
>>
>> Hi Alex,
>>
>>>
>>> Thanks for confirming that behavior.  Seems reasonable.
>>>
>>> I was wondering...
>>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>>> shouldn't it fail for the same reasons those syscalls can fail?
>>>
>>> What about the following errors?
>>>
>>> From unshare(2):
>>>
>>>        EPERM  The calling process did not have the required privileges
>>>               for this operation.
>>
>> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
>> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
>> i.e.
>> CLONE_NEWNS
>> CLONE_NEWUTS
>> CLONE_NEWIPC
>> CLONE_NEWNET
>> CLONE_NEWPID
>> CLONE_NEWCGROUP
>> CLONE_NEWTIME
>> so the permissions are the same.
>>
>>>
>>> From close(2):
>>>        EBADF  fd isn't a valid open file descriptor.
>>>
>>> OK, this one can't happen with the current code.
>>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>>> It's a no-op (although it will still unshare if the flag is set).
>>> But shouldn't it fail with EBADF?
>>
>> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
>> table independent of whether or not any file descriptors need to be
>> closed. That's also how we documented the flag:
>>
>> /* Unshare the file descriptor table before closing file descriptors. */
>> #define CLOSE_RANGE_UNSHARE	(1U << 1)
>>
>> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
>> or the proper close_range() syscall wants to make sure that all unwanted
>> file descriptors are closed (if any) and that no new file descriptors
>> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
>> there are no fds to be closed you open up a race window. It would also
>> be annoying for userspace if they _may_ have received a private file
>> descriptor table but only if any fds needed to be closed.
>>
>> If people really were extremely keen about skipping the unshare when no
>> fd needs to be closed then this could become a new flag. But I really
>> don't think that's necessary, and it also doesn't make a lot of sense, imho.
>>
>>>
>>>        EINTR  The close() call was interrupted by a signal; see
>>>               signal(7).
>>>
>>>        EIO    An I/O error occurred.
>>>
>>>        ENOSPC, EDQUOT
>>>               On NFS, these errors are not normally reported against
>>>               the first write which exceeds the available storage
>>>               space, but instead against a subsequent write(2),
>>>               fsync(2), or close().
>>>               fsync(2), or close().
>>
>> None of these will be seen by userspace because close_range() currently
>> ignores all errors after it has begun closing files.
>>
>> Christian
>>

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/