From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756242AbdCWQGd (ORCPT );
        Thu, 23 Mar 2017 12:06:33 -0400
Received: from mail-qt0-f170.google.com ([209.85.216.170]:34938 "EHLO
        mail-qt0-f170.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752481AbdCWQGb (ORCPT );
        Thu, 23 Mar 2017 12:06:31 -0400
MIME-Version: 1.0
In-Reply-To: <20170320125855.GG4554@comp-core-i7-2640m-0182e6>
References: <20170312021257.GP29622@ZenIV.linux.org.uk>
        <20170320125855.GG4554@comp-core-i7-2640m-0182e6>
From: Djalal Harouni
Date: Thu, 23 Mar 2017 17:06:28 +0100
Message-ID:
Subject: Re: [RFC] Add option to mount only a pids subset
To: Alexey Gladkov
Cc: Linux Kernel Mailing List, Linux API, "Kirill A. Shutemov",
        Vasiliy Kulikov, Al Viro, "Eric W. Biederman", Oleg Nesterov,
        Pavel Emelyanov, James Bottomley, "Dmitry V. Levin"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alexey,

On Mon, Mar 20, 2017 at 1:58 PM, Alexey Gladkov wrote:
>
>
> Al Viro, this patch looks better ?
>
> == Overview ==
>
> Some container virtualization systems mount /proc inside the
> container. In most cases this is done to work with information about
> the processes. Knowing that the /proc filesystem is not fully
> virtualized, they mount empty files or directories on top of dangerous
> places (for example /proc/sys, /proc/kcore, /sys/firmware, etc.).
>
> The structure of this filesystem is dynamic and any module can create a
> new object which will not necessarily be virtualized. There are
> proprietary modules that aren't in the mainline whose work we cannot
> verify.
>
> This opens up a potential threat to the system. The developers of the
> virtualization system can't predict all dangerous places in /proc by
> definition.
>
> A more effective solution would be to mount into the container only what
> is necessary and ignore the rest.
>
> Right now it is possible to pass any part of the /proc filesystem into
> the container using mount --bind, except the pids.
>
> This patch allows mounting only the part of /proc related to pids,
> without the rest of the objects. Since this is an option for /proc,
> flags applied to /proc have an effect on this subset of the filesystem.

I just sent a patch that also has to deal with proc hidepid here:
https://lkml.org/lkml/2017/3/23/505

I'm not sure if that's the right approach, and it is still buggy.
However, it seems that your patch also stores the mount option inside
the pid_namespace, which may get propagated to all mounts inside the
same pidns?

I didn't have enough time, but if they are related maybe we can work it
out together?

Thank you!

-- 
tixxdz